Page 382 - AI Ver 3.0 class 10_Flipbook
[Figure: the variants ORANGE, Orange, ORANGe, oRANGE, oRaNGE and OrangE all convert to the common-case form "orange".]
Step 5 Stemming
Stemming is the process of removing affixes from words to reduce them to their root forms. It helps normalise
the text, but its disadvantage is that it strips every affix regardless of whether the resulting base word is
meaningful. Because it skips this check, it is a faster process.
Word      Affixes   Stem
jumped    -ed       jump
jumping   -ing      jump
jumper    -er       jump
flies     -es       fli
flying    -ing      fly
In stemming, the resulting stemmed words (obtained after removing affixes) may not always be meaningful. For
example, in this case, "jumped," "jumping," and "jumper" were all reduced to "jump," whereas "flies" was shortened
to "fli," which is not a meaningful word. Stemming does not consider whether the stemmed word makes sense; it
simply removes affixes, making the process faster.
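The behaviour in the table can be sketched in a few lines of Python. This is only an illustration and not a standard stemming algorithm: the function name `naive_stem` and the suffix list `SUFFIXES` are assumptions made for this example, chosen to cover just the affixes shown above.

```python
# Hypothetical suffix list covering only the affixes from the table above.
SUFFIXES = ("ing", "ed", "er", "es")


def naive_stem(word):
    """Strip the first matching suffix without checking the result is a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when something is left over after removing the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: len(word) - len(suffix)]
    return word


words = ["jumped", "jumping", "jumper", "flies", "flying"]
stems = {w: naive_stem(w) for w in words}
print(stems)
# {'jumped': 'jump', 'jumping': 'jump', 'jumper': 'jump', 'flies': 'fli', 'flying': 'fly'}
```

Note that the function never looks the result up anywhere, which is why "flies" becomes "fli" and also why the process is fast.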
Step 6 Lemmatization
Lemmatization is the process of removing affixes from words to produce a meaningful root word. The word obtained
after removing the affix is called the lemma. Since lemmatization always aims to produce a meaningful lemma, it
takes longer than stemming but gives better results.
Word      Affixes   Lemma
jumped    -ed       jump
jumping   -ing      jump
jumper    -er       jump
flies     -es       fly
flying    -ing      fly
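A toy sketch of the same idea in Python is given here. It is not how a real lemmatizer is built: the tiny word list `VOCABULARY`, the rule list `RULES` and the function name `toy_lemmatize` are all assumptions made for this example. The point it illustrates is the extra step: a candidate is accepted only if it is found in a list of known, meaningful words.

```python
# Hypothetical mini-dictionary of meaningful root words (a real one would be far larger).
VOCABULARY = {"jump", "fly"}

# Hypothetical rewrite rules: (suffix to remove, text to put in its place).
RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("er", ""), ("es", ""), ("s", "")]


def toy_lemmatize(word):
    """Return a root word only if it appears in VOCABULARY; otherwise keep the word."""
    word = word.lower()
    if word in VOCABULARY:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # The dictionary lookup is the extra work that stemming skips.
            if candidate in VOCABULARY:
                return candidate
    return word


words = ["jumped", "jumping", "jumper", "flies", "flying"]
lemmas = {w: toy_lemmatize(w) for w in words}
print(lemmas)
# {'jumped': 'jump', 'jumping': 'jump', 'jumper': 'jump', 'flies': 'fly', 'flying': 'fly'}
```

Here "flies" becomes "fly" because the candidate is checked against the word list before it is returned. That lookup is what makes lemmatization slower than stemming.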
[Figure: text normalisation steps: Sentence Segmentation -> Tokenisation -> Removing Stopwords, Special Characters and Numbers -> Converting text to a common case -> Stemming -> Lemmatization]
380 Touchpad Artificial Intelligence (Ver. 3.0)-X

