Page 293 - AI Ver 1.0 Class 10
Some examples of stopwords are:
a an and are as for
is into in if
on or such the there to
At this stage, all stopwords, special characters such as #$%@!, and numbers (if they are not needed) are removed
from the list of tokens. This makes it easier for the NLP system to focus on the words that matter for data processing.
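The removal step described above can be sketched in a few lines of Python. This is only a minimal illustration: the names STOPWORDS and remove_stopwords and the sample sentence are assumptions made for this example, and the sketch drops every token that is not purely alphabetic, so it also removes numbers that a real application might want to keep.

```python
# Illustrative stopword set, taken from the examples listed above.
STOPWORDS = {
    "a", "an", "and", "are", "as", "for",
    "is", "into", "in", "if",
    "on", "or", "such", "the", "there", "to",
}

def remove_stopwords(tokens):
    """Return only the tokens that are neither stopwords nor symbols/numbers."""
    kept = []
    for token in tokens:
        if token.lower() in STOPWORDS:
            continue  # skip stopwords, ignoring case
        if not token.isalpha():
            continue  # skip special characters and numbers
        kept.append(token)
    return kept

# Hypothetical token list produced by an earlier tokenisation step.
tokens = ["There", "is", "an", "orange", "and", "a", "mango",
          "on", "the", "table", "#", "20", "!"]
filtered = remove_stopwords(tokens)
print(filtered)
```

Only the tokens that carry meaning for the task (here orange, mango and table) remain in the list.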
Step 4: Converting Text to a Common Case
This is an important step because we want the same word written in different cases to be treated as one token, so
that the program is not case sensitive. We generally convert the whole text to lower case to avoid this kind of
confusion.
For example, ORANGE, Orange, ORANGe, oRANGE, oRaNGE and OrangE are all converted to:
orange
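A short Python sketch of this step is given below. It uses the built-in str.lower() method; the variable names are chosen only for this example.

```python
# The same word written in six different cases.
variants = ["ORANGE", "Orange", "ORANGe", "oRANGE", "oRaNGE", "OrangE"]

# Convert every variant to lower case.
lowered = [word.lower() for word in variants]

# A set keeps only distinct values, so it shows how many
# different tokens remain after the conversion.
unique_tokens = set(lowered)
print(unique_tokens)
```

After the conversion, all six spellings collapse into a single token.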
Step 5: Stemming
Stemming is the process of removing the affixes from a word to get back its base word. This process helps in
normalising the text, but its disadvantage is that it strips affixes regardless of whether the resulting base word
is a meaningful word or not. Because it does not check the result for meaning, it is a fast process. For example:
Some words with affixes, before stemming:
• increases, reserved, planning, programming, engaging, flier
The same words after stemming:
• increas, reserv, plann, programm, engag, fl
So, we see that some of the above words do not make any sense after stemming and are not meaningful base
words.
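The behaviour described above can be imitated with a very simple suffix-stripping function. This is a rough sketch and not the algorithm used by any real stemming library: the suffix list SUFFIXES, the function name naive_stem and the minimum stem length of two letters are all assumptions chosen so that the example reproduces the words listed above. Established stemmers apply many more rules and may produce different stems.

```python
# Illustrative suffixes, checked in this order (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "es", "ier", "s")

def naive_stem(word):
    """Strip the first matching suffix without checking the result for meaning."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two letters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

words = ["increases", "reserved", "planning", "programming",
         "engaging", "flier", "healed", "healing"]
stems = [naive_stem(w) for w in words]

for word, stem in zip(words, stems):
    print(word, "->", stem)
```

The function simply cuts off a suffix and never checks a dictionary, which is why it is fast but can return stems such as increas or fl that are not real words.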
Word      Affixes   Stem
healed    -ed       heal
healing   -ing      heal

