Page 293 - AI Ver 1.0 Class 10

Some examples of stopwords are:


                 a         an        and       are       as        for
                 is        into      in        if
                 on        or        such      the       there     to


                 At this stage, all stopwords, special characters such as #$%@!, and numbers (if not needed) are removed from the
                 list of tokens. This makes it easier for the NLP system to focus on the words that are important for data processing.
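
The removal step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword set is the short sample list shown on this page, and the function name `remove_noise`, its `keep_numbers` flag, and the sample tokens are hypothetical.

```python
# Sample stopwords taken from the list on this page (not a complete list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "for",
    "is", "into", "in", "if",
    "on", "or", "such", "the", "there", "to",
}

def remove_noise(tokens, keep_numbers=False):
    """Return the tokens that are not stopwords, symbols, or unwanted numbers."""
    cleaned = []
    for token in tokens:
        # Compare in lower case so that "The" and "the" are both removed.
        if token.lower() in STOPWORDS:
            continue
        # Drop pure numbers unless the caller says they are needed.
        if token.isdigit() and not keep_numbers:
            continue
        # Drop tokens made only of special characters such as # $ % @ !
        if not any(ch.isalnum() for ch in token):
            continue
        cleaned.append(token)
    return cleaned

tokens = ["The", "oranges", "in", "the", "basket", "are", "fresh", "!", "#", "20"]
print(remove_noise(tokens))  # ['oranges', 'basket', 'fresh']
```

Passing `keep_numbers=True` keeps tokens such as "20" when numbers matter for the task.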


                 Step 4: Converting Text to a Common Case
                 This is a very important step because we want the same word written in different cases to be treated as one token,
                 so that the program is not case sensitive. We generally convert the whole text to lower case to avoid this
                 kind of confusion.


                 ORANGE      Orange    ORANGe      oRANGE     oRaNGE     OrangE
                                               ↓
                                            orange
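
As a minimal sketch of this step, Python's built-in `str.lower()` folds all the variants shown above into a single token. The variable names here are only illustrative.

```python
# The differently cased spellings from the figure above.
variants = ["ORANGE", "Orange", "ORANGe", "oRANGE", "oRaNGE", "OrangE"]

# Convert every token to lower case.
normalised = [word.lower() for word in variants]

# A set keeps only distinct tokens, so six spellings collapse into one.
unique_tokens = set(normalised)
print(unique_tokens)  # {'orange'}
```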





                 Step 5: Stemming

                 Stemming is the process of removing the affixes from a word to get back its base word. This process
                 helps in normalising the text into its root form. The disadvantage is that it strips affixes irrespective
                 of whether the resulting base word is meaningful or not. Because it does not check for meaning, it is a fast process. For example:

                 Some words with affixes, before stemming:
                    • increases, reserved, planning, programming, engaging, flier

                 The same words after stemming:
                    • increas, reserv, plann, programm, engag, fl

                 So, we see that some of the above words no longer make sense after stemming and cannot be considered
                 meaningful base words.

                 Word          Affixes       Stem
                 healed        -ed           heal
                 healing       -ing          heal
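
The behaviour described above can be imitated with a toy suffix stripper. This is a sketch under stated assumptions, not the algorithm that any particular NLP library uses: the suffix list is hand-picked to reproduce the examples on this page, and the name `naive_stem` is hypothetical. Like stemming in general, it cuts off the affix without checking whether what remains is a meaningful word.

```python
# Hand-picked suffixes, chosen only to match the examples on this page.
SUFFIXES = ("ing", "ier", "ed", "es")

def naive_stem(word):
    """Strip the first matching suffix; never check if the result is a real word."""
    for suffix in SUFFIXES:
        # Require something to be left over after the suffix is removed.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word

words = ["increases", "reserved", "planning", "programming", "engaging", "flier"]
print([naive_stem(w) for w in words])
# ['increas', 'reserv', 'plann', 'programm', 'engag', 'fl']

print(naive_stem("healed"), naive_stem("healing"))  # heal heal
```

The same blind rule that produces the meaningful stem "heal" also produces "fl" from "flier", which is the disadvantage noted in the text.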





