Page 382 - AI Ver 3.0 class 10_Flipbook

              [Figure: the variants ORANGE, Orange, ORANGe, oRANGE, oRaNGE and OrangE are all converted to the common case "orange".]
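The case conversion shown in the figure can be sketched in a few lines of Python. This is a minimal illustration, assuming lower case is chosen as the common case; the variable names are hypothetical.

```python
# The six spellings from the figure differ only in letter case.
variants = ["ORANGE", "Orange", "ORANGe", "oRANGE", "oRaNGE", "OrangE"]

# Convert every variant to lower case (the assumed common case).
normalised = [word.lower() for word in variants]

# After conversion, all variants collapse into a single form.
unique_forms = set(normalised)
print(unique_forms)
```

Once the case is unified, the machine treats all six spellings as the same word instead of six different ones.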


               Step 5  Stemming
              The process of removing the affixes from words to reduce them to their root form is called Stemming. This
              process helps in normalising the text, but its disadvantage is that it strips affixes regardless of
              whether the resulting base word is meaningful. Because it skips that check, it is a faster process.

                                             Word              Affixes             Stem
                                            jumped               -ed               jump
                                            jumping              -ing              jump
                                            jumper               -er               jump
                                            flies                -es               fli
                                            flying               -ing              fly

              In stemming, the resulting stemmed words (obtained after removing affixes) may not always be meaningful. For
              example, in this case, "jumped," "jumping," and "jumper" were all reduced to "jump," whereas "flies" was shortened
              to "fli," which is not a meaningful word. Stemming does not consider whether the stemmed word makes sense; it
              simply removes affixes, making the process faster.
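The behaviour described above can be sketched as a toy suffix stripper. This is only an illustration under stated assumptions: the suffix list and the function name `simple_stem` are hypothetical, and real stemmers apply a much richer set of rules. The sketch reproduces the table, including the non-word "fli".

```python
# Illustrative suffix list only; checked in this order, first match wins.
SUFFIXES = ("ing", "ed", "er", "es", "s")


def simple_stem(word: str) -> str:
    """Strip the first matching suffix without checking the result's meaning."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two letters so that very short words are not emptied.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


for w in ["jumped", "jumping", "jumper", "flies", "flying"]:
    print(w, "->", simple_stem(w))
```

Because the function never consults a dictionary, it is quick but can return stems such as "fli" that are not real words.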
               Step 6  Lemmatization

              This is a process of removing the affixes from the words to create a meaningful root word. The word we get after
              removing the affix is called the lemma. Since it always focusses on creating a meaningful lemma, it takes longer
              than stemming but gives better results.


                                             Word              Affixes            Lemma
                                            jumped               -ed               jump
                                            jumping              -ing              jump
                                            jumper               -er               jump
                                            flies                -es               fly
                                            flying               -ing              fly
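The difference from stemming can be sketched by adding a dictionary check. This is a simplified illustration, not how a production lemmatizer works: the tiny vocabulary `VOCAB`, the suffix list and the function name `simple_lemmatize` are all hypothetical, and real lemmatizers rely on a full dictionary and usually the word's part of speech.

```python
# Hypothetical miniature dictionary of valid root words.
VOCAB = {"jump", "fly"}

# Illustrative suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "er", "es", "s")


def simple_lemmatize(word: str) -> str:
    """Strip a suffix only if the result is a word found in VOCAB."""
    word = word.lower()
    if word in VOCAB:
        return word
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        candidates = [base]
        # Spelling repair: a trailing "i" often stands for "y" ("fli" -> "fly").
        if base.endswith("i"):
            candidates.append(base[:-1] + "y")
        for candidate in candidates:
            if candidate in VOCAB:
                return candidate
    # No meaningful root was found, so the word is left unchanged.
    return word


for w in ["jumped", "jumping", "jumper", "flies", "flying"]:
    print(w, "->", simple_lemmatize(w))
```

The extra dictionary lookups are what make lemmatization slower than stemming, and they are also why "flies" becomes "fly" rather than "fli".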




              [Figure: text normalisation steps in order: Sentence Segmentation -> Tokenisation -> Removing Stopwords, Special Characters and Numbers -> Converting text to a common case -> Stemming -> Lemmatization]
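The first four stages of the sequence in the figure can be chained together as in the sketch below. It is a rough illustration under stated assumptions: the stopword set `STOPWORDS` is a hypothetical sample, the function name `preprocess` is invented here, and sentences are split naively on ".", "!" and "?". Stemming or lemmatization would then be applied to the returned tokens.

```python
import re

# Hypothetical sample of stopwords; a real list is much longer.
STOPWORDS = {"the", "is", "a", "an", "and", "are", "in", "of", "to"}


def preprocess(text: str) -> list[list[str]]:
    """Return one list of cleaned, lower-case tokens per sentence."""
    # Step 1: sentence segmentation (naive split after ., ! or ?).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    result = []
    for sentence in sentences:
        # Step 2: tokenisation on whitespace.
        tokens = sentence.split()
        # Step 3: drop special characters and numbers, then stopwords.
        cleaned = [re.sub(r"[^A-Za-z]", "", t) for t in tokens]
        kept = [t for t in cleaned if t and t.lower() not in STOPWORDS]
        # Step 4: convert the remaining tokens to a common (lower) case.
        result.append([t.lower() for t in kept])
    return result


print(preprocess("The 3 cats are JUMPING! Birds fly in the sky."))
```

Stopwords are compared in lower case here so that "The" and "the" are both removed even though case conversion comes one step later.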



                    380     Touchpad Artificial Intelligence (Ver. 3.0)-X