Page 296 - AI Ver 1.0 Class 10

The graph representing the text in all the documents of the corpus will be:

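[Figure: plot of Occurrence (vertical axis) against Value (horizontal axis). Stop words sit at the high-occurrence, low-value end; frequent words lie in the middle; rare/valuable words sit at the low-occurrence, high-value end.]
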
              Let us now understand the steps with the help of an example:

              Here are 3 documents containing one sentence each.

              Document 1: I like oranges.

              Document 2: I also like bananas.
              Document 3: Oranges and Bananas are good for health.



              Step 1: Text Normalisation
              Document 1: [I, like, oranges]

              Document 2: [I, also, like, bananas]

              Document 3: [oranges, and, bananas, are, good, for, health]
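
The normalisation step can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the helper name `normalise` is hypothetical, and it assumes that normalisation here means lowercasing the text and dropping punctuation (so "I" becomes "i"). Stop words are kept, because the example keeps them.

```python
import re

documents = [
    "I like oranges.",
    "I also like bananas.",
    "Oranges and Bananas are good for health.",
]

def normalise(text):
    # Assumption: lowercase the text, then keep only runs of letters,
    # which removes the full stops and any other punctuation.
    return re.findall(r"[a-z]+", text.lower())

# One list of tokens per document.
tokens = [normalise(doc) for doc in documents]

for number, doc_tokens in enumerate(tokens, start=1):
    print(f"Document {number}: {doc_tokens}")
```

Each document is tokenised in full here; words that repeat across documents are only merged later, when the dictionary is built.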


              Step 2: Create Dictionary
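              Here you will create a list of the unique words that appear across all the documents. A word that occurs in more than one document is listed only once.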


                         I                  like               oranges               also               bananas

                       and                  are                 good                  for                health
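
A small Python sketch of the dictionary step is given here for illustration. The function name `build_dictionary` and the lowercased token lists are assumptions made for this sketch. Words are added in the order in which they are first seen, so the result matches the order of the list above.

```python
# Assumed input: the three example documents after normalisation.
tokenised_documents = [
    ["i", "like", "oranges"],
    ["i", "also", "like", "bananas"],
    ["oranges", "and", "bananas", "are", "good", "for", "health"],
]

def build_dictionary(docs):
    # Collect each word once, in order of first appearance.
    dictionary = []
    for doc in docs:
        for word in doc:
            if word not in dictionary:
                dictionary.append(word)
    return dictionary

dictionary = build_dictionary(tokenised_documents)
print(dictionary)
print(len(dictionary), "unique words")
```

A plain list is used instead of a set so that the word order stays fixed; the document vectors rely on every word keeping the same column position.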



              Step 3: Create a Document Vector
              In this step, the words from the dictionary are written in the top row. Then, for each word in document 1,
              if it matches a word in the dictionary, put a 1 under it. If the same word appears again, increment the
              previous value by 1. If a dictionary word does not occur in the document, put a 0 under it. For example, the
              vector for document 1 will be:
                   I        like    oranges     also     bananas      and        are      good        for      health

                   1         1         1          0          0          0         0         0          0         0
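
The counting rule described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `document_vector` is hypothetical, and the words are assumed to be lowercased already.

```python
# Dictionary of unique words, in the same column order as the table.
dictionary = ["i", "like", "oranges", "also", "bananas",
              "and", "are", "good", "for", "health"]

def document_vector(tokens, dictionary):
    # Start with 0 under every dictionary word.
    vector = [0] * len(dictionary)
    for word in tokens:
        if word in dictionary:
            # First match turns the 0 into 1; repeats increment it further.
            vector[dictionary.index(word)] += 1
    return vector

document_1 = ["i", "like", "oranges"]
vector_1 = document_vector(document_1, dictionary)
print(vector_1)  # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```

If a word occurred twice in a document, its column would hold 2 instead of 1, which is the "increment the previous value by 1" rule in the text.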




                        294   Touchpad Artificial Intelligence-X