Page 296 - AI Ver 1.0 Class 10
The graph representing the text in all the documents in the corpus will be:
[Figure: plot of word Occurrence against Value, with regions labelled Stop words, Frequent words, and Rare/Valuable words]
Let us now understand the steps with the help of an example:
Here are 3 documents containing one sentence each.
Document 1: I like oranges.
Document 2: I also like bananas.
Document 3: Oranges and Bananas are good for health.
Step 1: Text Normalisation
Document 1: [I, like, oranges]
Document 2: [I, also, like, bananas]
Document 3: [oranges, and, bananas, are, good, for, health]
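The normalisation step can be sketched in Python. This is a minimal illustration, not the textbook's own procedure: the helper name `normalise` is hypothetical, and it assumes that normalisation here means removing punctuation, converting to lowercase, and splitting on spaces. Note that it also lowercases "I" to "i", and that each token list contains every word of its sentence.

```python
import string

def normalise(text):
    """Hypothetical helper: strip punctuation, lowercase, split into tokens."""
    # Assumption: punctuation is simply deleted and case is ignored.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()

documents = [
    "I like oranges.",
    "I also like bananas.",
    "Oranges and Bananas are good for health.",
]

# One token list per document
tokens = [normalise(doc) for doc in documents]

for number, doc_tokens in enumerate(tokens, start=1):
    print(f"Document {number}: {doc_tokens}")
```

Stop words such as "and" and "for" are kept in this sketch because they also appear in the dictionary built in the next step.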
Step 2: Create Dictionary
Here you will create a list of the unique words that occur across all the documents:
I like oranges also bananas and are good for health
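Building the dictionary can be sketched as follows. The function name `build_dictionary` is hypothetical, and the token lists are assumed to be the lowercase tokens of the three example documents. Words are kept in the order in which they are first seen, which reproduces the order of the dictionary above.

```python
def build_dictionary(tokenised_docs):
    """Hypothetical helper: collect unique words in first-seen order."""
    dictionary = []
    seen = set()
    for doc in tokenised_docs:
        for word in doc:
            if word not in seen:  # add each word only once
                seen.add(word)
                dictionary.append(word)
    return dictionary

# Assumed lowercase tokens of the three example documents
tokenised_docs = [
    ["i", "like", "oranges"],
    ["i", "also", "like", "bananas"],
    ["oranges", "and", "bananas", "are", "good", "for", "health"],
]

dictionary = build_dictionary(tokenised_docs)
print(dictionary)
```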
Step 3: Create a Document Vector
In this step, the list of words from the dictionary is written in the top row. Now, for each word in document 1,
if it matches a word in the dictionary, put a 1 under that word. If the same word appears again, increment the
previous value by 1. If a dictionary word does not occur in the document, put a 0 under it. For example, the
vector for document 1 will be:
I   like   oranges   also   bananas   and   are   good   for   health
1   1      1         0      0         0     0     0      0     0
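The counting rule described above can be sketched in Python. The function name `document_vector` is hypothetical, and the dictionary and tokens are assumed to be lowercase. `Counter` counts how many times each word occurs in the document and returns 0 for any dictionary word that does not occur in it.

```python
from collections import Counter

def document_vector(tokens, dictionary):
    """Hypothetical helper: count of each dictionary word in one document."""
    counts = Counter(tokens)  # a word that is absent counts as 0
    return [counts[word] for word in dictionary]

# Assumed lowercase dictionary from Step 2
dictionary = ["i", "like", "oranges", "also", "bananas",
              "and", "are", "good", "for", "health"]

document1 = ["i", "like", "oranges"]
vector1 = document_vector(document1, dictionary)
print(vector1)
```

If a word were repeated in a document, its entry in the vector would be 2 or more, matching the "increment the previous value by 1" rule.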

