4. For the given corpus:
Document 1: Amit and Amita are twins.
Document 2: Amit lives with his grandparents in Shimla.
Document 3: Amita lives with her parents in Delhi.
Create a step-by-step Document Vector Table.
Ans. Step 1: Text Normalisation (split each document into tokens and remove the full stops)
Document 1: [Amit, and, Amita, are, twins]
Document 2: [Amit, lives, with, his, grandparents, in, Shimla]
Document 3: [Amita, lives, with, her, parents, in, Delhi]
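Step 1 can be sketched in Python as follows. This is a minimal sketch, assuming that keeping runs of letters is enough for this small corpus: it drops the full stops and keeps the original capitalisation, as the token lists above do. The function name `tokenise` is illustrative and not taken from the textbook.

```python
import re

corpus = [
    "Amit and Amita are twins.",
    "Amit lives with his grandparents in Shimla.",
    "Amita lives with her parents in Delhi.",
]

def tokenise(text):
    # Keep only runs of letters: this splits on spaces and drops the full stop.
    # Capitalisation is preserved to match the token lists in Step 1.
    return re.findall(r"[A-Za-z]+", text)

tokens = [tokenise(doc) for doc in corpus]
for i, doc_tokens in enumerate(tokens, start=1):
    print(f"Document {i}: {doc_tokens}")
```

Note that "in" and "Shimla" come out as two separate tokens, because tokenisation splits on every space.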
Step 2: Create Dictionary
[Amit, and, Amita, are, twins, lives, with, his, grandparents, in, Shimla, her, parents, Delhi]
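Building the dictionary means listing every word from the corpus exactly once. A minimal Python sketch, assuming words are kept in the order they first appear (which reproduces the order used above), starting from the token lists produced in Step 1:

```python
# Token lists produced in Step 1
tokens = [
    ["Amit", "and", "Amita", "are", "twins"],
    ["Amit", "lives", "with", "his", "grandparents", "in", "Shimla"],
    ["Amita", "lives", "with", "her", "parents", "in", "Delhi"],
]

# Walk the documents in order and add each word only the first time it is seen,
# so repeated words (Amit, Amita, lives, with, in) appear once in the dictionary.
dictionary = []
for doc_tokens in tokens:
    for word in doc_tokens:
        if word not in dictionary:
            dictionary.append(word)

print(dictionary)
print(len(dictionary))  # 14 unique words
```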
Step 3: Create the Document Vector Table
Document Amit and Amita are twins lives with his grandparents in Shimla her parents Delhi
Doc-1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Doc-2 1 0 0 0 0 1 1 1 1 1 1 0 0 0
Doc-3 0 0 1 0 0 1 1 0 0 1 0 1 1 1
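Each entry of the document vector table is the number of times a dictionary word occurs in one document. A minimal Python sketch of Step 3, starting from the Step 1 token lists and the Step 2 dictionary:

```python
# Token lists from Step 1 and the dictionary from Step 2
tokens = [
    ["Amit", "and", "Amita", "are", "twins"],
    ["Amit", "lives", "with", "his", "grandparents", "in", "Shimla"],
    ["Amita", "lives", "with", "her", "parents", "in", "Delhi"],
]
dictionary = ["Amit", "and", "Amita", "are", "twins", "lives", "with",
              "his", "grandparents", "in", "Shimla", "her", "parents", "Delhi"]

# One row per document: count how often each dictionary word occurs in it.
vectors = [[doc_tokens.count(word) for word in dictionary] for doc_tokens in tokens]

print("Document", *dictionary)
for i, row in enumerate(vectors, start=1):
    print(f"Doc-{i}", *row)
```

No word repeats within a single document in this corpus, so every count is 0 or 1. With a longer text the same code would record values greater than 1.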
C. Competency-based/Application-based questions. 21st Century Skills #Critical Thinking #Information Literacy
1. You have a dataset with a variety of abbreviations and misspelled words. How would you use text normalisation to standardise this dataset for further processing?
Ans. To standardise the dataset, text normalisation would involve:
• Correcting misspellings (e.g., "definately" → "definitely").
• Expanding abbreviations (e.g., "btw" → "by the way", "lol" → "laughing out loud").
• Converting all text to lowercase, so that "BTW", "Btw" and "btw" are treated as the same word.
• Expanding contractions (e.g., "I'm" → "I am").
These steps map different spellings and forms of the same word to a single form. The dataset then has a smaller, more consistent vocabulary, and later steps such as building a bag of words treat equivalent words alike.
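The steps above can be sketched in Python with simple lookup tables. This is a minimal sketch under stated assumptions: the three tables are hypothetical and hold only the examples from the answer. A real project would build them from its own data or use a spell-checking tool. The sketch lowercases first, so every table needs only lowercase keys, and it handles straight apostrophes only.

```python
import re

# Hypothetical lookup tables for illustration only.
SPELLING_FIXES = {"definately": "definitely"}
ABBREVIATIONS = {"btw": "by the way", "lol": "laughing out loud"}
CONTRACTIONS = {"i'm": "i am"}

def normalise(text):
    # Lowercase first so that "BTW", "Btw" and "btw" all match the same key.
    text = text.lower()
    # Split into words, keeping straight apostrophes so contractions stay whole.
    words = re.findall(r"[a-z']+", text)
    result = []
    for word in words:
        word = CONTRACTIONS.get(word, word)    # "i'm" -> "i am"
        word = ABBREVIATIONS.get(word, word)   # "btw" -> "by the way"
        word = SPELLING_FIXES.get(word, word)  # "definately" -> "definitely"
        result.append(word)
    return " ".join(result)

print(normalise("BTW I'm definately coming lol"))
# by the way i am definitely coming laughing out loud
```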
2. Consider the following 4 documents in a corpus:
Document 1: "I love programming."
Document 2: "Programming is fun."
Document 3: "I love coding."
Document 4: "Coding is awesome."
Calculate the Term Frequency-Inverse Document Frequency (TF-IDF) values for the given corpus.
Ans.
Term Frequency
Document i love programming is fun coding awesome
Doc-1 1 1 1 0 0 0 0
Doc-2 0 0 1 1 1 0 0
Doc-3 1 1 0 0 0 1 0
Doc-4 0 0 0 1 0 1 1
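The term-frequency table and the values that follow from it can be computed with a short Python sketch. The sketch lowercases the text first, so "Programming" and "programming" count as one word, which matches the lowercase column headings. For the IDF part it assumes the convention IDF(W) = log10(N / DF(W)) and TF-IDF(W) = TF(W) × IDF(W), where N = 4 is the number of documents and DF(W) is the number of documents that contain W. Libraries often use smoothed variants of this formula, so their numbers can differ.

```python
import math
import re

corpus = [
    "I love programming.",
    "Programming is fun.",
    "I love coding.",
    "Coding is awesome.",
]

# Lowercase and keep only letters, so "Programming" and "programming" match.
docs = [re.findall(r"[a-z]+", doc.lower()) for doc in corpus]

# Vocabulary in order of first appearance (same column order as the table above).
vocab = list(dict.fromkeys(word for doc in docs for word in doc))

# Term frequency: how many times each word occurs in each document.
tf = [[doc.count(word) for word in vocab] for doc in docs]

# Document frequency: how many documents contain each word.
df = [sum(1 for doc in docs if word in doc) for word in vocab]

# Assumed convention: IDF = log10(N / DF) and TF-IDF = TF * IDF.
n_docs = len(docs)
idf = [math.log10(n_docs / d) for d in df]
tfidf = [[t * w for t, w in zip(row, idf)] for row in tf]

print("Document", *vocab)
for i, row in enumerate(tf, start=1):
    print(f"Doc-{i}", *row)
print("DF", *df)
for i, row in enumerate(tfidf, start=1):
    print(f"Doc-{i}", *(f"{v:.3f}" for v in row))
```

Under this convention, a word found in two of the four documents gets IDF = log10(2) ≈ 0.301, while "fun" and "awesome", which each occur in only one document, get log10(4) ≈ 0.602. The rarer words therefore receive the higher weight.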
256 Artificial Intelligence Play (Ver 1.0)-X

