
4.  For the given corpus:
                    Document 1: Amit and Amita are twins.
                    Document 2: Amit lives with his grandparents in Shimla.
                    Document 3: Amita lives with her parents in Delhi.
                     Create a step-by-step Document Vector Table.
                Ans.  Step 1  Text Normalisation
                    Document 1: [Amit, and, Amita, are, twins]
                    Document 2: [Amit, lives, with, his, grandparents, in, Shimla]
                    Document 3: [Amita, lives, with, her, parents, in, Delhi]
                     Step 2  Create Dictionary
                           Amit            and             Amita            are             twins            lives
                           with            his             grandparents     in              Shimla           her
                           parents         Delhi

                     Step 3  Create a Document Vector Table.
                                 Amit  and  Amita  are  twins  lives  with  his  grandparents  in  Shimla  her  parents  Delhi
                     Document 1  1     1    1      1    1      0      0     0    0             0   0       0    0        0
                     Document 2  1     0    0      0    0      1      1     1    1             1   1       0    0        0
                     Document 3  0     0    1      0    0      1      1     0    0             1   0       1    1        1
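
The three steps above can also be reproduced programmatically. The following is a minimal Python sketch, not the textbook's own method: the helper name `tokenise` is hypothetical, and it assumes that normalisation here only removes punctuation and keeps the original letter case, as in the table above.

```python
import string

corpus = [
    "Amit and Amita are twins.",
    "Amit lives with his grandparents in Shimla.",
    "Amita lives with her parents in Delhi.",
]

def tokenise(sentence):
    # Step 1 (assumed): strip punctuation and split on whitespace; case is kept.
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

tokens = [tokenise(doc) for doc in corpus]

# Step 2: dictionary of unique words, in order of first appearance.
dictionary = []
for doc_tokens in tokens:
    for word in doc_tokens:
        if word not in dictionary:
            dictionary.append(word)

# Step 3: one row per document, counting each dictionary word.
vectors = [[doc_tokens.count(word) for word in dictionary] for doc_tokens in tokens]

print(dictionary)
for row in vectors:
    print(row)
```

Each printed row corresponds to one document, with the columns in the same order as the dictionary.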

              C.  Competency-based/Application-based questions.
                  21st Century Skills: #Critical Thinking #Information Literacy
                  1.  You have a dataset with a variety of abbreviations and misspelled words. How would you use text normalisation to
                    standardise this dataset for further processing?
                Ans.  To standardise the dataset, text normalisation would involve:
                    •  Correcting misspellings (e.g., "definately" → "definitely").
                    •  Expanding abbreviations (e.g., "btw" → "by the way", "lol" → "laughing out loud").
                    •  Converting to lowercase to ensure consistency.
                    •  Expanding contractions (e.g., "I'm" → "I am").
                    These actions reduce variation in the data and make it easier for a model to interpret.
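
The normalisation steps listed in this answer can be sketched in Python as follows. This is only an illustration: the lookup tables `ABBREVIATIONS`, `MISSPELLINGS` and `CONTRACTIONS` and the function `normalise` are hypothetical names holding just the examples from the answer, and a real dataset would need far larger tables or a spell-checking tool.

```python
import re

# Hypothetical lookup tables containing only the examples from the answer.
ABBREVIATIONS = {"btw": "by the way", "lol": "laughing out loud"}
MISSPELLINGS = {"definately": "definitely"}
CONTRACTIONS = {"i'm": "i am"}

def normalise(text):
    # Convert to lowercase first so that every lookup is case-insensitive.
    text = text.lower().replace("\u2019", "'")

    def replace_word(match):
        word = match.group(0)
        # Try each table in turn; leave the word unchanged if none matches.
        for table in (CONTRACTIONS, ABBREVIATIONS, MISSPELLINGS):
            if word in table:
                return table[word]
        return word

    # Apply the lookup to every run of letters and apostrophes.
    return re.sub(r"[a-z']+", replace_word, text)

print(normalise("BTW I'm definately coming, lol"))
```

Lowercasing is done before the lookups so that "BTW", "Btw" and "btw" are all expanded by the same table entry.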

                  2.  Consider the following 4 documents in a corpus:
                    Document 1: "I love programming."
                    Document 2: "Programming is fun."
                    Document 3: "I love coding."
                    Document 4: "Coding is awesome."
                    Prepare the Term Frequency-Inverse Document Frequency (TF-IDF) for the given corpus.

                Ans.
                                                           Term Frequency
                       Document         i        love     programming       is         fun      coding     awesome
                         Doc-1         1          1            1            0           0          0           0
                         Doc-2         0          0            1            1           1          0           0
                         Doc-3         1          1            0            0           0          1           0
                         Doc-4         0          0            0            1           0          1           1
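
The term-frequency table above, and the remaining steps towards TF-IDF, can be checked with a short Python sketch. It assumes the commonly taught form IDF(W) = log10(N / DF(W)) and TF-IDF(W) = TF(W) × IDF(W), where N is the number of documents and DF(W) is the number of documents containing W. This formula is an assumption here, and some tools use a natural logarithm or a smoothed variant, which gives different numbers. The helper name `tokenise` is hypothetical.

```python
import math

corpus = [
    "I love programming.",
    "Programming is fun.",
    "I love coding.",
    "Coding is awesome.",
]

def tokenise(sentence):
    # Lowercase, drop the full stop, and split on whitespace.
    return sentence.lower().replace(".", "").split()

docs = [tokenise(doc) for doc in corpus]

# Vocabulary in order of first appearance, matching the table columns.
vocabulary = []
for doc in docs:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Term frequency: how often each word occurs in each document.
tf = [[doc.count(word) for word in vocabulary] for doc in docs]

# Document frequency: number of documents that contain each word.
df = [sum(1 for doc in docs if word in doc) for word in vocabulary]

# Assumed formula: IDF = log10(N / DF), TF-IDF = TF * IDF.
idf = [math.log10(len(docs) / d) for d in df]
tfidf = [[round(t * i, 3) for t, i in zip(row, idf)] for row in tf]

print(vocabulary)
print(df)
for row in tfidf:
    print(row)
```

Under this assumed formula, words that appear in only one document ("fun", "awesome") receive a higher weight than words that appear in two documents.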

