Page 315 - Touchpad AI

For example, in predicting exam scores, a teacher collects student data, builds a model, and evaluates
                             its prediction accuracy.
                 Question 5
                    30.  Perform tokenization on the following statement. Give all steps.                             (4)
                         Artificial Intelligence helps machines think and learn

                        Ans.
                                 Step      Action                 Result
                                   1       Clean text             Punctuation removed (none present, so the text is unchanged)
                                   2       Split text             ["Artificial", "Intelligence", "helps", "machines",
                                                                  "think", "and", "learn"]
                                   3       Lowercase              ["artificial", "intelligence", "helps", "machines",
                                                                  "think", "and", "learn"]
                                   4       Remove stopwords       ["artificial", "intelligence", "helps", "machines",
                                                                  "think", "learn"]
                                   5       Output tokens          Tokens passed on to the NLP model
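
The five steps can be sketched in plain Python as follows. This is a minimal illustration, not a prescribed method: the function name `tokenize` and the small `STOPWORDS` set are assumptions made for this example, and NLP libraries ship much longer stopword lists. With this list, only "and" is removed from the statement.

```python
import string

# Small illustrative stopword list (an assumption for this sketch;
# real NLP libraries provide far longer lists).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def tokenize(text):
    # Step 1: clean the text by removing punctuation characters
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: split the text into words on whitespace
    words = cleaned.split()
    # Step 3: convert every word to lowercase
    lowered = [word.lower() for word in words]
    # Step 4: remove stopwords
    tokens = [word for word in lowered if word not in STOPWORDS]
    # Step 5: return the tokens for use by an NLP model
    return tokens


tokens = tokenize("Artificial Intelligence helps machines think and learn")
print(tokens)
# ['artificial', 'intelligence', 'helps', 'machines', 'think', 'learn']
```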

                     31.  Give Python code for the following:                                                          (4)
                         a.   To remove duplicates from the following dataframe
                            data = {
                                'Name': ['Aarav', 'Diya', 'Kabir', 'Aarav', 'Meera', 'Kabir'],
                                'Age': [15, 14, 16, 15, 13, 16],
                                'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Kolkata']
                            }
                            Ans.  # 1. Import library and build the DataFrame
                                  import pandas as pd
                                  data = {
                                      'Name': ['Aarav', 'Diya', 'Kabir', 'Aarav', 'Meera', 'Kabir'],
                                      'Age': [15, 14, 16, 15, 13, 16],
                                      'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Kolkata']
                                  }
                                  df = pd.DataFrame(data)
                                  print("Original DataFrame:")
                                  print(df)

                                  # 2. Remove duplicate rows (the first occurrence of each row is kept)
                                  df_unique = df.drop_duplicates()
                                  print("\n✅ DataFrame after removing duplicates:")
                                  print(df_unique)
                         b.     You are working with a sales dataset where product prices are written in different formats: some rows use
                            commas (e.g., 1,200), others use spaces (e.g., ₹1200), and a few even store prices as text (e.g., "₹1200").
                            What steps would you take to clean and standardize this column for accurate analysis?
                            Ans.  I would first remove commas, spaces, and currency symbols using the replace() function or
                                  str.replace() in Pandas. Then, I would convert all price values to a numeric type using
                                  astype(float) or pd.to_numeric() to ensure a consistent numerical format. Finally, I would
                                  check for missing or invalid entries and fill or remove them before performing any analysis.
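
The cleaning steps described for the price column might look like the sketch below. The sample values, the DataFrame name `sales`, and the column name `Price` are hypothetical, chosen only to illustrate the three steps; a real dataset may need a different pattern.

```python
import pandas as pd

# Hypothetical sample data; the column name "Price" is an assumption.
sales = pd.DataFrame(
    {"Price": ["1,200", "₹ 1200", "₹1200", " 950 ", "N/A", None]}
)

# Step 1: remove commas, spaces, and currency symbols by keeping
# only digits and the decimal point.
cleaned = sales["Price"].astype(str).str.replace(r"[^0-9.]", "", regex=True)

# Step 2: convert to a numeric type; anything that cannot be parsed
# (for example, an empty string) becomes NaN.
sales["Price"] = pd.to_numeric(cleaned, errors="coerce")

# Step 3: drop rows whose price is missing or invalid.
sales_clean = sales.dropna(subset=["Price"]).reset_index(drop=True)
print(sales_clean)
```

Here the invalid rows are dropped; filling them instead (for example, with the column median via `fillna()`) is the other option the answer mentions. Note that this pattern also strips minus signs, so it suits only non-negative prices.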

