Page 315 - Touchpad AI
For example, in predicting exam scores, a teacher collects student data, builds a model, and evaluates
its prediction accuracy.
Question 5
30. Perform tokenization on the following statement. Give all steps. (4)
Artificial Intelligence helps machines think and learn
Ans.
Step | Action | Result
1 | Clean text | Remove punctuation (this sentence has none, so the text is unchanged)
2 | Split text | [“Artificial”, “Intelligence”, “helps”, “machines”, “think”, “and”, “learn”]
3 | Lowercase | [“artificial”, “intelligence”, “helps”, “machines”, “think”, “and”, “learn”]
4 | Remove stopwords | [“artificial”, “intelligence”, “helps”, “machines”, “think”, “learn”] (only “and” is a common stopword)
5 | Output tokens | The final tokens are passed to the NLP model
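The same steps can be sketched in plain Python. This is a minimal illustration, not a standard library routine: the function name tokenize and the short STOPWORDS set are assumptions made for this example, and real projects usually rely on a larger stopword list from a library such as NLTK or spaCy. With this short list, only "and" is dropped; a list that treats more words as stopwords would remove more.

```python
import string

# Assumption: a tiny illustrative stopword list (not a standard one).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def tokenize(text):
    # Step 1: clean the text by removing punctuation characters.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: split the text into words on whitespace.
    words = cleaned.split()
    # Step 3: convert every word to lowercase.
    lowered = [word.lower() for word in words]
    # Step 4: drop the words that appear in the stopword list.
    return [word for word in lowered if word not in STOPWORDS]

# Step 5: the resulting tokens can be passed on to an NLP model.
tokens = tokenize("Artificial Intelligence helps machines think and learn")
print(tokens)
```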
31. Give Python code for the following: (4)
a. To remove duplicates from the following dataframe
data = {
    'Name': ['Aarav', 'Diya', 'Kabir', 'Aarav', 'Meera', 'Kabir'],
    'Age': [15, 14, 16, 15, 13, 16],
    'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Kolkata']
}
Ans.
# 1. Import Library
import pandas as pd
data = {
    'Name': ['Aarav', 'Diya', 'Kabir', 'Aarav', 'Meera', 'Kabir'],
    'Age': [15, 14, 16, 15, 13, 16],
    'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Kolkata']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# 2. Remove duplicate rows
df_unique = df.drop_duplicates()
print("\n✅ DataFrame after removing duplicates:")
print(df_unique)
b. You are working with a sales dataset where product prices are written in different formats: some rows use
commas (e.g., 1,200), others use spaces (e.g., ₹1200), and a few even store prices as text (e.g., “₹1200”).
What steps would you take to clean and standardize this column for accurate analysis?
Ans. I would first remove commas, spaces, and currency symbols using the replace() function or
str.replace() in Pandas. Then, I would convert all price values to a numeric type using astype(float) or
pd.to_numeric() to ensure a consistent numerical format. Finally, I would check for missing or invalid
entries and fill or remove them before performing any analysis.
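The cleaning steps described in this answer can be sketched as follows. This is only one possible approach: the sample values, the column names Price and Price_clean, and the choice to drop (rather than fill) invalid rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with mixed price formats (not from a real dataset).
sales = pd.DataFrame({"Price": ["1,200", "₹ 1200", "₹1200", 1200, "abc", None]})

# Step 1: treat every value as text, then strip everything except digits
# and the decimal point (this removes commas, spaces and currency symbols).
as_text = sales["Price"].astype(str)
stripped = as_text.str.replace(r"[^0-9.]", "", regex=True)

# Step 2: convert to a numeric type; values that cannot be parsed become NaN.
sales["Price_clean"] = pd.to_numeric(stripped, errors="coerce")

# Step 3: count the missing or invalid entries, then remove them.
n_missing = int(sales["Price_clean"].isna().sum())
sales_clean = sales.dropna(subset=["Price_clean"]).reset_index(drop=True)

print("Invalid or missing prices:", n_missing)
print(sales_clean)
```

Dropping invalid rows is the simplest choice; filling them with a value such as the column median (using fillna()) may be preferable when the dataset is small.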

