9. What is the role of StandardScaler() in data standardisation?
Ans. StandardScaler() (from scikit-learn) applies Z-score normalization: it rescales each feature so that its
mean is 0 and its standard deviation is 1, which helps many machine learning models handle features
measured on different scales.
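The effect of StandardScaler() can be sketched in a few lines. The sample marks below are invented, and scikit-learn is assumed to be installed:

```python
# Minimal sketch of Z-score standardisation with scikit-learn's
# StandardScaler; the marks below are invented sample data.
import numpy as np
from sklearn.preprocessing import StandardScaler

marks = np.array([[40.0], [50.0], [60.0]])  # one feature, three values

scaler = StandardScaler()
scaled = scaler.fit_transform(marks)

print(scaled.mean())  # very close to 0
print(scaled.std())   # very close to 1
```

After fitting, the scaler remembers the mean and standard deviation, so the same transformation can be reapplied to new data with `scaler.transform()`.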
10. Why is it important to clean and standardise data before combining datasets from different sources?
Ans. Without cleaning and standardisation, different formats, spellings, or missing values can cause errors
when merging datasets. Clean and consistent data ensures smooth integration, correct comparisons,
and more accurate results in AI models and analysis.
B. Long answer type questions.
1. What is data cleaning? Describe the steps involved in cleaning data with examples.
Ans. Data cleaning means fixing problems in raw data such as missing values, duplicates, or spelling mistakes.
The steps include identifying issues, fixing or removing wrong data, formatting it properly, and validating
the final output. For example, we may remove rows with missing names, correct misspelled cities like
"Delhii", and ensure dates follow the same format.
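The steps above can be sketched with pandas; the column names and values (including the misspelled "Delhii") are illustrative:

```python
# Minimal sketch of the cleaning steps described above.
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", None, "Ravi"],
    "City": ["Delhii", "Mumbai", "Delhi"],
    "JoinDate": ["2025-01-05", "2025-01-06", "2025-01-07"],
})

df = df.dropna(subset=["Name"])                        # remove rows with missing names
df["City"] = df["City"].replace({"Delhii": "Delhi"})   # correct the misspelled city
df["JoinDate"] = pd.to_datetime(df["JoinDate"])        # ensure one date format

print(df)
```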
2. Describe the steps to work with a dataset from Kaggle using Pandas.
Ans. First, the dataset is downloaded from Kaggle. Then it is loaded with pd.read_csv(). head(), info(), and
describe() help explore the data. Cleaning is done with dropna() and fillna(), and standardisation with
methods such as .str.strip() and pd.to_datetime().
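This workflow can be sketched as follows. A real Kaggle file would be opened with `pd.read_csv("students.csv")` (a hypothetical file name); here a small CSV is built in memory so the example is self-contained:

```python
# Sketch of loading, exploring, and lightly cleaning a CSV with pandas.
import io
import pandas as pd

csv_text = "Name,City\n Asha ,Delhi\nRavi,Mumbai\n"
df = pd.read_csv(io.StringIO(csv_text))  # in practice: pd.read_csv("students.csv")

print(df.head())       # first few rows
df.info()              # column types and non-null counts
print(df.describe())   # summary statistics

df["Name"] = df["Name"].str.strip()  # remove stray leading/trailing spaces
```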
3. What is data transformation? Explain it with two examples.
Ans. Data transformation changes data into a suitable format for analysis. For example, converting “Yes” to 1
and “No” to 0 is a transformation. Another example is converting weight from pounds to kilograms using
a formula in Pandas.
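Both example transformations can be sketched in pandas; the column names and values are illustrative:

```python
# Sketch of the two transformations described above.
import pandas as pd

df = pd.DataFrame({
    "Passed": ["Yes", "No", "Yes"],
    "Weight_lb": [110.0, 154.0, 132.0],
})

df["Passed"] = df["Passed"].map({"Yes": 1, "No": 0})  # "Yes"/"No" -> 1/0
df["Weight_kg"] = df["Weight_lb"] * 0.453592          # pounds -> kilograms

print(df)
```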
4. Explain the difference between data cleaning and data transformation.
Ans. Data cleaning fixes wrong or missing data, like removing duplicates or correcting typos. Data
transformation changes the format or structure of the data, such as converting units or reformatting
dates. Both steps are essential before analysis or modelling.
5. Explain Z-score normalization and Min-Max scaling. When is each used?
Ans. Z-score normalization rescales data to a mean of 0 and a standard deviation of 1; it is commonly used
for machine learning models that are sensitive to feature scale. Min-Max scaling maps values to a fixed
range (usually 0–1); it is useful when features with different scales must be compared on a common
bounded scale.
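Both formulas can be written directly on a pandas Series; the numbers below are made up:

```python
# Sketch of Z-score normalization and Min-Max scaling on sample data.
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0])

z = (s - s.mean()) / s.std(ddof=0)        # Z-score: mean 0, std 1
mm = (s - s.min()) / (s.max() - s.min())  # Min-Max: values in 0..1

print(z.tolist())
print(mm.tolist())
```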
C. Competency-based questions: HOTS, 21st Century Skills, #Experiential Learning
1. You are given a dataset of students’ exam scores. Some rows have blank values in the “City” column and
duplicate entries. Describe the steps you would take to clean this dataset using Pandas.
Ans. First, I would use drop_duplicates() to remove repeated entries. Then, I would check for blanks using
isnull() and fill them using fillna() with a default city name or remove the rows using dropna(). This would
make the data cleaner for analysis.
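These steps can be sketched as follows; the student data is invented:

```python
# Sketch of removing duplicates and handling blank "City" values.
import pandas as pd

df = pd.DataFrame({
    "Name":  ["Asha", "Asha", "Ravi", "Meena"],
    "Score": [82, 82, 74, 91],
    "City":  ["Delhi", "Delhi", None, "Pune"],
})

df = df.drop_duplicates()                  # remove repeated entries
print(df["City"].isnull().sum())           # count the remaining blanks
df["City"] = df["City"].fillna("Unknown")  # or: df = df.dropna(subset=["City"])

print(df)
```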
2. A school wants to analyse student attendance data from three branches. The formats of roll numbers, names,
and dates differ across files. How will you make this data usable?
Ans. I would standardise roll numbers to a format like “STU2025_001”, use .str.title() to format names,
and apply pd.to_datetime() for dates. This ensures that all three datasets follow a common structure,
enabling smooth merging and comparison.
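The standardisation of one branch's file can be sketched as follows; the roll-number prefix "STU2025_" and the column names are illustrative:

```python
# Sketch of standardising roll numbers, names, and dates in one file.
import pandas as pd

df = pd.DataFrame({
    "Roll": [1, 23],
    "Name": ["asha VERMA", "ravi kumar"],
    "Date": ["2025-04-01", "2025-04-02"],
})

df["Roll"] = "STU2025_" + df["Roll"].astype(str).str.zfill(3)  # e.g. STU2025_001
df["Name"] = df["Name"].str.title()                            # e.g. Asha Verma
df["Date"] = pd.to_datetime(df["Date"])                        # common date type

print(df)
```

Once each branch's file follows this structure, the three DataFrames can be combined with `pd.concat()` and compared directly.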
Theoretical and Practical Aspects of Data Processing 235

