AT A GLANCE
• Data cleaning is the process of fixing messy data by removing duplicates, correcting spelling errors, filling
missing values, and fixing wrong formats.
• We learned the steps of data cleaning: identifying problems, fixing the data, formatting it properly, and
finally validating it for correctness.
• Using Pandas in Python, we can clean data with functions like dropna(), fillna(), drop_duplicates(),
replace(), and str.strip().
• Kaggle is a free online platform with thousands of real-world datasets for practice and analysis.
• Data transformation means changing how data looks or is structured, such as converting date formats or units,
or replacing words with numbers for easier analysis.
• Data standardization is about making data follow a fixed, uniform format so it becomes easy to combine,
compare, and analyse from different sources.
• We can standardise values using:
o Z-score normalization (rescaling data so that the mean is 0 and the standard deviation is 1),
o Scale adjustment (like converting pounds to kilograms),
o Feature scaling (rescaling values to the range 0–1),
o and text/date formatting (like changing “YES” to “Yes”).
• Real-life examples of data standardization include:
o Student records having consistent roll numbers and phone formats.
o Product sizes written in full instead of just ‘S’, ‘M’, ‘L’.
o Dates and currency written in the same style across all data.
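The points above can be tried out in a few lines of Pandas. The sketch below is only an illustration: the table, its column names (name, answer, weight_lb), and its values are made up for this example and do not come from any real dataset. It first applies the cleaning functions listed above and then standardises one column in three ways.

```python
import pandas as pd

# Hypothetical sample data, invented only for this illustration.
raw = pd.DataFrame({
    "name": [" Asha", "Ravi ", "Ravi ", "Meena", "John"],
    "answer": ["YES", "no", "no", "Yes", None],
    "weight_lb": [110.0, 150.0, 150.0, None, 132.0],
})

# ---- Cleaning ----
df = raw.copy()
df["name"] = df["name"].str.strip()                        # remove extra spaces
df["answer"] = df["answer"].replace({"YES": "Yes", "no": "No"})  # uniform text
df = (
    df.drop_duplicates()                # remove the repeated "Ravi" row
      .dropna(subset=["answer"])        # drop rows where the answer is missing
      .reset_index(drop=True)
)
# Fill the missing weight with the mean of the remaining weights.
df["weight_lb"] = df["weight_lb"].fillna(df["weight_lb"].mean())

# ---- Standardising ----
# Scale adjustment: pounds to kilograms (1 lb = 0.45359237 kg).
df["weight_kg"] = df["weight_lb"] * 0.45359237

# Z-score: subtract the mean, then divide by the standard deviation.
mean = df["weight_kg"].mean()
std = df["weight_kg"].std(ddof=0)
df["weight_z"] = (df["weight_kg"] - mean) / std

# Feature scaling: squeeze the values into the range 0 to 1.
lowest = df["weight_kg"].min()
highest = df["weight_kg"].max()
df["weight_01"] = (df["weight_kg"] - lowest) / (highest - lowest)

print(df)
```

Filling the missing weight with the mean is just one possible choice; depending on the data, the median or a fixed value may be more suitable.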
EXERCISE
Solved Questions
SECTION A (Objective Type Questions)
AI QUIZ
A. Tick (✓) the correct option.
1. Choose the Python library that is commonly used for cleaning data.
a. Matplotlib b. NumPy
c. Pandas d. Seaborn
2. Identify the function that checks for missing values in a DataFrame.
a. isnull() b. dropna()
c. fillna() d. isempty()
232 Touchpad Artificial Intelligence - XI

