Page 213 - Touhpad Ai
Pandas provides built-in functions to:
• Remove duplicate rows
• Fill or drop missing values
• Fix wrong data formats
• Trim extra spaces
• Replace incorrect values
Using these techniques, data scientists prepare data that is accurate and organised, which helps in building better
AI models.
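The short sketch below illustrates three of the tasks listed above: trimming extra spaces, fixing a wrong data format, and replacing an incorrect value. The table and its column names are made up for illustration, and the calls shown (str.strip(), pd.to_numeric() and replace()) are one reasonable choice among several that Pandas offers.

```python
import pandas as pd

# Hypothetical sample data with common problems:
# extra spaces in names, ages stored as text, and a misspelt city.
raw = pd.DataFrame({
    'Name': ['  Aman', 'Riya  ', ' Karan '],
    'Age': ['17', '18', '16'],
    'City': ['Delhi', 'Dehli', 'Mumbai'],
})

cleaned = raw.copy()

# Trim extra spaces from both ends of each name
cleaned['Name'] = cleaned['Name'].str.strip()

# Fix the wrong data format: convert ages from text to numbers
cleaned['Age'] = pd.to_numeric(cleaned['Age'])

# Replace the incorrect spelling with the correct one
cleaned['City'] = cleaned['City'].replace('Dehli', 'Delhi')

print(cleaned)
print(cleaned.dtypes)
```

Working on a copy keeps the original table unchanged, so the raw and cleaned versions can be compared afterwards.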
Removing Duplicate Values
Duplicate rows make the same record count more than once, which distorts totals and averages. The drop_duplicates() function removes them, keeping the first occurrence of each row.
Program 21: To remove duplicate rows from a DataFrame
import pandas as pd
data = {'Name': ['Aman', 'Riya', 'Aman'], 'Age': [17, 18, 17]}
df = pd.DataFrame(data)
print("Original Data")
print(df)
df_cleaned = df.drop_duplicates()
print("Cleaned Data (after removing duplicate values)")
print(df_cleaned)
Output:
Original Data
   Name  Age
0  Aman   17
1  Riya   18
2  Aman   17
Cleaned Data (after removing duplicate values)
   Name  Age
0  Aman   17
1  Riya   18
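Program 21 removes rows that match in every column. Sometimes rows should count as duplicates when only one column matches. The sketch below, which uses made-up data, shows the duplicated() function together with the subset and keep parameters of drop_duplicates() for that case.

```python
import pandas as pd

# Hypothetical data: the two 'Aman' rows differ only in Age
df = pd.DataFrame({'Name': ['Aman', 'Riya', 'Aman'], 'Age': [17, 18, 19]})

# duplicated() marks rows that repeat an earlier row.
# No complete row repeats here, so every value is False.
print(df.duplicated())

# Compare only the 'Name' column and keep the last matching row
by_name = df.drop_duplicates(subset='Name', keep='last')
print(by_name)
```

The remaining rows keep their original index labels (1 and 2 here). Calling reset_index(drop=True) on the result renumbers them from 0 if that is preferred.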
Handling Missing Values
Pandas shows missing values as NaN (Not a Number), and it also treats a None entry as missing. We can do any of the following to handle
missing values:
• Check for missing values: Use the df.isnull() function
• Remove rows with missing values: Use the df.dropna() function
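As a quick illustration of the two functions above, the sketch below counts the missing values in each column and then drops the incomplete rows. The marks table is hypothetical. The last step uses fillna() to show the "fill" option, and the fill values chosen (0 and 'Unknown') are assumptions that would depend on the actual data.

```python
import pandas as pd

# Hypothetical marks table with two missing entries
marks = pd.DataFrame({
    'Name': ['Aman', 'Riya', 'Karan'],
    'Marks': [85, None, 90],
    'Grade': ['A', 'B', None],
})

# isnull() gives True/False for each cell; sum() counts the True values per column
missing_per_column = marks.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
complete_rows = marks.dropna()
print(complete_rows)

# Alternative: fill the gaps instead of dropping rows (fill values are assumptions)
filled = marks.fillna({'Marks': 0, 'Grade': 'Unknown'})
print(filled)
```

Dropping rows is simple but loses data: only one of the three rows survives here. Filling keeps every row, at the cost of introducing values that were never actually recorded.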
Program 22: To handle missing values in a DataFrame
import pandas as pd
# Create sample data with missing values
data = {
    'Name': ['Aman', 'Riya', 'Karan', 'Sia', None],
    'Age': [17, None, 16, 18, 17]
}
df = pd.DataFrame(data)

