'Electronic': 'Electronics',
'Clths': 'Clothes'
})
# Step 7: Convert 'Date_of_Purchase' to standard date format
df['Date_of_Purchase'] = pd.to_datetime(df['Date_of_Purchase'], errors='coerce', dayfirst=True)
# Final cleaned dataset
print("\n Cleaned Dataset:")
print(df)
Output:
Original Dataset:
Customer_Name  Phone_Number  City    Product_Category  Date_of_Purchase
Meera          9876543210    delhi   Electronics       2023-01-15
Raj            None          Delhi   Clothes           15/01/2023
Amit           9123456780    mumbai  Electronic        2023.01.15
Meera          9876543210    delhi   Electronics       2023-01-15
Diya           9988776655    Mumbai  Clothes           2023/01/15
Rajat          None          delhi   Clths             15-01-2023

Cleaned Dataset:
Customer_Name  Phone_Number   City    Product_Category  Date_of_Purchase
Meera          9876543210     Delhi   Electronics       2023-01-15
Raj            Not Available  Delhi   Clothes           NaT
Amit           9123456780     Mumbai  Electronics       NaT
Diya           9988776655     Mumbai  Clothes           NaT
Rajat          Not Available  Delhi   Clothes           NaT
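
Note that four of the five dates in the cleaned output are NaT (Not a Time). This is most likely because recent versions of Pandas infer a single date format from the first value in the column and, with errors='coerce', turn every value that does not match that format into NaT. The sketch below shows one possible way to keep all the dates: try each expected format in turn and fill in whatever is still missing. The names raw, formats and parsed are illustrative and not part of the original program, and the list of formats is an assumption based only on the five date styles visible in the Original Dataset.

```python
import pandas as pd

# Date strings copied from the Original Dataset above (duplicate row left out)
raw = pd.Series(["2023-01-15", "15/01/2023", "2023.01.15", "2023/01/15", "15-01-2023"])

# Assumed list of formats: one per date style seen in the data
formats = ["%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d", "%Y/%m/%d", "%d-%m-%Y"]

parsed = None
for fmt in formats:
    # Values that do not match this format become NaT
    attempt = pd.to_datetime(raw, format=fmt, errors="coerce")
    # Keep dates already parsed; fill only the gaps with this attempt
    parsed = attempt if parsed is None else parsed.fillna(attempt)

print(parsed)
```

With this approach every value is checked against an explicit format, so a date is left as NaT only when it matches none of the listed formats.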
AI REBOOT
Answer the following questions:
1. What is data cleaning and why is it important?
2. List any three common problems found in raw data.
3. What function is used to remove duplicate rows in Pandas?
4. Write the code to fill missing values in a column named Marks with 0.
Introduction to Kaggle
Kaggle is an online platform for data science, machine learning, and AI enthusiasts. It provides a collaborative
environment where users can find and share datasets, explore notebooks (code scripts), participate in competitions,
and learn new data science skills through tutorials and community discussions.

