
                      'Electronic': 'Electronics',
                      'Clths': 'Clothes'
                  })
                  # Step 7: Convert 'Date_of_Purchase' to standard date format
                  df['Date_of_Purchase'] = pd.to_datetime(df['Date_of_Purchase'], errors='coerce', dayfirst=True)
                  # Final cleaned dataset
                  print("\n Cleaned Dataset:")
                  print(df)
                  Output:
                  Original Dataset:

                  Customer_Name         Phone_Number        City           Product_Category        Date_of_Purchase
                  Meera                 9876543210          delhi          Electronics             2023-01-15
                  Raj                   None                Delhi          Clothes                 15/01/2023
                  Amit                  9123456780          mumbai         Electronic              2023.01.15
                  Meera                 9876543210          delhi          Electronics             2023-01-15
                  Diya                  9988776655          Mumbai         Clothes                 2023/01/15
                  Rajat                 None                delhi          Clths                   15-01-2023
                  Cleaned Dataset:

                  Customer_Name         Phone_Number        City           Product_Category        Date_of_Purchase
                  Meera                 9876543210          Delhi          Electronics             2023-01-15
                  Raj                   Not Available       Delhi          Clothes                 NaT
                  Amit                  9123456780          Mumbai         Electronics             NaT
                  Diya                  9988776655          Mumbai         Clothes                 NaT
                  Rajat                 Not Available       Delhi          Clothes                 NaT
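
                  Most dates in the cleaned output appear as NaT. A likely cause is that the column mixes several date formats: recent versions of Pandas infer a single format from the first value, and errors='coerce' turns every value that does not match it into NaT. The sketch below shows one possible workaround, assuming the five formats seen in the sample data are the only ones present. The names raw, formats and parsed are illustrative and are not part of the original program.

```python
import pandas as pd

# Date strings as they appear in the sample dataset (after duplicates are removed)
raw = pd.Series(["2023-01-15", "15/01/2023", "2023.01.15", "2023/01/15", "15-01-2023"])

# Assumption: these are the only formats present in the column
formats = ["%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d", "%Y/%m/%d", "%d-%m-%Y"]

# Parse with the first format; values that do not match become NaT
parsed = pd.to_datetime(raw, format=formats[0], errors="coerce")

# Try each remaining format and fill only the positions that are still NaT
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

                  Listing the formats explicitly keeps the parsing predictable: a value that matches none of them stays NaT and can be checked by hand instead of being guessed.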

                     AI REBOOT


                    Answer the following questions:
                    1.  What is data cleaning and why is it important?
                    2.  List any three common problems found in raw data.
                    3.  What function is used to remove duplicate rows in Pandas?
                    4.  Write the code to fill missing values in a column named Marks with 0.

                 Introduction to Kaggle

                 Kaggle is an online platform for data science, machine learning, and AI enthusiasts. It provides a collaborative
                 environment where users can find and share datasets, explore notebooks (code scripts), participate in competitions,
                 and learn new data science skills through tutorials and community discussions.
