
Program 25: To replace wrong values in a DataFrame with correct values

                    import pandas as pd
                    # Sample dataset with incorrect gender entries
                    data = {
                      'ID': [1, 2, 3],
                      'Name': ['Arjun', 'Priya', 'Nalini'],
                      'Gender': ['Male', 'Feemale', 'Feemale']  # "Feemale" is a typo
                    }


                    # Create DataFrame
                    df = pd.DataFrame(data)


                    # Show the original DataFrame
                    print("Original DataFrame:")
                    print(df)


                    # Correct spelling mistakes in the 'Gender' column using replace()
                    df['Gender'] = df['Gender'].replace('Feemale', 'Female')


                    # Show the updated DataFrame
                    print("\nUpdated DataFrame:")
                    print(df)
                    Output:
                    Original DataFrame:
                       ID    Name   Gender
                    0   1   Arjun     Male
                    1   2   Priya  Feemale
                    2   3  Nalini  Feemale

                    Updated DataFrame:
                       ID    Name  Gender
                    0   1   Arjun    Male
                    1   2   Priya  Female
                    2   3  Nalini  Female
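
Program 25 corrects a single wrong value. When a column contains several different wrong values, replace() can also be given a dictionary that maps each wrong value to its correct value. The sketch below is only an illustration: the extra row and the misspellings 'Femal' and 'Mael' are assumed sample data, not part of Program 25.

```python
import pandas as pd

# Assumed sample data: several different misspellings in one column
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Arjun', 'Priya', 'Nalini', 'Ravi'],
    'Gender': ['Male', 'Feemale', 'Femal', 'Mael'],
})

# Dictionary of wrong value -> correct value (hypothetical entries)
corrections = {'Feemale': 'Female', 'Femal': 'Female', 'Mael': 'Male'}

# replace() looks up each cell in the dictionary; values that are
# not listed (such as 'Male') are left unchanged
df['Gender'] = df['Gender'].replace(corrections)

print(df['Gender'].tolist())
# ['Male', 'Female', 'Female', 'Male']
```

Keeping all the corrections in one dictionary makes it easy to add a new misspelling later without writing another replace() call.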

                      VIDEO SESSION (21st Century Skills, #Experiential Learning)
                      Scan the QR code or visit the following link to watch the video:
                      Real World Data Cleaning in Python Pandas (Step By Step)
                      https://www.youtube.com/watch?v=iaZQF8SLHJs&t=1s
                      After watching the video, answer the following questions:

                      •  What were the most common problems found in the raw dataset used in the video?

                      •  How did cleaning the data help improve the quality or usefulness of the dataset?





                                                                      Theoretical and Practical Aspects of Data Processing  215