
Merging or Joining DataFrames
              The following functions are used to merge or join DataFrames:

               Function                                          Description
               pd.merge(df1, df2, on='common_column')            Merges two DataFrames on a common column, like an SQL join (inner join by default; set how= to change it).
               df1.join(df2, on='common_column', how='left')     Joins df2 to df1 by matching df1's column against df2's index; how='left' keeps all rows of df1.
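
As a minimal sketch of both functions, assuming pandas is installed, the example below uses two small hypothetical DataFrames (`students` and `marks`) that share a `roll_no` column. Because `join()` matches against the index of the second DataFrame, `marks` is re-indexed by `roll_no` before joining.

```python
import pandas as pd

# Hypothetical sample data: two tables sharing a 'roll_no' column
students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
marks = pd.DataFrame({"roll_no": [1, 2, 4], "score": [88, 75, 91]})

# pd.merge: SQL-style join on a common column (inner join by default),
# so only roll numbers present in BOTH tables are kept
inner = pd.merge(students, marks, on="roll_no")
print(inner)

# how='left' keeps every student; a missing score becomes NaN
left = pd.merge(students, marks, on="roll_no", how="left")
print(left)

# DataFrame.join matches the caller's 'on' column against the OTHER frame's index,
# so marks is indexed by roll_no first
joined = students.join(marks.set_index("roll_no"), on="roll_no", how="left")
print(joined)
```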

              Data Cleaning and Preprocessing

              Data cleaning ensures that your dataset is accurate, consistent, and ready for analysis. It involves handling missing
              data, duplicates, incorrect formats, and outliers.

              Handling Missing Values
              The following functions help handle missing values in a dataset:

               Function                                          Description
               df.isnull().sum()                                 Counts the missing (NaN) values in each column.
               df.dropna()                                       Removes rows that contain any missing value.
               df.fillna(value)                                  Fills missing values with a specified value (e.g. the column's mean, median, or mode).
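
The sketch below, assuming a small hypothetical DataFrame with one gap in each column, shows the usual workflow: count the missing values, then either drop the affected rows or fill them. Filling a numeric column with its mean and a text column with its mode is one common choice, not a fixed rule.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value in each column
df = pd.DataFrame({
    "age": [16, np.nan, 17, 16],
    "city": ["Delhi", "Pune", None, "Delhi"],
})

# Count missing values per column
print(df.isnull().sum())  # age 1, city 1

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill the gaps, using a statistic suited to each column
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())        # numeric: mean
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # text: most frequent value
print(filled)
```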

              Handling Duplicates
              The following functions help handle duplicate rows in a dataset:

               Function                                          Description
               df.duplicated().sum()                             Counts the rows that repeat an earlier row.
               df.drop_duplicates()                              Removes duplicate rows, keeping the first occurrence by default.
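
A minimal illustration on hypothetical data in which one row was entered twice. It also uses the `subset` parameter of `drop_duplicates()`, a standard pandas option not listed in the table, which limits the comparison to the chosen columns.

```python
import pandas as pd

# Hypothetical records: the row ("Ravi", "B") was entered twice
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Ravi", "Meera", "Asha"],
    "section": ["A", "B", "B", "A", "C"],
})

# duplicated() flags each row that repeats an earlier row; sum() counts the flags
print(df.duplicated().sum())  # 1

# drop_duplicates() keeps the first occurrence by default (keep="first")
clean = df.drop_duplicates()

# subset= compares only the listed columns, so the second "Asha" row is dropped too
by_name = df.drop_duplicates(subset=["name"])
print(clean)
print(by_name)
```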

              Data Type Conversion
              The following functions help ensure that columns have the correct data types for analysis:

               Function                                          Description
               pd.to_numeric()                                   Converts values to a numeric type.
               pd.to_datetime()                                  Converts values to a date/time type.
               pd.Categorical()                                  Converts values to a categorical type.
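
The following sketch assumes hypothetical columns that were read as text, as often happens when loading a CSV file. The `errors="coerce"` option of `pd.to_numeric()` turns unparseable entries into NaN instead of raising an error, and `ordered=True` lets the grade categories be compared.

```python
import pandas as pd

# Hypothetical raw data where every column arrived as text
df = pd.DataFrame({
    "marks": ["85", "92", "absent"],
    "exam_date": ["2024-03-01", "2024-03-05", "2024-03-09"],
    "grade": ["B", "A", "C"],
})

# errors="coerce" converts values that cannot be parsed (e.g. "absent") to NaN
df["marks"] = pd.to_numeric(df["marks"], errors="coerce")

# Parse ISO-formatted date strings into datetime values
df["exam_date"] = pd.to_datetime(df["exam_date"])

# An ordered categorical lets grades be compared and sorted (C < B < A)
df["grade"] = pd.Categorical(df["grade"], categories=["C", "B", "A"], ordered=True)

print(df.dtypes)
```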

              String Manipulation
              The following functions are used to clean and transform text data:

               Function                                          Description
               df['column'].str.lower()                          Converts text to lowercase.
               df['column'].str.strip()                          Removes leading and trailing spaces from strings.
               df['column'].str.replace(old, new)                Replaces text within strings.
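
These methods are often chained. A minimal sketch on a hypothetical `city` column, assuming the goal is to make different spellings of the same city name identical:

```python
import pandas as pd

# Hypothetical city names with stray spaces, mixed case and an old name
df = pd.DataFrame({"city": ["  Delhi", "DELHI ", "new delhi", "Bombay"]})

df["city"] = (
    df["city"]
    .str.strip()                                   # remove leading/trailing spaces
    .str.lower()                                   # normalise case
    .str.replace("bombay", "mumbai", regex=False)  # replace the old name with the new one
)
print(df["city"].tolist())  # ['delhi', 'delhi', 'new delhi', 'mumbai']
```

Note that `strip()` does not touch spaces inside a value, so "new delhi" keeps its internal space.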

              Outlier Handling
              Outliers are extreme values that can distort analysis. They can be identified and managed as follows:
              •  Identify them using box plots, Z-scores, or the Interquartile Range (IQR).
              •  Remove extreme outliers, or cap them at threshold values.
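
One way to apply the IQR method is sketched below on a hypothetical series of daily study hours. It assumes the common 1.5 × IQR fences (the convention box-plot whiskers usually follow; the multiplier is a convention, not a requirement) and shows both options from the list: removing the outliers or capping them with `clip()`.

```python
import pandas as pd

# Hypothetical daily study hours; 25 looks like a data-entry error
hours = pd.Series([2, 3, 3, 4, 4, 5, 5, 6, 25])

# IQR fences: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR count as outliers
q1, q3 = hours.quantile(0.25), hours.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = hours[(hours < lower) | (hours > upper)]
print(outliers)

# Option 1: remove the outliers (between() is inclusive at both ends)
removed = hours[hours.between(lower, upper)]

# Option 2: cap them at the threshold values instead
capped = hours.clip(lower=lower, upper=upper)
print(capped)
```

A Z-score rule such as |z| > 3 suits larger samples better: with only nine values and the sample standard deviation, no point can reach a Z-score of 3, which is why this sketch uses the IQR instead.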






                 220    Touchpad Artificial Intelligence - XI