
Data Transformation and Standardisation

              Data transformation and standardisation are important steps in preparing data before performing analysis or
              building models.
              Transformation means changing how the data appears or is organised, for example by adjusting values, formats, or
              structures so that they are consistent and meaningful. Standardisation, on the other hand, means converting data
              into a common scale or format, often by rescaling values so that features measured in different units can be compared.
              These processes help improve the quality of data, ensure consistency, and make data analysis or model building more
              accurate and effective.
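To make the idea of a "common scale" concrete, here is a minimal sketch of one widely used method, z-score standardisation, in which each value has the column mean subtracted and is then divided by the column standard deviation. The text above does not name a specific method, so the choice of z-scores, the column names, and the sample values are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: two columns measured on very different scales
df = pd.DataFrame({'Height_cm': [150, 160, 170, 180],
                   'Income': [20000, 40000, 60000, 80000]})

# Z-score standardisation: (value - column mean) / column standard deviation
# ddof=0 selects the population standard deviation
standardised = (df - df.mean()) / df.std(ddof=0)
print(standardised)
```

After this step both columns have a mean of 0 and a standard deviation of 1, so neither dominates simply because its raw numbers are larger.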

              Data Transformation

              Data transformation refers to converting data from one format, structure, or type to another. It helps make the data
              easier to interpret, improves its quality, and allows integration of data from different sources.
              Example:
              •  Changing date formats (like from 12/07/2025 to 07/12/2025)
              •  Converting units (like miles to kilometers)
              •  Turning categorical text into numerical values (like “Yes” to 1, “No” to 0)
              •  Filling missing values using averages or other imputation methods
              •  Aggregating data (for example, calculating yearly sales from monthly sales)
              Let us learn to do the above in Python.
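As a quick preview, the last three items in the list (encoding categories, filling missing values, and aggregating) can be sketched in a few lines of pandas. The column names and figures below are hypothetical and are not taken from this book. Filling with the mean is only one of several possible imputation choices.

```python
import pandas as pd

# Hypothetical sample data: monthly sales with one missing value
df = pd.DataFrame({
    'Year': [2023, 2023, 2024, 2024],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'Subscribed': ['Yes', 'No', 'Yes', 'No'],
    'Sales': [100.0, None, 300.0, 200.0],
})

# Turn categorical text into numbers: "Yes" becomes 1, "No" becomes 0
df['Subscribed'] = df['Subscribed'].map({'Yes': 1, 'No': 0})

# Fill the missing sales figure with the mean of the available values
df['Sales'] = df['Sales'].fillna(df['Sales'].mean())

# Aggregate monthly sales into yearly totals
yearly_sales = df.groupby('Year')['Sales'].sum()

print(df)
print(yearly_sales)
```

Here the missing value is replaced by 200.0, the mean of 100, 300 and 200, before the yearly totals are calculated.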

                  Program 26: To change date format

                 import pandas as pd

                 # Sample data: dates written as day/month/year text
                 df = pd.DataFrame({'Date': ['07/12/2024', '08/12/2024']})
                 # Parse the text into datetime values; pandas displays them as YYYY-MM-DD
                 df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
                 print(df)

                 Output:

                         Date
                 0 2024-12-07
                 1 2024-12-08
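Program 26 parses the text into datetime values, which pandas displays as YYYY-MM-DD. To write the dates back out in a different text format, such as month/day/year, one option is the `dt.strftime` accessor. The sketch below assumes that month/day/year is the format wanted, and `Formatted` is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['07/12/2024', '08/12/2024']})

# Parse the day/month/year strings into datetime values
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Write the dates out again as month/day/year text
df['Formatted'] = df['Date'].dt.strftime('%m/%d/%Y')
print(df)
```

Note that the new column holds text, not datetime values, so it is best created only at the final display or export step.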

                  Program 27: To convert units from miles to kilometers

                 import pandas as pd

                 # 1 mile = 1.60934 kilometers
                 df = pd.DataFrame({'Miles': [15, 20, 25]})
                 df['Kilometers'] = df['Miles'] * 1.60934
                 print(df)








                 226    Touchpad Artificial Intelligence - XI