
                 df = pd.DataFrame(data)

                 # Group by year and sum sales
                 yearly_sales = df.groupby('Year')['Sales'].sum().reset_index()
                 print(yearly_sales)

                 Output:
                    Year  Sales
                 0  2023    500
                 1  2024    520

              Data Standardisation
              Data standardisation is the process of converting data into a consistent and uniform format so that it becomes easier
              to combine, analyse, and share. When we collect data from various sources (like websites, apps, files, etc.), it often
              appears in different formats. To make effective use of it, we must standardise it. This process follows fixed rules (like
              how dates are displayed, how numbers are measured, and how names are written) to remove confusion and improve
              data quality.

              Key Features of Data Standardisation
              Some key features of data standardisation are:


              [Figure: Key features of data standardisation: Consistent Format, Clear Definitions, Improved Accuracy,
              Better Integration, Easy & Accurate Analysis]


              The description of these key features of data standardisation is as follows:

              •  Consistent format: Ensures that data from different sources follows the same structure. Example: Some dates might
                  appear as 12-07-2025, others as 2025/07/12. Standardisation converts all of them into one format, such as DD-MM-YYYY.

              •  Clear definitions: Each data element has a clear and consistent meaning. Example: The “Price” column always
                  represents the amount in Indian Rupees, not mixed with other currencies.
              •  Improved accuracy: Removes mismatches and inconsistencies, improving reliability. Example: Converting “Yes”,
                  “YES”, “Y”, and “yes” into a single value, “Yes”.
              •  Better integration: Makes it easier to merge data from multiple sources. Example: School records from Delhi and
                  Mumbai can be combined easily if both follow the same standard.
              •  Easy and accurate analysis: Clean and standardised data supports accurate graphs, reports, and predictions.
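
The date and “Yes” examples above can be tried out in pandas. The following is only a minimal sketch: the sample records, the column names (Date, Attended), and the helper function to_ddmmyyyy are assumptions made up for illustration, and it assumes dates arrive in just the two formats mentioned above.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample records, invented for illustration
records = pd.DataFrame({
    "Date": ["12-07-2025", "2025/07/12", "13-07-2025", "2025/07/14"],
    "Attended": ["Yes", "YES", "Y", "yes"],
})

def to_ddmmyyyy(text):
    """Try each known input format and return the date as DD-MM-YYYY."""
    for fmt in ("%d-%m-%Y", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue  # this format did not match, try the next one
    return None  # unknown format: left empty for manual checking

# Consistent format: every date becomes DD-MM-YYYY
records["Date"] = records["Date"].apply(to_ddmmyyyy)

# Improved accuracy: every spelling of "yes" becomes the single value "Yes"
cleaned = records["Attended"].str.strip().str.lower()
records["Attended"] = cleaned.map({"yes": "Yes", "y": "Yes"}).fillna(records["Attended"])

print(records)
```

After running this, all four dates follow the DD-MM-YYYY format and the Attended column contains only the value “Yes”. Values that are not in the mapping are left unchanged rather than guessed.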

              Methods of Data Standardisation
              When data is inconsistent or follows different formats, it can lead to incorrect analysis and poor decision-making.
              To avoid this, it is important to follow a common standard for all data. A uniform format enables
              analysts to compare, combine, and study data more efficiently and accurately. We can standardise data using the
              following methods:
              Z-Score Normalization (Standardisation)
              Z-score normalization converts data so that the mean becomes 0 and the standard deviation becomes 1. This helps
              compare values measured on different scales.
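
Each value x is converted using z = (x - mean) / standard deviation. The sketch below applies this formula in pandas. The marks are hypothetical values invented for illustration, and it assumes the population standard deviation (ddof=0); pandas uses the sample standard deviation (ddof=1) by default, which would give slightly different z-scores.

```python
import pandas as pd

# Hypothetical marks, invented for illustration
marks = pd.Series([40, 50, 60, 70, 80], name="Marks")

mean = marks.mean()        # average of the values
std = marks.std(ddof=0)    # population standard deviation (an assumption)

# Z-score: how many standard deviations each value lies from the mean
z_scores = (marks - mean) / std

print(z_scores.round(2).tolist())  # [-1.41, -0.71, 0.0, 0.71, 1.41]
print(round(z_scores.mean(), 2), round(z_scores.std(ddof=0), 2))
```

The second print statement checks the result: the z-scores have a mean of 0 and a standard deviation of 1, as described above.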




                 228    Touchpad Artificial Intelligence - XI