
•  Outliers (extreme values): Outliers are data points that deviate significantly from most of the dataset, typically due to errors or uncommon occurrences. Managing outliers involves detecting and excluding them, transforming the data, or applying robust statistical techniques to minimise their influence, as shown in the sketch after this list.
•  Inconsistent Data: Inconsistent data, such as typographical errors or variations in data types, is rectified to ensure uniformity and coherence across the dataset.
•  Duplicate Data: Duplicate data is identified and eliminated to maintain data integrity and accuracy.
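
A minimal sketch of these cleaning steps in Python is given below, assuming the pandas library is available; the DataFrame df, its columns and values are invented purely for illustration, and the 1.5 × IQR rule is only one common way to detect outliers.

import pandas as pd

# Invented example data: one extreme salary, inconsistent spelling/spacing,
# and duplicate rows.
df = pd.DataFrame({
    "name": ["Asha", "asha ", "Ravi", "Meena", "Ravi"],
    "salary": [52000, 52000, 48000, 950000, 48000],
})

# Outliers: keep only rows whose salary lies within 1.5 * IQR of the quartiles.
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Inconsistent data: make the text column uniform (trim spaces, fix the case).
df["name"] = df["name"].str.strip().str.title()

# Duplicate data: drop rows that repeat after the clean-up.
df = df.drop_duplicates()
print(df)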



2.  Data Transformation: This process involves converting data into a format suitable for analysis. Common techniques include normalisation, standardisation, and discretisation. Normalisation scales the data to a common range, standardisation adjusts the data to have a zero mean and unit variance, and discretisation converts continuous data into discrete categories. Existing features may also be adjusted as necessary.
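
The three techniques named above can be sketched with scikit-learn, assuming that library is installed; the small ages array is an invented example, not data from the text.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, KBinsDiscretizer

ages = np.array([[18], [22], [35], [47], [60]], dtype=float)  # invented values

# Normalisation: rescale the values to the common range [0, 1].
normalised = MinMaxScaler().fit_transform(ages)

# Standardisation: shift and scale to zero mean and unit variance.
standardised = StandardScaler().fit_transform(ages)

# Discretisation: convert the continuous ages into 3 ordinal categories.
discrete = KBinsDiscretizer(n_bins=3, encode="ordinal",
                            strategy="uniform").fit_transform(ages)

print(normalised.ravel())
print(standardised.ravel())
print(discrete.ravel())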


3.  Data Reduction: This process decreases the data volume, making analysis easier while yielding the same or nearly the same results. It also helps to save storage space. Common data reduction techniques include dimensionality reduction (reducing the number of features in a dataset) and data compression.
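
One widely used dimensionality-reduction technique is Principal Component Analysis (PCA); the sketch below assumes scikit-learn and uses a randomly generated 4-feature dataset only to show the reduction in shape.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 4)   # 100 samples, 4 features (random demo data)

# Reduce the 4 original features to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)              # (100, 4) -> (100, 2)
print("variance retained:", pca.explained_variance_ratio_.sum())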








4.  Data Integration and Normalisation: Data from multiple sources or formats is combined or aggregated (presented in the form of a summary). Subsequently, the data is normalised to ensure a uniform scale and distribution across all features, enhancing the effectiveness of machine learning models. Data integration is a key component of data management.
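
A minimal sketch of integration followed by normalisation, assuming pandas and scikit-learn; the two small tables and the student_id key are invented for illustration.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

marks = pd.DataFrame({"student_id": [1, 2, 3], "marks": [78, 92, 64]})
hours = pd.DataFrame({"student_id": [1, 2, 3], "study_hours": [5, 9, 3]})

# Integration: combine the two sources on a common key.
combined = marks.merge(hours, on="student_id")

# Aggregation: present the combined data as a summary.
print(combined[["marks", "study_hours"]].mean())

# Normalisation: bring both features onto the same 0-1 scale.
combined[["marks", "study_hours"]] = MinMaxScaler().fit_transform(
    combined[["marks", "study_hours"]])
print(combined)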








5.  Feature Selection: This step involves choosing a subset of important features from the dataset. Feature selection is commonly done to eliminate irrelevant or redundant features from the dataset.
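
As a sketch of feature selection, the example below keeps the two most informative features of the built-in Iris dataset using scikit-learn's SelectKBest; the choice of scoring function and of k = 2 is only illustrative.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features that score highest against the class labels;
# the remaining, less informative features are dropped.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)             # (150, 4) -> (150, 2)
print("kept feature indices:", selector.get_support(indices=True))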




              Data in Modelling and Evaluation
Once data preprocessing is complete, the data is divided into two sets: the training data and the testing data.

Figure: the dataset is split into Training data (70%) and Testing data (30%).
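
The 70/30 split shown above can be performed with scikit-learn's train_test_split; the Iris dataset stands in for any preprocessed dataset, and random_state is fixed only so the split is repeatable.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 70% of the rows become training data, 30% become testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

print(len(X_train), "training rows,", len(X_test), "testing rows")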

