Page 309 - Ai_V3.0_c11_flipbook

                 Training data is used to teach machine learning models, while testing data is used to assess how well the
                 trained models perform. During modelling, suitable machine learning algorithms are selected based on the
                 problem type (e.g., classification, regression, clustering) and the characteristics of the dataset.

                                                    Training data vs. Testing data

                  Feature        Training data                                     Testing data

                  Purpose        Training data is used in the learning phase.      Testing data is used to check the performance
                                 Generally, the more training data the model       of the model.
                                 has, the better it can make predictions.

                  Exposure       The model learns from the training data to        The testing data is not exposed to the model
                                 make accurate predictions.                        before evaluation. For the model, testing data
                                                                                   is new data.

                  Distribution   The distribution of the training data should      The distribution of the testing data may be
                                 resemble the distribution of the real-world       entirely different from that of the real-world
                                 data that the model will be used on.              data.

                  Size           The training data is larger, as the model         The testing data is smaller than the training
                                 needs to analyse and observe patterns to make     data because it is used only to evaluate the
                                 accurate predictions.                             performance of the model that has been trained
                                                                                   on the training data.


                 Various techniques like train-test split, cross-validation, and error analysis are employed to gauge the model’s
                 generalisation ability and pinpoint areas for enhancement. In the train-test split technique, the dataset is
                 divided into two sets: training and testing. The model is trained on the training data, and its performance is
                 assessed using the testing data. Cross-validation repeats this process on different subsets of the data to check
                 that the model performs consistently across them. You will study these in detail in class XII.
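
                 The two splitting techniques described above can be sketched in a few lines of Python. This is only an
                 illustrative sketch using the standard library: the function names train_test_split and k_fold_splits, the
                 20% test fraction, the seed, and the toy dataset of ten numbers are all assumptions made for this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable shuffle
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(rows, k=5):
    """Yield (train, validation) pairs; each row is used for validation exactly once."""
    rows = list(rows)
    for i in range(k):
        validation = rows[i::k]  # every k-th row, starting at position i
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


data = list(range(10))  # toy dataset standing in for ten records

train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2

folds = list(k_fold_splits(data, k=5))
for train_part, validation_part in folds:
    print(validation_part)  # a different pair of rows is held out in each fold
```

                 Note that the testing rows never appear in the training set, which is what keeps the testing data "new" to
                 the model. In cross-validation, the model would be trained and evaluated once per fold and the scores compared.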
                 Different evaluation metrics are applied depending on the type of problem. For classification problems, metrics
                 such as accuracy, precision, recall, F1-score, and the ROC curve are commonly used. For regression tasks, metrics
                 like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared are
                 frequently used.
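
                 To make these metrics concrete, the following sketch computes several of them by hand using their standard
                 definitions. The function names and the small lists of actual and predicted values are invented for
                 illustration only; the ROC curve is left out because it needs predicted probabilities rather than labels.

```python
import math


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1-score for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(actual, predicted):
    """MSE, RMSE, MAE and R-squared for numeric predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)  # assumes actual values are not all equal
    r2 = 1 - sum(e * e for e in errors) / ss_total
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up labels: 1 = positive class, 0 = negative class
clf = classification_metrics([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(clf)  # all four values are 0.75 for this example

# Made-up numeric targets and predictions
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(reg)  # MSE 0.5, MAE 0.5, RMSE about 0.707, R-squared about 0.9
```

                 For accuracy, precision, recall, F1-score, and R-squared, higher values are better; for MSE, RMSE, and MAE,
                 lower values are better.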
                 In today’s era, proficiency in handling data is crucial. With the rise of Artificial Intelligence, understanding
                 data allows us to leverage information effectively. It is akin to having a map for navigating a large city: being
                 adept with data empowers us to make informed decisions and utilise technology wisely.


                           At a Glance


                       •  Data literacy involves the ability to find and use data proficiently.
                       •  Data can be structured, semi-structured, or unstructured.

                       •  AI data analysis employs AI techniques and Data Science to enhance the processes of cleaning, inspecting,
                       and modelling both structured and unstructured data.
                       •  Data collection means gathering data from many sources, both offline and online.

                       •  Primary and secondary sources are the two main sources from which data is collected.
                       •  Primary data is obtained directly from the source and has not been previously published or analysed by
                       others.
                       •  Secondary data can be obtained from research articles, books, reports, and internet databases.

                       •  The method used to measure a collection of data is known as the level of measurement.

                                                                   Data Literacy—Data Collection to Data Analysis  307