Page 144 - Ai_C10_Flipbook
P. 144

Splitting the Training Set Data for Evaluation


              Splitting the training set data is a crucial step in model evaluation, allowing for a systematic assessment of the
              model’s performance by creating distinct datasets for training, validation, and testing. Let’s learn more about
              splitting the training set using the Train-Test split in detail.
              Train-Test Split


              It’s a model evaluation technique that reveals how the model performs on new data. This technique is used in
              machine learning algorithms to evaluate the performance of the model by dividing the dataset into two subsets,
              the Training subset and the Testing subset. The train-test procedure is appropriate when there is a sufficiently
              large dataset available.

              Training subset is used for model training, where it learns patterns from the data. Typically, this subset comprises
              70% to 80% of the dataset. Testing subset is used to evaluate the model's generalisation ability on unseen data. It
              typically consists of 20% to 30% of the dataset.


                                                                               10000 labelled
                                                                               data for image
                                                  Testing set                classification model

                                                  Training set
                                                                 7000 labelled data       3000 labelled data
                                                                  used for training        used for testing


              Need of Train-Test Split


              The training dataset is used to make the model learn how to recognise patterns and relationships in the data. Once
              the model is trained, the test dataset is used to evaluate its performance. The inputs from the test set are given
              to the model, which makes predictions. These predictions are then compared with the actual expected results.
              The goal is to understand how well the model can perform on new, unseen data that wasn’t part of the training
              process. It provides an unbiased estimate of performance of the machine learning model in real world scenarios
              and ensures the model can perform efficiently on the unseen data, rather than on the trained data.

                         Dataset
                                       Training Data
                                                                                   ?



                                     Train The ML
                                      Algorithm
                                                                                                   Successful Model




                                           Model                     Prediction
                                         Input Data



                         Testing
                          Data

                                                    ML Algorithm

                    142     Artificial Intelligence Play (Ver 1.0)-X
   139   140   141   142   143   144   145   146   147   148   149