Splitting the Training Set Data for Evaluation
Splitting the data is a crucial step in model evaluation. Creating distinct datasets for training, validation, and
testing allows a systematic assessment of the model’s performance. Let’s look at one way of splitting the data,
the Train-Test split, in detail.
Train-Test Split
The train-test split is a model evaluation technique that reveals how a model performs on new data. It is used in
machine learning to evaluate the performance of a model by dividing the dataset into two subsets,
the Training subset and the Testing subset. The train-test procedure is appropriate when a sufficiently
large dataset is available.
The Training subset is used for model training, where the model learns patterns from the data. Typically, this subset
comprises 70% to 80% of the dataset. The Testing subset is used to evaluate the model's generalisation ability on
unseen data. It typically consists of the remaining 20% to 30% of the dataset.
[Figure: 10000 labelled data for an image classification model, divided into a Training set (7000 labelled data
used for training) and a Testing set (3000 labelled data used for testing).]
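The 70/30 split of 10000 labelled items described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `split_train_test`, the `seed` parameter, and the stand-in dataset of (image id, label) pairs are assumptions made for this example, not part of the text. Libraries such as scikit-learn provide a ready-made `train_test_split` function for the same purpose.

```python
import random


def split_train_test(data, test_fraction=0.3, seed=42):
    """Shuffle a copy of `data` and return (training subset, testing subset)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(data)                  # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # shuffle so both subsets are representative
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for 10000 labelled images: (image_id, label) pairs.
dataset = [(i, i % 2) for i in range(10000)]

train_set, test_set = split_train_test(dataset, test_fraction=0.3)
print(len(train_set), len(test_set))  # prints "7000 3000"
```

Shuffling before splitting matters when the data is stored in some order (for example, all images of one class first); without it, the testing subset might not resemble the training subset.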
Need for Train-Test Split
The training dataset is used to make the model learn to recognise patterns and relationships in the data. Once
the model is trained, the test dataset is used to evaluate its performance. The inputs from the test set are given
to the model, which makes predictions. These predictions are then compared with the actual expected results.
The goal is to understand how well the model performs on new, unseen data that wasn’t part of the training
process. This provides an unbiased estimate of the model's performance in real-world scenarios
and shows whether the model performs well on unseen data, rather than only on the data it was trained on.
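[Figure: Train-test workflow. The Dataset is divided into Training Data and Testing Data. The Training Data is used
to train the ML Algorithm. The Testing Data is given as Input Data to the trained ML Algorithm, and the Model
Prediction is checked to decide whether the result is a Successful Model.]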
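The workflow described in this section (train on the training data, predict on the test inputs, compare the predictions with the actual results) can be sketched as follows. Everything here is an assumption made for illustration: the toy one-feature dataset, the very simple threshold "model", and the names `make_example`, `train`, and `predict` do not come from the text. It is a sketch of the evaluation procedure, not a recommended model.

```python
import random

rng = random.Random(0)  # fixed seed so the example is repeatable


def make_example():
    """Hypothetical toy data: one numeric feature, label 0 or 1."""
    label = rng.randint(0, 1)
    feature = rng.gauss(2.0 if label == 1 else -2.0, 1.0)
    return feature, label


dataset = [make_example() for _ in range(1000)]
rng.shuffle(dataset)
train_set, test_set = dataset[:700], dataset[700:]  # 70% / 30% split


def train(examples):
    """Learn a threshold halfway between the mean feature of each class."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in examples if y == cls]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2


def predict(threshold, feature):
    """Predict class 1 when the feature is above the learned threshold."""
    return 1 if feature > threshold else 0


# Step 1: the model learns from the training data only.
threshold = train(train_set)

# Step 2: the test inputs are given to the model, which makes predictions.
predictions = [predict(threshold, x) for x, _ in test_set]

# Step 3: the predictions are compared with the actual expected results.
actual = [y for _, y in test_set]
correct = sum(p == a for p, a in zip(predictions, actual))
test_accuracy = correct / len(test_set)
print(f"Test accuracy: {test_accuracy:.2%}")
```

Because the test examples are never passed to `train`, the reported accuracy estimates how the model behaves on data it has not seen.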
142 Artificial Intelligence Play (Ver 1.0)-X

