Page 128 - Artificial Intellegence_v2.0_Class_12
P. 128
raining et
Data of Original
Data)
o rain odel
est et
of Original
Data)
o valuate odel
After raining
he train test split technique can be used to test machine learning algorithms for classification and regression problems.
he technique divides the provided dataset into t o subsets
• he training dataset is used to fine tune the machine learning model and train the algorithm.
• he test data set is a set of data used to impartially evaluate ho ell the final model fits the training data set. he
test data set is sometimes kno n as a holdout data set if the data in it has never been used in training.
R easons f or C hoosing T r ain T est S plit Evaluation
• he goal is to estimate the machine learning model's performance on ne data that as not used to train the model.
his is ho e ant to use the model in the real orld. o put it another ay, e ant to fit it to e isting data ith
kno n inputs and outputs, then generate predictions for fresh cases in the future here e don't kno the e pected
outcome or goal values.
• Another reason to employ the train test split assessment process, other than dataset si e is computational efficiency.
ome models are e tremely e pensive to train, making a repeated evaluation as employed in other techniques,
impossible. Deep neural net ork models are one e ample. he train test approach is idely employed in this situation.
• ometimes, a pro ect may already have a model orking efficiently and large dataset, but still may require an
overvie of model performance quickly. Again, the train test split procedure is selected in this situation.
andom selection is also used to divide samples from the original training dataset into t o subsets. his ensures
that the train and test datasets are re ective of the original dataset. hen the dataset available is small, the
train test procedure is not appropriate. he reason for this is that there ill not be enough data in the training dataset
for the model to learn an appropriate mapping of inputs to outputs. here ill also be insufficient data in the test set to
evaluate the model's performance appropriately.
Configuring the Train Test Split
he si e of the train and test sets is the procedure's key configuration parameter. or either the train or test datasets, this
is usually given as a percentage bet een and . or e ample, a training set ith a si e of . ) means that the
test set ill get the remaining percentage of . ).
here is no such thing as an ideal split percentage. A data scientist determines a split that suits the pro ect's goals
taking into account the follo ing factors
• ost of training the model • he computational cost of assessing the model
• epresentativeness of the training set • epresentativeness of the test set
plit percentages used commonly are otal number of e amples
• rain , est
• rain , est raining et est et
• rain , est
Touchpad Artificial Intelligence (Ver. 2.0)-XII

