Page 128 - Artificial Intellegence_v2.0_Class_12

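        [Figure: The data is divided into a Training Set (a portion of the original data, used to train the model) and a Test Set (a portion of the original data, used to evaluate the model after training).]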

        The train-test split technique can be used to test machine learning algorithms for classification and regression problems.
        The technique divides the provided dataset into two subsets:
        •   The training dataset is used to train the algorithm and fit the machine learning model.
        •   The test dataset is used to impartially evaluate how well the final model, fitted on the training dataset, performs. The
            test dataset is sometimes known as a holdout dataset if the data in it has never been used in training.
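
        The division into two subsets can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `split_dataset`, the 25% test fraction, and the seed are assumptions made for the example, not values taken from the text. In practice, libraries such as scikit-learn provide a ready-made `train_test_split` function for the same purpose.

```python
import random

def split_dataset(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and cut it into (train, test) lists.

    `split_dataset` is a hypothetical helper written for this example.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)                 # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)    # random selection of examples
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]                 # held out, never used for training
    train = shuffled[n_test:]                # used to fit the model
    return train, test

data = list(range(20))                       # 20 toy examples
train, test = split_dataset(data, test_fraction=0.25, seed=42)
print(len(train), len(test))                 # 15 5
```

        Every example ends up in exactly one of the two subsets, so the test set stays unseen during training.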


        Reasons for Choosing Train-Test Split Evaluation

        •   The goal is to estimate the machine learning model's performance on new data that was not used to train the model.
            This is how we want to use the model in the real world. To put it another way, we want to fit it to existing data with
            known inputs and outputs, then generate predictions for fresh cases in the future where we don't know the expected
            outcome or goal values.

        •   Besides dataset size, another reason to employ the train-test split assessment process is computational efficiency.
            Some models are extremely expensive to train, which makes the repeated evaluation employed in other techniques
            impractical. Deep neural network models are one example. The train-test approach is widely employed in this situation.
        •   Sometimes, a project may already have a model working efficiently and a large dataset, but may still require a
            quick overview of model performance. Again, the train-test split procedure is selected in this situation.
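
        The reasons above can be illustrated with a small sketch: a model is fitted once on data with known inputs and outputs, and its error on the held-out test set serves as the estimate of performance on new data. The toy data (a noisy line y = 2x + 1), the 80/20 split, and the helper names `fit_line` and `mse` are all assumptions chosen for this example.

```python
import random

rng = random.Random(0)

# Toy data with known inputs x and outputs y = 2x + 1 plus small noise.
xs = [i / 10 for i in range(100)]
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in xs]

# Random 80/20 split (the ratio is an assumption for illustration).
rng.shuffle(data)
train, test = data[:80], data[80:]

def fit_line(points):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mse(points, a, b):
    """Mean squared error of the line y = a*x + b on `points`."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

a, b = fit_line(train)        # the model sees only the training data
train_mse = mse(train, a, b)  # error on data the model was fitted to
test_mse = mse(test, a, b)    # estimate of error on unseen data

print(f"slope={a:.2f} intercept={b:.2f}")
print(f"train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

        Note that the model is fitted exactly once. Techniques that refit the model several times on different subsets multiply the training cost, which is why a single split is attractive when training is expensive.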
        Random selection is used to divide samples from the original dataset into the two subsets. This helps ensure
        that the train and test datasets are reflective of the original dataset. When the available dataset is small, the
        train-test procedure is not appropriate. The reason for this is that there will not be enough data in the training dataset
        for the model to learn an appropriate mapping of inputs to outputs. There will also be insufficient data in the test set to
        evaluate the model's performance appropriately.
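
        Why a small test set gives an unreliable evaluation can be seen in a rough simulation. The sketch below assumes, purely for illustration, a model whose predictions are each correct with probability 0.8, and measures its accuracy on many randomly drawn test sets of different sizes. The function name, the test sizes, and the number of repetitions are hypothetical choices.

```python
import random

def simulated_test_accuracy(n_test, true_accuracy, rng):
    """Accuracy measured on n_test examples when each prediction is
    correct with probability true_accuracy (a simplifying assumption)."""
    correct = sum(rng.random() < true_accuracy for _ in range(n_test))
    return correct / n_test

rng = random.Random(1)
spreads = {}
for n_test in (5, 50, 500):
    # Repeat the evaluation on 200 independently drawn test sets.
    estimates = [simulated_test_accuracy(n_test, 0.8, rng) for _ in range(200)]
    spreads[n_test] = max(estimates) - min(estimates)
    print(f"test size {n_test:>3}: estimates span a range of {spreads[n_test]:.2f}")
```

        With only a handful of test examples, the measured accuracy swings widely from one split to another, so a single split says little about the model's real performance.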

        Configuring the Train-Test Split
        The size of the train and test sets is the procedure's key configuration parameter. For either the train or test datasets, this
        is usually given as a fraction between 0 and 1. For example, once a size is given for the training set, the test set gets
        the remaining fraction, so that the two sizes add up to 1 (100%).
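
        The arithmetic behind this configuration can be sketched as follows. The helper `split_sizes`, the dataset size of 1000, and the train fractions 0.5, 0.7 and 0.9 are illustrative assumptions, not values from the text. For comparison, scikit-learn's `train_test_split` accepts the same kind of fraction through its `test_size` and `train_size` parameters.

```python
def split_sizes(n_examples, train_fraction):
    """Return (n_train, n_test) for a train fraction between 0 and 1.

    `split_sizes` is a hypothetical helper written for this example.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    n_train = round(n_examples * train_fraction)
    n_test = n_examples - n_train        # the test set gets the remainder
    return n_train, n_test

n_examples = 1000
for train_fraction in (0.5, 0.7, 0.9):   # illustrative values only
    n_train, n_test = split_sizes(n_examples, train_fraction)
    print(f"train fraction {train_fraction}: {n_train} train, {n_test} test")
```

        Because the test size is computed as the remainder, the two subsets always add up to the full dataset.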
        There is no such thing as an ideal split percentage. A data scientist determines a split that suits the project's goals,
        taking into account the following factors:
        •   Cost of training the model
        •   The computational cost of assessing the model
        •   Representativeness of the training set
        •   Representativeness of the test set
        Split percentages used commonly are:
        •   Train: …%, Test: …%
        •   Train: …%, Test: …%
        •   Train: …%, Test: …%
        [Figure: The total number of examples divided into a Training Set and a Test Set.]







                      Touchpad Artificial Intelligence (Ver. 2.0)-XII