Page 125 - Data Science class 11

the dark treatment bags. The last thirty seeds were placed inside the other dark treatment bag. Care was taken to make
            the four growth setups as identical as possible. With two growth setups for the same condition, their results could be
            compared to ensure similar handling. Three days later, the length of the radish seedlings for the germinating seeds was
            measured in millimeters.
            The data were noted in a summary format like the one shown here.

                   1-Light    x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21

                   2-Mixed    x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27

                   3-Dark     x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38

                   4-Dark     x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40
                                Fig. 2.4.1: Table showing Radish Seedling length after 3 days (sorted)

            The table shows the sorted values. In each treatment type, some seeds did not germinate. Such values were considered
            missing values and were recorded as “x”. Thus, there were 114 observations (28 for light treatment, 28 for mixed
            treatment, 58 for dark treatment).
            It would have been more engaging if the students had been encouraged to discuss whether excluding the seeds that did
            not germinate could bias the conclusions. In this scenario, because the number of non-germinating seeds was roughly
            the same in each growth bag (two, two, one, and one), the missing values likely happened by chance.
            Conversely, if all the missing values had been in one category, it would have suggested that that category's conditions
            hindered germination, and so the missing data were not accidental.
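
            The counts quoted above can be checked directly from the rows of Fig. 2.4.1. The following is a minimal
            sketch, not part of the original activity: the names RAW and count_bag are illustrative, and the data are
            copied from the table with "x" standing for a seed that did not germinate.

```python
# Seedling lengths (mm) copied from Fig. 2.4.1, one string per growth bag.
# "x" marks a seed that did not germinate (a missing value).
RAW = {
    "1-Light": "x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21",
    "2-Mixed": "x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27",
    "3-Dark": "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38",
    "4-Dark": "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40",
}


def count_bag(row):
    """Return (observed, missing) counts for one comma-separated row."""
    cells = row.split(",")
    missing = cells.count("x")
    return len(cells) - missing, missing


# Observed and missing counts per bag, plus the overall number of observations.
counts = {bag: count_bag(row) for bag, row in RAW.items()}
total_observed = sum(observed for observed, _ in counts.values())

for bag, (observed, missing) in counts.items():
    print(f"{bag}: {observed} observed, {missing} missing")
print("total observed:", total_observed)
```

            Comparing the missing counts across bags is the quick check described in the text: if one bag held nearly
            all of the "x" entries, chance would be a less convincing explanation.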
            The data could be represented in another table with each observation (seed) in a separate row and each variable in
            a separate column for analysis purposes.

                                   Seed #       Growth Bag        Treatment       Length (mm)
                                      1              1              1-Light             x
                                      2              1              1-Light             x
                                      3              1              1-Light             2
                                     …..             …..              …..              …..
                                    118              4              3-Dark             35
                                    119              4              3-Dark             35
                                    120              4              3-Dark             40

                                     Fig. 2.4.2 Long Format Listing of Radish Seedling Lengths

            In the above table, the observed units are individual seeds. Growth bag indicates the bag (1, 2, 3, or 4) in which the
            seed was placed, and treatment indicates the treatment (1-Light, 2-Mixed, or 3-Dark) the seed received. Both
            growth bag and treatment are categorical variables. On the other hand, length is a quantitative variable,
            measuring the length of the seedling in millimeters.
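
            The long-format layout described above can be sketched in code. This is an illustration under stated
            assumptions, not the procedure used in the activity: SeedRecord, BAGS, and to_long are hypothetical names,
            and seeds are assumed to be numbered consecutively, bag by bag, in the sorted order of Fig. 2.4.1. A seed
            that did not germinate is stored as None.

```python
from dataclasses import dataclass
from typing import Optional

# Growth bag number -> (treatment label, sorted lengths in mm from Fig. 2.4.1).
BAGS = {
    1: ("1-Light", "x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21"),
    2: ("2-Mixed", "x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27"),
    3: ("3-Dark", "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38"),
    4: ("3-Dark", "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40"),
}


@dataclass
class SeedRecord:
    seed: int                  # identifier for the observed unit
    growth_bag: int            # categorical variable (1, 2, 3, or 4)
    treatment: str             # categorical variable
    length_mm: Optional[int]   # quantitative variable; None = did not germinate


def to_long(bags):
    """Turn the one-row-per-bag summary into one record per seed."""
    records = []
    seed = 1  # assumed numbering: consecutive across bags
    for bag in sorted(bags):
        treatment, row = bags[bag]
        for cell in row.split(","):
            length = None if cell == "x" else int(cell)
            records.append(SeedRecord(seed, bag, treatment, length))
            seed += 1
    return records


records = to_long(BAGS)
for record in records[:3] + records[-1:]:
    print(record)
```

            Keeping one seed per record means each variable can be filtered or grouped on its own, which is what
            makes this layout convenient for analysis.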

            When the mean, median, and standard deviation were calculated on the seed samples exposed to different treatments
            (1-Light, 2-Mixed, or 3-Dark), the results produced were as shown in the table here:
                          Treatment               n              Mean           Median         StdDev.
                          1-Light                28               9.64           9.50            5.03
                          2-Mixed                28              15.82          16.00            6.76
                          3-Dark                 58              21.86          20.00            9.75

                                             Fig. 2.4.3 Treatment Summary Statistics
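
            The summary statistics can be recomputed from the raw lengths with Python's standard statistics module.
            This is a sketch under assumptions: BAGS and summarize are illustrative names, the two dark bags are
            pooled into the single treatment 3-Dark, non-germinating seeds ("x") are excluded, and the standard
            deviation is taken to be the sample standard deviation (divisor n - 1).

```python
from statistics import mean, median, stdev

# Growth bag number -> (treatment label, sorted lengths in mm from Fig. 2.4.1).
BAGS = {
    1: ("1-Light", "x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21"),
    2: ("2-Mixed", "x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27"),
    3: ("3-Dark", "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38"),
    4: ("3-Dark", "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40"),
}


def summarize(bags):
    """Return {treatment: (n, mean, median, sample stdev)}, rounded to 2 places."""
    by_treatment = {}
    for treatment, row in bags.values():
        # Skip "x" entries: seeds that did not germinate have no length.
        lengths = [int(cell) for cell in row.split(",") if cell != "x"]
        by_treatment.setdefault(treatment, []).extend(lengths)
    return {
        treatment: (
            len(values),
            round(mean(values), 2),
            round(median(values), 2),
            round(stdev(values), 2),  # sample standard deviation
        )
        for treatment, values in by_treatment.items()
    }


summary = summarize(BAGS)
for treatment, (n, avg, med, sd) in summary.items():
    print(f"{treatment:8} n={n:3d} mean={avg:6.2f} median={med:6.2f} stdev={sd:5.2f}")
```

            Pooling the two dark bags gives 58 observations for 3-Dark, consistent with the count of observations
            stated earlier in the text.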

                                                                                              Assessing Data   123