Page 125 - Data Science class 11
the dark treatment bags. The last thirty seeds were placed inside the other dark treatment bag. Care was taken to make
the four growth setups as identical as possible. With two growth setups for the same condition, their results could be
compared to ensure similar handling. Three days later, the length of the radish seedlings for the germinating seeds was
measured in millimeters.
The data were noted in a summary format like the one shown here.
1-Light x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21
2-Mixed x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27
3-Dark x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38
4-Dark x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40
Fig. 2.4.1: Table showing Radish Seedling length after 3 days (sorted)
The table shows the sorted values. In each treatment type, some seeds did not germinate. Such values were considered
missing values and were recorded as “x”. Thus, there were 114 observations (28 for light treatment, 28 for mixed
treatment, 58 for dark treatment).
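These counts can be checked with a short Python sketch. The names `raw`, `parse_lengths`, and `observed` are hypothetical, chosen only for illustration; the values are copied from Fig. 2.4.1, and each "x" is read as a missing value (`None`).

```python
# Sorted seedling lengths (mm) copied from Fig. 2.4.1.
# "x" marks a seed that did not germinate (a missing value).
raw = {
    "1-Light": "x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21",
    "2-Mixed": "x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27",
    "3-Dark": "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38",
    "4-Dark": "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40",
}


def parse_lengths(text):
    """Turn one comma-separated row into a list of ints, with None for "x"."""
    return [None if value == "x" else int(value) for value in text.split(",")]


bags = {label: parse_lengths(text) for label, text in raw.items()}

# Number of germinated seeds (non-missing observations) in each growth bag
observed = {
    label: sum(value is not None for value in values)
    for label, values in bags.items()
}

# The two dark bags belong to the same treatment, so their counts are added.
dark_total = observed["3-Dark"] + observed["4-Dark"]
total = sum(observed.values())

print(observed)
print("dark:", dark_total, "all treatments:", total)
```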
The activity would have been more engaging if the students had been encouraged to discuss whether excluding the seeds
that did not germinate could bias the conclusions. In this scenario, because the number of non-germinating seeds was
small and the same in each category (two per treatment), the missing values likely occurred by chance.
Conversely, if all the missing values had been in one category, it would have suggested that the conditions in that
category hindered germination, and so the missing data were not accidental.
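One informal way to look at this is to compare the share of missing values across the growth bags. The sketch below is only an illustration, not a formal test: the function names `missing_rates` and `spread` are hypothetical, the observed counts are read off Fig. 2.4.1, and the `hypothetical` counts are invented purely to show the contrasting case in which every failure falls in one bag.

```python
planted = 30  # seeds placed in each growth bag

# Seeds that did not germinate, per growth bag (counted from Fig. 2.4.1)
missing = {"1-Light": 2, "2-Mixed": 2, "3-Dark": 1, "4-Dark": 1}

# Invented contrast case (NOT from the experiment): all six failures in one bag
hypothetical = {"1-Light": 6, "2-Mixed": 0, "3-Dark": 0, "4-Dark": 0}


def missing_rates(counts, planted_per_bag):
    """Fraction of planted seeds that are missing in each bag."""
    return {bag: n / planted_per_bag for bag, n in counts.items()}


def spread(rates):
    """Gap between the highest and lowest missing rate."""
    return max(rates.values()) - min(rates.values())


observed_rates = missing_rates(missing, planted)
hypothetical_rates = missing_rates(hypothetical, planted)

for bag, rate in observed_rates.items():
    print(f"{bag}: {missing[bag]} of {planted} missing ({rate:.1%})")

print("spread in observed data:", round(spread(observed_rates), 3))
print("spread in invented contrast case:", round(spread(hypothetical_rates), 3))
```

A small spread is consistent with failures that occurred by chance; a large one would call for a closer look at the conditions in the affected bag.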
The data could be represented in another table with each observation (seed) on a separate row and each variable on
a separate column for analysis purposes.
Seed #    Growth Bag    Treatment    Length (mm)
1         1             1-Light      x
2         1             1-Light      x
3         1             1-Light      2
…..       …..           …..          …..
118       4             3-Dark       35
119       3             3-Dark       35
120       4             3-Dark       40
Fig. 2.4.2 Long Format Listing of Radish Seedling Lengths
In the above table, the observational units are individual seeds. Growth bag indicates the bag (1, 2, 3, or 4) in which
the seed was placed, and treatment indicates the treatment (1-Light, 2-Mixed, or 3-Dark) that the seed received. Both
growth bag and treatment are categorical variables. Length, on the other hand, is a quantitative variable,
measuring the length of the seedling in millimeters.
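The reshaping from the summary format to one row per seed can be sketched in Python as follows. The names `rows_by_bag` and `long_rows` and the dictionary keys are hypothetical. The seed numbers here simply run through the bags in order, which is an assumption and need not match the numbering used in Fig. 2.4.2.

```python
# Growth bag number -> (treatment label, sorted lengths copied from Fig. 2.4.1)
rows_by_bag = {
    1: ("1-Light", "x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21"),
    2: ("2-Mixed", "x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27"),
    3: ("3-Dark", "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38"),
    4: ("3-Dark", "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40"),
}

# Build one record per seed: each observation is a row, each variable a column.
long_rows = []
seed_number = 0
for bag, (treatment, text) in rows_by_bag.items():
    for value in text.split(","):
        seed_number += 1  # assumed numbering: bag by bag, in sorted order
        long_rows.append({
            "seed": seed_number,
            "bag": bag,                 # categorical variable
            "treatment": treatment,     # categorical variable
            "length_mm": None if value == "x" else int(value),  # quantitative
        })

# Print the first few rows in the layout of a long-format listing.
print(f"{'Seed #':>6} {'Bag':>3} {'Treatment':<9} {'Length (mm)':>11}")
for row in long_rows[:3]:
    shown = "x" if row["length_mm"] is None else row["length_mm"]
    print(f"{row['seed']:>6} {row['bag']:>3} {row['treatment']:<9} {shown:>11}")
```

Keeping the missing lengths as `None` rather than dropping the rows preserves all 120 seeds, so the decision to exclude them is made explicitly at analysis time.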
When the mean, median, and standard deviation were calculated for the seedlings in each treatment (1-Light, 2-Mixed,
or 3-Dark, with the two dark bags pooled), the results were as shown in the table here:
Treatment    n     Mean     Median    StdDev.
1-Light      28    9.64     9.50      5.03
2-Mixed      28    15.82    16.00     6.76
3-Dark       58    21.86    20.00     9.75
Fig. 2.4.3 Treatment Summary Statistics
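The summary statistics can be recomputed from the values in Fig. 2.4.1 with Python's standard `statistics` module. This is a minimal sketch: the names `lengths`, `by_treatment`, and `summary` are hypothetical, the missing values ("x") are dropped before calculation, and the two dark bags are pooled into one treatment of 58 observations. It assumes the standard deviation is the sample version (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, median, stdev


def lengths(text):
    """Parse one row of Fig. 2.4.1, dropping the missing values marked "x"."""
    return [int(value) for value in text.split(",") if value != "x"]


dark_bag_3 = "x,5,8,8,10,10,14,15,15,15,20,20,20,20,20,22,25,25,25,25,26,30,30,35,35,35,35,36,37,38"
dark_bag_4 = "x,5,8,8,10,10,10,11,14,15,15,15,16,20,20,20,20,20,24,25,29,30,30,30,30,31,33,35,35,40"

by_treatment = {
    "1-Light": lengths("x,x,2,3,5,5,5,5,5,7,7,7,8,8,8,9,10,10,10,10,10,10,10,10,14,15,15,20,21,21"),
    "2-Mixed": lengths("x,x,3,4,5,9,10,10,10,10,10,11,13,15,15,15,17,20,20,20,20,20,21,21,22,22,23,25,25,27"),
    # Both dark bags received the same treatment, so their lengths are pooled.
    "3-Dark": lengths(dark_bag_3) + lengths(dark_bag_4),
}

# (n, mean, median, sample standard deviation), rounded to two decimals
summary = {
    treatment: (
        len(values),
        round(mean(values), 2),
        round(median(values), 2),
        round(stdev(values), 2),
    )
    for treatment, values in by_treatment.items()
}

print(f"{'Treatment':<10}{'n':>4}{'Mean':>8}{'Median':>8}{'StdDev':>8}")
for treatment, (n, avg, med, sd) in summary.items():
    print(f"{treatment:<10}{n:>4}{avg:>8.2f}{med:>8.2f}{sd:>8.2f}")
```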

