Page 149 - Data Science class 10
2.3.2. Collect/Consider the Data
This step can be described as acknowledging variability while designing for differences. Data collection designs must
acknowledge variability in data. Some techniques, such as statistical process control and random sampling, are used
to detect and reduce variability in data. Others, such as design of experiments, are used to induce variability in
order to test treatments. Design of experiments chooses experimental designs that account for the differences
between groups receiving different treatments. The purpose of assigning participants to groups at random is to
minimise differences between groups caused by variables that are not manipulated or controlled in the experiment.
In all designs, the main statistical focus is to look for, account for and explain variability. The data must be
questioned once it is available, regardless of whether it was collected first-hand or obtained from another source.
For example, questions about how the variables differ by type, the possible outcomes of each variable, and how the
data was collected are necessary to clarify whether the data is useful for answering the statistical investigative
question. The data collection design affects the scope of generalisability and the possible limitations in analysis
and interpretation.
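The idea of random assignment described above can be illustrated with a minimal Python sketch. The participant IDs, the group count, and the helper name `assign_groups` are assumptions made for this illustration and do not come from the text.

```python
import random


def assign_groups(participants, n_groups=2, seed=None):
    """Randomly split participants into n_groups of near-equal size.

    Hypothetical helper: shuffling first means that uncontrolled
    variables are spread across the groups by chance, not by choice.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin into the groups.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical participant IDs, used only for illustration.
participants = [f"P{i:02d}" for i in range(1, 13)]
treatment, control = assign_groups(participants, n_groups=2, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Fixing the seed only makes the example repeatable. In a real study the assignment would normally be left unseeded or drawn from a documented random source.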
2.3.3. Analyse the Data
To effectively respond to statistical questions, data must be correctly arranged, condensed, and represented.
Additionally, the data you gather typically vary (they are not all the same), and you will need to take the sources of
this variation into consideration.
This step can also be described as accounting for variability by using distributions. We attempt to comprehend the
variability of the data when we analyse it. Reasoning about distributions is key to accounting for and describing
variability at all development levels. Graphical displays and numerical summaries are used to explore, describe, and
compare the variability of distributions.
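As a rough sketch of how numerical summaries can be used to compare the variability of two distributions, the Python example below summarises two small data sets. The data values and the helper name `summarise` are hypothetical. The two groups were chosen to have the same mean but very different spread.

```python
import statistics


def summarise(values):
    """Return a few common numerical summaries of centre and spread."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Hypothetical data: both groups are centred on 50.
group_a = [48, 50, 51, 49, 52, 50, 50]
group_b = [20, 35, 50, 65, 80, 50, 50]

summary_a = summarise(group_a)
summary_b = summarise(group_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, summary)
```

The means match, so a summary of centre alone would hide the difference between the groups. The standard deviation, interquartile range and range all show that group B varies far more than group A.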
Wisden hailed Bradman as "the greatest phenomenon in the history of cricket, indeed in the history of all ball
games". Statistician Charles Davis analysed the statistics for several prominent sportsmen by comparing the
number of standard deviations by which each stands above the mean for their sport. The top performers in his
selected sports were:
Athlete          Sport        Statistic          Standard deviations
Bradman          Cricket      Batting average    4.4
Pele             Football     Goals per game     3.7
Ty Cobb          Baseball     Batting average    3.6
Jack Nicklaus    Golf         Major titles       3.5
Michael Jordan   Basketball   Points per game    3.4
The statistics show that "no other athlete dominates an international sport to the extent that Bradman does in cricket".
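The comparison above expresses each athlete's statistic as a number of standard deviations above the mean for the sport. Davis's data and exact method are not given here, so the Python sketch below only illustrates the general calculation. The batting averages, the helper name, and the use of the population standard deviation are all assumptions.

```python
import statistics


def standard_deviations_above_mean(value, population):
    """Return how many standard deviations `value` lies above the mean.

    Assumption: the population standard deviation (pstdev) is used.
    """
    mean = statistics.mean(population)
    sd = statistics.pstdev(population)
    if sd == 0:
        raise ValueError("all values are identical, so the spread is zero")
    return (value - mean) / sd


# Hypothetical career batting averages for a small group of players.
averages = [30, 35, 40, 45, 50]
best = max(averages)
score = standard_deviations_above_mean(best, averages)
print(f"The best average is {score:.2f} standard deviations above the mean")
```

Because the result is expressed in units of spread and not in runs, goals or points, it allows performers from different sports to be placed on a common scale.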
For example, the batting averages of the Indian cricket team for a particular year can be displayed in comparative
horizontal plots. Such graphs show the variability of each team's distribution of batting averages.
[Figure: horizontal plot of batting averages for B Kumar, KD Karthik, RR Pant, KM Jadhav, MS Dhoni, S Dhawan and RG Sharma. Horizontal axis: Average, 0 to 90.]
Distributions in Data Science 147

