Page 149 - Data Science class 10

2.3.2. Collect/Consider the Data

            This step can be described as acknowledging variability while designing for differences. Data collection designs must
            acknowledge variability in data. Some techniques, such as statistical process control and random sampling, are used
            to detect and reduce variability in data, while others, such as design of experiments, are used to induce variability
            in order to test treatments. The latter strategy chooses experimental designs that take into account the differences
            between groups receiving different treatments. The purpose of assigning participants to groups at random is to
            minimise differences between groups caused by variables that are not manipulated or controlled during the
            experiment. In all designs, the main statistical focus is to look for, account for and explain variability. Once the
            data is available, it must be questioned, regardless of whether it was obtained directly from the source or through
            a secondary source. For example, questions about how the variables differ by type, the possible outcomes of each
            variable, and how the data was collected are necessary to clarify whether the data is useful for answering the
            statistical investigative question. The data collection design affects the scope of generalisability and the possible
            limitations in analysis and interpretation.
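
The idea of assigning participants to groups at random can be illustrated with a minimal Python sketch. The participant labels, the group names and the helper `assign_to_groups` are hypothetical and chosen only for illustration; a real study design may add further constraints, such as blocking.

```python
import random

def assign_to_groups(participants, group_names, seed=None):
    """Randomly assign participants to groups of near-equal size."""
    rng = random.Random(seed)      # a seed makes the shuffle repeatable
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    # Deal the shuffled participants out in turn, like dealing cards.
    for i, person in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(person)
    return groups

# Hypothetical participants P1 to P10
participants = [f"P{i}" for i in range(1, 11)]
groups = assign_to_groups(participants, ["treatment", "control"], seed=1)
for name, members in groups.items():
    print(name, members)
```

Because chance alone decides who lands in which group, variables that the experimenter does not control tend to be spread evenly across the groups.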


            2.3.3. Analyse the Data
            To respond effectively to statistical questions, data must be correctly organised, summarised, and represented.
            Additionally, the data you gather typically vary (they are not all the same), and you will need to take the sources of
            this variation into consideration.
            This step can also be described as accounting for variability using distributions. When we analyse data, we attempt
            to understand its variability. Reasoning about distributions is key to accounting for and describing variability at
            all developmental levels. Graphical displays and numerical summaries are used to explore, describe, and compare the
            variability of distributions.
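
As an illustration of numerical summaries, the following sketch uses Python's standard `statistics` module to describe the centre and spread of a small data set. The values in `scores` are made up for this example and do not come from any real source.

```python
import statistics

# Hypothetical data values, invented for illustration only
scores = [12, 15, 15, 18, 20, 22, 25, 30, 41]

summary = {
    "mean": statistics.mean(scores),      # centre of the distribution
    "median": statistics.median(scores),  # middle value
    "range": max(scores) - min(scores),   # total spread
    "stdev": statistics.stdev(scores),    # sample standard deviation
}

# Quartiles split the sorted data into four parts;
# the interquartile range (IQR) is the spread of the middle half.
q1, q2, q3 = statistics.quantiles(scores, n=4)
summary["iqr"] = q3 - q1

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mean and median describe where the data are centred, while the range, standard deviation and IQR describe how much the data vary.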
            Wisden hailed Bradman as "the greatest phenomenon in the history of cricket, indeed in the history of all ball
            games". Statistician Charles Davis analysed the statistics for several prominent sportsmen by comparing the
            number of standard deviations by which each stands above the mean for his sport. The top performers in his
            selected sports were:

                          Athlete            Sport               Statistic                 Standard deviations above the mean

                          Bradman            Cricket             Batting average           4.4
                          Pele               Football            Goals per game            3.7
                          Ty Cobb            Baseball            Batting average           3.6
                          Jack Nicklaus      Golf                Major titles              3.5
                          Michael Jordan     Basketball          Points per game           3.4
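
The measure used in the table, the number of standard deviations by which a value stands above the mean, can be sketched as follows. The peer averages and the function name `standard_deviations_above_mean` are hypothetical; this is not Davis's data or necessarily his exact method, and it assumes the population standard deviation is the appropriate one.

```python
import statistics

def standard_deviations_above_mean(value, population):
    """Return (value - mean) / standard deviation for the population."""
    mean = statistics.mean(population)
    sd = statistics.pstdev(population)  # population standard deviation (assumed)
    return (value - mean) / sd

# Hypothetical batting averages of a peer group, invented for illustration
peer_averages = [30, 35, 40, 45, 50]

z = standard_deviations_above_mean(60, peer_averages)
print(f"{z:.2f} standard deviations above the mean")
```

Because the result is expressed in standard deviations rather than runs, goals or points, it allows performers from different sports to be compared on a common scale.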

            The statistics show that "no other athlete dominates an international sport to the extent that Bradman does in cricket".
            For example, the batting averages of the Indian cricket team for a particular year can be displayed in two comparative
            horizontal plots. These graphs show the variability of each team's distribution of batting averages.


            [Figure: horizontal plot of batting averages for B Kumar, KD Karthik, RR Pant, KM Jadhav, MS Dhoni, S Dhawan and
            RG Sharma. Horizontal axis: Average, from 0 to 90.]
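
A horizontal plot of this kind can be approximated in plain text with a short Python sketch. The player labels and averages below are invented for illustration and are not the values shown in the figure; the helper `horizontal_bars` is likewise hypothetical.

```python
def horizontal_bars(values, scale=2):
    """Return one text bar per label; each '#' stands for `scale` units."""
    width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / scale)
        # Right-align labels so that all bars start in the same column.
        lines.append(f"{label.rjust(width)} | {bar} {value}")
    return lines

# Hypothetical batting averages, not real player data
averages = {"Player A": 12.0, "Player B": 36.5, "Player C": 58.0}

chart = horizontal_bars(averages)
print("\n".join(chart))
```

Placing the bars on a shared scale makes it easy to see at a glance how widely the averages are spread.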

