Page 148 - Data Science class 10

Variability describes how dispersed a set of data is. In statistics, it refers to how much the data points in a
        data set differ from one another or from the mean. The range, interquartile range (IQR), variance,
        and standard deviation are common measures of variability.
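
As a minimal sketch, the four measures named above can be computed with Python's standard `statistics` module. The list `heights_cm` is a hypothetical set of plant heights invented for illustration, and the sample (rather than population) variance and standard deviation are used here as an assumption. Quartile conventions differ between textbooks and tools, so the IQR may not match a hand calculation done with another method.

```python
import statistics

# Hypothetical plant heights in centimetres (made-up values for illustration).
heights_cm = [4, 7, 8, 10, 11, 13, 17]

# Range: difference between the largest and smallest values.
data_range = max(heights_cm) - min(heights_cm)

# Quartiles: statistics.quantiles with n=4 returns the three cut points
# Q1, Q2 (the median) and Q3, using its default "exclusive" method.
q1, q2, q3 = statistics.quantiles(heights_cm, n=4)
iqr = q3 - q1

# Sample variance and sample standard deviation (divide by n - 1).
variance = statistics.variance(heights_cm)
std_dev = statistics.stdev(heights_cm)

print("Range:", data_range)
print("IQR:", iqr)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

A larger value for any of these measures means the heights are more spread out; if every plant had the same height, all four would be zero.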

        Four things make a problem statistical: the way in which you ask the question, the role and nature of the data, the
        particular ways in which you examine the data, and the types of interpretations you make from the investigation.
        A statistical problem-solving process typically contains four components:
        1. Plan the Problem (Ask a Question)
        2. Collect Data
        3. Analyse Data
        4. Interpret Results

        All the activities in this chapter are built on this four-step method for solving statistical problems. As you work
        through various statistical problems, this method will become more and more familiar to you.
        Let us now understand each step in detail.

        2.3.1. Formulate Statistical Investigative Questions

        This is often referred to as starting the process with an expectation of variability.
        Investigations are successful when statistical questions are created that account
        for variability. For instance, each of the following statistical investigative questions
        foresees variability and can result in a thorough data collection process and
        subsequent data analysis:
           • How fast can my plant grow?
           • Do plants exposed to more sunlight grow faster?
           • How does sunlight affect the growth of a plant?
        For plants to survive, three things are essential: carbon dioxide, water, and sunlight. The plants use the energy from
        the sun to turn carbon dioxide, soil nutrients, and water into food through a process known as photosynthesis! In
        this project, we will:

        1.  watch seeds sprout and monitor growth as the basil (a seasoning herb) plants appear.
        2.  follow the development of basil seeds under three different lighting conditions (full sun, some sun, and
          limited/no sun) and see how photosynthesis works!
        Think about these crucial questions before we start:
           • Can a seed grow/germinate/develop into a plant with limited or no sunlight?
           • How do you think a seed will grow with some or partial sunlight?
           • What do you think the plant will look like after two weeks of growth?
           • What differences do you expect among the plants grown under the three sunlight exposures?
           • How do you think the plants will be alike?
        In contrast, the question 'How tall is the plant?' is answered with a single height, so it is not a statistical
        investigative question. Instead, 'How tall is the plant?' is a question we ask to collect data.
        The statistical investigative question 'Do plants exposed to more sunlight grow faster?' could be answered
        using several such data collection questions. We anticipate an answer based on measurements of plant
        heights that vary, because plants under different sunlight exposures will grow to different heights.
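
To illustrate how many answers to the data collection question 'How tall is the plant?' can feed one investigative question, here is a rough sketch that summarises heights for each sunlight condition. The dictionary `heights_by_light` and all of its values are hypothetical, invented only for this example. Height after two weeks is used as a simple stand-in for growth speed, which is an assumption.

```python
from statistics import mean

# Hypothetical heights (cm) after two weeks for each lighting condition.
# These numbers are made up for illustration, not measured results.
heights_by_light = {
    "full sun": [12, 14, 13, 15],
    "some sun": [8, 9, 10, 9],
    "limited/no sun": [3, 4, 2, 3],
}

# Summarise each group with its mean (centre) and range (variability).
summary = {
    condition: {"mean": mean(values), "range": max(values) - min(values)}
    for condition, values in heights_by_light.items()
}

for condition, stats in summary.items():
    print(f"{condition}: mean = {stats['mean']} cm, range = {stats['range']} cm")
```

Each individual height answers a data collection question. Comparing the group summaries is what addresses the investigative question about sunlight and growth.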
        Although statistical investigative questions are the starting point of worthwhile investigations, questioning is
        used across all four steps of the statistical problem-solving process. These questioning techniques
        will be used at various levels throughout the examples.
        A statistical investigative question should have other characteristics in addition to accounting for variability. The
        variables of interest must be clear, the group or population that the question focuses on must be clear, and the
        question's intent should be clear: is it asking for a description of the data, comparing a variable across two or more
        groups, or looking at an association between two variables? The question should be about the entire group
        (anticipating variability) and not about a single individual (which yields a deterministic answer).

          146   Touchpad Data Science-X