Page 119 - Data Science class 11

Statistics and statistical inference put data at the center of analysis. With inferential statistics, you
            take data from samples and make generalisations about a population. For example, you might stand in a mall and
            ask a sample of 100 people whether they like shopping at a given place. You can then use this sample data to answer
            research questions about all shoppers, not just the 100 people you asked.
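As a rough illustration of the mall example, the sketch below estimates the share of all shoppers who like the place from a sample of 100. The count of 62 "yes" answers is invented for illustration, and the interval uses the common normal approximation, which is only one of several ways to express the uncertainty of a sample estimate.

```python
import math

# Hypothetical survey result: these counts are invented for illustration.
sample_size = 100   # people asked in the mall
liked_count = 62    # people who said they like shopping there (assumed)

# Sample proportion: our estimate of the proportion in the whole population.
p_hat = liked_count / sample_size

# Standard error of a proportion (normal approximation).
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)

# Approximate 95% interval: estimate plus or minus 1.96 standard errors.
z = 1.96
low = p_hat - z * standard_error
high = p_hat + z * standard_error

print(f"Sample proportion: {p_hat:.2f}")
print(f"Approximate 95% interval: {low:.3f} to {high:.3f}")
```

The interval is a reminder that a sample only gives an estimate: a different group of 100 people would probably give a slightly different proportion.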
            Inferential Conclusion

            As the name suggests, inferential statistical analysis is the process of drawing inferences or conclusions. It allows users to
            infer trends about a larger population based on the analysis of samples. In other words, it takes data from a sample
            and then draws conclusions about the larger group from which the sample was taken.
            Data Assessment Process

            Data Assessment helps in minimising risks, reducing effort and optimising cost. It also helps in easily meeting the
            information requirements of data owners, revision, auditors and management.
            The following are the elements of the data assessment process:

               • Assess
               • Understand redundant data within an individual application and across multiple disparate applications
               • Review and quantify missing, erroneous, and inconsistent data
               • Gather metrics summarizing identified data errors
               • Compare legacy data against data governance standards
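A few of the elements above can be sketched in code. The example below is a minimal sketch, not a standard tool: the records, the field names, and the rule that an age must lie between 0 and 120 are all assumptions made up for illustration. It counts missing values, repeated records, and erroneous values, then gathers the counts into one summary.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Asha", "age": 34},
    {"id": 2, "name": "Ravi", "age": None},
    {"id": 3, "name": None, "age": 29},
    {"id": 1, "name": "Asha", "age": 34},   # exact repeat of the first record
    {"id": 4, "name": "Meena", "age": -5},  # erroneous: age cannot be negative
]

# Review and quantify missing data: count missing values per field.
missing_counts = {}
for record in records:
    for field, value in record.items():
        if value is None:
            missing_counts[field] = missing_counts.get(field, 0) + 1

# Understand redundant data: count records that repeat an earlier one exactly.
seen = set()
duplicate_count = 0
for record in records:
    key = tuple(sorted(record.items(), key=lambda item: item[0]))
    if key in seen:
        duplicate_count += 1
    else:
        seen.add(key)

# Quantify erroneous data using an assumed rule: 0 <= age <= 120.
invalid_age_count = sum(
    1 for record in records
    if record["age"] is not None and not 0 <= record["age"] <= 120
)

# Gather metrics summarizing the identified data errors.
summary = {
    "missing": missing_counts,
    "duplicates": duplicate_count,
    "invalid_ages": invalid_age_count,
}
print(summary)
```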
            The most common basic statistics terms you will come across are the mean, median and mode. These are known as
            “Measures of Central Tendency.” A measure of central tendency is a statistical measure that identifies a single value
            as representative of an entire distribution, so that one number summarises the whole data set.
            Also important in this early chapter of statistics is the shape of a distribution, which tells us how the data
            is spread around the mean or median.
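The three measures can be computed with Python's built-in statistics module. This is a small sketch; the scores are invented for illustration.

```python
import statistics

# Hypothetical test scores, invented for illustration.
scores = [4, 7, 7, 8, 9, 10, 18]

mean_score = statistics.mean(scores)      # sum of values divided by their count
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print("Mean:", mean_score)
print("Median:", median_score)
print("Mode:", mode_score)
```

Here the mean is larger than the median because the single high score of 18 pulls the mean upward. Comparing the two is a quick hint about the shape of the distribution.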


            2.3.1 Concept of Correlation and Causation
            In statistics, two or more variables are related if their values change together, so that a rise or fall in one variable's
            value goes with a rise or fall in the other variable's value, either in the same direction (direct) or in the opposite direction (inverse).
            Correlation describes the direction of a relationship between two or more variables. However, we cannot
            assume that a change in one variable gives rise to the change in the other variables. E.g., an increase in sales of winter care
            products in the United States of America is correlated with an increase in sales of summer care products in Australia,
            yet neither increase causes the other.

            On the other hand, causation means that one event's occurrence results from another event's occurrence. E.g.,
            human activities such as livestock farming, rising emissions, and cutting down trees in forests ultimately affect
            temperature.

            Concept of Correlation
            Correlation is a statistical term describing the degree to which two variables move in coordination with one another.
            Correlation means association; more precisely, it is a measure of the extent to which two variables are related.
            It is a relationship or connection between two variables in which, whenever one changes, the other is also likely
            to change. When the change in one variable does not cause the change in the other, the relationship is a correlation, not causation.
            Correlation is used to describe the linear relationship between two continuous variables (e.g., height and weight).
            In general, correlation tends to be used when there is no identified response variable. It measures the strength
            (quantitatively) and direction of the linear relationship between two or more variables.
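One widely used way to put a number on the strength and direction of a linear relationship is the Pearson correlation coefficient, which ranges from -1 to +1. The text above does not name a specific coefficient, so treat this choice as an assumption. The function name pearson_r and the height and weight values are also invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical heights (cm) and weights (kg), invented for illustration.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 79]

r = pearson_r(heights_cm, weights_kg)
print(f"Correlation between height and weight: {r:.3f}")
```

A value close to +1 means the two variables tend to rise together, a value close to -1 means one tends to fall as the other rises, and a value near 0 means there is little or no linear relationship. A high value by itself still does not show that one variable causes the other.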

            Three Types of Correlation
            Broadly speaking, there are three different types of correlations: positive, negative, and zero or no correlation.

               • Positive correlation: It is a relationship between two variables in which both variables move in the same direction.
              In other words, when one variable increases, the other tends to increase as well. An example of positive correlation
              would be height and weight, as taller people usually weigh more (with exceptions, of course).

