Page 119 - Data Science class 11
Statistics and statistical inference put data at the center of analysis. With inferential statistics, you
take data from a sample and make generalisations about a population. For example, you might stand in a mall and
ask a sample of 100 people whether they like shopping at a given place, then use their answers to estimate what
shoppers in general think. This is how sample data can be used to answer research questions.
Inferential Conclusion
As the name suggests, inferential statistical analysis is the process of drawing inferences or conclusions. It allows users to
infer trends about a larger population based on an analysis of samples. In other words, it takes data from a sample
and then draws conclusions about the population from which the sample was taken.
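The mall survey can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 shoppers, the 60% share who like the place, and the helper name `sample_proportion` are all hypothetical and not taken from the text.

```python
import random


def sample_proportion(population, sample_size, seed=0):
    """Draw a simple random sample and return the share of 'yes' answers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size


# Hypothetical population: 1 = likes shopping here, 0 = does not.
population = [1] * 6000 + [0] * 4000
population_share = sum(population) / len(population)

# Ask only 100 people and use their answers to estimate the population share.
estimate = sample_proportion(population, sample_size=100)
print(f"population share: {population_share:.2f}, sample estimate: {estimate:.2f}")
```

The estimate will usually be close to, but not exactly equal to, the true share; a different sample gives a slightly different answer, which is why inference always carries some uncertainty.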
Data Assessment Process
Data Assessment helps in minimising risks, reducing effort and optimising cost. It also helps in easily meeting the
information requirements of data owners, revision, auditors and management.
The following are the elements of the data assessment process:
• Assess
• Understand redundant data within an individual application and across multiple disparate applications
• Review and quantify missing, erroneous, and inconsistent data
• Gather metrics summarizing identified data errors
• Compare legacy data against data governance standards
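As a rough sketch of how missing and redundant data might be reviewed and summarised as metrics, the Python below counts empty fields and repeated records. The record layout, the field names and the function `assess_records` are hypothetical, chosen only for illustration.

```python
def assess_records(records, required_fields):
    """Count missing values per field and exact duplicate records."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for record in records:
        # A value counts as missing if the field is absent, None, or empty.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Two records are treated as duplicates if all required fields match.
        key = tuple(record.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"total": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "name": "Asha", "age": 16},
    {"id": 2, "name": "", "age": 17},
    {"id": 3, "name": "Ravi", "age": None},
    {"id": 1, "name": "Asha", "age": 16},
]
report = assess_records(records, ["id", "name", "age"])
print(report)
```

A real assessment would also check values against agreed standards (valid ranges, formats, and so on); this sketch covers only missing and duplicated entries.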
The most common basic statistical terms you will come across are the mean, median and mode. These are known as
“Measures of Central Tendency.” A measure of central tendency is a statistical measure that identifies a single value
as representative of an entire distribution. It aims to describe the whole data set with one typical value.
Also important in this early chapter of statistics is the shape of a distribution, which tells us how the data
is spread around the mean or median.
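The three measures can be computed with Python's standard `statistics` module. The list of scores below is made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical test scores; the single large value (25) is deliberate.
scores = [4, 7, 7, 8, 9, 10, 25]

score_mean = mean(scores)      # sum of the values divided by their count
score_median = median(scores)  # middle value once the data is sorted
score_mode = mode(scores)      # most frequently occurring value

print(score_mean, score_median, score_mode)  # 10 8 7
```

Here the mean (10) is larger than the median (8) because the one large score pulls the mean upward. A gap like this between the mean and the median is a simple hint about the shape of the distribution.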
2.3.1 Concept of Correlation and Causation
In statistics, two or more variables are related if their values change together, so that a rise or fall in one variable's
value tends to go with a rise or fall (a direct relationship) or with the opposite movement (an inverse relationship) in
the other variable's value.
In statistics, correlation describes the strength and direction of a relationship between two or more variables. However, we cannot
assume that a change in one variable gives rise to the change in the other variables. E.g., an increase in sales of winter care
products in the United States of America is correlated with an increase in sales of summer care products in Australia, yet neither causes the other.
On the other hand, causation means that one event's occurrence results from another event's occurrence. E.g.,
different human activities, such as livestock farming, rising emissions, and cutting down trees in forests, ultimately affect
temperature.
Concept of Correlation
Correlation is a statistical term describing the degree to which two variables move in coordination with one another.
Correlation means association: more precisely, it is a measure of the extent to which two variables are related.
Correlation is a relationship or connection between two variables where whenever one changes, the other is also likely
to change. But when a change in one variable does not cause the other to change, the relationship is a correlation, not causation.
Correlation is used to describe the linear relationship between two continuous variables (e.g., height and weight).
In general, correlation tends to be used when there is no identified response variable. It quantifies the strength
and direction of the linear relationship between the variables.
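One common way to put a number on a linear relationship is the Pearson correlation coefficient, which ranges from -1 to +1. The text does not name a specific coefficient, so treat the choice of Pearson's r, the function name `pearson_r` and the height and weight figures below as assumptions made for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up heights (cm) and weights (kg) for five people.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [48, 55, 61, 68, 77]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")  # close to +1: a strong positive linear relationship
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one, while the sign gives the direction. A large value of r says nothing about whether one variable causes the other.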
Three Types of Correlation
Broadly speaking, there are three different types of correlations: positive, negative, and zero or no correlation.
• Positive correlation: a relationship between two variables in which both variables move in the same direction.
In other words, when one variable increases, the other tends to increase as well. An example of positive correlation
is height and weight, as taller people usually weigh more (with exceptions, of course).