Page 288 - Artificial Intellegence_v2.0_Class_11
Correlation is not Causation
Correlation is a statistical measure that indicates whether two variables have a linear relationship, that is, whether they
tend to change together. It does not explain why the relationship exists; it only indicates that a relationship does exist.
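As a rough illustration of how such a measure can be computed, the sketch below implements the Pearson correlation coefficient, one common measure of linear association. The function name `pearson_r` and the sample data (hours studied and marks scored) are hypothetical and chosen only for this example.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How each variable varies on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
marks_rising = [2, 4, 6, 8, 10]    # rises in step with hours
marks_falling = [10, 8, 6, 4, 2]   # falls in step with hours

r_up = pearson_r(hours, marks_rising)     # close to +1
r_down = pearson_r(hours, marks_falling)  # close to -1
print(r_up, r_down)
```

A value near +1 or -1 says that the two variables move together along a straight line; the number itself says nothing about why they do so.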
Causation means that one event is the direct result of the occurrence of another event, i.e. a causal relationship exists
between the two events. This is also called cause and effect. For example, a speeding car leads to an accident: the
speeding is the cause and the accident is the effect.
Causation goes a step further than correlation. It states that a change in the value of one variable brings about
a change in the value of the second variable. This means that one variable makes the other happen.
In statistics, the phrase "correlation does not imply causation"
means that a cause-and-effect relationship between two variables
cannot be reasonably deduced based solely on their observed
association.
• "Correlation is not causation" means that if two things are
related, it does not necessarily mean that one thing leads to
the other.
• For example, people in India tend to eat more in cold
weather and less in hot weather, but this alone does not
mean that cold weather causes a rush to buy food.
• Another example: when a mobile phone has too little RAM, it
freezes, and while it is frozen neither games nor text messaging
work. The two failures occur together, but neither causes the
other; both result from the phone freezing.
[Figure: causation versus correlation, with the labels "Causation" (twice), "Correlation", "Sale of fans" and "Consumption of ice-creams".]
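The idea can be made concrete with a small simulation. The sketch below borrows the figure's labels (sale of fans, consumption of ice-creams) and assumes, purely for illustration, that daily temperature drives both; the temperature variable, the coefficients 3.0 and 2.0, and the noise levels are all made-up values, not data from the text.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed common cause: daily temperature in degrees Celsius (hypothetical)
temperature = [rng.uniform(20, 45) for _ in range(200)]

# Each quantity depends on temperature plus its own random noise.
# Neither quantity appears in the formula for the other.
fan_sales = [3.0 * t + rng.gauss(0, 5) for t in temperature]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]

# Strong correlation, although fans do not cause ice-cream consumption
r_observed = pearson_r(fan_sales, ice_cream)

# Subtract the (known, simulated) temperature effect from each series;
# what remains is only the independent noise
fan_rest = [f - 3.0 * t for f, t in zip(fan_sales, temperature)]
ice_rest = [i - 2.0 * t for i, t in zip(ice_cream, temperature)]
r_without_cause = pearson_r(fan_rest, ice_rest)

print(round(r_observed, 2), round(r_without_cause, 2))
```

In this made-up setting the two series are strongly correlated only because both follow temperature. Once the temperature effect is removed, the remaining correlation is close to zero, which is what "correlation is not causation" warns about.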
At a Glance
• Regression is a supervised machine learning technique.
• Linear regression is an algorithm that models the relationship between two variables so that one can be predicted from the other.
• A scatter plot is a graph that uses Cartesian coordinates to display the values of, typically, two variables in a
dataset.
• To plot the points on the scatter plot, you show each one as an ordered pair.
• Outliers are data points on the scatter plot that do not follow the pattern of the dataset.
• Regression focuses on predicting the value of the dependent variable from the value of the independent variable.
• The equation for linear regression is given by y = mx + b where m is the slope and b is the y-intercept.
• The line that passes close to most of the data points is called the ‘Line of Best Fit’ or ‘Regression Line’.
• The vertical distance between an observed response in the dataset and the line of best fit is called the
residual error (e). Each data point has one residual.
• Correlation is used to express an association between two quantitative variables.
• The correlation coefficient is measured on a scale that ranges from +1 to –1.
• "Correlation is not causation" means that if two things are related, it does not necessarily mean that one thing
leads to the other.
• Crosstabs determine a relationship between two variables. This relationship is exhibited in tabular form.
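Several of the points above (the equation y = mx + b, the line of best fit, and the residuals) can be tied together in a short sketch. It assumes the line is chosen by ordinary least squares, which is one common way to obtain a line of best fit; the function name `fit_line` and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares and return (m, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = co_variation / spread_x   # slope
    b = mean_y - m * mean_x       # y-intercept
    return m, b

# Hypothetical data points, written as ordered pairs (x, y)
points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 12)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

m, b = fit_line(xs, ys)

# One residual per data point: observed y minus the y predicted by the line
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print("slope m =", round(m, 2), "intercept b =", round(b, 2))
print("residuals =", [round(e, 2) for e in residuals])
```

For a least-squares line with an intercept, the residuals always add up to zero (apart from rounding), so points above the line balance points below it.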
286 Touchpad Artificial Intelligence (Ver. 2.0)-XI

