Page 339 - AI_Ver_3.0_class_11
P. 339
The scatterplot may look something like one of the following:
Positive Correlation Negative Correlation No Correlation
3. In statistics, outliers are data points that are significantly different from other observations. Outliers may be due
to measurement irregularity or may indicate experimental error; the latter are sometimes excluded from the data
set. Outliers can cause serious problems in statistical analysis. The data should not have any significant outliers.
Outliers are single data points within your dataset that do not follow the usual pattern. The following scatterplots
highlight the potential impact of outliers:
r = 0.39 r = 0.69
Outlier Outlier removed
Outliers can have a great impact on the line of best fit and the Pearson correlation coefficient, leading to very
difficult inferences regarding the data. Therefore, it is best to have no outliers or keep them to a minimum.
4. The variables should be normally distributed (approximately).
Correlation is not Causation
The correlation is a statistical method that indicates whether a pair of variables has a linear relationship and will
change together. It does not state the reasons for the relationship, but it tells that a relationship exists.
Causation shows that an event is the direct result of the occurrence of another event, i.e. a causal relationship exists
between the two events. This is also called cause and effect. For example, a speeding car leads to an accident. The
accident is due to causation.
Causation takes a step ahead of correlation. It states that any change in the value of one variable will definitely cause
a change in the value of the second variable. This means that one variable makes the other happen. This is also called
cause and effect.
In statistics, the phrase "correlation does not imply causation" means that the relationship between two variables
cannot be reasonably deduced based solely on their observed association.
● "Correlation is not causation" means that if two things are related, does not, necessarily mean that one thing leads
to the other.
● For example, just because Indians tend to eat more in cold weather and less in hot weather does not mean that
cold weather leads to crazy shopping for eatables.
Machine Learning Algorithms 337

