Page 339 - AI_Ver_3.0_class_11

The scatterplot may look something like one of the following:



                 [Figure: three scatterplots, titled Positive Correlation, Negative Correlation, and No Correlation]

                 3.   In statistics, outliers are data points that differ significantly from the other observations and do not follow
                    the usual pattern of the dataset. An outlier may arise from variability in the measurement or may indicate
                    experimental error; outliers of the latter kind are sometimes excluded from the dataset. Because outliers can
                    seriously distort statistical analysis, the data should not contain any significant outliers. The following
                    scatterplots highlight the potential impact of an outlier:





                    [Figure: two scatterplots. Panel "Outlier": r = 0.39. Panel "Outlier removed": r = 0.69.]

                    Outliers can strongly affect the line of best fit and the Pearson correlation coefficient, which can lead to
                    misleading inferences about the data. Therefore, it is best to have no outliers or to keep them to a minimum.
                 4.  The variables should be approximately normally distributed.
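
                 The effect of an outlier on the Pearson correlation coefficient can be illustrated with a small sketch. The
                 data values and the helper name pearson_r below are hypothetical, chosen only for illustration; they are not
                 the data behind the scatterplots in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data in which y rises steadily with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.1]

# The same data plus one point that lies far from the pattern.
x_out = x + [9]
y_out = y + [0.5]

r_clean = pearson_r(x, y)
r_with_outlier = pearson_r(x_out, y_out)

print(f"r without outlier: {r_clean:.2f}")
print(f"r with outlier:    {r_with_outlier:.2f}")
```

                 In this sketch a single added point is enough to pull the coefficient well below the value obtained from the
                 other eight points, which is why outliers should be checked before the coefficient is interpreted.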


                 Correlation is not Causation
                 Correlation is a statistical measure that indicates whether a pair of variables has a linear relationship, that is,
                 whether they tend to change together. It does not explain why the relationship exists; it only tells us that it exists.
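
                 As a minimal sketch of what "linear relationship" means here, the Pearson coefficient can be computed for three
                 small made-up datasets. The helper pearson_r and the numbers are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]

# y rises in a straight line as x rises: coefficient close to +1.
r_positive = pearson_r(x, [2 * v + 1 for v in x])

# y falls in a straight line as x rises: coefficient close to -1.
r_negative = pearson_r(x, [-3 * v + 4 for v in x])

# y depends on x, but along a symmetric curve: coefficient close to 0.
r_curved = pearson_r(x, [v * v for v in x])

print(r_positive, r_negative, r_curved)
```

                 The third case shows that the coefficient only measures linear association: y is fully determined by x, yet
                 the coefficient does not detect it.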

                 Causation means that one event is the direct result of another event, i.e. a causal relationship exists between the
                 two events. This is also called cause and effect. For example, when a speeding car crashes, the speeding is the
                 cause and the accident is the effect.

                 Causation goes a step beyond correlation. It states that a change in the value of one variable brings about a change
                 in the value of the second variable. In other words, one variable makes the other happen.

                 In statistics, the phrase "correlation does not imply causation" means that a cause-and-effect relationship between
                 two variables cannot be legitimately deduced solely from their observed association.

                 ●   "Correlation is not causation" means that if two things are related, it does not necessarily mean that one thing
                    leads to the other.
                 ●   For example, the fact that Indians tend to eat more in cold weather and less in hot weather does not mean that
                    cold weather leads to a rush to shop for eatables.
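
                 One common way a correlation arises without causation is a hidden third factor that influences both variables.
                 The simulation below is a sketch under that assumption; the variable names, the noise levels, and the idea that
                 the hidden factor might be something like daily temperature are all hypothetical and are not taken from the text.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor (for instance, daily temperature).
hidden = [random.gauss(0, 1) for _ in range(1000)]

# Two variables that each depend on the hidden factor plus their own noise.
# Neither variable appears in the formula for the other.
a = [h + random.gauss(0, 0.5) for h in hidden]
b = [h + random.gauss(0, 0.5) for h in hidden]

# a and b move together even though neither causes the other.
r_ab = pearson_r(a, b)

# After subtracting the hidden factor, only independent noise remains.
a_rest = [ai - h for ai, h in zip(a, hidden)]
b_rest = [bi - h for bi, h in zip(b, hidden)]
r_rest = pearson_r(a_rest, b_rest)

print(f"correlation of a and b:              {r_ab:.2f}")
print(f"after removing the hidden factor:    {r_rest:.2f}")
```

                 Here a and b are strongly correlated only because both follow the hidden factor; once it is accounted for, the
                 association largely disappears.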

                                                                                  Machine Learning Algorithms   337