Page 286 - Artificial Intellegence_v2.0_Class_11
Strength of Association    Coefficient, r
                           Positive        Negative
Small                      0.1 to 0.3      –0.1 to –0.3
Medium                     0.3 to 0.5      –0.3 to –0.5
Large                      0.5 to 1.0      –0.5 to –1.0
Note that how the strength of association should be interpreted depends on what you are measuring and on the sample size.
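The bands in the table above can be turned into a small lookup. The following is only a sketch: the function name `association_strength` is hypothetical, the "negligible" label for values below 0.1 is an assumption (the table gives no label there), and values that fall exactly on a shared boundary (0.3 or 0.5) are assigned to the higher band, which is also an assumption.

```python
def association_strength(r):
    """Label a Pearson coefficient using the bands from the table (illustrative helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size < 0.1:
        # Assumption: the table does not name this range.
        return "negligible"

    # Assumption: boundary values 0.3 and 0.5 go to the higher band.
    if size < 0.3:
        strength = "small"
    elif size < 0.5:
        strength = "medium"
    else:
        strength = "large"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(association_strength(0.98))
print(association_strength(-0.35))
```

For example, a coefficient of 0.98 falls in the large positive band, while -0.35 falls in the medium negative band.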
Example 1: The ages and incomes of five people are given below. Calculate the Pearson correlation coefficient. What does it indicate?
Age (x)    Income (y)
20         2000
30         4500
40         5700
50         6800
60         8000
Solution: To calculate the coefficient, we need to calculate the following values.
x      y       xy        x²      y²
20     2000    40000     400     4000000
30     4500    135000    900     20250000
40     5700    228000    1600    32490000
50     6800    340000    2500    46240000
60     8000    480000    3600    64000000
∑x = 200   ∑y = 27000   ∑xy = 1223000   ∑x² = 9000   ∑y² = 166980000
Putting the values in the formula,
\[
r = \frac{5(1223000) - (200)(27000)}{\sqrt{\left[5(9000) - (200)^2\right]\left[5(166980000) - (27000)^2\right]}}
  = \frac{715000}{727667.5065}
  \approx 0.98
\]
This value represents a strong positive relationship between the two variables. In this sample, as the age of a person increases, the person's income also goes up.
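The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch using the x and y values from the solution table; the function name `pearson_r` and the variable names are chosen here for illustration and do not come from the text.

```python
from math import sqrt

# Values taken from the solution table of Example 1.
ages = [20, 30, 40, 50, 60]
incomes = [2000, 4500, 5700, 6800, 8000]


def pearson_r(x, y):
    """Compute Pearson's r from the raw sums, mirroring the hand calculation."""
    n = len(x)
    sum_x = sum(x)                               # 200
    sum_y = sum(y)                               # 27000
    sum_xy = sum(a * b for a, b in zip(x, y))    # 1223000
    sum_x2 = sum(a * a for a in x)               # 9000
    sum_y2 = sum(b * b for b in y)               # 166980000

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(ages, incomes)
print(round(r, 2))  # 0.98
```

The intermediate sums in the comments match the totals row of the solution table, so each step of the hand calculation can be compared against the code.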
Example 2: Amit is a model student who is good at both academics and sports. After some time, however, he reduced his
sports activity and observed that he was scoring lower marks in tests. To investigate his hypothesis, he noted down
his test scores along with the number of hours he had played any sport before each school test. He
used this data to check the correlation between the hours of sport he played and his test scores, and calculated
a Pearson correlation coefficient of 0.95. Explain what this value means.
Solution: A value of 0.95 shows a strong positive association between the two variables. This means that Amit tended to score
better marks when he played more before his tests. When he reduced his playing hours, the marks he scored also went down.
Assumptions
There are four assumptions for Pearson's correlation coefficient which are as follows. If any of these four requirements
are not met, analysis of data using Pearson's correlation coefficient might not yield a valid result:
1. The data type of the two variables should be continuous. Examples of such continuous variables include
height (measured in feet and inches), temperature (measured in °C), salary (measured in INR), study time (measured
in hours), intelligence (measured through IQ score), exam performance (measured from 0 to 100), sales (measured
in number of transactions every month), etc.

