Page 278 - Artificial Intellegence_v2.0_Class_11
The scatterplot shows a positive relationship between the two variables: as the number of hours studied increases, the percentage of marks tends to increase.
However,
• there is a student who studies 3 hours but still achieves a low percentage; such a data point is called an outlier.
• outliers are data points that do not follow the general pattern of the dataset.
Regression—Finding the Line
When an analysis involves more than one variable and examines how one variable depends on another, it is called
Regression Analysis. Regression generally focuses on predicting the value of the dependent variable from the value of the other variable.
Let us consider two variables x and y.
y – Regressand or Dependent Variable
x – Independent Variable or Predictor
Therefore, if we use a simple linear regression model where y depends on x, then the regression line of y on x is:
y = mx + b
where,
• x is the independent variable.
• y is the dependent variable.
• m is the slope of the line.
• b is the y-intercept.
[Figure: the regression line y = mx + b, with the x (independent) variable on the horizontal axis, the y (dependent) variable on the vertical axis, and the y-intercept marked.]
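The role of m and b in the line y = mx + b can be seen in a small Python sketch. The function name predict and the values of m and b below are hypothetical, chosen only for illustration; they do not come from any dataset in this chapter.

```python
def predict(x, m, b):
    """Return the value of y on the line y = m*x + b for a given x."""
    return m * x + b

# Hypothetical slope and intercept, used only to illustrate the equation.
m, b = 2.0, 1.0

for x in [0, 1, 2]:
    # At x = 0 the result equals b (the y-intercept);
    # each increase of 1 in x changes y by m (the slope).
    print(x, predict(x, m, b))
```

At x = 0 the function returns b, and each step of 1 in x changes the result by m, which is what the slope and the y-intercept mean.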
Least Squares Method—Finding the Line of Best Fit
Consider the following example, which shows the marks scored by 10 students after a certain number of
hours of study:
No. of Hours Studied    Marks
2                       44
9                       98
5                       80
3                       75
7                       70
1                       63
8                       53
6                       92
2.5                     71
4                       65
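As a minimal sketch, the line of best fit for the data in this table could be computed in Python as shown below. It assumes the standard closed-form least squares formulas for a straight line, m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = (Σy − mΣx) / n, which may differ in presentation from the method this chapter goes on to use. The function name least_squares_fit is hypothetical.

```python
# Data copied from the table: hours studied and marks for 10 students.
hours = [2, 9, 5, 3, 7, 1, 8, 6, 2.5, 4]
marks = [44, 98, 80, 75, 70, 63, 53, 92, 71, 65]

def least_squares_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimises the
    sum of squared vertical distances between the points and the line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Assumed standard formulas for the slope and intercept.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

m, b = least_squares_fit(hours, marks)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
```

For this table the sketch gives a slope of roughly 2.81 and an intercept of roughly 57.76, so the fitted line is approximately y = 2.81x + 57.76.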

