Page 188 - Artificial Intellegence_v2.0_Class_11
P. 188
• The scatterplot is a valuable tool for calculating correlation. Variable relationships can be categorised in a
variety of ways, including positive or negative, strong or weak, linear or nonlinear.
Data Dimensionality
The number of properties or features in a dataset is referred to as the dimension of the dataset. High dimensional data
is a term used to describe a dataset having many properties, often one hundred or more.
The following Dataset has 5 columns so its dimensionality is 5.
Studentid Sname Gender Age Marks
S103 Abhishek M 12 78
S114 Sanat F 14 56
K124 Ishita F 10 89
In machine learning, while performing classification or clustering of the data, we need to choose what all
dimensionalities/columns we want to use in order to get meaningful information.
Simple Linear Equation and Regression
Linear regression shows the relationship between two variables
by fitting a linear equation to the observed data. One variable is line: y=mx+b
considered an independent variable and the other a dependent
variable. For example, we may wish to use a linear regression model y (dependent) variable
to understand the relation between an individual’s weight with their
height. A linear regression line has an equation of the form: ŷ 2
Y = mX + b where: y 1 y 2
• Y is the dependent variable.
• m is the y-intercept. y-intercept
• X is the independent variable. x 1 x (independent) variable
• b is the slope of the line.
You can use simple linear regression when you want to know:
• How strong is the relationship is between two variables (e.g., the relationship between rainfall and soil erosion)?
• The value of the dependent variable at a certain value of the independent variable (e.g., what is the amount of soil
erosion at a certain level of rainfall).
Least Square Method
The “least squares method” is a form of mathematical regression analysis used to determine the line of best fit for a
data set.
The function of the regression model is to determine the linear
function between the variables X and Y, which can best describe
the relationship between the two variables. In linear regression, it Simple Linear Regression
is assumed that Y can be calculated from a certain combination of
input variables. The relationship between the input variable (X) and
the target variable (Y) can be represented by drawing a line through y
the points on the graph. This line represents the function that best
describes the relationship between X and Y (for example, every
time X increases by 3, Y increases by 2). The goal is to find the best
“regression line”, or function that best fits the data.
The line of best-fit is the line that passes close to most of the data x
points. The line of best-fit is used to make predictions about the data.
186 Touchpad Artificial Intelligence (Ver. 2.0)-XI

