
•  A scatterplot is a valuable tool for visualising correlation. Relationships between variables can be categorised in a
                 variety of ways: positive or negative, strong or weak, linear or nonlinear.
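The direction and strength that a scatterplot shows can also be summarised as a single number. The minimal Python sketch below computes Pearson's correlation coefficient r, a standard measure that this page does not itself introduce, for a small set of made-up rainfall and soil-erosion values. The 0.7 cut-off for calling a relationship "strong" is a hypothetical rule of thumb, not a fixed standard. Note that r only captures linear relationships, so a strong nonlinear pattern can still give a small r.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y vary together around their means
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the spreads of x and y (zero if either list is constant)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / (spread_x * spread_y)

def describe(r, strong_cutoff=0.7):
    """Label r as positive/negative and strong/weak (cut-off is hypothetical)."""
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {direction}"

# Made-up example data (not from the textbook)
rainfall = [10, 20, 30, 40, 50]      # mm
erosion = [1.2, 1.9, 3.1, 3.8, 5.2]  # tonnes per hectare

r = pearson_r(rainfall, erosion)
print(round(r, 3), describe(r))
```

A value of r close to +1 or -1 corresponds to points lying close to a straight line on the scatterplot; a value near 0 means no clear linear trend.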

              Data Dimensionality
              The number of properties or features in a dataset is referred to as the dimension (or dimensionality) of the dataset.
              High-dimensional data is a term used to describe a dataset with many features, often one hundred or more.
              The following dataset has 5 columns, so its dimensionality is 5.

                                             Studentid   Sname      Gender   Age   Marks
                                             S103        Abhishek   M        12    78
                                             S114        Sanat      F        14    56
                                             K124        Ishita     F        10    89

              In machine learning, while performing classification or clustering of data, we need to choose which dimensions
              (columns) to use in order to get meaningful information.
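Counting dimensions and choosing columns can be sketched in a few lines of Python. This minimal example stores the student table as a list of dictionaries, counts its columns to get the dimensionality, and then keeps only Age and Marks. That selection is a hypothetical choice for illustration (for example, before clustering students by age and performance), not a rule given in the text.

```python
# The student table, stored as a list of rows (one dict per student)
students = [
    {"Studentid": "S103", "Sname": "Abhishek", "Gender": "M", "Age": 12, "Marks": 78},
    {"Studentid": "S114", "Sname": "Sanat", "Gender": "F", "Age": 14, "Marks": 56},
    {"Studentid": "K124", "Sname": "Ishita", "Gender": "F", "Age": 10, "Marks": 89},
]

# Dimensionality = number of columns (features) in each row
columns = list(students[0].keys())
dimensionality = len(columns)
print("Columns:", columns)
print("Dimensionality:", dimensionality)

# Hypothetical choice for clustering: keep only the numeric columns.
# IDs and names identify students but carry no pattern to learn from.
selected = ["Age", "Marks"]
reduced = [[row[c] for c in selected] for row in students]
print("Selected features:", selected)
print("Reduced data:", reduced)
```

After selection, each student is described by 2 dimensions instead of 5, which is the kind of choice the paragraph above refers to.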

                      Simple Linear Equation and Regression

              Linear regression shows the relationship between two variables by fitting a linear equation to the observed data.
              One variable is considered the independent variable and the other the dependent variable. For example, we may wish
              to use a linear regression model to understand the relationship between an individual's height and weight.
              A linear regression line has an equation of the form:
              Y = mX + b where:
                 • Y is the dependent variable.
                 • m is the slope of the line.
                 • X is the independent variable.
                 • b is the y-intercept.
              [Figure: the line y = mx + b, with the x (independent) variable on the horizontal axis, the y (dependent) variable
              on the vertical axis, and the y-intercept marked]
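Given two points on a line, m and b can be worked out and the equation then used to predict Y. The minimal Python sketch below uses the standard two-point formulas m = (y2 - y1) / (x2 - x1) and b = y1 - m * x1, with m as the slope and b as the y-intercept. The height and weight values are made up for illustration, and the function names are hypothetical.

```python
def line_through(p1, p2):
    """Return slope m and y-intercept b of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("A vertical line has no slope, so it cannot be written as Y = mX + b")
    m = (y2 - y1) / (x2 - x1)  # slope: change in Y per unit change in X
    b = y1 - m * x1            # y-intercept: value of Y when X = 0
    return m, b

def predict(m, b, x):
    """Evaluate Y = mX + b for a given X."""
    return m * x + b

# Made-up points: (height in cm, weight in kg)
m, b = line_through((150, 45), (170, 60))
print("m =", m, "b =", b)
print("Predicted weight at 160 cm:", predict(m, b, 160))
```

Here the intercept comes out negative, which simply shows that such a line is only meaningful over the range of heights it was built from; nobody has a height of 0 cm.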
              You can use simple linear regression when you want to know:

                 • How strong the relationship between two variables is (e.g., the relationship between rainfall and soil erosion).
                 • The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil
                 erosion at a certain level of rainfall).


              Least Square Method
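              The “least squares method” is a form of mathematical regression analysis used to determine the line of best fit for a
              data set. It chooses the line that minimises the sum of the squared vertical distances (residuals) between the data
              points and the line.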
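For reference, the quantity that the least squares method minimises, and the standard resulting formulas for the slope m and the y-intercept b of the line Y = mX + b, are shown below without derivation. Here (x_i, y_i) are the n observed data points, and \bar{x}, \bar{y} are the means of the x and y values.

```latex
S(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m \bar{x}
```

The formula for b shows that the best-fit line always passes through the point (\bar{x}, \bar{y}).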
              The function of the regression model is to determine the linear function between the variables X and Y that best
              describes the relationship between the two variables. In linear regression, it is assumed that Y can be calculated
              from a linear combination of the input variables. The relationship between the input variable (X) and the target
              variable (Y) can be represented by drawing a line through the points on the graph. This line represents the function
              that best describes the relationship between X and Y (for example, every time X increases by 3, Y increases by 2).
              The goal is to find the best “regression line”, that is, the function that best fits the data.
              [Figure: Simple Linear Regression, with y on the vertical axis and x on the horizontal axis]
              The line of best fit is the line that passes as close as possible to the data points as a whole. It is used to make
              predictions about the data.
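The least squares method for simple linear regression can be sketched in plain Python. This minimal example computes m and b with the standard closed-form formulas (sums of deviations from the means), predicts a new value, and checks that nudging the slope makes the sum of squared residuals larger. The data points are made up so that Y rises by roughly 2 whenever X rises by 3, echoing the example in the paragraph above; the function names are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the least squares line Y = mX + b through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much X varies around its mean (must be non-zero)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def sum_squared_residuals(xs, ys, m, b):
    """The quantity least squares minimises: total squared vertical error."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up data where Y rises by about 2 each time X rises by 3
xs = [0, 3, 6, 9, 12]
ys = [1.0, 3.2, 4.9, 7.1, 9.0]

m, b = least_squares_fit(xs, ys)
print(f"Best-fit line: Y = {m:.3f}X + {b:.3f}")
print("Prediction at X = 15:", round(m * 15 + b, 2))
print("SSE of best fit:", round(sum_squared_residuals(xs, ys, m, b), 4))
print("SSE of a nudged line:", round(sum_squared_residuals(xs, ys, m + 0.05, b), 4))
```

In practice, functions such as NumPy's `numpy.polyfit(xs, ys, 1)` or Python's `statistics.linear_regression` (Python 3.10 and later) compute the same line; writing the sums out by hand simply makes the method visible.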

                    186     Touchpad Artificial Intelligence (Ver. 2.0)-XI