Page 157 - Data Science class 11

So, if we have values x1, x2, ..., xn in a data set, with n representing the number of values in this data set, the sample
            mean, usually denoted by x̄ (pronounced "x bar"), is:

                        x̄ = (x1 + x2 + ... + xn) / n

            This formula is usually written in a slightly different manner using the Greek capital letter Σ, pronounced “sigma”, which
            means “sum of...”:

                        x̄ = Σx / n
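
As a minimal sketch, the sample mean can be computed in Python as a sum divided by a count. The function name sample_mean and the four sample values are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# Hypothetical values, used only to illustrate the formula.
data = [2, 4, 6, 8]
print(sample_mean(data))  # 5.0
```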

            When not to use the mean
            The mean is particularly susceptible to the influence of outliers. Outliers are values that are unusually small or
            unusually large compared with the rest of the data set. For example, consider the wages of staff at a factory below:
               Staff       1        2        3        4        5         6        7        8        9        10
               Salary     15k      18k      16k      14k      15k       15k      12k      17k      90k      95k

            First, let us calculate the mean.
            Using the formula as stated above,

            Mean salary = (15 + 18 + 16 + 14 + 15 + 15 + 12 + 17 + 90 + 95) / 10 = 307 / 10 = 30.7
            So, the mean salary for these ten staff is $30.7k. However, the raw data shows that this mean value does not
            accurately reflect the salary of a typical worker, as eight of the ten workers have salaries in the $12k to $18k
            range. The mean is being skewed by the two large salaries. Therefore, in this situation, we need a better measure of
            central tendency, and the median is a better choice.
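
The effect of the two large salaries can be checked with a short Python sketch. It uses the standard statistics module and the ten salaries from the table above, in thousands of dollars. The variable names are assumptions made for this illustration.

```python
import statistics

# Salaries in thousands of dollars, copied from the staff table.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Mean of the eight salaries that remain when the two outliers are left out.
mean_without_outliers = statistics.mean(salaries[:8])

print(mean_salary)            # 30.7
print(median_salary)          # 15.5
print(mean_without_outliers)  # 15.25
```

The median (15.5) stays close to what most staff earn, while the mean (30.7) is pulled upward by the 90k and 95k salaries.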

            Median
            The median is the middle value in a data set that has been arranged in order of magnitude. The median, unlike the
            mean, is less affected by outliers and skewed data. Suppose we have to calculate the median for the data below:

                 60       50       85        51       32       11       51       50       82       43        91
            We first need to rearrange the data in order of magnitude (smallest first):

                 11       32       43        50       50       51       51       60       82       85        91

            The median is the middle mark, 51. It is the middle mark because five scores lie before it and five scores lie
            after it. This data set has an odd number of values, so the middle value is easy to find. If a data set has an
            even number of values, we take the two middle values and find their average.
            Let us look at the example below:

                      60       50       85       51       32       11        51       50       82       43
            We again rearrange the data in order of magnitude (smallest first):

                      11       32       43       50       50       51        51       60       82       85

            The average of the 5th and 6th values in this ordered data set, 50 and 51, gives the median: (50 + 51) / 2 = 50.5.
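
The two cases, an odd and an even number of values, can be sketched in Python as follows. The function name median and the list names are assumptions for this illustration. The marks are the two data sets used in this section.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)  # arrange in order of magnitude, smallest first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_marks = [60, 50, 85, 51, 32, 11, 51, 50, 82, 43, 91]
even_marks = [60, 50, 85, 51, 32, 11, 51, 50, 82, 43]

print(median(odd_marks))   # 51
print(median(even_marks))  # 50.5
```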

            Mode

            The mode is the most frequently occurring value in a data set. On a bar chart or histogram, it corresponds to the
            highest bar. You can, therefore, usually think of the mode as the most popular option among the given options.
            Here is an example of mode:
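
As a separate minimal sketch, the mode can be found with Python's standard statistics module (multimode requires Python 3.8 or later). Reusing the salary and mark data from earlier on this page is an assumption made for illustration, and the variable names are hypothetical.

```python
import statistics

# Salaries in thousands of dollars, from the factory example.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

# Marks from the median example with eleven values.
marks = [60, 50, 85, 51, 32, 11, 51, 50, 82, 43, 91]

# multimode returns every value that shares the highest frequency,
# in the order the values first appear in the data.
salary_modes = statistics.multimode(salaries)
mark_modes = statistics.multimode(marks)

print(salary_modes)  # [15]
print(mark_modes)    # [50, 51]
```

The salary data has a single mode, 15, which occurs three times. The marks have two modes, because 50 and 51 each occur twice.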





