Page 157 - Data Science class 11
So, if we have values x1, x2, ..., xn in a data set, with n representing the number of values in this data set, the sample mean, usually denoted by x̄ (pronounced “x bar”), is:

$$\bar{x} = \frac{\sum x}{n}$$

This formula uses the Greek capital letter Σ, pronounced “sigma”, which means “sum of...”, so Σx stands for x1 + x2 + ... + xn.
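The formula x̄ = Σx / n translates directly into code. The Python sketch below is only an illustration; the function name `sample_mean` is our own hypothetical choice and is not taken from any library.

```python
def sample_mean(values):
    """Return the sample mean x̄ = Σx / n of a non-empty sequence of numbers."""
    n = len(values)  # n: the number of values in the data set
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)  # Σx: x1 + x2 + ... + xn
    return total / n


# Usage: Σx = 2 + 4 + 6 + 8 = 20 and n = 4, so x̄ = 20 / 4
print(sample_mean([2, 4, 6, 8]))  # → 5.0
```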
When not to use the mean
The mean is particularly susceptible to the influence of outliers. Outliers are values that are unusually small or unusually large compared with the rest of the values in a data set. For example, consider the salaries of staff at a factory below:
Staff        1    2    3    4    5    6    7    8    9    10
Salary ($)  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
First, let us calculate the mean using the formula stated above:
Mean salary = (15 + 18 + 16 + 14 + 15 + 15 + 12 + 17 + 90 + 95)/10 = 307/10 = 30.7 (thousand dollars)
So, the mean salary for these ten staff is $30.7k. However, the raw data show that this mean does not accurately reflect the salary of a typical worker: eight of the ten workers earn between $12k and $18k. The two large salaries ($90k and $95k) pull the mean well above this range. In this situation, we need a measure of central tendency that is less affected by outliers, and the median is a better choice.
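The effect of the two large salaries can be checked with a short Python sketch using the standard-library `statistics` module, with salaries entered in thousands of dollars. Comparing the results with and without staff 9 and 10 is our own illustration and not part of the original example.

```python
from statistics import mean, median

# Salaries of staff 1 to 10 from the table above, in thousands of dollars
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(sum(salaries))     # → 307
print(mean(salaries))    # → 30.7
print(median(salaries))  # → 15.5  (average of the 5th and 6th sorted values, 15 and 16)

# Illustration only: drop staff 9 and 10 (the two large salaries)
typical = salaries[:8]
print(mean(typical))     # → 15.25  (the mean changes a lot)
print(median(typical))   # → 15.0   (the median barely moves)
```

The median moves only from 15.5 to 15.0 when the outliers are removed, while the mean falls from 30.7 to 15.25. This is why the median is described as less affected by outliers.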
Median
The median is the middle value in a data set that has been arranged in order of magnitude. Unlike the mean, the median is less affected by outliers and skewed data. Suppose we have to calculate the median for the data below:
60 50 85 51 32 11 51 50 82 43 91
We first need to arrange the data in order of magnitude (smallest first):
11 32 43 50 50 51 51 60 82 85 91
The median is the middle value, 51. It is the middle value because five values lie before it and five lie after it. Here we have an odd number of values, so the middle value is easy to find. If we have an even number of values, we take the two middle values and find their average.
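The rule just described (sort the data, take the middle value for an odd count, average the two middle values for an even count) can be written as a small function. This is a minimal sketch; the name `median_of` is hypothetical and chosen to avoid confusion with the standard library's `statistics.median`.

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # arrange in order of magnitude, smallest first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2  # 0-based index of the middle (or upper-middle) value
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


data = [60, 50, 85, 51, 32, 11, 51, 50, 82, 43, 91]
print(median_of(data))  # → 51
```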
Let us look at the example below:
60 50 85 51 32 11 51 50 82 43
We again arrange the data in order of magnitude (smallest first):
11 32 43 50 50 51 51 60 82 85
The median is the average of the 5th and 6th values in this data set: (50 + 51)/2 = 50.5.
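For the ten-value data set, the 5th and 6th values sit at positions n/2 and n/2 + 1 when counting from 1. Python lists count from 0, so they are indices 4 and 5. The sketch below picks them out by hand and cross-checks the result with the standard library's `statistics.median`, which applies the same averaging rule for an even count.

```python
import statistics

data = [60, 50, 85, 51, 32, 11, 51, 50, 82, 43]
ordered = sorted(data)
print(ordered)  # → [11, 32, 43, 50, 50, 51, 51, 60, 82, 85]

n = len(ordered)             # 10 values, an even count
fifth = ordered[n // 2 - 1]  # 5th value (index 4 in 0-based Python) → 50
sixth = ordered[n // 2]      # 6th value (index 5) → 51
print((fifth + sixth) / 2)   # → 50.5

# statistics.median gives the same answer
print(statistics.median(data))  # → 50.5
```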
Mode
The mode is the most frequently occurring value in a data set. On a bar chart or histogram, it is represented by the highest bar. You can, therefore, think of the mode as the most popular option among the given options. Here is an example of mode:

