Page 173 - Data Science class 10
The mean of the sample means is 75 and the standard deviation of the sample means is 2.5, with the standard
deviation of the sample means computed as follows:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{10}} \approx 2.5$$
If we were to take samples of n = 5 instead of n = 10, we would get a similar distribution, but the variation among
the sample means would be larger. In fact, when we did this, the mean of the sample means was again 75 and the
standard deviation of the sample means was 3.6, which agrees with 8 divided by the square root of 5.
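The arithmetic above can be checked with a short Python sketch. It assumes a population with mean 75 and standard deviation 8, as in the example, and it further assumes the population is normal for the simulation; that choice is ours, since the text does not say how the population is distributed. The function names are illustrative only.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard deviation of the sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(mu, sigma, n, num_samples=10_000, seed=0):
    """Draw many samples of size n; return (mean, std dev) of their means.

    Assumption: the population is modelled as normal(mu, sigma).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.fmean(means), statistics.pstdev(means)


for n in (10, 5):
    sim_mean, sim_sd = simulate_sample_means(75, 8, n)
    # Columns: sample size, formula value, simulated mean, simulated std dev
    print(n, round(standard_error(8, n), 2), round(sim_mean, 1), round(sim_sd, 2))
```

The formula gives about 2.53 for n = 10 and about 3.58 for n = 5, and the simulated values should land close to these, with small differences from run to run if the seed is changed.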
3.7. WHY IS THE CENTRAL LIMIT THEOREM IMPORTANT?
According to the Central Limit Theorem, as the sample size grows, the sampling distribution of the mean tends
toward a normal distribution, whatever the shape of the population distribution (as long as its variance is finite).
This is helpful because a researcher never knows which mean in the sampling distribution equals the population
mean. However, when many random samples are drawn from the population, the sample means cluster together,
allowing the researcher to make a good estimate of the population mean. Moreover, as the sample size grows, the
sampling error tends to decrease. Some practical applications of the Central Limit Theorem include:
1. Voting polls estimate the number of people who support a particular election candidate. The confidence
intervals reported with poll results on news channels are calculated using the Central Limit Theorem.
2. The Central Limit Theorem can also be used to calculate the mean family income for a specific region.
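As a rough illustration of both ideas, the sketch below draws sample means from a deliberately skewed population and then computes a poll's margin of error. The exponential population, the sample sizes, and the poll figures (52% support among 1,000 respondents) are hypothetical choices for the example and do not come from the text. The margin of error uses the common normal approximation with z = 1.96 for a 95% interval.

```python
import math
import random
import statistics


def skewness(values):
    """Skewness: 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, num_samples=5_000, seed=1):
    """Means of many samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


def poll_margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p_hat."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (2, 50):
    means = sample_means(n)
    # Columns: sample size, skewness of the means, spread of the means
    print(n, round(skewness(means), 2), round(statistics.pstdev(means), 2))

# Hypothetical poll: 52% support among 1,000 respondents
print(round(poll_margin_of_error(0.52, 1000), 3))
```

With the larger sample size, the skewness of the sample means should be much closer to zero and their spread much smaller, which is the behaviour the theorem describes. For the hypothetical poll, the margin of error works out to roughly 3 percentage points.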
Activity 3
Read about how the population is measured for your city and how the Central Limit Theorem can help in
estimating the size of a large population.
Recap
• Preference is the act of choosing, or having a special liking for, one person or thing rather than another or others.
• Prejudice means a preconceived opinion that is not based on reason or actual experience.
• The difference between prejudice and partiality is that prejudice is a harm or damage, while partiality is a preference.
• Partiality, preference and prejudice towards a set of data is called bias.
• Statistical bias is anything that leads to a systematic difference between the true parameters of a population and the
statistics used to estimate those parameters.
• In Data Science, bias is a deviation from the expected outcome in the data.
• Bias can be introduced at any stage, from defining and capturing the dataset to running the analytics.
• A population is a group of phenomena that have something in common.
• A sample is a smaller unbiased group of members of a population selected to represent the population.
• A random sample is one in which every member of a population has an equal chance of being selected.
• A parameter is a characteristic of a population.
• Probability is all about quantifying randomness.
• A sampling error is a deviation in the sampled value versus the true population value.

