Page 156 - Data Science class 11
4.2 Some statistical terms
Ideally, statisticians would compile data about the entire population through an operation called a census. This is
usually conducted by governmental statistical institutes. This data about the population is summarised via statistics
known as "descriptive statistics." Descriptive statistics provide information about our immediate group of data.
Numerical descriptors like mean and standard deviation for continuous data types (like income) and frequency and
percentage for categorical data (like race) are very useful.
For example, we can calculate the mean and standard deviation of the exam marks for a group of, say, 100 students,
and this can provide valuable information.
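As a rough illustration of these descriptors, the sketch below uses Python's standard statistics and collections modules. The marks and the stream labels are made-up values chosen for the example, not data from this chapter.

```python
import statistics
from collections import Counter

# Hypothetical exam marks for a small group of students (illustrative only)
marks = [62, 75, 48, 90, 81, 67, 75, 58]

# Numerical descriptors for continuous data
mean_marks = statistics.mean(marks)
# pstdev treats the list as the entire group of interest
std_marks = statistics.pstdev(marks)

# Hypothetical categorical data: the stream chosen by each student
streams = ["Science", "Commerce", "Arts", "Arts",
           "Science", "Arts", "Commerce", "Science"]

# Frequency and percentage for categorical data
frequency = Counter(streams)
percentage = {name: 100 * count / len(streams)
              for name, count in frequency.items()}

print("Mean:", mean_marks)
print("Standard deviation:", round(std_marks, 2))
print("Frequency:", dict(frequency))
print("Percentage:", percentage)
```

The same two lines for the mean and standard deviation would work unchanged for a list of 100 marks.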
Before we proceed any further, let us understand some statistical terms that are used quite frequently in surveys and
sampling methods.
• Population: Any group of data that includes all the data you are interested in is called a population. A
population can be small or big, as long as it includes all the data of interest. For example, if you are interested in only
the exam marks of 100 students, those 100 students would represent the population.
• Parameters: Properties of populations, like the mean or standard deviation, are called parameters, as these
properties represent the whole population (i.e., everybody you are interested in). In applying statistics to a problem,
it is common practice to begin with a population or process to be studied. Populations can be of widely varied kinds, such as
"all the people living in a country" or "every atom composing a crystal".
• Descriptive statistics: These summarise the data in hand and can be applied to both populations and samples.
• Sample: When a census is not viable, a selected subset of the population, called a sample, is studied. Once a sample
that is representative of the population has been chosen, data is collected for the sample members in an observational
or experimental setting.
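The terms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marks are randomly generated, the sample size of 20 is arbitrary, and the variable names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: the exam marks of all 100 students of interest
population = [random.randint(30, 100) for _ in range(100)]

# Parameters: properties of the whole population
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

# Sample: a subset of the population, here 20 students picked at random
sample = random.sample(population, 20)

# The same properties computed from the sample only
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # divides by n - 1 instead of n

print("Population mean and SD:", population_mean, round(population_sd, 2))
print("Sample mean and SD:", sample_mean, round(sample_sd, 2))
```

The sample values will usually be close to, but not exactly equal to, the population parameters.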
4.2.1 Central tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central
position within that set of data. Therefore, measures of central tendency are also known as measures of central
location. They can also be classified as summary statistics.
In statistics, we compare individual scores with the overall group of scores to analyse a result correctly. Measures
of central tendency play an important role in this. An individual score by itself may mean little, but when
it is compared with the other individual scores in the group, we get a better
understanding of where this value stands amongst the other values.
For example, if you say you saw an insect of length 9.8 mm, it does not mean anything by itself. However, if you say
that the normal length of the insect is about 6 mm and the maximum recorded length ever is 10.4 mm, then it can be
inferred that you saw a particularly large insect. So, it is necessary to be able to quantify the "normal length" as used
above, and this is what central tendency is all about.
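The insect example can be written out as a small calculation. The three numbers come from the example above, while the variable names are only illustrative.

```python
# Values taken from the insect example in the text
typical_length_mm = 6.0   # the "normal length"
max_recorded_mm = 10.4    # the maximum recorded length
observed_mm = 9.8         # the insect that was seen

# How far the observation lies above the typical value
excess_mm = round(observed_mm - typical_length_mm, 1)

# The observation as a multiple of the typical value
times_typical = round(observed_mm / typical_length_mm, 2)

# An observation above the typical value but within the recorded maximum
is_large_but_plausible = typical_length_mm < observed_mm <= max_recorded_mm

print("Above typical by (mm):", excess_mm)
print("Multiple of typical length:", times_typical)
print("Large but within recorded range:", is_large_but_plausible)
```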
Measures of Central Tendency
There are three measures of central tendency: mean, median and mode.
Each of these measures gives a different indication of the typical or central value in the distribution.
The mean is the measure of central tendency that you are most familiar with. Under different conditions, some
measures of central tendency become more relevant for use than others.
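A minimal sketch of the three measures, using Python's standard statistics module on made-up data. The list deliberately contains one extreme value to show how the measures can differ.

```python
import statistics

# Hypothetical monthly incomes in thousands; the last value is an extreme one
incomes = [20, 22, 22, 25, 27, 30, 200]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequently occurring value

print("Mean:", round(mean_income, 2))
print("Median:", median_income)
print("Mode:", mode_income)
```

Here the single extreme value pulls the mean well above the median, which is one situation where the median may describe the typical value better.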
Mean
The mean (or average) is the most popular measure of central tendency. It can be used with both discrete and
continuous data, though it is mostly used with continuous data. The mean is equal to the sum of all the values in the
data set divided by the number of values in that data set.
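The definition above can be followed step by step in Python. The values are arbitrary, and the result is checked against the mean function from the standard statistics module.

```python
import statistics

# Arbitrary illustrative data set
values = [4, 8, 15, 16, 23, 42]

total = sum(values)   # sum of all the values in the data set
count = len(values)   # number of values in the data set
mean = total / count  # mean = sum of values / number of values

print("Sum:", total)
print("Count:", count)
print("Mean:", mean)

# The library function gives the same answer
print("statistics.mean:", statistics.mean(values))
```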

