Page 115 - Data Science class 10
01 USE OF STATISTICS IN DATA SCIENCE
Learning Outcome
1.1. What are Subsets?
1.2. Some Methods of Subsetting
1.3. One-Way Frequency Table
1.4. Two-Way Frequency Table
1.5. Central Tendency
1.6. Standard Deviation
In the previous classes of data science, you learnt how important data is to our daily lives. You have also learnt
the importance of gathering, organising, analysing, and visualising data. Researchers produce data mainly through
two methods: observational studies and randomised experiments. In both cases, the researcher is examining a
population, which is the group of subjects or units about which they want to draw a conclusion. A sample is the
part of the population that is actually observed. A parameter is a numerical feature of the population.
A statistic is a numerical feature of a sample (for example, a sample mean or proportion) that is often used to
estimate or test a population parameter. Population, sample, parameter, and statistic are four main terms in
statistics.
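The four terms can be illustrated with a short Python sketch. The population of 1,000 marks below is invented purely for illustration (it is not data from this chapter), and the variable names are chosen only for this example.

```python
import random
import statistics

random.seed(42)  # fixed seed so that every run draws the same numbers

# Hypothetical population: marks (0 to 100) of 1,000 students, invented for this example
population = [random.randint(0, 100) for _ in range(1000)]

# Parameter: a numerical feature of the whole population (here, its mean)
parameter = statistics.mean(population)

# Sample: a smaller group of 50 students drawn at random from the population
sample = random.sample(population, 50)

# Statistic: the same feature computed from the sample only
statistic = statistics.mean(sample)

print("Population mean (parameter):", round(parameter, 2))
print("Sample mean (statistic):", round(statistic, 2))
```

The sample mean will usually be close to the population mean without being exactly equal to it, which is why a statistic is described as an estimate of a parameter.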
Data science is the science of exploring available data and putting it to use in day-to-day life. It is a field
that works with and examines large amounts of data to produce meaningful information that can be used for making
decisions and solving problems. Data science draws on computation, statistics, data mining, analytics and
programming.
It all comes down to applying the proper statistical analysis techniques when gathering and processing data
samples to find patterns and trends. Five kinds of techniques are available for this analysis: (1) the mean, median
and relative frequency, (2) standard deviation, (3) regression, (4) hypothesis testing, and (5) sample size
determination.
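As a rough illustration of the first two of these techniques, the sketch below computes the mean, median, relative frequency, and standard deviation of a small list of marks using Python's built-in statistics module. The marks are invented for this example, and the population form of the standard deviation (pstdev) is an assumption made here; the sample form (stdev) would give a slightly larger value.

```python
import statistics
from collections import Counter

# Hypothetical marks of ten students, invented for this example
marks = [70, 85, 85, 90, 60, 70, 85, 95, 75, 85]

# Mean: the sum of all values divided by the number of values
mean_marks = statistics.mean(marks)

# Median: the middle value once the data is sorted
median_marks = statistics.median(marks)

# Standard deviation: how far the values typically lie from the mean
# (population form; statistics.stdev would give the sample form)
std_dev = statistics.pstdev(marks)

# Relative frequency: the share of observations that take each value
counts = Counter(marks)
relative_frequency = {value: count / len(marks) for value, count in counts.items()}

print("Mean:", mean_marks)
print("Median:", median_marks)
print("Standard deviation:", round(std_dev, 2))
print("Relative frequency of 85:", relative_frequency[85])
```

Here four of the ten students scored 85, so the relative frequency of 85 is 4/10. The relative frequencies of all the values always add up to 1.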
In this chapter, you will learn some statistical concepts, including what datasets and their subsets are and
what the mean, median, and relative frequency are. We will also look at how these are used in the context of data
science.
1.1. WHAT ARE SUBSETS?
In the real world, we frequently run into situations where a lot of data is available to us when we think
about an issue. However, we do not always need to take the complete set of data into account for analysis. Instead,
we can choose a specific part of the data for our research rather than using the entire dataset. A smaller set of
data taken from a larger set of data in this way is known as a subset.
A set A is called a subset of another set B if every element of set A is also present in set B. In
other words, set A lies entirely inside set B.
[Figure: set A drawn inside set B]
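The definition can be checked with Python's built-in set type. The sets below are invented for this example; issubset and the <= operator both test whether every element of one set is present in another.

```python
# Hypothetical sets, invented for this example
B = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
C = {2, 7}

# A is a subset of B because every element of A is also in B
a_in_b = A.issubset(B)

# C is not a subset of B because 7 is not in B
c_in_b = C <= B

print("A is a subset of B:", a_in_b)
print("C is a subset of B:", c_in_b)
```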
If you are importing data into R or Python, you need to know about data frames. A data
frame is a two-dimensional data structure in which data is arranged in tabular form, i.e., in
rows and columns. The procedure of selecting a desired set of rows and columns from a
data frame is known as subsetting.
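A minimal sketch of subsetting in Python is given below. It assumes the pandas library, which is a common way to work with data frames in Python but is not named in the text above. The student records and column names are invented for this example.

```python
import pandas as pd

# Hypothetical data frame, invented for this example
students = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "section": ["A", "B", "A", "B"],
    "marks": [82, 67, 91, 74],
})

# Subset of columns: keep only the name and marks columns
names_and_marks = students[["name", "marks"]]

# Subset of rows: keep only the students who scored more than 80
high_scorers = students[students["marks"] > 80]

# Subset of rows and columns together: names and marks of section A
section_a_marks = students.loc[students["section"] == "A", ["name", "marks"]]

print(section_a_marks)
```

Each result is itself a smaller data frame, so it can be analysed in the same way as the original.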

