Page 115 - Data Science class 10

USE OF STATISTICS IN

                DATA SCIENCE


                                                                                       01








                       Learning Outcome



                     1.1.  What are Subsets?                            1.2.  Some Methods of Subsetting
                     1.3.  One-Way Frequency Table                      1.4.  Two-way frequency Table
                     1.5.  Central Tendency                             1.6.  Standard Deviation



            In earlier data science classes, you learnt how important data is to our daily lives. You also learnt
            the importance of gathering, organising, analysing, and visualising data. Researchers produce data by two
            methods: the observational study and the randomised experiment. In both cases, the researcher examines a
            population, which is the group of experimental subjects or units about which the researcher wants to draw a
            conclusion.

            A statistic is a quantitative feature of a sample (for example, a sample mean or proportion) that is often
            used to estimate or test a population parameter. Population, sample, parameter, and statistic are four main
            terms in statistics.
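
            The four terms can be seen together in a short Python sketch. The data here is hypothetical (randomly
            generated heights of 1,000 students), and the sample size of 50 is an assumption chosen only for
            illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (in cm) of 1,000 students.
population = [random.gauss(150, 10) for _ in range(1000)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Sample: a smaller group drawn at random from the population.
sample = random.sample(population, 50)

# Statistic: a number computed from the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

            The two printed values will usually be close but not equal, which is why a statistic is described as an
            estimate of the parameter.
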
            Data science is the science of exploring available data and using it in your day-to-day transactions. It is a
            field that works with and examines large amounts of data to produce meaningful information that can be used for
            making decisions and solving problems. Data science involves work in computation, statistics, data mining,
            analytics and programming.

            It all comes down to applying the proper statistical analysis techniques when gathering and processing data
            samples to find patterns and trends. The mean, median, relative frequency, standard deviation, regression,
            hypothesis testing, and sample size determination are among the options available for this analysis.
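
            As a rough illustration, a few of these measures can be computed with Python's standard library. The
            marks below are hypothetical, and the population form of the standard deviation (pstdev) is an assumed
            choice for this sketch.

```python
import statistics
from collections import Counter

# Hypothetical data: marks scored by ten students in a test.
marks = [7, 8, 8, 9, 6, 8, 7, 10, 9, 8]

mean_marks = statistics.mean(marks)      # arithmetic average
median_marks = statistics.median(marks)  # middle value of the sorted data
std_dev = statistics.pstdev(marks)       # population standard deviation

# Relative frequency: the share of all observations that take each value.
counts = Counter(marks)
relative_frequency = {
    value: count / len(marks) for value, count in sorted(counts.items())
}

print("Mean:", mean_marks)
print("Median:", median_marks)
print("Standard deviation:", round(std_dev, 2))
print("Relative frequency:", relative_frequency)
```
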
            In this chapter, you will learn some statistical concepts, including what datasets and their subsets are and
            what the mean, median, and relative frequency are. We will also look at how these are used in the context of
            data science.


            1.1. WHAT ARE SUBSETS?

            In the real world, we often have a large amount of data available to us when we think about a problem.
            However, we do not always need to take the complete set of data into account for analysis. We can choose a
            specific part of the data for our research rather than using the entire dataset. A small set of data
            separated from a larger set in this way is known as a subset.
            A set A is called a subset of another set B if all elements of the set A are present in the set B. In
            other words, set A is contained in set B.
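
            The definition can be checked with Python's built-in set type. The sets A, B and C below are
            hypothetical examples.

```python
# Hypothetical sets used only to illustrate the definition.
A = {2, 4}
B = {1, 2, 3, 4, 5}
C = {4, 6}

# Every element of A is also in B, so A is a subset of B.
print(A.issubset(B))  # True

# The <= operator performs the same check.
print(A <= B)  # True

# 6 is in C but not in B, so C is not a subset of B.
print(C.issubset(B))  # False
```
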
            If you are importing data into R or Python, then you must know about Data Frames. A Data Frame is a
            two-dimensional data structure in which data is arranged in tabular form, i.e., in rows and columns. The
            procedure of selecting a set of desired rows and columns from a data frame is known as subsetting.
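
            A minimal sketch of subsetting in Python is shown here. It assumes the pandas library is installed, and
            the data frame and its column names (name, class, marks) are hypothetical.

```python
import pandas as pd

# Hypothetical data frame: one row per student.
students = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "class": [10, 10, 9, 10],
    "marks": [82, 67, 91, 74],
})

# Subset of columns: keep only the name and marks columns.
names_and_marks = students[["name", "marks"]]

# Subset of rows: keep only students who scored more than 70.
high_scorers = students[students["marks"] > 70]

# Subset of rows and columns together, using .loc.
class10_marks = students.loc[students["class"] == 10, ["name", "marks"]]

print(names_and_marks)
print(high_scorers)
print(class10_marks)
```

            Each result is itself a data frame, so it can be analysed in the same way as the original.
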


