Page 171 - Data Science class 10

            2.  Selection Error: A selection error occurs when participants select themselves into the survey, so that only
               those who are interested in the survey respond to the questions. Researchers can attempt to overcome
               selection error by finding ways to encourage participation.
            3.  Sample Frame Error: A sample frame error occurs when a sample is selected from the wrong population data.
            4.  Non-response Error: A non-response error occurs when a useful response is not obtained from the surveys
               because researchers were unable to contact potential respondents (or potential respondents refused to
               respond).

            3.5.2. Eliminating Sampling Errors

            Sampling errors can be reduced by increasing the sample size. As the sample size increases, the sample becomes
            more representative of the actual population, which decreases the potential for deviations from it. For example,
            the average of a sample of 10 varies more from sample to sample than the average of a sample of 100.
            Additionally, steps can be taken to help ensure that the sample accurately reflects the overall population.
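
            The effect of sample size can be checked with a short simulation. The sketch below is only an illustration: the
            population is a hypothetical list of made-up values, and the helper name `spread_of_sample_means`, the seeds,
            and the number of repeats are assumptions chosen for the example. It repeatedly draws samples of 10 and of 100
            and compares how much the sample averages vary.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats=2000, seed=0):
    """Return the standard deviation of the means of many random samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# Hypothetical population: 10,000 made-up values, not real survey data.
rng = random.Random(42)
population = [rng.uniform(0, 100) for _ in range(10_000)]

spread_10 = spread_of_sample_means(population, 10)
spread_100 = spread_of_sample_means(population, 100)

print(f"Spread of sample means, n = 10:  {spread_10:.2f}")
print(f"Spread of sample means, n = 100: {spread_100:.2f}")
```

            When run, the spread for samples of 100 should come out clearly smaller than the spread for samples of 10,
            which is the behaviour described above.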

            Researchers might also attempt to reduce sampling errors by replicating their study. This might be achieved by
            repeatedly taking the same measurements, using more than one subject or multiple groups, or by undertaking
            multiple studies.

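            Random sampling is an additional way to minimize the occurrence of sampling errors. In simple random sampling,
            every member of the population has an equal chance of being selected, which keeps the researcher's own
            preferences from influencing the sample. A related method, systematic sampling, selects members at a fixed
            interval. For example, rather than choosing participants to be interviewed haphazardly, a researcher might
            choose those whose names appear 10th, 20th, 30th, 40th, and so on, on the list.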
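
            The two ways of selecting participants mentioned above, drawing names purely at random and picking names at a
            fixed interval from a list, can be sketched in Python. This is a minimal illustration only: the list of names,
            the seed, and the interval of 10 are hypothetical values chosen for the example.

```python
import random

# Hypothetical list of 100 participant names (placeholders, not real data).
names = [f"Person {i}" for i in range(1, 101)]

# Simple random sampling: every name has the same chance of being chosen.
rng = random.Random(7)  # fixed seed so the run is repeatable
random_sample = rng.sample(names, 10)

# Selection at a fixed interval: pick a random start within the first
# interval, then take every 10th name after it.
interval = 10
start = rng.randrange(interval)
systematic_sample = names[start::interval]

print("Random sample:    ", random_sample)
print("Every 10th name:  ", systematic_sample)
```

            Both approaches return 10 names here. The random sample has no visible pattern, while the interval-based sample
            is evenly spaced through the list.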


            3.6. THE CENTRAL LIMIT THEOREM

            The Central Limit Theorem asserts that, regardless of how the population distribution is shaped, the distribution
            of sample means tends to resemble a normal distribution as the sample size increases. More precisely, given a
            sufficiently large sample size from a population with finite variance, the means of samples taken from the same
            population will be approximately normally distributed, and the mean of these sample means will be nearly equal
            to the population mean. This holds true regardless of whether the source population is normal or skewed,
            provided that the sample size is sufficiently large.
            Let us now understand the Central Limit Theorem with the help of an example.

            Consider that there are 50 houses in your area, and each house has 5 people. Our task is to calculate the average
            weight of the people in your area. The usual approach that most people follow is:

            1.  Measure the weights of all the people in your area.
            2.  Add all the weights.
            3.  Divide the total sum of the weights by the total number of people to calculate the average.

            However, the question here is: what if the size of the data is enormous? Does this way of calculating the
            average still make sense? Of course, the answer is no. It would take a long time and be quite exhausting to
            weigh everyone. As a workaround, there is an alternative approach that we can take.

            1.  Draw groups of individuals at random from your area. We will refer to each group as a sample. We will draw
               multiple samples in this case, each consisting of 30 people.
            2.  Calculate the mean of each sample.
            3.  Calculate the mean of these sample means.
            4.  In addition, a histogram of the sample mean weights will resemble a normal distribution.

            This is what the Central Limit Theorem is all about. Now let us move ahead and understand what the formula for
            the Central Limit Theorem is. The most used formula is:
            $$\mu_{\bar{x}} = \mu$$
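
            The sampling procedure from the example can be tried out with a small simulation, which compares the mean of
            the sample means with the population mean. This is a sketch under stated assumptions, not a prescribed
            method: the 250 weights are made-up and deliberately skewed, and the seed, the number of samples (1000),
            and the 10-bin text histogram are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical weights (kg) for 50 houses x 5 people = 250 people.
# The values are made up and deliberately skewed, not real measurements.
population = [40 + rng.expovariate(1 / 20) for _ in range(250)]
population_mean = statistics.mean(population)

# Steps 1 and 2: draw many random samples of 30 people and take each mean.
sample_means = [
    statistics.mean(rng.sample(population, 30))
    for _ in range(1000)
]

# Step 3: calculate the mean of the sample means.
mean_of_sample_means = statistics.mean(sample_means)

print(f"Population mean:      {population_mean:.2f} kg")
print(f"Mean of sample means: {mean_of_sample_means:.2f} kg")

# Step 4: a rough text histogram of the sample means.
low, high = min(sample_means), max(sample_means)
bins = 10
width = (high - low) / bins
counts = [0] * bins
for m in sample_means:
    # Clamp the largest value into the last bin.
    index = min(int((m - low) / width), bins - 1)
    counts[index] += 1
for i, count in enumerate(counts):
    print(f"{low + i * width:6.1f} | {'#' * (count // 5)}")
```

            The two printed means should be close to each other, and the histogram of sample means should look roughly
            bell-shaped even though the individual weights are skewed.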
