Page 168 - Data Science class 10

We naturally tend to base decisions on information that is already available to us, or on things we hear about often,
        without looking at alternatives that might be useful. We thereby confine ourselves to a relatively narrow subset of
        information.

        How can you recognise whether the data you are going to use is biased?
        If you notice the following, the source may be biased:
        1.  Heavily opinionated or one-sided
        2.  Relies on unsupported or unsubstantiated claims
        3.  Presents highly selected facts that lean towards a certain outcome
        4.  Pretends to present facts, but offers only opinion
        5.  Uses extreme or inappropriate language
        6.  Comes from an organisation tied to a particular religion, caste, race, creed or political party


        3.4. PROBABILITY FOR STATISTICS
        Probability is about measuring randomness. It is the foundation on which statistical predictions are made. We can
        use probability to predict how likely or unlikely particular events are. We can also, if needed, make informal
        predictions beyond the scope of the data we have analysed. In statistics, probability is a very important tool.
        Two problems, and the nature of their solutions, will illustrate the difference.

        When a die is thrown, the possible outcomes are 1, 2, 3, 4, 5 and 6.
        The sample space will be S = {1,2,3,4,5,6}.
        The probability of an event E is given by the formula

        P(E) = (Number of favourable outcomes) / (Total number of outcomes)
        Here, the total number of outcomes is the number of elements in the sample space. A prime number is a number
        greater than 1 whose only factors are 1 and the number itself. To determine the chance of rolling a prime number,
        we count the prime numbers in the sample space and substitute the values into the formula for probability.
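
The die example can be worked through in a few lines of Python. This is only a minimal sketch: the helper names `is_prime` and `probability` are made up for illustration, and the standard `fractions` module is used so that the answer stays an exact fraction.

```python
from fractions import Fraction


def is_prime(n):
    """Return True if n is greater than 1 and its only factors are 1 and n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def probability(event, sample_space):
    """P(E) = number of favourable outcomes / total number of outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


# Sample space for one throw of a die.
sample_space = [1, 2, 3, 4, 5, 6]

primes = [outcome for outcome in sample_space if is_prime(outcome)]
p_prime = probability(is_prime, sample_space)

print(primes)   # [2, 3, 5]
print(p_prime)  # 1/2
```

Three of the six outcomes (2, 3 and 5) are prime, so P(prime) = 3/6 = 1/2.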

        3.4.1. Populations, Samples, Parameters, and Statistics
        The field of inferential statistics enables you to make educated guesses about the numerical characteristics of large
        groups. Following the logic of sampling, you can test generalisations about these groups using only a small sample
        of their members.
        Often, researchers want to know things about populations but do not have data for every person or thing in the
        population. If a business's customer service department wants to find out whether its clients are happy, it
        would not be practical (or perhaps even possible) to contact every individual who purchased a product. Instead,
        the company might select a sample of the population.
        A sample is a group of people or things chosen, without bias, to represent the entire population. In order to use
        statistics to learn things about the population, the sample must be random. In random sampling, every member
        of the population has an equal probability of being chosen. The most commonly used sample is a simple random
        sample. It requires that every possible sample of the selected size has an equal chance of being selected.
        A parameter is a characteristic of a population. A statistic is a characteristic of a sample. Inferential statistics let
        you make an informed prediction about a population parameter based on a statistic computed from a sample
        randomly drawn from that population.
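
The difference between a parameter and a statistic can be seen in a small simulation. The sketch below is illustrative only: the population of 10,000 customer satisfaction scores is invented, and the sample size of 200 and the random seed are arbitrary assumptions. It uses Python's standard `random` and `statistics` modules.

```python
import random
from statistics import mean

# Fixed seed (an arbitrary choice) so that the run is repeatable.
rng = random.Random(42)

# Hypothetical population: satisfaction scores from 1 to 10 for 10,000 customers.
population = [rng.randint(1, 10) for _ in range(10_000)]

# Parameter: a characteristic computed from every member of the population.
population_mean = mean(population)

# Simple random sample: every possible group of 200 customers is equally likely.
sample = rng.sample(population, 200)

# Statistic: the same characteristic computed from the sample only.
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. Inferential statistics is concerned with how much trust to place in that estimate.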



