Page 166 - Data Science class 10

Putting all vehicle classes together, the trend is nearly linear for engine sizes under 4.5 litres, but beyond that point the
        relationship between fuel efficiency and engine size becomes (unexpectedly) nonlinear. Many real-world business
        processes are nonlinear. For example, time series (such as sales or demand over time) are usually cyclical. This
        nonlinearity can have serious consequences or tangible benefits, depending upon the context. Linearity bias, the
        tendency to assume that a relationship is linear, is the main reason stakeholders are so often surprised by real data.
        Even experienced experts in a field may assume a business process is linear (such as units sold versus discount
        rate) when in fact it is not. So getting it right can provide an advantage.
        Making a graph is the easiest (and most effective) way to demonstrate the impacts of linearity bias. The visual
        results may surprise those whose intuition says the quantities should be linearly related, but they are hard to
        argue with: the graph is evidence of the real, underlying relationship found in the data.
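
        The same check can be made numerically. The following is a minimal sketch, not a method from this book: the
        discount and sales figures are invented purely for illustration, and fit_line is a hypothetical helper name.
        It fits a straight line to the data and prints what is left over (the residuals). If the relationship were
        really linear, the residuals would scatter randomly around zero; a systematic pattern suggests a curve.

```python
# Hypothetical (discount %, units sold) pairs, invented for illustration only.
discounts = [0, 5, 10, 15, 20, 25, 30]
units = [100, 104, 110, 121, 140, 172, 225]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(discounts, units)

# Residual = actual value minus the value predicted by the straight line.
residuals = [y - (slope * x + intercept) for x, y in zip(discounts, units)]

for x, y, r in zip(discounts, units, residuals):
    print(f"discount={x:>2}%  units={y:>3}  residual={r:+.1f}")
```

        With this made-up data, the residuals are positive at both ends and negative in the middle, which is the
        signature of a curved relationship that a straight line cannot capture. Plotting the points would show the
        same thing at a glance.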


        3.3.3. Confirmation Bias
        Our perceptions and beliefs have a direct impact on how we analyse data. This results in a phenomenon known as
        confirmation bias, which can distort the facts. Confirmation bias does not arise from a lack of available data.
        It is a phenomenon wherein data scientists or analysts tend to lean towards data that is in alignment with their
        beliefs, views, and opinions.
        While filtering information, they often focus on the facts that support their proposal or hypothesis; the
        moment they discover information that even marginally refutes it, they turn away from it. Information that
        doesn't fit a data scientist's predefined view tends to be discarded.
        It is important to take in new data with an open mind. Confirmation bias is increasingly common among authoritative
        organisations that want to assign importance to their own perceptions. It frequently results in poor business
        outcomes, which is why you should pay special attention to disconfirming evidence.





           Just because two variables move in tandem doesn’t necessarily mean that one causes the other. This principle has
           been hilariously demonstrated by numerous examples.

        An old joke says that if you torture the data long enough, it will confess. With enough work, you can distort data
        to make it say what you want it to say.
        Everybody has beliefs, and that's okay. It all comes with being a person. What’s not OK, though, is when we let
        those beliefs inadvertently creep into the way we form our hypotheses.
        For example, looking at fire department data, you might notice that the more firefighters are dispatched to a fire,
        the more damage is ultimately done to the property. You might conclude that sending more firefighters causes
        greater damage. In reality, larger fires both call for more firefighters and do more damage.

        In another famous example, an academic who was investigating the cause of crime in New York City in the 1980s
        found a strong correlation between the number of serious crimes committed and the amount of ice cream sold
        by street vendors. But should we conclude that eating ice cream drives people to crime? Because this makes little
        sense, we should naturally suspect that an unobserved variable is causing both. Crime rates are highest during the
        summer, and this is also when most ice cream is sold. Ice cream sales don’t cause crime, nor does crime increase
        ice cream sales.
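
        The ice cream example can be imitated with a small simulation. This is a sketch under stated assumptions:
        all numbers are randomly generated rather than real New York data, and the names temperature, ice_cream
        and crimes are illustrative. Temperature drives both quantities; neither affects the other. The code then
        compares the raw correlation with the partial correlation, a standard statistic (not introduced in this
        chapter) that measures what correlation remains once a third variable is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)  # fixed seed so the simulated data is reproducible

# Simulated daily data (assumption: one year, temperatures in degrees Celsius).
temperature = [random.uniform(0, 35) for _ in range(365)]

# Both quantities depend on temperature plus independent random noise.
# Neither one appears in the formula for the other.
ice_cream = [20 * t + random.gauss(0, 50) for t in temperature]
crimes = [3 * t + random.gauss(0, 10) for t in temperature]

r_raw = pearson(ice_cream, crimes)
r_ice_temp = pearson(ice_cream, temperature)
r_crime_temp = pearson(crimes, temperature)

# Partial correlation of ice cream and crime, controlling for temperature.
r_partial = (r_raw - r_ice_temp * r_crime_temp) / math.sqrt(
    (1 - r_ice_temp ** 2) * (1 - r_crime_temp ** 2)
)

print(f"raw correlation:                      {r_raw:.2f}")
print(f"correlation after controlling for temperature: {r_partial:.2f}")
```

        The raw correlation comes out strongly positive even though the simulation contains no causal link between
        ice cream and crime. Once temperature is controlled for, the correlation falls close to zero.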
        Recommendation to Overcome: One way to fight this bias is to critically examine all your beliefs and try to find
        disconfirming evidence for each of your theories. By that, I mean actively seeking out evidence by going to places
        where you don’t normally go, talking to people you don’t normally talk to, and generally keeping an open mind.

        3.3.4. Recall Bias

        Recall bias is a type of measurement bias. It frequently occurs during the data labelling phase of a project.
        This form of bias arises when you classify comparable data inconsistently, which results in lower accuracy.
        For example, let us say we have a team labelling images of damaged laptops. Each image is tagged with one of
        three labels: damaged, partially damaged, or undamaged. Now, if one team member marks an image as damaged and
        another marks an identical image as partially damaged, your data will be inconsistent.
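
        One way to detect this kind of inconsistency is to have two team members label the same images and compare
        their answers. The sketch below is illustrative only: the label lists are invented, and Cohen's kappa is a
        standard agreement statistic that this chapter does not itself introduce. Kappa corrects the simple
        agreement rate for the agreement that would be expected by chance.

```python
from collections import Counter

# Hypothetical labels given by two team members to the same six images.
labels_a = ["damaged", "damaged", "partially damaged",
            "undamaged", "damaged", "undamaged"]
labels_b = ["damaged", "partially damaged", "partially damaged",
            "undamaged", "damaged", "undamaged"]

n = len(labels_a)

# Indices of the images on which the two labellers disagree.
disagreements = [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]

# Observed agreement: fraction of images given the same label by both.
observed = 1 - len(disagreements) / n

# Chance agreement: probability that both pick the same label at random,
# given how often each labeller uses each label.
counts_a = Counter(labels_a)
counts_b = Counter(labels_b)
expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

# Cohen's kappa: 1 means perfect agreement, 0 means no better than chance.
kappa = (observed - expected) / (1 - expected)

print(f"images with conflicting labels: {disagreements}")
print(f"observed agreement: {observed:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

        Images that appear in the disagreement list are good candidates for review, and a low kappa suggests the
        labelling guidelines need to be made clearer before more data is labelled.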

          164   Touchpad Data Science-X