Page 164 - Data Science class 10
Bias is the tendency to favour one idea over another, often while ignoring competing ideas.
Activity 1
Look up the words ‘prejudice’ and ‘bias’ in a dictionary or search engine and write their definitions.
Here are some examples of prejudice and bias.
The report blames most crime in the town on teenagers, without any evidence, as the writer is prejudiced against
young people.
Somebody’s aunt is biased towards dogs that are black, like her own, and she is always more friendly to them than
to other dogs.
An article biased towards riding a motorcycle would show only the favourable facts, such as good gas mileage, fun, and agility.
3.2.4. Bias in Data Science Applications
Bias in data science is a deviation from the expected result in the data. In simple terms, bias can be thought of as
an error in the data. However, this error is often subtle and goes unnoticed. So, the question to be asked
is 'Why does bias occur in the first place?'.
Bias mainly arises from sampling and estimation. If we had complete knowledge of every entity in our
database and had information on every potential entity, our data would never have any bias. However, data science
is often not conducted under carefully controlled conditions. It is mostly done on “found data”, i.e., information
gathered without regard to modelling. That is why such data is very likely to contain biases. The next question
that may arise in your mind is 'Why does bias really matter?'. The answer is that predictive models
consider only the data that is used to train them. In fact, they know no reality other than the data that is fed
into them. Naturally, model accuracy and fidelity suffer if biased data is fed into the system. Biased
models can also discriminate against certain groups of people. Therefore, it is very important to eliminate
bias as far as possible to avoid these risks.
The data used to train machine learning models is a major source of these biases. In fact, most large
datasets generated by systems powered by ML/AI-based models are known to contain some bias.
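To see why “found data” can mislead, consider the minimal Python sketch below. It is only an illustration under stated assumptions: the population of students, the study hours, and the library survey are all invented for this example. It compares an estimate from a random sample with an estimate from data that was simply found in one convenient place.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result every run

# Hypothetical population: daily study hours of 10,000 students.
# About 30% visit the library regularly and tend to study longer.
population = []
for _ in range(10_000):
    visits_library = random.random() < 0.3
    typical_hours = 4.0 if visits_library else 2.0
    hours = max(0.0, random.gauss(typical_hours, 1.0))
    population.append((visits_library, hours))

true_mean = statistics.mean(hours for _, hours in population)

# Random sample: every student has the same chance of being picked.
random_sample = random.sample(population, 500)
random_mean = statistics.mean(hours for _, hours in random_sample)

# "Found data": a survey left at the library desk,
# so only library visitors ever answer it.
found_data = [row for row in population if row[0]][:500]
found_mean = statistics.mean(hours for _, hours in found_data)

random_error = abs(random_mean - true_mean)
found_error = abs(found_mean - true_mean)

print(f"True mean study hours : {true_mean:.2f}")
print(f"Random sample estimate: {random_mean:.2f} (error {random_error:.2f})")
print(f"Found data estimate   : {found_mean:.2f} (error {found_error:.2f})")
```

Both samples have the same size, yet the found data overestimates the average because the students who never visit the library had no chance of being included.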
3.3. HOW TO IDENTIFY THE PARTIALITY, PREFERENCE AND PREJUDICE?
For better results and sound practice, our methods should be as free from bias as possible.
Common statistical and cognitive biases can be grouped into the following categories:
1. Selection Bias
2. Linearity Bias
3. Confirmation Bias
4. Recall Bias
5. Survivor Bias
6. Availability Bias
3.3.1. Selection Bias
Sample selection bias is a type of bias caused by choosing non-random data for statistical analysis. The bias
arises when a section of the data is systematically excluded because of a certain
trait. The exclusion of this subset can influence the statistical significance of a test, and it can bias the estimates
of the parameters of the statistical model.
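The effect of excluding a subset on a model parameter can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the revision hours, the exam scores, the pass mark of 50, and the helper `least_squares_slope` are hypothetical. The sketch fits a straight line to all the records and then to only the records of students who passed.

```python
import random

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical data: each extra hour of revision adds about 5 marks on average.
hours = [random.uniform(0, 10) for _ in range(5000)]
scores = [30 + 5 * h + random.gauss(0, 15) for h in hours]

# Slope estimated from every record.
full_slope = least_squares_slope(hours, scores)

# Selection bias: only students who scored at least 50 were recorded,
# so low scorers are systematically excluded.
kept = [(h, s) for h, s in zip(hours, scores) if s >= 50]
kept_hours = [h for h, _ in kept]
kept_scores = [s for _, s in kept]
selected_slope = least_squares_slope(kept_hours, kept_scores)

print(f"Records kept after selection: {len(kept)} of {len(hours)}")
print(f"Slope from all records      : {full_slope:.2f}")
print(f"Slope from selected records : {selected_slope:.2f}")
```

Because students with little revision appear in the selected data only when they scored unusually well, the fitted line becomes flatter and the benefit of revision is underestimated.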
Continuing to rely on such a sample introduces bias into the results. However, selection bias can be mitigated with the help
of various strategies. When the data sample is created, the sampling strategy should be documented, and any
constraints of the procedure should be clearly stated. This documentation makes it easier to spot possible
selection bias after the model has been developed and used.
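One possible way to keep such documentation alongside the data is sketched below in Python. It is a minimal sketch, not a standard method: the `SamplingRecord` class, the helper functions, the `area` field, and the 10% tolerance are hypothetical names and values chosen for illustration. The sketch records how a sample was drawn and flags groups whose share in the sample differs noticeably from their share in the population.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SamplingRecord:
    """Hypothetical record describing how a sample was drawn."""
    source: str
    method: str
    known_constraints: list = field(default_factory=list)

def group_shares(rows, key):
    """Fraction of rows that fall into each group of the given field."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def flag_skewed_groups(population, sample, key, tolerance=0.10):
    """Groups whose sample share differs from the population share by more than tolerance."""
    population_shares = group_shares(population, key)
    sample_shares = group_shares(sample, key)
    flagged = {}
    for group, population_share in population_shares.items():
        gap = sample_shares.get(group, 0.0) - population_share
        if abs(gap) > tolerance:
            flagged[group] = round(gap, 2)
    return flagged

# Illustrative data: 60% of the population is urban, but 90% of the sample is.
population = [{"area": "urban"}] * 60 + [{"area": "rural"}] * 40
sample = [{"area": "urban"}] * 18 + [{"area": "rural"}] * 2

record = SamplingRecord(
    source="online form",
    method="voluntary responses",
    known_constraints=["requires internet access"],
)
flagged = flag_skewed_groups(population, sample, "area")

print(record)
print("Groups over- or under-represented:", flagged)
```

Here the record states that the survey needed internet access, and the check shows that rural respondents are under-represented, which points to a likely source of selection bias.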

