Page 188 - Touchpad AI
10. What does the sns.boxplot() function do in the seaborn library?
Ans. The sns.boxplot() function creates a box plot to show the distribution and outliers in the data.
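For instance, sns.boxplot(data=df, y="marks") would draw a box plot of a "marks" column. The statistics behind the box (median, quartiles, and outlier fences) can also be computed directly, as in this minimal sketch using hypothetical student marks:

```python
import numpy as np

# Hypothetical marks of ten students, with one extreme value (98)
marks = np.array([45, 52, 55, 58, 60, 61, 63, 65, 70, 98])

# Quartiles and interquartile range (IQR) -- the "box" of a box plot
q1, median, q3 = np.percentile(marks, [25, 50, 75])
iqr = q3 - q1

# Conventional outlier fences: 1.5 * IQR beyond each quartile
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = marks[(marks < lower_fence) | (marks > upper_fence)]

print(median)    # 60.5
print(outliers)  # [98]
```

The mark of 98 lies above the upper fence, so a box plot would draw it as a separate point beyond the whisker, which is exactly how outliers become visible at a glance.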
B. Long answer type questions.
1. Explain with an example how data visualization makes complex data easy to understand.
Ans. Imagine trying to understand rainfall patterns across different states in India over the last 20 years using
only a table filled with numbers. It would take hours to figure out which states receive heavy rainfall and
which face droughts. Now, imagine the same data shown on a color-coded map of India—dark blue for
high rainfall and yellow for low. Within seconds, the pattern becomes clear. That’s the power of data
visualization.
2. How did the government use data visualization during the COVID-19 pandemic?
Ans. During the pandemic in India, government dashboards used graphs and maps to show real-time
updates on vaccination rates across states. Without such visuals, it would have been very difficult for
both citizens and authorities to track updates and make timely decisions. That is why data visualization
is so important today.
3. What are some common steps involved in cleaning and preparing data?
Ans. The following steps are taken to clean/prepare the data:
• Missing Data: Missing data refers to the absence of certain values in the dataset, which can result
from various causes. To handle missing data, strategies include removing rows or columns with missing
values, imputing missing values with estimates, or utilising algorithms that can manage missing data.
• Outliers (extreme values): Outliers are data points that deviate significantly from most of the dataset,
typically due to errors or uncommon occurrences. Managing outliers includes detecting and excluding
them, transforming the data, or applying robust statistical techniques to minimise their influence.
• Inconsistent Data: Inconsistent data, such as typographical errors or variations in data types, is
rectified to ensure uniformity and coherence across the dataset.
• Duplicate Data: Duplicate data is identified and eliminated to maintain data integrity and accuracy.
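The four steps above can be sketched in pandas on a small hypothetical dataset (column names and values are illustrative only):

```python
import pandas as pd
import numpy as np

# Hypothetical dataset showing all four problems described above
df = pd.DataFrame({
    "name":  ["Asha", "Ravi", "Ravi", "Meena", "Karan"],
    "marks": [78, np.nan, np.nan, 85, 999],      # a missing value and an outlier
    "city":  ["delhi", "Delhi ", "Delhi ", "Mumbai", "Delhi"],  # inconsistent text
})

df = df.drop_duplicates()                               # remove the duplicate row
df["marks"] = df["marks"].fillna(df["marks"].median())  # impute the missing mark
df["city"] = df["city"].str.strip().str.title()         # make text consistent
df = df[df["marks"] <= 100]                             # drop the obvious outlier

print(len(df))  # 3 clean rows remain
```

After cleaning, the duplicate "Ravi" row is gone, the missing mark is filled with the median, the city names are spelled uniformly, and the impossible mark of 999 has been removed.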
4. What are two widely used methods for dimensionality reduction?
Ans. The two widely used methods for dimensionality reduction are:
• Principal Component Analysis (PCA): It transforms data into a new coordinate system where the
largest variance lies along the first axis (principal component). It captures the most important patterns
in fewer dimensions.
• Linear Discriminant Analysis (LDA): It focuses on maximising class separability, commonly used in
classification tasks.
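The idea behind PCA can be sketched with NumPy alone (a minimal illustration on made-up data, not a production implementation; libraries such as scikit-learn provide PCA ready-made):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-feature data where two features are strongly correlated,
# so most of the variance really lives in fewer dimensions
x = rng.normal(size=100)
X = np.column_stack([x,
                     2 * x + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])

X_centred = X - X.mean(axis=0)            # centre each feature at zero
cov = np.cov(X_centred, rowvar=False)     # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)    # eigen-decomposition (ascending order)
order = np.argsort(eigvals)[::-1]         # largest variance first
components = eigvecs[:, order[:2]]        # keep the top 2 principal components

X_reduced = X_centred @ components        # project data onto the new axes
print(X_reduced.shape)  # (100, 2)
```

The first principal component points along the direction of greatest variance, so the 2-column projection preserves most of the spread of the original 3-column data.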
5. What are features in data science, and why is feature selection important?
Ans. In data science, the terms features, attributes, and variables are often used interchangeably to describe
the measurable properties of data. A feature represents one aspect of an observation. For example, in a
dataset about houses, features could include number of rooms, square footage, location, and price.
Features can be quantitative (numeric, like price or height) or qualitative (categorical, like colour or brand).
The choice of features determines the dimensionality and complexity of the dataset.
The selection of meaningful features is called feature selection, and it plays a crucial role in data analysis
and model performance. Irrelevant or redundant features increase dimensionality without adding useful
information, making computation slower and visualization harder.
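One very simple feature-selection idea, dropping features that never vary and so carry no information, can be sketched in pandas on a hypothetical house dataset (real projects would use richer criteria or a library such as scikit-learn):

```python
import pandas as pd

# Hypothetical house data; "country" is the same in every row,
# so it adds dimensionality without adding information
df = pd.DataFrame({
    "rooms":   [2, 3, 3, 4, 5],
    "sq_ft":   [800, 1100, 1150, 1600, 2100],
    "country": ["India"] * 5,
    "price":   [30, 45, 47, 70, 95],
})

# Keep only the columns that take more than one distinct value
selected = [col for col in df.columns if df[col].nunique() > 1]
df_selected = df[selected]

print(df_selected.columns.tolist())  # ['rooms', 'sq_ft', 'price']
```

The constant "country" column is discarded, leaving a smaller dataset that is faster to compute with and easier to visualize, which is exactly why feature selection matters.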
186 Touchpad Artificial Intelligence - XI

