Page 299 - Touhpad Ai
plt.ylabel('Number of Passengers')
plt.show()
Output:
Interquartile Range to Detect Outliers in Data
Outliers are data points that differ significantly from the rest of the dataset, and their presence can negatively impact the
analysis results. The Interquartile Range (IQR) is a method used to identify outliers by assessing the spread of data. In this
article, we will explore how it works.
Detecting Outliers with IQR
The Interquartile Range (IQR) is used to measure the spread or variability of a dataset by dividing it into quartiles. The
data is first arranged in ascending order and then split into four equal parts. The values Q1 (25th percentile), Q2 (50th
percentile or median), and Q3 (75th percentile) separate the dataset into these four parts.
For a dataset with 2n or 2n+1 data points:
- Q2 is the median of the entire dataset.
- Q1 is the median of the lowest n data points.
- Q3 is the median of the highest n data points.
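The quartile rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `quartiles` and the sample values are hypothetical and chosen for the example, not taken from the text.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    For 2n or 2n+1 points, Q1 is the median of the lowest n points and
    Q3 is the median of the highest n points (with an odd count, the
    middle value belongs to neither half).
    """
    ordered = sorted(values)       # arrange the data in ascending order
    n = len(ordered) // 2          # size of each half
    if n == 0:
        raise ValueError("need at least two data points")
    lower = ordered[:n]            # lowest n data points
    upper = ordered[-n:]           # highest n data points
    return median(lower), median(ordered), median(upper)

# Hypothetical sample with 2n = 10 data points
sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)  # 36 40.5 43
```

Library routines such as `numpy.percentile` interpolate between neighbouring values by default, so they may return slightly different quartiles from this rule on small datasets.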
The IQR is calculated as:
IQR = Q3 - Q1
Any data point that falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is considered an outlier.
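As a small worked example of the 1.5 × IQR rule, the sketch below computes the two cut-off values and separates the outliers from the remaining points. The helper name `iqr_bounds` and the data are assumptions made for illustration; Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def iqr_bounds(values):
    """Return (lower, upper) limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(values)
    n = len(ordered) // 2
    if n == 0:
        raise ValueError("need at least two data points")
    q1 = median(ordered[:n])       # median of the lowest n points
    q3 = median(ordered[-n:])      # median of the highest n points
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data: one value (102) lies far from the rest
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
low, high = iqr_bounds(data)

outliers = [x for x in data if x < low or x > high]   # points outside the limits
cleaned = [x for x in data if low <= x <= high]       # one way to handle them: drop

print(low, high)   # 9.0 17.0
print(outliers)    # [102]
```

Dropping the flagged points is only one option; depending on the analysis, outliers may instead be capped at the limits or examined individually before any decision is made.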
4. Write a Python program demonstrating how to identify outliers in a dataset and handle them using outlier detection
techniques.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = {
    'title': ['Inception', 'Titanic', 'Avatar', 'The Dark Knight', 'The Godfather',
              'The Shawshank Redemption', 'Pulp Fiction',
              'The Lord of the Rings: The Return of the King',

