Reboot
1. Fill in the blanks.
a. The two data structures that are supported by Pandas are ………………………. and ………………………. .
b. The ………………………. library in Python excels in creating N-dimensional data objects.
c. The statement to install NumPy is ………………………. .
d. You can check the shape of an array by using the ………………………. attribute in NumPy.
2. Answer the following questions:
a. What is a DataFrame in Pandas?
……………………….……………………….……………………….……………………….……………………..................….……………………….
b. Give one advantage of using NumPy arrays over lists.
……………………….……………………….……………………….……………………….……………………..................….……………………….
Understanding Missing Values
Understanding missing values in a DataFrame is crucial for data analysis, cleaning, and preprocessing. Missing values,
often denoted as NaN (Not a Number) or None, can occur for various reasons such as data entry errors, incomplete
data, or data transformation processes. For example, while reviewing a product online, some customers may not
provide feedback on every aspect of the product if they did not use all of its features. Dealing with missing values is
therefore an essential step in data cleaning and preprocessing.
The isnull() method in Pandas is used to detect missing values (NaN or None) within a DataFrame. It returns
a DataFrame of the same shape as the original DataFrame, where each element is a Boolean value: True if the
corresponding value is missing and False otherwise.
DataFrame df                    Result of df.isnull()
Employee Name   Salary          Employee Name   Salary
Rohit           50000           False           False
Pankaj          52000           False           False
Vivan           NaN             False           True
Chirag          53000           False           False
Sanjay          NaN             False           True
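The table above can be reproduced with a short sketch. The sample data mirrors the table; the variable names df and missing_mask are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data taken from the table above; np.nan marks a missing salary.
df = pd.DataFrame({
    "Employee Name": ["Rohit", "Pankaj", "Vivan", "Chirag", "Sanjay"],
    "Salary": [50000, 52000, np.nan, 53000, np.nan],
})

# isnull() returns a DataFrame of the same shape filled with True/False.
missing_mask = df.isnull()
print(missing_mask)
```

Only the Salary entries for Vivan and Sanjay come out as True, because those are the only missing values in the sample.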
You can count the number of missing values in each column or row.
• df.isnull().sum(): Returns the number of missing values in each column.
• df.isnull().sum(axis=1): Returns the number of missing values in each row.
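As a minimal sketch of both counts, assuming the same employee data as in the table (the names missing_per_column and missing_per_row are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data assumed from the employee table; np.nan marks a missing salary.
df = pd.DataFrame({
    "Employee Name": ["Rohit", "Pankaj", "Vivan", "Chirag", "Sanjay"],
    "Salary": [50000, 52000, np.nan, 53000, np.nan],
})

# Summing the Boolean mask counts the True values (True is treated as 1).
missing_per_column = df.isnull().sum()      # one count per column
missing_per_row = df.isnull().sum(axis=1)   # one count per row

print(missing_per_column)
print(missing_per_row)
```

With this data, the Salary column has two missing values and the Employee Name column has none.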
There are several ways to handle missing values in a DataFrame:
• Imputation: Imputation involves replacing missing values with a specific value. Common strategies include replacing
missing values with the mean, median, or mode of the column. This method helps in retaining the structure of the
dataset and avoids losing valuable information.
Pandas provides methods like fillna() to perform imputation. For example, you can fill the missing values in a
DataFrame df with the mean of each numeric column using df.fillna(df.mean(numeric_only=True)). Here,
numeric_only=True makes mean() skip text columns such as Employee Name.
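Imputation with fillna() can be sketched as follows, assuming the same employee data as in the table. The names filled_mean and filled_median are illustrative, and the median variant is shown only as one alternative strategy.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the employee table; np.nan marks a missing salary.
df = pd.DataFrame({
    "Employee Name": ["Rohit", "Pankaj", "Vivan", "Chirag", "Sanjay"],
    "Salary": [50000, 52000, np.nan, 53000, np.nan],
})

# Replace each missing value with the mean of its (numeric) column.
# numeric_only=True skips the text column "Employee Name".
filled_mean = df.fillna(df.mean(numeric_only=True))

# Alternative strategy: use the column median instead of the mean.
filled_median = df.fillna(df.median(numeric_only=True))

print(filled_mean)
print(filled_median)
```

fillna() returns a new DataFrame here, so df itself still contains the original NaN values unless the result is assigned back to it.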
218 Touchpad Artificial Intelligence (Ver. 3.0)-XI

