Page 49 - Informatics_Practices_Fliipbook_Class12
2.3.2 Skipping Columns
Sometimes, not all the columns in a CSV file are relevant to an application. In such cases, the keyword argument usecols
of read_csv() is used to specify the list of required columns. For example, suppose we wish to construct a DataFrame
comprising only the first column, Product (index 0), and the third column, Price (index 2). This can be achieved by
passing the list of required column indices via the keyword argument usecols, as shown below:
>>> import pandas as pd
>>> groceryDF = pd.read_csv('Grocery.csv', usecols=[0, 2])
>>> print(groceryDF)
output:
      Product  Price
0       Bread     20
1        Milk     60
2     Biscuit     20
3  Bourn-Vita     70
4        Soap     40
5       Brush     30
6   Detergent     80
7     Tissues     30
Alternatively, the list of columns (in the CSV file) to be retained in the DataFrame may be specified in the form of
column names, as shown below.
>>> import pandas as pd
>>> groceryDF = pd.read_csv('Grocery.csv', usecols=['Product', 'Price'])
>>> print(groceryDF)
output:
      Product  Price
0       Bread     20
1        Milk     60
2     Biscuit     20
3  Bourn-Vita     70
4        Soap     40
5       Brush     30
6   Detergent     80
7     Tissues     30
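The two forms of usecols can be compared side by side. The following is a minimal sketch, assuming an in-memory stand-in for Grocery.csv: the name SAMPLE_CSV is hypothetical, and it holds only the five rows whose complete values appear in this section. It reads the same data once by column position and once by column name, and then checks that the two DataFrames are identical.

```python
import io

import pandas as pd

# Hypothetical stand-in for Grocery.csv (assumption: only the rows whose
# full values are shown in this section, not the complete file).
SAMPLE_CSV = """Product,Category,Price,Quantity
Bread,Food,20,2
Biscuit,Food,20,2
Brush,Hygiene,30,2
Detergent,Household,80,1
Tissues,Hygiene,30,5
"""

# Select the required columns by position (0 -> Product, 2 -> Price).
by_position = pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=[0, 2])

# Select the same columns by name.
by_name = pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=['Product', 'Price'])

# Both calls should yield the same two-column DataFrame.
same_result = by_position.equals(by_name)
print(same_result)
print(list(by_name.columns))
```

Using column names is usually the safer choice when the order of columns in the file may change, whereas positions are handy when the file has no reliable header.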
Sometimes, a CSV file may contain rows that do not carry relevant information. Such rows should not form part of the
Pandas DataFrame. For this purpose, the keyword argument skiprows is used to specify the list of rows to be skipped.
For example, to skip the second, fourth, and fifth data rows (rows 2, 4, and 5 of the file, counting the header as
row 0) while reading the CSV file into a DataFrame, we specify skiprows=[2, 4, 5], as shown below:
>>> import pandas as pd
>>> groceryDF = pd.read_csv('Grocery.csv', skiprows=[2, 4, 5])
>>> print(groceryDF)
output:
     Product   Category  Price  Quantity
0      Bread       Food     20         2
1    Biscuit       Food     20         2
2      Brush    Hygiene     30         2
3  Detergent  Household     80         1
4    Tissues    Hygiene     30         5
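The way skiprows counts lines can be checked on a small example. The following is a minimal sketch, assuming an in-memory stand-in for Grocery.csv: the name SAMPLE_CSV is hypothetical, and it contains only five data rows taken from the output above. It illustrates that the numbers in skiprows are line positions in the file, with the header line counted as 0, and that skipping line 0 causes the next line to be treated as the header.

```python
import io

import pandas as pd

# Hypothetical stand-in for Grocery.csv (assumption: a five-row sample).
# File line 0 is the header; lines 1 to 5 are the data rows.
SAMPLE_CSV = """Product,Category,Price,Quantity
Bread,Food,20,2
Biscuit,Food,20,2
Brush,Hygiene,30,2
Detergent,Household,80,1
Tissues,Hygiene,30,5
"""

# Skip file lines 2 and 4 (Biscuit and Detergent in this sample).
skipped = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=[2, 4])
remaining = list(skipped['Product'])
print(remaining)

# Skipping line 0 drops the header, so the first data row supplies
# the column names instead.
no_header = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=[0])
print(list(no_header.columns))
```

Note that the resulting DataFrame is re-indexed from 0, so its row labels no longer match the line positions in the file.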
Note that row numbering starts at 0, so row 0 refers to the first row of the file. In the file Grocery.csv, the first
row is the header row, which comprises the names of the columns. The content of the file Grocery.csv is shown below:
Product,Category,Price,Quantity
Bread,Food,20,2

