Page 49 - Informatics_Practices_Fliipbook_Class12

2.3.2 Skipping Columns

            Sometimes, not all the columns in a CSV file are relevant for an application. In such cases, the keyword argument
            usecols is used to specify the list of required columns. For example, suppose we wish to construct a DataFrame
            comprising only the columns Product (column at index 0) and Price (column at index 2). This can be achieved by
            passing the list of required column indexes to usecols, as shown below:
             >>> import pandas as pd
             >>> groceryDF = pd.read_csv('Grocery.csv', usecols=[0, 2])
             >>> print(groceryDF)
            output:
                       Product  Price
                 0       Bread     20
                 1        Milk     60
                 2     Biscuit     20
                 3  Bourn-Vita     70
                 4        Soap     40
                 5       Brush     30
                 6   Detergent     80
                 7     Tissues     30

            Alternatively, the list of columns (in the CSV file) to be retained in the DataFrame may be specified in the form of
            column names, as shown below.
             >>> import pandas as pd
             >>> groceryDF = pd.read_csv('Grocery.csv', usecols=['Product', 'Price'])
             >>> print(groceryDF)
            output:
                       Product  Price
                 0       Bread     20
                 1        Milk     60
                 2     Biscuit     20
                 3  Bourn-Vita     70
                 4        Soap     40
                 5       Brush     30
                 6   Detergent     80
                 7     Tissues     30
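
            The following sketch checks that the two forms of usecols select the same columns. Because Grocery.csv itself
            is not available here, it reads a stand-in from an in-memory string (csv_text, a name introduced only for this
            illustration). The Product and Price values follow the output above; the Category and Quantity values for Milk,
            Bourn-Vita, and Soap are placeholders assumed for the example.

```python
import io
import pandas as pd

# Stand-in for Grocery.csv. Category and Quantity for Milk, Bourn-Vita
# and Soap are assumed placeholder values, not taken from the real file.
csv_text = """Product,Category,Price,Quantity
Bread,Food,20,2
Milk,Food,60,1
Biscuit,Food,20,2
Bourn-Vita,Food,70,1
Soap,Hygiene,40,1
Brush,Hygiene,30,2
Detergent,Household,80,1
Tissues,Hygiene,30,5
"""

# Select the same two columns, once by position and once by name.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['Product', 'Price'])
print(by_position.equals(by_name))   # True

# The order of entries in usecols is ignored: columns keep the file order.
swapped = pd.read_csv(io.StringIO(csv_text), usecols=['Price', 'Product'])
print(swapped.columns.tolist())      # ['Product', 'Price']

# To change the column order, index the DataFrame after reading it.
reordered = swapped[['Price', 'Product']]
print(reordered.columns.tolist())    # ['Price', 'Product']
```

            Since usecols only selects columns, a different column order has to be requested from the DataFrame itself, as
            in the last step.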

            Sometimes, a CSV file may have some rows that do not contain relevant information. Such rows should not form
            part of the Pandas DataFrame. For this purpose, the keyword argument skiprows is used to specify the list of
            line indexes to be skipped. For example, to skip the second, fourth, and fifth data rows (lines 2, 4, and 5 of
            the file, counting the header as line 0) while reading the CSV file into a DataFrame, we specify
            skiprows=[2, 4, 5], as shown below:

             >>> import pandas as pd
             >>> groceryDF = pd.read_csv('Grocery.csv', skiprows=[2, 4, 5])
             >>> print(groceryDF)
            output:
                      Product   Category  Price  Quantity
                 0      Bread       Food     20         2
                 1    Biscuit       Food     20         2
                 2      Brush    Hygiene     30         2
                 3  Detergent  Household     80         1
                 4    Tissues    Hygiene     30         5
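
            The sketch below illustrates how skiprows counts lines. It reads a stand-in for Grocery.csv from an in-memory
            string (csv_text, a name introduced only for this illustration); the Category and Quantity values for the
            skipped rows (Milk, Bourn-Vita, Soap) are placeholders assumed for the example.

```python
import io
import pandas as pd

# Stand-in for Grocery.csv. Category and Quantity for Milk, Bourn-Vita
# and Soap are assumed placeholder values, not taken from the real file.
csv_text = """Product,Category,Price,Quantity
Bread,Food,20,2
Milk,Food,60,1
Biscuit,Food,20,2
Bourn-Vita,Food,70,1
Soap,Hygiene,40,1
Brush,Hygiene,30,2
Detergent,Household,80,1
Tissues,Hygiene,30,5
"""

# Line 0 is the header, so indexes 2, 4 and 5 are Milk, Bourn-Vita and Soap.
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[2, 4, 5])
print(skipped['Product'].tolist())
# ['Bread', 'Biscuit', 'Brush', 'Detergent', 'Tissues']

# Skipping line 0 drops the header, so the Bread line is read as the header.
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=[0])
print(no_header.columns.tolist())
# ['Bread', 'Food', '20', '2']
```

            The second call is shown only to make the indexing visible; skipping line 0 is rarely what is wanted when the
            file has a header.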

            Note that line indexes start at 0, so index 0 refers to the first line of the file. In the case of the file
            Grocery.csv, the first line is the header row, which comprises the names of the columns. The content of the
            file Grocery.csv is shown below:
                 Product,Category,Price,Quantity

                 Bread,Food,20,2

                                                                             Data Handling using Pandas DataFrame  35