Page 51 - Informatics_Practices_Fliipbook_Class12

pd.read_csv(): Reads a CSV file into a Pandas DataFrame. It takes the path of the CSV file as input and returns a
                  Pandas DataFrame object.
                  •   Keyword argument delimiter or sep of pd.read_csv(): Specifies the delimiter explicitly.
                  •   Keyword argument usecols of pd.read_csv(): Specifies the list of required columns.
                  •   Keyword argument dtype of pd.read_csv(): Specifies the data types of columns in the form of a
                      dictionary ({column_name: data_type}).
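
            The three keyword arguments listed above can be combined in a single call. The following is a minimal
            sketch: the semicolon-separated sample data and the name sampleDF are made up for illustration, and
            io.StringIO is used in place of a file path so that the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data standing in for a CSV file on disk
csv_text = """Product;Category;Price;Quantity
Bread;Food;20;2
Soap;Hygiene;35;1
Milk;Food;25;3
"""

sampleDF = pd.read_csv(
    io.StringIO(csv_text),         # a file path could be passed here instead
    sep=';',                       # the delimiter used in the data
    usecols=['Product', 'Price'],  # read only the required columns
    dtype={'Price': float}         # column_name: data_type
)
print(sampleDF)
print(sampleDF.dtypes)
```

            Only the columns Product and Price are read, and Price is stored as a floating-point column rather
            than the integer type that Pandas would otherwise infer.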



            2.4 Dimensions of a DataFrame

            We know that a Pandas DataFrame is a two-dimensional tabular structure. The ndim attribute of a DataFrame
            yields its number of dimensions, as shown below:
             >>> import pandas as pd
             >>> # Reading a CSV file into a DataFrame
             >>> groceryDF = pd.read_csv('Grocery.csv')
             >>> print(groceryDF.ndim)
                 2
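
            For comparison, the ndim attribute is also available on a Pandas Series. The sketch below uses
            made-up data (the names priceSeries and sampleDF are illustrative only) to show that a Series is
            one-dimensional, while a DataFrame is two-dimensional.

```python
import pandas as pd

# Hypothetical data for illustration
priceSeries = pd.Series([20, 35, 25])
sampleDF = pd.DataFrame({'Product': ['Bread', 'Soap', 'Milk'],
                         'Price': [20, 35, 25]})

print(priceSeries.ndim)        # 1: a Series is one-dimensional
print(sampleDF.ndim)           # 2: a DataFrame is two-dimensional
print(sampleDF['Price'].ndim)  # 1: a single column of a DataFrame is a Series
```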

            When a DataFrame is constructed from a CSV file, we may not know in advance the number of rows and the number
            of columns in the DataFrame. The shape attribute of the DataFrame returns a tuple of the form (n_rows, n_cols),
            as shown below:

             >>> print(groceryDF.shape)
                 (8, 4)
             >>> (nRows, nCols) = groceryDF.shape
             >>> print("Number of rows:", nRows)
                 Number of rows: 8
             >>> print("Number of columns:", nCols)
                 Number of columns: 4
            Note that since the shape attribute of a DataFrame is a tuple, we could have directly accessed the number of
            rows in the DataFrame groceryDF as groceryDF.shape[0] and the number of columns as groceryDF.shape[1]:
             >>> nRows = groceryDF.shape[0]
             >>> nCols = groceryDF.shape[1]
             >>> print("Number of rows:", nRows)
                 Number of rows: 8
             >>> print("Number of columns:", nCols)
                 Number of columns: 4
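
            The same steps can be tried without a CSV file. The sketch below builds a small DataFrame from
            made-up data (sampleDF is an illustrative name) and also checks the shape values against the
            built-in len() function, which is an alternative not used in the text above.

```python
import pandas as pd

# Hypothetical data: 3 rows and 2 columns
sampleDF = pd.DataFrame({'Product': ['Bread', 'Soap', 'Milk'],
                         'Price': [20, 35, 25]})

# Unpack the shape tuple into the number of rows and columns
nRows, nCols = sampleDF.shape
print("Number of rows:", nRows)        # Number of rows: 3
print("Number of columns:", nCols)     # Number of columns: 2

# len() of a DataFrame counts its rows; len() of its columns counts the columns
print(len(sampleDF) == nRows)          # True
print(len(sampleDF.columns) == nCols)  # True
```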
            To find the labels along the rows (row labels) and the labels along the columns (column names) of the
            DataFrame, we use the attributes index and columns, respectively.

             >>> # Retrieve row labels
             >>> print('Row Labels:', groceryDF.index)
                 Row Labels: RangeIndex(start=0, stop=8, step=1)
             >>> # Retrieve column labels
             >>> print('Column names:', groceryDF.columns)
                 Column names: Index(['Product', 'Category', 'Price', 'Quantity'], dtype='object')
            As shown in the output, there are eight row labels ranging from 0 to 7 (the stop value 8 of the RangeIndex is
            not included). Also, there are four columns, namely, 'Product', 'Category', 'Price', and 'Quantity'.
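
            When ordinary Python lists of the labels are needed, the index and columns attributes can be passed
            to list(). The following is a minimal sketch on made-up data; the names sampleDF, rowLabels, and
            colNames are illustrative only.

```python
import pandas as pd

# Hypothetical data for illustration
sampleDF = pd.DataFrame({'Product': ['Bread', 'Soap', 'Milk'],
                         'Price': [20, 35, 25]})

rowLabels = list(sampleDF.index)    # default row labels 0, 1, 2
colNames = list(sampleDF.columns)   # column names in order

print(rowLabels)  # [0, 1, 2]
print(colNames)   # ['Product', 'Price']
```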








                                                                             Data Handling using Pandas DataFrame  37