Page 63 - Informatics_Practices_Fliipbook_Class12

Consider the DataFrame employeeDF and display its summary statistics.




            2.7.1 Aggregating Data: Calculating summary statistics

            Aggregation plays a vital role in data analysis, as it allows us to summarise large datasets and derive useful insights.
            Pandas provides various aggregation functions, such as sum(), mean(), max(), and min(), which allow us to obtain
            various summary statistics for different groupings of data.

            1.  sum(): Returns the sum of the values in a column or along a specified axis of a DataFrame.
            2.  mean(): Returns the average of the values in a column or along a specified axis of a DataFrame.
            3.  max(): Returns the maximum value in a column or along a specified axis of a DataFrame.
            4.  min(): Returns the minimum value in a column or along a specified axis of a DataFrame.

            For example, using the above-mentioned methods, we can compute the total, average, maximum, and minimum
            price for the grocery DataFrame (groceryDF):

             >>> print("Total Price:", groceryDF['Price'].sum())
             >>> print("Average Price:", groceryDF['Price'].mean())
             >>> print("Maximum Price:", groceryDF['Price'].max())
             >>> print("Minimum Price:", groceryDF['Price'].min())
                 Total Price: 350
                 Average Price: 43.75
                 Maximum Price: 80
                 Minimum Price: 20
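
            The four calls above can also be combined into one. The following is a minimal sketch, assuming a small
            hypothetical DataFrame (sample_df) whose values are illustrative only and are not the groceryDF data used
            in the text. It uses the agg() method to apply several aggregation functions to the Price column at once.

```python
import pandas as pd

# Hypothetical stand-in for groceryDF; the values are illustrative only
# and are not the data used in the text.
sample_df = pd.DataFrame({
    "Product": ["Bread", "Milk", "Soap", "Rice"],
    "Category": ["Food", "Food", "Hygiene", "Food"],
    "Price": [40, 30, 25, 65],
    "Quantity": [2, 1, 3, 1],
})

# agg() applies several aggregation functions to one column in a single call
# and returns a Series indexed by the function names.
price_summary = sample_df["Price"].agg(["sum", "mean", "max", "min"])
print(price_summary)
```

            Each statistic can then be looked up by name, for example price_summary["mean"].
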
            When applied to an entire DataFrame, the above-mentioned methods return the result of the aggregation column-wise.
            For example, the min() method returns the smallest value (numerical or text) from each column, as shown below:
             >>> groceryDF.min()
            output:
                 Product     Biscuit
                 Category       Food
                 Price            20
                 Quantity          1
                 dtype: object
            Note that 'Biscuit' is lexicographically the smallest value in the Product column ('Biscuit' < 'Bournvita'
            < ... < 'Tissue'), 'Food' is the smallest value in the Category column, 20 is the smallest value in the Price
            column, and 1 is the smallest value in the Quantity column. The min operation is applied column-wise, i.e., along
            axis 0, which can also be stated explicitly:
             >>> print("Minimum value in each column:")
             >>> print(groceryDF.min(axis = 0))
                 Minimum value in each column:
                 Product     Biscuit
                 Category       Food
                 Price            20
                 Quantity          1
                 dtype: object
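
            The lexicographic ordering and the column-wise minimum can be checked on a small example. This is only a
            sketch: sample_df is a hypothetical DataFrame with illustrative values, not the groceryDF used in the text.

```python
import pandas as pd

# Hypothetical data, for illustration only.
sample_df = pd.DataFrame({
    "Product": ["Tissue", "Biscuit", "Bournvita"],
    "Price": [50, 20, 80],
})

# Strings are compared character by character: 'Biscuit' < 'Bournvita'
# because 'i' comes before 'o' at the second position.
is_smaller = "Biscuit" < "Bournvita"
print(is_smaller)

# axis=0 (the default) yields one minimum per column.
col_min = sample_df.min(axis=0)
print(col_min)
```
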
            By default, the axis along which the operation is performed is set to 'index' or 0, which means that the
            operation is applied column-wise, across rows. Alternatively, we may apply the operation row-wise, across columns,
            by setting the axis to 'columns' or 1, as shown below:

             >>> print("Minimum value in each row:")
             >>> print(groceryDF.min(axis=1, numeric_only=True))
                 Minimum value in each row:
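
            A row-wise minimum compares the values within each row, so it is meaningful mainly for numeric columns.
            In recent pandas versions, comparing text with numbers within a row may raise a TypeError. The sketch
            below therefore passes numeric_only=True. The DataFrame sample_df is hypothetical and its values are
            illustrative only.

```python
import pandas as pd

# Hypothetical data, for illustration only.
sample_df = pd.DataFrame({
    "Product": ["Tissue", "Biscuit", "Bournvita"],
    "Price": [50, 20, 80],
    "Quantity": [1, 30, 2],
})

# axis=1 yields one minimum per row; numeric_only=True restricts the
# comparison to the numeric columns (Price and Quantity here).
row_min = sample_df.min(axis=1, numeric_only=True)
print(row_min)
```

            The result is a Series with one entry per row, indexed by the row labels of the DataFrame.
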