Page 63 - Informatics_Practices_Fliipbook_Class12
Consider the DataFrame employeeDF and display the summary statistics describing the same.
2.7.1 Aggregating Data: Calculating summary statistics
Aggregation plays a vital role in data analysis as it allows us to summarise large datasets and derive useful insights.
Pandas provides various aggregation functions, such as sum(), mean(), max(), and min() which allow us to obtain
various summary statistics for different groupings of data.
1. sum(): Returns the sum of the values in a column or along a specified axis of a DataFrame.
2. mean(): Returns the average of the values in a column or along a specified axis of a DataFrame.
3. max(): Returns the maximum value in a column or along a specified axis of a DataFrame.
4. min(): Returns the minimum value in a column or along a specified axis of a DataFrame.
For example, using the above-mentioned methods, we can compute the total, average, maximum, and minimum price
for the grocery DataFrame (groceryDF):
>>> print("Total Price:", groceryDF['Price'].sum())
>>> print("Average Price:", groceryDF['Price'].mean())
>>> print("Maximum Price:", groceryDF['Price'].max())
>>> print("Minimum Price:", groceryDF['Price'].min())
Total Price: 350
Average Price: 43.75
Maximum Price: 80
Minimum Price: 20
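The same methods can also be applied to groupings of data, as mentioned at the start of this section. The following is a minimal sketch, assuming a small DataFrame named sampleDF whose values are hypothetical and are not taken from groceryDF. It uses groupby() together with agg() to compute the four statistics separately for each category:

```python
import pandas as pd

# Hypothetical data for illustration only; this is not the book's groceryDF
sampleDF = pd.DataFrame({
    "Product": ["Bread", "Soap", "Biscuit", "Shampoo"],
    "Category": ["Food", "Hygiene", "Food", "Hygiene"],
    "Price": [40, 30, 20, 80],
})

# Aggregate the whole Price column
total_price = sampleDF["Price"].sum()
average_price = sampleDF["Price"].mean()
print("Total Price:", total_price)
print("Average Price:", average_price)

# Aggregate Price separately for each Category
price_by_category = sampleDF.groupby("Category")["Price"].agg(
    ["sum", "mean", "max", "min"]
)
print(price_by_category)
```

Here each row of price_by_category corresponds to one category, and each column holds one of the four summary statistics.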
When applied to the entire DataFrame, the above-mentioned methods return the result of the aggregation column-wise.
For example, the method min() returns the smallest value (numerical or text) from each column, as shown below:
>>> groceryDF.min()
output:
Product Biscuit
Category Food
Price 20
Quantity 1
dtype: object
Note that 'Biscuit' is lexicographically the smallest value in the Product column ('Biscuit' < 'Bournvita'
< ... < 'Tissue'), 'Food' is the smallest value in the Category column, 20 is the smallest value in the Price
column, and 1 is the smallest value in the Quantity column. The min operation is applied column-wise, i.e., along
axis 0.
>>> print("Minimum value in each column:")
>>> print(groceryDF.min(axis = 0))
Minimum value in each column:
Product Biscuit
Category Food
Price 20
Quantity 1
dtype: object
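The lexicographic ordering and the default axis can be checked with a short sketch. It assumes a small DataFrame named sampleDF with hypothetical values (not the book's groceryDF) and shows that calling min() with no argument gives the same result as min(axis = 0):

```python
import pandas as pd

# Hypothetical data for illustration only; this is not the book's groceryDF
sampleDF = pd.DataFrame({
    "Product": ["Tissue", "Bournvita", "Biscuit"],
    "Price": [50, 80, 20],
})

# Python compares strings character by character, so 'Bi...' comes before 'Bo...'
ordering_holds = "Biscuit" < "Bournvita" < "Tissue"
print(ordering_holds)

# min() with no argument behaves like min(axis=0)
default_min = sampleDF.min()
explicit_min = sampleDF.min(axis=0)
print(default_min.equals(explicit_min))
print(default_min["Product"], default_min["Price"])
```

Since the comparison is based on character codes, uppercase letters are ordered before lowercase letters, so the result may differ from a dictionary ordering when the text values use mixed case.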
By default, the axis along which the operation is performed is set to 'index' or 0, which means that the
operation is applied column-wise, across rows. Alternatively, we may apply the operation row-wise, across columns,
by setting the axis to 'columns' or 1, as shown below:
>>> print("Minimum value in each row:")
>>> print(groceryDF.min(axis=1))
Minimum value in each row:
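A row-wise minimum compares values from different columns, so it is meaningful only for columns of comparable types. In recent versions of Pandas, comparing text with numbers across a row may raise a TypeError. The following is a minimal sketch, assuming a DataFrame named sampleDF with hypothetical values (not the book's groceryDF), that restricts the operation to the numeric columns using the numeric_only parameter:

```python
import pandas as pd

# Hypothetical data for illustration only; this is not the book's groceryDF
sampleDF = pd.DataFrame({
    "Product": ["Bread", "Soap", "Biscuit"],
    "Price": [40, 30, 20],
    "Quantity": [2, 1, 5],
})

# numeric_only=True restricts the comparison to the numeric columns
# (Price and Quantity), so text in Product is never compared with a number
row_min = sampleDF.min(axis=1, numeric_only=True)
print("Minimum value in each row:")
print(row_min)
```

The result is a Series with one value per row: the smaller of Price and Quantity for that row.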

