Page 61 - Informatics_Practices_Fliipbook_Class12
1. Consider the groceryDF DataFrame and retrieve the details of the products (rows) which are from
the Clothes category.
2. Consider the employeeDF DataFrame and retrieve the following data:
• Details of employees earning more than 90000
• Details of employees working in the Accounts department
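Both exercises rely on the same row-selection pattern: a comparison on a column produces a Boolean Series, which is then used to index the DataFrame. The sketch below illustrates that pattern on a small made-up DataFrame. The name staffDF, its columns, its values, and the conditions used are all hypothetical; they are not the groceryDF or employeeDF of the exercises.

```python
import pandas as pd

# Hypothetical data, invented only to illustrate the selection pattern.
staffDF = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena', 'John'],
    'Department': ['Sales', 'Support', 'Sales', 'Support'],
    'Salary': [45000, 62000, 71000, 38000],
})

# A comparison on a column yields a Boolean Series (one True/False per row).
# Indexing the DataFrame with that Series keeps only the rows marked True.
wellPaid = staffDF[staffDF['Salary'] > 50000]
inSales = staffDF[staffDF['Department'] == 'Sales']

print(wellPaid)
print(inSales)
```

The same two-step idea (build a condition, then index with it) should carry over to the exercises once the actual column names are substituted.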
2.7 Descriptive Statistics
Oftentimes, we are interested in summary information about the data stored in a DataFrame. This information includes
statistics such as the number of values, mean, median, and standard deviation. Pandas provides a
powerful method, describe(), which returns a DataFrame comprising the following statistics for each column
of the DataFrame that stores numerical data:
1. Count: It denotes the number of non-null values. A count lower than the number of rows signals the presence of null
values, which prompts the data analyst to take suitable action. For example, the null values may simply be ignored or
replaced by the average value of the column.
2. Mean: It denotes the average value of numbers in a column. As average represents the centre point of the data, it
is called a measure of central tendency of the data.
3. Standard Deviation: It measures the spread or dispersion of the values around the mean. A higher standard
deviation indicates greater variability in the data. A low value of the standard deviation indicates that most of the
values are close to the mean.
4. Minimum and Maximum: As indicated by the terms minimum and maximum, they denote the minimum and
maximum values in a column, respectively. Together, these minimum and maximum values indicate the interval
from which the values in a column are drawn.
5. Quartiles: 25%, 50%, and 75% represent the first quartile (25th percentile), median (50th percentile), and third
quartile (75th percentile), respectively. These quartile values are often denoted by Q1, Q2, and Q3,
respectively. Q1 indicates that 25% of the values do not exceed Q1. Similarly, Q2 indicates that 50% of the values
do not exceed Q2. Therefore, Q2 is also called the median. Finally, Q3 indicates that 75% of the values do not exceed
Q3. These values help understand the data distribution and identify potential outliers.
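Each statistic in the list above can also be obtained on its own with a Series method. The following is a minimal sketch, assuming the eight Price values of the grocery data used in this section are typed in directly rather than read from a file. The variable names price, stats, and described are chosen here only for illustration.

```python
import pandas as pd

# Price values of the grocery data used in this section, entered by hand.
price = pd.Series([20, 60, 20, 70, 40, 30, 80, 30], name='Price')

stats = {
    'count': price.count(),       # number of non-null values
    'mean': price.mean(),         # average value
    'std': price.std(),           # sample standard deviation (divides by n - 1)
    'min': price.min(),
    '25%': price.quantile(0.25),  # Q1
    '50%': price.median(),        # Q2, the median
    '75%': price.quantile(0.75),  # Q3
    'max': price.max(),
}

for name, value in stats.items():
    print(name, value)

# describe() on the same Series reports the same eight statistics.
described = price.describe()
print(described)
```

Note that std() in Pandas computes the sample standard deviation by default, which is why it divides by n - 1 rather than n.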
>>> import pandas as pd
>>> groceryDF = pd.read_csv('Grocery.csv')
>>> print(groceryDF)
      Product   Category  Price  Quantity
0       Bread       Food     20         2
1        Milk       Food     60         5
2     Biscuit       Food     20         2
3  Bourn-Vita       Food     70         1
4        Soap    Hygiene     40         4
5       Brush    Hygiene     30         2
6   Detergent  Household     80         1
7     Tissues    Hygiene     30         5
>>> summary = groceryDF.describe()
>>> print(type(summary))
<class 'pandas.core.frame.DataFrame'>
>>> print(summary)
           Price  Quantity
count   8.000000  8.000000
mean   43.750000  2.750000
std    23.260942  1.669046
min    20.000000  1.000000
25%    27.500000  1.750000
50%    35.000000  2.000000
75%    62.500000  4.250000
max    80.000000  5.000000
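In the output above, count equals the number of rows because the grocery data has no null values. The sketch below shows how count behaves when values are missing, and the two treatments mentioned earlier: ignoring the nulls or replacing them with the column average. The Series used here is a hypothetical variant of the Price column in which two entries have been deliberately set to null; it is not part of Grocery.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the Price column with two missing entries.
price = pd.Series([20, 60, np.nan, 70, 40, np.nan, 80, 30], name='Price')

totalEntries = len(price)      # all entries, including nulls
nonNull = price.count()        # non-null entries only
average = price.mean()         # nulls are skipped when computing the mean

print(totalEntries, nonNull, average)

# Treatment 1: ignore the nulls by dropping them.
dropped = price.dropna()

# Treatment 2: replace each null with the mean of the non-null values.
filled = price.fillna(average)

print(dropped.count(), filled.count())
```

Replacing nulls with the mean leaves the column average unchanged, but it lowers the standard deviation, so the choice of treatment affects the summary that describe() reports.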

