Page 62 - Informatics_Practices_Fliipbook_Class12
Sometimes, we need summary statistics only on a specific column. For example, we may obtain the summary
statistics on Price by selecting the Price column as follows:
>>> groceryDF.describe()['Price']
output:
count     8.000000
mean     43.750000
std      23.260942
min      20.000000
25%      27.500000
50%      35.000000
75%      62.500000
max      80.000000
Name: Price, dtype: float64
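The same summary can also be obtained by selecting the column first and then calling describe() on it. The sketch below illustrates both orders. The book's actual rows are not shown on this page, so the DataFrame here is a stand-in: its values are assumed, chosen only so that they reproduce the summary shown above.

```python
import pandas as pd

# Stand-in data (assumed): not the book's actual rows. The values are
# chosen only so that the summary statistics match the output above.
groceryDF = pd.DataFrame({
    'Price': [20, 20, 30, 30, 40, 60, 70, 80],
    'Quantity': [1, 1, 2, 2, 2, 4, 5, 5],
})

# Describe every numeric column, then pick the Price column of the result
viaFrame = groceryDF.describe()['Price']

# Select the Price column first, then describe only that Series
viaSeries = groceryDF['Price'].describe()

print(viaSeries)
```

Both forms give the same statistics. The second form computes the summary for the selected column only, which may be preferable when the DataFrame has many numeric columns.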
By default, the describe() method provides a summary of only those columns that hold numerical values. However,
we can include summary information about columns holding arbitrary object type values by specifying
include = 'all' as the keyword argument. For each non-numeric column, the summary information includes:
1. Count: It denotes the number of non-null values in the column. As mentioned above, null values serve as pointers
to missing data for the data analyst.
2. Unique: It indicates the number of unique values in the column, thus indicating the level of diversity in the
categorical data.
3. Top: It denotes the most frequent value (called mode) in the column, i.e., the value that occurs most often in the
column.
4. Frequency: It denotes the frequency of the most frequent value in the column. It is labelled freq in the output.
>>> groceryDF.describe(include='all')
output:
       Product Category      Price  Quantity
count        8        8   8.000000  8.000000
unique       8        3        NaN       NaN
top      Bread     Food        NaN       NaN
freq         1        4        NaN       NaN
mean       NaN      NaN  43.750000  2.750000
std        NaN      NaN  23.260942  1.669046
min        NaN      NaN  20.000000  1.000000
25%        NaN      NaN  27.500000  1.750000
50%        NaN      NaN  35.000000  2.000000
75%        NaN      NaN  62.500000  4.250000
max        NaN      NaN  80.000000  5.000000
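The four statistics reported for a non-numeric column can also be computed by hand, which may help in seeing what each one means. The following is a minimal sketch for a Category column. The data is a stand-in: only the value Food, the total of 8 entries, the 3 distinct categories, and the frequency 4 come from the output above, while the other category names and the order of the rows are hypothetical.

```python
import pandas as pd

# Stand-in Category column (assumed values): 8 entries, 3 distinct
# categories, with Food occurring 4 times as in the output above.
# The names 'Beverage' and 'Household' are hypothetical.
category = pd.Series(
    ['Food', 'Food', 'Beverage', 'Food', 'Household',
     'Beverage', 'Food', 'Household'],
    name='Category',
)

# Occurrences of each distinct value, most frequent first
counts = category.value_counts()

summary = {
    'count': int(category.count()),     # number of non-null values
    'unique': int(category.nunique()),  # number of distinct values
    'top': counts.index[0],             # most frequent value (the mode)
    'freq': int(counts.iloc[0]),        # how often the top value occurs
}
print(summary)
```

Calling category.describe() on the same Series reports these four statistics under the labels count, unique, top, and freq.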
Note that in the column Category, Food appears most frequently (4 times). So, it is shown as the top value in the
column. However, in the column Product, each value appears only once. So, the first value Bread is shown as the top
value in the column. Further, note that the count of items in the columns Price and Quantity is shown as floating
point numbers. Indeed, as mean, std, and the quartile values are floating point values, the type of each entire column
in the DataFrame groceryDF.describe() is set as float64, as shown below:
>>> groceryDF.describe().dtypes
output:
Price       float64
Quantity    float64
dtype: object
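When whole-number counts are needed, the count() method can be used on its own, as it returns integers. The sketch below contrasts it with the count row of describe(). As the book's actual rows are not shown on this page, the DataFrame is again a stand-in with assumed values.

```python
import pandas as pd

# Stand-in numeric columns (assumed values, not the book's actual rows)
groceryDF = pd.DataFrame({
    'Price': [20, 20, 30, 30, 40, 60, 70, 80],
    'Quantity': [1, 1, 2, 2, 2, 4, 5, 5],
})

# count() on its own returns the number of non-null values as integers
intCounts = groceryDF.count()

# The count row of describe() holds the same numbers, stored as floats
floatCounts = groceryDF.describe().loc['count']

print(intCounts.dtype, floatCounts.dtype)
```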
Pandas provides a powerful method describe(), which returns a DataFrame comprising the statistics count,
mean, standard deviation, minimum, maximum, and quartiles for each column in the DataFrame that stores
numerical data. By specifying include = 'all' as the keyword argument, the summary information also includes the
count, unique, top, and frequency statistics for each non-numeric column.

