Page 67 - Informatics_Practices_Fliipbook_Class12
2.7.2 Distinct Values in a Column and their Count of Occurrence
Suppose the store owner is interested in knowing the distinct categories from which products are being bought. Recall
that a column of a DataFrame is a Pandas Series. So, the Series method unique() returns the distinct elements in a
column. The unique values are returned in the order in which they first appear in the DataFrame column. A missing
value (NaN), if present in the column, is included in the output as a distinct element.
>>> groceryDF['Category'].unique()
output:
array(['Food', 'Hygiene', 'Household'], dtype=object)
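How unique() treats ordering and missing values can be checked on a small example. The sketch below uses a hypothetical DataFrame named sampleDF (an assumption for illustration, not the groceryDF used in this section) with one missing category, and compares unique() with and without a preceding dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; this is not the groceryDF from the text.
sampleDF = pd.DataFrame({
    'Item': ['Rice', 'Soap', 'Milk', 'Mop', 'Salt'],
    'Category': ['Food', 'Hygiene', 'Food', np.nan, 'Household'],
})

# unique() keeps the order of first appearance and reports NaN as a distinct element.
withMissing = sampleDF['Category'].unique()
print(withMissing)

# Dropping missing values first leaves only the valid categories.
withoutMissing = sampleDF['Category'].dropna().unique()
print(withoutMissing)
```

If only valid categories are wanted, chaining dropna() before unique() is one simple way to exclude the missing entry.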
Next, suppose we want to know the count of the items purchased in each category. The method value_counts()
returns the count of occurrences of each unique value in the Category column of the DataFrame groceryDF, sorted in
descending order of count. By default, missing values (NaN) are not counted. The result is shown below:
>>> groceryDF['Category'].value_counts()
output:
Food         4
Hygiene      3
Household    1
Name: Category, dtype: int64
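The effect of missing values on value_counts() can be seen in a short sketch. It assumes a hypothetical DataFrame sampleDF (not the groceryDF of this section) and uses two optional parameters of value_counts(): dropna, which controls whether NaN gets its own entry, and normalize, which returns proportions instead of counts.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; this is not the groceryDF from the text.
sampleDF = pd.DataFrame({
    'Category': ['Food', 'Hygiene', 'Food', np.nan, 'Household', 'Food'],
})

# Default behaviour: NaN is excluded and counts are sorted in descending order.
validCounts = sampleDF['Category'].value_counts()
print(validCounts)

# dropna=False adds a separate entry for the missing values.
allCounts = sampleDF['Category'].value_counts(dropna=False)
print(allCounts)

# normalize=True reports each category's share of the valid rows instead of a raw count.
shares = sampleDF['Category'].value_counts(normalize=True)
print(shares)
```

Comparing the sums of validCounts and allCounts shows how many rows have a missing category.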
In the above example, the value_counts() method is applied to the Category column of the DataFrame and
returns a Series object containing the count of products bought from each unique category. Sometimes, we are
interested in knowing only the minimum, maximum, total, and average count across the categories. For example,
while ignoring rows having a NaN value in the column Category, we may want to know:
1. What is the minimum number of purchases across categories?
2. What is the maximum number of purchases across categories?
3. What is the total number of purchases across categories?
4. What is the average number of purchases across categories?
It may be achieved as follows:
>>> counts = groceryDF['Category'].value_counts()
>>> print('Minimum number of purchases across categories:', counts.min())
>>> print('Maximum number of purchases across categories:', counts.max())
>>> print('Total number of valid purchases across categories:', counts.sum())
>>> print('Average number of purchases across categories:', round(counts.mean(), 2))
output:
Minimum number of purchases across categories: 1
Maximum number of purchases across categories: 4
Total number of valid purchases across categories: 8
Average number of purchases across categories: 2.67
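The aggregates above give the numbers but not the categories they belong to. As a possible extension, the sketch below uses the Series methods idxmax() and idxmin() to find the corresponding category labels, and nunique() to count the distinct categories. The DataFrame sampleDF is hypothetical: it contains only a Category column, constructed to reproduce the counts shown in this section.

```python
import pandas as pd

# Hypothetical column built only to reproduce the counts shown in the text
# (Food 4, Hygiene 3, Household 1); it is not the actual groceryDF.
sampleDF = pd.DataFrame({
    'Category': ['Food', 'Hygiene', 'Food', 'Household',
                 'Hygiene', 'Food', 'Hygiene', 'Food'],
})
counts = sampleDF['Category'].value_counts()

# idxmax()/idxmin() return the index label (category) of the largest/smallest count.
mostBought = counts.idxmax()
leastBought = counts.idxmin()
print('Category with most purchases:', mostBought)
print('Category with fewest purchases:', leastBought)

# nunique() gives the number of distinct non-missing categories.
numCategories = sampleDF['Category'].nunique()
print('Number of distinct categories:', numCategories)
```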
The Series method unique() returns the distinct elements in a column. The unique values are returned in the
order in which they first appear in the DataFrame column. A missing value (NaN), if present, is included in the
output as a distinct element.
The method value_counts() returns the count of occurrences of each unique value in a column of the
DataFrame. By default, missing values (NaN) are not counted.
1. Consider the DataFrame employeeDF and identify distinct departments in which employees
of the company are working.
2. Determine the number of employees working in each department.
Data Handling using Pandas DataFrame 53

