Page 67 - Informatics_Practices_Fliipbook_Class12

2.7.2 Distinct Values in a Column and their Count of Occurrence

            Suppose the store owner wants to know the distinct categories from which products are being bought. Recall
            that a column of a DataFrame is a Pandas Series, so the Series method unique() returns the distinct elements in a
            column. The unique values are returned in the order in which they first appear in the DataFrame column. Missing
            values (NaN) are not dropped: if the column contains any, NaN appears once in the output.

             >>> groceryDF['Category'].unique()
            output:
                 array(['Food', 'Hygiene', 'Household'], dtype=object)
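The behaviour of unique() with missing values can be checked with a minimal sketch. The DataFrame sampleDF and its contents are hypothetical (they are not the book's groceryDF); the sketch assumes one row has no category recorded.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; one Category value is missing.
sampleDF = pd.DataFrame({
    'Product': ['Rice', 'Soap', 'Bread', 'Broom', 'Towel'],
    'Category': ['Food', 'Hygiene', 'Food', 'Household', np.nan],
})

# unique() keeps NaN as one of the distinct values, in order of first appearance.
distinct_all = sampleDF['Category'].unique()

# Dropping missing values first leaves only the real categories.
distinct_valid = sampleDF['Category'].dropna().unique()

# nunique() counts the distinct values and excludes NaN by default.
num_distinct = sampleDF['Category'].nunique()

print(distinct_all)
print(distinct_valid)
print(num_distinct)
```

Calling dropna() before unique() is one simple way to leave missing values out of the list of categories.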
            Next, suppose we want to know the count of items purchased in each category. The method value_counts()
            returns the count of occurrences of each unique value in the Category column of the DataFrame groceryDF,
            sorted in descending order of count and excluding missing values (NaN) by default, as shown below:
             >>> groceryDF['Category'].value_counts()
            output:
                 Food         4
                 Hygiene      3
                 Household    1
                 Name: Category, dtype: int64
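The following sketch illustrates two optional parameters of value_counts(), namely dropna and normalize. The DataFrame sampleDF is hypothetical and is assumed to contain one missing category.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; the last Category value is missing.
sampleDF = pd.DataFrame({
    'Category': ['Food', 'Hygiene', 'Food', 'Household', 'Food', np.nan],
})

# Default behaviour: NaN is excluded and counts are sorted in descending order.
counts = sampleDF['Category'].value_counts()

# dropna=False also counts the rows in which the category is missing.
counts_with_nan = sampleDF['Category'].value_counts(dropna=False)

# normalize=True returns the share of each category instead of the raw count.
shares = sampleDF['Category'].value_counts(normalize=True)

print(counts)
print(counts_with_nan)
print(shares)
```

Comparing counts.sum() with counts_with_nan.sum() gives a quick check of how many rows have no category recorded.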
            In the above example, the value_counts() method is applied to the Category column of the DataFrame and
            returns a Series object holding the count of products bought from each unique category. Sometimes, we are
            interested in knowing only the maximum, minimum, total, and average count across the categories. For example,
            while ignoring rows having a NaN value in the column Category, we may want to know:
            1.  What is the minimum number of purchases across categories?
            2.  What is the maximum number of purchases across categories?
            3.  What is the total number of purchases across categories?
            4.  What is the average number of purchases across categories?
            It may be achieved as follows:

             >>> counts = groceryDF['Category'].value_counts()
             >>> print('Minimum number of purchases across categories:', counts.min())
             >>> print('Maximum number of purchases across categories:', counts.max())
             >>> print('Total number of valid purchases across categories:', counts.sum())
             >>> print('Average number of purchases across categories:', round(counts.mean(), 2))
            output:
                 Minimum number of purchases across categories: 1
                 Maximum number of purchases across categories: 4
                 Total number of valid purchases across categories: 8
                 Average number of purchases across categories: 2.67
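The same summary can be reproduced in a self-contained sketch. The Series categories below is hypothetical: it is constructed only so that its counts match the ones shown above (4, 3, and 1). The sketch also uses idxmax() and idxmin() to report which category has the highest and the lowest count.

```python
import pandas as pd

# Hypothetical column built to reproduce the counts 4, 3 and 1 shown above.
categories = pd.Series(
    ['Food'] * 4 + ['Hygiene'] * 3 + ['Household'],
    name='Category',
)

counts = categories.value_counts()

# Summary statistics computed on the Series of counts.
summary = {
    'min': int(counts.min()),
    'max': int(counts.max()),
    'total': int(counts.sum()),
    'average': round(float(counts.mean()), 2),
}

# Index labels (category names) with the highest and the lowest count.
most_common = counts.idxmax()
least_common = counts.idxmin()

print(summary)
print('Most purchased category:', most_common)
print('Least purchased category:', least_common)
```

Because value_counts() returns a Series, any Series method, such as min(), max(), sum(), or mean(), can be applied to its result.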

                  The Series method unique() returns the distinct elements in a column. The unique values are returned in the
                  order in which they first appear in the DataFrame column. Missing values (NaN) are not dropped: if
                  present, NaN appears once in the output.
                  The method value_counts() returns the count of occurrences of each unique value in a column of the
                  DataFrame, excluding missing values (NaN) by default.




                     1.  Consider the DataFrame employeeDF and identify the distinct departments in which employees
                         of the company are working.
                     2.  Determine the number of employees working in each department.





                                                                             Data Handling using Pandas DataFrame  53