Page 66 - Informatics_Practices_Fliipbook_Class12

output:
                         English  Mathematics        Economics      History      Psychology            Name
              RollNo
              301              78              90            92           86               79          Arya
              302              68              88            78           77               70        Rashmi
              303              57              65            55           60               62         Naira
              304              45              40            50           55               51     Samridhi
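        Note that the function call studentDF.max(axis=1) still computes the maximum marks across the subjects for each
        student, as shown below. This works because the string column Name is ignored while computing the maximum value.
        As the FutureWarning indicates, this behaviour is deprecated, and a future version of Pandas will raise a TypeError instead.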
         >>> studentDF.max(axis=1)
                <ipython-input-118-49c2fff99bae>:1: FutureWarning: Dropping of nuisance columns in
              DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version
              this will raise TypeError.  Select only valid columns before calling the reduction.
                studentDF.max(axis=1)
        output:
              RollNo
              301    92
              302    88
              303    65
              304    55
              dtype: int64
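
The warning above asks for only the valid columns to be selected before the reduction is called. The following is a minimal sketch of one way to do that, assuming studentDF is rebuilt from the table shown earlier. The name maxMarks is chosen here for illustration, and select_dtypes is only one option; dropping the Name column would work as well.

```python
import pandas as pd

# Marks table from the output above, indexed by RollNo.
studentDF = pd.DataFrame(
    {
        'English': [78, 68, 57, 45],
        'Mathematics': [90, 88, 65, 40],
        'Economics': [92, 78, 55, 50],
        'History': [86, 77, 60, 55],
        'Psychology': [79, 70, 62, 51],
        'Name': ['Arya', 'Rashmi', 'Naira', 'Samridhi'],
    },
    index=pd.Index([301, 302, 303, 304], name='RollNo'),
)

# Keep only the numeric columns, then take the maximum across each row.
# maxMarks is an illustrative name, not one used in the original text.
maxMarks = studentDF.select_dtypes(include='number').max(axis=1)
print(maxMarks)
```

Because the string column is excluded explicitly, this form does not rely on Pandas silently ignoring it.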
        Suppose a university admits students based on their marks in the four subjects (including English) in which they
        performed best in the board examination. So, to award admission to an undergraduate course, we need to find each
        student's marks in English and in the three best-scoring subjects among the rest. This may be achieved as follows:
        1.   Create a Series engMarks comprising the marks obtained by the students in English.
        2.   Write a function getBest3 which sorts the marks in a row in descending order by setting
           ascending=False and picks the top 3 values. Apply it to each row (axis=1) of the remaining four
           subject columns of studentDF to obtain best3Marks.
        3.   Concatenate engMarks and best3Marks column-wise (axis=1) to yield best4Marks. A subject that is
           not among a student's best three appears as NaN.
         >>> engMarks = studentDF['English']
         >>> def getBest3(row):
         ...     return row.sort_values(ascending=False)[:3]
         ...
         >>> best3Marks = studentDF[['Mathematics', 'Economics', 'History', 'Psychology']].apply(getBest3, axis=1)
         >>> best4Marks = pd.concat( [engMarks, best3Marks], axis = 1)
         >>> print(best4Marks)
        output:
                      English  Economics  History  Mathematics  Psychology
              RollNo
              301          78       92.0     86.0         90.0         NaN
              302          68       78.0     77.0         88.0         NaN
              303          57        NaN     60.0         65.0        62.0
              304          45       50.0     55.0          NaN        51.0
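
The table above leaves NaN wherever a subject is not among a student's best three. If only a combined score is needed, the same idea can be sketched more compactly. This is an illustrative variant, not the method used in the text: it assumes the marks from the table shown earlier, uses Series.nlargest in place of sort_values, and the names marks, optional, best3Total and best4Total are hypothetical.

```python
import pandas as pd

# Numeric marks from the table shown earlier, indexed by RollNo.
marks = pd.DataFrame(
    {
        'English': [78, 68, 57, 45],
        'Mathematics': [90, 88, 65, 40],
        'Economics': [92, 78, 55, 50],
        'History': [86, 77, 60, 55],
        'Psychology': [79, 70, 62, 51],
    },
    index=pd.Index([301, 302, 303, 304], name='RollNo'),
)

# Subjects other than English (hypothetical helper name).
optional = ['Mathematics', 'Economics', 'History', 'Psychology']

# For each row, add up the three highest marks among the optional subjects.
best3Total = marks[optional].apply(lambda row: row.nlargest(3).sum(), axis=1)

# English is always counted, so add it to the best-three total.
best4Total = marks['English'] + best3Total
print(best4Total)
```

Summing inside the row-wise function avoids the NaN entries altogether, at the cost of no longer showing which subjects were counted.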

               Pandas provides various aggregation functions, such as sum(), mean(), max(), and min(), which allow us to
               obtain various summary statistics for different groupings of data.
               The default axis along which an aggregation is performed is 'index' or 0, which means that the
               operation is applied down the rows, giving one result per column. Alternatively, we may apply the operation
               across the columns, giving one result per row, by setting the axis to 'columns' or 1.
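
The effect of the axis setting can be seen in a small sketch. It assumes a two-student, two-subject subset of the marks used earlier on this page; the names marks, colMean and rowMean are chosen here for illustration.

```python
import pandas as pd

# Two students and two subjects taken from the marks table used earlier.
marks = pd.DataFrame(
    {'English': [78, 68], 'Mathematics': [90, 88]},
    index=pd.Index([301, 302], name='RollNo'),
)

# axis='index' (same as axis=0, the default): one mean per column.
colMean = marks.mean(axis='index')

# axis='columns' (same as axis=1): one mean per row.
rowMean = marks.mean(axis='columns')

print(colMean)
print(rowMean)
```

Here colMean is indexed by subject name, while rowMean is indexed by RollNo.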




                 Consider the DataFrame employeeDF and apply the aggregate functions sum(), mean(), max(),
                 and min() to the Salary column.


          52   Touchpad Informatics Practices-XII