Page 99 - Informatics_Practices_Fliipbook_Class12

The syntax for using the function median() is:

                  df.median()

                  Standard Deviation
                  The standard deviation measures the amount of variation or dispersion in a set of values.
                  The std() function computes and returns the standard deviation along the requested axis.
                  The syntax for using the std() function is:

                  df.std()
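
As a minimal sketch of how std() behaves along each axis, the example below uses a small hypothetical DataFrame (the column names and values are assumptions made for illustration, not data from this chapter).

```python
import pandas as pd

# Hypothetical marks, used only for illustration
df = pd.DataFrame({'Maths': [10, 20, 30], 'Science': [5, 5, 5]})

col_std = df.std()         # one value per column (axis=0 is the default)
row_std = df.std(axis=1)   # one value per row

print(col_std)   # Maths varies, so its value is 10.0; Science is constant, so 0.0
print(row_std)
```

Columns whose values are all identical have a standard deviation of 0, since there is no spread around the mean.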

                  Variance
                  Variance is a statistical measure of the dispersion of values in a dataset, calculated as the average of the squared differences
                  from the mean.
                  The var() function computes and returns the variance over the requested axis. The syntax for using the var()
                  function is:

                  df.var()
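
One detail worth checking by hand: by default, var() in Pandas divides by N - 1 (sample variance), whereas the "average of the squared differences" in the definition above divides by N. The sketch below, on hypothetical data, shows both by passing the ddof parameter.

```python
import pandas as pd

# Hypothetical data, chosen so the arithmetic is easy to verify:
# mean = 5, squared differences from the mean sum to 20
df = pd.DataFrame({'Score': [2, 4, 6, 8]})

sample_var = df.var()             # default ddof=1: 20 / (4 - 1)
population_var = df.var(ddof=0)   # ddof=0: 20 / 4 = 5.0

print(sample_var)
print(population_var)
```

The same ddof parameter is accepted by std(), and the standard deviation is the square root of the variance computed with the same ddof.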
              13.  What do you understand by the term MODE? Name the function which is used to calculate it.
             Ans.   The mode is the value (or values) that appears most frequently in a dataset. It is a measure of central
                  tendency. In Pandas, you can calculate the mode using the df.mode() function.
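
A short sketch of mode() on a hypothetical DataFrame (the column names and values are assumed for illustration). It shows that a column may have more than one mode.

```python
import pandas as pd

# Hypothetical data: 75 and 80 each appear twice in 'Marks'
df = pd.DataFrame({
    'Marks': [80, 75, 80, 90, 75],
    'Grade': ['A', 'B', 'A', 'A', 'B']
})

modes = df.mode()   # returns a DataFrame, one row per mode
print(modes)
# 'Marks' has two modes (75 and 80); 'Grade' has one ('A'),
# so the second row of 'Grade' is filled with NaN.
```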
              14.  State the purpose of data aggregation.
             Ans.   Aggregation allows us to summarize large datasets and derive useful insights in data analysis. Pandas provides various
                  aggregation functions, such as sum(), mean(), max(), and min(), which allow us to obtain summary statistics
                  for different groupings of data.
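
The aggregation functions named above can be sketched as follows. The DataFrame is hypothetical, and agg() is used here only as one convenient way to apply several of them in a single call.

```python
import pandas as pd

# Hypothetical marks, used only for illustration
df = pd.DataFrame({'Maths': [60, 70, 80], 'Science': [75, 85, 95]})

print(df.sum())    # column totals
print(df.mean())   # column averages

# Apply several aggregation functions at once; each becomes a row of the result
summary = df.agg(['sum', 'mean', 'max', 'min'])
print(summary)
```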
              15.  Explain the concept of GROUP BY with the help of an example.
             Ans.   In Pandas, the groupby operation is used to split a DataFrame into groups based on some criteria and then apply a
                  function to each group independently. The result is usually a new DataFrame or a Series with aggregated data.
                  Here's a brief explanation with an example:
                  Consider a DataFrame with information about students and their scores in different subjects:
                  import pandas as pd
                  data = {
                      'Subject': ['Math', 'English', 'Math', 'English', 'Math', 'English'],
                      'Student': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
                      'Score': [80, 75, 85, 90, 88, 92]
                  }


                  df = pd.DataFrame(data)
                  print(df)

                      Subject  Student  Score
                   0     Math    Alice     80
                   1  English      Bob     75
                   2     Math    Alice     85
                   3  English      Bob     90
                   4     Math    Alice     88
                   5  English      Bob     92
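
As one possible illustration of the split and apply steps on this DataFrame, the sketch below groups the rows by Subject and averages the Score in each group. Grouping by 'Subject' and using mean() are choices made here for illustration; any other column or aggregation function could be used in the same way.

```python
import pandas as pd

data = {
    'Subject': ['Math', 'English', 'Math', 'English', 'Math', 'English'],
    'Student': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
    'Score': [80, 75, 85, 90, 88, 92]
}
df = pd.DataFrame(data)

# Split the rows into one group per subject, then apply mean() to each group's scores
avg_score = df.groupby('Subject')['Score'].mean()
print(avg_score)   # a Series indexed by Subject: one average per group
```

The Math group averages (80 + 85 + 88) / 3 and the English group averages (75 + 90 + 92) / 3, so the result has one row per distinct subject.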






                                                                             Data Handling using Pandas DataFrame  85