Page 79 - Informatics_Practices_Fliipbook_Class12

➢ CSV files can be read into a Pandas DataFrame using the Pandas function pd.read_csv().
➢ The function pd.read_csv() takes the path of the CSV file as input and returns a Pandas DataFrame object.
➢ The keyword argument delimiter (or its alias sep) of pd.read_csv() is used to specify the delimiter explicitly.
➢ The keyword argument usecols of pd.read_csv() is used to specify the list of required columns.
➢ The keyword argument dtype of pd.read_csv() is used to specify the data types of columns in the form of a dictionary comprising entries of the form column_name:data_type.
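As a minimal sketch of the three keyword arguments above, the example below reads a small made-up, semicolon-separated CSV. The data and column names are hypothetical, and an io.StringIO buffer stands in for a real file path so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content; the StringIO buffer plays the role of a file path
csv_text = """RollNo;Name;Marks;City
1;Asha;85;Delhi
2;Ravi;72;Pune
3;Meena;91;Delhi
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                                     # delimiter=";" works the same way
    usecols=["RollNo", "Name", "Marks"],         # keep only these columns
    dtype={"RollNo": "int32", "Marks": "float64"},  # column_name: data_type
)

print(df)
print(df.dtypes)
```

The City column is dropped by usecols, and Marks is stored as floating-point values because of the dtype dictionary.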
➢ The attribute ndim of a Pandas DataFrame yields the number of dimensions of the DataFrame (always 2 for a DataFrame).
➢ The attribute shape of a DataFrame returns a tuple of the form (n_rows, n_cols).
➢ The attribute index yields the labels along the rows (row labels) of a DataFrame. The attribute columns yields the labels used across the columns (column names) of a DataFrame.
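A short illustration of these attributes, using a small hypothetical DataFrame built in code rather than read from a file:

```python
import pandas as pd

# Small made-up DataFrame used only for illustration
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [85, 72, 91],
})

print(df.ndim)           # 2
print(df.shape)          # (3, 2)
print(list(df.index))    # [0, 1, 2]  (default row labels)
print(list(df.columns))  # ['Name', 'Marks']
```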
➢ The info() method of a DataFrame provides the following summary information about the DataFrame:
    • The number of rows and the range of row indexes.
    • The names of the columns, the number of non-null entries in each column, and the data type of each column.
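To see what info() reports, the sketch below uses a hypothetical DataFrame with one missing value, so the non-null counts differ between columns. info() prints to standard output by default; its buf argument is used here only to capture the same text for inspection.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [85, None, 91],   # one missing value, stored as NaN
})

# Prints rows/index range, column names, non-null counts and dtypes
df.info()

# Capture the same summary as a string
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```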

➢ The method head() returns the first n records of a DataFrame, while the method tail() returns the last n records of the DataFrame (n is 5 by default).
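A minimal example of head() and tail() on a hypothetical eight-row DataFrame:

```python
import pandas as pd

# Made-up data: values 10, 20, ..., 80
df = pd.DataFrame({"Value": range(10, 90, 10)})

first_rows = df.head()    # first 5 rows (default n=5)
last_rows = df.tail(3)    # last 3 rows

print(first_rows)
print(last_rows)
```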
➢ By default, row and column indexes begin with 0.
➢ The loc indexer is used to access elements of a Pandas DataFrame using row and column labels, which may be integer labels or string labels.
➢ The iloc indexer is used to access elements of a Pandas DataFrame by position, specifying an integer index or a slice for the rows and columns of interest.
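The contrast between label-based and position-based access can be sketched with a hypothetical DataFrame whose row labels are names. Note that a label slice with loc includes its end label, while a positional slice with iloc excludes its end position.

```python
import pandas as pd

# Made-up DataFrame with string row labels
df = pd.DataFrame(
    {"Marks": [85, 72, 91], "City": ["Delhi", "Pune", "Delhi"]},
    index=["Asha", "Ravi", "Meena"],
)

# loc: row label, column label
by_label = df.loc["Ravi", "Marks"]             # 72
label_slice = df.loc["Asha":"Ravi", ["City"]]  # rows Asha and Ravi (end included)

# iloc: row position, column position (both start at 0)
by_position = df.iloc[1, 0]                    # row 1, column 0 -> 72
pos_slice = df.iloc[0:2, 1:2]                  # rows 0-1, column 1 (end excluded)
```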
➢ The keyword argument index_col of pd.read_csv() can be used to set a column as the row labels while the file is being read.
➢ The method set_index() of a Pandas DataFrame is used to explicitly set a column as the index after a .csv file has been read into a DataFrame.
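Both ways of making a column the row labels are shown below on a small hypothetical CSV (again read from an io.StringIO buffer instead of a file). set_index() returns a new DataFrame and leaves the original unchanged.

```python
import io
import pandas as pd

csv_text = """RollNo,Name,Marks
101,Asha,85
102,Ravi,72
"""

# Option 1: set RollNo as the row labels while reading
df1 = pd.read_csv(io.StringIO(csv_text), index_col="RollNo")

# Option 2: read with the default 0-based index, then set the index afterwards
df2 = pd.read_csv(io.StringIO(csv_text))
df2_indexed = df2.set_index("RollNo")   # df2 itself keeps index 0, 1

print(df1.loc[102, "Name"])          # Ravi
print(df2_indexed.loc[102, "Name"])  # Ravi
```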
➢ Boolean indexing is a powerful technique for selecting the rows of a Pandas DataFrame that satisfy some condition.
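A minimal sketch of Boolean indexing on hypothetical student data: a comparison produces a Series of True/False values, and indexing the DataFrame with it keeps only the True rows. Conditions are combined with & (and) or | (or), each wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Marks": [85, 72, 91, 60],
    "City": ["Delhi", "Pune", "Delhi", "Pune"],
})

# Boolean Series: True where Marks > 80
mask = df["Marks"] > 80
high_scorers = df[mask]

# Two conditions combined with &
pune_pass = df[(df["Marks"] >= 65) & (df["City"] == "Pune")]
```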

➢ Pandas provides a powerful method describe(), which returns a DataFrame comprising statistics (count, mean, standard deviation, minimum, maximum, and quartiles) for each column of the DataFrame that stores numerical data. When include='all' is specified as a keyword argument, the summary also includes the count, unique, top, and freq (frequency) statistics for each non-numeric column.
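The difference between the default describe() and describe(include='all') can be seen on a hypothetical DataFrame with one numeric and two text columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Marks": [85, 72, 91, 60],
    "City": ["Delhi", "Pune", "Delhi", "Pune"],
})

# Default: numeric columns only (here just Marks)
numeric_summary = df.describe()

# include="all": text columns also get count, unique, top and freq
full_summary = df.describe(include="all")
print(full_summary)
```

For the text columns, the numeric rows (mean, std, quartiles) are shown as NaN, and for the numeric column the unique/top/freq rows are NaN.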
➢ Pandas provides various aggregation functions, such as sum(), mean(), max(), and min(), which allow us to obtain summary statistics for different groupings of data.

    1. sum(): Returns the sum of the values in a column, or along a specified axis, of a DataFrame.
    2. mean(): Returns the average value of a column, or along a specified axis, of a DataFrame.
    3. max(): Returns the maximum value in a column, or along a specified axis, of a DataFrame.
    4. min(): Returns the minimum value in a column, or along a specified axis, of a DataFrame.
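The four aggregation functions listed above can be tried on a small hypothetical marks table. Applied to one column they return a single value; applied to the whole DataFrame they return one value per column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [85, 72, 91], "Science": [78, 88, 95]},
    index=["Asha", "Ravi", "Meena"],
)

# On a single column
total_maths = df["Maths"].sum()    # 248
avg_maths = df["Maths"].mean()     # about 82.67

# On the whole DataFrame: one result per column
col_max = df.max()                 # Maths 91, Science 95
col_min = df.min()                 # Maths 72, Science 78
```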

➢ The default axis along which the operation is performed is 'index' (or 0), which means the operation is applied to each column, down its rows. Alternatively, the operation may be applied to each row, across its columns, by setting the axis to 'columns' (or 1).
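The effect of the axis argument is easiest to see with sum() on the same kind of hypothetical marks table: axis='index' gives a total per subject (column), while axis='columns' gives a total per student (row).

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [85, 72, 91], "Science": [78, 88, 95]},
    index=["Asha", "Ravi", "Meena"],
)

# axis='index' (or 0, the default): one total per column
per_subject = df.sum(axis="index")   # Maths 248, Science 261

# axis='columns' (or 1): one total per row
per_student = df.sum(axis=1)         # Asha 163, Ravi 160, Meena 186
```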

➢ Pandas allows us to set a column as the index of a DataFrame using the method set_index().



                                                                             Data Handling using Pandas DataFrame  65