➢ A CSV file can be read into a Pandas DataFrame using the function pd.read_csv().
➢ The function pd.read_csv() takes the path of the CSV file as input and returns a Pandas DataFrame object.
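As a minimal sketch of the point above, the following reads CSV data into a DataFrame. The student records are hypothetical, and an in-memory io.StringIO object stands in for a file path such as "students.csv" so that the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content (assumed for illustration only).
csv_text = """RollNo,Name,Marks
1,Asha,88
2,Ravi,92
3,Meena,79
"""

# pd.read_csv() accepts a file path or a file-like object
# and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```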
➢ The keyword argument sep (or its alias delimiter) of pd.read_csv() is used to specify the delimiter explicitly.
➢ The keyword argument usecols of pd.read_csv() is used to specify the list of required columns.
➢ The keyword argument dtype of pd.read_csv() is used to specify the data types of columns in the form of a dictionary comprising entries of the form column_name: data_type.
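The three keyword arguments described above can be combined in a single call. The sketch below assumes hypothetical semicolon-separated data with made-up column names; it is only an illustration of how sep, usecols, and dtype fit together.

```python
import io
import pandas as pd

# Hypothetical data that uses ';' instead of ',' as the delimiter.
csv_text = """RollNo;Name;Marks;City
1;Asha;88;Delhi
2;Ravi;92;Pune
3;Meena;79;Kochi
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                                 # delimiter used in the data
    usecols=["RollNo", "Name", "Marks"],     # read only these columns
    dtype={"RollNo": str, "Marks": float},   # column_name: data_type
)

print(df)
print(df.dtypes)
```

Here the City column is skipped, RollNo is read as text, and Marks is read as floating-point numbers.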
➢ The attribute ndim of a Pandas DataFrame yields the number of dimensions of the DataFrame, which is always 2.
➢ The attribute shape of a DataFrame returns a tuple of the form (n_rows, n_cols).
➢ The attribute index yields the labels along the rows (row labels) of a DataFrame. The attribute columns yields the labels used across columns (column names) of a DataFrame.
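The attributes above can be inspected on any DataFrame. A minimal sketch, using a small hypothetical DataFrame built directly from a dictionary:

```python
import pandas as pd

# Hypothetical data (assumed for illustration only).
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meena"],
                   "Marks": [88, 92, 79]})

print(df.ndim)            # 2
print(df.shape)           # (3, 2)
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(list(df.columns))   # ['Name', 'Marks']
```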
➢ The info() method of a DataFrame prints the following summary information about the DataFrame:
• The number of rows and the range of the row index.
• The name of each column, the number of non-null entries in it, and its data type.
➢ The method head() returns the first n records of a DataFrame, while the method tail() returns the last n records of the DataFrame. If n is not specified, it defaults to 5.
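The following sketch shows info(), head(), and tail() on a hypothetical ten-row DataFrame; the column names and values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: ten students and their marks.
df = pd.DataFrame({
    "RollNo": range(1, 11),
    "Marks": [88, 92, 79, 65, 71, 95, 58, 83, 77, 90],
})

# info() prints its summary directly and returns None.
df.info()

first_three = df.head(3)    # first 3 records
last_two = df.tail(2)       # last 2 records
default_head = df.head()    # n defaults to 5

print(first_three)
print(last_two)
```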
➢ By default, row labels are integers beginning with 0, and positional indexes for both rows and columns begin with 0.
➢ The loc indexer is used to access elements in a Pandas DataFrame using row and column labels, which may be integer labels or string labels. A label slice used with loc includes both of its end points.
➢ The iloc indexer is used to access elements in a Pandas DataFrame using positional indexing, by specifying an integer position or a slice for the rows and columns of interest. As with Python lists, the end position of a slice is excluded.
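To contrast label-based and position-based access, here is a minimal sketch on a hypothetical DataFrame whose row labels are the strings "r1" to "r3" (chosen only so that labels and positions are easy to tell apart).

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena"],
     "Marks": [88, 92, 79],
     "City": ["Delhi", "Pune", "Kochi"]},
    index=["r1", "r2", "r3"],
)

# loc: access by labels.
by_label = df.loc["r2", "Marks"]
label_slice = df.loc["r1":"r2", ["Name", "Marks"]]   # "r2" is included

# iloc: access by integer positions.
by_position = df.iloc[1, 1]
position_slice = df.iloc[0:2, 0:2]                   # position 2 is excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Both slices select the same two rows and two columns, although the label slice names its last row explicitly while the positional slice stops just before position 2.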
➢ The keyword argument index_col of pd.read_csv() can be used to set a column as the row labels while reading a CSV file.
➢ The method set_index() of a Pandas DataFrame is used to explicitly set a column as the index after reading a .csv file into a DataFrame. It returns a new DataFrame unless inplace=True is specified.
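The two ways of choosing row labels described above can be compared side by side. This is a sketch under assumed data: the CSV content and the RollNo column are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content (assumed for illustration only).
csv_text = """RollNo,Name,Marks
101,Asha,88
102,Ravi,92
103,Meena,79
"""

# Option 1: choose the index column while reading the file.
df1 = pd.read_csv(io.StringIO(csv_text), index_col="RollNo")

# Option 2: read first, then set the index afterwards.
df2 = pd.read_csv(io.StringIO(csv_text))
df2 = df2.set_index("RollNo")   # returns a new DataFrame

print(df1)
print(df2.loc[102, "Name"])
```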
➢ Boolean indexing is a powerful technique for filtering data from a Pandas DataFrame based on some condition: only the rows for which the condition evaluates to True are retained.
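A minimal sketch of Boolean indexing follows. The data and the threshold of 80 marks are assumptions made for illustration; note that combined conditions use & (and) or | (or), with each condition enclosed in parentheses.

```python
import pandas as pd

# Hypothetical data (assumed for illustration only).
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "Kabir"],
    "Marks": [88, 92, 79, 85],
    "City": ["Delhi", "Pune", "Delhi", "Delhi"],
})

# A comparison on a column yields a Boolean Series (one value per row).
mask = df["Marks"] > 80

# Indexing with the mask keeps only the rows where it is True.
high_scorers = df[mask]

# Two conditions combined with & (both must hold).
delhi_high = df[(df["Marks"] > 80) & (df["City"] == "Delhi")]

print(high_scorers)
print(delhi_high)
```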
➢ Pandas provides a powerful method, describe(), which returns a DataFrame comprising summary statistics (count, mean, standard deviation, minimum, maximum, and quartiles) for each column of the DataFrame that stores numerical data. When include='all' is specified as a keyword argument, the summary also includes the count, unique, top, and freq (frequency) statistics for each non-numeric column.
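The difference between the default summary and include='all' can be seen in the sketch below, which assumes a small hypothetical DataFrame with one text column and one numeric column.

```python
import pandas as pd

# Hypothetical data (assumed for illustration only).
df = pd.DataFrame({
    "City": ["Delhi", "Pune", "Delhi", "Kochi"],
    "Marks": [88, 92, 79, 92],
})

# Default: statistics for numeric columns only.
numeric_summary = df.describe()

# include='all': adds unique, top, and freq rows for non-numeric columns.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example, the mean of City) appear as NaN.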
➢ Pandas provides various aggregation functions, such as sum(), mean(), max(), and min(), which allow us to obtain various summary statistics for different groupings of data.
1. sum(): Returns the sum of the values in a column or along the specified axis of a DataFrame.
2. mean(): Returns the average of the values in a column or along the specified axis of a DataFrame.
3. max(): Returns the maximum value in a column or along the specified axis of a DataFrame.
4. min(): Returns the minimum value in a column or along the specified axis of a DataFrame.
➢ The default axis along which the operation is performed is 'index' or 0, which denotes that the operation is applied column-wise across rows, giving one result per column. Alternatively, we may apply the operation row-wise across columns by setting the axis to 'columns' or 1, giving one result per row.
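The effect of the axis argument on the aggregation functions is easiest to see on a small table. The sketch below assumes hypothetical marks of three students in two subjects.

```python
import pandas as pd

# Hypothetical marks (assumed for illustration only).
df = pd.DataFrame(
    {"Maths": [80, 90, 70], "Science": [85, 95, 65]},
    index=["Asha", "Ravi", "Meena"],
)

# Default axis (0 or 'index'): one result per column.
subject_totals = df.sum()                 # Maths 240, Science 245
subject_means = df.mean(axis="index")

# axis=1 or 'columns': one result per row.
student_totals = df.sum(axis=1)           # Asha 165, Ravi 185, Meena 135
student_best = df.max(axis="columns")     # Asha 85, Ravi 95, Meena 70

# Aggregation on a single column returns a single value.
lowest_maths = df["Maths"].min()          # 70

print(subject_totals)
print(student_totals)
```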
➢ Pandas allows us to set a column (attribute) of a DataFrame as its index using the method set_index().

