Page 221 - Touhpad Ai

Basic DataFrame Manipulation
                 Although you have already studied most of these functions in previous topics, the following section gives a brief
                 outline of commonly used built-in pandas functions and methods for manipulating Kaggle datasets and other data in Python.


                 Inspecting Data
                 These functions and attributes help you understand the structure, content, and summary statistics of a DataFrame:
                        Function / Attribute                               Description
                  df.head()              Displays the first few rows (default = 5) of the DataFrame.
                  df.tail()              Displays the last few rows (default = 5) of the DataFrame.
                  df.info()              Prints a concise summary of the DataFrame: column names, data types,
                                         non-null counts, and memory usage.
                  df.describe()          Generates descriptive statistics for numerical columns (count, mean, std, min,
                                         quartiles, and max; the 50% quartile is the median).
                  df.shape               Attribute: returns the number of rows and columns as a tuple (rows, columns).
                  df.columns             Attribute: returns the column labels as an Index (use list(df.columns) for a plain list).
                  df.dtypes              Attribute: displays the data type of each column.
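                 A minimal sketch of these inspection tools, assuming a small made-up dataset in place of a Kaggle CSV (the
                 column names name, age, and marks are hypothetical). With a real file you would load it with pd.read_csv().

```python
import pandas as pd

# Hypothetical sample data standing in for a downloaded dataset.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia", "Eli", "Farah"],
    "age": [14, 15, 14, 16, 15, 14],
    "marks": [88.0, 92.5, None, 75.0, 81.0, 95.0],  # None becomes NaN
})

print(df.head())         # first 5 rows
print(df.tail(2))        # last 2 rows
df.info()                # prints column names, dtypes, non-null counts, memory usage
print(df.describe())     # count, mean, std, min, 25%, 50%, 75%, max for age and marks
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels as a plain list
print(df.dtypes)         # data type of each column
```

                 Note that df.describe() reports a count of 5 for marks, because the missing value is excluded.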

                 Selecting Data

                 You can extract specific data (rows, columns, or filtered subsets) from a DataFrame using the following methods:
                                 Syntax                                              Description
                  df['column_name']                       Selects a single column (returns a Series).
                  df[['col1', 'col2']]                    Selects multiple columns (returns a DataFrame).
                  df.loc[row_label, 'column_name']        Selects data using labels (row and column names).
                  df.iloc[row_position, column_position]  Selects data using integer positions (row and column numbers, starting at 0).
                  df[df['column'] > value]                Filters rows based on a condition (Boolean indexing).
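                 The short sketch below contrasts the selection styles. It assumes a hypothetical DataFrame whose rows are labelled
                 with student names, so that label-based .loc and position-based .iloc are easy to tell apart.

```python
import pandas as pd

# Hypothetical data; string row labels make .loc vs .iloc visible.
df = pd.DataFrame(
    {"age": [14, 15, 16], "marks": [88.0, 92.5, 75.0]},
    index=["Asha", "Ben", "Chen"],
)

ages = df["age"]                    # single column -> Series
subset = df[["age", "marks"]]       # list of columns -> DataFrame
ben_marks = df.loc["Ben", "marks"]  # row label "Ben", column label "marks"
first_age = df.iloc[0, 0]           # first row, first column by position
high = df[df["marks"] > 80]         # Boolean mask keeps rows where marks > 80

print(ben_marks, first_age)
print(high)
```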


                 Adding or Removing Columns
                 The following operations add or remove columns in a dataset:
                                       Syntax                                              Description
                  df['new_column'] = values                           Adds a new column to the DataFrame.
                  df.drop('column_name', axis=1, inplace=True)        Removes an existing column; inplace=True modifies df directly
                                                                      instead of returning a new DataFrame.
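                 A minimal sketch of adding and removing columns, assuming hypothetical maths and science columns. It also shows
                 that drop() without inplace=True leaves the original DataFrame unchanged and returns a modified copy.

```python
import pandas as pd

df = pd.DataFrame({"maths": [80, 90], "science": [70, 85]})

# Add a new column computed element-wise from existing columns.
df["total"] = df["maths"] + df["science"]

# Without inplace=True, drop() returns a new DataFrame; df keeps 'science'.
trimmed = df.drop("science", axis=1)

# With inplace=True, df itself is changed and drop() returns None.
df.drop("total", axis=1, inplace=True)

print(trimmed)
print(df)
```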

                 Modifying Data

                 The following methods modify the values in a column. Each returns a new Series, so assign the result back to the
                 column to keep the change:
                                  Function                                           Description
                  df['column'] = df['column'].fillna(value)    Fills missing (NaN) values in a column with a given value.
                  df['column'] = df['column'].astype(dtype)    Converts a column to a different data type.
                  df['column'] = df['column'].apply(function)  Applies a custom or built-in function to each value in a column.
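                 The sketch below chains the three modifications on a hypothetical marks column. The pass mark of 40 and the
                 Pass/Fail labels are made-up assumptions for illustration. Filling the missing value first matters here, because
                 astype(int) raises an error on NaN.

```python
import pandas as pd

df = pd.DataFrame({"marks": [88.0, None, 75.0]})

# Replace NaN with 0 and assign the result back to the column.
df["marks"] = df["marks"].fillna(0)

# Convert float marks to integers (only safe once NaN values are gone).
df["marks"] = df["marks"].astype(int)

# Apply a custom function to every value (hypothetical pass mark of 40).
df["grade"] = df["marks"].apply(lambda m: "Pass" if m >= 40 else "Fail")

print(df)
```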

                 Sorting Data
                 The df.sort_values('column_name', ascending=False) method returns a copy of the DataFrame sorted by a column in
                 descending order (use ascending=True, the default, for ascending order).
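                 A minimal sketch with hypothetical name and marks columns, showing both sort directions and that the original
                 DataFrame keeps its row order:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "marks": [88, 95, 75]})

top_first = df.sort_values("marks", ascending=False)  # highest marks first
low_first = df.sort_values("marks")                   # ascending=True is the default

print(top_first)
print(low_first)
```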

                 Grouping Data

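                 The df.groupby('column_name').mean() method groups rows by the values in a column and calculates the mean of each
                 numeric column within each group. In pandas 2.0 and later this raises an error if the DataFrame also contains
                 text columns, so use df.groupby('column_name').mean(numeric_only=True) or select the column to average first.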
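                 A minimal sketch, assuming a hypothetical class column to group by and a text column (name) that cannot be
                 averaged. Both forms below avoid the error that a plain .mean() would raise in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["A", "B", "A", "B"],
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "marks": [80, 90, 70, 100],
})

# Average every numeric column per group; 'name' is skipped.
class_means = df.groupby("class").mean(numeric_only=True)

# Or pick the column first to get a Series of group means.
marks_by_class = df.groupby("class")["marks"].mean()

print(class_means)
print(marks_by_class)
```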





                                                                      Theoretical and Practical Aspects of Data Processing  219