Page 221 - Touhpad Ai
Basic DataFrame Manipulation
Although you have already studied most of these functions in previous topics, the following section provides a brief
outline of commonly used built-in pandas methods and attributes for manipulating Kaggle datasets and other data in Python.
Inspecting Data
These methods and attributes help you understand the structure, content, and summary of your DataFrame:
Function        Description
df.head()       Displays the first few rows (default = 5) of the DataFrame.
df.tail()       Displays the last few rows (default = 5) of the DataFrame.
df.info()       Prints a concise summary of the DataFrame: column names, data types, and non-null counts.
df.describe()   Generates descriptive statistics for numerical columns (count, mean, std, min, quartiles, max); the median appears as the 50% row.
df.shape        Returns a tuple with the number of rows and columns (an attribute, so it takes no parentheses).
df.columns      Returns the column labels as an Index object; use list(df.columns) to get a plain list.
df.dtypes       Returns a Series giving the data type of each column.
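The inspection calls above can be tried on a small DataFrame. The following is a minimal sketch; the column names and values are made up for illustration and stand in for a real Kaggle dataset.

```python
import pandas as pd

# Hypothetical sample data (not from any real dataset).
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia", "Eli", "Fay"],
    "age": [23, 35, 31, 28, 42, 19],
    "city": ["Delhi", "Pune", None, "Agra", "Pune", "Delhi"],
})

first_rows = df.head()            # first 5 rows by default
last_two = df.tail(2)             # last 2 rows
df.info()                         # prints column names, dtypes, non-null counts
summary = df.describe()           # statistics for the numeric column only
n_rows, n_cols = df.shape         # shape is an attribute, not a method
column_names = list(df.columns)   # df.columns is an Index; list() converts it
column_types = df.dtypes          # Series mapping each column to its dtype

print(first_rows)
print(summary)
print(n_rows, n_cols, column_names)
```

Note that df.info() prints its report directly, while the other calls return objects that you can store or print.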
Selecting Data
You can extract specific data (rows, columns, or filtered subsets) from a DataFrame using the following methods:
Function                            Description
df['column_name']                   Selects a single column.
df[['col1', 'col2']]                Selects multiple columns.
df.loc[row_index, 'column_name']    Selects data using labels (row and column names).
df.iloc[row_index, column_index]    Selects data using integer positions (row and column numbers).
df[df['column'] > value]            Filters rows based on a condition.
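As an illustration of the selection methods listed above, here is a short sketch. The row labels r1 to r3 and the column values are hypothetical, chosen so that label-based and position-based selection are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [23, 35, 31],
        "city": ["Delhi", "Pune", "Agra"],
    },
    index=["r1", "r2", "r3"],
)

ages = df["age"]                   # single column, returned as a Series
subset = df[["name", "city"]]      # several columns, returned as a DataFrame
by_label = df.loc["r2", "name"]    # row labelled "r2", column "name"
by_position = df.iloc[0, 1]        # first row, second column (positions start at 0)
over_30 = df[df["age"] > 30]       # keeps only rows where the condition is True

print(by_label, by_position)
print(over_30)
```

The key difference is that loc looks up the labels you see in the index and column headers, whereas iloc counts positions from zero regardless of the labels.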
Adding or Removing Columns
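The following operations add or remove columns in a DataFrame:
Function                                        Description
df['new_column'] = values                       Adds a new column to the DataFrame (or overwrites it if the name already exists).
df.drop('column_name', axis=1, inplace=True)    Removes an existing column from df in place; without inplace=True, drop() returns a new DataFrame and leaves df unchanged.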
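A brief sketch of adding and removing columns follows. The price and quantity columns are invented for the example; it also shows, as a point of comparison, what drop() does when inplace=True is left out.

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({"price": [100, 250, 80], "quantity": [2, 1, 5]})

# Add a column computed from existing columns.
df["total"] = df["price"] * df["quantity"]

# Remove a column in place: df itself is changed and nothing is returned.
df.drop("quantity", axis=1, inplace=True)

# Without inplace=True, drop() returns a new DataFrame and df keeps its columns.
trimmed = df.drop(columns=["price"])

print(df)
print(trimmed)
```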
Modifying Data
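The following functions are used to modify data in a DataFrame:
Function                                    Description
df['column'] = df['column'].fillna(value)   Fills missing (NaN) values in a column with a given value. Assigning the result back is more reliable than calling fillna(..., inplace=True) on a selected column, which recent pandas versions may not apply to the original DataFrame.
df['column'].astype(dtype)                  Returns the column converted to a different data type; assign the result back to keep it.
df['column'].apply(function)                Applies a custom or built-in function to each value in a column and returns the results as a new Series.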
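The three modification functions can be combined as in the sketch below. The score and grade columns are hypothetical. Each result is assigned back to the column, which is an assumption about preferred style rather than the only option; it avoids relying on inplace=True for a selected column, which recent pandas versions may not propagate to the DataFrame.

```python
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({"score": [88.0, None, 72.5], "grade": ["a", "b", "c"]})

# Replace the missing value with 0.
df["score"] = df["score"].fillna(0)

# Convert floats to integers (the fractional part is dropped, not rounded).
df["score"] = df["score"].astype(int)

# Apply a built-in string method to every value in the column.
df["grade"] = df["grade"].apply(str.upper)

print(df)
```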
Sorting Data
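The df.sort_values('column_name', ascending=False) method returns a new DataFrame sorted by the given column in descending order
(use ascending=True, the default, for ascending order). The original DataFrame is left unchanged unless you assign the result back.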
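Sorting in both directions might look like the following sketch, which uses made-up names and ages.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "age": [23, 35, 31]})

oldest_first = df.sort_values("age", ascending=False)   # descending order
youngest_first = df.sort_values("age")                  # ascending is the default

print(oldest_first)
print(youngest_first)
```

Both calls return new DataFrames, so df keeps its original row order.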
Grouping Data
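The df.groupby('column_name').mean() function groups data by a column and calculates the mean of each numeric column within each group.
In pandas 2.0 and later, pass numeric_only=True (or select the numeric columns first) if the DataFrame also contains non-numeric columns; otherwise the call raises an error.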
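A small grouping sketch is shown below. The city, sales, and rep columns are hypothetical; rep is deliberately a text column to show why the numeric columns are selected explicitly or numeric_only=True is passed.

```python
import pandas as pd

# Hypothetical sales data with one text column besides the grouping key.
df = pd.DataFrame({
    "city": ["Delhi", "Pune", "Delhi", "Pune"],
    "sales": [100, 200, 300, 400],
    "rep": ["A", "B", "C", "D"],
})

# Mean of one chosen column per group (returns a Series indexed by city).
mean_sales = df.groupby("city")["sales"].mean()

# Mean of every numeric column per group; the text column "rep" is skipped.
all_numeric = df.groupby("city").mean(numeric_only=True)

print(mean_sales)
print(all_numeric)
```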

