Page 223 - Touhpad Ai
Data Transformation
The following techniques are used to make data more suitable for modelling or analysis:
Technique        Description
Normalization    Scales data between 0 and 1.
Standardisation  Converts data to have a mean of 0 and standard deviation of 1.
Binning          Groups continuous data into categories or intervals.
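The three techniques above can be sketched with Pandas as follows. This is a minimal illustration, not a prescribed method: the values in hours, the bin edges and the labels "low", "medium" and "high" are all made up for the example.

```python
import pandas as pd

# Hypothetical sample values (assumed for illustration only)
hours = pd.Series([0.5, 1.5, 2.0, 3.5, 4.5], name="hours")

# Normalization (min-max scaling): smallest value becomes 0, largest becomes 1
normalized = (hours - hours.min()) / (hours.max() - hours.min())

# Standardisation (z-score): subtract the mean, divide by the standard deviation.
# ddof=0 selects the population standard deviation, so the result has std 1.
standardized = (hours - hours.mean()) / hours.std(ddof=0)

# Binning: group continuous values into the intervals (0, 1], (1, 3], (3, 5]
bins = [0, 1, 3, 5]                   # assumed interval edges
labels = ["low", "medium", "high"]    # assumed category names
binned = pd.cut(hours, bins=bins, labels=labels)

result = pd.DataFrame({
    "hours": hours,
    "normalized": normalized,
    "standardized": standardized,
    "binned": binned,
})
print(result)
```

Min-max scaling is sensitive to extreme values, because a single outlier sets the range for every other value. Standardisation is often preferred when the data contains outliers or when the method that follows assumes data centred on zero.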
Creating and Manipulating DataFrames from Kaggle Datasets
Once you’ve downloaded a dataset (usually in CSV format), you can use Python and the Pandas library to work with it.
For example, suppose we want to study the usage of AI tools by college students in India. We search for a suitable
dataset on Kaggle and download it. Let us now study and analyse this dataset using Pandas.
Scan the QR code or visit the link to get the dataset:
https://www.kaggle.com/datasets/rakeshkapilavai/ai-tool-usage-by-indian-college-students-2025
1. Import Libraries
import pandas as pd
pd.set_option('display.max_columns', None) # Show all columns
2. Load the Dataset into a DataFrame
df = pd.read_csv('Students.csv') # Replace with your file path
3. View the First Few Rows
print(df.head())
Output:
   Student_Name                                     College_Name       Stream  \
0         Aarav       Indian Institute of Information Technology  Engineering
1        Vivaan   Government Ram Bhajan Rai NES College, Jashpur     Commerce
2        Aditya     Dolphin PG Institute of BioMedical & Natural      Science
3        Vihaan  Shaheed Rajguru College of Applied Sciences for         Arts
4         Arjun                   Roorkee College of Engineering      Science
   Year_of_Study  AI_Tools_Used  Daily_Usage_Hours  \
0              4         Gemini                0.9
1              2        ChatGPT                3.4
2              2        Copilot                3.6
3              2        Copilot                2.9
4              1         Gemini                0.9
                       Use_Cases  Trust_in_AI_Tools  Impact_on_Grades  \
0       Assignments, Coding Help                  2                 2
1            Learning new topics                  3                -3
2         MCQ Practice, Projects                  5                 0
3                Content Writing                  5                 2
4  Doubt Solving, Resume Writing                  1                 3
   Do_Professors_Allow_Use  Preferred_AI_Tool  Awareness_Level  \
0                       No            Copilot                9
1                      Yes              Other                6
2                       No             Gemini                1
3                      Yes             Gemini                5
4                      Yes              Other                8
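Once the rows are visible, a DataFrame can be filtered, extended and summarised. The sketch below rebuilds four of the columns from the five rows shown in the output above, so it runs without the downloaded file. The column Weekly_Usage_Hours and the 3-hour threshold are assumptions introduced for illustration; they are not part of the dataset.

```python
import pandas as pd

# Five rows copied from the head() output above (a subset of the columns)
df = pd.DataFrame({
    "Student_Name": ["Aarav", "Vivaan", "Aditya", "Vihaan", "Arjun"],
    "Year_of_Study": [4, 2, 2, 2, 1],
    "AI_Tools_Used": ["Gemini", "ChatGPT", "Copilot", "Copilot", "Gemini"],
    "Daily_Usage_Hours": [0.9, 3.4, 3.6, 2.9, 0.9],
})

# Filter rows: students using AI tools for more than 3 hours a day
# (the 3-hour threshold is an assumed cut-off)
heavy_users = df[df["Daily_Usage_Hours"] > 3]
print(heavy_users[["Student_Name", "Daily_Usage_Hours"]])

# Add a derived column (hypothetical, not present in the dataset)
df["Weekly_Usage_Hours"] = df["Daily_Usage_Hours"] * 7

# Group by tool and compute the average daily usage for each group
mean_hours = df.groupby("AI_Tools_Used")["Daily_Usage_Hours"].mean()
print(mean_hours)
```

With the full file loaded through pd.read_csv, the same three operations apply unchanged. Only the first DataFrame construction step is specific to this sketch.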

