                      Awareness_Level
count   3614.000000       3614.000000
mean       0.003320          5.828445
std        2.370706          2.925481
min       -5.000000          1.000000
25%       -2.000000          3.000000
50%        0.000000          6.000000
75%        2.000000          8.000000
max        5.000000         10.000000
6. Clean the Data
• Check for missing values:
print(df.isnull().sum())
• Drop duplicates:
df = df.drop_duplicates()
• Remove rows with missing critical columns (e.g., tool name or study year):
df = df.dropna(subset=['AI_Tools_Used', 'Year_of_Study'])
• Standardise text (lowercase tool names):
df['AI_Tools_Used'] = df['AI_Tools_Used'].str.lower().str.strip()
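The cleaning steps above can be run end to end on a small made-up DataFrame; the column names match the dataset, but the sample values below are invented purely for illustration:

```python
import pandas as pd

# Hypothetical sample with the issues the cleaning steps target:
# a duplicate row, a missing tool name, and inconsistent capitalisation.
df = pd.DataFrame({
    'AI_Tools_Used': ['ChatGPT ', 'chatgpt', 'ChatGPT ', None, 'Gemini'],
    'Year_of_Study': [1, 2, 1, 3, None],
})

print(df.isnull().sum())                     # check for missing values
df = df.drop_duplicates()                    # drop the duplicate row
df = df.dropna(subset=['AI_Tools_Used', 'Year_of_Study'])  # drop rows missing critical columns
df['AI_Tools_Used'] = df['AI_Tools_Used'].str.lower().str.strip()  # standardise text
print(df)
```

After these steps only complete, de-duplicated rows remain, and every tool name is lowercase with no stray spaces.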
Now that the data cleaning is complete, let us explore and analyse the dataset.
1. How many students responded?
print("Total responses:", df.shape[0])
Output:
Total responses: 3614
2. Percentage of students using AI tools vs. not using them
# Step 1: Confirm the column is lowercase and stripped (already done during cleaning)
df['AI_Tools_Used'] = df['AI_Tools_Used'].str.strip().str.lower()
# Step 2: Create 'AI_User' column - assumes every entry is a 'Yes', since non-users were removed
df['AI_User'] = 'Yes' # All remaining rows have valid AI tool usage
# Step 3: Calculate usage percentage
usage_counts = df['AI_User'].value_counts(normalize=True) * 100
# Step 4: Print result
print("Percentage of Students Using AI Tools:")
print(usage_counts)
Output:
Percentage of Students Using AI Tools:
AI_User
Yes 100.0
Name: proportion, dtype: float64
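The 100% figure follows from the cleaning step: rows without a tool name were dropped, so every remaining respondent counts as a user. On a dataset that still contained non-users, the same value_counts(normalize=True) call would split the percentage accordingly; a small hypothetical illustration (the responses are invented):

```python
import pandas as pd

# Hypothetical responses that still include non-users.
responses = pd.Series(['Yes', 'Yes', 'No', 'Yes'], name='AI_User')

# Same calculation as above: proportions scaled to percentages.
usage_counts = responses.value_counts(normalize=True) * 100
print(usage_counts)
# Yes    75.0
# No     25.0
```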
3. Top 5 AI tools used
from collections import Counter
# Step 1: Drop missing values in AI_Tools_Used (already handled, but safe)
ai_tools_series = df['AI_Tools_Used'].dropna()
# Step 2: Split each row on comma, flatten the list, and strip spaces
all_tools = [tool.strip() for tool in ai_tools_series.str.split(',').sum()]
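The counting step would use the Counter imported above: tally each tool and take the five most frequent. A minimal sketch on invented sample data (the tool names and counts are illustrative only):

```python
import pandas as pd
from collections import Counter

# Hypothetical cleaned responses; each cell may list several tools.
ai_tools_series = pd.Series([
    'chatgpt, gemini',
    'chatgpt',
    'copilot, chatgpt',
    'gemini',
])

# Split each row on comma, flatten the lists, and strip spaces.
all_tools = [tool.strip() for tool in ai_tools_series.str.split(',').sum()]

# Count occurrences and show the five most common tools.
tool_counts = Counter(all_tools)
print(tool_counts.most_common(5))
# [('chatgpt', 3), ('gemini', 2), ('copilot', 1)]
```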
Theoretical and Practical Aspects of Data Processing 223

