Page 225 - Touhpad Ai

Awareness_Level
                   count       3614.000000       3614.000000
                   mean           0.003320          5.828445
                   std            2.370706          2.925481
                   min           -5.000000          1.000000
                   25%           -2.000000          3.000000
                   50%            0.000000          6.000000
                   75%            2.000000          8.000000
                   max            5.000000         10.000000
                 6.  Clean the Data
                   • Check for missing values:
                      print(df.isnull().sum())
                   • Drop duplicates:
                      df = df.drop_duplicates()
                   • Remove rows with missing critical columns (e.g., tool name or study year):
                      df = df.dropna(subset=['AI_Tools_Used', 'Year_of_Study'])
                   • Standardise text (lowercase tool names and trim surrounding spaces):
                      df['AI_Tools_Used'] = df['AI_Tools_Used'].str.lower().str.strip()
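                 The cleaning steps above can be tried on a small, made-up DataFrame before running them on the full survey. This is only a minimal sketch: the rows below are invented for illustration and are not taken from the dataset; only the column names AI_Tools_Used and Year_of_Study come from the text.

```python
import pandas as pd

# Hypothetical sample rows (NOT the real survey data)
sample = pd.DataFrame({
    'AI_Tools_Used': [' ChatGPT ', 'Gemini', None, ' ChatGPT ', 'Copilot'],
    'Year_of_Study': [1, 2, 3, 1, None],
})

# Count missing values in each column
print(sample.isnull().sum())

# Drop exact duplicate rows (the second ' ChatGPT ' / year 1 row)
cleaned = sample.drop_duplicates()

# Drop rows that lack a tool name or a study year; copy so the
# column assignment below works on an independent DataFrame
cleaned = cleaned.dropna(subset=['AI_Tools_Used', 'Year_of_Study']).copy()

# Lowercase the tool names and trim surrounding spaces
cleaned['AI_Tools_Used'] = cleaned['AI_Tools_Used'].str.lower().str.strip()
print(cleaned)
```

                 Running the steps on a toy table like this makes it easy to see which rows each step removes before applying them to all the responses.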
                 Now that the data has been cleaned, let us explore and analyse the dataset.
                 1.  How many students responded?
                    print("Total responses:", df.shape[0])
                   Output:
                    Total responses: 3614
                 2.  Percentage of students using AI tools vs Not using
                    # Step 1: Confirm the column is lowercase and stripped (already done during cleaning)
                    df['AI_Tools_Used'] = df['AI_Tools_Used'].str.strip().str.lower()
                    # Step 2: Create 'AI_User' column - assumes any entry is a 'Yes',
                    # since rows without a tool name were removed during cleaning
                    df['AI_User'] = 'Yes'  # All remaining rows have valid AI tool usage
                    # Step 3: Calculate usage percentage
                    usage_counts = df['AI_User'].value_counts(normalize=True) * 100
                    # Step 4: Print result
                    print("Percentage of Students Using AI Tools:")
                    print(usage_counts)
                   Output:
                    Percentage of Students Using AI Tools:
                    AI_User
                    Yes    100.0
                    Name: proportion, dtype: float64
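                 The 100% figure follows directly from the cleaning step, which dropped every row without a tool name, so it cannot show a "Yes" versus "No" split. If non-users are to be counted, one option is to derive AI_User before those rows are dropped. The sketch below is illustrative only: the rows are invented, and it assumes non-users appear either as missing values or as the text 'none' (the source does not say how non-users are recorded).

```python
import pandas as pd

# Hypothetical responses; None and 'none' are ASSUMED to mark non-users
raw = pd.DataFrame({
    'AI_Tools_Used': ['chatgpt', None, 'gemini, chatgpt', 'none'],
})

# Normalise the text the same way as in the cleaning step
tools = raw['AI_Tools_Used'].str.strip().str.lower()

# A student counts as a user only if a real tool name is present
is_user = tools.notna() & (tools != 'none')
raw['AI_User'] = is_user.map({True: 'Yes', False: 'No'})

# Percentage of users vs non-users
usage_counts = raw['AI_User'].value_counts(normalize=True) * 100
print(usage_counts)
```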
                 3.  Top 5 AI tools used
                   from collections import Counter
                   # Step 1: Drop missing values in AI_Tools_Used (already handled, but safe)
                   ai_tools_series = df['AI_Tools_Used'].dropna()
                   # Step 2: Split each row on commas, flatten into one list, and strip spaces
                   all_tools = ai_tools_series.str.split(',').explode().str.strip().tolist()

                                                                      Theoretical and Practical Aspects of Data Processing  223