Page 234 - Touchpad AI

AT A GLANCE

                  •   Data cleaning is the process of fixing messy data by removing duplicates, correcting spelling errors, filling
                      missing values, and fixing wrong formats.

                  •   We learned the steps of data cleaning: identifying problems, fixing the data, formatting it properly, and
                      finally validating it for correctness.

                  •   Using Pandas in Python, we can clean data with functions like dropna(), fillna(), drop_duplicates(),
                      replace(), and str.strip().

                  •   Kaggle is a free online platform with thousands of real-world datasets for practice and analysis.

                  •   Data transformation means changing how data looks or is structured, such as converting date formats or units,
                      or replacing words with numbers for easier analysis.

                  •   Data standardization is about making data follow a fixed, uniform format so it becomes easy to combine,
                      compare, and analyse from different sources.

                  •   We can standardise values using:
                    o    Z-score normalization (to rescale data so that mean = 0 and standard deviation = 1),
                    o    Scale adjustment (like converting pounds to kilograms),
                    o    Feature scaling (changing values to the range 0–1),
                    o    and text/date formatting (like changing “YES” to “Yes”).

                  •   Real-life examples of data standardization include:
                    o    Student records having consistent roll numbers and phone formats.
                    o    Product sizes written in full instead of just ‘S’, ‘M’, ‘L’.
                    o    Dates and currency written in the same style across all data.
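
The Pandas cleaning functions named in the summary above can be tried on a small, made-up table. The following is a minimal sketch, not a fixed recipe: the table, its column names (name, city, marks), the helper name clean() and the choice of filling missing marks with the column mean are all assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical student records with typical problems:
# extra spaces, a spelling variant, a duplicate row and missing values.
raw = pd.DataFrame({
    "name":  [" Asha", "Ravi ", "Ravi ", "Meena", None],
    "city":  ["Delhi", "Dilli", "Dilli", "Mumbai", "Pune"],
    "marks": [82.0, 74.0, 74.0, None, 90.0],
})


def clean(df):
    """Return a cleaned copy of the hypothetical student table."""
    out = df.copy()
    # str.strip() removes leading and trailing spaces from text values.
    out["name"] = out["name"].str.strip()
    # replace() corrects a known spelling variant.
    out["city"] = out["city"].replace({"Dilli": "Delhi"})
    # dropna() removes rows where the name is missing.
    out = out.dropna(subset=["name"])
    # drop_duplicates() keeps only the first copy of repeated rows.
    out = out.drop_duplicates()
    # fillna() fills missing marks; using the column mean is an assumption.
    mean_marks = out["marks"].mean()
    out = out.fillna({"marks": mean_marks})
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

After cleaning, isnull() can be used to validate the result: cleaned.isnull().sum() counts the missing values left in each column.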
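
The standardisation methods listed in the summary above can be written out in a few lines of Pandas. This is a rough sketch under stated assumptions: the sample table and its column names (weight_lb, score, passed, joined) are made up, the z-score uses the population standard deviation (ddof=0), feature scaling is shown as min-max scaling, and the dates are assumed to arrive in day/month/year form.

```python
import pandas as pd

# Hypothetical data: weights in pounds, test scores, a yes/no text column
# and dates written as day/month/year.
df = pd.DataFrame({
    "weight_lb": [110.0, 132.0, 154.0],
    "score":     [40.0, 60.0, 80.0],
    "passed":    ["YES", "yes", " Yes "],
    "joined":    ["05/01/2024", "17/03/2024", "29/11/2023"],
})

# Z-score normalization: subtract the mean, divide by the standard deviation.
# ddof=0 (population standard deviation) is an assumption of this sketch.
df["score_z"] = (df["score"] - df["score"].mean()) / df["score"].std(ddof=0)

# Feature scaling (min-max): squeeze values into the range 0 to 1.
low, high = df["score"].min(), df["score"].max()
df["score_01"] = (df["score"] - low) / (high - low)

# Scale adjustment: convert pounds to kilograms (1 lb = 0.45359237 kg).
df["weight_kg"] = df["weight_lb"] * 0.45359237

# Text formatting: trim spaces and use one capitalisation style.
df["passed"] = df["passed"].str.strip().str.capitalize()

# Date formatting: parse day/month/year and rewrite as year-month-day.
df["joined"] = pd.to_datetime(df["joined"], format="%d/%m/%Y").dt.strftime("%Y-%m-%d")

print(df)
```

Both score_z and score_01 carry the same information as score; only the scale changes, which is what makes columns from different sources easier to compare.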











                                                            EXERCISE



                                                                                                 Solved Questions
                                                SECTION A   (Objective Type Questions)
            AI   QUIZ
              A.  Tick (✓) the correct option.

                  1.  Choose the Python library that is commonly used for cleaning data.
                      a.  Matplotlib                                  b.  NumPy
                      c.  Pandas                                      d.  Seaborn

                  2.  Identify the function that checks for missing values in a DataFrame.
                      a.  isnull()                                    b.  dropna()
                      c.  fillna()                                    d.  isempty()


                 232    Touchpad Artificial Intelligence - XI