
              UNIT 5

              THEORETICAL AND PRACTICAL ASPECTS OF DATA PROCESSING





              LEARNING OBJECTIVES
                 Introduction to Data Cleaning
                 Introduction to NumPy
                 Introduction to Pandas
                 Introduction to Kaggle
                 Data Transformation and Standardisation

              Introduction to Data Cleaning

              In the field of Artificial Intelligence (AI) and data science, we deal with large volumes of data collected from multiple
              sources such as websites, mobile applications, sensors, surveys, and social media platforms. Before this data can be
              used to train machine learning models or make predictions, it must be clean, consistent, and accurate. This essential
              process is known as data cleaning or data preprocessing.
              Raw data is often messy or unstructured. It may contain missing values, incorrect or inconsistent entries, spelling
              errors, duplicate records, or values stored in the wrong data type (for example, dates stored as text). If such data is used
              without cleaning, AI models may produce inaccurate predictions, which in turn lead to poor decision-making.
              Therefore, data cleaning is a critical step in every data science workflow, ensuring that the dataset used for analysis
              and model training is reliable and ready for accurate insights.
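
The kinds of problems listed above can be spotted with a few simple checks. The following is a minimal sketch in plain Python, assuming a small made-up set of survey records; the field names (`name`, `city`, `age`, `joined`) and the helper functions are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical survey records: every name and value below is made up.
raw_records = [
    {"name": "Asha", "city": "Delhi", "age": 16, "joined": "2024-01-15"},
    {"name": "Ravi", "city": "delhi ", "age": None, "joined": "2024-02-03"},
    {"name": "Asha", "city": "Delhi", "age": 16, "joined": "2024-01-15"},
    {"name": "Meera", "city": "Mumbai", "age": 17, "joined": "03/02/2024"},
]


def count_missing(records):
    """Count fields whose value is None or an empty string."""
    return sum(
        1
        for row in records
        for value in row.values()
        if value is None or value == ""
    )


def count_duplicates(records):
    """Count rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in records:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def distinct_spellings(records, field):
    """List every distinct raw spelling found in one text field."""
    return sorted({row[field] for row in records})


def non_iso_dates(records, field="joined"):
    """Return date strings that are not written as YYYY-MM-DD."""
    bad = []
    for row in records:
        try:
            date.fromisoformat(row[field])
        except ValueError:
            bad.append(row[field])
    return bad


print("Missing values:", count_missing(raw_records))
print("Duplicate rows:", count_duplicates(raw_records))
print("City spellings:", distinct_spellings(raw_records, "city"))
print("Badly formatted dates:", non_iso_dates(raw_records))
```

Each check only reports a problem; it does not change the data. Deciding how to fix each issue is a separate step.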

              Data cleaning (also called data cleansing or data preprocessing) is the process of detecting and correcting errors in a
              dataset to make it accurate, consistent, and ready for analysis. It helps to:
              •  Remove errors and mistakes
              •  Fill or fix missing values
              •  Organise the data properly
              •  Make the data reliable and ready to use
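
The four tasks above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records are made up, the function names `parse_date` and `clean_records` are hypothetical, missing ages are filled with the median of the known ages (one of several possible choices), and a date such as `03/02/2024` is assumed to mean day/month/year.

```python
from datetime import datetime
from statistics import median

# Hypothetical survey records: every name and value below is made up.
raw_records = [
    {"name": "Asha", "city": "Delhi", "age": 16, "joined": "2024-01-15"},
    {"name": "Ravi", "city": "delhi ", "age": None, "joined": "2024-02-03"},
    {"name": "Asha", "city": "Delhi", "age": 16, "joined": "2024-01-15"},
    {"name": "Meera", "city": "Mumbai", "age": 17, "joined": "03/02/2024"},
    {"name": "Kabir", "city": "MUMBAI", "age": 18, "joined": "2024-03-10"},
]


def parse_date(text):
    """Turn a date stored as text into a real date object.

    Assumption: dates are either YYYY-MM-DD or DD/MM/YYYY.
    Returns None when neither format matches.
    """
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_records(records):
    # Organise the data: tidy the text fields and convert dates.
    tidy = [
        {
            "name": row["name"].strip(),
            "city": row["city"].strip().title(),
            "age": row["age"],
            "joined": parse_date(row["joined"]),
        }
        for row in records
    ]

    # Remove errors and mistakes: drop rows that repeat an earlier row.
    seen = set()
    unique = []
    for row in tidy:
        key = (row["name"], row["city"], row["age"], row["joined"])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Fill missing values: use the median of the ages that are known.
    known_ages = [row["age"] for row in unique if row["age"] is not None]
    if known_ages:
        fill_value = median(known_ages)
        for row in unique:
            if row["age"] is None:
                row["age"] = fill_value

    return unique


cleaned = clean_records(raw_records)
for row in cleaned:
    print(row)
```

The cleaned list has one row per person, consistent city names, no missing ages, and real date values, so it is reliable and ready to use for analysis.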

              Data cleaning is important because it:
              •  Ensures accuracy in analysis and predictions
              •  Makes data consistent and easy to understand
              •  Improves the performance of machine learning models
              •  Saves time and reduces errors in decision-making


                 194    Touchpad Artificial Intelligence - XI