UNIT 5
THEORETICAL AND PRACTICAL ASPECTS OF DATA PROCESSING
LEARNING OBJECTIVES
• Introduction to Data Cleaning
• Introduction to NumPy
• Introduction to Pandas
• Introduction to Kaggle
• Data Transformation and Standardisation
Introduction to Data Cleaning
In the field of Artificial Intelligence (AI) and data science, we deal with large volumes of data collected from multiple sources such as websites, mobile applications, sensors, surveys, and social media platforms. Before this data can be used to train machine learning models or make predictions, it must be clean, consistent, and accurate. This essential process is known as data cleaning, and it is a central part of data preprocessing.
Raw data is often messy or unstructured. It may contain missing values, incorrect or inconsistent entries, spelling errors, duplicate records, or improperly formatted data types (for example, dates stored as text). If such data is used without cleaning, AI models may produce inaccurate predictions, which in turn can lead to poor decisions.
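To make these problems concrete, here is a minimal sketch in plain Python (no external libraries) that scans a few made-up survey records for missing values, extra spaces in text, duplicate records, and dates that do not follow one expected format. The records, the field names `name`, `city` and `joined`, the helper names `is_valid_date` and `find_issues`, and the assumption that dates should be written as YYYY-MM-DD are all hypothetical choices for illustration, not part of any standard tool.

```python
from datetime import datetime

# Hypothetical raw survey records, invented for illustration.
raw_records = [
    {"name": "Asha", "city": "Delhi", "joined": "2024-01-15"},
    {"name": "Ravi", "city": None, "joined": "2024-02-03"},        # missing city
    {"name": "Asha", "city": "Delhi", "joined": "2024-01-15"},     # duplicate of the first record
    {"name": "Meera", "city": "Mumbai ", "joined": "03/02/2024"},  # extra space, date in another format
]

def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text is a date in the expected format (assumed YYYY-MM-DD)."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (TypeError, ValueError):  # TypeError: value is None; ValueError: wrong format
        return False

def find_issues(records):
    """Count some common data quality problems in a list of records."""
    missing = sum(1 for r in records for v in r.values() if v in (None, ""))
    untidy_text = sum(1 for r in records for v in r.values()
                      if isinstance(v, str) and v != v.strip())
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole record
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    bad_dates = sum(1 for r in records if not is_valid_date(r.get("joined")))
    return {"missing": missing, "untidy_text": untidy_text,
            "duplicates": duplicates, "bad_dates": bad_dates}

print(find_issues(raw_records))
```

Each count points to a problem that would need fixing before the data is used. In practice a library such as Pandas is usually used for checks like these, but the underlying idea is the same.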
Data cleaning is therefore a critical step in every data science workflow: it ensures that the dataset used for analysis and model training is reliable enough to produce accurate insights.
Data cleaning (also called data cleansing) is the process of detecting and correcting errors in a dataset to make it accurate, consistent, and ready for analysis. It is one of the core steps of data preprocessing. It helps to:
• Remove errors and mistakes
• Fill or fix missing values
• Organise the data properly
• Make the data reliable and ready to use
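The four actions above can be sketched as one small function. This is a minimal illustration in plain Python, not a standard routine: the record fields, the helper names `parse_date` and `clean_records`, the placeholder value "Unknown" for a missing city, and the assumption that dates arrive either as YYYY-MM-DD or as day-first DD/MM/YYYY are all hypothetical.

```python
from datetime import datetime

# Accepted input date formats (assumption): ISO style first, then day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(text):
    """Return the date as YYYY-MM-DD, or None if it matches no accepted format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_records(records, default_city="Unknown"):
    """Return a cleaned copy of the records (hypothetical helper)."""
    cleaned, seen = [], set()
    for r in records:
        # 1. Remove errors and mistakes: trim extra spaces, use consistent capitalisation.
        name = (r.get("name") or "").strip().title()
        city = (r.get("city") or "").strip().title()
        # 2. Fill missing values with an explicit placeholder.
        city = city or default_city
        # 3. Organise the data: store every date in one format (unparseable dates become None).
        joined = parse_date(r.get("joined") or "")
        # 4. Make the data reliable: skip rows that are duplicates once tidied.
        key = (name, city, joined)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "joined": joined})
    return cleaned

# Hypothetical raw records, invented for illustration.
raw = [
    {"name": " asha ", "city": "delhi", "joined": "2024-01-15"},
    {"name": "Ravi", "city": None, "joined": "03/02/2024"},
    {"name": "Asha", "city": "Delhi ", "joined": "15/01/2024"},
]

for row in clean_records(raw):
    print(row)
```

The duplicate check runs only after the text and dates have been tidied, so the two "Asha" rows are recognised as the same record even though they were typed differently. Filling a gap with a placeholder is just one option; depending on the data, a missing value might instead be filled with an average, or the row might be removed.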
Data cleaning is important because it helps to:
• Ensure accuracy in analysis and predictions
• Make data consistent and easy to understand
• Improve the performance of machine learning models
• Save time and reduce errors in decision-making
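A small worked example shows why cleaning affects accuracy. Suppose a teacher averages the marks of four students, but the raw list contains one duplicated row and one missing mark stored as `None`. The roll numbers and marks below are invented for illustration, and ignoring the missing mark (rather than filling it in) is one assumed cleaning choice among several.

```python
def naive_average(rows):
    """Uncleaned average: treats a missing mark as 0 and keeps duplicate rows."""
    marks = [m if m is not None else 0 for _, m in rows]
    return sum(marks) / len(marks)

def cleaned_average(rows):
    """Cleaned average: one entry per roll number, missing marks ignored."""
    unique = {}
    for roll_no, m in rows:
        unique.setdefault(roll_no, m)  # keep the first entry for each roll number
    marks = [m for m in unique.values() if m is not None]
    return sum(marks) / len(marks)

# Hypothetical data: (roll number, marks); roll 2 is duplicated, roll 3 is missing.
rows = [(1, 80), (2, 70), (2, 70), (3, None), (4, 90)]
print(naive_average(rows))    # 62.0
print(cleaned_average(rows))  # 80.0
```

The uncleaned average (310 / 5 = 62.0) counts one student twice and treats a missing mark as zero, while the cleaned average (240 / 3 = 80.0) uses each known mark exactly once.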
194 Touchpad Artificial Intelligence - XI

