Page 139 - Robotics and AI class 10
Steps in Machine Learning
The steps in Machine Learning are broadly categorised as follows:
• Data Collection: As explained earlier, data collection is the base of the Machine Learning process. The reliability and quality of the data need to be assured for your model to predict accurately.
• Data Preparation and Wrangling: The collected data needs to be prepared and structured so that the correlation between the variables and classes can be understood. This requires randomization and cleaning of the data. Cleaning involves removing irrelevant data, duplicate data and missing values, followed by restructuring the data by adjusting rows and columns or their index numbers. Once the data is cleaned and converted to a usable format, it needs to be split into two sets, one to be used as Training Data and the other as Testing Data.
• Model Selection: The model that is selected or built is determined by the outcome you want to achieve. It is built using the analytical techniques of machine learning that are best suited to the task at hand, whether it has to do with speech recognition, image recognition, numerical data, text data, prediction, etc.
• Training the Model: In this process, we use the data prepared for training and allow the model's algorithm to process it and understand the patterns, features and rules, so that it is able to predict. Further training helps the model predict more accurately over time and get closer to completing the task it is designed to do.
• Testing, Evaluating and Tuning the Model: The Testing Data is used to check the accuracy of the model's predictions. Evaluating the results and then tuning the algorithm helps the model predict as accurately as possible.
• Deployment and Prediction: Once the model is tested, it is deployed in the real world. Unseen real-world data is fed to the model, which it should be able to use to predict with great accuracy, as it has been thoroughly evaluated on the Testing Data.

Role of Data/Information
Data and information are the foundational elements that drive decision-making, innovation, and progress in today's digital age. They play crucial roles across various domains, from business and science to healthcare and government.

What is Data?
Data is raw information: facts and statistics collected together for reference or analysis. These raw facts need to be processed to get meaningful information. Data can be in the form of text, numbers, audio and video clips. Whenever we want an AI project to be able to predict an output, we first need to train it with a data set. Data plays an important part in an AI project, as it creates the base on which the project is built.

For example:
An Event Management Company wants to organize a T-20 Corporate Cricket match. We want to use an Artificial Intelligence model to predict the weather for a specific geo-location, based on previous years' weather data. The AI model can be trained to analyze the data for the last 10 years and predict the weather conditions for that day. The weather data from previous years that is used to train the model is called Training Data, and the separate data used to check the model's predicted weather conditions is called Testing Data.

For example, if we want to prepare an AI model to predict the school average of students in the Board examination, we will feed it the marks obtained by students in the Board examination in previous years. This will be treated as training data. Once the model is ready, it will predict the school average for the coming year. When we test it, we feed it a different data set, and that is the testing data.

For any AI project to be efficient, the training data should be authentic and relevant to the scoped problem statement. It should be unbiased for the project to give accurate predictions. In the previous example, if the training data (the weather data) is taken from an unreliable source or is not from the previous years, the prediction of the AI model will be incorrect.
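The cleaning, randomization, splitting, training and testing steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real prediction system: the marks in `raw_records` are made up, the helper names (`clean`, `split`, `train`, `evaluate`) are hypothetical, and the "model" is deliberately trivial (it just remembers the mean of the training data).

```python
import random
from statistics import mean

# Hypothetical records: (year, school average in the Board examination).
# These numbers are invented purely for illustration.
raw_records = [
    (2015, 71.5), (2016, 72.0), (2016, 72.0),   # duplicate row
    (2017, None),                                # missing value
    (2018, 74.0), (2019, 73.5), (2020, 75.0),
    (2021, 74.5), (2022, 76.0), (2023, 75.5),
]

def clean(records):
    """Drop rows with missing values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in records:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def split(records, test_fraction=0.25, seed=0):
    """Shuffle (randomize) the rows, then split them into training and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(training_data):
    """'Train' a trivial model: remember the mean of the training averages."""
    return mean(avg for _, avg in training_data)

def evaluate(model, testing_data):
    """Mean absolute error of the model's prediction on the testing data."""
    return mean(abs(avg - model) for _, avg in testing_data)

cleaned = clean(raw_records)
training_data, testing_data = split(cleaned)
model = train(training_data)
error = evaluate(model, testing_data)
print(f"rows: {len(cleaned)}, train: {len(training_data)}, test: {len(testing_data)}")
print(f"predicted school average: {model:.2f}, mean absolute error: {error:.2f}")
```

The testing rows are never shown to `train`, which is why the error measured on them gives a fair idea of how the model might behave on unseen data.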
[Figure: Machine Learning Process, showing the steps Data Collection, Data Preparation and Wrangling, Model Selection, Training the Model, Testing, Evaluating and Tuning the Model, and Deployment and Prediction]

#Digital Literacy
Video Session
Visit the following link to watch the video:
The role of data in Artificial Intelligence
https://www.youtube.com/watch?v=oyhdkoPYRVs

Data Features
Data features refer to the type of data that you want to collect. In the above example, the data features would be each subject's average, the number of students taking the exam, the theory and practical marks distribution of each subject, etc.

Reliable Sources to Obtain Relevant Data
Data is the base on which any AI project is built. When data is acquired, it's important to check that it comes from a reliable and authentic source, for the accuracy of the project. The acquisition methods should also be authentic so that there's no conflict in achieving project goals. There are various sources from which to collect relevant data for our project.
• Surveys: Data can be collected as responses to online surveys, telephonic surveys or in-person surveys. Surveys are a way of collecting data from a group of people in order to gain information and insights into various topics of interest. The process involves asking people for information through questionnaires, which can be online or offline. It can be considered as a data source.
• Web Scraping: Data or information can also be extracted from a website. Web scraping or data scraping is the method of downloading information from the World Wide Web and storing it on your computer for later reference.
• Sensors: Data can also be collected from various sensors, for example by collecting environmental data and storing it in a data storage solution. Sensors are connected through gateways which enable them to collect live data.
• Cameras: Data can be seen, written down or recorded onto the computer. Cameras are used to collect data in the form of images. CCTV, web cameras and surveillance cameras are big sources of visual data that can be acquired from various places.
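The Web Scraping source in the list above can be illustrated with a small Python sketch that pulls table values out of a page's HTML. It uses only the standard library's `html.parser` module. The page content (`SAMPLE_HTML`) and the class name `TableCellParser` are assumptions made for this example. In practice the HTML would first be downloaded from a real website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this text from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Rainfall (mm)</th></tr>
  <tr><td>2023-03-01</td><td>0.0</td></tr>
  <tr><td>2023-03-02</td><td>4.2</td></tr>
</table>
"""

class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        if self._in_cell and self._current_row is not None:
            self._current_row.append(data.strip())

parser = TableCellParser()
parser.feed(SAMPLE_HTML)

# Store the extracted values in a structured form for later reference.
records = [(date, float(mm)) for date, mm in parser.rows]
print(records)
```

The extracted `records` list is the kind of structured data that could then go through the cleaning and splitting steps before being used to train a model.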
Decision Making in Machines/Computers 137

