While acquiring data, you need to collect it at regular intervals and in a timely manner so that it is up to date and
reflects current conditions or trends. You also need to ensure that the data you acquire is complete and sufficient.
If you do not have sufficient data, your analysis may be incomplete or unreliable.
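The three requirements above (up to date, complete and sufficient) can be checked with a few lines of Python. This is only a minimal sketch: the records, the field names, the minimum record count and the allowed one-year gap are all assumptions chosen for illustration.

```python
# Hypothetical records collected for an AI project; None marks a missing value.
records = [
    {"year": 2021, "subject": "Maths", "average": 74.5},
    {"year": 2022, "subject": "Maths", "average": None},
    {"year": 2023, "subject": "Maths", "average": 78.5},
]

REQUIRED_FIELDS = ("year", "subject", "average")  # assumed fields every record needs
MIN_RECORDS = 5                                   # assumed threshold for "sufficient"

def missing_values(rows):
    """Completeness: list (row index, field) pairs where a required value is absent."""
    return [(i, field) for i, row in enumerate(rows)
            for field in REQUIRED_FIELDS if row.get(field) is None]

def is_sufficient(rows, minimum=MIN_RECORDS):
    """Sufficiency: are there at least `minimum` records?"""
    return len(rows) >= minimum

def is_up_to_date(rows, current_year, max_gap=1):
    """Timeliness: is the latest record at most `max_gap` years old?"""
    latest = max(row["year"] for row in rows)
    return current_year - latest <= max_gap

print(missing_values(records))                    # → [(1, 'average')]
print(is_sufficient(records))                     # → False
print(is_up_to_date(records, current_year=2024))  # → True
```

Here the data is recent enough but neither complete nor sufficient, so more data would have to be collected before analysis.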
What is Data?
Data is raw information, such as facts and statistics, collected together for reference or analysis. These raw
facts need to be processed to obtain meaningful information. Whenever we want an AI project to be able to
predict an output, we first need to train it with a dataset. Data plays an important role in an AI project because it
forms the base on which the project is built.
Types of Data
In an AI project, data is divided into two types:
• Training Data: It is the data on which we train our AI model. It is used to fit the parameters of the model. In
training data, the correct output is available to the model.
• Testing Data: It is used to check the performance of a trained AI model. Testing data is data the model has not
seen during training, on which it has to make predictions.
For example, if we want to prepare an AI model to predict the school average of students in board examinations,
we will feed it the marks obtained by students in board examinations in previous years; this is the training data.
Once the model is ready, it will predict the school average for the coming year. When we test it, we feed it
different datasets, and that is the testing data.
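The training and testing idea in the school-average example can be sketched in Python. The yearly averages below are invented, and a straight-line trend fitted by least squares is assumed as the model; a real project could use a different model. The most recent years are held back as testing data so the model never sees them while it is being trained.

```python
# Hypothetical school averages (%) in board exams; values are invented.
records = [
    (2016, 71.0), (2017, 72.5), (2018, 73.0), (2019, 74.5),
    (2020, 75.0), (2021, 76.5), (2022, 77.0), (2023, 78.5),
]

def split_train_test(data, test_size=2):
    """Keep the last `test_size` years aside as unseen testing data."""
    return data[:-test_size], data[-test_size:]

def fit_line(train):
    """Fit average = slope * year + intercept by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def predict(model, year):
    slope, intercept = model
    return slope * year + intercept

def mean_absolute_error(model, test):
    """Average gap between predicted and actual averages on the testing data."""
    return sum(abs(predict(model, x) - y) for x, y in test) / len(test)

train, test = split_train_test(records)
model = fit_line(train)  # only the training data is used here
print("Training years:", [x for x, _ in train])
print("Testing years:", [x for x, _ in test])
print("Error on testing data:", round(mean_absolute_error(model, test), 2))
print("Predicted average for 2024:", round(predict(model, 2024), 1))
```

A small error on the testing data suggests the model generalises to years it has not seen; measuring the error on the training data alone would not show this.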
Data Features
In the data acquisition stage, it is very important that the data we provide to an AI project is relevant. How do we
know which data to use for the problem identified during problem scoping?
• We need to visualise the factors that affect the problem statement; to do this, we extract the data features.
• We need to find out the parameters that affect the problem statement directly or indirectly.
• Data features refer to the type of data that you want to collect. In the above example, the data features would
be the average in each subject, the number of students taking the exam, the distribution of theory and practical
marks in each subject, etc.
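One way to picture data features is as the fields of each record we collect. The sketch below is a hypothetical Python layout for the school-average example: the field names and values are invented, and "distribution of theory and practical marks" is assumed to mean the marks allotted to each part.

```python
from dataclasses import dataclass, fields

@dataclass
class SubjectResult:
    subject: str              # identifies the row; not used as a numeric feature
    subject_average: float    # average marks scored in the subject
    students_appeared: int    # number of students taking the exam
    theory_max_marks: int     # marks allotted to theory (assumed meaning)
    practical_max_marks: int  # marks allotted to practicals (assumed meaning)

# The data features are the fields we collect, apart from the identifier.
FEATURE_NAMES = [f.name for f in fields(SubjectResult) if f.name != "subject"]

def feature_vector(record):
    """Turn one record into the list of feature values a model would receive."""
    return [getattr(record, name) for name in FEATURE_NAMES]

science = SubjectResult("Science", 76.0, 120, 80, 20)  # invented values
print(FEATURE_NAMES)
# → ['subject_average', 'students_appeared', 'theory_max_marks', 'practical_max_marks']
print(feature_vector(science))  # → [76.0, 120, 80, 20]
```

Deciding on these fields before collection makes it clear what must be gathered for every record.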
Reliable Sources of Relevant Data
Data is the base on which any AI project is built. When data is acquired, it is important to check that it comes from a
reliable and authentic source, because the accuracy of the project depends on it.
The acquisition methods should also be authentic so that there is no conflict in achieving the project goals.
There are various sources from which we can collect relevant data for our project:
• Surveys: Data can be collected through online surveys, telephone surveys or in-person surveys.
Surveys are a way of collecting data from a group of people to gain information and insights into various topics
of interest. The process involves asking people for information through questionnaires, which can be online or
offline. The responses collected in this way can be used as a data source.
• Web Scraping: Data can also be extracted from websites. Web scraping, or data scraping, is the
method of downloading information from the World Wide Web (WWW) and storing it on your computer for
later reference. Data collected in this way is online data.
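Web scraping can be sketched using only Python's standard library (`urllib.request` to download a page and `html.parser` to read it). This is a minimal illustration, not a complete scraper: the page layout (values in `<td>` table cells) and the helper names are assumptions, and a small HTML string stands in for a downloaded page because `fetch_page` needs network access. Scraping should only be done on sites whose terms allow it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TableCellParser(HTMLParser):
    """Collects the text inside every <td> cell of an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
            self.cells[-1] = self.cells[-1].strip()

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data

def fetch_page(url):
    """Download a page's HTML (hypothetical helper; needs internet access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def extract_cells(html):
    parser = TableCellParser()
    parser.feed(html)
    return parser.cells

def save_cells(cells, path):
    """Store the scraped values on your computer for later reference."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(cells))

sample_html = ("<table><tr><td>Maths</td><td>78.5</td></tr>"
               "<tr><td>Science</td><td>76.0</td></tr></table>")
print(extract_cells(sample_html))  # → ['Maths', '78.5', 'Science', '76.0']
```

With network access, `extract_cells(fetch_page(url))` would apply the same steps to a real page, and `save_cells` would keep the result as a local file.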
AI Reflection, Project Cycle and Ethics 199

