Page 166 - Ai_V1.0_Class9
Secondary Data Sources
Secondary data sources are external sources from which existing data is collected, rather than data you generate
yourself. Some sources for secondary data collection are published literature, government publications, and market
research reports.
• .gov: Countries like Australia, EU, India, New Zealand, and Singapore are openly sharing datasets on
various portals.
• Kaggle: Kaggle is an online community of data scientists where you can access different types of data.
• Dataset Search: This is a toolbox by Google that can search for data by name.
• UCI Machine Learning Repository: UCI is a collection of databases, domain theories, and data generators in
collaboration with the University of Massachusetts.
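Datasets from sources like these are often shared as CSV files. Here is a minimal sketch of reading one in Python. It assumes a small made-up sample held in a string instead of a real downloaded file, and the column names and values are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file downloaded from a data source.
# The column names and values are made up for illustration.
SAMPLE_CSV = """city,year,rainfall_mm
City A,2021,800
City B,2021,1200
City C,2021,450
"""


def load_rows(csv_text):
    """Read CSV text into a list of dictionaries, one dictionary per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows loaded")
print("Columns:", list(rows[0].keys()))
```

With a real file, the same `csv.DictReader` can be given a file opened with `open(...)` instead of the string used here.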
Best Practices for Acquiring Data
Acquiring data effectively is crucial for ensuring its accuracy, reliability, and usability. Here are some best practices
for acquiring data:
1. Set Clear Goals: Understand why you need the data and what you want to achieve; specify the type, format,
and detail level required.
2. Identify Data Sources: Use primary data that you collect yourself (surveys, interviews) and secondary data
from others (reports, databases).
3. Evaluate Sources: Ensure data sources are trustworthy, relevant, accurate, and current; get necessary
permissions and respect privacy.
4. Collect and Prepare Data: Use surveys, interviews, sensors, and web scraping; clean data by fixing errors,
removing duplicates, and anonymising.
5. Validate, Document, and Store: Cross-check and sample for accuracy, keep detailed records and
metadata, store data securely, and regularly update it while following laws and regulations.
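The cleaning part of step 4 (fixing errors, removing duplicates, and anonymising) can be sketched in Python as follows. This is only an illustration: the records, field names, and the age limit are made up, and replacing a name with a short hash is a simple stand-in for anonymising, not a complete privacy solution.

```python
import hashlib

# Hypothetical survey records; the field names and values are made up.
raw_records = [
    {"name": "Asha", "age": "14", "city": " delhi "},
    {"name": "Asha", "age": "14", "city": " delhi "},  # exact duplicate
    {"name": "Ravi", "age": "-3", "city": "Mumbai"},   # impossible age
    {"name": "Meera", "age": "15", "city": "PUNE"},
]


def anonymise(name):
    """Replace a name with a short hash so the person is not named directly."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


def clean(records):
    """Remove duplicate records, fix obvious errors, and hide names."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:  # skip a record we have already kept
            continue
        seen.add(key)
        age = int(record["age"])
        if age < 0 or age > 120:  # assumed limit: treat impossible ages as missing
            age = None
        cleaned.append({
            "person_id": anonymise(record["name"]),
            "age": age,
            "city": record["city"].strip().title(),  # fix spacing and letter case
        })
    return cleaned


cleaned_records = clean(raw_records)
for row in cleaned_records:
    print(row)
```

Four records go in and three come out: the duplicate is dropped, the impossible age is marked as missing, and the city names are written in one consistent style.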
Checklist of Factors that make Data Good or Bad
Here’s a checklist of factors that can help determine whether data is of good quality (good data) or poor quality
(bad data):
Good Data                               | Bad Data
• Data is well structured               | • Data is scattered
• It is accurate                        | • Contains a lot of incorrect values
• It is consistent                      | • Contains missing and duplicate values
• It is presented well                  | • It is poorly presented
• Contains facts which are relevant to  | • Contains facts which are not relevant
  our requirement                       |   to our requirement
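Some items in this checklist can be counted by a program. The sketch below counts missing, duplicate, and incorrect values in a small table. The rows are made up, and the rule that marks must lie between 0 and 100 is an assumption chosen for the example.

```python
# Hypothetical rows; None marks a missing value. All values are made up.
rows = [
    {"student": "S1", "marks": 78},
    {"student": "S2", "marks": None},  # missing value
    {"student": "S1", "marks": 78},    # duplicate of the first row
    {"student": "S3", "marks": 140},   # incorrect: above the assumed maximum
]


def quality_report(rows, max_marks=100):
    """Count rows with missing, duplicate, and out-of-range values."""
    missing = sum(1 for r in rows if any(v is None for v in r.values()))

    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    incorrect = sum(
        1 for r in rows
        if r["marks"] is not None and not (0 <= r["marks"] <= max_marks)
    )
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicates": duplicates,
        "incorrect": incorrect,
    }


report = quality_report(rows)
print(report)
```

For this sample the report shows one row with a missing value, one duplicate row, and one incorrect value out of four rows. Whether facts are relevant or well presented still needs a person to judge.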
164 Artificial Intelligence Play (Ver 1.0)-IX

