Page 166 - Ai_V1.0_Class9

Secondary Data Sources

              Secondary data sources are external sources from which existing data is collected, rather than data you generate
              yourself. Some sources for secondary data collection are published literature, government publications, and
              market research reports.

              •   Kaggle is an online community of data scientists where you can access different types of data.
              •   Government open-data portals (.gov datasets): governments such as those of Australia, the EU, India,
                  New Zealand, and Singapore openly share datasets on various portals.
              •   Google Dataset Search is a toolbox by Google that can search for data by name.
              •   The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators,
                  hosted by the University of California, Irvine, in collaboration with the University of Massachusetts.



              Best Practices for Acquiring Data

              Acquiring data effectively is crucial for ensuring its accuracy, reliability, and usability. Here are some best practices
              for acquiring data:
              1.   Set Clear Goals: Understand why you need the data and what you want to achieve; specify the type, format,
                  and detail level required.

              2.   Identify Data Sources: Use primary data that you collect yourself (surveys, interviews) and secondary data
                  from others (reports, databases).

              3.   Evaluate Sources: Ensure data sources are trustworthy, relevant, accurate, and current; get necessary
                  permissions and respect privacy.
              4.   Collect and Prepare Data: Use surveys, interviews, sensors, and web scraping; clean data by fixing errors,
                  removing duplicates, and anonymising.
              5.   Validate, Document, and Store: Cross-check and sample for accuracy, keep detailed records and metadata,
                  store data securely, and regularly update it while following laws and regulations.
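
              The "Collect and Prepare Data" step can be illustrated with a minimal Python sketch. It assumes a small,
              made-up list of survey records; the field names (name, city, age) and the helper functions clean_record,
              anonymise, and prepare are hypothetical and chosen only for illustration. The sketch trims stray spaces,
              flags an impossible age, drops exact duplicates, and replaces each name with a short hash. Hashing is only
              a simple stand-in here; anonymising real data usually needs more care.

```python
import hashlib

# Hypothetical survey records; the field names and values are made up for illustration.
raw_records = [
    {"name": "Asha", "city": " Delhi ", "age": "14"},
    {"name": "Asha", "city": "Delhi", "age": "14"},    # duplicate once cleaned
    {"name": "Ravi", "city": "mumbai", "age": "-3"},   # impossible age
    {"name": "Meera", "city": "Pune", "age": "15"},
]

def clean_record(record):
    """Fix simple errors: trim spaces, standardise city names, reject impossible ages."""
    age = int(record["age"])
    return {
        "name": record["name"].strip(),
        "city": record["city"].strip().title(),
        "age": age if 0 < age < 120 else None,  # None marks a value that needs re-checking
    }

def anonymise(record):
    """Replace the name with a short hash so the name itself is not stored."""
    digest = hashlib.sha256(record["name"].encode("utf-8")).hexdigest()[:8]
    return {"id": digest, "city": record["city"], "age": record["age"]}

def prepare(records):
    """Clean every record, skip exact duplicates, then anonymise what remains."""
    seen = set()
    prepared = []
    for record in records:
        cleaned = clean_record(record)
        key = (cleaned["name"], cleaned["city"], cleaned["age"])
        if key in seen:
            continue
        seen.add(key)
        prepared.append(anonymise(cleaned))
    return prepared

prepared_records = prepare(raw_records)
for row in prepared_records:
    print(row)
```

              Here the two "Asha" rows collapse into one, so three records remain, and the record with the impossible
              age keeps an empty age value so that it can be checked again later.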


              Checklist of Factors that make Data Good or Bad

              Here’s a checklist of factors that can help determine whether data is of good quality (good data) or poor
              quality (bad data):

                                      Good Data                                      Bad Data


                        •  Data is well structured                    •  Data is scattered
                        •  It is accurate                             •  Contains a lot of incorrect values
                        •  It is consistent                           •  Contains missing and duplicate values
                        •  It is presented well                       •  It is poorly presented
                        •  Contains facts which are relevant to       •  Contains facts which are not relevant
                            our requirement                               to our requirement
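
              Some items in this checklist can be checked automatically. The following is a minimal Python sketch,
              assuming a made-up table of student marks; the column names, the valid range of 0 to 100, and the function
              quality_report are hypothetical. It counts the missing, incorrect, and duplicate values that the "Bad Data"
              column describes. Whether the data is relevant or well presented still has to be judged by a person.

```python
# Hypothetical marks table; the column names and valid range are assumptions.
rows = [
    {"student": "A01", "marks": 78},
    {"student": "A02", "marks": None},   # missing value
    {"student": "A03", "marks": 140},    # incorrect value (marks are out of 100)
    {"student": "A01", "marks": 78},     # duplicate row
]

def quality_report(rows, low=0, high=100):
    """Count missing, incorrect (out of range), and duplicate entries."""
    missing = sum(1 for r in rows if r["marks"] is None)
    incorrect = sum(
        1 for r in rows
        if r["marks"] is not None and not (low <= r["marks"] <= high)
    )
    unique = {(r["student"], r["marks"]) for r in rows}
    duplicates = len(rows) - len(unique)
    return {
        "rows": len(rows),
        "missing": missing,
        "incorrect": incorrect,
        "duplicates": duplicates,
    }

report = quality_report(rows)
print(report)
```

              A report with zeros for missing, incorrect, and duplicate values suggests the data passes these three checks.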



                    164     Artificial Intelligence Play (Ver 1.0)-IX