Page 223 - AI Ver 1.0 Class 10

                    • Online mode: Open-source Govt. portals, WHO websites.
                    • Offline mode: Surveys, questionnaires, experiments, personal interviews.

                 While handling data online or offline, the following points should always be remembered:
                    • The source of data should be authentic and reliable, because an arbitrary or unverified source could provide
                   wrong or unusable data.
                    • Authentic data is a must for the proper training of an AI model.

                    • Privacy of data sources should always be kept in mind, as it is a fundamental right of everyone.
                    • Consent from the owner of the data should be taken before using their personal dataset.
                    • Data present in the public domain should preferably be used, if available.


                 Types of Data

                 The most suitable way to store a dataset is in the form of tables, because tabular data is the easiest to maintain
                 and analyse. The following are some of the popular tabular formats for storing data:
                    • Spreadsheets: A spreadsheet is a file in which data is stored in the form of rows and columns. A spreadsheet
                   application is a powerful tool for analysis, visual representation, calculations and accounting purposes. Some
                   popular spreadsheet applications are MS Excel, OpenOffice Calc, etc.
                    • Comma Separated Values (CSV): These are files with the extension .csv that contain records in which the
                   values are separated by commas. Every line is a single record. These files can be created using Excel, Google
                   Sheets, and also simple text editors like Notepad.

                    • Structured Query Language (SQL): A query language that is used to store, manage and retrieve data in a
                   database management system (DBMS). It’s a domain-specific language primarily used to handle structured data
                   in database management systems.
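
The formats above all hold the same kind of tabular data, so a table can move from one to another. As a minimal sketch, the Python snippet below parses a small piece of CSV text with the standard csv module, stores the records in an in-memory SQLite database, and retrieves some of them with an SQL query. The student names, the marks and the table name students are made up for illustration and do not come from the text.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: the first line is the header,
# and every later line is a single record.
CSV_TEXT = """name,marks
Asha,82.5
Ravi,74.0
Meena,91.0
"""


def read_records(csv_text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_into_database(records):
    """Store the records in an in-memory SQLite table and return the connection."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE students (name TEXT, marks REAL)")
    connection.executemany(
        "INSERT INTO students (name, marks) VALUES (?, ?)",
        [(r["name"], float(r["marks"])) for r in records],
    )
    return connection


records = read_records(CSV_TEXT)
connection = load_into_database(records)

# Retrieve the names of students scoring above 80, highest marks first.
top_students = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM students WHERE marks > 80 ORDER BY marks DESC"
    )
]
print(top_students)  # ['Meena', 'Asha']
```

Note that the csv module reads every value as text, which is why the marks are converted with float() before they are stored in the database.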


                 Issues Related to Data
                 While collecting the data needed for Data Science, we might face issues such as:
                    • Erroneous Data: It means that a value in a dataset is not what is expected at that position. There are two
                   ways in which data can be erroneous:

                      ✶ Incorrect Values: The values at random places in the dataset are not correct. Either the data is mismatched
                      or it is not relevant to that position. For example, the Marks column holds values that are not decimal
                      numbers, the phone number column holds an eight-digit landline number instead of a 10-digit mobile
                      number, or the Name column holds only the first name instead of the full name.
                      ✶ Invalid or Null Values: It means that a value is either corrupted or has no meaning. When such values occur
                      in a dataset, they need to be removed as they hold no value for data processing. For example, a phone number
                      that is not filled in properly, or an email address with nothing entered around the @ sign.

                    • Missing Data: It means that data is not present at the desired location in a dataset. Missing data is not
                   erroneous data; a dataset with missing values is considered an incomplete dataset. For example, the email
                   address or pin code missing from a set of student details.
                    • Outlier Data: It means data that differs drastically from the rest of the data. This kind of unusual data needs
                   to be removed from the dataset or replaced for accurate results. For example, a student who was absent is
                   given zero marks instead of an exemption. Including that zero will not give an accurate class average.
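
The issues listed above can be checked for with a few simple rules. The Python sketch below is one possible way to do it and is not a standard method: the student records are invented, and the rules are illustrative assumptions (a phone number must have exactly 10 digits, an email needs text on both sides of the @ sign, and marks more than 40 away from the class median count as an outlier).

```python
import statistics

# Hypothetical student records; None stands for a value that was never filled in.
students = [
    {"name": "Asha", "phone": "9876543210", "email": "asha@example.com", "marks": 82.0},
    {"name": "Ravi", "phone": "26543210", "email": "ravi@example.com", "marks": 74.0},
    {"name": "Meena", "phone": "9123456780", "email": "@", "marks": 91.0},
    {"name": "Kabir", "phone": "9988776655", "email": None, "marks": 78.0},
    {"name": "Tara", "phone": "9012345678", "email": "tara@example.com", "marks": 0.0},
]


def find_issues(record):
    """Return a list of the data issues found in one record."""
    issues = []
    # Missing data: the value is simply not present.
    for field, value in record.items():
        if value is None:
            issues.append(f"missing {field}")
    # Incorrect value: assumed rule, a mobile number has exactly 10 digits.
    phone = record["phone"]
    if phone is not None and not (phone.isdigit() and len(phone) == 10):
        issues.append("incorrect phone")
    # Invalid value: assumed rule, an email needs text before and after the @ sign.
    email = record["email"]
    if email is not None:
        local, _, domain = email.partition("@")
        if not local or not domain:
            issues.append("invalid email")
    return issues


def find_outliers(records, max_distance=40.0):
    """Return names whose marks lie far from the median (threshold is an assumption)."""
    median = statistics.median(r["marks"] for r in records)
    return [r["name"] for r in records if abs(r["marks"] - median) > max_distance]


issues_by_name = {r["name"]: find_issues(r) for r in students}
outliers = find_outliers(students)

# Compare the class average with and without the outlier.
average_all = statistics.mean(r["marks"] for r in students)
average_clean = statistics.mean(
    r["marks"] for r in students if r["name"] not in outliers
)

print(issues_by_name)
print(outliers)                    # ['Tara']
print(average_all, average_clean)  # 65.0 81.25
```

In this made-up class, the single zero pulls the average down from 81.25 to 65.0, which is why such a value should be removed or replaced before the average is calculated.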


