Page 180 - Data Science class 10

DATA MERGING




                                                                                    04








                    Learning Outcome



                 4.1.  Overview of Data Merging                     4.2.  Data Joins
                 4.3.  What is Z-Score?                             4.4.  Concept of Percentile
                 4.5.  Quartiles                                    4.6.  Deciles



        The practice of combining two or more similar records into a single one is known as data merging. It is used for
        adding variables to a dataset, appending cases or observations, or removing duplicates and other inaccurate data.
        In this chapter, we will learn about different ways of simplifying the process of data merging.


        4.1. OVERVIEW OF DATA MERGING

        In Data Science, data merging is the procedure of combining two or more datasets into a single data frame. This
        process is needed when we have raw data stored in multiple files, worksheets or data tables that we want to
        analyse all at once.
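
        As a minimal sketch of this idea, the example below combines two small datasets that share a common column
        into one set of rows. The data, the column name roll_no and the helper merge_on_key are all hypothetical,
        chosen only for illustration. It uses plain Python lists and dictionaries and assumes each roll_no appears
        only once in the second dataset.

```python
# Hypothetical sample data: two small "tables" that share the column roll_no.
marks = [
    {"roll_no": 1, "maths": 78},
    {"roll_no": 2, "maths": 91},
    {"roll_no": 3, "maths": 64},
]
names = [
    {"roll_no": 1, "name": "Asha"},
    {"roll_no": 2, "name": "Ravi"},
    {"roll_no": 3, "name": "Meena"},
]


def merge_on_key(left, right, key):
    """Combine rows from left and right that have the same value of key."""
    # Index the right-hand rows by key so each lookup is fast.
    # Assumption: key values are unique in `right`.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Build one combined row holding the columns of both sources.
            merged.append({**row, **match})
    return merged


merged = merge_on_key(marks, names, "roll_no")
for row in merged:
    print(row)
```

        Each printed row now holds the roll number, the marks and the name together, so the three students can be
        analysed from a single dataset instead of two.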
        However, merging data from different sources often raises problems that must be corrected before the merge can
        succeed. An additional data source will often follow naming conventions that differ from those of the primary
        data source, and it might use a different method of data aggregation. It frequently happens that the
        additional data source was produced at a very different time, by different individuals with different goals and
        use cases. For all of these reasons, a significant disparity between data sources should not come as a surprise.
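
        One common correction is to bring column names into line before merging. The sketch below is only an
        illustration: the column names, the mapping COLUMN_MAP and the helper standardise_columns are hypothetical,
        and it assumes the additional source differs from the primary source only in how its columns are named.

```python
# Hypothetical mapping from the additional source's column names
# to the names used by the primary source.
COLUMN_MAP = {"RollNo": "roll_no", "Student Name": "name"}


def standardise_columns(rows, column_map):
    """Return copies of rows with columns renamed according to column_map."""
    # Columns that are not listed in column_map keep their original name.
    return [
        {column_map.get(col, col): value for col, value in row.items()}
        for row in rows
    ]


# Hypothetical rows from the additional data source.
secondary = [
    {"RollNo": 1, "Student Name": "Asha"},
    {"RollNo": 2, "Student Name": "Ravi"},
]

cleaned = standardise_columns(secondary, COLUMN_MAP)
print(cleaned)
```

        After this step both sources use the same column names, so they can be matched on roll_no.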


        4.1.1 Benefits of Data Merging
        There are many benefits to merging your datasets. Some of them are as follows:
           • Accuracy
           • Completeness
           • Convenience
        Let us read about them in detail.
           • Accuracy: When all of the information is combined into a single set, duplicate and inconsistent records are
          easier to find and correct, which improves the accuracy of the data.

           • Completeness: All of the information is brought into one place when the sets are merged. This helps ensure
          that the set of data is complete, with fewer missing pieces of information. Since everything is in
          one place, it is easy to find and use.

           • Convenience: Datasets are combined to form a single, large dataset. This makes working with and evaluating
          the set as a whole simple. Everything is right there in front of you, so there's no need to worry about searching
          through various files or attempting to manually piece together various portions of the dataset.
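
        The accuracy and completeness benefits can be illustrated with a short sketch that appends two datasets and
        drops repeated records. The data and the helper append_and_deduplicate are hypothetical, and the sketch
        assumes that two rows with the same roll_no describe the same student.

```python
def append_and_deduplicate(*datasets, key):
    """Stack several datasets and keep the first record seen for each key value."""
    seen = set()
    combined = []
    for dataset in datasets:
        for row in dataset:
            # Skip a row if a record with the same key was already kept.
            if row[key] not in seen:
                seen.add(row[key])
                combined.append(row)
    return combined


# Hypothetical data: roll number 2 appears in both files.
section_a = [{"roll_no": 1, "name": "Asha"}, {"roll_no": 2, "name": "Ravi"}]
section_b = [{"roll_no": 2, "name": "Ravi"}, {"roll_no": 3, "name": "Meena"}]

combined = append_and_deduplicate(section_a, section_b, key="roll_no")
print(len(combined))
```

        The combined dataset holds every student exactly once, even though one record was present in both files.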

          178   Touchpad Data Science-X