Page 180 - Data Science class 10
DATA MERGING
04
Learning Outcome
4.1. Overview of Data Merging
4.2. Data Joins
4.3. What is Z-Score?
4.4. Concept of Percentile
4.5. Quartiles
4.6. Deciles
The practice of combining two or more similar records into a single one is known as data merging. It is used for
adding variables to a dataset, appending cases or observations, or removing duplicates and other inaccurate data.
In this chapter, we will learn about different ways of simplifying the process of data merging.
4.1. OVERVIEW OF DATA MERGING
In Data Science, data merging is the procedure of combining two or more datasets into a single data frame. This
process is needed when we have raw data stored in multiple files, worksheets or data tables that we want to
analyse all at once.
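The idea can be sketched in Python with the pandas library. This is a minimal illustration, not a prescribed method: the tables, column names (roll_no, marks, days_present) and values are made up for this example. It shows the three common uses of merging: adding variables, appending observations and removing duplicates.

```python
import pandas as pd

# Hypothetical data: marks are stored in one table, attendance in another.
marks = pd.DataFrame({"roll_no": [1, 2, 3], "marks": [78, 85, 91]})
attendance = pd.DataFrame({"roll_no": [1, 2, 3], "days_present": [180, 172, 190]})

# Adding variables: match rows on the shared "roll_no" column.
merged = pd.merge(marks, attendance, on="roll_no")

# Appending observations: stack new rows under the existing ones.
new_marks = pd.DataFrame({"roll_no": [3, 4], "marks": [91, 66]})
all_marks = pd.concat([marks, new_marks], ignore_index=True)

# Removing duplicates: roll_no 3 now appears twice with identical values.
all_marks = all_marks.drop_duplicates().reset_index(drop=True)

print(merged)
print(all_marks)
```

Here, merged is a single data frame holding both the marks and the attendance of each student, and all_marks holds one row per student after the repeated row is dropped.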
However, merging data from different sources raises several problems that must be corrected before the merge can
succeed. Different data sources often follow naming conventions that differ from those of the primary data source.
They might also use different methods of data aggregation. Frequently, the additional data source was produced at a
very different time, by different individuals, with different goals and use cases. For all of these reasons, a
significant disparity between different data sources should not come as a surprise.
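One common correction is to bring the column names of the second source in line with the primary source before merging. The sketch below assumes two made-up tables, students and fees, whose column names ("Roll Number", "Fee Paid") are invented for illustration. Real sources may need further fixes, such as matching data types or units.

```python
import pandas as pd

# Hypothetical primary data source.
students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})

# Hypothetical second source that follows a different naming convention.
fees = pd.DataFrame({"Roll Number": [1, 2, 3], "Fee Paid": [5000, 4500, 5000]})

# Rename the columns so that both sources use the same names.
fees = fees.rename(columns={"Roll Number": "roll_no", "Fee Paid": "fee_paid"})

# The two tables can now be merged on the shared "roll_no" column.
combined = pd.merge(students, fees, on="roll_no")
print(combined)
```

Without the renaming step, pandas would find no column called roll_no in the fees table and the merge would raise an error.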
4.1.1 Benefits of Data Merging
There are many benefits to merging your datasets. Some of them are as follows:
• Accuracy
• Completeness
• Convenience
Let us read about them in detail.
• Accuracy: When all of the information is combined into a single set, duplicate and conflicting entries become
easier to find and correct, which helps keep the data accurate.
• Completeness: All of the information is combined into one place when the sets are merged. This helps ensure
that the set of data is complete and that any missing pieces of information are easy to spot. Since everything is in
one place, it is easy to find and use.
• Convenience: Datasets are combined to form a single, large dataset. This makes working with and evaluating
the set as a whole simple. Everything is right there in front of you, so there's no need to worry about searching
through various files or attempting to manually piece together various portions of the dataset.
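Once the data sits in one data frame, its accuracy and completeness can be checked in a few lines. The following is a rough sketch with made-up tables (marks and sports). The choice of how="outer", which keeps every roll number from both tables, is an assumption made for this example.

```python
import pandas as pd

# Hypothetical tables that do not cover exactly the same students.
marks = pd.DataFrame({"roll_no": [1, 2, 3], "marks": [78, 85, 91]})
sports = pd.DataFrame({"roll_no": [2, 3, 4], "sport": ["Chess", "Football", "Tennis"]})

# how="outer" keeps every roll_no from both tables.
# indicator=True adds a "_merge" column saying where each row came from.
merged = pd.merge(marks, sports, on="roll_no", how="outer", indicator=True)

# Completeness check: count the missing values in each column.
missing_per_column = merged.isna().sum()

# Accuracy check: count the rows that are exact repeats of an earlier row.
duplicate_rows = int(merged.duplicated().sum())

print(merged)
print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
```

In this example, the student with roll number 1 has no sport recorded and the student with roll number 4 has no marks, so each of these two columns reports one missing value.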
178 Touchpad Data Science-X

