Page 71 - Informatics_Practices_Fliipbook_Class12
2     Biscuit      Food     20         2
3  Bourn-Vita      Food     70         1
4        Soap   Hygiene     40         4
Day 2 Purchases:
        Product   Category  Price  Quantity
0         Jeans    Clothes    400         2
1     Chocolate       Food     50         4
2  Air Freshner  Household     80         2
3        Coffee       Food    120         1
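Now that groceryDF1 and groceryDF2 have been constructed, let us use the Pandas function concat() to concatenate
them into a new DataFrame, groceryDF, comprising the purchases of both days. The argument ignore_index = True
discards the original row labels and numbers the rows of the result afresh, starting from 0, as shown below: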
>>> groceryDF = pd.concat([groceryDF1, groceryDF2], ignore_index = True)
>>> print(groceryDF)
         Product   Category  Price  Quantity
0          Bread       Food     20         2
1           Milk       Food     60         5
2        Biscuit       Food     20         2
3     Bourn-Vita       Food     70         1
4           Soap    Hygiene     40         4
5          Brush    Hygiene     30         2
6      Detergent  Household     80         1
7        Tissues    Hygiene     30         5
8          Jeans    Clothes    400         2
9      Chocolate       Food     50         4
10  Air Freshner  Household     80         2
11        Coffee       Food    120         1
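The effect of ignore_index can be seen by comparing the row labels with and without it. The following is a minimal sketch, assuming Pandas is imported as pd; the names day1, day2, keptLabels, and renumbered are hypothetical, and the two small DataFrames are only a stand-in for the grocery data above.

```python
import pandas as pd

# Two small stand-in DataFrames (hypothetical names, a subset of the grocery data)
day1 = pd.DataFrame({'Product': ['Bread', 'Milk'], 'Price': [20, 60]})
day2 = pd.DataFrame({'Product': ['Jeans', 'Chocolate'], 'Price': [400, 50]})

# Without ignore_index, each DataFrame keeps its own row labels,
# so the labels 0 and 1 appear twice in the result
keptLabels = pd.concat([day1, day2])
print(keptLabels.index.tolist())    # [0, 1, 0, 1]

# With ignore_index=True, the rows are renumbered 0, 1, 2, ...
renumbered = pd.concat([day1, day2], ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2, 3]
```

Repeated row labels are not an error, but they can make it harder to select a single row by its label later, which is why renumbering is often convenient when stacking records.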
Concatenating DataFrames having Mismatched Column Names
In the above example, the two DataFrames had identical column names. Now suppose that, while concatenating two
DataFrames (say df1 and df2), the DataFrame df2 contains a column named df2Col that does not belong to the
DataFrame df1. The concatenated DataFrame will still have the column df2Col, but there are no values for it in the
rows that come from df1, so these rows will have the value NaN (Not a Number, the marker Pandas uses for missing
data) in that column. Similarly, if df1 contains a column named df1Col that does not belong to df2, the concatenated
DataFrame will have the column df1Col, with the value NaN in the rows that come from df2.
>>> # Create the first DataFrame for GDP data
>>> gdp1 = {'Year': [2018, 2019, 2020],
...         'Gross Domestic Product': [21.3, 22.6, 20.9],
...         'Inflation Rate': [2.1, 1.8, 2.5]}
>>> gdpDF1 = pd.DataFrame(gdp1)
>>> # Create the second DataFrame for extended GDP data
>>> gdp2 = {'Year': [2021, 2022],
...         'Gross Domestic Product': [23.2, 24.6],
...         'Inflation Rate': [2.5, 2.0],
...         'Unemployment Rate': [4.2, 4.4]}
>>> gdpDF2 = pd.DataFrame(gdp2)
Note that the DataFrame gdpDF2 contains a column named Unemployment Rate that does not belong to the
DataFrame gdpDF1. So, the concatenated DataFrame will have the column Unemployment Rate, but the rows that
come from gdpDF1 will have the value NaN in that column.
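The behaviour described above can be checked with a short script. This is a minimal sketch, assuming Pandas is imported as pd and reusing the GDP data from the example; the result name gdpDF and the use of ignore_index=True are assumptions made here for illustration.

```python
import pandas as pd

# Rebuild the two DataFrames from the example above
gdpDF1 = pd.DataFrame({'Year': [2018, 2019, 2020],
                       'Gross Domestic Product': [21.3, 22.6, 20.9],
                       'Inflation Rate': [2.1, 1.8, 2.5]})
gdpDF2 = pd.DataFrame({'Year': [2021, 2022],
                       'Gross Domestic Product': [23.2, 24.6],
                       'Inflation Rate': [2.5, 2.0],
                       'Unemployment Rate': [4.2, 4.4]})

# Concatenate row-wise; gdpDF is a hypothetical name for the result
gdpDF = pd.concat([gdpDF1, gdpDF2], ignore_index=True)
print(gdpDF)

# The three rows from gdpDF1 have no unemployment data, so they hold NaN
print(gdpDF['Unemployment Rate'].isna().tolist())
# [True, True, True, False, False]
```

The missing entries can later be located with isna(), as done here, before deciding whether to fill or drop them.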

