How to join two dataframes in python

 We are given 2 separate dataframes. One indicates orders and the other has products. We need to combine these 2 dataframes so that we know what products were ordered in each order.



As we can see there is a common column in both dataframe- order_id

So using this we can join these 2 dataframes by merge method.

Before this merge,



Orders dataframe has rows 3421083, columns 7

products dataframe has rows 32434489, columns 4

indicator=True function helps us to identify the source of each row(left/right/both)




What does this mean?

orders is a smaller dataframe but since we need all information provided in the columns of that dataframe, we decided to merge both dataframes using orders as a "must have" data.

So in the combined(merged) dataframe, we can see that 206209 user_id rows were not there in products dataframe. Hence after merge, merge indicator is showing it as left only.


The above is the screenshot of samples of order_id which were there only in orders dataframe and were not in products dataframe.

We can verify by taking a sample order_id 1187899 if it is there in products dataframe or not


This verifies that our understanding is correct about those 206209 order_id that are only in orders dataframe.




The new dataframe order_product has total 11 rows (7 from orders and 4 from products dataframe)



But when orders had 3421083 rows and products had 32434489 rows, why new dataframe has 32640698 rows?

Because 32434489+206209 = 32640698

Since we chose to merge using "left" option, orders had unique 206209 rows and they were added to the 32434489 rows of products and new dataframe has 32640698 rows.

As you can see, merge operations is easy and straightforward but after both dataframes are merged, it requires some introspection to ensure that the data merged as per our expectation.

There are other ways of joining 2 dataframes too but that we will discuss in another post.















No comments:

Post a Comment

Complex query example

Requirement:  You are given the table with titles of recipes from a cookbook and their page numbers. You are asked to represent how the reci...