How to build a complex query in SQL

 

Dataset name: warehouse_orders

Table names: orders, warehouse

I have uploaded these tables in to bigquery and here are contents of these tables

Order table:



warehouse table:


As we can see, warehouse_id is a common column in both tables.

Orders table gives information about the orders fulfilled and warehouse table gives information about warehouse parameters.

Our goal is to calculate how much % of orders are being fulfilled by each warehouse.

Our expected end result should be something like this...

% orders fulfilled = number of orders per warehouse/total orders X 100

  • No of orders fulfilled can be obtained using COUNT function from orders table
  • Total number of orders can be obtained from orders table using SUM function
  • Warehouse name can be obtained from warehouse table (it is already available in column name warehouse_alias
  • Warehouse_id is in both tables so we can join both tables using this column

Let us join tables in the first query so we get warehouse_id and warehouse name columns for our end result

SELECT 

warehouse.warehouse_id,

warehouse.warehouse_alias

FROM 

`my-sql-project-336018.warehouse_orders.warehouse` AS warehouse

LEFT JOIN 

`my-sql-project-336018.warehouse_orders.orders` AS orders

ON 

warehouse.warehouse_id = orders.warehouse_id

I am joining tables with LEFT JOIN...why?

Because I want to ensure that all warehouse_id in the warehouse table must be there and the corresponding warehouse_id data in orders table should be there.

Which means, if there are warehouse_id in orders table but not in warehouse table, they are not required for our task.

Also always check that subquery in itself should run independently because we are going to use its result in the main query. We are getting below result when we run this subquery.



Next, let us find out number of orders by using COUNT function in orders table.

COUNT(orders.order_id) AS number_of_orders_fulfilled

And since using aggregation, add GROUP BY function too...


SELECT 
warehouse.warehouse_id,
warehouse.warehouse_alias,
COUNT(orders.order_id) AS number_of_orders_fulfilled
FROM 
`my-sql-project-336018.warehouse_orders.warehouse` AS warehouse
LEFT JOIN 
`my-sql-project-336018.warehouse_orders.orders` AS orders
ON 
warehouse.warehouse_id = orders.warehouse_id
GROUP BY warehouse.warehouse_id, warehouse.warehouse_alias

This is the result we got so far...


Some warehouse/s has zero fulfilled orders. So when we checked the details of the data, it says some warehouses are newly built and are not in operation yet. So we will have to remove those from the list.

In order to get % of orders fulfilled, we still need total orders fulfilled.
If we run an independent query for that, it would be something like this...

SELECT 
COUNT(*) as total_orders_fulfilled
FROM 
`my-sql-project-336018.warehouse_orders.orders` 
And the result of this query is...

Now adding this into our main query...

SELECT 
warehouse.warehouse_id,
warehouse.warehouse_alias,
COUNT(orders.order_id) AS number_of_orders_fulfilled,
ROUND(COUNT(orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.orders)*100,2) AS percent_order_fulfilled
FROM
 `my-sql-project-336018.warehouse_orders.warehouse` AS warehouse
LEFT JOIN 
`my-sql-project-336018.warehouse_orders.orders` AS orders
ON 
warehouse.warehouse_id = orders.warehouse_id
GROUP BY warehouse.warehouse_id, warehouse.warehouse_alias


The end result we are getting is...


If we calculate a couple of samples to make sure the result is right, out of total 9999 orders, when 548 are fulfilled, it is 5.48% of fulfillment.

This is good but we want to do some cosmetic changes to this query so it looks more presentable.

First, we want to remove warehouses with zero fulfillment which we will do using HAVING function.

Second, warehouse_alias column gives only name of the warehouse but does not indicate where this warehouse is located. So, using CONCAT function, we can have warehouse name and state name together.

Third, instead of giving exact fulfillment percentages, how about giving it in a range?
For example, we can say 0 to 20% , 20 to 60% and above 60% fulfillment
We can do this using CASE statement

Below is the CONCAT function we are going to use so we get 
State:Warehouse_alias format and we will call that column as warehouse_name

CONCAT(warehouse.state,':',warehouse.warehouse_alias) AS warehouse_name

And, to remove warehouses with zero orders,

HAVING COUNT(orders.order_id) > 0

Now result looks like this...


This looks much better. We will still try one last change here by adding fulfillment ranges from 0 to 20 to 60 using CASE statement...something like this...and we will call this column "fulfillment_summary"

CASE 
WHEN COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) <= 0.20
THEN "Fulfilled 0-20% of orders"
WHEN COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) > 0.20 AND
COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) <= 0.60
THEN "Fulfilled 21-60% or orders"
ELSE 
"Fulfilled more than 60% of orders"
END AS fulfillment_summary

So the full query looks like this...

SELECT 
warehouse.warehouse_id,
CONCAT(warehouse.state,':',warehouse.warehouse_alias) AS warehouse_name,
COUNT(orders.order_id) AS total_orders_fulfilled,
CASE 
WHEN COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) <= 0.20
THEN "Fulfilled 0-20% of orders"
WHEN COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) > 0.20 AND
COUNT(orders.order_id)/(SELECT COUNT(*) FROM my-sql-project-336018.warehouse_orders.orders AS orders) <= 0.60
THEN "Fulfilled 21-60% or orders"
ELSE 
"Fulfilled more than 60% of orders"
END AS fulfillment_summary
FROM 
`my-sql-project-336018.warehouse_orders.warehouse` AS warehouse
LEFT JOIN 
`my-sql-project-336018.warehouse_orders.orders` AS orders
ON 
warehouse.warehouse_id = orders.warehouse_id
GROUP BY warehouse.warehouse_id, warehouse_name
HAVING COUNT(orders.order_id) > 0
ORDER BY fulfillment_summary DESC

And our final result looks like this...





No comments:

Post a Comment

Complex query example

Requirement:  You are given the table with titles of recipes from a cookbook and their page numbers. You are asked to represent how the reci...