How to write a simple SQL query

  There is  a public dataset on kaggle web site which gives you the citi bike usage in New York. Bike sharing company has 2 types of customers. One who are subscribers (most likely commuters who need to use bike everyday from one station to another) and customer(those who may rent the bike as and when needed).

Dataset

I am trying to find out what routes are most popular with different user types. I have this in mind...I would like to see end result in different columns.


This is what I have in mind to accomplish as end result



This is the query I wrote in SQL

SELECT 
        usertype,  
        CONCAT(start_station_name,"to",end_station_name) AS route,
        COUNT(*) AS num_trips,
        ROUND(AVG(CAST(tripduration AS INT64 )/60),2) AS duration       
FROM `bigquery-public-data.new_york.citibike_trips` 
WHERE tripduration IS NOT NULL
GROUP BY usertype,start_station_name,end_station_name
ORDER BY num_trips DESC
LIMIT 10



I am adding start and end station name using CONCAT function and calling the column as route

I am counting rows for all columns and calling the column as num_trips as they are showing me the frequency of trips

I found that tripduration column has numerical values but its data type is string so I am going to convert its data type from string to INT using CAST function. Also I am looking for average trip duration in minutes so using AVG function, I am dividing the result by 60 since data for trip duration is in seconds

And I am removing any rows where trip duration is not available using WHERE tripduration IS NOT NULL function

And finally I am grouping and ordering the result using GROUP BY and ORDER BY functions

This is a simple SQL query and results is below



No comments:

Post a Comment

Complex query example

Requirement:  You are given the table with titles of recipes from a cookbook and their page numbers. You are asked to represent how the reci...