There is a public dataset on kaggle web site which gives you the citi bike usage in New York. Bike sharing company has 2 types of customers. One who are subscribers (most likely commuters who need to use bike everyday from one station to another) and customer(those who may rent the bike as and when needed).
I am trying to find out what routes are most popular with different user types. I have this in mind...I would like to see end result in different columns.
This is the query I wrote in SQL
I am adding start and end station name using CONCAT function and calling the column as route
I am counting rows for all columns and calling the column as num_trips as they are showing me the frequency of trips
I found that tripduration column has numerical values but its data type is string so I am going to convert its data type from string to INT using CAST function. Also I am looking for average trip duration in minutes so using AVG function, I am dividing the result by 60 since data for trip duration is in seconds
And I am removing any rows where trip duration is not available using WHERE tripduration IS NOT NULL function
And finally I am grouping and ordering the result using GROUP BY and ORDER BY functions
This is a simple SQL query and results is below
No comments:
Post a Comment