Data Analytics on Cyclistic Bike-Share

Google Data Analytics Capstone Project (Oct 2021)

Yan Houng
Oct 5, 2021
Photo by Joshua Fernandez on Unsplash

The Google Data Analytics Professional Certificate is a series of courses offered by Google that teaches data analysis skills and processes. I started the courses in June 2021 and am now on the last one, which requires completing a data analysis project. This post is that capstone project: a data analysis of the Cyclistic bike-share business.

Table of contents:

  1. Project Background
  2. Six processes of data analysis
  3. Conclusion
  4. Reference

1. Project Background:

I am a junior data analyst who has just joined the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, our team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, our team will design a new marketing strategy to convert casual riders into annual members.

More information about the project background can be found here.

2. Six processes of data analysis:

2.1 Ask

The main business problem we aim to solve is “What marketing strategies will help convert casual riders into annual members?”. First, though, we need to ask “How do annual members and casual riders use Cyclistic bikes differently?”. Thus, we need to identify the differences in riding behavior between annual members and casual riders.

Business Task: Design marketing strategies that encourage casual riders to become annual members.

2.2 Prepare → Process

We will use Cyclistic’s historical trip data to analyze and identify trends. The data is located here. (Note: The datasets have a different name because Cyclistic is a fictional company in this case study. The data has been made available by Motivate International Inc. under this license.) We will use the data from July 2020 to June 2021. There is 1 data table for each month, for a total of 12 data tables. The following is the data table for July 2020.

July 2020 data table

Next, we pre-process the available data: we examine the data structure and do some data cleaning. The tool we use here is R.

July 2020 data structure

Several of the 12 data tables contain missing data. We examined every column with missing data and decided whether the missing values would affect our analysis. We also extracted the month, day, year, and day of the week from the date column and stored each in a new column. Next, we calculated the ride length (as a duration) and the distance between the starting and ending positions. Lastly, we created a column that flags whether the rider returned the bike to the same station where the ride started, based on the distance between the starting and ending positions. All changes to the original data were recorded in a changelog and written in R Markdown.

For details, please refer to the R markdown here.
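The derived columns described above can be sketched roughly as follows. This is an illustrative Python sketch, not the R code used in the project. The field names (`started_at`, `ended_at`, `start_lat`, `start_lng`, `end_lat`, `end_lng`), the timestamp format, the haversine distance formula, and the "distance equals zero" rule for the same-station flag are all assumptions made for the example.

```python
import math
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def derive_columns(trip):
    """Return a copy of one trip record with the derived columns added."""
    start = datetime.strptime(trip["started_at"], TIME_FORMAT)
    end = datetime.strptime(trip["ended_at"], TIME_FORMAT)
    distance = haversine_km(trip["start_lat"], trip["start_lng"],
                            trip["end_lat"], trip["end_lng"])
    out = dict(trip)
    out.update(
        year=start.year,
        month=start.month,
        day=start.day,
        day_of_week=DAYS[start.weekday()],  # weekday(): Monday is 0
        ride_length_min=(end - start).total_seconds() / 60,
        distance_km=distance,
        same_station=distance == 0.0,  # assumed rule: no displacement
    )
    return out
```

In practice a small tolerance might be more robust than an exact zero for the same-station flag, since recorded coordinates can differ slightly between rides at one station.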

2.3 Analyze → Share

Next, we will merge all 12 data tables into 1 data frame and proceed with data analysis on the data frame. Here are some of the findings from the bike-sharing data for 12 months:
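As a rough illustration of the merge step, the sketch below concatenates several CSV tables that share the same header into one list of rows. It is written in Python rather than the R used in the project, and the two tiny tables and their column names are made up for the example.

```python
import csv
import io


def merge_monthly_tables(sources):
    """Concatenate CSV tables that share the same header into one list of rows."""
    merged = []
    header = None
    for source in sources:
        reader = csv.DictReader(source)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("monthly tables do not share the same columns")
        merged.extend(reader)
    return merged


# Two tiny made-up tables standing in for two of the 12 monthly files.
july = io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n")
august = io.StringIO("ride_id,member_casual\nB1,casual\n")
all_trips = merge_monthly_tables([july, august])
```

With real files, each source would be a file opened with `open(path, newline="")` instead of an in-memory string.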

Chart 1: Number of rides and average duration VS member or casual riders

Chart 1 shows the total number of rides over the year from July 2020 to June 2021. Annual members took more rides than casual riders: the number of casual riders’ rides is 76% of the number of annual members’ rides. Although casual riders took fewer rides, their average riding duration is much higher, about 2.5 times that of annual members. This suggests why each group rents a bike: casual riders are more likely to rent a bike to cycle around for a while, whereas annual members rent a bike to travel to a specific location, for example their workplace.
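The two comparisons behind Chart 1 (ride counts and average duration per rider type) can be sketched as below. This is an illustrative Python sketch, not the project's R code. The field names `member_casual` and `ride_length_min` are assumptions, and the sample trips are made-up numbers chosen only to show the arithmetic, not the real data.

```python
from collections import defaultdict


def summarize_by_rider_type(trips):
    """Count rides and compute the mean ride length for each rider type."""
    totals = defaultdict(lambda: {"rides": 0, "minutes": 0.0})
    for trip in trips:
        group = totals[trip["member_casual"]]
        group["rides"] += 1
        group["minutes"] += trip["ride_length_min"]
    return {
        rider: {"rides": t["rides"], "avg_minutes": t["minutes"] / t["rides"]}
        for rider, t in totals.items()
    }


# Made-up sample: 4 member rides of 10 minutes, 3 casual rides of 25 minutes.
sample = ([{"member_casual": "member", "ride_length_min": 10.0}] * 4
          + [{"member_casual": "casual", "ride_length_min": 25.0}] * 3)
summary = summarize_by_rider_type(sample)

# Casual rides as a share of member rides, and the ratio of average durations.
ride_ratio = summary["casual"]["rides"] / summary["member"]["rides"]
duration_ratio = summary["casual"]["avg_minutes"] / summary["member"]["avg_minutes"]
```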

Chart 2: The number of rides VS casual riders or members with the bicycle types

From Chart 2, we can observe that the docked bicycle is the most popular bicycle type among both casual riders and annual members. The classic bike is the second most popular type, and the electric bike is the least popular.

Chart 3: The number of rides and average riding duration for different days of the week

On weekdays, there are more rides from annual members than from casual riders, while on weekends there are more rides from casual riders than from annual members. Chart 3 also shows that the average riding duration of casual riders is higher than that of annual members. From this, we are more confident that many annual members rent bicycles to cycle to work on weekdays, while more casual riders rent bicycles to cycle around on weekends, as their riding duration is much higher than annual members’.

Chart 4: The number of rides VS different time on a day

Chart 4 shows the total number of rides by hour of the day. Rides peak between 4 pm and 6 pm. The number of rides falls gradually after midnight and bottoms out at 4 am, then climbs through the day to its peak at 5 pm. After that, it drops sharply through the evening. Annual members are an exception to this trend: their bar chart shows a bimodal distribution, with a second peak from 7 am to 9 am.

Chart 5: The number of rides VS different time on different days of a week

The same bimodal distribution appears in the Monday to Friday charts in Chart 5. On Saturday and Sunday, by contrast, the charts show a single-peaked, roughly bell-shaped distribution, with more rides from casual riders than from annual members. From these observations, we can say that 7 am to 9 am and 4 pm to 6 pm on weekdays are the 2 peak periods, when annual members travel between their homes and workplaces. On weekends, the number of rides for both annual members and casual riders rises gradually through the morning, peaks at around 2 pm, and then gradually declines.
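One simple way to check for the one-peak versus two-peak shapes described above is to count rides per starting hour and look for local maxima. The Python sketch below is only an illustration and not the method used in the project. The `started_at` field name, its timestamp format, and the made-up hourly counts are assumptions.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def rides_per_hour(trips):
    """Return a 24-element list of ride counts indexed by starting hour."""
    counts = Counter(
        datetime.strptime(t["started_at"], TIME_FORMAT).hour for t in trips
    )
    return [counts.get(hour, 0) for hour in range(24)]


def local_peaks(hourly):
    """Hours whose count is strictly higher than both neighbouring hours."""
    return [
        h for h in range(1, 23)
        if hourly[h] > hourly[h - 1] and hourly[h] > hourly[h + 1]
    ]


# Made-up weekday profile with a morning and an evening peak.
weekday_profile = [0] * 24
weekday_profile[7:10] = [5, 9, 4]
weekday_profile[16:19] = [6, 12, 7]
peaks = local_peaks(weekday_profile)
```

Real hourly counts are noisier than this toy profile, so smoothing the counts first may be needed before the local maxima are meaningful.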

Chart 6: Same station to return bike for different riders

Chart 6 shows that only a small share of riders return the bike to the station where they rented it. Among these riders, there are about 2 times as many casual riders as annual members. This suggests that some casual riders rent bicycles to tour around an area and then return them to the same station. The total number of riders (both casual riders and annual members) who return the bicycle to a different station is 8 times the number who return it to the same station. Thus, riders usually ride to another place and return the bicycle there.

Chart 7: Same station to return bike for different riders on different days of the week

From Chart 7, we can see that on weekdays more annual members than casual riders ride to another place and return the bicycle at a different station. Annual members who work on weekdays will usually cycle to the workplace and return the bicycle at a station near it. On weekends, the numbers of casual riders and annual members who return bicycles at a different station are almost the same.

Chart 8: The number of rides & total duration VS different months

Lastly, Chart 8 shows the total number of rides and the total riding duration of all riders for each month. The number of rides starts to increase from February, reaches its highest value in June, and then decreases again until February. The total riding duration follows the same roughly bell-shaped trend: it is minimal in January and February, picks up and peaks in July, and then drops again until December. From Chart 8, we can see that very few people rent bicycles in the Winter season from December to February. The number of rides and the riding duration then increase as the temperature rises in Spring (March to May) and reach their maximum in Summer (June to August). After Summer, both decrease through Autumn and reach their minimum in Winter.
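The monthly and seasonal grouping used in this reading of Chart 8 can be sketched as follows. This is an illustrative Python sketch, not the project's R code. It assumes each trip already carries a numeric `month` field, and it assumes Autumn covers September to November, which follows from the other three seasons named above.

```python
from collections import Counter

# Month number to season, following the season boundaries used in the text.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def rides_by_month(trips):
    """Return ride counts for every month 1..12 (0 when a month has no rides)."""
    counts = Counter(t["month"] for t in trips)
    return {month: counts.get(month, 0) for month in range(1, 13)}


def rides_by_season(trips):
    """Return ride counts grouped by season."""
    return dict(Counter(SEASON_BY_MONTH[t["month"]] for t in trips))


# Made-up sample: one ride in January, three in June and July combined.
sample = [{"month": 1}, {"month": 6}, {"month": 7}, {"month": 7}]
monthly = rides_by_month(sample)
seasonal = rides_by_season(sample)
```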

2.4 Act

In summary, the riding behavior differences between annual members and casual riders are:

I) Annual members usually rent bicycles to travel between workplace and residence on weekdays, while casual riders usually rent bicycles to cycle around for a longer period.

II) Annual members have 2 peak periods on weekdays: before work starts and after work ends. Casual riders have only 1 peak period, in the afternoon.

Given these differences, we can consider launching free membership campaigns for casual riders. First, we can run a campaign that targets casual riders who rent bicycles in the afternoon: those who rent within a set window, such as 2 pm to 7 pm, get a free 1-month membership. We can then offer a promotional membership-extension price exclusively to casual riders who have finished the free month.

Secondly, we can run another free membership campaign that targets casual riders who rent bicycles for a long period. Casual riders who rent a bicycle for longer than a certain number of hours get a free 1-month membership.

The similarity in riding behavior between annual members and casual riders is that the number of riders is highest during Summer and lowest during Winter. Thus, the third recommendation is to launch the marketing campaign before Summer, when the temperature starts to rise, and to end it during Autumn, when the temperature starts to drop. This helps ensure the effectiveness of the campaign because there will be fewer riders on the road during the Winter season, when the weather is cold.

3. Conclusion

Based on the observations shared in section 2.3, there are 3 recommendations:

A) Launch a free membership program for casual riders who ride bicycles in the afternoon.

B) Launch a free membership program for casual riders who ride bicycles for a long period.

C) Launch a marketing campaign before Summer and end the campaign before Winter.

So, that is what I would suggest to the marketing team at Cyclistic. That is the end of the case study for the capstone project of the Google Data Analytics Professional Certificate courses. Thank you very much, and I hope you enjoyed reading it.

4. Reference

I) My GitHub repository for this project:- https://github.com/houng87/bike-share_data_analysis_OCT2021

II) Cyclistic’s historical trip data:- https://divvy-tripdata.s3.amazonaws.com/index.html
