Commuter Van Express, Inc. (CVE) is a commuter shuttle service with operations in a large US
city where public transportation is frequently delayed and overcrowded. The company operates
a fleet of 14-passenger vans, equipped with WiFi and comfortable seats, to provide shuttle
service along fixed routes between residential neighborhoods and the city center during
commuting hours (running toward the city center in the morning and toward the residential
areas in the evening). CVE’s customers use a website or mobile app to book a seat in one of
the shuttles, thereby guaranteeing that space will be available. Vans run on a regular schedule
along each route, and the app includes location tracking to provide users with real-time arrival
information.
CVE uses analytics platforms to collect and organize data on its ongoing operations. One set
of metrics includes ride volume (number of rides actually taken each day) and revenues, which
can vary per ride due to volume discounts on package purchases (e.g., 12 rides for the price of
10), a monthly subscription option, and various short-term coupons and promotions. Its
platforms also track user activity in the mobile app, including actions such as starting a new
session, tapping on a stop, and booking a ride.
It is April 1, 2016, and CVE has hired you as a consultant to help them understand their recent
performance and develop a method to forecast future rides and revenues. To assist in your
analysis, the company has provided you with daily data from its analytics platforms for the first
quarter of 2016. The dataset has been reviewed by CVE’s analytics team and confirmed to be
clean and free of errors.
Use RStudio to answer the following questions. Provide your written answers, along with any
relevant tables and charts, in a single PDF file. Any charts included in your report should be
properly labeled and formatted for an audience of company executives. Do not include R code
in your PDF report. RMarkdown is not required or suggested for this assignment. You should
also submit a single .R script file with your code for the analysis.
Regression Analysis.
1. Because customers value flexibility in their commuting plans, CVE allows customers to
cancel a booking without penalty up until the van they booked arrives at their chosen
stop. As a result, not all ride bookings result in a ride actually taking place. Estimate a
simple linear regression model to understand the relationship between daily bookings
and daily completed rides. Report the estimated regression equation and R² value and
interpret them in words.
Professor Kate Ashley
MISM 6202
2. CVE would like to know if ride bookings through the mobile app can be predicted using
the actions that an app user may perform prior to booking: namely, starting a session,
tapping on a stop, tapping on the sidebar, and viewing van ETAs. Estimate a multiple
regression model that uses the relevant variables to predict ride bookings. Multiple
models involving these variables are possible; select the best model and explain your
choice, citing specific numerical evidence from the regression output. Report the
estimated regression equation and R² value and interpret them in words.
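As a rough illustration of the mechanics behind Questions 1 and 2, the sketch below fits a simple regression and two competing multiple regressions in R. It runs on synthetic data, not CVE's dataset: the column names (sessions, stop_taps, eta_views, bookings, rides), the sample size, and every coefficient used to generate the data are assumptions made for the example. Substitute the actual variable names from the provided file.

```r
# Synthetic stand-in data. All column names and generating coefficients
# below are hypothetical and chosen only so the example runs.
set.seed(42)
n <- 60  # arbitrary number of days for the example
sessions  <- round(runif(n, 500, 900))
stop_taps <- round(0.6 * sessions + rnorm(n, sd = 25))
eta_views <- round(runif(n, 100, 300))
bookings  <- round(0.3 * sessions + 0.2 * stop_taps + rnorm(n, sd = 10))
rides     <- round(0.9 * bookings + rnorm(n, sd = 5))
cve <- data.frame(sessions, stop_taps, eta_views, bookings, rides)

# Simple linear regression: completed rides explained by bookings
simple_fit <- lm(rides ~ bookings, data = cve)
b <- coef(simple_fit)
cat(sprintf("rides = %.2f + %.3f * bookings (R^2 = %.3f)\n",
            b[["(Intercept)"]], b[["bookings"]],
            summary(simple_fit)$r.squared))

# Multiple regression: two candidate models for bookings
full_fit    <- lm(bookings ~ sessions + stop_taps + eta_views, data = cve)
reduced_fit <- lm(bookings ~ sessions + stop_taps, data = cve)

# Coefficient table (estimates, standard errors, t values, p-values)
print(summary(full_fit)$coefficients)

# Adjusted R^2 penalizes extra predictors, so it is one common basis
# for comparing models with different numbers of variables
adj_r2 <- c(full    = summary(full_fit)$adj.r.squared,
            reduced = summary(reduced_fit)$adj.r.squared)
print(round(adj_r2, 4))
```

Adjusted R² is only one possible selection criterion. The p-values of individual coefficients and the correlation between predictors are also worth inspecting before settling on a model.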
Forecasting.
3. Create a well-formatted and labeled scatter plot to visually inspect the ‘rides’ variable.
Describe any trend and seasonality that appear to be present.
4. Construct a k-period simple moving average for the rides variable, where k is chosen
based on your assessment of the seasonality patterns in the data. Explain your choice of
k and report MSE, MAD, and MAPE for this forecasting model.
5. Estimate a linear trend model for the ‘rides’ variable. Report the estimated linear trend
equation and the R² of the model, and interpret both the equation and the R² in words.
6. Estimate a linear trend model with day-of-week dummy variables for the ‘rides’ variable.
Interpret both the estimated regression equation and the R² in words, and comment on
the magnitude of the adjusted R² relative to the adjusted R² from the regression you
performed in (5).
7. (a) Use the estimated regression equation from (6) to calculate a forecast of ‘rides’ for
each day in your dataset. Calculate MSE, MAD, and MAPE for this forecast. Comment
on which of the two forecasts you have calculated in this problem set (from Q4 and this
question) performs the best and why that method is best-suited to this data.
(b) Use the estimated regression equation from (6) to forecast daily completed rides for
each weekday in the next month (April 1-April 29). Optional: Also forecast revenues for
each day.
8. Write a concise but thorough 1-2 paragraph summary of the forecasting analysis you
performed in this problem set, focusing on the most important findings. In other words,
think about the work you did for Q3-Q7 and summarize what you would communicate to
CVE to help them better understand their ridership data.
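The forecasting steps in Questions 4 through 7 can be sketched in R as follows. This is a minimal example on a synthetic series, not CVE's data: it assumes weekday-only observations with a weekly pattern (hence k = 5), and the names day_index, dow, forecast_errors, and sma_forecast are hypothetical. If the actual dataset includes weekends or shows a different seasonal cycle, k and the dummy variables would need to change accordingly.

```r
# Synthetic 'rides' series: 12 weeks of weekdays with an upward trend
# and a day-of-week effect. All numbers here are made up.
set.seed(7)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
dow <- factor(rep(days, times = 12), levels = days)
day_index <- seq_along(dow)
day_effect <- c(Mon = 0, Tue = 15, Wed = 20, Thu = 10, Fri = -25)
rides <- 300 + 1.5 * day_index + day_effect[as.character(dow)] +
  rnorm(length(dow), sd = 8)
cve <- data.frame(day_index, dow, rides = as.numeric(rides))

# Error metrics, computed only on days that have a forecast
forecast_errors <- function(actual, forecast) {
  ok <- !is.na(forecast)
  e <- actual[ok] - forecast[ok]
  c(MSE  = mean(e^2),
    MAD  = mean(abs(e)),
    MAPE = 100 * mean(abs(e / actual[ok])))
}

# k-period simple moving average: the forecast for day i is the mean
# of the k observations immediately before it
sma_forecast <- function(y, k) {
  f <- rep(NA_real_, length(y))
  for (i in (k + 1):length(y)) f[i] <- mean(y[(i - k):(i - 1)])
  f
}
k <- 5  # assumed: one full weekday cycle
ma_fc <- sma_forecast(cve$rides, k)

# Linear trend, then linear trend plus day-of-week dummies
# (lm() expands the factor 'dow' into dummies, with Mon as baseline)
trend_fit  <- lm(rides ~ day_index, data = cve)
season_fit <- lm(rides ~ day_index + dow, data = cve)
reg_fc <- fitted(season_fit)

# Compare adjusted R^2 and forecast accuracy
print(c(trend  = summary(trend_fit)$adj.r.squared,
        season = summary(season_fit)$adj.r.squared))
print(rbind(moving_average = forecast_errors(cve$rides, ma_fc),
            trend_dummies  = forecast_errors(cve$rides, reg_fc)))

# Out-of-sample forecast for the next five weekdays
future <- data.frame(day_index = max(cve$day_index) + 1:5,
                     dow = factor(days, levels = days))
print(predict(season_fit, newdata = future))
```

Note that the moving average has no forecast for the first k days, so its error metrics are computed over fewer observations than those of the regression. That difference is worth keeping in mind when comparing the two.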