Development of a multiple regression

Read the prompt and answer the assignment questions based on my Python script and discussion post below.

Prompt: Last week’s discussion involved development of a multiple regression model that used miles per gallon as a response variable. Weight and horsepower were predictor variables. You performed an overall F-test to evaluate the significance of your model. This week, you will evaluate the significance of individual predictors. You will use output of Python script from Module Six to perform individual t-tests for each predictor variable. Specifically, you will look at Step 5 of the Python script to answer all questions in the discussion this week.

Assignment:

Is at least one of the two variables (weight and horsepower) significant in the model? Run the overall F-test and provide your interpretation at 5% level of significance. See Step 5 in the Python script. Include the following in your analysis:
Define the null and alternative hypothesis in mathematical terms and in words.
Report the level of significance.
Include the test statistic and the P-value. (Hint: F-Statistic and Prob (F-Statistic) in the output).
Provide your conclusion and interpretation of the test. Should the null hypothesis be rejected? Why or why not?
What is the slope coefficient for the weight variable? Is this coefficient significant at the 5% level of significance (alpha=0.05)? (Hint: Check the P-value in the P>|t| column for weight in the Python output. Recall that this is the individual t-test for the beta parameter.) See Step 5 in the Python script.
What is the slope coefficient for the horsepower variable? Is this coefficient significant at the 5% level of significance (alpha=0.05)? (Hint: Check the P-value in the P>|t| column for horsepower in the Python output. Recall that this is the individual t-test for the beta parameter.) See Step 5 in the Python script.
What is the purpose of performing individual t-tests after carrying out the overall F-test? What are the differences in the interpretation of the two tests?
What is the coefficient of determination of your multiple regression model from Module Six? Provide appropriate interpretation of this statistic.
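
The hypotheses and test statistics these questions ask about can be written out as below. This is a sketch that assumes the model mpg = β0 + β1·wt + β2·hp + ε; the symbols β1 (weight), β2 (horsepower), b_j (estimated slope), n (number of observations), and k (number of predictors) are notation introduced here, not taken from the course script.

```latex
% Overall F-test: is at least one predictor useful?
H_0:\ \beta_1 = \beta_2 = 0
\qquad
H_a:\ \text{at least one } \beta_j \neq 0,\quad j \in \{1, 2\}

% Individual t-test for predictor j, with the other predictor in the model
H_0:\ \beta_j = 0
\qquad
H_a:\ \beta_j \neq 0

% Test statistics
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)},
\qquad
t_j = \frac{b_j}{\operatorname{SE}(b_j)}
```

In words, the overall null hypothesis says that neither weight nor horsepower is linearly related to miles per gallon, while each individual null hypothesis says that one predictor adds nothing once the other is already in the model. In both cases the null hypothesis is rejected when the P-value is below alpha = 0.05.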

My Discussion Post:

According to my Python output, the coefficients of my regression equation are 37.2363 for the intercept, -3.7564 for weight, and -0.0332 for horsepower. The standard form of a multiple linear regression equation is Ŷ = a + b1X1 + b2X2. In this case, the fitted equation is Ŷ = 37.236 - 3.756 X1 - 0.033 X2, rounded to three decimal places. X1 represents the independent variable weight (in 1000s of lbs), and X2 represents the independent variable horsepower.

Weight and horsepower are the predictor variables, while miles per gallon is the response variable, Y. The model therefore describes the miles per gallon of a vehicle as a function of its weight and its horsepower. The P-values in the P>|t| column are 0.000 for weight and 0.001 for horsepower. Both are less than the 0.05 level of significance, so both slope coefficients are statistically significant. The slope coefficients are -3.756 for weight and -0.033 for horsepower. Each one represents the mean change in miles per gallon for a one-unit increase in that predictor while the other predictor is held constant. Because both are negative, miles per gallon decreases on average as weight or horsepower increases.
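
As a rough check on the equation above, here is a minimal sketch that plugs one car into it. The coefficients are copied from the Step 5 output, and the Hornet Sportabout values (wt = 3.440, hp = 175, mpg = 18.7) come from the first rows displayed by the script. The function name `predict_mpg` is only illustrative and is not part of the course script.

```python
# Coefficients copied from the Step 5 regression output.
INTERCEPT = 37.2363
B_WT = -3.7564   # change in mpg per extra 1000 lbs, horsepower held fixed
B_HP = -0.0332   # change in mpg per extra horsepower, weight held fixed


def predict_mpg(wt, hp):
    """Predicted mpg from the fitted equation (illustrative helper name)."""
    return INTERCEPT + B_WT * wt + B_HP * hp


# Hornet Sportabout from the displayed sample: wt = 3.440, hp = 175, observed mpg = 18.7
predicted = predict_mpg(3.440, 175)
residual = 18.7 - predicted
print(f"predicted mpg = {predicted:.2f}, residual = {residual:.2f}")

# Adding one weight unit (1000 lbs) at the same horsepower shifts the
# prediction by exactly the weight slope.
effect_of_weight = predict_mpg(4.440, 175) - predicted
print(f"effect of +1 weight unit = {effect_of_weight:.4f}")
```

The prediction comes out at roughly 18.5 mpg against an observed 18.7, and the second calculation shows the slope interpretation directly: one more unit of weight lowers the predicted mpg by about 3.76.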

My Python Script:

In [2]:

import pandas as pd
from IPython.display import display, HTML

# read data from mtcars.csv data set.
cars_df_orig = pd.read_csv("https://s3-us-west-2.amazonaws.com/data-analytics.zybooks.com/mtcars.csv")

# randomly pick 30 observations from the data set to make the data set unique to you.
cars_df = cars_df_orig.sample(n=30, replace=False)

# print only the first five observations in the dataset.
print("Cars data frame (showing only the first five observations)\n")
display(HTML(cars_df.head().to_html()))
Cars data frame (showing only the first five observations)

    Unnamed: 0           mpg   cyl  disp   hp   drat  wt     qsec   vs  am  gear  carb
4   Hornet Sportabout    18.7  8    360.0  175  3.15  3.440  17.02  0   0   3     2
14  Cadillac Fleetwood   10.4  8    472.0  205  2.93  5.250  17.98  0   0   3     4
15  Lincoln Continental  10.4  8    460.0  215  3.00  5.424  17.82  0   0   3     4
28  Ford Pantera L       15.8  8    351.0  264  4.22  3.170  14.50  0   1   5     4
11  Merc 450SE           16.4  8    275.8  180  3.07  4.070  17.40  0   0   3     3

STEP 2: SCATTERPLOT OF MILES PER GALLON AGAINST WEIGHT
The block of code below will create a scatterplot of the variables “miles per gallon” (coded as mpg in the data set) and “weight” of the car (coded as wt).

In [4]:

import matplotlib.pyplot as plt

# create scatterplot of variables mpg against wt.
plt.plot(cars_df["wt"], cars_df["mpg"], 'o', color='red')

# set a title for the plot, x-axis, and y-axis.
plt.title('MPG against Weight')
plt.xlabel('Weight (1000s lbs)')
plt.ylabel('MPG')

# show the plot.
plt.show()

STEP 3: SCATTERPLOT OF MILES PER GALLON AGAINST HORSEPOWER
The block of code below will create a scatterplot of the variables “miles per gallon” (coded as mpg in the data set) and “horsepower” of the car (coded as hp).

In [6]:

import matplotlib.pyplot as plt

# create scatterplot of variables mpg against hp.
plt.plot(cars_df["hp"], cars_df["mpg"], 'o', color='blue')

# set a title for the plot, x-axis, and y-axis.
plt.title('MPG against Horsepower')
plt.xlabel('Horsepower')
plt.ylabel('MPG')

# show the plot.
plt.show()

STEP 4: CORRELATION MATRIX FOR MILES PER GALLON, WEIGHT AND HORSEPOWER
Now you will calculate the correlation coefficient between the variables “miles per gallon” and “weight”. You will also calculate the correlation coefficient between the variables “miles per gallon” and “horsepower”. The corr method of a dataframe returns the correlation matrix with the correlation coefficients between all variables in the dataframe. You will specify to only return the matrix for the three variables.

In [8]:

# create correlation matrix for mpg, wt, and hp.
# The correlation coefficient between mpg and wt is contained in the cell for mpg row and wt column (or wt row and mpg column).
# The correlation coefficient between mpg and hp is contained in the cell for mpg row and hp column (or hp row and mpg column).
mpg_wt_corr = cars_df[['mpg','wt','hp']].corr()
print(mpg_wt_corr)
          mpg        wt        hp
mpg  1.000000 -0.869627 -0.791014
wt  -0.869627  1.000000  0.663993
hp  -0.791014  0.663993  1.000000
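
To make the matrix easier to read, here is a small sketch that looks up the two correlations with mpg and squares them. The numbers are copied from the Step 4 output above; the dictionary layout and the helper name `get_corr` are illustrative assumptions rather than part of the course script. In a simple one-predictor regression, the squared correlation is the share of variation in mpg explained by that predictor alone.

```python
# Correlations copied from the Step 4 output above.
corr = {
    ("mpg", "wt"): -0.869627,
    ("mpg", "hp"): -0.791014,
    ("wt", "hp"): 0.663993,
}


def get_corr(a, b):
    """Look up r for a pair in either order; a variable with itself is 1.0."""
    if a == b:
        return 1.0
    return corr.get((a, b), corr.get((b, a)))


for predictor in ("wt", "hp"):
    r = get_corr(predictor, "mpg")
    direction = "negative" if r < 0 else "positive"
    print(f"mpg vs {predictor}: r = {r:.3f} ({direction}), r^2 = {r ** 2:.3f}")
```

Both correlations with mpg are strong and negative, which matches the negative slopes in the regression output. The positive correlation between wt and hp (about 0.66) is a reminder that the two predictors overlap, so each slope in the multiple regression is read with the other predictor held constant.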

STEP 5: MULTIPLE REGRESSION MODEL TO PREDICT MILES PER GALLON USING WEIGHT AND HORSEPOWER
This block of code produces a multiple regression model with “miles per gallon” as the response variable, and “weight” and “horsepower” as predictor variables. The ols function in the statsmodels.formula.api submodule builds the model, and the summary of the fitted model reports the statistics for this multiple regression.

In [10]:

from statsmodels.formula.api import ols

# create the multiple regression model with mpg as the response variable; weight and horsepower as predictor variables.
model = ols('mpg ~ wt+hp', data=cars_df).fit()
print(model.summary())

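                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.838
Model:                            OLS   Adj. R-squared:                  0.826
Method:                 Least Squares   F-statistic:                     69.75
Date:                Wed, 07 Oct 2020   Prob (F-statistic):           2.16e-11
Time:                        00:12:13   Log-Likelihood:                -69.283
No. Observations:                  30   AIC:                             144.6
Df Residuals:                      27   BIC:                             148.8
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2363      1.583     23.518      0.000      33.988      40.485
wt            -3.7564      0.632     -5.943      0.000      -5.053      -2.460
hp            -0.0332      0.009     -3.686      0.001      -0.052      -0.015
==============================================================================
Omnibus:                        4.804   Durbin-Watson:                   2.679
Prob(Omnibus):                  0.091   Jarque-Bera (JB):                3.519
Skew:                           0.825   Prob(JB):                        0.172
Kurtosis:                       3.307   Cond. No.                         574.
==============================================================================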

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
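
The decisions for the overall F-test and the individual t-tests can be read off this output by comparing each P-value with alpha. The sketch below does that with values copied by hand from the summary above. The variable names are illustrative, and the P-value printed as 0.000 for wt is taken to mean "smaller than 0.0005" rather than exactly zero.

```python
ALPHA = 0.05  # 5% level of significance

# Values copied from the Step 5 summary above.
n_obs, k = 30, 2                      # No. Observations, Df Model
r_squared = 0.838
f_stat, f_pvalue = 69.75, 2.16e-11    # F-statistic, Prob (F-statistic)
coef_table = {
    # name: (coef, std err, P>|t| as printed; 0.000 means below 0.0005)
    "wt": (-3.7564, 0.632, 0.000),
    "hp": (-0.0332, 0.009, 0.001),
}

# Overall F-test: H0 says both slopes are zero.
reject_overall = f_pvalue < ALPHA
print(f"F = {f_stat}, p = {f_pvalue:.2e}, reject H0: {reject_overall}")

# Individual t-tests: H0 says this one slope is zero, given the other predictor.
t_values = {}
significant = {}
for name, (coef, std_err, p_value) in coef_table.items():
    t_values[name] = coef / std_err   # agrees with the printed t up to rounding
    significant[name] = p_value < ALPHA
    print(f"{name}: t = {t_values[name]:.2f}, p = {p_value:.3f}, "
          f"significant: {significant[name]}")

# Adjusted R-squared recomputed from R-squared as a consistency check.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)
print(f"R^2 = {r_squared}, adjusted R^2 = {adj_r_squared:.3f}")
```

With these numbers the overall null hypothesis is rejected, and both weight and horsepower are individually significant at the 5% level. The F-test only says that at least one predictor is useful; the t-tests say which ones. R-squared = 0.838 means that about 83.8% of the variation in miles per gallon in this sample is explained by weight and horsepower together.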
