Empirical Analysis

1. The cost of a leading liquid laundry detergent in different sizes is given below. (23 pts)
Part 1:
a) Using “size” as the independent variable and “cost” as the dependent variable, make a scatter plot.
b) Does it appear from inspection that there is a relationship between the variables? Why or why not?
c) Calculate the least squares line. Put the equation in the form ŷ = a + bx.
d) Find the correlation coefficient.
e) If the laundry detergent were sold in a 40-ounce size, find the estimated cost.
f) If the laundry detergent were sold in a 90-ounce size, find the estimated cost.
g) Use the two points in (e) and (f) to plot the least squares line on your graph from (a).
h) Does it appear that a line is the best way to fit the data? Why or why not?
i) Are there any outliers in the above data?
j) Is the least squares line valid for predicting what a 300-ounce size of the laundry detergent would cost? Why or why not?
k) What is the slope of the least squares (best-fit) line? Interpret the slope.
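The computations in (c) through (f) can be sketched in Python as below. This is a minimal illustration, assuming the sizes and costs from the table are typed into two lists; the function names `least_squares`, `correlation`, and `predict` are hypothetical, and the table's values are not reproduced here.

```python
import math


def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # S_xx
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # S_xy
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def correlation(xs, ys):
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def predict(a, b, x):
    """Estimated y for a given x on the fitted line."""
    return a + b * x
```

With the table's sizes and costs in lists `sizes` and `costs`, `a, b = least_squares(sizes, costs)` gives the line for (c), `correlation(sizes, costs)` gives r for (d), and `predict(a, b, 40)` and `predict(a, b, 90)` give the estimates for (e) and (f). The slope b is the estimated change in cost for each additional ounce of detergent.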
Part 2:
a) Complete the above table for the cost per ounce of the different sizes.
b) Using “size” as the independent variable and “cost per ounce” as the dependent variable, make a scatter plot of the data.
c) Does it appear from inspection that there is a relationship between the variables? Why or why not?
d) Calculate the least squares line. Put the equation in the form ŷ = a + bx.
e) Find the correlation coefficient.
f) If the laundry detergent were sold in a 40-ounce size, find the estimated cost per ounce.
g) If the laundry detergent were sold in a 90-ounce size, find the estimated cost per ounce.
h) Use the two points in (f) and (g) to plot the least squares line on your graph from (b).
i) Does it appear that a line is the best way to fit the data? Why or why not?
j) Are there any outliers in the above data?
k) Is the least squares line valid for predicting what a 300-ounce size of the laundry detergent would cost per ounce? Why or why not?
l) What is the slope of the least squares (best-fit) line? Interpret the slope.
2. A biologist assumes there is a linear relationship between the amount of fertilizer supplied to a tomato plant and the yield of tomatoes obtained.
Eight tomato plants of the same variety are selected at random and treated weekly with x grams of fertilizer dissolved in a fixed quantity of water. The yield, y kilograms of tomatoes, is recorded.
Plant                   A    B    C    D    E    F    G    H
x (grams of fertilizer) 1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
y (kg of tomatoes)      3.9  4.4  5.8  6.6  7.0  7.1  7.3  7.7
a) Draw a scatter plot of the yield, y, against the amount of fertilizer, x. (3 pts)
b) Calculate the equation of the least squares regression line of y on x. (6 pts)
c) Estimate the yield of a plant treated weekly with 3.2 grams of fertilizer. (2 pts)
d) Explain why it may not be appropriate to use your equation to predict the yield of a plant treated weekly with 20 grams of fertilizer. (1 pt)
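Parts (b) and (c) can be checked numerically with the eight data points from the table. This is a minimal sketch using the usual formulas b = Sxy / Sxx and a = ȳ − b·x̄; the helper name `fit_line` is hypothetical. For these data Sxx = 10.5 and Sxy = 11.35.

```python
# Fertilizer (x, grams per week) and yield (y, kg) from the table above.
x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
y = [3.9, 4.4, 5.8, 6.6, 7.0, 7.1, 7.3, 7.7]


def fit_line(xs, ys):
    """Least squares line of y on x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x")                 # y-hat = 3.252 + 1.081x
print(f"estimate at x = 3.2: {a + b * 3.2:.2f} kg")  # estimate at x = 3.2: 6.71 kg
```

For (d), note that the data only cover 1.0 to 4.5 grams and the yields already level off between 7.0 and 7.7 kg at the upper end, so extending the straight line to 20 grams is an extrapolation far outside the observed range.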
3. We will use the dataset below to learn a decision tree that predicts whether people pass machine learning (Yes or No) based on their previous GPA (High, Medium, or Low) and whether or not they studied.
a) What is the entropy H(Passed)? (3 pts)
b) What is the entropy H(Passed | GPA)? (3 pts)
c) What is the entropy H(Passed | Studied)? (3 pts)
d) Draw the full decision tree that would be learned for this dataset. You do not need to show any calculations. (3 pts)
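The entropies in (a) through (c) follow H(Y) = −Σ p·log₂ p and H(Y | X) = Σ P(X = v)·H(Y | X = v). A minimal sketch, assuming each row of the dataset is entered as a dictionary with hypothetical keys such as "GPA", "Studied", and "Passed"; the dataset itself is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(Y) in bits for a list of class labels, e.g. ["Yes", "No", ...]."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def conditional_entropy(rows, feature, target):
    """H(target | feature): entropy within each feature value, weighted by its frequency."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    return sum(len(g) / n * entropy(g) for g in groups.values())
```

For example, `entropy([r["Passed"] for r in rows])` gives (a), and `conditional_entropy(rows, "GPA", "Passed")` and `conditional_entropy(rows, "Studied", "Passed")` give (b) and (c). Assuming the usual ID3-style greedy rule, the root split is the feature with the lower conditional entropy, i.e. the larger information gain H(Passed) − H(Passed | feature).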