Regression analysis

Simulation is a powerful methodology for investigating the properties of econometric estimators and tests. Its power derives from the investigator's ability to define and control the statistical environment: the investigator specifies the data-generating process (DGP) and generates the data used to investigate those properties. We are going to use this simulation method to examine the properties of the OLS estimator that we learned in class. [Hint: Understand the Stata code I posted for this question.]

Suppose we are interested in the effect of education on salary, as expressed in the following model:

\[ Salary_i = \beta_0 + \beta_1 Education_i + \epsilon_i \]

For this problem, we are going to assume that the true model is

\[ Salary_i = 12000 + 1000 \, Education_i + \epsilon_i \]

The model indicates that the salary for each person is $12,000 plus $1,000 times the number of years of education, plus the error term for the individual. Our goal is to explore how much our estimate \(\hat{\beta}_{Education}\) (that is, \(\hat{\beta}_1\), the estimated coefficient on education) varies.

I posted code that simulates a data set with 100 observations. Values of education for each observation are between 0 and 16 years. The error term is normally distributed with a standard deviation of 10,000. [Hint: Understand the OLS properties.]
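The posted Stata code is not reproduced in this question. As a rough illustration of the same idea, the following Python sketch (standard library only) draws repeated data sets from the true model and records the OLS slope on education each time. Several details are assumptions rather than part of the assignment: education is drawn uniformly on [0, 16], the default of 50 repetitions is a placeholder, and the function names `simulate_once`, `run_simulations`, and `summarize` are hypothetical.

```python
import random
import statistics


def simulate_once(rng, n=100, beta0=12000.0, beta1=1000.0, sd=10000.0):
    """Draw one data set from the assumed DGP and return the OLS slope."""
    # Assumption: education is uniform on [0, 16]; the question only says
    # the values lie between 0 and 16 years.
    educ = [rng.uniform(0, 16) for _ in range(n)]
    # Salary follows the true model plus a normal error with std. dev. sd.
    salary = [beta0 + beta1 * x + rng.gauss(0, sd) for x in educ]

    # Bivariate OLS slope: sum of cross-deviations over sum of squared
    # deviations of the regressor.
    x_bar = statistics.fmean(educ)
    y_bar = statistics.fmean(salary)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, salary))
    sxx = sum((x - x_bar) ** 2 for x in educ)
    return sxy / sxx


def run_simulations(reps=50, seed=12345, **dgp):
    """Repeat the simulation reps times; reps=50 is a placeholder default."""
    rng = random.Random(seed)
    return [simulate_once(rng, **dgp) for _ in range(reps)]


def summarize(slopes):
    """Return the mean, minimum, and maximum of the estimated slopes."""
    return {
        "mean": statistics.fmean(slopes),
        "min": min(slopes),
        "max": max(slopes),
    }


if __name__ == "__main__":
    # Baseline settings from the question: n = 100, sd = 10,000.
    print("baseline:", summarize(run_simulations()))
    # Variations in the spirit of parts (c) through (f).
    print("n = 1000:", summarize(run_simulations(n=1000)))
    print("n = 20:", summarize(run_simulations(n=20)))
    print("sd = 500:", summarize(run_simulations(sd=500.0)))
    print("sd = 50000:", summarize(run_simulations(sd=50000.0)))
```

Passing a fixed `seed` makes each run reproducible, so changes in the summary can be attributed to the changed sample size or error standard deviation rather than to a different random draw.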

a. (5) Explain why the means of the estimated coefficients across the multiple simulations are what they are.

b. (5) What are the minimum and maximum values of the estimated coefficients on education? Explain whether these values are inconsistent with our statement that OLS estimates are unbiased.

c. (5) Rerun the simulation with a larger sample size in each simulation. Specifically, set the sample size to 1,000 in each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.

d. (5) Rerun the simulation with a smaller sample size in each simulation. Specifically, set the sample size to 20 in each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.

e. (5) Reset the sample size to 100 for each simulation, and rerun the simulation with a smaller standard deviation (equal to 500) for each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.

f. (5) Keeping the sample size at 100 for each simulation, rerun the simulation with a larger standard deviation for each simulation. Specifically, set the standard deviation to 50,000 for each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.

g. (5) Revert to the original model (sample size of 100 and standard deviation of 10,000). Now run 500 simulations. Summarize the distribution of the \(\hat{\beta}_{Education}\) estimates (the estimated coefficients on education) as you have done so far, but now also plot the distribution of these coefficients using the code provided. Describe the density plot in your own words.
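As a reference point for the parts above, the standard textbook results for the bivariate OLS slope can be written as follows. This is a sketch under the usual assumptions (errors that are mean-zero given education, homoskedastic, and uncorrelated across observations), which the simulated DGP satisfies by construction. The symbols \(\sigma\) (the standard deviation of the error term) and \(N\) (the sample size) are notation introduced here, not taken from the question.

```latex
\[
E\big[\hat{\beta}_1\big] = \beta_1,
\qquad
\mathrm{Var}\big(\hat{\beta}_1 \mid Education_1, \dots, Education_N\big)
= \frac{\sigma^2}{\sum_{i=1}^{N} \big(Education_i - \overline{Education}\big)^2}
\]
```

The first expression concerns the center of the distribution of the estimates, and the second concerns its spread; the sample size enters through the sum in the denominator and the error standard deviation through the numerator.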
