1) (15 points) Suppose you are interested in uncovering the link between air pollution and asthma. You are given data for
everyone in Santa Barbara which includes an indicator variable equal to 1 if the person has asthma ($Y_i$), and a continuous
average pollution variable ($D_i$). You decide to run the following regression:
$$Y_i = \beta_1 D_i + u_i$$
(a) (8 points) Prove that under the constant treatment effect assumption:
$$\hat{\beta}_1 = \rho + \frac{\sum_{i=1}^{n} D_i y_i(0)}{\sum_{i=1}^{n} D_i^2}$$
For full credit you must justify each step. Hint: Begin by showing that $Y_i = y_i(0) + \rho D_i$.
(b) (4 points) Would you expect $\hat{\beta}_1$ to give us the true causal effect of pollution on asthma rates? Why or why not?
(c) (3 points) Would your answer to part (b) change if you instead ran the following regression?
$$Y_i = \beta_0 + \beta_1 D_i + u_i$$
For full credit you must justify your answer.
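As a sanity check on the identity in part (a), the following Python sketch simulates data under the constant treatment effect assumption and compares both sides of the equation numerically. The sample size, the value of rho, and the distributions of D and y0 are illustrative assumptions, not part of the problem; the simulated outcome is continuous rather than a 0/1 indicator, purely for simplicity. A numerical match does not replace the step-by-step proof.

```python
import random

random.seed(0)
n = 1000
rho = 0.5  # hypothetical constant treatment effect (assumption)

# Hypothetical data: pollution D_i and untreated potential outcome y_i(0).
D = [random.uniform(0.0, 2.0) for _ in range(n)]
y0 = [random.gauss(1.0, 1.0) for _ in range(n)]

# Observed outcome under the constant treatment effect assumption.
Y = [y0_i + rho * d_i for y0_i, d_i in zip(y0, D)]

# OLS slope for a regression through the origin: sum(D*Y) / sum(D^2).
sum_DD = sum(d * d for d in D)
beta1_hat = sum(d * y for d, y in zip(D, Y)) / sum_DD

# Right-hand side of the identity in part (a).
rhs = rho + sum(d * y0_i for d, y0_i in zip(D, y0)) / sum_DD

print(f"beta1_hat = {beta1_hat:.6f}, rho + ratio = {rhs:.6f}")
```

The two printed numbers should agree up to floating-point error, whatever values are chosen for rho and the distributions.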
2) (12 points) Suppose you are interested in finding the effect of X on Y. Consider the following two regressions:
$$Y_i = \alpha_1 X_i + \varepsilon_i \tag{1}$$
$$Y_i = \gamma_1 X_i + \gamma_2 Z_i + u_i \tag{2}$$
where $X_i$ is observed, and $Z_i$ is a relevant and unobserved omitted variable. Thus, equation (1) is the regression we are
able to run, while equation (2) is the regression we would run if we could observe $Z_i$.
(a) (6 points) Derive an equation for the omitted variable bias caused by $Z_i$. For simplicity assume that $\sum_{i=1}^{n} X_i u_i = 0$.
Hint: Plug equation (2) into the equation for $\hat{\alpha}_1$.
(b) (6 points) For each of the following situations give the sign of the omitted variable bias in $\hat{\alpha}_1$. For full credit you
must justify your answer:
i. (2 points) $Y_i$ is a doctor’s mortality rate, $X_i$ is doctor’s pay, and $Z_i$ is the percentage of the patients who are
low-income.
ii. (2 points) $Y_i$ is GPA, $X_i$ is Class Attendance Percentage, and $Z_i$ is Laziness.
iii. (2 points) $Y_i$ is SAT Score, $X_i$ is GPA, and $Z_i$ is Ability.
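To build intuition for question 2, a small simulation can show how omitting a relevant variable shifts the estimated coefficient. The Python sketch below uses hypothetical values (gamma1 = 1, gamma2 = 2, and Z positively related to X with slope 0.8); these are assumptions chosen for illustration and do not correspond to any of the three scenarios above.

```python
import random

random.seed(1)
n = 5000
gamma1, gamma2 = 1.0, 2.0  # hypothetical true coefficients in equation (2)

X = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumption: Z is positively related to X.
Z = [0.8 * x + random.gauss(0.0, 1.0) for x in X]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [gamma1 * x + gamma2 * z + e for x, z, e in zip(X, Z, u)]

# Short regression (1): no intercept, Z omitted.
alpha1_hat = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)

# Gap between the short-regression estimate and the true gamma1.
bias = alpha1_hat - gamma1
print(f"alpha1_hat = {alpha1_hat:.3f}, gap from gamma1 = {bias:.3f}")
```

Changing the sign of gamma2, or of the 0.8 slope linking Z to X, and rerunning is one way to compare the simulated gap against the expression derived in part (a).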
3) (3 points) Suppose you have a friend who is running a job training program designed to increase the probability of
employment. There are 100 people eligible for the program, but only 50 spots. Your friend wants to make sure that they
will be able to capture the causal effect of the program, but they also want to make sure that people aren’t put in a group
they’re unhappy with. As such, your friend decides to randomly choose 50 people for the treatment group, and then let
them decide whether or not they will be treated. Your friend then runs the following regression:
$$\text{Employment}_i = \beta_0 + \beta_1 \text{JobTraining}_i + u_i$$
Will $\hat{\beta}_1$ capture the causal effect of the program? If yes, explain why. If no, then describe what changes your friend should
make to ensure they capture a causal effect.