Part I (33 points)
This question requires you to read the article below, its appendix, and its variable codebook
(all on Moodle). The data file is called “replicationdata.Rdata”.
Nicole Janz (2018) Foreign direct investment and repression: An analysis across industry
sectors, Journal of Human Rights, 17:2, 163-183, DOI: 10.1080/14754835.2017.1306691
The article and appendix, variable codebook, and data are uploaded on Moodle. For further
information, the replication materials are also online at the Harvard Dataverse repository
(https://doi.org/10.7910/DVN/WHCZJR); feel free to explore this extra information.
1) What is the research question the author is investigating? Why is this question relevant?
[4 points]
2) What are the three main dependent (outcome) variables that are related to human
rights, and how are they measured? Are they categorical or continuous, and why? Be as
specific as you can. You can use the main text as well as the variable codebook to answer
this question. [6 points]
3) We will now turn to further variables in the data set:
• Population [“Lag_logpopulation”]
• Democracy [“Lag_polity2”]
• Conflict [“Lag_confl”]
What do these variables measure, and what do the values / scores stand for? Are they
categorical or continuous, and how do you know? [8 points]
4) We now turn to Table 5 in the online appendix of the article, as displayed below. Your
task is to replicate some rows of this table. Take the variables from above
(“Lag_logpopulation”, “Lag_polity2”, “Lag_confl”) and use RStudio to calculate and
describe the
a. number of observations N, mean, standard deviation (St. Dev.), minimum, and
maximum, in table format [4 points]
b. display and describe the distribution (figures and main text) of each of the
variables suitable to its measurement (e.g. histogram, bar chart, or boxplot) [8
points]
5) What is the main general result of the study, according to the conclusion? [3 points]
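For question 4, the workflow might look roughly like the sketch below. It is a minimal illustration only: the data frame is simulated (the values and the coding of each column are assumptions, not taken from the replication data or the codebook), `describe_var` is a hypothetical helper name, and the name of the object stored inside “replicationdata.Rdata” is not given here, so the loading step is left out.

```r
# Simulated stand-ins for the three variables. These values are NOT from the
# replication data; the coding of each column is assumed for illustration.
set.seed(1)
dat <- data.frame(
  Lag_logpopulation = rnorm(200, mean = 16, sd = 1.5),      # treated as continuous
  Lag_polity2       = sample(-10:10, 200, replace = TRUE),  # treated as an integer score
  Lag_confl         = rbinom(200, size = 1, prob = 0.2)     # treated as a 0/1 indicator
)
dat$Lag_polity2[c(5, 17)] <- NA  # pretend two observations are missing

# Hypothetical helper: one row of a descriptive table for a numeric vector.
describe_var <- function(x) {
  c(N      = sum(!is.na(x)),
    Mean   = mean(x, na.rm = TRUE),
    St.Dev = sd(x, na.rm = TRUE),
    Min    = min(x, na.rm = TRUE),
    Max    = max(x, na.rm = TRUE))
}

# One row per variable, one column per statistic.
vars <- c("Lag_logpopulation", "Lag_polity2", "Lag_confl")
desc_table <- t(sapply(dat[vars], describe_var))
print(round(desc_table, 2))

# The plot type should suit the measurement level: a histogram for a
# continuous variable, a bar chart of counts for a categorical one.
hist(dat$Lag_logpopulation, main = "Log population (simulated)",
     xlab = "Log population")
barplot(table(dat$Lag_confl), main = "Conflict (simulated)",
        xlab = "Conflict", ylab = "Count")
```

Counting non-missing values for N (rather than using the number of rows) matters whenever a variable has missing observations, which is why N can differ between rows of a descriptive table.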
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Part II (33 points)
For this assignment you will use an extract from the UK 2011 census dataset (“Q3 census
data.csv”) that is available on Moodle. This dataset holds information on economic activity,
ethnicity, highest level of educational qualification and religion at the neighbourhood level in
Nottingham. Column “OA” identifies each neighbourhood. The variables we are interested in
are:
• percentage of the population in the neighbourhood who are employed (emp)
• percentage who are unemployed (unemp)
• percentage with no qualifications, not even GCSE or Foundation level (noqual)
• percentage with a university degree or above (degqual).
• variable areatype contains a classification of the neighbourhood developed by the Office
for National Statistics. There are seven types of areas in Nottingham as listed below:
Code Description
1 Cosmopolitan
2 Ethnicity central
3 Multicultural metropolitan
4 Urbanites
5 Suburbanites
6 Constrained city dwellers
7 Hard-pressed living
You are conducting research with the assumption that employment and previous education
(degree) are related. You should use descriptive statistics to examine and compare these
variables. Please pick two variables from the data set:
• one variable related to employment (either emp or unemp), and
• one variable related to education (either noqual or degqual).
- Describe the summary statistics of the two variables you have selected. You should display
and describe a table (similar to table A5 above) that shows mean, standard deviation,
minimum and maximum for the two variables. Make sure to compare the two variables and
note what you have learned about education and employment. [10 points]
Next, please select one of the education variables (noqual or degqual), and answer the following
two questions:
- Visualise the distribution of your education variable in a histogram, and describe the shape
of the distribution (e.g. skewed and in which direction?). Then calculate z-scores for the
values of the variable and produce a histogram to show the distribution of these z-scores.
Explain the difference between the plots using z-scores and your previous histogram. [10
points]
- We now ask you to create subsets of your selected education variable per neighbourhood
type (variable areatype). Produce a table or figure showing summary statistics of the variable
for each of these neighbourhoods separately. Identify three notable differences between the
areas and summarise them below the table. [11 points]
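The z-score and per-area steps above could be approached along the lines of the following sketch. It uses simulated data, so the values and the shape of the simulated distribution are assumptions and say nothing about the actual census extract; only the column names `degqual` and `areatype` come from the task description.

```r
# Simulated neighbourhood data. These values are NOT from the census extract.
set.seed(2)
census <- data.frame(
  degqual  = 100 * rbeta(300, shape1 = 2, shape2 = 6),  # percentages in 0-100 (assumed shape)
  areatype = sample(1:7, 300, replace = TRUE)           # area type codes 1-7
)

# z-score: each value's distance from the mean, in standard-deviation units.
census$z_degqual <- (census$degqual - mean(census$degqual)) / sd(census$degqual)

# Standardising shifts and rescales the axis; the shape of the distribution
# itself does not change.
par(mfrow = c(1, 2))
hist(census$degqual, main = "Raw values (simulated)", xlab = "degqual (%)")
hist(census$z_degqual, main = "z-scores (simulated)", xlab = "z")
par(mfrow = c(1, 1))

# Summary statistics of the education variable within each area type.
by_area <- aggregate(
  degqual ~ areatype, data = census,
  FUN = function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
)
print(by_area)
```

After standardising, the variable has mean 0 and standard deviation 1 by construction, which is a quick way to check the calculation.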
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Part III (34 points)
This question pertains to variables in the “Constituency and Local Authority Data_England.csv”
dataset found on Moodle. For all questions below, use RStudio to create graphs, examine
descriptive statistics, and conduct your analysis. Be sure to report in text the results of any
hypothesis test.
A researcher is interested in educational attainment and whether this differs by region (e.g. East
Midlands, London). She wishes to use information in the variable “ConstPctDegree_2015”
(which measures the percent of individuals within each parliamentary constituency who had a
Level 4 or above qualification at the time of the 2011 Census) to learn more about this topic.
- Briefly describe the variable “ConstPctDegree_2015”. You should indicate:
a. the scale on which the variable is measured;
b. appropriate measures of central tendency and spread;
c. the number of observations in this variable. [3 points]
- Create a figure that displays the variable’s distribution and insert it into your document.
In a sentence, describe what you see. [4 points]
- Provide descriptive statistics for educational attainment. Specifically, report and
interpret the sample mean, median and standard deviation. [6 points]
- Based on what you see in the figure and descriptive statistics, do you think the
distribution of the variable is skewed? If so, in which direction is it skewed? [3 points]
- Create a boxplot that compares educational attainment across the regions of England.
Insert this figure into your document. [3 points]
- Briefly comment on what you see in this figure (e.g., does educational attainment appear
to vary across the regions? Do any regions appear different from the others?) [3 points]
- The researcher has read that educational attainment is highest in southern regions of
England. Based on this, she hypothesises that the percentage of individuals living in
parliamentary constituencies in the South West with at least a university education will
be higher than the national average (calculate the national average yourself). Test her
hypothesis, using lecture and seminar materials as a guide.
a. What kind of t-test will you conduct?
b. Specify the null and alternative hypotheses.
c. State the alpha level (p-value cut-off point) you will use.
d. Conduct your test. Report the test statistic, degrees of freedom, and p value.
e. What do you conclude about educational attainment in the South West?
[12 points]
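As a rough illustration of the mechanics of the final test, the sketch below compares one region's mean with an overall mean on simulated data. Several things here are assumptions: the values are invented, the column name `region` is hypothetical (the real dataset may label regions differently), and treating the national average as a fixed benchmark in a one-sided, one-sample test is only one possible reading of the task. The lecture and seminar materials take precedence.

```r
# Simulated constituency data. These values are NOT from the real dataset,
# and the column name `region` is a hypothetical placeholder.
set.seed(3)
const <- data.frame(
  region = rep(c("South West", "London", "East Midlands"), each = 40),
  ConstPctDegree_2015 = c(rnorm(40, mean = 28, sd = 6),
                          rnorm(40, mean = 38, sd = 9),
                          rnorm(40, mean = 24, sd = 5))
)

# Benchmark: the mean across all constituencies.
national_mean <- mean(const$ConstPctDegree_2015)
south_west <- const$ConstPctDegree_2015[const$region == "South West"]

# One-sample, one-sided test.
# H0: the South West mean equals the national mean.
# H1: the South West mean is greater than the national mean.
result <- t.test(south_west, mu = national_mean, alternative = "greater")
print(result)

# The same quantities computed by hand, to show where they come from.
n      <- length(south_west)
t_stat <- (mean(south_west) - national_mean) / (sd(south_west) / sqrt(n))
df     <- n - 1
p_val  <- pt(t_stat, df = df, lower.tail = FALSE)
```

The test statistic, degrees of freedom and p value to report are stored in `result$statistic`, `result$parameter` and `result$p.value`; the decision follows from comparing the p value with the alpha level chosen before running the test.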