Statistics for Public and NP Managers

 

This assignment requires you to locate at least one dataset that is relevant to you. This can be a dataset
that you discussed in Assignment 1, though it does not need to be. Unless the dataset is very large (i.e.,
> 1GB) or includes personally identifiable information (e.g., full names, addresses, or phone numbers),
please upload your dataset to our shared Google Drive folder for the class:
https://drive.google.com/drive/folders/18Z1sqE6ZJsBnYwSEfv9O9S5pnfJ2hZd1?usp=sharing
As I mentioned in Assignment 1, you can find datasets all across the web. Here are two archives that
might be useful:
https://ds4ps.org/data/
https://guides.lib.vt.edu/c.php?g=10459&p=4009985
R also includes several datasets for you to use. To access these, you can type data() into the R console.
To load one of the datasets into R, you can type data(NameOfDataset), where NameOfDataset is the
name of one of the datasets that appeared in the list that popped up when you ran data(). Once you
load the dataset, you can locate it by typing ls() into the R console. Please do not use mtcars, as we
discussed that extensively in class and I would like you to work on something new.
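As a minimal sketch of the steps above, the following loads one built-in dataset and takes a first look at it. The choice of airquality is only an illustration, not a recommendation, and the object name available is a placeholder.

```r
# Typing data() on its own opens the listing; storing the result lets us inspect it instead
available <- data()
head(available$results[, c("Item", "Title")])

# Load one dataset by name (airquality is just an illustrative choice)
data(airquality)

# Confirm the object now exists in the workspace
ls()

# First look at the column types and the first few rows
str(airquality)
head(airquality)
```

str() is a quick way to see which columns are numeric and which are categorical, which matters for the questions that follow.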
You should include the answers to the following questions in a single document that you upload to D2L.
Please do your best to upload the document by our class period in two weeks (11/16 @ 5:30pm). If you
need extra time, please email me to let me know.
1) Data Description
a. Briefly describe 5-10 columns (variables) in your dataset. If you are using the
same dataset as Assignment 1, feel free to copy information from that assignment. If
your dataset does not include that many columns, use another dataset or describe
more than one dataset to get to 5-10 variables.
b. Briefly discuss where you obtained your dataset. Again, feel free to copy from your
Assignment 1 answers if you are using the same dataset.
2) Measures of Central Tendency
a. For each of the columns you described, discuss an appropriate measure of central
tendency for that variable. If it does not make sense to have a measure of central
tendency for one or several of your variables, explain why. You should make sure that at
least 5 of your columns/variables have a meaningful measure of central tendency, so
switch to, or supplement with, another dataset if needed.
b. For each variable for which calculating a measure of central tendency makes sense,
calculate at least one measure of central tendency. If it makes sense to calculate more
than one measure of central tendency for any of the variables, please do so. Remember
to use na.rm = TRUE to ignore missing data.
c. What can you say about the skewness of each variable from the measures of central
tendency?
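One way to approach Question 2 is sketched below, assuming the built-in airquality dataset purely for illustration. Base R has no function for the statistical mode, so the tabulation approach shown here is one common workaround, and the variable names are placeholders.

```r
data(airquality)

# Mean and median of a ratio-level variable; na.rm = TRUE drops missing values
ozone_mean   <- mean(airquality$Ozone, na.rm = TRUE)
ozone_median <- median(airquality$Ozone, na.rm = TRUE)

# Month is stored as a number but is really a category, so the mode is the
# sensible summary. Tabulate the values and keep every value tied for most frequent.
month_counts <- table(airquality$Month)
month_modes  <- names(month_counts)[month_counts == max(month_counts)]

# A mean well above the median hints at right (positive) skew;
# a mean well below the median hints at left (negative) skew
ozone_mean - ozone_median
```

Comparing the mean and median is only a rough check on skewness. A histogram, for example hist(airquality$Ozone), is a useful companion.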
3) Dispersion and Confidence
a. For any variable for which it makes sense, calculate measures of dispersion. Include the
Range, Standard Deviation, and Variance for each variable.
b. For the relevant columns (i.e., ratio/interval variables), calculate the 95% confidence
interval for the population mean of the variable. Remember that the formula for a 95%
confidence interval is sample_mean ± 1.96*s/sqrt(n), where s is the sample standard
deviation and n is the number of observations. Note: to get the number of observations
for a specific column, you can use length(dataset_name$column_name). Keep in mind that
length() also counts missing values.
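The calculations in Question 3 could look roughly like this, again assuming airquality as a stand-in dataset. Because length() counts missing values too, this sketch counts only the non-missing observations when computing n. That is a choice made here, not something the assignment requires.

```r
data(airquality)
x <- airquality$Ozone

# Range (reported as minimum and maximum), standard deviation, and variance
ozone_range <- range(x, na.rm = TRUE)
ozone_sd    <- sd(x, na.rm = TRUE)
ozone_var   <- var(x, na.rm = TRUE)

# Number of observations actually used: non-missing values only
n <- sum(!is.na(x))

# 95% confidence interval: sample mean plus or minus 1.96 standard errors
x_bar  <- mean(x, na.rm = TRUE)
margin <- 1.96 * ozone_sd / sqrt(n)
ci_95  <- c(lower = x_bar - margin, upper = x_bar + margin)
ci_95
```

If a column has no missing values, sum(!is.na(x)) and length(x) give the same answer.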
4) For the remaining questions, select two variables that you can use to make a comparison. You
can either 1) compare two variables against each other, if the comparison makes sense, or 2)
compare one variable across two different values of a second variable (e.g., as we did when we
compared mpg for automatic and manual cars in the mtcars dataset). Note that if you do not
have a binary variable like automatic/manual cars, you can compare observations less than and
greater than the median. Describe, in words, the comparison that you plan to make.
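If you have no binary variable, the median split described above might be set up as follows. This is only a sketch: the airquality dataset, the temp_group column, and the group labels are all illustrative assumptions.

```r
data(airquality)

# Split days into two groups around the median temperature
temp_median <- median(airquality$Temp, na.rm = TRUE)
airquality$temp_group <- ifelse(airquality$Temp > temp_median,
                                "above", "at_or_below")

# Check the group sizes before comparing anything
table(airquality$temp_group)

# Mean of the outcome variable within each group
group_means <- tapply(airquality$Ozone, airquality$temp_group,
                      mean, na.rm = TRUE)
group_means
```

Observations exactly equal to the median have to go somewhere. Here they fall into the lower group, but you could also drop them.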
5) T-Test
a. Run a t-test to make the comparison you specified above. Copy the results of the t-test into your document (i.e., screenshot the results)
b. What do the results of the t-test say? Can you successfully reject the null hypothesis that the population means in your comparison are the same? Why or why not?
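A t-test of this kind can be run with base R's t.test(). The sketch below assumes the built-in ToothGrowth dataset as an example, comparing tooth length (len) across the two supplement types (supp). Your own variables will differ.

```r
data(ToothGrowth)

# Formula interface: numeric outcome on the left, two-level grouping variable on the right.
# By default t.test() runs Welch's t-test, which does not assume equal variances.
tt <- t.test(len ~ supp, data = ToothGrowth)
tt

# Pull out the pieces you need for the write-up
tt$estimate   # sample mean of each group
tt$conf.int   # 95% confidence interval for the difference in means
tt$p.value    # compare against your significance level, e.g. 0.05

# TRUE means you can reject the null hypothesis of equal population means at the 5% level
tt$p.value < 0.05
```

To compare two separate numeric columns instead, pass them as two arguments, as in t.test(x, y).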
6) Bayesian Inference
a. Use the BEST package to make the comparison you specified above. Copy the results and HDI plot into your document (i.e., screenshot the results and plot).
b. Interpret the results of the Bayesian analysis. What can you say about the difference in population means from your comparison?
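The same comparison with the BEST package might look like the sketch below. It assumes that BEST and its JAGS dependency are already installed, and it only runs the analysis when the package can be loaded. ToothGrowth, the object names, and the seed value are illustrative choices.

```r
data(ToothGrowth)

# BESTmcmc() takes the two groups as separate numeric vectors
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Run only if BEST (which depends on JAGS) is available on this machine
if (requireNamespace("BEST", quietly = TRUE)) {
  library(BEST)

  # Draw from the posterior; rnd.seed makes the run reproducible
  best_out <- BESTmcmc(oj, vc, rnd.seed = 123)

  # Posterior summaries for the group means and their difference
  print(summary(best_out))

  # Posterior distribution of the difference in means, with the 95% HDI marked
  plot(best_out)
}
```

If the 95% HDI on the plot excludes zero, that is evidence of a credible difference between the two population means.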

 

 

 
