Statistics for Public and NP Managers

This assignment requires you to locate at least one dataset that is relevant to you. This can be a dataset that you discussed in Assignment 1, though it does not need to be. Unless the dataset is very large (i.e., > 1GB) or includes personally identifiable information (e.g., full names, addresses, phone numbers), please upload your dataset to our shared Google Drive folder for the class:
https://drive.google.com/drive/folders/18Z1sqE6ZJsBnYwSEfv9O9S5pnfJ2hZd1?usp=sharing

As I mentioned in Assignment 1, you can find datasets all across the web. Here are two archives that might be useful:
https://ds4ps.org/data/
https://guides.lib.vt.edu/c.php?g=10459&p=4009985

R also includes several datasets for you to use. To see them, type data() into the R console. To load one of them, type data(NameOfDataset), where NameOfDataset is one of the names in the list that appeared when you ran data(). Once you have loaded the dataset, you can confirm that it is in your workspace by typing ls() into the R console. Please do not use mtcars, as we discussed that extensively in class and I would like you to work on something new.

Include your answers to the following questions in a single document that you upload to D2L. Please do your best to upload the document by our class period in two weeks (11/16 at 5:30 pm). If you need extra time, please email me to let me know.

1) Data Description
a. Briefly describe 5 to 10 columns (variables) in your dataset. If you are using the same dataset as in Assignment 1, feel free to copy information from that assignment. If your dataset does not include that many columns, use another dataset or describe more than one dataset to get to 5 to 10 variables.
b. Briefly discuss where you obtained your dataset. Again, feel free to copy from your Assignment 1 answers if you are using the same dataset.

2) Measures of Central Tendency
a. For each of the columns you described, discuss an appropriate measure of central tendency for that variable. If a measure of central tendency does not make sense for one or more of your variables, explain why. Make sure that at least 5 of your columns/variables have a meaningful measure of central tendency; choose a different dataset or supplement yours with another one if needed.
b. For each variable for which a measure of central tendency makes sense, calculate at least one. If it makes sense to calculate more than one for any of the variables, please do so. Remember to use na.rm = TRUE to ignore missing data.
c. What can you say about the skewness of each variable from the measures of central tendency?

3) Dispersion and Confidence
a. For any variable for which it makes sense, calculate measures of dispersion. Include the Range, Standard Deviation, and Variance for each variable.
b. For the relevant columns (i.e., ratio/interval variables), calculate the 95% confidence interval for the population mean of the variable. Remember that the formula for a 95% confidence interval is sample_mean ± 1.96*s/sqrt(n), where s is the sample standard deviation and n is the number of observations. Note: to get the number of observations for a specific column, you can use length(dataset_name$column_name). Keep in mind that length() also counts missing values, so if the column contains any NAs, count only the non-missing observations.

4) For the remaining questions, select two variables that you can use to make a comparison. You can either 1) compare two variables against each other, if the comparison makes sense, or 2) compare one variable across two different values of a second variable (e.g., as we did when we compared mpg for automatic and manual cars in the mtcars dataset). Note that if you do not have a binary variable like automatic/manual cars, you can compare observations less than and greater than the median. Describe, in words, the comparison that you plan to make.

5) T-Test
a. Run a t-test to make the comparison you specified above. Copy the results of the t-test into your document (e.g., as a screenshot).
b. What do the results of the t-test say? Can you reject the null hypothesis that the population means in your comparison are the same? Why or why not?

6) Bayesian Inference
a. Use the BEST package to make the comparison you specified above. Copy the results and HDI plot into your document (e.g., as screenshots).
b. Interpret the results of the Bayesian analysis. What can you say about the difference in population means from your comparison?
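As a rough sketch of how the calculations in questions 2 through 5 might look in R, the example below uses the built-in airquality dataset purely for illustration; it is not a suggestion about which dataset to choose. The object names (x, hot, ci, tt) and the choice of the Ozone and Temp columns are assumptions made for this example, and your own variables may call for different measures.

```r
# Illustrative only: airquality ships with base R and has missing values in Ozone.
data(airquality)
x <- airquality$Ozone

# Question 2: measures of central tendency (na.rm = TRUE drops missing values).
mean_x   <- mean(x, na.rm = TRUE)
median_x <- median(x, na.rm = TRUE)   # a mean well above the median hints at right skew

# Question 3a: measures of dispersion.
range_x <- range(x, na.rm = TRUE)     # returns the minimum and the maximum
sd_x    <- sd(x, na.rm = TRUE)
var_x   <- var(x, na.rm = TRUE)

# Question 3b: 95% confidence interval, sample_mean +/- 1.96 * s / sqrt(n).
# length(x) would count the NA entries too, so count non-missing values instead.
n  <- sum(!is.na(x))
se <- sd_x / sqrt(n)
ci <- c(lower = mean_x - 1.96 * se, upper = mean_x + 1.96 * se)

# Questions 4 and 5: compare Ozone across a median split of Temp.
# Days with Temp exactly at the median fall into the lower group here (assumption).
hot <- airquality$Temp > median(airquality$Temp)
tt  <- t.test(x[hot], x[!hot])        # Welch two-sample t-test is R's default

print(c(mean = mean_x, median = median_x, sd = sd_x, var = var_x))
print(range_x)
print(ci)
print(tt)
```

To adapt this, replace airquality$Ozone and airquality$Temp with the columns you described in question 1, and check that each measure is appropriate for that variable's level of measurement before reporting it.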
