Problem Set 2
C. Durso
Introduction
These questions were rendered in R markdown through RStudio (https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf, http://rmarkdown.rstudio.com).
Please generate your solutions in R markdown and upload both a knitted document (doc, docx, or pdf) and the Rmd file.
The questions in this problem set use material from the slides on discrete probability spaces and the Rmd “Discrete_Probability_Distributions_2_3_3.Rmd”.
Load Data
library(HistData) # assumed source package for the PolioTrials data
data("PolioTrials")
dat <- PolioTrials
Question 1
Please carry out the analysis below and answer the questions that follow. For this assignment, please do all calculations in R and show the code and the results in the knit document.
Frame Question
The basic question “did the vaccine work?” was addressed in problem set 1 by using the “rbinom” function to implement the idea that the “Vaccinated” and “Placebo” groups in the “RandomizedControl” experiment were the same with regard to paralytic polio cases. There, “rbinom” generated counts of polio cases in a population the size of the “Vaccinated” group, with the probability of paralytic polio estimated from the combined groups. The call “rbinom(n,size,prob)” draws n random samples from the binomial distribution Binom(size,prob), which models the number of successes in “size” independent Bernoulli trials with probability of success equal to “prob”.
Note that the function “dbinom(x,size,prob)” gives the value of the density function (the probability mass function) for Binom(size,prob) at x. Thus “dbinom(x,size,prob)” gives the probability of exactly x successes in “size” independent Bernoulli trials in which the probability of success is “prob”. Likewise, the function “pbinom(x,size,prob)” returns the probability of the event that the number of successes is in the set {0,1,…,x}.
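The relationship among “rbinom”, “dbinom”, and “pbinom” can be checked on a small example. The sketch below uses toy values of “size” and “prob” chosen only for illustration; they are not taken from the polio data.

```r
# Toy parameters, chosen only for illustration (not from the polio data).
size <- 10
prob <- 0.3

set.seed(1)                      # make the random draws reproducible
draws <- rbinom(5, size, prob)   # five simulated counts of successes

# Probability of exactly 2 successes: binomial formula versus dbinom.
p_exact <- choose(size, 2) * prob^2 * (1 - prob)^(size - 2)
all.equal(p_exact, dbinom(2, size, prob))

# pbinom(x, ...) is the sum of dbinom over 0, 1, ..., x.
all.equal(pbinom(2, size, prob), sum(dbinom(0:2, size, prob)))
```

Both comparisons use “all.equal” rather than “==” because the two sides are computed by different floating-point routes.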
Q1, part 1
(5 points)
Consider the null model that the number of paralytic polio cases in the “Vaccinated” group follows the binomial distribution with “size” equal to the number of participants in the “Vaccinated” group of the “RandomizedControl” experiment and “prob” equal to the proportion of paralytic polio cases in the pooled “Vaccinated” and “Placebo” groups of the “RandomizedControl” experiment. What is the probability under this model of the event that the number of cases is less than or equal to the observed number of cases? Please calculate this directly rather than simulating it.
Q1, part 2
(5 points)
Is the value computed in part 1 strong evidence against the null model?
Question 2
Please carry out the analysis below and answer the questions that follow.
Frame Question
In this section, you will address the question of whether the “NotInoculated” and “Placebo” groups in the “RandomizedControl” experiment had significantly different rates of paralytic polio using a binomial probability space directly.
Let the null model for the number of paralytic polio cases in the “Placebo” group of the “RandomizedControl” experiment be the binomial probability space with “size” equal to the number of participants in the “Placebo” group of the “RandomizedControl” experiment and “prob” equal to the proportion of paralytic polio cases in the pooled “Placebo” and “NotInoculated” groups of the “RandomizedControl” experiment. This captures the idea that the “NotInoculated” and “Placebo” groups in the “RandomizedControl” experiment were the same with regard to paralytic polio cases.
Q2, part 1
(10 points)
The function “qbinom(p,size,prob)” returns the smallest value of x for which the value of “pbinom(x,size,prob)” is greater than or equal to p. Please generate a plot with possible counts of paralytic polio cases under this model represented by position along the horizontal axis and the corresponding probability represented by position along the vertical axis. Please restrict the represented possibilities x to those between “qbinom(0.0001,size,prob)” and “qbinom(0.9999,size,prob)” with “size” and “prob” as in the null model. Using “geom_vline”, add a vertical line at the observed number of paralytic polio cases.
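As a rough illustration of the mechanics only, the sketch below builds such a plot for a toy binomial distribution. The values of “size”, “prob”, and “observed” are invented for the example, and the use of “geom_point” is one possible choice among several; it assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Toy parameters and a made-up observed count, for illustration only.
size <- 50
prob <- 0.2
observed <- 16

# qbinom(p, size, prob) is the smallest x with pbinom(x, size, prob) >= p.
lo <- qbinom(0.0001, size, prob)
hi <- qbinom(0.9999, size, prob)

# One row per possible count in the restricted range, with its probability.
plot_dat <- data.frame(x = lo:hi)
plot_dat$probability <- dbinom(plot_dat$x, size, prob)

g <- ggplot(plot_dat, aes(x = x, y = probability)) +
  geom_point() +
  geom_vline(xintercept = observed, linetype = "dashed") +
  labs(x = "count", y = "probability")
g
```

Restricting the range with “qbinom” leaves out counts whose cumulative probability is negligible, so the bulk of the distribution fills the plot.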
Q2, part 2
(5 points)
Does the observed count of paralytic polio cases appear to be a fairly typical value under the null model or a fairly unusual value? Please explain.
Q2, part 3
(5 points)
Please calculate the probability under the null model of the event that the count of paralytic polio cases in the “Placebo” group is greater than or equal to the observed value. Please be careful to include the probability that the count equals the observed value. (Hint: 0.002402719 is incorrect.)
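Because “pbinom” accumulates probability up to and including its first argument, an upper-tail probability that includes the observed value needs an offset of one. The sketch below shows this on toy values of “size”, “prob”, and “observed” that are invented for illustration and unrelated to the polio data.

```r
# Toy parameters and a made-up observed count, for illustration only.
size <- 20
prob <- 0.3
observed <- 10

# P(X >= observed): sum the mass function from observed up to size.
upper_direct <- sum(dbinom(observed:size, size, prob))

# The same event is the complement of {0, ..., observed - 1}.
upper_pbinom <- 1 - pbinom(observed - 1, size, prob)

# By contrast, 1 - pbinom(observed, ...) is P(X > observed),
# which omits the probability that the count equals the observed value.
strict_upper <- 1 - pbinom(observed, size, prob)

all.equal(upper_direct, upper_pbinom)
all.equal(upper_direct - strict_upper, dbinom(observed, size, prob))
```

The expression “pbinom(observed - 1, size, prob, lower.tail = FALSE)” computes the same upper tail without the explicit subtraction from 1.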
Q2, part 4
In tests of a null hypothesis, the probability of interest can be described as the probability of the event that the value of the test statistic under the null model is as extreme as or more extreme than the observed value of the test statistic. What constitutes “extreme” can be defined in different ways, depending on the context.
In the test of the null hypothesis that the number of paralytic polio cases in the “Placebo” group of the “RandomizedControl” experiment is consistent with counts distributed according to the binomial probability space with “size” equal to the number of participants in the “Placebo” group of the “RandomizedControl” experiment and “prob” equal to the proportion of paralytic polio cases in the pooled “Placebo” and “NotInoculated” groups of the “RandomizedControl” experiment, there are several possibilities. For example, if “extreme” is defined as “greater than or equal to”, the probability in Q2, part 3 is the probability of interest.
If “extreme” is defined as “having probability less than or equal to the probability of the observed value”, what values of x are extreme in this sense under the null model described in this question? Please give your answer in terms of intervals of integers. You may use plots to help you understand the structure of this event. (10 points)
Define the median for the distribution of a test statistic in a hypothesis test to be the smallest value m such that the probability under the null model that the test statistic is less than or equal to m is at least 0.5. If “extreme” is defined as “as far from the median as the observed value or further from the median than the observed value”, what values of x are extreme in this sense under the null model described in this problem? The median for a binomial distribution with parameters “size” and “prob” may be calculated as “qbinom(.5,size,prob)”. Please give your answer in terms of intervals of integers. (5 points)
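One way to explore this kind of event is to compare the probability of every possible count with the probability of the observed count. The sketch below does so for a toy distribution. The values of “size”, “prob”, and “observed” are invented for illustration, and the small relative tolerance is an assumption added to guard against floating-point ties.

```r
# Toy parameters and a made-up observed count, for illustration only.
size <- 20
prob <- 0.3
observed <- 10

x <- 0:size
mass <- dbinom(x, size, prob)            # probability of each possible count
p_obs <- dbinom(observed, size, prob)    # probability of the observed count

# Counts whose probability is no larger than that of the observed count.
# The factor (1 + 1e-7) is an assumed tolerance for near-ties.
extreme_x <- x[mass <= p_obs * (1 + 1e-7)]

# Split the extreme counts into runs of consecutive integers and report
# each run by its endpoints, one row per interval.
runs <- split(extreme_x, cumsum(c(1, diff(extreme_x) != 1)))
intervals <- t(sapply(runs, range))
intervals
```

Because a binomial mass function rises to a mode and then falls, the extreme counts in this sense typically form one interval in each tail.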
Compute the probability of the event that the count is as far from the median as the observed value or further from the median than the observed value. Again, please be careful to include the case of equidistance. (5 points)
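The distance-from-median definition can be sketched in the same way. The example below again uses invented toy values of “size”, “prob”, and “observed”, not the polio data, and checks the sum over extreme counts against an equivalent two-tail expression.

```r
# Toy parameters and a made-up observed count, for illustration only.
size <- 20
prob <- 0.3
observed <- 10

x <- 0:size
m <- qbinom(0.5, size, prob)     # median as defined in the text
d_obs <- abs(observed - m)       # distance of the observed count from m

# Counts at least as far from the median as the observed count,
# including counts exactly as far away.
far_x <- x[abs(x - m) >= d_obs]
p_far <- sum(dbinom(far_x, size, prob))

# Equivalent tail form, valid when d_obs > 0:
# P(X <= m - d_obs) + P(X >= m + d_obs).
p_tails <- pbinom(m - d_obs, size, prob) +
  pbinom(m + d_obs - 1, size, prob, lower.tail = FALSE)

all.equal(p_far, p_tails)
```

Using “>=” in the comparison is what keeps the equidistant counts on both sides of the median in the event.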