Develop a simple R script to perform the following data preprocessing and statistical analysis. Where required, show analytical output and interpretations.
Preprocessing
1. Load the file “Data set .xlsx” (attached here) into R. This file contains the time required for each of 337,145 transactions at tax collector facilities in four disguised cities of a county in Florida. This is your master data set.
2. Split the data by facility. You can use any of several approaches, including the “subset” command, which pulls out only those data rows matching a logical condition. If needed, help on this common R command can be found online. Bear in mind that all spelling, spacing, capitalization, etc. must be exact for the logical test to work properly. The command is of the form:
new.data.1 = subset(master.data, facility == "Hooterville")
3. Using the numerical portion of your U number as a random number seed and the method demonstrated in class, take a random sample of 70 cases from each facility. Store the sampled observations from each facility in separate data frames.
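The preprocessing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the master data are replaced by a small synthetic data frame so the script runs without the Excel file, the transaction-time column is assumed to be named `time` (only `facility` appears in the assignment), the seed `12345678` is a placeholder for the numerical portion of a U number, and `sample()` on row indices stands in for whatever sampling method was demonstrated in class.

```r
# Synthetic stand-in for the master data (made-up values, not the real file).
# With the real file, one option is the readxl package:
#   master.data <- readxl::read_excel("Data set .xlsx")
set.seed(1)
facilities <- c("Hooterville", "Oblong", "Pixley", "Bunnyville")
master.data <- data.frame(
  facility = rep(facilities, each = 500),
  time     = round(rexp(2000, rate = 1 / 8), 2)  # assumed column name, minutes
)

# Split by facility with subset(), as suggested in the assignment
new.data.1 <- subset(master.data, facility == "Hooterville")

# Alternative: split() returns a named list with one data frame per facility
by.facility <- split(master.data, master.data$facility)

# Seed the random number generator (placeholder for the U number digits)
set.seed(12345678)

# Draw 70 rows without replacement from one facility's data frame
draw.sample <- function(df, n = 70) df[sample(nrow(df), n), ]
samples <- lapply(by.facility, draw.sample)

# Store each facility's sample in its own data frame
hooterville.sample <- samples$Hooterville
oblong.sample      <- samples$Oblong
pixley.sample      <- samples$Pixley
bunnyville.sample  <- samples$Bunnyville
```

Keeping the samples in a named list as well as in separate data frames makes it easy to loop over facilities later while still satisfying the requirement of one data frame per facility.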
Analysis
1. Using your 70-case sample, construct a 90% confidence interval on the population mean transaction time for Hooterville.
2. Assuming the data in the primary Hooterville data frame represent the population, does your 90% confidence interval include the true population mean of the transaction time variable?
3. Use R and your reduced 70-case data set for Oblong. Can you say (α = .05) that the population mean transaction time is greater than 8 minutes? How about greater than 9 minutes and 15 seconds?
4. Referencing Part 3 above, what “test against” (mu) value would yield p = .05 in a two-tailed hypothesis test on the Oblong transaction time?
5. Using R and your sample 70-case data sets, show comparative notched boxplots of the four facilities’ transaction time variable. Your boxplots should be displayed side by side in a single graphic with an appropriate title and x-axis labels. Do these plots indicate a possible difference in transaction times among the four facilities? Do these plots indicate a difference in skewness or number of potential outliers between Hooterville and Pixley?
6. Using R and your sample 70-case data sets, does there appear to be a statistically significant difference (α = .05) between the mean transaction times for Hooterville and Bunnyville?
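One possible way to approach the analysis tasks above is sketched below. It is not a definitive solution: the samples are synthetic, the column name `time` and the unit (minutes, so 9 minutes 15 seconds becomes 9.25) are assumptions, `pop.mean` is a placeholder for the mean of the full Hooterville data, and the two-sample comparison uses R's default Welch t-test, which may differ from the method taught in class.

```r
# Synthetic 70-case samples for the four facilities (made-up values)
set.seed(12345678)
facilities <- c("Hooterville", "Oblong", "Pixley", "Bunnyville")
samples <- data.frame(
  facility = factor(rep(facilities, each = 70), levels = facilities),
  time     = round(rexp(280, rate = 1 / 8), 2)  # assumed to be in minutes
)
hoot  <- samples$time[samples$facility == "Hooterville"]
obl   <- samples$time[samples$facility == "Oblong"]
bunny <- samples$time[samples$facility == "Bunnyville"]

# 90% confidence interval on the Hooterville mean
ci90 <- t.test(hoot, conf.level = 0.90)$conf.int

# Does the interval cover the population mean?
# Placeholder value; with real data this might be mean(new.data.1$time)
pop.mean <- 8
covers <- ci90[1] <= pop.mean && pop.mean <= ci90[2]

# One-sided tests for Oblong: H0 mu <= 8 and H0 mu <= 9.25
p.gt.8    <- t.test(obl, mu = 8,    alternative = "greater")$p.value
p.gt.9.25 <- t.test(obl, mu = 9.25, alternative = "greater")$p.value

# The mu values giving a two-tailed p of .05 are the 95% CI endpoints
mu.bounds  <- t.test(obl, conf.level = 0.95)$conf.int
p.at.bound <- t.test(obl, mu = mu.bounds[1])$p.value

# Side-by-side notched boxplots in a single graphic
boxplot(time ~ facility, data = samples, notch = TRUE,
        main = "Transaction time by facility (70-case samples)",
        xlab = "Facility", ylab = "Transaction time (minutes)")

# Two-sample comparison of Hooterville and Bunnyville (Welch t-test)
welch <- t.test(hoot, bunny)

print(ci90); print(covers)
print(c(p.gt.8 = p.gt.8, p.gt.9.25 = p.gt.9.25))
print(mu.bounds); print(welch$p.value)
```

The link between the fourth task and the confidence interval is that a two-tailed t-test at α = .05 rejects exactly those mu values that fall outside the 95% confidence interval, so testing against either endpoint gives p = .05. For the boxplots, non-overlapping notches are commonly read as informal evidence that two medians differ.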