Instructions

What should I bring to tutorial on October 5?

  • R output (e.g., plots and explanations) for Question 1 (a)-(d) and Question 2 (a) and (b). You can bring either a hard copy or your laptop with the output.

Tutorial Grading

Tutorial grades will be assigned according to the following marking scheme.

Component                                Mark
Attendance for the entire tutorial       1
Assigned homework completion             1
In-class exercises                       4
Total                                    6

Practice Problems

Question 1

One survey showed that among 785 randomly selected subjects who completed four years of college, 144 smoke and 641 do not (based on data from the American Medical Association). The rate of smoking in the general population is reported to be 27%. Researchers are interested in finding out if the rate of smoking is different for college graduates than for the general population.

  (a) What are appropriate null and alternative hypotheses to test the claim?

  (b) Assume you conduct a hypothesis test using simulation and get the following empirical distribution for values of the test statistic \(\hat{p}\), assuming the null hypothesis is true. For simplicity, this distribution shows only the results of 300 simulations: there are 300 dots on the plot, one for each simulation (in practice, 300 simulations would not be enough). What does each dot on the plot represent?

  (c) Based on the plot above, what is your estimate of the p-value? How would you interpret this p-value?

  (d) Which of the following is a valid conclusion for this hypothesis test?
    1. There is very strong evidence for the null hypothesis that the rate of smoking is lower for college graduates than in the general population.
    2. There is insufficient evidence to reject the null hypothesis that the rate of smoking is lower for college graduates than in the general population.
    3. There is very strong evidence against the null hypothesis that the rate of smoking is the same for college graduates and the general population.
    4. There is very strong evidence for the null hypothesis that the rate of smoking is the same for college graduates and the general population.
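A minimal R sketch of the simulation described in (b), assuming the null value p = 0.27, samples of n = 785 graduates, and a two-sided alternative (the question asks whether the rate is *different*). It uses rbinom() to draw the number of smokers in each simulated sample, which is equivalent to sampling 785 individuals one at a time. The helper name `simulate_phat` and the seed are hypothetical, not part of the assignment.

```r
# Hypothetical helper: simulate the null distribution of p-hat for Question 1.
# Assumes H0: p = 0.27 and samples of n = 785 college graduates.
simulate_phat <- function(repetitions, n = 785, p0 = 0.27) {
  # Each value is the proportion of smokers in one sample of size n
  # drawn as if the null hypothesis were true (one "dot" on the plot).
  rbinom(repetitions, size = n, prob = p0) / n
}

set.seed(130)  # arbitrary seed, for reproducibility only
observed_phat <- 144 / 785
sim_phat <- simulate_phat(300)

# Two-sided p-value: proportion of simulated p-hat values at least as far
# from 0.27 as the observed p-hat.
p_value <- mean(abs(sim_phat - 0.27) >= abs(observed_phat - 0.27))
p_value
```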

Question 2

In August 2017, researchers at Dalhousie University conducted a poll to learn more about attitudes towards the legalization of marijuana across Canada. They found that 14.5% of respondents strongly disagreed with the legalization of recreational marijuana, based on a sample of 1,087 adults who had been living in Canada for at least 12 months.

Suppose a headline reads “One out of six Canadians strongly disagree with the legalization of recreational marijuana in Canada”.

  (a) State the null and alternative hypotheses to test whether the Dalhousie survey data contradicts the claim.

  (b) Simulate 10,000 datasets to determine whether the survey data contradicts the headline. What do you conclude?

  (c) Repeat the simulation from (b) with 100, 1,000, 10,000, and 50,000 simulated datasets by writing a function that takes the number of repetitions as its argument and returns the estimated p-value. For each number of repetitions, record the p-value and the time it takes to obtain the simulation results. To calculate the runtime, you can use the Sys.time() function to record the time before and after your simulation and then take the difference. For example:

startTime <- Sys.time()
# YOUR CODE
endTime <- Sys.time()
endTime - startTime  # This is the time it took to run your code

Which of the estimated p-values do you think is the best estimate of the “true” p-value? What happens to the runtime as the number of simulations increases?
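One possible sketch for (b) and (c), assuming a two-sided alternative, H0: p = 1/6 from the headline, and the reported 14.5% as the observed proportion (the exact count is not given, and 0.145 × 1087 is not a whole number). As above, rbinom() stands in for sampling individual respondents. The function name `estimate_pvalue`, the `results` table, and the seed are hypothetical.

```r
# Hypothetical function for Question 2: estimate the two-sided p-value for
# H0: p = 1/6 from `repetitions` simulated surveys of n = 1087 adults.
estimate_pvalue <- function(repetitions, n = 1087, p0 = 1/6, observed = 0.145) {
  # Simulated sample proportions under the null hypothesis
  sim_props <- rbinom(repetitions, size = n, prob = p0) / n
  # Proportion of simulated values at least as far from p0 as the observed value
  mean(abs(sim_props - p0) >= abs(observed - p0))
}

set.seed(2017)  # arbitrary seed
estimate_pvalue(10000)  # part (b)

# Part (c): record the p-value and runtime for each number of repetitions
reps <- c(100, 1000, 10000, 50000)
results <- data.frame(repetitions = reps, p_value = NA_real_, seconds = NA_real_)

for (i in seq_along(reps)) {
  startTime <- Sys.time()
  results$p_value[i] <- estimate_pvalue(reps[i])
  endTime <- Sys.time()
  # Convert to seconds so all runtimes are on a common scale
  results$seconds[i] <- as.numeric(difftime(endTime, startTime, units = "secs"))
}

results
```

Because rbinom() is vectorized, these runtimes may be very small; a version that loops and calls sample() once per simulated dataset will typically take noticeably longer and show the growth in runtime more clearly.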

Question 3

In a Gallup poll of 1012 randomly selected adults, 9% said that cloning of humans should be allowed. We want to test the hypothesis that a proportion \(c\) of adults believe cloning should be allowed.

  (a) What are the (general) null and alternative hypotheses for this test?

  (b) Write a function to test this hypothesis that takes the following arguments:
    • prop_cloning: the proportion of adults who believe cloning should be allowed (under the null hypothesis)
    • repetitions: the number of repetitions

    and returns the p-value.

  (c) Use your function from (b) to determine which of the following headlines are contradicted by the Gallup poll. In each case, interpret the p-value and draw a conclusion.

    1. “12% of people believe that cloning humans should be allowed”

    2. “Poll reports that 90% of people believe cloning humans should not be allowed”
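A sketch of one possible shape for the function in (b), assuming a two-sided alternative, an observed proportion of exactly 0.09 out of n = 1012, and rbinom() in place of sampling individual adults. The name `test_cloning` and the seed are hypothetical; the argument names follow the question. The second call treats "90% not allowed" as "10% allowed", an assumption that ignores any undecided respondents.

```r
# Hypothetical implementation of the function described in Question 3(b).
# Assumes n = 1012 adults, an observed proportion of 0.09, and a
# two-sided alternative.
test_cloning <- function(prop_cloning, repetitions) {
  n <- 1012
  observed <- 0.09
  # Simulate sample proportions assuming H0: p = prop_cloning is true
  sim_props <- rbinom(repetitions, size = n, prob = prop_cloning) / n
  # Two-sided p-value: share of simulated proportions at least as far
  # from prop_cloning as the observed proportion
  mean(abs(sim_props - prop_cloning) >= abs(observed - prop_cloning))
}

set.seed(1012)  # arbitrary seed
test_cloning(prop_cloning = 0.12, repetitions = 10000)  # first headline
# Second headline: assumes 90% "not allowed" means 10% "allowed"
test_cloning(prop_cloning = 0.10, repetitions = 10000)
```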