Tutorial grades will be assigned according to the following marking scheme.
Component | Mark |
---|---|
Attendance for the entire tutorial | 1 |
Assigned homework completion | 1 |
In-class exercises | 4 |
Total | 6 |
One survey showed that among 785 randomly selected subjects who completed four years of college, 144 smoke and 641 do not (based on data from the American Medical Association). The rate of smoking in the general population is reported to be 27%. Researchers are interested in finding out if the rate of smoking is different for college graduates than for the general population.
What are appropriate null and alternative hypotheses to test the claim?
Assume you conduct a hypothesis test using simulation and get the following empirical distribution for values of the test statistic \(\hat{p}\), assuming the null hypothesis is true. For simplicity, this distribution only shows the results of 300 simulations. There are 300 dots on the plot, one for each simulation (note that in practice, 300 simulations is not sufficient). What does each dot on the plot represent?
Based on the plot above, what is your estimate of the p-value? How would you interpret this p-value?
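As a rough illustration of how an empirical distribution like the one described above could be produced, here is a minimal sketch in R. It assumes each simulated sample is drawn with `rbinom()` and that the test is two-sided; the variable names and the seed are illustrative choices, not part of the tutorial.

```r
set.seed(123)  # arbitrary seed, used only so the sketch is reproducible

n_obs     <- 785           # sample size from the survey
p_null    <- 0.27          # smoking rate in the general population (null value)
p_hat_obs <- 144 / n_obs   # observed proportion of smokers among graduates

n_sims <- 300              # matches the 300 dots described in the question

# Each element is the proportion of smokers in one simulated sample of 785
# people, generated assuming the null hypothesis (p = 0.27) is true.
sim_p_hat <- rbinom(n_sims, size = n_obs, prob = p_null) / n_obs

# Two-sided p-value estimate: the share of simulated proportions that are at
# least as far from the null value as the observed proportion is.
p_value <- mean(abs(sim_p_hat - p_null) >= abs(p_hat_obs - p_null))
p_value
```

A dot plot or histogram of `sim_p_hat` would then play the role of the plot referred to in the question.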
In August 2017, researchers at Dalhousie University conducted a poll to learn more about attitudes towards the legalization of marijuana across Canada. They found that 14.5% of respondents strongly disagreed with the legalization of recreational marijuana, based on a sample of 1,087 adults who had been living in Canada for at least 12 months.
Suppose a headline reads “One out of six Canadians strongly disagree with the legalization of recreational marijuana in Canada”.
State the null hypothesis and alternative hypothesis to test if the Dalhousie survey data contradicts the claim.
Simulate 10,000 datasets to determine whether the survey data contradicts the headline. What do you conclude?
Repeat the simulation from (b) with 100, 1,000, 10,000, and 50,000 simulated datasets by writing a function that takes the number of repetitions as its argument and returns the estimated p-value. For each number of repetitions, record the p-value and the time it takes to obtain the simulation results. To calculate runtime, you can use the Sys.time() function to record the time before and after your simulation, and then calculate the difference. For example:
startTime <- Sys.time()
# YOUR CODE
endTime <- Sys.time()
endTime - startTime # This is the time it took to run your code
Which of the estimated p-values do you think is the best estimate of the “true” p-value? What happens to the runtime as the number of simulations increases?
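One possible way to organise the repeated runs and timings is sketched below. It assumes a two-sided test of the headline value 1/6 and uses `rbinom()` to simulate each dataset; the function name `simulate_p_value`, its default arguments, and the `results` data frame are hypothetical names chosen for this sketch.

```r
# Hypothetical helper: estimates the p-value from `repetitions` simulated datasets.
simulate_p_value <- function(repetitions, n_obs = 1087, p_null = 1 / 6,
                             p_hat_obs = 0.145) {
  # Proportion who "strongly disagree" in each simulated sample under the null
  sim_p_hat <- rbinom(repetitions, size = n_obs, prob = p_null) / n_obs
  # Share of simulated proportions at least as extreme as the observed one
  mean(abs(sim_p_hat - p_null) >= abs(p_hat_obs - p_null))
}

set.seed(2017)  # arbitrary seed, for reproducibility only
rep_counts <- c(100, 1000, 10000, 50000)
results <- data.frame(repetitions = rep_counts,
                      p_value = NA_real_,
                      seconds = NA_real_)

for (i in seq_along(rep_counts)) {
  start_time <- Sys.time()
  results$p_value[i] <- simulate_p_value(rep_counts[i])
  end_time <- Sys.time()
  # Store the elapsed time as a plain number of seconds
  results$seconds[i] <- as.numeric(difftime(end_time, start_time, units = "secs"))
}

results
```

Because `rbinom()` is vectorised, the timings from this sketch may all be very small; a loop-based simulation would likely show a clearer increase in runtime as the number of repetitions grows.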
In a Gallup poll of 1012 randomly selected adults, 9% said that cloning of humans should be allowed. We want to test the hypothesis that a proportion \(c\) of adults believe cloning should be allowed.
What are the (general) null hypothesis and alternative hypothesis for this test?
Write a function that takes the following arguments and returns the p-value:
prop_cloning: the proportion of adults who believe cloning should be allowed (under the null hypothesis)
repetitions: the number of repetitions
For each of the headlines below, use your function from (b) to determine whether it is contradicted by the Gallup poll. In each case, interpret the p-value and draw a conclusion.
“12% of people believe that cloning humans should be allowed”
“Poll reports that 90% of people believe cloning humans should not be allowed”
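A function with the two arguments named above might look like the following sketch. It assumes a two-sided test and `rbinom()`-based simulation; the function name `cloning_p_value` and its extra default arguments are hypothetical. For the second headline, the null proportion is taken to be 0.10, on the assumption that 90% "should not be allowed" corresponds to 10% "should be allowed".

```r
# Hypothetical function: simulates the poll under the null proportion
# `prop_cloning` and returns the estimated two-sided p-value.
cloning_p_value <- function(prop_cloning, repetitions,
                            n_obs = 1012, p_hat_obs = 0.09) {
  # Proportion in favour of cloning in each simulated poll of 1012 adults
  sim_p_hat <- rbinom(repetitions, size = n_obs, prob = prop_cloning) / n_obs
  # Share of simulated proportions at least as far from the null value
  # as the observed 9%
  mean(abs(sim_p_hat - prop_cloning) >= abs(p_hat_obs - prop_cloning))
}

set.seed(1012)  # arbitrary seed, for reproducibility only

# Headline 1: 12% believe cloning should be allowed
p_headline_12 <- cloning_p_value(prop_cloning = 0.12, repetitions = 10000)

# Headline 2: 90% believe cloning should not be allowed (assumed null: 0.10)
p_headline_10 <- cloning_p_value(prop_cloning = 0.10, repetitions = 10000)

c(headline_12 = p_headline_12, headline_10 = p_headline_10)
```

A small p-value suggests the poll result would be unusual if the headline's proportion were true; a larger one suggests the poll is compatible with the headline.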