Instructions

What should I bring to tutorial on November 2?

  • R output (e.g., plots and explanations) for Question 1. You can either bring a hardcopy or bring your laptop with the output.

Tutorial Grading

Tutorial grades will be assigned according to the following marking scheme.

Mark
Attendance for the entire tutorial 1
Assigned homework completiona 1
In-class exercises 4
Total 6

Practice Problems

Question 1

In this question, we will look at the median duration of flights departing from New York in 2013. We’ll take as our population all flights departing from New York in 2013 in the flights data. The flights data is one of the data sets in the library nycflights13.

  1. Plot a histogram of air_time for the whole population and describe it.

  2. Plot the histograms of air_time for (i) a sample of size 25, (ii) a sample of size 100. Compare these histograms and the histogram from part (a).

  3. Do each of the following:
    1. Generate 500 samples of size 25 and plot a histogram of the median air_time in each of these samples.

    2. Generate 500 samples of size 100 and plot a histogram of the median air_time in each of these samples.

    3. Do these histograms represent sampling distributions or bootstrap distributions?

    4. Compare the histograms in (i) and (ii) to the histograms from (a) and (b)

Question 2

In this question, we will look at the Gestation data in the mosaicData library. First load the library:

library(mosaicData)

You can read about the data by looking at the help information for the data frame

help(Gestation)

In this question, you will find confidence intervals for parameters related to the distribution of the mother’s age, which is the variable age. First remove the two observations which have missing values for age.

Gestation <- Gestation %>% filter(!is.na(age))
  1. The plot below shows the bootstrap distribution for the mean of mother s age, for 100 bootstrap samples. The red dot is the estimate of the mean for the first bootstrap sample, and the grey dots are the estimates of the mean for the other 99 bootstrap samples.

    1. Explain how the value of the red dot is calculated.

    2. Using this plot, estimate as accurately as possible a 90% confidence interval for the mean of mother’s age.

  1. Find a 99% bootstrap confidence interval for the median of mother’s age. Use 5000 bootstrap samples.

Question 3

In lecture this week, we used Gunturkun’s data to calculate confidence intervals for the proportion of couples who tilt their heads to the right when they kiss. Our 95% confidence interval was (0.56, 0.73).

  1. If we want to be very certain that we capture the population parameter of interest, should we use a larger confidence level or a smaller confidence level? Will this result in a wider confidence interval or a narrower confidence interval?

  2. Which of the following statements are true and which are false?
    1. We are 95% confident that between 56% and 73% of kissing couples in this sample tilt their head to the right when they kiss.

    2. We are 95% confident that between 56% and 73% of all kissing couples in the population tilt their head to the right when they kiss.

    3. If we considered many random samples of 124 couples, and we calculated 95% confidence intervals for each sample, 95% of these confidence intervals will include the true proportion of kissing couples in the population who tilt their heads to the right when they kiss.

  3. In the week 4 lecture, we carried out an hypothesis test to determine whether couples are equally likely to tilt their heads to the right or to the left when they kiss. We tested the hypotheses: \[H_0: p = 0.5\] versus \[H_A: p \ne 0.5\] where \(p\) is the proportion of couples who tilt their heads to the right when they kiss. Using Gunturkun’s data, our P-value was 0.003.
    How do this hypothesis test and the confidence interval tell a similar story?