Bring your answer to question 2.
file_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/week6/Week6PracticeProblems-student.Rmd"
download.file(url = file_url, destfile = "Week6PracticeProblems-student.Rmd")
Look for the file “Week6PracticeProblems-student.Rmd” under the Files tab then click on it to open.
Change the subtitle to “Week 6 Practice Problems Solutions” and change the author to your name and student number.
Type your answers below each question. Remember that R code chunks can be inserted directly into the notebook by choosing Insert R from the Insert menu (see Using R Markdown for Class Assignments). In addition, the R Markdown cheatsheet and reference are great resources as you get started with R Markdown.
In lecture, we looked at the sampling distribution of the mean arrival delay for 2013 flights from New York to San Francisco for samples of size 25 and 100. We’ll now look at the median of the arrival delay. We’ll take as our population all flights from New York to San Francisco in the flights data.
arr_delay for the population
arr_delay for a sample of size 25
arr_delay for a sample of size 100
In lecture we looked at the sampling distributions of the mean arrival delay for samples of size 25 and 100. How do they compare to the histograms in part a.?
Examine the sampling distribution of the median arrival delay for samples of size 25 by looking at a histogram of the medians for 500 samples of size 25. Examine the sampling distribution of the median arrival delay for samples of size 100 by looking at a histogram of the medians for 500 samples of size 100. Describe how these histograms compare to the histograms in part a. and to the histograms showing the sampling distributions of the mean for samples of size 25 and 100.
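The repeated-sampling step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `pop_delay` is a simulated, right-skewed stand-in for the population of arrival delays (not the real flights data), and `sample_medians()` is a hypothetical helper name. In the assignment, the population would be the arr_delay values for the New York to San Francisco flights.

```r
# Sketch only: `pop_delay` is a simulated, right-skewed stand-in for the
# population of arrival delays. Replace it with the real arr_delay values.
set.seed(130)
pop_delay <- rexp(10000, rate = 1 / 30) - 20

# Hypothetical helper: medians of `n_reps` random samples of size `n`,
# each drawn without replacement from `population`.
sample_medians <- function(population, n, n_reps = 500) {
  replicate(n_reps, median(sample(population, size = n)))
}

medians_25  <- sample_medians(pop_delay, n = 25)
medians_100 <- sample_medians(pop_delay, n = 100)

# Draw both sampling distributions on a common x-axis so their spreads
# can be compared directly.
x_range <- range(c(medians_25, medians_100))
par(mfrow = c(2, 1))
hist(medians_25, xlim = x_range,
     main = "Medians of 500 samples of size 25",
     xlab = "Median arrival delay (minutes)")
hist(medians_100, xlim = x_range,
     main = "Medians of 500 samples of size 100",
     xlab = "Median arrival delay (minutes)")
```

Putting both histograms on the same x-axis is a design choice that makes differences in spread visible at a glance. Swapping `median` for `mean` inside the helper would give the corresponding sampling distributions of the mean.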
Bring your output for this question to tutorial on Friday February 16 (either a hardcopy or on your laptop).
In this question, we’ll look at the Gestation data in the mosaicData library. First load the library:
library(mosaicData)
You can read about the data by looking at the help information for the data frame:
help(Gestation)
In this question, you will find confidence intervals for parameters related to the distribution of the mother’s age, which is the variable age. First remove the two observations which have missing values for age.
library(dplyr)  # provides %>% and filter()
Gestation <- Gestation %>% filter(!is.na(age))
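One possible way to obtain such confidence intervals is the percentile bootstrap, sketched below. The question text here does not name a method, so the bootstrap is an assumption. The `age` vector is simulated with arbitrary values purely so the sketch runs on its own, and `boot_ci()` is a hypothetical helper name. In the assignment, `age` would be `Gestation$age` after the filtering step.

```r
# Sketch only: `age` is a simulated stand-in (arbitrary values) for the
# non-missing mother's ages. Replace it with Gestation$age.
set.seed(130)
age <- round(rnorm(1000, mean = 27, sd = 6))

# Hypothetical helper: percentile bootstrap interval for a statistic `stat`.
# Resamples `x` with replacement `n_boot` times, then takes the middle
# `level` share of the resulting bootstrap statistics.
boot_ci <- function(x, stat = mean, level = 0.95, n_boot = 2000) {
  boot_stats <- replicate(
    n_boot,
    stat(sample(x, size = length(x), replace = TRUE))
  )
  alpha <- 1 - level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2))
}

boot_ci(age, stat = mean)                  # 95% interval for the mean age
boot_ci(age, stat = median, level = 0.90)  # 90% interval for the median age
```

Passing the statistic as an argument lets the same helper cover the mean, the median, or any other summary of age.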
In lecture this week, we used Güntürkün’s data to calculate confidence intervals for the proportion of couples who tilt their heads to the right when they kiss. Our 95% confidence interval was (0.56, 0.73).
If we want to be very certain that we capture the population parameter of interest, should we use a larger confidence level or a smaller confidence level? Will this result in a wider confidence interval or a narrower confidence interval?
In the week 4 lecture, we carried out a hypothesis test to determine whether couples are equally likely to tilt their heads to the right or to the left when they kiss. We tested the hypotheses: \[H_0: p = 0.5\] versus \[H_A: p \ne 0.5\] where \(p\) is the proportion of couples who tilt their heads to the right when they kiss. Using Güntürkün’s data, our P-value was 0.003.
How do this hypothesis test and the confidence interval tell a similar story?