Tutorial grades will be assigned according to the following marking scheme.
Component | Mark |
---|---|
Attendance for the entire tutorial | 1 |
Assigned homework completion | 1 |
In-class exercises | 4 |
Total | 6 |
year that represents year of study at university.
Insert your answer here
sample() function to simulate selecting 100 students and recording their year of study.
Insert your answer here
Insert your answer here
Insert your answer here
Insert your answer here
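The replicate() function in R evaluates an expression a fixed number of times. For example, the following code draws a sample of 8 numbers from 1 to 10 with replacement, computes the sample mean, and repeats this 5 times:
replicate(n = 5, mean(sample(1:10, size = 8, replace = TRUE)))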
## [1] 5.750 6.750 4.250 4.875 5.875
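As a rough sketch of how the vector returned by replicate() can be summarised, the example below wraps a simulation in a small function and then looks at the centre, spread, and shape of the repeated results. The die-rolling scenario, the function name roll_mean, and the seed are assumptions chosen for illustration only; they are not part of the tutorial questions.

```r
# Hypothetical example: the average of 30 fair die rolls, repeated 1000 times.
set.seed(130)  # arbitrary seed so that repeated runs give the same numbers

roll_mean <- function(n_rolls = 30) {
  # One simulated experiment: roll a die n_rolls times and average the result.
  mean(sample(1:6, size = n_rolls, replace = TRUE))
}

# replicate() calls roll_mean() 1000 times and collects the results in a vector.
sim_means <- replicate(n = 1000, roll_mean())

mean(sim_means)  # centre of the simulated sample means
sd(sim_means)    # spread of the simulated sample means
hist(sim_means, main = "Simulated means of 30 die rolls", xlab = "Sample mean")
```

Wrapping the simulation in a function keeps the replicate() call short and makes it easy to change the sample size later.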
Use replicate() and the function you wrote in part (e) to simulate sampling 500 students per year for 50 years. Plot the distribution of the average years of study for each year. What are the mean and standard deviation of this distribution? What is the shape of the distribution? Explain.
Insert your answer here
The Galton data set in the mosaic library contains data from Francis Galton in the 1880s.
library(mosaic)
library(tidyverse)
glimpse(Galton)
## Observations: 898
## Variables: 6
## $ family <fct> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5...
## $ father <dbl> 78.5, 78.5, 78.5, 78.5, 75.5, 75.5, 75.5, 75.5, 75.0, 7...
## $ mother <dbl> 67.0, 67.0, 67.0, 67.0, 66.5, 66.5, 66.5, 66.5, 64.0, 6...
## $ sex <fct> M, F, F, F, M, M, F, F, M, F, M, M, F, F, F, M, M, M, F...
## $ height <dbl> 73.2, 69.2, 69.0, 69.0, 73.5, 72.5, 65.5, 65.5, 71.0, 6...
## $ nkids <int> 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6...
Using the Galton data, use R to calculate the average and variance of the children’s heights in families 1, 2, and 32. Which family has the largest variance? Explain the meaning of variance in this context.
Insert your answer here
One way to calculate the mean height of kids in each family is to use group_by() in combination with the summarise() function in the dplyr library. We haven’t covered group_by() in class yet, but an example of how to do this is given below. Note that both group_by() and summarise() return a data frame.
Here is an example. Consider a simple data frame marks containing the final marks for two (fictitious) students who each took five courses during their first year at UofT. The example below uses group_by() and then summarise() to calculate the average mark for each student.
library(tidyverse)
# tibble() replaces the deprecated data_frame()
marks <- tibble(student = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                courses = c("STA130", "MAT137", "ECO100", "CSC148", "PHL100",
                            "STA130", "MAT137", "ECO100", "CSC148", "PHL100"),
                grade = c(82, 83, 77, 84, 79, 83, 74, 85, 77, 72))
marks_grouped <- group_by(marks, student)
ave_grades <- summarise(marks_grouped, ave = mean(grade))
ave_grades
## # A tibble: 2 x 2
##   student   ave
##     <dbl> <dbl>
## 1       1  81
## 2       2  78.2
# verify calculations
mean(c(82, 83, 77, 84, 79))
## [1] 81
mean(c(83, 74, 85, 77, 72))
## [1] 78.2
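The same grouped summary can also be written as a single chain with the pipe operator %>%, which dplyr makes available. The sketch below assumes the dplyr package is installed and rebuilds a cut-down copy of marks so that it runs on its own; the object name ave_grades_piped and the extra n_courses column are illustrative additions, not part of the example above.

```r
library(dplyr)

# A cut-down copy of the marks data (student and grade only).
marks <- tibble(
  student = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  grade   = c(82, 83, 77, 84, 79, 83, 74, 85, 77, 72)
)

# One row per student: the mean grade and the number of courses counted.
ave_grades_piped <- marks %>%
  group_by(student) %>%
  summarise(ave = mean(grade), n_courses = n())

ave_grades_piped
```

summarise() accepts several name = expression pairs, so more than one statistic per group can be computed in the same call.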
Insert your answer here
An article from FiveThirtyEight explored where people check the weather. Use the weather_check data set in the fivethirtyeight library to answer the following question.
count() function in the dplyr library.
library(fivethirtyeight)
glimpse(weather_check)
## Observations: 928
## Variables: 9
## $ respondent_id <dbl> 3887201482, 3887159451, 3887152228, 388714...
## $ ck_weather <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ...
## $ weather_source <chr> "The default weather app on your phone", "...
## $ weather_source_site <chr> NA, NA, NA, NA, "Iphone app", "AccuWeather...
## $ ck_weather_watch <ord> Very likely, Very likely, Very likely, Som...
## $ age <fct> 30 - 44, 18 - 29, 30 - 44, 30 - 44, 30 - 4...
## $ female <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, ...
## $ hhold_income <ord> $50,000 to $74,999, Prefer not to answer, ...
## $ region <chr> "South Atlantic", NA, "Middle Atlantic", N...
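For readers who have not used count() before, here is a minimal sketch of what it returns, assuming the dplyr package is installed. The pets data frame and its values are made up for illustration and have nothing to do with weather_check.

```r
library(dplyr)

# Hypothetical toy data, not part of weather_check.
pets <- tibble(pet = c("cat", "dog", "dog", "fish", "cat", "dog"))

# count() returns one row per distinct value of pet, with the tally in column n.
pet_counts <- count(pets, pet)

# mutate() can then add each category's share of the total.
pet_props <- mutate(pet_counts, prop = n / sum(n))
pet_props
```

Because count() returns a data frame, its result can be passed straight to other dplyr functions such as mutate() or arrange().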
Insert your answer here
Insert your answer here
The R function sample(x, size, replace = TRUE) applied to the vector c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) (which can also be written in R as 0:9) draws a random sample, with replacement, from the digits 0 to 9; the size argument sets how many values are drawn. If the sampling is truly random, then we would expect each digit, 0 to 9, to have an equal chance of being sampled.
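To illustrate what the replace argument changes, the sketch below samples from a two-value vector using base R only. The coin vector, the sample size of 20, and the seed are assumptions picked for illustration; the digits exercise itself is left untouched.

```r
set.seed(1)  # arbitrary seed, only so that repeated runs agree

coin <- c("H", "T")  # hypothetical two-outcome population

# With replacement, the sample can be longer than the population.
flips <- sample(coin, size = 20, replace = TRUE)
table(flips)                  # how often each outcome was drawn
table(flips) / length(flips)  # the same tallies as proportions

# Without replacement, asking for more draws than there are values is an error;
# tryCatch() captures the error message instead of stopping the script.
too_many <- tryCatch(sample(coin, size = 20, replace = FALSE),
                     error = function(e) conditionMessage(e))
too_many
```

With only 20 draws the two proportions will usually differ from 0.5; they tend to settle closer to equal as size grows.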
dat <- sample(0:9, size = FILL IN CODE, replace = TRUE)
samp <- tibble(dat)  # tibble() replaces the deprecated data_frame()
ssamp_dat <- count(FILL IN CODE)
mutate(FILL IN CODE)
Insert your answer here
Insert your answer here
Insert your answer here
Insert your answer here