Instructions

What should I bring to tutorial on January 26?

Bring your answers to Questions 1 and 2.

First steps to answering these questions.

  • Download this R Notebook directly into RStudio by typing the following code into the RStudio console window.
file_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/week3/Week3PracticeProblems-student.Rmd"
download.file(url = file_url , destfile = "")

Look for the file “Week3PracticeProblems-student.Rmd” under the Files tab then click on it to open.

  • Change the subtitle to “Week3PracticeProblemsSolutions” and change the author to your name and student number.

  • Type your answers below each question. Remember that R code chunks can be inserted directly into the notebook by choosing Insert R from the Insert menu (see Using R Markdown for Class Assignments). In addition this R Markdown cheatsheet, and reference are great resources as you get started with R Markdown.

Practice Problems

Question 1

  1. Create an object in R named coin that represents the outcomes of flipping a coin.
  coin <- c("H","T")
  1. Use the sample() function to simulate flipping a coin 10 times. Run it twice do you get the same results?
coin <- c("H","T")
sample(coin, 10, replace = T)
##  [1] "T" "T" "H" "H" "T" "T" "H" "T" "T" "H"
  1. Calculate the number of heads you obtained in part (b). (HINT: save the result of the coin toss in part (b) as a vector then create a logical vector to check which elements are heads).
coin <- c("H","T")
results <- sample(coin, 10, replace = T)
sum(results == "H")
## [1] 3
  1. Write a function that returns the sum of heads in 10 coin tosses.
flipcoin <- function(){
  coin <- c("H","T")
  results <- sample(coin, 10, replace = T)
  sum(results == "H")
}
flipcoin()
## [1] 8
  1. Change the function that you wrote in part (d) to include an argument for the number of flips. So, your function should return the number of heads in \(n\) flips, where \(n\) is the number of flips.
flipcoin <- function(n){
  coin <- c("H","T")
  results <- sample(coin, n, replace = T)
  sum(results == "H")
}
flipcoin(20)
## [1] 17
  1. Modify the function that you wrote in part (e) to return the percentage of heads in \(n\) flips instead of the total number of flips.
flipcoin <- function(n){
  coin <- c("H","T")
  results <- sample(coin, n, replace = T)
  (sum(results == "H") / n)*100
}
flipcoin(20)
## [1] 75
  1. Create a scatterplot of the number of the percent of heads versus the number of coin tosses, where the number of coin tosses ranges from 1 to 500. The following starter code assumes that your function is named flipcoin().
library(tidyverse)
numtoss <- 1:500
perchead <- sapply(1:500,flipcoin)
coinflips <- data.frame(numtoss,perchead)
coinflips %>% ggplot(aes(x = numtoss, y = perchead)) + geom_point(alpha = 0.5) + geom_hline(yintercept = 50, colour = "red")

What do you observe as the number of tosses approaches 500?

The proportion of heads approaches 50%.

Question 2

  1. Read in the data sets fludat_prov.csv and popdat.csv into separate R data frames
library(tidyverse)
fludat_prov <- read_csv("fludat_prov.csv") # import data from file
popdat <- read_csv("popdat.csv") # import data from file

Examine the data frames using the Environment tab in RStudio. Do the data frames have the same number of rows and columns?

Yes.

  1. The names of the Provinces and Territories are different in fludat_prov and popdat. Recode so that the names are the same in both data frames.
fludat_prov$prov <- recode(fludat_prov$prov, "Province of Québec" = "Quebec", "Province of Ontario" = "Ontario", "Province of Saskatchewan" = "Saskatchewan", "Province of Alberta" = "Alberta")
  1. Use the popdat data frame to calculate the number of provinces/territories in each region?
popdat %>% group_by(region) %>% summarise(n = n())
  1. Using the popdat data fill in the missing regions in the region variable and repeat part (c).
popdat$region[popdat$prov == "Alberta"] <- "West" #recode only the region value for Alberta
popdat$region[popdat$prov == "Quebec"] <- "East" #recode only the region value for Alberta
popdat %>% group_by(region) %>% summarise(n = n())
  1. Join the popdat and fludat tables to create a new data frame. How many variables and observations are in the new data frame? Is this what you expected? Explain.
fludat_prov %>% inner_join(popdat, by = "prov")
  1. Calculate the rate of Flu A, the number of flu A cases divided by the number tested, in each province territory. Which province has the highest rate?
fludat_prov %>% 
  inner_join(popdat, by = "prov") %>% 
  group_by(prov) %>% 
  mutate(rate = fluA/testpop_size) %>%
  arrange(desc(rate))

R Markdown source