Tutorial grades will be assigned according to the following marking scheme.

| Component | Mark |
|---|---|
| Attendance for the entire tutorial | 1 |
| Assigned homework completion | 1 |
| In-class exercises | 4 |
| Total | 6 |

These problems are based on the lesson Joining Data Frames.
The file heroes_information_exer.csv contains some information on superheroes, and super_hero_powers_exer.csv contains some information on the powers of superheroes.
The following questions are based on data in heroes_information.csv and super_hero_powers.csv.
Read heroes_information.csv and super_hero_powers.csv into R using read_csv from the tidyverse library. Here is the R code. How many variables and observations are in each data frame?
library(tidyverse)
heroinfo_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/heroes_information_exer.csv"
heropower_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/super_hero_powers_exer.csv"
hero_info <- read_csv(heroinfo_url)
hero_power <- read_csv(heropower_url)
glimpse(hero_info)
## Observations: 487
## Variables: 4
## $ name      <chr> "A-Bomb", "Abe Sapien", "Abin Sur", "Abomination", "...
## $ Alignment <chr> "good", "good", "good", "bad", "bad", "good", "good"...
## $ Weight    <dbl> 441, 65, 90, 441, 122, 88, 61, 81, 104, 108, 90, 90,...
## $ Publisher <chr> "Marvel Comics", "Dark Horse Comics", "DC Comics", "...
glimpse(hero_power)
## Observations: 667
## Variables: 4
## $ name         <chr> "3-D Man", "A-Bomb", "Abe Sapien", "Abin Sur", "A...
## $ Agility      <chr> "True", "False", "True", "False", "False", "False...
## $ Flight       <chr> "False", "False", "False", "False", "False", "Tru...
## $ Marksmanship <chr> "False", "False", "True", "False", "False", "Fals...

hero_info has 487 observations and 4 variables; hero_power has 667 observations and 4 variables.
Use name as the key since it uniquely identifies observations.
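One way to check that a column really behaves as a key is to count how often each value occurs; a key should never repeat. The following is a minimal sketch on a small made-up data frame (`toy_info` and `duplicated_names` are hypothetical names, and the three rows are only a stand-in for the tutorial data).

```r
library(dplyr)

# Hypothetical toy data standing in for hero_info (not the tutorial data).
toy_info <- data.frame(
  name = c("A-Bomb", "Abe Sapien", "Abin Sur"),
  Weight = c(441, 65, 90),
  stringsAsFactors = FALSE
)

# Count rows per name and keep any name that appears more than once.
duplicated_names <- toy_info %>%
  count(name) %>%
  filter(n > 1)

# Zero rows here means no name is repeated, so name works as a key.
nrow(duplicated_names)
```

The same pipeline could be run on hero_info and hero_power; if either returned rows, joining by name would duplicate observations.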
What proportion of superheroes in heroes_information also have data in super_hero_powers?
inner_join(hero_info, hero_power, by = "name") %>% head()
inner_join(hero_info, hero_power, by = "name") %>% summarise(n = n())
The proportion is 460/487 ≈ 0.94.
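The proportion can also be computed directly in code rather than by hand. The sketch below uses two made-up tables (`toy_info`, `toy_power`, `matched`, and `prop_matched` are hypothetical names) and `semi_join()`, which keeps the rows of the first table that have a match in the second without adding columns. When the key is unique in both tables, this gives the same row count as `inner_join()`.

```r
library(dplyr)

# Hypothetical toy tables (not the tutorial data).
toy_info <- data.frame(
  name = c("A", "B", "C", "D"),
  Weight = c(70, 80, 90, 100),
  stringsAsFactors = FALSE
)
toy_power <- data.frame(
  name = c("A", "B", "C", "E"),
  Flight = c("True", "False", "True", "False"),
  stringsAsFactors = FALSE
)

# Rows of toy_info whose name also appears in toy_power.
matched <- semi_join(toy_info, toy_power, by = "name")

# Proportion of toy_info rows that have a match: 3 of 4 names.
prop_matched <- nrow(matched) / nrow(toy_info)
prop_matched
```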
weight for superheroes for each category of marksmanship? (HINT: use the group_by() function then summarise())
left_join(hero_info, hero_power, by = "name") %>%
  group_by(Marksmanship) %>% 
  summarise(n = n(), 
            mean_wt = mean(Weight, na.rm = TRUE), 
            sd_wt = sd(Weight, na.rm = TRUE),
            median_wt = median(Weight, na.rm = TRUE),
            iqr_wt = IQR(Weight, na.rm = TRUE))
left_join(hero_info, hero_power, by = "name") %>%
  filter(!is.na(Marksmanship)) %>%
  ggplot(aes(x = Marksmanship, y = Weight)) + geom_boxplot()
Superheroes with marksmanship tend to weigh less than those without marksmanship. The variability in weight is greater in those without marksmanship than in those with it.
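A left join keeps every row of the first table, so heroes with no power record end up with a missing Marksmanship value; that is why missing values are filtered out before plotting. The sketch below illustrates this on made-up data (`toy_info`, `toy_power`, `joined`, `n_missing`, and `complete` are hypothetical names, not part of the tutorial).

```r
library(dplyr)

# Hypothetical toy tables (not the tutorial data).
toy_info <- data.frame(
  name = c("A", "B", "C"),
  Weight = c(70, 80, 90),
  stringsAsFactors = FALSE
)
toy_power <- data.frame(
  name = c("A", "B"),
  Marksmanship = c("True", "False"),
  stringsAsFactors = FALSE
)

# All three heroes are kept; "C" has no match in toy_power.
joined <- left_join(toy_info, toy_power, by = "name")

# Number of heroes whose Marksmanship is missing after the join.
n_missing <- sum(is.na(joined$Marksmanship))

# Drop the unmatched rows before summarising or plotting.
complete <- joined %>% filter(!is.na(Marksmanship))
```

Testing with `is.na()` states the intent explicitly, rather than relying on how a comparison with the string "NA" treats missing values.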