Tutorial grades will be assigned according to the following marking scheme.
Component | Mark |
---|---|
Attendance for the entire tutorial | 1 |
Assigned homework completion | 1 |
In-class exercises | 4 |
Total | 6 |
These problems are based on the lesson Joining Data Frames.
The file heroes_information_exer.csv contains some information on superheroes, and super_hero_powers_exer.csv contains some information on the powers of superheroes.
The following questions are based on data in heroes_information.csv and super_hero_powers.csv.
Read heroes_information.csv and super_hero_powers.csv into R using read_csv from the tidyverse library. Here is the R code. How many variables and observations are in each data frame?
library(tidyverse)
heroinfo_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/heroes_information_exer.csv"
heropower_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/super_hero_powers_exer.csv"
hero_info <- read_csv(heroinfo_url)
hero_power <- read_csv(heropower_url)
glimpse(hero_info)
## Observations: 487
## Variables: 4
## $ name <chr> "A-Bomb", "Abe Sapien", "Abin Sur", "Abomination", "...
## $ Alignment <chr> "good", "good", "good", "bad", "bad", "good", "good"...
## $ Weight <dbl> 441, 65, 90, 441, 122, 88, 61, 81, 104, 108, 90, 90,...
## $ Publisher <chr> "Marvel Comics", "Dark Horse Comics", "DC Comics", "...
glimpse(hero_power)
## Observations: 667
## Variables: 4
## $ name <chr> "3-D Man", "A-Bomb", "Abe Sapien", "Abin Sur", "A...
## $ Agility <chr> "True", "False", "True", "False", "False", "False...
## $ Flight <chr> "False", "False", "False", "False", "False", "Tru...
## $ Marksmanship <chr> "False", "False", "True", "False", "False", "Fals...
Use name as the key since it uniquely identifies observations.
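Whether name really is unique can be checked before joining. The following is a minimal sketch on a small made-up tibble (toy_info and its values are hypothetical, not the course data): it counts rows per name and keeps any name that appears more than once.

```r
library(dplyr)

# Hypothetical toy data; "Atlas" is deliberately duplicated
toy_info <- tibble(
  name   = c("A-Bomb", "Abe Sapien", "Atlas", "Atlas"),
  Weight = c(441, 65, 101, 126)
)

# Names that occur more than once; zero rows would mean name is a unique key
dup_names <- toy_info %>%
  count(name) %>%
  filter(n > 1)

dup_names
```

Running the same count() and filter() steps on hero_info and hero_power should return zero rows if name is a valid key for both data frames.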
What proportion of superheroes in heroes_information also have data in super_hero_powers?
inner_join(hero_info, hero_power, by = "name") %>% head()
inner_join(hero_info, hero_power, by = "name") %>% summarise(n = n())
The proportion is 460/487 = 0.94.
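The proportion can also be computed directly instead of reading the two counts off the output. This is a sketch on hypothetical toy tibbles (toy_info and toy_power are made up for illustration); it uses semi_join(), which keeps the rows of the first data frame that have a match in the second.

```r
library(dplyr)

# Hypothetical toy data: "Zoom" has no entry in toy_power
toy_info  <- tibble(name = c("A-Bomb", "Abe Sapien", "Abin Sur", "Zoom"))
toy_power <- tibble(
  name   = c("3-D Man", "A-Bomb", "Abe Sapien", "Abin Sur"),
  Flight = c("False", "False", "False", "False")
)

# Rows of toy_info whose name also appears in toy_power
matched <- semi_join(toy_info, toy_power, by = "name")

# Share of heroes in toy_info that also have power data
prop_matched <- nrow(matched) / nrow(toy_info)
prop_matched
```

When name is unique in both data frames, semi_join() and inner_join() return the same number of rows, so either can supply the numerator.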
What is the distribution of weight for superheroes for each category of marksmanship? (HINT: use the group_by() function then summarise())
# na.rm = TRUE is passed to every summary so missing weights are ignored consistently
left_join(hero_info, hero_power, by = "name") %>%
  group_by(Marksmanship) %>%
  summarise(n = n(),
            mean_wt = mean(Weight, na.rm = TRUE),
            sd_wt = sd(Weight, na.rm = TRUE),
            median_wt = median(Weight, na.rm = TRUE),
            iqr_wt = IQR(Weight, na.rm = TRUE))
left_join(hero_info, hero_power, by = "name") %>%
  filter(!is.na(Marksmanship)) %>%
  ggplot(aes(x = Marksmanship, y = Weight)) + geom_boxplot()
Superheroes with marksmanship tend to weigh less than those without marksmanship. The variability in weight is greater in those without marksmanship compared to those with marksmanship.
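The missing Marksmanship category that is filtered out before plotting arises because left_join() keeps every row of the first data frame, including heroes with no match in the second. A small sketch on hypothetical toy tibbles (names and values are made up) shows where those rows come from and how anti_join() lists them directly.

```r
library(dplyr)

# Hypothetical toy data: hero "C" has no row in toy_power
toy_info  <- tibble(name = c("A", "B", "C"), Weight = c(90, 80, 100))
toy_power <- tibble(name = c("A", "B"), Marksmanship = c("True", "False"))

# left_join keeps all three heroes; "C" gets NA for Marksmanship
joined <- left_join(toy_info, toy_power, by = "name")

# Rows that ended up with a missing Marksmanship value
unmatched <- filter(joined, is.na(Marksmanship))

# The same heroes found without joining first
not_in_power <- anti_join(toy_info, toy_power, by = "name")
not_in_power
```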