Tutorial grades will be assigned according to the following marking scheme.
Component | Mark |
---|---|
Attendance for the entire tutorial | 1 |
Assigned homework completion | 1 |
In-class exercises | 4 |
Total | 6 |
These problems are based on the lesson Joining Data Frames.
The file heroes_information_exer.csv contains some information on superheroes, and super_hero_powers_exer.csv contains some information on the powers of superheroes.
The following questions are based on the data in heroes_information.csv and super_hero_powers.csv.
Read heroes_information.csv and super_hero_powers.csv into R using read_csv from the tidyverse library.
If you are using rstudio.cloud, then here is the R code.
library(tidyverse)
hero_info <- read_csv("heroes_information_exer.csv")
hero_power <- read_csv("super_hero_powers_exer.csv")
If you are using RStudio on your own computer then use this R code (internet connection required).
library(tidyverse)
# Raw GitHub URLs for the two exercise files
heroinfo_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/heroes_information_exer.csv"
heropower_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/Fall2018/week5/super_hero_powers_exer.csv"
# read_csv can read directly from a URL
hero_info <- read_csv(heroinfo_url)
hero_power <- read_csv(heropower_url)
How many variables and observations are in each data frame?
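As a hint, here is a minimal sketch of how rows (observations) and columns (variables) can be counted. It uses a small made-up data frame, toy_heroes, whose name, columns, and values are assumptions for illustration only and are not taken from the tutorial files.

```r
library(dplyr)

# Hypothetical toy data frame (not the tutorial data)
toy_heroes <- tibble(
  name   = c("Hero A", "Hero B", "Hero C"),
  weight = c(70, 95, 82)
)

n_obs  <- nrow(toy_heroes)  # number of observations (rows)
n_vars <- ncol(toy_heroes)  # number of variables (columns)

dim(toy_heroes)      # rows, then columns, in one call
glimpse(toy_heroes)  # also lists each variable with its type
```

The same calls should work on hero_info and hero_power once they have been read in.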
Suggest a key to join the two data frames.
What proportion of superheroes in heroes_information also have data in super_hero_powers?
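One possible approach to this kind of question is a filtering join. The sketch below uses two made-up data frames, toy_info and toy_power, and assumes the key columns are called name and hero_names. These names are hypothetical, so check the actual column names in your data frames before adapting it.

```r
library(dplyr)

# Hypothetical toy data; column names and values are assumptions
toy_info <- tibble(
  name   = c("Hero A", "Hero B", "Hero C", "Hero D"),
  weight = c(70, 95, 82, 60)
)
toy_power <- tibble(
  hero_names   = c("Hero A", "Hero C", "Hero E"),
  marksmanship = c(TRUE, FALSE, TRUE)
)

# semi_join() keeps the rows of toy_info that have a match in toy_power,
# without adding any columns from toy_power
matched <- semi_join(toy_info, toy_power, by = c("name" = "hero_names"))

# Proportion of rows in toy_info that also appear in toy_power
prop_matched <- nrow(matched) / nrow(toy_info)
prop_matched
```

Note that this counts rows, so if a key value is repeated in the first data frame, each repeat is counted separately.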
What is the number of observations, average, median, standard deviation, and inter-quartile range of weight for superheroes for each category of marksmanship? (HINT: use the group_by() function then summarise().)
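To illustrate the hint, here is a sketch of group_by() followed by summarise() on a made-up data frame, toy. The column names marksmanship and weight mirror the question, but the values are invented, and dropping missing weights before summarising is an assumption about how you may want to handle them.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
toy <- tibble(
  marksmanship = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  weight       = c(60, 70, 80, 50, 90, 100, NA)
)

toy_summary <- toy %>%
  filter(!is.na(weight)) %>%       # drop rows with missing weight
  group_by(marksmanship) %>%       # one group per marksmanship category
  summarise(
    n             = n(),           # number of observations in the group
    mean_weight   = mean(weight),
    median_weight = median(weight),
    sd_weight     = sd(weight),
    iqr_weight    = IQR(weight)
  )

toy_summary
```

The result has one row per category, with one column for each summary statistic.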
Are superheroes with marksmanship thinner compared to those without marksmanship? Create a visualization to compare the distribution of weight between superheroes that have marksmanship and those that don’t have marksmanship. Which distribution has more variability?
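Side-by-side boxplots are one common way to compare two distributions; histograms or density plots would also work. The sketch below uses a made-up data frame, toy_weights, with invented values, so it only shows the shape of the code, not the answer.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
toy_weights <- data.frame(
  marksmanship = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  weight       = c(60, 70, 80, 50, 90, 100)
)

# One box per marksmanship category; the height of each box shows the
# inter-quartile range, which gives a visual sense of variability
p <- ggplot(toy_weights, aes(x = marksmanship, y = weight)) +
  geom_boxplot() +
  labs(x = "Has marksmanship", y = "Weight")

p
```

When reading such a plot, compare the centre lines (medians) to judge which group tends to weigh less, and the box heights and whisker lengths to judge which group is more variable.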