This page contains course material such as class slides, practice problems, and tutorial assignments.
There are no tutorials on January 5. Instead of attending a tutorial, we suggest that you spend some time getting acquainted with the basics of R, which we will use throughout the course.
The first classes are on January 8. Before you come to class, do the following:
Read through the course syllabus
Read the R resources section of the course webpage. Make sure to log in to http://rstudio.chass.utoronto.ca/ (see the R resources section for more details).
Sign up for the Piazza discussion forum.
Get introduced to R. Two ways to get you started are:
(i) Complete DataCamp’s free online Introduction to R.
(ii) Read chapters 1, 2, and 3 of Hands-On Programming with R, by Garrett Grolemund.
You can do both (i) and (ii), but they cover much of the same content. If you decide to complete only the readings, make sure to type the commands into the console window in RStudio as you go.
Modern Data Science with R: Section 2.1 and chapter 3 up to and including section 3.2.2.
Example solutions to practice problems
Note: in question 1, the textbook asks for scatterplots of each person’s height against their father’s height. The x- and y-axes in the plots in the solutions should be switched.
Annotated slides - 10:00 class
Annotated slides - 14:00 class
For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights - NYT
The Economic Guide To Picking A College Major - FiveThirtyEight
dplyr cheat sheet #1, dplyr cheat sheet #2
Modern Data Science with R: 4.1, 4.2, 4.3, 4.4, 5.1
Example solutions to practice problems
A typo in the solution to Question 2 was corrected on March 1. It used to say in one spot that the test statistic is 0.38, but the test statistic is 0.17, as used elsewhere in the solution.
Note: A new version of the unannotated slides was posted February 8 (both html and pdf). This version corrects a few typos noted in class plus a typo on pages 58 and 59 (in the mathematical note that you’re not responsible for).
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Introductory Statistics with Randomization and Simulation - Sections 2.1, 2.2, 2.3 (excluding 2.3.4)
Class slides (Watch for the typo on slide 46!)
Class slides (Watch for the typo on slide 46!)
Announcement about Mental Health Project
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Modern Data Science with R: 7.1, 7.2, 7.3
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Modern Data Science with R: 8.1, 8.2, 8.4
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Modern Data Science with R: page 189, pages 465-468, and page 470.
Geotab Data Scientist Brenda Nguyen’s presentation on Hazardous Driving Data
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Section 7.6 of Modern Data Science with R
Section 1.4.1 of Introductory Statistics with Randomization and Simulation from OpenIntro
Annotated slides - 10:00 class
Annotated slides - 14:00 class
Modern Data Science with R: Chapter 6.