This page contains course material such as class slides, practice problems, and tutorial assignments.
There are no tutorials on September 7. Instead of attending tutorial, we suggest that you spend some time getting acquainted with the basics of R. We will be using R throughout the course.
The first classes are on September 10. Before you come to class, do the following:
Read through the course syllabus.
Read the R resources section of the course webpage.
Sign up for the Piazza discussion forum.
Get introduced to R. Two ways to get you started are:
Sign up for RStudio Cloud.
Complete the tutorial on R programming basics. If you would like a deeper introductory R programming tutorial, then complete Datacamp’s free online Introduction to R.
Read chapters 1, 2, and 3 of Hands-On Programming with R, by Garrett Grolemund.
You can do both (ii) and (iii), but a lot of the same content is covered. If you decide to only complete the readings then make sure to type the commands into the console window in RStudio.
ggplot2 library in R.
Prof. Taback’s class slides - Mon 10:00 (L0101): html, pdf.
Prof. Moon’s class slides - Mon 14:00 (L0201): pdf, pdf - annotated
Grolemund, G. and Wickham, H. R for Data Science. Chapter 3.
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101): html, pdf.
Prof. Moon’s class slides - Mon 14:00 (L0201): pdf, pdf - annotated
Flu data used in class (csv format): Provincial Flu data, Provincial Population Size.
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101):
The slides on joining (merging) data frames and the Trump tweets example have been removed. You are not responsible for these topics or for understanding this example this week. This topic will be covered on Oct. 1.
Prof. Moon’s class slides - Mon 14:00 (L0201):
Grolemund, G. and Wickham, H. R for Data Science. Chapter 5.
(Optional paid course) Data Camp Online Course. Data Manipulation in R with dplyr
Read this before answering practice problems using RStudio.
(NB: One question has been removed from the original post on Sept 23)
for loops
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201):
Sections 2.3.1, 2.3.2, 2.3.7 and 2.4 of Introductory Statistics with Randomization and Simulation from OpenIntro
Read this before answering practice problems using RStudio.
The tutorial will cover joining data frames. A short lesson on joining data frames is given here. It is strongly recommended that you study this before attempting the tutorial questions.
Read this before answering practice problems using RStudio.
Grolemund, G. and Wickham, H. R for Data Science. Chapter 13.1 - 13.4.
dplyr reference for Join two tbls together
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201)
Recommended reading:
Sections 2.1, 2.2, 2.3 (excluding 2.3.4) of Introductory Statistics with Randomization and Simulation from OpenIntro
(a free open-source textbook)
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201):
Midterm test during tutorial. See term test information.
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201):
Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 149-155. Available on Quercus for STA130-L0101 here, and for STA130-L0201 here. (You can safely ignore any mention of standard error or standard deviation.)
Computational and Inferential Thinking. Chapter 13. (NB: The code examples are in Python, and students are not responsible for understanding the Python code. The discussion of the bootstrap and confidence intervals is appropriate for this course.)
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201):
Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 173-180, 189-192. Available on Quercus for STA130-L0101 here.
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Moon’s class slides - Mon 14:00 (L0201):
Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 189, 465-471. Available on Quercus for STA130-L0101 here.
Read this before answering practice problems using RStudio.
A blog post from last term’s STA130 poster fair. The pictures in this post should give students an idea of what their posters should look like and how they will be displayed.
Prof. Taback’s class slides - Mon 10:00 (L0101):
NYTimes: How Cheap Labor Drives China’s A.I. Ambitions discussed at the beginning of class.
Prof. Moon’s class slides - Mon 14:00 (L0201):
Section 1.4.1 of Introductory Statistics with Randomization and Simulation from OpenIntro
Read this before answering practice problems using RStudio.
Prof. Taback’s class slides - Mon 10:00 (L0101):
Prof. Taback’s class slides - Mon 14:00 (L0201):
Zook M, Barocas S, boyd d, Crawford K, Keller E, et al. (2017) Ten simple rules for responsible big data research. PLOS Computational Biology 13(3): e1005399.
Amazon scraps secret AI recruiting tool that showed bias against women.
Practice problems for ethics in data science are available here.
Example solutions for these problems are available here.
See information on final project.