This page contains course material such as class slides, practice problems, and tutorial assignments.

Week 0

September 7 Tutorial

There are no tutorials on September 7. Instead, we suggest that you spend some time getting acquainted with the basics of R, which we will use throughout the course.

The first classes are on September 10. Before you come to class, do the following:

  1. Read through the course syllabus.

  2. Read the R resources section of the course webpage.

  3. Sign up for the Piazza discussion forum.

  4. Get introduced to R:

     i. Sign up for RStudio Cloud.

     ii. Complete the tutorial on R programming basics. If you would like a more in-depth introductory R programming tutorial, complete DataCamp’s free online Introduction to R.

     iii. Read chapters 1, 2, and 3 of Hands-On Programming with R, by Garrett Grolemund.

You can do both (ii) and (iii), but they cover much of the same content. If you decide to complete only the readings, make sure to type the commands into the RStudio console as you read.

Week 1

September 10 Class

Topics

  • Course overview.
  • Introduction to data visualization using the ggplot2 package in R.
  • Histograms, bar graphs, scatter plots, faceting.
  • Distribution of quantitative and numerical variables.

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101): html, pdf.

Prof. Moon’s class slides - Mon 14:00 (L0201): pdf, pdf (annotated)

References

  1. Data Visualization Basics

  2. Grolemund, G. and Wickham, H. R for Data Science. Chapter 3.

September 14 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - Sample Solutions

Week 2

September 17 Class

Topics

  • Introduction to programming with R
    • RStudio user interface
    • R Objects
    • R Functions
    • R Scripts
    • R Packages
    • R Lists
    • R Notation
    • R Missing Data
  • Numerical descriptions of the distribution of a quantitative variable

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101): html, pdf.

Prof. Moon’s class slides - Mon 14:00 (L0201): pdf, pdf (annotated)

Flu data used in class (csv format): Provincial Flu data, Provincial Population Size.

September 21 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - Sample Solutions

Week 3

September 24 Class

Topics

  • Statistical data
  • Tidy data
  • Data wrangling
  • Boxplots

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

The slides on joining (merging) data frames and the Trump tweets example have been removed. You are not responsible for these topics or for understanding this example this week. Joining data frames will be covered on October 1.

Prof. Moon’s class slides - Mon 14:00 (L0201):

September 28 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

(NB: One question has been removed since the original posting on September 23.)

Practice problems - Sample Solutions

Week 4

October 1 Class

Topics

  • Introduction to statistical inference
  • for loops
  • Simulation
  • Inference for a single proportion
  • Hypothesis testing

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Sections 2.3.1, 2.3.2, 2.3.7 and 2.4 of Introductory Statistics with Randomization and Simulation from OpenIntro

October 5 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - Sample Solutions

Week 5

October 8 Class - No class due to Thanksgiving, but there is a tutorial on Friday, October 12

The tutorial will cover joining data frames. A short lesson on joining data frames is given here; we strongly recommend that you study it before attempting the tutorial questions.

Topics

  • Joining two data frames.

October 12 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

References

  1. Grolemund, G. and Wickham, H. R for Data Science. Sections 13.1-13.4.

  2. dplyr reference for Join two tbls together

Week 6

October 15 Class

Topics

  • Comparing two proportions
  • Comparing two means
  • Type I and Type II Errors
  • Interpretation of P-values

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Recommended reading: Sections 2.1, 2.2, and 2.3 (excluding 2.3.4) of Introductory Statistics with Randomization and Simulation from OpenIntro (a free, open-source textbook).

October 19 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

Week 7

October 22 Class

Topics

  • Review class

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

October 26 Tutorial

Midterm test during tutorial. See term test information.

Week 8

October 29 Class

Topics

  • Population parameters and statistics
  • Sampling distribution
  • Bootstrap sampling distribution
  • Confidence intervals

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 149-155. Available on Quercus for STA130-L0101 here, and for STA130-L0201 here. (You can safely ignore any mention of standard error or standard deviation.)

Computational and Inferential Thinking. Chapter 13. (NB: The code examples are in Python, and students are not responsible for understanding the Python code. The discussion of the bootstrap and confidence intervals is appropriate for this course.)

November 2 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

Fall Reading Week

  • No classes or tutorials

Week 9

November 12 Class

Topics

  • Supervised versus Unsupervised Learning
  • Classification Trees
  • Interpreting a Classification Tree
  • Geometric Interpretation of Classification Trees
  • Classification Tree Methodology
  • Training and Testing Classification Trees
  • Accuracy of Classification Trees
  • ROC Curves

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 173-180, 189-192. Available on Quercus for STA130-L0101 here.

November 16 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

Week 10

November 19 Class

Topics

  • Relationships between two variables
  • Linear Relationships: The equation of a straight line
  • Linear regression models
  • Estimating the coefficients: Least Squares
  • Interpreting the slope with a continuous explanatory variable
  • Prediction/Supervised learning using a linear regression model
  • R² - Coefficient of Determination
  • Introduction to Multiple Regression

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. Modern data science with R. CRC Press, 2017. Pages 189, 465-471. Available on Quercus for STA130-L0101 here.

November 23 Tutorial

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

Week 11

November 26 Class

Topics

  • Inference for regression parameters
  • Regression when the independent variable is a categorical variable
  • Is the regression line the same for two groups?
  • An example of a variable affecting a relationship in a non-regression setting
  • Confounding

A blog post from last term’s STA130 poster fair: the pictures in this post should give you an idea of what your poster should look like and how it will be displayed.

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

NYTimes article discussed at the beginning of class: How Cheap Labor Drives China’s A.I. Ambitions.

Prof. Moon’s class slides - Mon 14:00 (L0201):

References

Section 1.4.1 of Introductory Statistics with Randomization and Simulation from OpenIntro

November 30 Tutorial - The last tutorial 👍

Read this before answering practice problems using RStudio.

Practice problems

Practice problems - solutions

Week 12

December 3 Class

Topics

  • Finish confounding from last class
  • Identifying ethical considerations in research that involves collecting data:
    • History of Ethical Codes: the Nuremberg Code and the Declaration of Helsinki.
    • Tuskegee Syphilis Study
    • Informed consent
    • Ethical issues in Data Science Research
    • Ethical issues using Public Data
    • Bias and Inclusion in AI Systems

Slides

Prof. Taback’s class slides - Mon 10:00 (L0101):

Prof. Taback’s class slides - Mon 14:00 (L0201):

References

Practice problems

Practice problems for ethics in data science are available here.

Example solutions for these problems are available here.

STA130 Poster Fair 🎨 - Thursday, December 6 (the last STA130 class) 😄

See the information on the final project.