Instructions

What should I bring to tutorial on September 14?

  • R output (e.g., plots and explanations) for Question 2 only. You can bring either a hard copy or your laptop with the output.

Practice Problems

Question 1

The Marriage data are available through the mosaic package (they ship with its companion package mosaicData), which you must first load with the command library(mosaic). You can read more about the data and the variables here: https://rdrr.io/cran/mosaicData/man/Marriage.html. You can also use the help command ?Marriage for a description of the data.
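
Before choosing variables, it can help to look at what the data frame contains. The following is a minimal sketch, assuming the mosaic and tidyverse packages are installed; splitting columns with is.numeric() is only a rough guide, since a numeric column is not necessarily a meaningful quantitative variable.

```r
library(mosaic)     # also attaches mosaicData, which contains Marriage
library(tidyverse)

glimpse(Marriage)   # one line per variable: name, type, first few values

# Split the column names by storage type to help pick variables for each part
is_quant <- sapply(Marriage, is.numeric)
names(Marriage)[is_quant]    # candidate quantitative variables
names(Marriage)[!is_quant]   # candidate categorical (or date/text) variables
```

Check each candidate against ?Marriage before plotting it.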

  1. Choose two categorical variables and plot their distributions. Interpret the plots.
# Construct your plots in this code chunk
library(mosaic)
library(tidyverse)

Answer the question in this space.

  2. Choose a quantitative variable and plot its distribution. Interpret the plot.
# Construct your plots in this code chunk
library(mosaic)
library(tidyverse)

Answer the question in this space.

  3. Construct a plot that shows the relationship between two variables. What can you say about the relationship?
# Construct your plots in this code chunk
library(mosaic)
library(tidyverse)

Answer the question in this space.

Question 2

The Gestation data set is also part of the mosaic package. Use the help command ?Gestation for a description of the data.

  1. Create three histograms of the length of gestation using the number of bins defined as 2, 25, and 50. What is the relationship between the number of bins and the width of the bins? Which number of bins do you think is most appropriate to display this distribution? What is the shape of the distribution? Explain.
# Construct your plots in this code chunk
library(tidyverse)
library(mosaic)

Answer the question in this space.
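
As a starting point for the bin comparison, here is a hedged sketch, assuming the length of gestation is stored in the column gestation (see ?Gestation). The width shown is the simple ratio range / number of bins; the exact width that ggplot2 chooses may differ a little from this ratio, but the direction of the relationship is the same.

```r
library(tidyverse)
library(mosaic)     # provides the Gestation data

# Range of gestation lengths, ignoring missing values
gest_range <- diff(range(Gestation$gestation, na.rm = TRUE))

# Approximate bin width for each requested number of bins
n_bins <- c(2, 25, 50)
approx_width <- gest_range / n_bins
data.frame(n_bins, approx_width)

# One histogram per bin count, stored in a list
hists <- lapply(n_bins, function(b) {
  ggplot(Gestation, aes(x = gestation)) +
    geom_histogram(bins = b, colour = "white") +
    labs(title = paste(b, "bins"), x = "Gestation length", y = "Count")
})
hists[[2]]  # prints the 25-bin version; use index 1 or 3 for the others
```

Printing all three histograms side by side makes it easier to judge which one shows the shape of the distribution without hiding or exaggerating detail.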

  2. Do high school graduates and college graduates have different gestation distributions? Construct a data visualization to investigate the answer to this question. Explain why you chose this visualization.
# Construct your plots in this code chunk
library(tidyverse)
library(mosaic)

Answer the question in this space.
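
One possible approach, sketched under the assumption that ed records the mother's education and gestation the length of gestation (confirm both with ?Gestation), is a set of side-by-side boxplots. The education labels are deliberately not hard-coded here because they may differ between package versions; inspect them with levels(factor(Gestation$ed)) and then filter to the two groups of interest.

```r
library(tidyverse)
library(mosaic)

# factor() lets the code run whether ed is stored as labels or numeric codes
ed_boxplot <- ggplot(Gestation, aes(x = factor(ed), y = gestation)) +
  geom_boxplot() +
  coord_flip() +   # horizontal boxes leave room for long education labels
  labs(x = "Mother's education (ed)", y = "Gestation length")
ed_boxplot
```

Boxplots are only one option; overlaid density curves or faceted histograms would also allow the two distributions to be compared.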

  3. Create a visualization to explore the relationship between a baby's birth weight and gestation length. Explain why you chose this visualization. What can you learn from your visualization?
# Construct your plots in this code chunk
library(tidyverse)
library(mosaic)

Answer the question in this space.

  4. Modify the visualization that you created in the previous part to evaluate whether the relationship between a baby's weight and gestation time is the same for mothers who never smoked as for mothers who smoke now. Explain how you modified your visualization.
# Construct your plots in this code chunk
library(tidyverse)
library(mosaic)

Answer the question in this space.
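
A hedged sketch of one way to bring smoking status into a scatterplot, assuming birth weight is stored in wt, gestation length in gestation, and smoking status in smoke (check ?Gestation). Faceting draws one panel per smoking category, so no category labels need to be typed; to keep only two groups, inspect the labels with levels(factor(Gestation$smoke)) and filter first.

```r
library(tidyverse)
library(mosaic)

smoke_scatter <- ggplot(Gestation, aes(x = gestation, y = wt)) +
  geom_point(alpha = 0.4) +                 # transparency reduces overplotting
  geom_smooth(method = "lm", se = FALSE) +  # straight-line summary per panel
  facet_wrap(~ smoke) +                     # one panel per smoking category
  labs(x = "Gestation length", y = "Birth weight (wt)")
smoke_scatter
```

Mapping smoke to colour instead of using facets is an alternative that puts all groups on a single set of axes.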

Question 3

For this exercise, you will load data from an external source. You can read about the data here: http://sta220.utstat.utoronto.ca/data/the-skeleton-data/.

The data are in a plain text file with spaces between columns here: http://stats.onlinelearning.utoronto.ca/wp-content/uploaded/Data/SkeletonDatacomplete.txt. The following code will load the data into a tibble (the tidyverse version of a data frame).

  1. Read the data into R using the following code.
library(tidyverse)
data_url <- "http://stats.onlinelearning.utoronto.ca/wp-content/uploaded/Data/SkeletonDatacomplete.txt"
skeleton_data <- read_table(data_url)

Inspect the data to make sure it is read in completely. You can compare by going directly to the data_url.
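
One way to make this comparison less manual is to count rows. The sketch below defines check_complete_read(), a hypothetical helper written for this illustration (it is not part of readr), and assumes the first non-blank line of the file is a header. It is demonstrated on a small toy file with made-up values so that it runs without a network connection.

```r
library(tidyverse)

# Hypothetical helper: compare the number of data rows parsed by read_table()
# with the number of non-blank lines in the raw file.
check_complete_read <- function(path) {
  raw_lines <- readLines(path, warn = FALSE)
  raw_lines <- raw_lines[nzchar(trimws(raw_lines))]  # ignore blank lines
  parsed <- read_table(path, col_types = cols())      # cols() = guess types quietly
  c(raw_rows = length(raw_lines) - 1L,                # minus the header line
    parsed_rows = nrow(parsed))
}

# Toy file with made-up values, only to show the check running
toy_path <- tempfile(fileext = ".txt")
writeLines(c("id height", "1 170", "2 165", "3 180"), toy_path)
toy_check <- check_complete_read(toy_path)
toy_check
```

Calling check_complete_read(data_url) applies the same check to the skeleton data; if the two counts differ, look at the raw file for lines with missing or extra columns.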

  2. Construct at least four interesting graphs with the data, including: a graph of one categorical variable, a graph of one quantitative variable, a graph with at least two variables, and a graph with at least three variables.

Example graph of one categorical variable:

# Construct your plots in this code chunk
library(tidyverse)

Example graphs of one quantitative variable:

# Construct your plots in this code chunk
library(tidyverse)

Example graphs with two variables:

# Construct your plots in this code chunk
library(tidyverse)

Example graphs with three variables:

# Construct your plots in this code chunk
library(tidyverse)

  3. Describe what you learned about the data from your graphs.

Answer the question in this space.