Bring your answer to question 2 parts b and c.
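```r
# Download the student version of this week's practice problems
# into the current working directory
file_url <- "https://raw.githubusercontent.com/ntaback/UofT_STA130/master/week10/Week10PracticeProblems-student.Rmd"
download.file(url = file_url, destfile = "Week10PracticeProblems-student.Rmd")
```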
Look for the file “Week10PracticeProblems-student.Rmd” under the Files tab, then click on it to open it.
Change the subtitle to “Week 10 Practice Problems Solutions” and change the author to your name and student number.
Type your answers below each question. Remember that R code chunks can be inserted directly into the notebook by choosing Insert R from the Insert menu (see Using R Markdown for Class Assignments). In addition, this R Markdown cheatsheet and reference are great resources as you get started with R Markdown.
[Adapted from Exercise 7.7 in the textbook]
From the text: “The Whickham data set … includes data on age, smoking, and mortality from a one-in-six survey of the electoral roll in Whickham … in the United Kingdom. The survey was conducted in 1972-1974 to study heart disease and thyroid disease. A follow-up on those in the survey was conducted twenty years later. Load the `mosaicData` package and look at the help file for `Whickham` to see the definition of the variables.” Note that the data frame includes the data for women only, and we will consider as our population the women in Whickham during the period of the study. The data collected in 1972-74 are referred to as the “baseline” values.
`age` into a new categorical variable with 3 age categories: women between the ages of 18 and 44, women who are older than 44 and younger than 65, and women who are 65 and over. Note that this is the age at the time of the first survey. Examine the percentage of smokers and non-smokers who died at follow-up in each age category.

Bring your output for parts b and c of this question to tutorial on Friday, March 23 (either a hardcopy or on your laptop).
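One way to build the age categories and the death percentages is sketched below in base R. It assumes the `Whickham` data frame has the columns `age`, `smoker` (levels `No`/`Yes`) and `outcome` (levels `Alive`/`Dead`); check these names against the help file. The helpers `add_age_group()` and `pct_dead_by_group()` and the six-row `toy` data frame are hypothetical, made up only so the sketch runs without `mosaicData`. Setting `right = FALSE` in `cut()` makes each interval include its lower endpoint, so the groups are 18 to 44, 45 to 64, and 65 and over.

```r
# Hypothetical helper: add a 3-level age category to a Whickham-like data frame.
# With right = FALSE the intervals are [18, 45), [45, 65) and [65, Inf).
add_age_group <- function(df) {
  df$age_group <- cut(df$age,
                      breaks = c(18, 45, 65, Inf),
                      labels = c("18-44", "45-64", "65+"),
                      right = FALSE)
  df
}

# Hypothetical helper: percentage of women who died at follow-up,
# cross-classified by age group (rows) and smoking status (columns).
pct_dead_by_group <- function(df) {
  100 * tapply(df$outcome == "Dead", list(df$age_group, df$smoker), mean)
}

# Made-up toy data for illustration only (NOT the Whickham data).
toy <- data.frame(
  outcome = factor(c("Alive", "Dead", "Alive", "Alive", "Dead", "Dead"),
                   levels = c("Alive", "Dead")),
  smoker  = factor(c("Yes", "Yes", "No", "No", "Yes", "No"),
                   levels = c("No", "Yes")),
  age     = c(25, 40, 50, 60, 70, 80)
)

toy <- add_age_group(toy)
pct <- pct_dead_by_group(toy)
print(pct)  # age-by-smoker cells with no observations show up as NA

# With the real data (requires the mosaicData package):
# library(mosaicData)
# pct_dead_by_group(add_age_group(Whickham))
```

Comparing the two columns within each row is what the question asks for: the smoker/non-smoker difference in mortality holding the age category fixed.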
In this question we will again consider the Mario Kart eBay data from lecture.
`totalPr`) for sellers who do and do not use stock photos.

Sellers are rated by buyers on eBay; the rating is captured in the variable `sellerRate`. To simplify our analysis, we will categorize sellers by whether their rating is low, medium, or high. Create a new variable called `seller_rating` that is “low” if `sellerRate` is less than or equal to 100, “medium” if it is greater than 100 but less than or equal to 5000, and “high” if it is greater than 5000. Carry out a regression analysis to predict `totalPr` using the new variable `seller_rating`.
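A minimal base-R sketch of the recoding and the regression follows. It assumes the Mario Kart data frame from lecture has numeric columns `sellerRate` and `totalPr`; the helper `add_seller_rating()` and the `toy` values are hypothetical, chosen only so the code runs on its own. With its default `right = TRUE`, `cut()` produces the intervals (-Inf, 100], (100, 5000] and (5000, Inf], which match the three rules above. The order of `labels` also fixes the order of the factor levels, and `lm()` uses the first level as the baseline. If you instead build `seller_rating` as a character vector (for example with `dplyr::case_when()`), R sorts the levels alphabetically, so the baseline becomes whichever label comes first in alphabetical order.

```r
# Hypothetical helper: categorize sellers by their eBay rating.
# cut() uses right-closed intervals by default: (-Inf, 100], (100, 5000], (5000, Inf].
add_seller_rating <- function(df) {
  df$seller_rating <- cut(df$sellerRate,
                          breaks = c(-Inf, 100, 5000, Inf),
                          labels = c("low", "medium", "high"))
  df
}

# Made-up toy data for illustration only (NOT the Mario Kart data).
toy <- data.frame(
  sellerRate = c(10, 80, 500, 3000, 6000, 20000),
  totalPr    = c(40, 44, 50, 52, 60, 62)
)
toy <- add_seller_rating(toy)

# Regression of total price on the 3-level categorical predictor.
fit <- lm(totalPr ~ seller_rating, data = toy)
coef(fit)  # intercept = baseline ("low") mean; other terms = differences from it

# Fitted mean totalPr for each rating category.
new_levels <- data.frame(
  seller_rating = factor(c("low", "medium", "high"),
                         levels = levels(toy$seller_rating))
)
group_means <- predict(fit, newdata = new_levels)
print(group_means)
```

With a single categorical predictor, the fitted value for each category is just that category's sample mean, so the estimates for the medium and high groups are the intercept plus the corresponding coefficient. On the real data, `summary(fit)` reports the same coefficients together with their standard errors.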
R treating as the baseline category?

What is the estimate from the fitted regression line for the mean `totalPr` for sellers with low ratings? What is the estimate from the fitted regression line for the mean `totalPr` for sellers with medium ratings? What is the estimate from the fitted regression line for the mean `totalPr` for sellers with high ratings?

Now fit an appropriate regression line to examine whether `seller_rating` has an effect on the relationship between `totalPr` and `duration`.
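One common way to let the relationship between `totalPr` and `duration` differ across rating groups is to include an interaction term, `totalPr ~ duration * seller_rating`, which fits a separate intercept and slope for each category. The sketch below assumes a numeric `duration` column and a `seller_rating` factor whose levels are ordered low, medium, high (as produced by the recoding step of this question); the toy values and the helper `slope_for()` are hypothetical. Under the default treatment coding, each group's slope is the `duration` coefficient plus that group's interaction coefficient, and the baseline group has no interaction term of its own.

```r
# Made-up toy data for illustration only (NOT the Mario Kart data):
# within each rating group, totalPr lies exactly on a straight line in duration.
toy <- data.frame(
  duration      = rep(c(1, 3, 5), times = 3),
  seller_rating = factor(rep(c("low", "medium", "high"), each = 3),
                         levels = c("low", "medium", "high")),
  totalPr       = c(40, 44, 48,   # low:    intercept 38, slope  2
                    50, 50, 50,   # medium: intercept 50, slope  0
                    62, 56, 50)   # high:   intercept 65, slope -3
)

# Interaction model: separate intercept and slope for each seller_rating level.
fit_int <- lm(totalPr ~ duration * seller_rating, data = toy)
b <- coef(fit_int)

# Hypothetical helper: slope of totalPr on duration within one rating group.
slope_for <- function(b, group, baseline = "low") {
  if (group == baseline) {
    return(b[["duration"]])
  }
  b[["duration"]] + b[[paste0("duration:seller_rating", group)]]
}

sapply(c("low", "medium", "high"), slope_for, b = b)
```

The toy data fit perfectly only because they were constructed that way; on the real data, `summary(fit_int)` shows whether the interaction coefficients are distinguishable from zero, which is the evidence for `seller_rating` changing the `totalPr` and `duration` relationship.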
`totalPr` and `duration`?

`totalPr` and `duration`?

`totalPr` and `duration`?

[Adapted from Introductory Statistics with Randomization and Simulation from OpenIntro]
For each of the following situations, state a possible confounding variable: