The goal of today’s homework is to practice visualizing and calculating probabilities using the tidyverse.
Clone the hw-04-smoking assignment repo into GitHub Desktop. Then open the .Rmd file in RStudio to get started!
In this lab we will work with the tidyverse and mosaicData packages. You may need to install the mosaicData package first, for example by running install.packages("mosaicData") in the R console.
library(tidyverse)
library(mosaicData)

Note that these packages are also loaded in your R Markdown document.
Today’s data comes from a study conducted in Whickham, England. In this study, the researchers recorded each participant’s age, smoking status at the start of the study, and health outcome 20 years later.
The data is in the mosaicData package. You can load it with

data(Whickham)

Take a peek at the codebook with
?Whickham

Please follow these coding practices in this homework and in all coding moving forward. Your code style will be assessed according to the following guideline:

Start a new line after each dplyr function (lines end in %>%) or ggplot layer (lines end in +).

Don’t forget to commit often!
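The line-break guideline above can be illustrated with a minimal sketch. The tibble `toy` is hypothetical toy data, not the Whickham data. The names `toy`, `toy_counts`, and `p` are illustrative only. The code loads dplyr and ggplot2, which are both part of the tidyverse.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  smoker  = c("Yes", "No", "Yes", "No"),
  outcome = c("Alive", "Dead", "Dead", "Alive")
)

# One dplyr function per line: every line except the last ends in %>%
toy_counts <- toy %>%
  count(smoker, outcome)

# One ggplot layer per line: every line except the last ends in +
p <- ggplot(toy, aes(x = smoker, fill = outcome)) +
  geom_bar() +
  labs(x = "Smoking status", y = "Count", fill = "Outcome")
```

Ending each line with the operator makes R read the next line as part of the same expression. The layout also keeps each step readable on its own line.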
How many observations are in this dataset? What does each observation represent?
How many variables are in this dataset? What type of variable is each? Display each variable using an appropriate visualization.
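One way to inspect a data frame's size and variable types, and to plot single variables, is sketched below. The sketch runs on a hypothetical `toy` tibble whose column names mirror the Whickham variables (`outcome`, `smoker`, `age`). Check `?Whickham` for the real types. In `toy` the categorical columns are character vectors. Two common choices are a bar chart for a categorical variable and a histogram for a numerical one. The `binwidth` value here is an arbitrary assumption.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  outcome = c("Alive", "Alive", "Dead", "Alive", "Dead", "Alive"),
  smoker  = c("Yes", "No", "No", "Yes", "Yes", "No"),
  age     = c(23, 35, 71, 48, 66, 29)
)

n_obs  <- nrow(toy)   # number of observations (rows)
n_vars <- ncol(toy)   # number of variables (columns)
glimpse(toy)          # prints each column's name, type, and first values

# Categorical variable: bar chart of counts
p_smoker <- ggplot(toy, aes(x = smoker)) +
  geom_bar()

# Numerical variable: histogram (binwidth chosen arbitrarily)
p_age <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 10)
```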
What would you expect the relationship between smoking status and health outcome to be?
Create a visualization depicting the relationship between smoking status and health outcome.
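One common way to show the relationship between two categorical variables is a segmented bar chart. The sketch below uses `geom_bar(position = "fill")` on a hypothetical `toy` tibble, not the Whickham data. With `position = "fill"`, each bar is rescaled to height 1, so each segment shows a proportion within that smoking status rather than a raw count.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  smoker  = c("Yes", "Yes", "Yes", "No", "No", "No"),
  outcome = c("Alive", "Alive", "Dead", "Alive", "Dead", "Dead")
)

# Each bar is scaled to 1, so segments show within-group proportions
p <- ggplot(toy, aes(x = smoker, fill = outcome)) +
  geom_bar(position = "fill") +
  labs(x = "Smoking status", y = "Proportion", fill = "Health outcome")
```

Proportions are easier to compare than counts when the groups have different sizes.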
Calculate the conditional probabilities of death for each smoking status, only reporting probabilities for the outcome of Dead. Briefly describe the relationship, and evaluate whether or not it is what you expected. Use the visualization from the previous exercise and the conditional probabilities to support your narrative.
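The conditional probability of death given a smoking status is the number of people with that status who died, divided by the number of people with that status. In symbols: P(Dead | smoker) = n(smoker and Dead) / n(smoker). Below is a minimal dplyr sketch of that calculation. It runs on a hypothetical `toy` tibble, not the Whickham data; `death_probs` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  smoker  = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  outcome = c("Dead", "Alive", "Alive", "Alive", "Dead", "Dead", "Alive", "Alive")
)

death_probs <- toy %>%
  count(smoker, outcome) %>%        # joint counts for each smoker/outcome pair
  group_by(smoker) %>%              # condition on smoking status
  mutate(prob = n / sum(n)) %>%     # divide by the total within each status
  ungroup() %>%
  filter(outcome == "Dead")         # keep only the probabilities of death

death_probs
```

For this toy data the result has one row per smoking status: `prob` is 0.5 for No and 0.25 for Yes. One caveat: if a group had no deaths, `count()` would produce no Dead row for it, so that group would be missing after `filter()` rather than shown with probability 0.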
Create a new variable called age_cat using the following scheme:

age <= 44 ~ "18-44"
age > 44 & age <= 64 ~ "45-64"
age > 64 ~ "65+"

Re-create the visualization from Exercise 4, this time faceting by age_cat.
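The age scheme above uses `case_when()` formula syntax, so one way to apply it is inside `mutate()`. Faceting can then be added with `facet_wrap()`. The sketch below assumes a hypothetical `toy` tibble, not the Whickham data. Its ages are chosen to sit on either side of each cut-off.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  smoker  = c("Yes", "No", "Yes", "No", "Yes", "No"),
  outcome = c("Alive", "Alive", "Alive", "Dead", "Dead", "Dead"),
  age     = c(30, 44, 45, 64, 65, 80)
)

# case_when() checks each condition in order and returns the matching label
toy <- toy %>%
  mutate(
    age_cat = case_when(
      age <= 44 ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64 ~ "65+"
    )
  )

# Same segmented bar chart, with one panel per age category
p <- ggplot(toy, aes(x = smoker, fill = outcome)) +
  geom_bar(position = "fill") +
  facet_wrap(~ age_cat)
```

With these ages, 44 falls in "18-44", 64 in "45-64", and 65 in "65+". This matches the boundaries in the scheme.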
Extend the table from Exercise 5 by breaking it down by age category.
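One way to break the conditional probabilities down by age is to group by both `age_cat` and `smoker`. The sketch below uses a hypothetical `toy` tibble, not the Whickham data. It computes the share of deaths directly with `mean(outcome == "Dead")`: a logical vector averages to the proportion of `TRUE` values. This is an alternative to the `count()` then `filter()` pattern. The design choice is deliberate: every age/smoking group keeps a row even when it has no deaths.

```r
library(dplyr)

# Hypothetical toy data for illustration only (not the Whickham data)
toy <- tibble(
  age_cat = c(rep("18-44", 4), rep("65+", 4)),
  smoker  = rep(c("Yes", "Yes", "No", "No"), 2),
  outcome = c("Alive", "Alive", "Alive", "Alive",
              "Dead",  "Alive", "Dead",  "Dead")
)

death_by_age <- toy %>%
  group_by(age_cat, smoker) %>%                      # condition on both variables
  summarise(p_dead = mean(outcome == "Dead"),        # proportion of Dead per group
            .groups = "drop")

death_by_age
```

For this toy data, `p_dead` is 0 for both smoking groups in "18-44", 1 for non-smokers in "65+", and 0.5 for smokers in "65+".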
Compare the visualization from Exercise 7 and the table from Exercise 8 to what you previously observed in Exercises 4 and 5. What changed, and what might explain the change? Use the table you calculated in Exercise 8 to support your narrative.
Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please upload your PDF document to Canvas.
This lab was adapted from Data Science in a Box.