Clone a repo + start a new project

Go to the appex-10-[GITHUB USERNAME] repo, clone it, and start a new project in RStudio.

Example: Bootstrap Estimation with infer

NOTE: you may need to install the infer package. You may do so by typing install.packages("infer) in the Console.

library(tidyverse)
library(infer)

Data

The General Social Survey (GSS) gathers data on contemporary American society in order to understand and explain trends and constants in attitudes, behaviors, and attributes. Hundreds of trends have been tracked since 1972. In addition, since the GSS adopted questions from earlier surveys, trends can be followed for up to 70 years.

The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events.

We will analyze data from the 2018 GSS, using to understand how much time US adults spend on email.

gss_email <- read_csv("data/gss_email_2018.csv") %>%
  slice(150:300)

We’ll use the variable email: the number of minutes the respondents spend on email weekly. We will begin by live-coding together, and then you must complete the remainder of the Application Exercise on your own!

LIVE CODE TOGETHER

We want to calculate a 95% confidence interval for the mean amount of time Americans spend on email weekly.

Set seed

Set a seed to control R’s random sampling process.

Bootstrap distribution

Calculate 95% confidence interval

Interpretation

YOUR TURN

We will use the infer package to calculate a 95% confidence interval for the median amount of time Americans spend on email weekly.

We start by setting a seed. We’ll use 1234 to set our seed today but you can use any value you want on assignments.

set.seed(1234)

Creating the data frame

Un-comment the lines and fill in the blanks to create a data frame containing the bootstrapped data - sample medians in our case.

boot_dist <- gss_email #%>% 

Glimpse the data frame of the bootstrapped data.

### Glimpse the bootstrapped data frame

Find the confidence interval

Fill in the blanks to construct the 95% bootstrap confidence interval for the median amount of time Americans spend on email weekly.

Interpret the interval

Write the interpretation for the interval calculated above.