HW 05 - Bootstrap estimation

due Tuesday, November 1 at 11:59pm

Introduction and Data

The goal of this week’s homework is to practice creating bootstrap confidence intervals and visualizing bootstrap distributions.

Tips

Tip 1: Set your seed

When dealing with randomness (as is often the case in statistical simulation), it is important to specify which pseudo-random draw you used in your analysis, so that you or someone else can reproduce the exact numbers you initially report. The set.seed() function in R ensures that all of your analysis relies on a specific pseudo-random draw:

set.seed(42)
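
A quick way to see the effect is sketched below with base R's sample(). The object names first_draw, second_draw, and third_draw are only illustrative. Resetting the same seed before a draw reproduces the same values:

```r
# Draw five values, reset the seed, and draw again.
set.seed(42)
first_draw <- sample(1:100, size = 5)

set.seed(42)
second_draw <- sample(1:100, size = 5)

# Same seed, same sequence of pseudo-random numbers.
identical(first_draw, second_draw)

# Without resetting the seed, the next draw continues the sequence
# and will generally differ from the first.
third_draw <- sample(1:100, size = 5)
```

In a knitted document it is usually enough to set the seed once, before the first chunk that uses randomness.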

Tip 2: Define global parameters

Often, we rely on specific parameter values throughout our analysis, and at a later point we may want to replace them. To minimize the need to change your code later, assign each parameter value to a name and use the name (rather than the hard-coded value) downstream. Then, to update your code, you only need to change the value in one place. Here, we assign the number of reps to a variable called num_reps:

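library(tidyverse)  # provides the diamonds data and %>%
library(infer)      # provides specify(), generate(), calculate()

num_reps <- 100  # number of bootstrap resamples, defined once
boot_dist <- diamonds %>%
  specify(response = price) %>%
  generate(reps = num_reps, type = "bootstrap") %>%
  calculate(stat = "mean")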
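
Continuing this example, the sketch below shows one way the same idea extends to a confidence level, and how the bootstrap distribution might then be summarized and plotted with infer. The names ci_level, percentile_ci, and boot_plot are illustrative, and loading tidyverse for diamonds, %>%, and labs() is an assumption about your setup.

```r
library(tidyverse)  # assumed: supplies diamonds, %>%, and labs()
library(infer)

set.seed(42)

# Global parameters: change them here and nowhere else.
num_reps <- 100
ci_level <- 0.95

boot_dist <- diamonds %>%
  specify(response = price) %>%
  generate(reps = num_reps, type = "bootstrap") %>%
  calculate(stat = "mean")

# Percentile interval: the middle ci_level share of the bootstrap means.
percentile_ci <- boot_dist %>%
  get_confidence_interval(level = ci_level, type = "percentile")

# Histogram of the bootstrap means with the interval shaded.
boot_plot <- visualize(boot_dist) +
  shade_confidence_interval(endpoints = percentile_ci) +
  labs(
    title = "Bootstrap distribution of mean diamond price",
    x = "Bootstrap sample mean price (USD)",
    y = "Count"
  )
boot_plot
```

With only 100 resamples the interval endpoints are rough. Raising num_reps in one place updates every step that depends on it.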

The data

The data may be found by cloning your hw-05-boston- repository available in the GitHub course organization.

Today’s data comes from the city of Boston, courtesy of the U.S. Census Bureau. In particular, the Boston dataset contains data on the median value of owner-occupied housing units in 506 suburbs of Boston. An “owner-occupied housing unit” is defined as a one-family house on less than 10 acres without a business or medical office on the property. The variables and their definitions are as follows:

You may load the data with the following code, where ___ should be replaced by a meaningful name of your choosing. Don’t forget to set eval = TRUE before knitting:

___ <- read.csv("data/Boston.csv")

Exercises

Write all R code according to the style guidelines discussed in class. Make sure that your plots have appropriate labels and titles.

Hint: Don’t forget to set a seed in order to ensure reproducibility!

  1. Based on this data set, provide a point estimate for the population mean of the median value of owner-occupied homes in Boston (medv).

  2. Construct a 95% bootstrap interval for the mean of the median value of owner-occupied homes in Boston. Use at least 1,000 bootstrap samples. Make sure your interval is reproducible.

  3. Visualize the bootstrap distribution and your confidence interval from Exercise 2. Interpret the confidence interval you constructed.

  4. Considering towns with a pupil-teacher ratio of at most 15, provide a point estimate of the proportion of owner-occupied houses in Boston with median values over $40,000.

  5. For towns with a pupil-teacher ratio of at most 15, construct a 99% bootstrap interval for the proportion of owner-occupied houses in Boston with median values over $40,000. Make sure your interval is reproducible.

  6. Visualize the bootstrap distribution and your confidence interval from Exercise 5. Interpret the confidence interval you constructed.

  7. Provide a point estimate of the correlation between the median value of owner-occupied houses in Boston (medv) and the pupil-teacher ratio in the town.

Hint: To simulate the correlation between two variables var1 and var2, use specify(var1 ~ var2). Remember that correlation is still a numerical quantity, so that should help you choose the type of simulation you want to perform.

  8. Construct a 95% bootstrap interval for the correlation between the median value of owner-occupied houses in Boston (medv) and the pupil-teacher ratio in the town. Make sure your interval is reproducible.

Note: You do not need to visualize your confidence interval.

Hint: To simulate the correlation between two variables var1 and var2, use specify(var1 ~ var2). Remember that correlation is still a numerical quantity, so that should help you choose the type of simulation you want to perform.

  9. Construct and report your 90% and 99% bootstrap intervals using the bootstrap distribution from Exercise 8. Then answer the following questions:
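
As an illustration of the specify(var1 ~ var2) hint, here is a minimal sketch on R's built-in mtcars data rather than the Boston data. The variables mpg and wt and all object names are stand-ins chosen for the example, and loading tidyverse is an assumption about your setup. The sketch computes the observed correlation, builds one bootstrap distribution, and then reads several percentile intervals off that same distribution.

```r
library(tidyverse)  # assumed setup; supplies the %>% pipe
library(infer)

set.seed(42)
num_reps <- 1000

# Observed correlation (the point estimate): no generate() step needed.
obs_cor <- mtcars %>%
  specify(mpg ~ wt) %>%
  calculate(stat = "correlation")

# Bootstrap distribution: resample rows with replacement and
# recompute the correlation for each resample.
boot_cor <- mtcars %>%
  specify(mpg ~ wt) %>%
  generate(reps = num_reps, type = "bootstrap") %>%
  calculate(stat = "correlation")

# Several confidence levels read off the same bootstrap distribution.
ci_90 <- get_confidence_interval(boot_cor, level = 0.90, type = "percentile")
ci_95 <- get_confidence_interval(boot_cor, level = 0.95, type = "percentile")
ci_99 <- get_confidence_interval(boot_cor, level = 0.99, type = "percentile")
```

Because all three intervals come from one set of resamples, a higher confidence level can only widen the interval, never narrow it.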

Submission

Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo. Please only upload your PDF document to Canvas.