The goal of this application is to continue practicing two-sample statistical inference using simulation-based approaches. Use the lecture notes, readings, and application exercises to help you complete the homework.
Today’s dataset has been adapted from a large dataset of wine reviews from Kaggle. The reviews were scraped from WineEnthusiast and contain information about both the wine itself and the reviewer’s description of that wine. Here, I provide a subset of the data. You may treat these observations as an independent, representative sample of all wines.
The variables are as follows:
country: the country that the wine is from (France, Italy, US)
points: the number of points WineEnthusiast rated the wine on a scale of 1-100
price: cost for a bottle (USD)
variety: type of grapes used to make the wine
type: red or white wine
title: title of the wine review
description: reviewer’s description of the wine

library(tidyverse)
library(infer)
_______ <- read_csv("data/wines.csv")
Suppose you are interested in how the type of wine (red or white) relates to its price. Comprehensively evaluate the hypothesis that the average price of red wines is different from the average price of white wines from these three countries. Use \(\alpha = 0.05\) as the significance level.
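One way to write the hypotheses, using \(\mu_{\text{red}}\) and \(\mu_{\text{white}}\) (notation introduced here for illustration) for the population mean prices of red and white wines from France, Italy, and the US, is:

\[
H_0: \mu_{\text{red}} - \mu_{\text{white}} = 0 \qquad \text{versus} \qquad H_A: \mu_{\text{red}} - \mu_{\text{white}} \neq 0
\]

The corresponding sample statistic is the observed difference in sample means, \(\bar{x}_{\text{red}} - \bar{x}_{\text{white}}\). Because \(H_A\) uses \(\neq\), the test is two-sided, and \(H_0\) is rejected when the p-value is smaller than \(\alpha = 0.05\).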
Because we will be simulating randomly, we should first set a seed. Set the seed equal to the exercise number.
# set seed here
# obtain null distribution
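A minimal sketch of a permutation null distribution with infer, assuming the response is `price` and the explanatory variable is `type` with levels "red" and "white". Since the data frame's name is left blank above, the sketch uses a hypothetical simulated data frame, `toy_wines`, whose prices are made up purely for illustration (it is not the WineEnthusiast data), and the seed value 1 is a placeholder for the exercise number. Under the null hypothesis of independence, shuffling the `type` labels breaks any link between type and price, so each shuffle gives one difference in means that could arise by chance alone.

```r
library(tidyverse)
library(infer)

# Placeholder seed: replace 1 with the exercise number
set.seed(1)

# Hypothetical toy data (NOT the WineEnthusiast data): 20 simulated prices per type
toy_wines <- tibble(
  type  = rep(c("red", "white"), each = 20),
  price = round(c(rlnorm(20, meanlog = log(40), sdlog = 0.5),
                  rlnorm(20, meanlog = log(30), sdlog = 0.5)), 2)
)

# Null distribution: permute the `type` labels 1000 times and recompute
# the difference in mean price (red minus white) for each permutation
null_dist <- toy_wines |>
  specify(price ~ type) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("red", "white"))

# Observed statistic from the (toy) sample, using the same order
obs_stat <- toy_wines |>
  specify(price ~ type) |>
  calculate(stat = "diff in means", order = c("red", "white"))
```

With the real data, `toy_wines` would be replaced by the data frame read from data/wines.csv; keeping `order = c("red", "white")` identical in both pipelines ensures the observed statistic and the null distribution measure the same difference.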
Display a visualization of your simulated null distribution, and describe the values of the sample statistic that would cause you to reject your null hypothesis. Does the observed sample statistic lie in this rejection region?
What is your p-value, decision, and conclusion in the context of the research question?
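A sketch of the visualization, rejection region, and p-value steps, again on the hypothetical simulated `toy_wines` data (made-up prices, not the real dataset) with a placeholder seed. One reasonable way to describe a two-sided rejection region at \(\alpha = 0.05\) is by the 2.5th and 97.5th percentiles of the simulated null distribution: observed differences below the lower cutoff or above the upper cutoff would lead to rejecting \(H_0\). The p-value is the proportion of permuted statistics at least as extreme as the observed one, counting both tails.

```r
library(tidyverse)
library(infer)

set.seed(1)  # placeholder: replace 1 with the exercise number

# Hypothetical toy data (NOT the WineEnthusiast data)
toy_wines <- tibble(
  type  = rep(c("red", "white"), each = 20),
  price = round(c(rlnorm(20, meanlog = log(40), sdlog = 0.5),
                  rlnorm(20, meanlog = log(30), sdlog = 0.5)), 2)
)

# Permutation null distribution and observed difference (red minus white)
null_dist <- toy_wines |>
  specify(price ~ type) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("red", "white"))
obs_stat <- toy_wines |>
  specify(price ~ type) |>
  calculate(stat = "diff in means", order = c("red", "white"))

# Two-sided rejection region at alpha = 0.05: the outer 2.5% in each tail
cutoffs <- quantile(null_dist$stat, probs = c(0.025, 0.975))
in_rejection_region <- obs_stat$stat < cutoffs[[1]] || obs_stat$stat > cutoffs[[2]]

# Null distribution with the observed statistic marked and both tails shaded
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_stat, direction = "two-sided")

# Two-sided p-value: share of permuted statistics at least as extreme as obs_stat
p_val <- get_p_value(null_dist, obs_stat = obs_stat, direction = "two-sided")
```

If the p-value is below 0.05 (equivalently, the observed statistic falls outside the cutoffs), the decision is to reject \(H_0\); the conclusion should then be stated in terms of red versus white wine prices. infer may warn when the p-value is 0, which only means no permuted statistic was as extreme as the observed one out of the replicates drawn.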
Construct and interpret a 95% two-sided confidence interval for the difference you investigated in Exercise 2.
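A sketch of a bootstrap percentile interval with infer, using the same hypothetical simulated `toy_wines` data (made-up prices, not the real dataset) and a placeholder seed. Unlike the test, there is no `hypothesize()` step: each bootstrap sample resamples the rows with replacement, and the middle 95% of the resulting differences in means gives the interval for \(\mu_{\text{red}} - \mu_{\text{white}}\).

```r
library(tidyverse)
library(infer)

set.seed(1)  # placeholder: replace 1 with the exercise number

# Hypothetical toy data (NOT the WineEnthusiast data)
toy_wines <- tibble(
  type  = rep(c("red", "white"), each = 20),
  price = round(c(rlnorm(20, meanlog = log(40), sdlog = 0.5),
                  rlnorm(20, meanlog = log(30), sdlog = 0.5)), 2)
)

# Bootstrap distribution: resample rows with replacement 1000 times and
# recompute the difference in mean price (red minus white)
boot_dist <- toy_wines |>
  specify(price ~ type) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("red", "white"))

# 95% percentile interval: 2.5th and 97.5th percentiles of boot_dist
ci <- get_confidence_interval(boot_dist, level = 0.95, type = "percentile")
```

Using the same `order` as in the hypothesis test keeps the interval on the same scale as the tested difference; an interval that excludes 0 should generally agree with rejecting \(H_0\) at \(\alpha = 0.05\), although simulation noise can make borderline cases disagree.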