The goal of this homework is to continue practicing two-sample statistical inference using simulation-based approaches. Use the lecture notes, readings, and application exercises to help you complete the homework.
Today’s dataset has been adapted from a large data set of wine reviews from Kaggle. The reviews were scraped from WineEnthusiast and contain information about both the wine itself and the reviewer’s description of that wine. Here, I provide a subset of the data. You may consider these observations to be an independent, representative sample of all wines.
The variables are as follows:
country: the country that the wine is from (France, Italy, US)
points: the number of points WineEnthusiast rated the wine on a scale of 1-100
price: cost for a bottle (USD)
variety: type of grapes used to make the wine
type: red or white wine
title: title of the wine review
description: reviewer’s description of the wine
Write all R code according to the style guidelines discussed in class.
The following resource provides the code needed to make useful symbols. You may use the code to typeset the characters of interest in the narrative of your document:
$\mu$
$\alpha$
$>$
$<$
$\neq$
$H_{0}$
$H_{a}$
$p_{group}$
Overall hint: When performing a hypothesis test, you must provide the significance level of your test, the null and alternative hypotheses, the p-value, your decision, and an interpretation of the p-value in context of the original research question.
At the start of each exercise that requires simulation, set a random seed equal to the exercise number in the R chunk.
Make sure that your code is reproducible.
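As a small illustration of why the seed matters, the sketch below sets the same seed twice and checks that the random draws match. The seed value 1 stands in for a hypothetical Exercise 1, and the object names are made up for this example.

```r
# Set the seed to the exercise number (here, a hypothetical Exercise 1)
set.seed(1)
first_draw <- sample(1:100, size = 5)

# Resetting the same seed reproduces exactly the same random draws
set.seed(1)
second_draw <- sample(1:100, size = 5)

# TRUE when the two sets of draws match
identical(first_draw, second_draw)
```

Setting the seed once at the top of each simulation chunk is enough for the knitted document to give the same results every time.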
The data are not pre-provided for you this week! Instead, you will add them to your GitHub hw-07-wine project yourself!
Download wines.csv. Locate this file on your local computer; it is usually found in your Downloads folder.
In the Math 118 folder you created for this course, find the hw-07-wine project folder corresponding to this homework. Now create a new data/ folder within this folder.
Within your new data/ folder, upload the
wines.csv file.
You may now load in the data with the following code as per usual,
where ____ should be replaced by a meaningful name of your
choosing. Then remove eval = FALSE before you knit!
library(tidyverse)
library(infer)
____ <- read.csv("data/wines.csv")
Test whether the country where the wine is produced is independent of its type (red or white). Use a significance level of \(\alpha = 0.05\). Visualize your null distribution.
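One possible shape for a simulation-based test of independence with the infer package is sketched below. It runs on a small made-up data frame (toy_wines, with invented counts) rather than the real wines data, and the choice of a chi-square statistic, 1000 permutations, and the object names are all assumptions for illustration, not the required solution.

```r
library(tidyverse)
library(infer)

set.seed(1)

# Hypothetical toy data standing in for the real wines data (counts are made up)
toy_wines <- tibble(
  country = factor(rep(c("France", "Italy", "US"), each = 20)),
  type = factor(c(
    rep("red", 15), rep("white", 5),   # France
    rep("red", 10), rep("white", 10),  # Italy
    rep("red", 8), rep("white", 12)    # US
  ))
)

# Observed chi-square statistic for the association between type and country
obs_stat <- toy_wines %>%
  specify(type ~ country) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "Chisq")

# Null distribution: shuffle type across wines to break any association
null_dist <- toy_wines %>%
  specify(type ~ country) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")

# Plot the null distribution and shade the region at least as extreme as observed
visualize(null_dist) +
  shade_p_value(obs_stat = obs_stat, direction = "greater")

# Proportion of permuted statistics at least as large as the observed one
p_value <- null_dist %>%
  get_p_value(obs_stat = obs_stat, direction = "greater")
```

Large chi-square values are evidence against independence, which is why the p-value in this sketch uses direction = "greater".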
Let’s say I’m going to a nice dinner party, and I’m told to bring a bottle of wine. I want to bring an acceptable bottle, but I don’t like these friends enough to spend more than $20 on a bottle of wine. So I have to be strategic. I’ve been told that a cheap white is usually better than a cheap red, but that an expensive red is always better than an expensive white. Maybe I should buy a wine based on its points-to-price ratio.
Create a new variable called ratio, calculated as a wine’s points divided by its price.
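A minimal sketch of computing the points-to-price ratio with dplyr's mutate() is shown below. The toy_wines rows are invented for illustration; in the homework the same step would be applied to the data frame you loaded from data/wines.csv.

```r
library(tidyverse)

# Hypothetical toy rows; the real data come from data/wines.csv
toy_wines <- tibble(
  points = c(90, 88, 85),
  price = c(30, 11, 17)
)

# Add a ratio column: points divided by price (points per dollar)
toy_wines <- toy_wines %>%
  mutate(ratio = points / price)

toy_wines
```

A higher ratio means more points per dollar spent, so it is one way to compare wines across different price ranges.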
Hint: refer back to the lectures about probability and the definition of Type I error to answer this question!
Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please upload your PDF document to Canvas.