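library(dplyr)
# Treat "n/a", empty strings, and "NA" as missing values
paintings <- read.csv("data/paris_paintings.csv",
                      na.strings = c("n/a", "", "NA")) %>%
  # Keep only the three variables used in this exercise
  dplyr::select(Height_in, Width_in, landsALL) %>%
  # Drop rows with a missing value in any of these columns
  na.omit()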
We will be looking at data about Paris paintings in today’s application exercise. These data were collected by Hilary Coe Cronheim and Sandra van Ginhoven (former PhD students in Duke University Art, Art History & Visual Studies) as part of the Data Expeditions project sponsored by iiD. The data describe the physical and artistic features of paintings featured in printed catalogues of 28 auction sales in Paris from 1764 to 1780. In total, Cronheim and van Ginhoven analyzed 3,393 paintings, recording their prices and descriptive details from the sales catalogues in over 60 variables.
We’ll primarily focus on the variables:
- Height_in: height (in inches)
- Width_in: width (in inches)
- landsALL: whether any type of landscape is mentioned (any of lands_sc, lands_figs, or lands_ment)

Go to GitHub and edit the README.md file to complete the data dictionary (Becky will show you how to do this).
First, pull your changes! Right now, landsALL is coded as a numeric variable in the dataset. Modify the dataset to make landsALL a factor so that R treats it as a categorical variable.
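A minimal sketch of the conversion, assuming dplyr is available. Because the data file may not be at hand, the data frame here is a small hypothetical stand-in with made-up values; on the real data you would apply the same mutate() step to the paintings data frame read in at the start.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned `paintings` data (made-up values)
paintings <- data.frame(
  Height_in = c(20, 31, 25, 42, 18, 36),
  Width_in  = c(15, 34, 22, 48, 12, 40),
  landsALL  = c(0, 1, 0, 1, 0, 1)
)

# Replace the numeric 0/1 indicator with a factor so R treats it as categorical
paintings <- paintings %>%
  mutate(landsALL = factor(landsALL))

# Levels are the sorted unique values, so "0" (no landscape) is the baseline
levels(paintings$landsALL)
```

Since factor() sorts the unique values to form the levels, 0 becomes the reference level, which matters later when R builds the indicator term for the model.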
Create a scatterplot to visualize the relationship between width and height. Color the points by whether the painting has any landscape elements.
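One way to draw this with ggplot2, sketched on a hypothetical toy data frame with made-up values. Mapping landsALL to color gives two discrete colors only when landsALL is a factor; left as a number, ggplot2 would use a continuous color gradient instead. The semi-transparent points (alpha = 0.5) are a stylistic choice that helps when thousands of points overlap.

```r
library(ggplot2)

# Hypothetical stand-in for `paintings`, with landsALL already a factor
paintings <- data.frame(
  Height_in = c(20, 31, 25, 42, 18, 36),
  Width_in  = c(15, 34, 22, 48, 12, 40),
  landsALL  = factor(c(0, 1, 0, 1, 0, 1))
)

# Width on the x-axis, height on the y-axis, one color per landsALL level
p <- ggplot(paintings, aes(x = Width_in, y = Height_in, color = landsALL)) +
  geom_point(alpha = 0.5) +
  labs(
    x = "Width (in)",
    y = "Height (in)",
    color = "Landscape\nelements?"
  )

print(p)
```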
Based on your scatterplot, does the relationship between Height and Width differ between paintings with landscape elements and those without? Briefly explain.
Fit a model that predicts height from width and whether the painting has landscape elements. Present the model in a tidy format.
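A minimal sketch, assuming the model is fit with base R's lm() and tidied with broom::tidy(); the name height_fit and the toy data are hypothetical. With landsALL stored as a factor, R creates an indicator term named landsALL1 for level 1, compared against the baseline level 0.

```r
library(broom)

# Hypothetical stand-in for `paintings` (made-up values, landsALL as a factor)
paintings <- data.frame(
  Height_in = c(20, 31, 25, 42, 18, 36),
  Width_in  = c(15, 34, 22, 48, 12, 40),
  landsALL  = factor(c(0, 1, 0, 1, 0, 1))
)

# Additive model: predict height from width and the landscape indicator
height_fit <- lm(Height_in ~ Width_in + landsALL, data = paintings)

# Tidy output: one row per term with estimate, std.error, statistic, p.value
tidy(height_fit)
```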
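Using your model from the previous exercise (with predictors Width_in and landsALL), write the equation of the model for paintings with landscape elements and the equation for paintings without landscape elements.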
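Assuming the additive model from the previous exercise, with 0 as the baseline level of landsALL, the fitted equation has the general form below. The \(\hat{\beta}\) symbols are placeholder notation rather than values from the data, and landsALL1 equals 1 for paintings with landscape elements and 0 otherwise; substitute the estimates from your tidy output.

```latex
\begin{aligned}
\widehat{\text{Height\_in}} &= \hat{\beta}_0 + \hat{\beta}_1\,\text{Width\_in} + \hat{\beta}_2\,\text{landsALL1} \\
\text{landsALL} = 0:\quad \widehat{\text{Height\_in}} &= \hat{\beta}_0 + \hat{\beta}_1\,\text{Width\_in} \\
\text{landsALL} = 1:\quad \widehat{\text{Height\_in}} &= (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1\,\text{Width\_in}
\end{aligned}
```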
How do the slopes compare between the two models? How do the intercepts compare?
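The comparison can be checked numerically by reading each group's intercept and slope off the coefficient vector. This sketch assumes the same additive lm() fit and uses hypothetical toy data; the names no_lands and with_lands are illustrative. In an additive model like this one, the two lines share the Width_in slope and their intercepts differ by the landsALL1 coefficient.

```r
# Hypothetical stand-in for `paintings` (made-up values, landsALL as a factor)
paintings <- data.frame(
  Height_in = c(20, 31, 25, 42, 18, 36),
  Width_in  = c(15, 34, 22, 48, 12, 40),
  landsALL  = factor(c(0, 1, 0, 1, 0, 1))
)
height_fit <- lm(Height_in ~ Width_in + landsALL, data = paintings)
b <- coef(height_fit)

# landsALL = 0 is the baseline level: intercept and slope come straight from b
no_lands <- c(intercept = unname(b["(Intercept)"]),
              slope     = unname(b["Width_in"]))

# landsALL = 1 shifts the intercept by the landsALL1 coefficient; slope is shared
with_lands <- c(intercept = unname(b["(Intercept)"] + b["landsALL1"]),
                slope     = unname(b["Width_in"]))

rbind(no_lands, with_lands)
```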
Now let’s see how well the model fits the data. Obtain the \(R^2\) for your model and interpret it!
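A sketch for extracting \(R^2\), assuming the lm() fit and broom::glance(); the toy data are hypothetical. The manual lines recompute \(R^2 = 1 - \text{SSE}/\text{SST}\), the proportion of the variability in Height_in explained by the model, which is the usual way to interpret it.

```r
library(broom)

# Hypothetical stand-in for `paintings` (made-up values, landsALL as a factor)
paintings <- data.frame(
  Height_in = c(20, 31, 25, 42, 18, 36),
  Width_in  = c(15, 34, 22, 48, 12, 40),
  landsALL  = factor(c(0, 1, 0, 1, 0, 1))
)
height_fit <- lm(Height_in ~ Width_in + landsALL, data = paintings)

# glance() returns a one-row model summary; r.squared holds R^2
r2 <- glance(height_fit)$r.squared
r2

# Same quantity by hand: 1 - (residual sum of squares / total sum of squares)
sse <- sum(residuals(height_fit)^2)
sst <- sum((paintings$Height_in - mean(paintings$Height_in))^2)
1 - sse / sst
```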