A note on expectations: For each exercise, your answer should include any relevant output (tables, summary statistics, plots). Doing this is easy! Just place the relevant R code in a code chunk and hit Knit.
Plastic pollution is a major and growing problem, negatively affecting oceans and wildlife health. Our World in Data has a lot of great data at various levels including globally, per country, and over time. For this homework we focus on data from 2010.
Additionally, National Geographic recently ran a data visualization communication contest on plastic waste as seen here.
Learning goals for this homework are
Go to the course GitHub organization and locate your HW 02 repo,
which should be named
hw-02-plastic-waste-[GITHUB USERNAME].
Click on the green Code button and select the second option, “Open with GitHub Desktop”.
The GitHub Desktop application will open up, with a white window
that says “Clone a Repository”. Important: in the
second line that says “Local Path”, there is a button that says
Choose.... Click on it, and find and select the folder we
created for this course. Then hit the blue Clone
button.
After successfully cloning, the window will disappear and you will see that the Current Repository is the one you just cloned. Success!
Navigate to the project folder you just created within the
Math118 folder, and open the hw-02-plastic-waste.Rmd file
to begin.
You will write your answers in the document
hw-02-plastic-waste.Rmd. Before starting the exercises, be
sure to update the author name and date in the YAML at the top of the
.Rmd file. Knit the document and make sure the resulting PDF file has
your name and date.
We’ll use the tidyverse package for this analysis. You can run the following code in the Console to load this package.
library(tidyverse)
The dataset for this assignment can be found as a csv file in the
data folder of your repository. You can read it in using
the following line of code. Make sure the file path is pointing to this
project folder! If you are not sure, go to “Session” -> “Set Working
Directory” -> “To Source File Location”.
plastic_waste <- read_csv("data/plastic-waste.csv")
The variable descriptions are as follows:
- code: 3-letter country code
- entity: Country name
- continent: Continent name
- year: Year
- gdp_per_cap: GDP per capita, constant 2011 international $, rate
- plastic_waste_per_cap: Amount of plastic waste per capita in kg/day
- mismanaged_plastic_waste_per_cap: Amount of mismanaged plastic waste per capita in kg/day
- mismanaged_plastic_waste: Tonnes of mismanaged plastic waste
- coastal_pop: Number of individuals living on/near coast
- total_pop: Total population according to Gapminder
Let’s start by taking a look at the distribution of plastic waste per capita in 2010.
ggplot(data = plastic_waste, aes(x = plastic_waste_per_cap)) +
  geom_histogram(binwidth = 0.2)
One country stands out as an unusual observation at the top of the distribution. One way of identifying this country is to filter the data for countries where plastic waste per capita is greater than 3.5 kg/person.
plastic_waste %>%
  filter(plastic_waste_per_cap > 3.5)
## # A tibble: 1 × 10
## code entity conti…¹ year gdp_p…² plast…³ misma…⁴ misma…⁵ coast…⁶ total…⁷
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TTO Trinidad … North … 2010 31261. 3.6 0.19 94066 1358433 1341465
## # … with abbreviated variable names ¹continent, ²gdp_per_cap,
## # ³plastic_waste_per_cap, ⁴mismanaged_plastic_waste_per_cap,
## # ⁵mismanaged_plastic_waste, ⁶coastal_pop, ⁷total_pop
Did you expect this result? You might consider doing some research on Trinidad and Tobago to see why plastic waste per capita is so high there, or whether this is a data error.
From this point onward, the plots and output of the code won’t be printed in the homework, but you can run the code and view the results yourself.
Another way of visualizing numerical data is using density plots.
ggplot(data = plastic_waste, aes(x = plastic_waste_per_cap)) +
  geom_density()
We can also compare distributions across continents by coloring the density curves by continent.
ggplot(data = plastic_waste,
       mapping = aes(x = plastic_waste_per_cap,
                     color = continent)) +
  geom_density()
The resulting plot may be a little difficult to read, so let’s also fill the curves in with colors.
ggplot(data = plastic_waste,
       mapping = aes(x = plastic_waste_per_cap,
                     color = continent,
                     fill = continent)) +
  geom_density()
The overlapping colors make it difficult to tell what’s happening
with the distributions of the continents plotted first, since they are
covered by the continents plotted over them. We can change the transparency
level of the fill color to help with this. The alpha argument takes
values between 0 and 1: 0 is completely transparent and 1 is completely
opaque. There is no way to know in advance which value will work best,
so it’s worth trying a few.
ggplot(data = plastic_waste,
       mapping = aes(x = plastic_waste_per_cap,
                     color = continent,
                     fill = continent)) +
  geom_density(alpha = 0.7)
This still doesn’t look great…
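One way to see why alpha = 0.7 still hides a lot is the standard "over" compositing rule, which is an assumption here about how the graphics device blends a semi-transparent fill with whatever is already drawn beneath it (the homework itself does not specify this):

```latex
C_{\text{out}} = \alpha \, C_{\text{fill}} + (1 - \alpha) \, C_{\text{below}}, \qquad 0 \le \alpha \le 1
```

With alpha = 0 the output is just the color below (fully transparent), and with alpha = 1 it is just the fill (fully opaque). With alpha = 0.7, a curve covered by two other filled curves contributes only 0.3 × 0.3 = 0.09 of the final color, which is why a lower alpha lets more of the covered distributions show through.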
Recreate the density plots above using a different (lower) alpha level that works better for displaying the density curves for all continents.
Describe why we defined the color and
fill of the curves by mapping aesthetics of the plot but we
defined the alpha level as a characteristic of the plotting
geom.
Now is a good time to knit, commit and push your changes to GitHub with a short, informative commit message. Make sure to commit and push all changed files.
Yet another way to visualize this relationship is to use side-by-side box plots.
ggplot(data = plastic_waste,
       mapping = aes(x = continent,
                     y = plastic_waste_per_cap)) +
  geom_boxplot()
Remember that we use geom_point() to make scatterplots.
Next, visualize the relationship between plastic waste per capita and mismanaged plastic waste per capita using a scatterplot. Describe the relationship between the two variables.
Color the points in the scatterplot by continent. Do there seem to be any clear distinctions between continents with respect to how plastic waste per capita and mismanaged plastic waste per capita are associated?
Visualize the relationship between plastic waste per capita and total population, as well as between plastic waste per capita and coastal population. Does one of these pairs of variables appear to be more strongly linearly associated than the other?
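A visual impression of linear association can be cross-checked with base R's `cor()`, which returns a correlation coefficient between -1 and 1; the closer its absolute value is to 1, the stronger the linear association. The minimal sketch below uses the built-in `mtcars` data rather than the homework data. With your own data, note that `cor()` returns `NA` if either column has missing values unless you pass `use = "complete.obs"`.

```r
# Built-in mtcars data, used here only for illustration
r_wt <- cor(mtcars$wt, mtcars$mpg)  # weight vs. fuel economy
r_hp <- cor(mtcars$hp, mtcars$mpg)  # horsepower vs. fuel economy

# Compare strength with absolute values; the sign only gives the direction
round(c(weight = r_wt, horsepower = r_hp), 3)
```

Both correlations here are negative, and weight has the larger absolute value, so it is the more strongly linearly associated with `mpg` of the two. The scatterplot is still worth looking at, since a single number can hide curvature or an outlier like the one found earlier.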
Now is another good time to knit, commit and push your changes to GitHub with a short, informative commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Hint:
The colors are from the viridis color palette. Take a look at the
functions starting with scale_color_viridis_* in the ggplot2
reference page.
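To illustrate the hint, here is a minimal sketch using the built-in `mtcars` data (not the plastic waste data) with ggplot2's `scale_color_viridis_d()`, the discrete viridis scale for a categorical variable mapped to color. For a continuous variable mapped to color, `scale_color_viridis_c()` is the counterpart; which one you need depends on the plot you are recreating.

```r
library(ggplot2)

# Built-in mtcars data, used here only for illustration
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  # factor(cyl) makes the color variable categorical,
  # so the discrete viridis scale applies
  scale_color_viridis_d()

p
```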
plastic_waste <- plastic_waste %>%
  filter(plastic_waste_per_cap < 3)
Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.
Once your work is finalized in your GitHub repo, submit the final PDF to Canvas.
This homework was adapted from Data Science in a Box.