In January 2017, Buzzfeed published an article titled “These Nobel Prize Winners Show Why Immigration Is So Important For American Science”. In the article they explore where many Nobel laureates in the sciences were born and where they lived when they won their prize.
In this homework we will work with the data from this article to recreate some of their visualizations as well as explore new questions.
The learning goals of this homework are:
- Manipulate and transform data to prepare it for visualization.
- Recreate visualizations.
- Summarize data.
Go to the course GitHub organization and locate your HW 03 repo,
which should be named hw-03-nobel-[GITHUB USERNAME]. Grab
the URL of the repo, and clone it in RStudio. Refer to HW 01 for
step-by-step instructions for cloning a repo and creating a new RStudio project.
Change the author to your name in YAML.
We’ll use the tidyverse package for this analysis. Run the following code in the Console to load this package.
library(tidyverse)

The dataset for this assignment can be found as a csv file in the data folder of your repository. You can read it in using the following.

nobel <- read_csv("data/nobel.csv")

The variable descriptions are as follows:
- id: ID number
- firstname: First name of laureate
- surname: Surname
- year: Year prize won
- category: Category of prize
- affiliation: Affiliation of laureate
- city: City of laureate in prize year
- country: Country of laureate in prize year
- born_date: Birth date of laureate
- died_date: Death date of laureate
- gender: Gender of laureate
- born_city: City where laureate was born
- born_country: Country where laureate was born
- born_country_code: Code of country where laureate was born
- died_city: City where laureate died
- died_country: Country where laureate died
- died_country_code: Code of country where laureate died
- overall_motivation: Overall motivation for recognition
- share: Number of other winners award is shared with
- motivation: Motivation for recognition

In a few cases the name of the city/country changed after the prize was given (e.g. in 1975 Bosnia and Herzegovina was part of the Socialist Federal Republic of Yugoslavia). In these cases the variables below reflect a different name than their counterparts without the suffix _original.
- born_country_original: Original country where laureate was born
- born_city_original: Original city where laureate was born
- died_country_original: Original country where laureate died
- died_city_original: Original city where laureate died
- city_original: Original city where laureate lived at the time of winning the award
- country_original: Original country where laureate lived at the time of winning the award

Note that in this lab, the R chunks are not provided for you. Therefore you must create your own code chunks. A portion of the lab grade will be based on:
- Reasonable number of commits to ensure you are tracking your progress
- Good coding style
- Figures are appropriately sized (i.e. not too big or small)
There are some observations in this dataset that we will exclude from our analysis to match the Buzzfeed results.
Hint: The lecture about logical operators could be useful here!
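As a reminder of how logical operators combine inside filter(), here is a minimal sketch on a made-up tibble. The data frame pets and its columns are hypothetical and are not part of the nobel data; the sketch only illustrates testing for missing values and excluding a value.

```r
library(tidyverse)

# Hypothetical toy data, not the nobel dataset
pets <- tibble(
  name    = c("Ada", "Bo", "Cy", "Di"),
  species = c("cat", "dog", NA, "cat"),
  adopted = as.Date(c(NA, "2020-05-01", NA, NA))
)

# Conditions separated by commas are combined with AND
waiting_cats <- pets %>%
  filter(
    !is.na(species),   # species is recorded
    species != "dog",  # exclude one particular value
    is.na(adopted)     # adoption date is missing
  )

nrow(waiting_cats)
```

Checking nrow() after filtering is a quick way to confirm that the result has the number of observations you expect.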
Create a new data frame called nobel_living that filters for the following criteria. Confirm that once you have filtered for these characteristics you are left with a data frame with 228 observations.
- country is available
- the laureate is a person rather than an organization (organizations have "org" as their gender)
- the laureate is still alive (died_date is NA)

Knit, commit and push your changes to GitHub with an appropriate commit message again.
First, we’ll create a new variable to identify whether the laureate
was in the US when they won their prize. We’ll use the
mutate() function for this. The following pipeline mutates
the nobel_living data frame by adding a new variable called
country_us. We use an if/else statement to create this
variable. The first argument in the if_else() function is
the condition we’re testing for. If country is equal to
"USA", we set country_us to
"USA". If not, we set country_us to
"Other".
Note that we can achieve a similar result using the fct_other() function (i.e. with country_us = fct_other(country, "USA")), which returns a factor rather than a character vector.

nobel_living <- nobel_living %>%
  mutate(
    country_us = if_else(country == "USA", "USA", "Other")
  )

Next, we will limit our analysis to only the following categories: Physics, Medicine, Chemistry, and Economics.

nobel_living_science <- nobel_living %>%
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))

You will work with the nobel_living_science data frame you created above for the remainder of the lab. This means you’ll need to define this data frame in your R Markdown document.
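To see what these two steps do in isolation, the sketch below applies them to a made-up tibble. The name toy and its values are hypothetical; the point is only to compare if_else() with fct_other() and to show how %in% matches against a vector of values.

```r
library(tidyverse)

# Hypothetical toy data, not the nobel dataset
toy <- tibble(
  country  = c("USA", "France", "USA", "Japan"),
  category = c("Physics", "Peace", "Literature", "Chemistry")
)

toy <- toy %>%
  mutate(
    country_chr = if_else(country == "USA", "USA", "Other"), # character
    country_fct = fct_other(country, keep = "USA")           # factor
  )

# The labels agree even though the column types differ
all(toy$country_chr == as.character(toy$country_fct))

# %in% keeps rows whose value matches any element of the vector
toy_science <- toy %>%
  filter(category %in% c("Physics", "Chemistry"))
```

The factor version can be convenient for plotting, since its level order controls the order of groups in a plot, while the character version is simpler to compare against strings.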
Hint:
You can change the orientation of the bars using the
coord_flip() function in ggplot2. See the function's documentation
to read more about it.
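A minimal sketch of coord_flip() on made-up data follows. The tibble fruit_basket is hypothetical and unrelated to the assignment data; it only shows where the function goes in a ggplot2 pipeline.

```r
library(tidyverse)

# Hypothetical toy data, not the nobel dataset
fruit_basket <- tibble(
  fruit = c("apple", "apple", "pear", "plum", "plum", "plum")
)

p <- ggplot(fruit_basket, aes(x = fruit)) +
  geom_bar() +
  coord_flip() # swap the axes so the bars run horizontally

p
```

Because coord_flip() is added as its own layer, you can build the vertical version of a plot first and flip it as a last step.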
Knit, commit and push your changes to GitHub with an appropriate commit message again.
Hint:
You should be able to borrow from code you used earlier to create the
country_us variable.
Create a new variable called born_country_us that has
the value "USA" if the laureate was born in the US, and
"Other" otherwise. Be sure to save the variable to the
nobel_living_science data frame.
Knit, commit and push your changes to GitHub with an appropriate commit message again.
Note that your bar plot won’t exactly match the one from the Buzzfeed article. This is likely because the data has been updated since the article was published.
Create a frequency table (with the count function) for their birth
country (born_country), and arrange the resulting data
frame in descending order of number of observations for each
country.

Knit, commit and push your changes to GitHub with an appropriate commit message again.
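For reference, counting and then sorting can be sketched on a made-up tibble as follows. The names toy_births and birthplace are hypothetical and do not come from the assignment data.

```r
library(tidyverse)

# Hypothetical toy data, not the nobel dataset
toy_births <- tibble(
  birthplace = c("Canada", "UK", "Canada", "Germany", "UK", "Canada")
)

birth_counts <- toy_births %>%
  count(birthplace) %>% # one row per value, with the count in column n
  arrange(desc(n))      # most frequent value first

birth_counts
```

count() also accepts sort = TRUE, which gives the same ordering without a separate arrange() step.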
Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g. no long lines of code), and that your figures are reasonably sized.
Once you are finished, push to GitHub one last time and upload your PDF document to Canvas.
The plots in the Buzzfeed article are called waffle plots. You can find the code used for making these plots in Buzzfeed’s GitHub repo (yes, they have one!). You’re not expected to recreate them as part of your assignment, but you’re welcome to do so for fun!
This lab was adapted from Data Science in a Box.