Go to the appex-05-[GITHUB USERNAME] repo, clone it, and
start a new project in RStudio.
Note: In each of these exercises you will need to
set eval=TRUE in the code chunk header when you’re ready to
knit and run the code for that exercise.
We will first work with the dataset relig_income, which
contains data from a survey by Pew. This dataset contains three
variables:
religion, stored in the rows,
income, spread across the column names, and
count, stored in the cell values.
After loading the data, go ahead and View() it to get a
feel for the data.
library(tidyverse)
data("relig_income")
Suppose our observation unit of interest is religion.
Explain why this data frame is in the wide format.
Pivot the data into the long form shown here:
## # A tibble: 180 × 3
##    religion income             count
##    <chr>    <chr>              <dbl>
##  1 Agnostic <$10k                 27
##  2 Agnostic $10-20k               34
##  3 Agnostic $20-30k               60
##  4 Agnostic $30-40k               81
##  5 Agnostic $40-50k               76
##  6 Agnostic $50-75k              137
##  7 Agnostic $75-100k             122
##  8 Agnostic $100-150k            109
##  9 Agnostic >150k                 84
## 10 Agnostic Don't know/refused    96
## # … with 170 more rows
relig_income %>%
  pivot_longer(cols = ____, # hint: it's easier to specify which column you don't want to pivot!
               names_to = ____,
               values_to = ____)
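To see the mechanics on something smaller first, here is a minimal sketch using a made-up data frame. The `scores` tibble and its column names are hypothetical and are not part of the exercise data.

```r
library(tidyverse)

# Hypothetical toy data: one row per student, one column per quiz
scores <- tibble(
  student = c("A", "B"),
  quiz1 = c(8, 6),
  quiz2 = c(9, 7)
)

# Pivot every column except `student`:
# the old column names go into `quiz`, the cell values into `score`
scores_long <- scores %>%
  pivot_longer(cols = -student,
               names_to = "quiz",
               values_to = "score")

scores_long
```

Each student now appears once per quiz, so the two-row wide table becomes a four-row long table.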
Let’s now work with some more data about fish! The
fish_encounters dataset contains information about fish
swimming down a river. Each monitoring station recorded whether a
tagged fish was observed there. The dataset contains three
variables:
fish, the fish identifier,
station, the measurement station, and
seen = 1 if the fish was seen.
After loading the data, go ahead and View() it to get a
feel for the data.
data("fish_encounters")
Suppose our observation unit of interest is fish.
Explain why this data frame is in the long format.
We want to pivot the data such that each fish is an observation, and we can easily see which stations it was observed at. Specifically, we want the data in this wide format:
## # A tibble: 19 × 12
##    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
##    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
##  1 4842        1     1      1     1       1     1     1     1     1     1     1
##  2 4843        1     1      1     1       1     1     1     1     1     1     1
##  3 4844        1     1      1     1       1     1     1     1     1     1     1
##  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
##  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
##  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
##  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
##  8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
##  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
## 12 4857        1     1      1     1       1     1     1     1     1    NA    NA
## 13 4858        1     1      1     1       1     1     1     1     1     1     1
## 14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
## 15 4861        1     1      1     1       1     1     1     1     1     1     1
## 16 4862        1     1      1     1       1     1     1     1     1    NA    NA
## 17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
Fill in the following code to pivot the data into the desired format.
fish_encounters %>%
  pivot_wider(names_from = _____,
              values_from = ______)
You might notice that there are a lot of NA or missing
values after pivoting wider. This means that the fish was not
observed at that given station. Let’s replace the NA values
with 0s. In your code for Exercise 4, add the following to your
pivot_wider() function call:
values_fill = 0.
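As a rough illustration of what values_fill does, here is a sketch on a made-up data frame. The `sightings` tibble and its columns are hypothetical and only stand in for the exercise data.

```r
library(tidyverse)

# Hypothetical toy data: animal a2 was never recorded at the south sensor
sightings <- tibble(
  animal = c("a1", "a1", "a2"),
  sensor = c("north", "south", "north"),
  seen = c(1, 1, 1)
)

# Without values_fill, the missing a2/south combination would be NA;
# with values_fill = 0 it is filled with 0 instead
sightings_wide <- sightings %>%
  pivot_wider(names_from = sensor,
              values_from = seen,
              values_fill = 0)

sightings_wide
```

The fill only applies to combinations that are absent from the long data. It does not alter values that were actually recorded.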
Once you have completed the activity, push your final changes to your GitHub repo! Make sure your repo is updated on GitHub, and that’s all you need to do to submit application exercises for participation.