AE 05: Reshaping

Clone a repo + start a new project

Go to the appex-05-[GITHUB USERNAME] repo, clone it, and start a new project in RStudio.

Note: In each of these exercises you will need to set eval=TRUE in the code chunk header when you’re ready to knit and run the code for that exercise.

Pivoting longer

We will first work with the dataset relig_income, which contains data from a survey by Pew. This dataset contains three variables:

religion, stored in the rows,
the annual income bracket spread across the column names, and
the count (number of respondents) for each religion-income pair, stored in the cell values.

After loading the data, go ahead and View() it to get a feel for the data.

library(tidyverse)
data("relig_income")
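
If View() is not convenient, for example while knitting, a quick console check gives a similar feel for the layout. This is a minimal sketch that assumes only that the tidyverse is installed; it uses the general-purpose helpers dim(), names(), and glimpse().

```r
library(tidyverse)
data("relig_income")

# Number of rows and columns in the data frame
dim(relig_income)

# Column names: religion plus one column per income bracket
names(relig_income)

# Compact overview of every column and its type
glimpse(relig_income)
```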

Exercise 1

Suppose our observation unit of interest is religion. Explain why this data frame is in the wide format.

Exercise 2

Pivot the data into the long form shown here:

## # A tibble: 180 × 3
##    religion income             count
##    <chr>    <chr>              <dbl>
##  1 Agnostic <$10k                 27
##  2 Agnostic $10-20k               34
##  3 Agnostic $20-30k               60
##  4 Agnostic $30-40k               81
##  5 Agnostic $40-50k               76
##  6 Agnostic $50-75k              137
##  7 Agnostic $75-100k             122
##  8 Agnostic $100-150k            109
##  9 Agnostic >150k                 84
## 10 Agnostic Don't know/refused    96
## # … with 170 more rows

relig_income %>%
  pivot_longer(cols = ____, # hint: it's easier to specify which column you don't want to pivot!
               names_to = ____,
               values_to = ____)
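
If the three arguments are unfamiliar, it may help to try pivot_longer() on a small table first. The sketch below uses made-up data: the names scores_wide, student, quiz, and score are hypothetical and are not part of this exercise.

```r
library(tidyverse)

# Hypothetical toy data: one row per student, one column per quiz
scores_wide <- tibble(
  student = c("A", "B"),
  quiz1 = c(8, 6),
  quiz2 = c(9, 7)
)

scores_long <- scores_wide %>%
  pivot_longer(cols = -student,     # pivot every column except student
               names_to = "quiz",   # old column names go into this new column
               values_to = "score") # old cell values go into this new column

scores_long
```

Each student now appears once per quiz, so the two-row table becomes a four-row table with the columns student, quiz, and score.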

Pivoting wider

Let’s now work with some more data about fish! The fish_encounters dataset contains information about fish swimming down a river. Each monitoring station recorded whether a tagged fish was observed there. The dataset contains three variables:

fish, the fish identifier,
station, the monitoring station, and
seen, which equals 1 if the fish was seen at that station.

After loading the data, go ahead and View() it to get a feel for the data.

data("fish_encounters")
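
As an alternative to View(), a few summary counts can show how the rows are organized. This is only a sketch of one way to explore the data; the helper names n_fish and n_stations are made up for illustration.

```r
library(tidyverse)
data("fish_encounters")

# How many distinct fish and distinct stations appear in the data?
n_fish <- n_distinct(fish_encounters$fish)
n_stations <- n_distinct(fish_encounters$station)
n_fish
n_stations

# How many rows does each fish contribute?
fish_encounters %>%
  count(fish)
```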

Exercise 3

Suppose our observation unit of interest is fish. Explain why this data frame is in the long format.

Exercise 4

We want to pivot the data such that each fish is an observation, and we can easily see which stations it was observed at. Specifically, we want the data in this wide format:

## # A tibble: 19 × 12
##    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
##    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
##  1 4842        1     1      1     1       1     1     1     1     1     1     1
##  2 4843        1     1      1     1       1     1     1     1     1     1     1
##  3 4844        1     1      1     1       1     1     1     1     1     1     1
##  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
##  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
##  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
##  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
##  8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
##  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
## 12 4857        1     1      1     1       1     1     1     1     1    NA    NA
## 13 4858        1     1      1     1       1     1     1     1     1     1     1
## 14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
## 15 4861        1     1      1     1       1     1     1     1     1     1     1
## 16 4862        1     1      1     1       1     1     1     1     1    NA    NA
## 17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
## 19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA

Fill in the following code to reshape the data into the desired format.

fish_encounters %>%
  pivot_wider(names_from = _____, 
              values_from = ______)

Exercise 5

You might notice that there are a lot of NA (missing) values after pivoting wider. An NA means that the fish was not observed at that station. Let’s replace the NA values with 0s. In your code for Exercise 4, add the following argument to your pivot_wider() call: values_fill = 0.
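
To see what values_fill does in isolation, here is a small sketch on made-up data. The names visits_long, person, site, and visited are hypothetical and only serve to illustrate the argument.

```r
library(tidyverse)

# Hypothetical toy data: person B has no row for the "south" site
visits_long <- tibble(
  person = c("A", "A", "B"),
  site = c("north", "south", "north"),
  visited = c(1, 1, 1)
)

# Without values_fill, the missing person-site pair becomes NA
visits_wide <- visits_long %>%
  pivot_wider(names_from = site,
              values_from = visited)

# With values_fill = 0, the same cell is filled with 0 instead
visits_filled <- visits_long %>%
  pivot_wider(names_from = site,
              values_from = visited,
              values_fill = 0)

visits_wide
visits_filled
```

Filling with 0 is reasonable here only because a missing row genuinely means "not visited"; when a missing row means "unknown", keeping NA is usually the safer choice.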

Submitting application exercises

Once you have completed the activity, push your final changes to your GitHub repo! Make sure your repo is updated on GitHub, and that’s all you need to do to submit application exercises for participation.