Clone a repo + start a new project

  • Go to the appex-04-[GITHUB USERNAME] repo, clone it, and start a new project in RStudio.

  • Click on the green Code button, select the second option “Open with GitHub Desktop”

  • The GitHub Desktop application will open up, with a white window that says “Clone a Repository”. Important: in the second line that says “Local Path”, there is a button that says Choose.... Click on it, and find and select the folder we created for this course. Then hit the blue Clone button.

  • After successfully cloning, the window will disappear and you will see the that Current Repository is the one you just cloned. Success!

  • Navigate to the project folder you just created within the Math118 folder, and open the appex04-kagglesurvey.Rmd file to begin.

Practice with data wrangling

library(tidyverse)

The data provided here are a subset of the survey results provided by Kaggle, retaining observations with full responses for selected variables.

# From Kaggle: https://www.kaggle.com/datasets/kaggle/kaggle-survey-2017/
datascience <- read_csv("data/kaggle_survey_subset.csv", show_col_types = F) 

Warm-up exercise

Among the respondents, what was the most popular Major studied?

# code here

Group analysis

Today, your group is going to dive deeper into the datascience survey data. I want you to generate your own investigation. Using your data-wrangling skills, try to find some interesting conclusions. At the end of class, your group will share your process and results with the rest of the class!

Your final results must include:

  • Summary statistics or frequency table

  • Visualization

  • Comments of your code using the pound-symbol (#) to describe what you are doing in each code chunk

You can create more than one visualization and/or more than one table. Whatever speaks to you! The individual components (i.e. table/summary stats vs plot) do not need to use the same set of variables. The following code chunks are provided as guidance, but you are welcome to go in whatever order you’d like and also to create more code chunks!

As a stretch goal, I encourage you to try using mutate() somewhere in your analysis, but it isn’t necessary!

Mutating/subsetting (optional)

Summary statistics/tables

Visualizations

Data dictionary

Below is the data dictionary for the subset of the Kaggle data data.

variable class description
Country character Home country of employee
Gender character Selected gender, one of “A different identity”, “Female”, “Male”, “Non-binary, genderqueer, or gender non-conforming”
Age double Age at time of survey
EmploymentStatus character One of “Employed full-time”, “Employed part-time”, or “Independent contractor, freelancer, or self-employed
EmployerIndustry character Industry of current employer
FormalEducation character Highest level of education
Major character College major
CompensationAmount double Total compensation
CompensationCurrency character 3-letter currency code for day-to-day currency
CurrentJobTitle character Job title
TitleFit character Assessment of how well title fits actual duties. One of “Fine”, “Perfectly”, “Poorly”
LanguageRecommendation character Recommended programming language
WorkDataVisualizations character Proportion of job dedicated to creating data visualizations, broken into pre-determined categories
JobSatisfaction character Rating of job satisfaction on scale of 1-10, where 1 is not satisfied and 10 is highly satisfied
JobSatisfaction2 double Rating of job satisfaction on scale of 1-10, where 1 is not satisfied and 10 is highly satisfied