class: center, middle, inverse, title-slide

.title[
# Welcome to Intro to Data Science
]
.author[
### Becky Tang
]
.date[
### Sep. 12, 2022
]

---
class: center, middle

# Welcome!

---

## What is Data Science?

*"Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the context of <font class="vocab">mathematics, statistics, information science, and computer science</font>."*

.pull-right[
[-Wikipedia](https://en.wikipedia.org/wiki/Data_science)
]

We will be leveraging the programming language R to introduce ourselves to data science.

---

## Necessary background

We assume ZERO background in data science, statistics, and coding.

We will help you get started with learning statistics, but this course will *not* be your typical intro stats course. There is a large emphasis on computing, so all we need is an openness to learn! (and a laptop)

That being said, there is a lot of material in this course!

---
class: regular

## Instructor

[Becky Tang](https://beckytang.rbind.io/)

<i class="material-icons">mail_outline</i> [btang@middlebury.edu](mailto:btang@middlebury.edu)<br>
<i class="material-icons">calendar_today</i> M 3-5pm, F 10:30am-12:00pm (and by appointment)

--

.pull-left[
<img src="img/00/eBird.png" width="70%" style="display: block; margin: auto;" />
<img src="img/00/beautiful_bird.png" width="70%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="img/00/restio.jpg" width="100%" style="display: block; margin: auto;" />
]

---

## Where to find information

- Course website: [https://math118-fall2022.github.io/website/](https://math118-fall2022.github.io/website/)

- GitHub (assignments): [https://github.com/math118-fall2022](https://github.com/math118-fall2022)

---

## Course Objectives

- Learn to explore, visualize, and analyze data in a reproducible and shareable manner

- Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, and data visualization

- Work on problems and case studies inspired by and based on real-world questions and data

- Learn to effectively communicate results through written assignments and a final project presentation

---
class: middle, center

## Examples of Data Science

---

## Partisanship and marriage age

<img src="img/00/marriage_partisan.png" width="50%" style="display: block; margin: auto;" />

[Source: 538](https://fivethirtyeight.com/features/how-many-republicans-marry-democrats/)

---

## National parks

<img src="img/00/parks.png" width="60%" style="display: block; margin: auto;" />

[Source: 538](https://fivethirtyeight.com/features/the-national-parks-have-never-been-more-popular/)

---

## Gender pay gap

<img src="img/00/pay_gap.png" width="100%" style="display: block; margin: auto;" />

[Source: InformationIsBeautiful](https://informationisbeautiful.net/visualizations/gender-pay-gap/)

---

## Serena Williams

<img src="img/00/serena.png" width="95%" style="display: block; margin: 
auto;" />

[Source: NYT](https://www.nytimes.com/interactive/2022/08/28/sports/tennis/serena-age-opponents.html)

---

## Abortion maps

<img src="img/00/abortion_dist.png" width="80%" style="display: block; margin: auto;" />

[Source: NYT](https://www.nytimes.com/interactive/2022/06/24/upshot/dobbs-roe-abortion-driving-distances.html?searchResultPosition=17)

---

## Police representation

<img src="img/00/police_rep.png" width="65%" style="display: block; margin: auto;" />

[Source: 538](https://fivethirtyeight.com/features/reexamining-residency-requirements-for-police-officers/)

---

## TED Talk

[https://www.youtube.com/watch?v=hVimVzgtD6w](https://www.youtube.com/watch?v=hVimVzgtD6w)

---
class: center, middle

## Your Turn!

---

## Create a GitHub account

<small>
<small>
.instructions[
Go to https://github.com/, and create an account (unless you already have one). After you create your account, click [here](https://forms.gle/CzeyScmzudMTbMTQ6) and enter your GitHub username.
]

Tips for creating a username from [Happy Git with R](http://happygitwithr.com/github-acct.html#username-advice).

- Incorporate your actual name!
- Reuse a username from other contexts if you can, e.g., Twitter
- Pick a username you will be comfortable revealing to your future boss.
- Shorter is better than longer.
- Be as unique as possible in as few characters as possible.
- Make it timeless.
- Avoid words laden with special meaning in programming, like `NA`.
</small>

.instructions[
Raise your hand if you have any questions.
]

---

## Time for an analysis!

- Let's take a look at the [UN Votes](https://math118-fall2022.github.io/website/appex/00-unvotes.html) analysis

---

## Discussion

Share with someone next to you!

1. Start by introducing yourself! Name, year, major/academic interest, favorite hobby.

2. Consider the plot in Part 1. Describe how the voting behaviors of the five countries have changed over time.

3. Consider the plot in Part 2.
    - On which issues have the three countries voted most similarly in recent years?
    - On which issues have they voted most differently in recent years?
    - Has this changed over time?

---
class: middle, center

## Course Policies

---

## Class Meetings

--

<font class="vocab">Lecture</font>

- Focus on concepts behind data analysis
- Interactive lecture that includes examples and hands-on exercises
- Bring a fully-charged laptop to every lecture
    - Please let me know as soon as possible if you do not have access to a laptop

---

## Textbooks

None are required, but here are some supplemental resources:

- [OpenIntro Statistics, 4th Edition](https://www.openintro.org/stat/textbook.php?stat_book=os)
    - Free PDF available online. Hard copy available for purchase.
    - Assigned readings about statistical content

- [R for Data Science](http://r4ds.had.co.nz/)
    - Free online version. Hard copy available for purchase.
    - Assigned readings and resource for R coding using `tidyverse` syntax.

---

## Activities & Assessments

- <font class="vocab">Homework</font>: Weekly individual homework assignments combining conceptual and computational skills. Usually assigned Wednesdays after class, and due the following Tuesday on Canvas at 11:59pm. *Lowest score will be dropped.*

--

- <font class="vocab">Application Exercises</font>: Exercises usually started in class and, when assigned, due at 11:59pm before the next class. Graded on whether a good-faith effort has been made to attempt all parts. Successful on-time completion of at least 90% of an AE will result in full points for that AE; anything lower than that will be assigned points accordingly. *Lowest two scores will be dropped.*

--

- <font class="vocab">Data Visualization Examples</font>: Sign up to briefly present (~2 min) a visualization that you find interesting and meaningful. Around two presentations per week. This will include a small written component as well.

---

## Activities & Assessments (cont.)

- <font class="vocab">Exams</font>: One take-home midterm exam, scheduled for **Monday, Oct. 17, 2022**.

--

- <font class="vocab">Final Project</font>: Projects presented during the last two days of the semester on **Friday, Dec. 9, 2022 and Monday, Dec. 12, 2022**. You must complete the project and present in class to pass the course.

--

- <font class="vocab">Participation</font>: I expect you to attend every class! Additionally, there will be various small assignments to help deepen your learning experience (e.g. reflection exercises, peer feedback).

---

## Grade Calculation

<small>

| Component             | Weight |
|-----------------------|--------|
| Homework              | 30%    |
| Application Exercises | 15%    |
| Midterm Exam          | 20%    |
| Final Project         | 25%    |
| Sharing visualization | 5%     |
| Participation         | 5%     |

--

- You are expected to attend all lectures. Excessive absences can impact your final course grade.

</small>

---

## Excused Absences

- Students who miss a class due to a scheduled varsity trip, religious holiday, or short-term illness should fill out the respective form.
    - These excused absences do not excuse you from assigned work.

--

- If you have a personal or family emergency or chronic health condition that affects your ability to participate in class, please contact your academic dean’s office.

--

- Exam dates cannot be changed and no make-up exams will be given.

---

## Late Work & Regrade Requests

- Homework assignments:
    - After the assigned deadline, there is a 10% penalty for each day the assignment is late
    - Please communicate with me early if you will need a homework extension!

- Late work will only be accepted for homework assignments.

- Regrade requests must be submitted within one week of when the assignment is returned

---

## Academic Honesty

All work for this class should be done in accordance with the Middlebury Honor Code.

Any violations will automatically result in a grade of 0 on the assignment and will be reported.

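---

## Grade Calculation: worked example

As a rough illustration of how the weights in the Grade Calculation table combine, the sketch below assumes each component is scored as a percentage out of 100 and that the weights are applied directly to those percentages. The symbols and the component scores are hypothetical and chosen only for illustration.

$$\text{Final} = 0.30\,H + 0.15\,A + 0.20\,M + 0.25\,P + 0.05\,V + 0.05\,C$$

Here `\(H\)` is homework, `\(A\)` application exercises, `\(M\)` the midterm exam, `\(P\)` the final project, `\(V\)` the shared visualization, and `\(C\)` participation. The weights sum to 1. With hypothetical scores of 90, 100, 80, 85, 100, and 100:

$$\begin{aligned}
\text{Final} &= 0.30(90) + 0.15(100) + 0.20(80) + 0.25(85) + 0.05(100) + 0.05(100) \\
&= 27 + 15 + 16 + 21.25 + 5 + 5 \\
&= 89.25
\end{aligned}$$

This sketch leaves out anything not captured by the weights, such as dropped lowest scores, late penalties, or the effect of excessive absences.
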
---

## Reusing Code

- Unless explicitly stated otherwise, you may make use of online resources (e.g. StackOverflow) for coding examples on assignments. If you directly use code from an outside source (or use it as inspiration), you must explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.

--

- On individual assignments, you may discuss the assignment with one another; however, you may not directly share code or write-ups with other students. This includes copy-and-paste sharing, as well as showing your screen with the code displayed to another student.

--

- On team assignments, you may not directly share code or write-ups with another team. Unauthorized sharing of code or write-ups will be considered a violation for all students involved.

---

## Where to find help

- **If you have a question during lecture, feel free to ask it!** There are likely other students with the same question, so by asking you will create a learning opportunity for everyone.

--

- **Office Hours**: A lot of questions are most effectively answered in-person, so office hours are a valuable resource. Please use them! If my scheduled office hours do not work for you, feel free to send me an e-mail!

--

- **Campuswire**: An online forum for asking and answering questions. Great place to post general/clarifying questions about lecture material or assignments.

--

- **Fellow students**: I encourage you to work together. However, I suggest and expect that you make an honest individual attempt at all the problems before consulting with others.

---

### Inclusion

In this course, we will strive to create a learning environment that is welcoming to all students. If there is any aspect of the class that is not welcoming or accessible to you, please let me know immediately.

<br><br>

--

Additionally, if you are experiencing something outside of class that is affecting your performance in the course, please feel free to talk with me and/or your academic dean.

---
class: center, middle

## Questions?

---

## To-do

- Fill out the **Getting To Know You Survey on Canvas**
    - due Wednesday, 9/14 at 11:59pm

- After filling out the survey, [schedule](https://calendly.com/beckytang/10min) a brief 1:1 meeting with me!

- Please make sure you can access the [course website](https://math118-fall2022.github.io/website/)

- Enter your GitHub username in the [Google Form](https://forms.gle/CzeyScmzudMTbMTQ6)

- Please create a folder on your Desktop called "Math118"