Life expectancy is a key metric for assessing population health, as it takes into account both infant and elderly mortality. It can be used to compare countries, with the generally accepted belief that a higher life expectancy is associated with a higher quality of life. However, the inequality of life expectancy is still very large across and within countries. We will be examining data about life expectancy across countries. In particular, you will analyze data from the World Health Organization using linear regression models in order to learn what factors are important for determining a country’s average life expectancy in a given year.
The data were modified from this Kaggle dataset. According to the website, “the Global Health Observatory (GHO) data repository under World Health Organization (WHO) keeps track of the health status as well as many other related factors for all countries. The datasets are made available to public for the purpose of health data analysis. The dataset related to life expectancy, health factors for 193 countries has been collected from the same WHO data repository website and its corresponding economic data was collected from United Nation website. Among all categories of health-related factors only those critical factors were chosen which are more representative.” However, due to missing data for certain countries, they ultimately decided to remove ten countries from the final dataset. Thus the resultant dataset contains information about 183 countries.
The broom package is installed with the tidyverse, but we need to load it separately in order to make use of it.
In this lab we will work with the tidyverse and
broom packages. You will need to load them in yourself!
In this homework you will once again upload the data on your own! The
life_expectancy.csv data can be found on Canvas \(\Rightarrow\) Files and click on the file
called life_expectancy.csv. Follow the instructions from HW
07 to upload the life_expectancy.csv data into RStudio.
Then, you can load the data as usual using the following. Be sure to
remove the eval = FALSE before knitting!
life_expect <- read.csv("data/life_expectancy.csv")| Variable name | Description |
|---|---|
country |
Country |
year |
Year (2000-2015) |
status |
Developed or Developing country status |
life_expectancy |
life expentancy in age |
adult_mortality |
Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population) |
infant_deaths |
Number of Infant Deaths per 1000 population |
alcohol |
Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) |
percentage_expenditure |
Expenditure on health as a percentage of Gross Domestic Product per capita (%) |
hepB |
Hepatitis B immunization coverage among 1-year-olds (%) |
measles |
Number of reported cases of Measles per 1000 population |
BMI |
Average Body Mass Index (BMI) of entire population |
under_five_deaths |
Number of under-five deaths per 1000 population |
polio |
Polio immunization coverage among 1-year-olds (%) |
total expenditure |
General government expenditure on health as a percentage of total government expenditure (%) |
diphtheria |
Diphtheria tetanus toxoid and pertussis immunization coverage among 1-year-olds (%) |
HIV_AIDS |
Deaths per 1000 live births HIV/AIDS (0-4 years) |
GDP |
Gross Domestic Product per capita (in USD) |
population |
Population of the country |
thinness_10_19 |
Prevalence of thinness among children and adolescents for Age 10 to 19 (% ) |
thinness_5_9 |
Prevalence of thinness among children for Age 5 to 9(%) |
income_composition |
Human Development Index in terms of income composition of resources (index ranging from 0 to 1) |
schooling |
Number of years of schooling |
To following resource provides code needed to make useful symbols. You may use the code to typeset the characters of interest in the narrative of your document:
$\hat{y}$$\widehat{response}$$y_{unit}$Make sure to follow the tidy coding practices discussed throughout the semester.
NA values for
schooling.life_expectancy.What does that tell you about the average
life expectancy across countries? Is this what you expected to see? Why,
or why not? Include any summary statistics and visualizations you use in
your response.
life_expectancy and schooling.
Linear model is in the form \(\hat{y} = b_0 + b_1 x\).
m_school to
predict average life expectancy by average number of years
of schooling (schooling), and display the output. Based on
the regression output, write the linear model.
m_status to predict the
life expectancy of a country based on its
status, and display the output. Based on the regression
output, write the linear model.
m_status in the
context of the data.
m_school model.
Using this new augmented data frame, create diagnostic plots to
determine if the conditions of linearity and normality are met. For each
condition, briefly explain why or why not based off the appropriate
diagnostic plot.
m_school
model.
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. we will be checking these to make sure you have been practicing how to commit and push changes.
Once your work is finalized in your GitHub repo, submit the final PDF to Canvas.