For this lab we will use the replication data from Michael Ross’s “The Oil Curse: How Petroleum Wealth Shapes the Development of Nations.”
We will be exploring the relationship between oil dependency and democracy.
This lab will use some libraries you’ve seen before and we should load them now plus broomand coefplot which you might need to install
library(tidyverse)
library(broom)
library(coefplot)
Read in the Ross data
rossdata <- read.csv("https://raw.githubusercontent.com/GarciaRios/govt_6029/master/data/ross_2012.csv", stringsAsFactors = FALSE)
head (rossdata)
glimpse(rossdata)
This is a pretty big data-set, we do not need all the variables let’s subset our data to include only the variables we will use.
ctyyearpolitylogoil_gdpcap2000_sup_Q_1logGDPcapoecdrossdata_short
Some of those names are too long, we should change it to something meaningful and short. This can be done easily using dplyr and rename.
rossdata_short <- rossdata_short %>%
rename(oil = logoil_gdpcap2000_sup_Q_1,
gdp = logGDPcap)
This data-frame is way easier to glimpse at:
glimpse(rossdata_short)
A lot of missing values here. Let’s omit them and then look at a summary of the data set
rossdata_short <- rossdata_short %>%
na.omit()
rossdata_short %>%
summary()
Finally! we are ready to start data-analyzing..
We are going to be exploring the relationship between Democracy level (polity) and other covariates.
Let’s explore these relationships with plots.
Okay, let’s do it together
We begin simple…
Unfortunately, a simple scatter plot makes it hard to detect any relationship. However, ggplot2 makes it easy to add different colors and shapes which might help identify trends.
It seems like the higher the GDP the more democratic countries are, except if you are a high oil producer or a non-OECD. Let’s explore these relationships using a regression.
model1 <- lm(polity ~ oil, data = rossdata_short)
summary(model1)
Let’s now include controls for GDP per capita and OECD membership
model2<-lm(polity ~ gdp + oil + oecd, data = rossdata_short)
summary(model2)
Recall the OECD membership clustering? Let’s try an interaction
model3 <- lm(polity ~ gdp + oil*oecd, data = rossdata_short)
summary(model3)
broom has three main functions, all of which return data frames (not lists, numeric vectors, or other types of object). glance returns a data frame with a single row summary of the model:
glance(model2)
tidy returns a data frame with a row for each coefficient estimate:
tidy(model2)
augment returns the original data frame used in the model with additional columns for fitted values, the standard errors of those fitted values, residuals, etc.
head(augment(model2))
How about a coefficient plot, roppeladder… etc.
ggplot(tidy(model2) %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate,
ymin = estimate - 2 * std.error,
ymax = estimate + 2 * std.error)) +
geom_pointrange() +
coord_flip()
We can also use coefplot
coefplot(model2, coefficients = c("oecd", "oil", "gdp"))
Several packages (stargazer, texreg, apsrtable) are useful for creating publication type regression tables. stargazer and texreg are the most complete package. Both allow output to either LaTeX or HTML tables for many types of statistical models. We’ll use stargazer here:
library(stargazer)
stargazer(model1, model2, model3, type = "html")
We are going to use predict to get predicted values. We first have to set up a newdata
xnew <- data.frame(
gdp = 5.9, oil = 0, oecd = 1)
predict(model2, newdata = xnew, interval = "confidence")
xnew2 <- data.frame(gdp = 5.9, oil = 0, oecd = 0)
predict(model2, newdata = xnew2, interval = "confidence")
What is this really doing?
model2
names(model2)
pe2<-model2$coefficients
pe2
1*pe2[1] + 5.9*pe2[2] + 0*pe2[3] + 1*pe2[4]
1*pe2[1] + 5.9*pe2[2] + 0*pe2[3] + 0*pe2[4]
We can create a matrix of hypothetical data to obtain predictions for a range of values:
xnew <- expand.grid(gdp = seq(4,11),
oil = 0,
oecd = 1)
xnew <- expand.grid(gdp = seq(4,11),
oil = mean(rossdata_short$oil),
oecd = mean(rossdata_short$oecd))
Now we feed this new data into predict
pred.res <- predict(model2, newdata = xnew, interval="confidence")
To plot these predicted values we have to create a data frame containing both the predicted values generated by predict and the data used to generate those values
mod2_pred_df <- cbind(xnew, pred.res)
mod2_pred_df
We now have a data-frame that can easily be taken by ggplot
ggplot(mod2_pred_df, aes(x = gdp, y = fit,
ymin = lwr, ymax = upr)) +
geom_line() +
geom_ribbon(alpha = 0.2)+
labs(y = "Democracy", x = "GDP")+
theme_bw()
What about oil?
xnew <- expand.grid(gdp = mean(rossdata_short$gdp),
oil = seq(0,11),
oecd = mean(rossdata_short$oecd))
pred.res <- predict(model2, newdata = xnew, interval = "confidence")
mod2_pred_df <- cbind(xnew, pred.res)
ggplot(mod2_pred_df, aes(x = oil, y = fit,
ymin = lwr, ymax = upr)) +
geom_line() +
geom_ribbon(alpha = 0.2)+
labs(y = "Democracy", x = "Oil")+
theme_bw()
Let’s get predictions from the model with the interaction.
summary(model3)
xnew <- expand.grid(gdp = mean(rossdata_short$gdp),
oil = seq(0,11),
oecd = c(0,1))
pred.res <- predict(model3, newdata=xnew, interval="confidence")
mod3_pred_df <- cbind(xnew, pred.res)
Now we can ggplot it
ggplot(mod3_pred_df, aes(x = oil , y = fit,
ymin = lwr, ymax = upr,
color = factor(oecd),
fill = factor(oecd))) +
geom_line() +
geom_ribbon( alpha = 0.7) +
labs(title = "Predicted Democracy by GDP and OECD membership",
y = "Predicted Value of Democracy",
x = "Oil") +
scale_fill_discrete(name = "OECD",
labels = c("Non Member","Member")) +
scale_color_discrete(name = "OECD",
labels = c("Non Member","Member")) +
theme_bw()
Science should be open! Here at Cornell and everywhere, this lab is released under a Creative Commons Attribution-ShareAlike 3.0 Unported.