For this lab we will use the replication data from Michael Ross’s “The Oil Curse: How Petroleum Wealth Shapes the Development of Nations.”
We will be exploring the relationship between oil dependency and democracy.
This lab will use some libraries you’ve seen before and we should load them now plus broom
and coefplot
which you might need to install
library(tidyverse)
library(broom)
library(coefplot)
Read in the Ross data
rossdata <- read.csv("https://raw.githubusercontent.com/GarciaRios/govt_6029/master/data/ross_2012.csv", stringsAsFactors = FALSE)
head (rossdata)
glimpse(rossdata)
This is a pretty big data-set, we do not need all the variables let’s subset our data to include only the variables we will use.
cty
year
polity
logoil_gdpcap2000_sup_Q_1
logGDPcap
oecd
rossdata_short
Some of those names are too long, we should change it to something meaningful and short. This can be done easily using dplyr
and rename
.
rossdata_short <- rossdata_short %>%
rename(oil = logoil_gdpcap2000_sup_Q_1,
gdp = logGDPcap)
This data-frame is way easier to glimpse at:
glimpse(rossdata_short)
A lot of missing values here. Let’s omit them and then look at a summary of the data set
rossdata_short <- rossdata_short %>%
na.omit()
rossdata_short %>%
summary()
Finally! we are ready to start data-analyzing..
We are going to be exploring the relationship between Democracy level (polity) and other covariates.
Let’s explore these relationships with plots.
Okay, let’s do it together
We begin simple…
Unfortunately, a simple scatter plot makes it hard to detect any relationship. However, ggplot2
makes it easy to add different colors and shapes which might help identify trends.
It seems like the higher the GDP the more democratic countries are, except if you are a high oil producer or a non-OECD. Let’s explore these relationships using a regression.
model1 <- lm(polity ~ oil, data = rossdata_short)
summary(model1)
Let’s now include controls for GDP per capita and OECD membership
model2<-lm(polity ~ gdp + oil + oecd, data = rossdata_short)
summary(model2)
Recall the OECD membership clustering? Let’s try an interaction
model3 <- lm(polity ~ gdp + oil*oecd, data = rossdata_short)
summary(model3)
broom
has three main functions, all of which return data frames (not lists, numeric vectors, or other types of object). glance
returns a data frame with a single row summary of the model:
glance(model2)
tidy
returns a data frame with a row for each coefficient estimate:
tidy(model2)
augment
returns the original data frame used in the model with additional columns for fitted values, the standard errors of those fitted values, residuals, etc.
head(augment(model2))
How about a coefficient plot, roppeladder… etc.
ggplot(tidy(model2) %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate,
ymin = estimate - 2 * std.error,
ymax = estimate + 2 * std.error)) +
geom_pointrange() +
coord_flip()
We can also use coefplot
coefplot(model2, coefficients = c("oecd", "oil", "gdp"))
Several packages (stargazer
, texreg
, apsrtable
) are useful for creating publication type regression tables. stargazer
and texreg
are the most complete package. Both allow output to either LaTeX or HTML tables for many types of statistical models. We’ll use stargazer here:
library(stargazer)
stargazer(model1, model2, model3, type = "html")
We are going to use predict
to get predicted values. We first have to set up a newdata
xnew <- data.frame(
gdp = 5.9, oil = 0, oecd = 1)
predict(model2, newdata = xnew, interval = "confidence")
xnew2 <- data.frame(gdp = 5.9, oil = 0, oecd = 0)
predict(model2, newdata = xnew2, interval = "confidence")
What is this really doing?
model2
names(model2)
pe2<-model2$coefficients
pe2
1*pe2[1] + 5.9*pe2[2] + 0*pe2[3] + 1*pe2[4]
1*pe2[1] + 5.9*pe2[2] + 0*pe2[3] + 0*pe2[4]
We can create a matrix of hypothetical data to obtain predictions for a range of values:
xnew <- expand.grid(gdp = seq(4,11),
oil = 0,
oecd = 1)
xnew <- expand.grid(gdp = seq(4,11),
oil = mean(rossdata_short$oil),
oecd = mean(rossdata_short$oecd))
Now we feed this new data into predict
pred.res <- predict(model2, newdata = xnew, interval="confidence")
To plot these predicted values we have to create a data frame containing both the predicted values generated by predict and the data used to generate those values
mod2_pred_df <- cbind(xnew, pred.res)
mod2_pred_df
We now have a data-frame that can easily be taken by ggplot
ggplot(mod2_pred_df, aes(x = gdp, y = fit,
ymin = lwr, ymax = upr)) +
geom_line() +
geom_ribbon(alpha = 0.2)+
labs(y = "Democracy", x = "GDP")+
theme_bw()
What about oil?
xnew <- expand.grid(gdp = mean(rossdata_short$gdp),
oil = seq(0,11),
oecd = mean(rossdata_short$oecd))
pred.res <- predict(model2, newdata = xnew, interval = "confidence")
mod2_pred_df <- cbind(xnew, pred.res)
ggplot(mod2_pred_df, aes(x = oil, y = fit,
ymin = lwr, ymax = upr)) +
geom_line() +
geom_ribbon(alpha = 0.2)+
labs(y = "Democracy", x = "Oil")+
theme_bw()
Let’s get predictions from the model with the interaction.
summary(model3)
xnew <- expand.grid(gdp = mean(rossdata_short$gdp),
oil = seq(0,11),
oecd = c(0,1))
pred.res <- predict(model3, newdata=xnew, interval="confidence")
mod3_pred_df <- cbind(xnew, pred.res)
Now we can ggplot it
ggplot(mod3_pred_df, aes(x = oil , y = fit,
ymin = lwr, ymax = upr,
color = factor(oecd),
fill = factor(oecd))) +
geom_line() +
geom_ribbon( alpha = 0.7) +
labs(title = "Predicted Democracy by GDP and OECD membership",
y = "Predicted Value of Democracy",
x = "Oil") +
scale_fill_discrete(name = "OECD",
labels = c("Non Member","Member")) +
scale_color_discrete(name = "OECD",
labels = c("Non Member","Member")) +
theme_bw()
Science should be open! Here at Cornell and everywhere, this lab is released under a Creative Commons Attribution-ShareAlike 3.0 Unported.