Due via Canvas: Fri, March 4
Instructions
- Create a new RMarkdown document called hw1_lastname (where lastname should be your actual last name, of course).
- Do all your analysis in the RMarkdown document.
- Compile an HTML document.
- Submit the HTML and RMarkdown files through Canvas. Together, they should contain all the code and materials another person needs to run your RMarkdown file.
- Try to work on your own as much as possible. Troubleshooting as a group is often helpful, but please try to push yourself as much as possible.
- Do not use new packages that we haven’t used in class/labs. I know some additional packages will make things easier but that’s not the point.
Some other guidance
- All problems should be answerable in at most a few lines of R code. Questions which require looking up values should be answered using R code and not manually checking the value through the RStudio GUI.
- Problems are thematically divided, but each bullet point should be seen as a separate exercise.
- Do not print unnecessary output. You will be penalized for printing long strings of unnecessary output. These reports should be clean and as concise as possible.
- You can use additional options in each code chunk to control how much output is rendered in the HTML document.
- You can find more information on how to control display options here
- Use ggplot2 for your plots. Some of the plots in the problem set might be easier using base R, but the purpose of the problem set is to use the skills we are learning.
- Like everything else in the world of coding, there are multiple ways to do this: some simpler (requiring only one or two verbs or lines of code), others more complex, where you might need to combine multiple verbs and perhaps do some googling. I actually want you to do this.
- When writing interpretations for the questions, use markdown text; do not use comments inside the code chunks.
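As a rough illustration of the guidance on keeping reports clean, the sketch below shows one possible setup chunk. It assumes the knitr package that RMarkdown relies on; the data frame `toy` is made up for this example and is not part of the assignment data.

```r
# Minimal sketch of a setup chunk; assumes the knitr package is installed.
library(knitr)

# Defaults applied to every later chunk: show the code,
# but hide package messages and warnings in the rendered HTML.
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# Toy data frame (hypothetical, for illustration only).
toy <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# Print a few rows instead of the whole object.
head(toy, 3)
```

Individual chunks can override these defaults in their own header, so this is only one of several reasonable ways to limit output.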
Data
The file democracy.csv contains data from Przeworski et al., Democracy and Development: Political Institutions and Well-Being in the World, 1950-1990 [1]. The data have been slightly recoded so that higher values indicate higher levels of political liberty and democracy.
| Variable | Description |
|---|---|
| COUNTRY | numerical code for each country |
| CTYNAME | name of each country |
| REGION | name of region containing country |
| YEAR | year of observation |
| GDPW | GDP per capita in real international prices |
| EDT | average years of education |
| ELF60 | ethnolinguistic fractionalization |
| MOSLEM | percentage of Muslims in country |
| CATH | percentage of Catholics in country |
| OIL | whether oil accounts for 50+% of exports |
| STRA | count of recent regime transitions |
| NEWC | whether country was created after 1945 |
| BRITCOL | whether country was a British colony |
| POLLIB | degree of political liberty (1–7 scale, rising in political liberty) |
| CIVLIB | degree of civil liberties (1–7 scale, rising in civil liberties) |
| REG | presence of democracy (0=non-democracy, 1=democracy) |
Problems
- Initial set up
- Load the Democracy dataset into memory as a dataframe. Use the read.csv function and the stringsAsFactors = FALSE option. Note that missing values are indicated by “.” in the data. Find the option in read.csv that controls the string used to indicate missing values.
- Initial data exploration
- Report summary statistics (means and medians, at least) for all variables.
- Create a histogram for political liberties.
- Now, create a histogram for political liberties in which each unique value of the variable is in its own panel. What is new in this plot as compared to the previous one?
- Create a histogram for GDP per capita.
- Create a histogram for log GDP per capita. How is this histogram different than the one for GDP per capita when it was not logged?
- Explore relationships
- Create a scatterplot of political liberties against GDP per capita. That is, political liberties is the dependent variable.
- When there is a lot of overlap in a scatterplot, it is useful to “jitter” the points (randomly move them up and down). Make the previous plot but jitter the points to mitigate the problem of overplotting. (Only jitter the points vertically.) You can use geom_jitter in ggplot2 for this.
- Create a scatterplot of political liberties against log GDP per capita. Jitter the points. How is the relationship different than when GDP per capita was not logged?
- Create a boxplot of GDP per capita for oil-producing and non-oil-producing nations. Make sure to show both groups in a single graph.
- Add a substantive interpretation to this graph.
- Now, create a graph with boxplots of each region’s GDP per capita in which oil-producing and non-oil-producing nations are shown in different colors.
- Add a substantive interpretation to this graph. How does it compare to the previous graph?
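To illustrate what vertical-only jittering looks like, here is a minimal sketch on made-up data (not the democracy data). The data frame `toy` and its columns `x` and `y` are hypothetical; the `width` and `height` arguments of geom_jitter control how far points are moved in each direction.

```r
library(ggplot2)

# Toy data (made up): a continuous predictor and a discrete 1-7 outcome.
set.seed(1)
toy <- data.frame(
  x = runif(200, min = 0, max = 10),
  y = sample(1:7, size = 200, replace = TRUE)
)

# width = 0 leaves the x positions untouched;
# height = 0.2 moves each point up or down by a small random amount.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_jitter(width = 0, height = 0.2, alpha = 0.5)
p
```

The jitter amount of 0.2 is an arbitrary choice for this sketch; a suitable value depends on the spacing of the variable being plotted.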
- Transform data and analyze
- Calculate the mean GDP per capita in countries with at least 40 percent Catholics. How does it compare to mean GDP per capita for all countries?
- Calculate the average GDP per capita in countries with greater than 60% ethnolinguistic fractionalization, less than 60%, and missing ethnolinguistic fractionalization. Hint: you can calculate this with the dplyr verbs filter, mutate, group_by and/or summarise.
- What was the median of the average years of education in 1985 for all countries? One country is right at the median; which country is this?
- Which countries were closest to the median years of education in 1985 among all countries?
- What was the median of the average years of education in 1985 for democracies?
- Which democracy was (or democracies were) closest to the median years of education in 1985 among all democracies?
- What were the 25th and 75th percentiles of ethnolinguistic fractionalization for new and old countries?
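The dplyr verbs mentioned in the hint above can be chained together. The sketch below shows the general pattern on a made-up data frame; `toy` and all of its column names are hypothetical and unrelated to the assignment data.

```r
library(dplyr)

# Toy data (made up); size has one missing value on purpose.
toy <- data.frame(
  unit  = c("a", "b", "c", "d", "e", "f"),
  year  = c(1990, 1990, 1990, 1990, 1990, 1980),
  size  = c(2, 8, NA, 4, 9, 7),
  value = c(10, 20, 30, 40, 50, 60)
)

result <- toy %>%
  filter(year == 1990) %>%              # keep only rows from one year
  mutate(large = size > 5) %>%          # new logical column; NA stays NA
  group_by(large) %>%                   # FALSE, TRUE, and NA become groups
  summarise(mean_value = mean(value))   # one summary row per group
result
```

Note that group_by keeps rows with a missing grouping value together as their own group rather than dropping them.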
Notes:
1 Przeworski, Adam, Michael E. Alvarez, Jose Antonio Cheibub, and Fernando Limongi. 2000. Democracy and Development: Political Institutions and Well-Being in the World, 1950-1990. Cambridge University Press.