Learning Objectives

Installing R

Install R and RStudio

Orientation with RStudio

R is the name of the programming language, and RStudio is a convenient and widely used interface to that language.

Since you will be using it for the remainder of the course, you should familiarize yourself with the RStudio GUI.

RStudio GUI

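It consists of four windows: the source editor, the console, the environment/history pane, and a pane for files, plots, packages, and help.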

RStudio documentation can be found at http://www.rstudio.com/ide/docs/. Of those, the most likely to be useful to you are:

  1. Play with the settings
    • Go to Tools > Global Options.
      • Change the font and color of the editor and console. Which one do you like the best?
      • Can you guess why the order of the panels in my session is different?

Working Directory and R Projects

R Projects

Keeping all the files associated with a project (input data, R scripts, analytical results, figures) organized together is such a wise and common practice that RStudio has built-in support for it via its projects. Read this for more information about RStudio projects.

You will use RStudio projects for your labs, homeworks, and final paper. Create an RStudio project that you will use for all your labs.

  • File -> New Project
  • Select “New Directory”
  • Select “Empty Project”
  • Enter a name for your project in “Directory Name”. Then choose where to put this directory with “Create project as sub-directory of”. Don’t worry about the other options.
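Opening a project also sets R's working directory to the project folder, so files saved there (such as the data file used later in this lab) can be read with relative paths. A minimal sketch for checking this from the console; the path printed depends on where you created your project:

```r
# Print the current working directory; inside an RStudio project
# this should be the project folder
getwd()

# List the files R can see in the working directory
list.files()
```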

Creating your first R Markdown Document

For this course, you will be using R Markdown documents for homeworks. Create your first R Markdown document.

Cheat sheets and additional resources about R Markdown are available at http://rmarkdown.rstudio.com/.

Using R as a calculator

Although it is so much more, you can use R as a calculator. For example, to add, subtract, multiply or divide:

2 + 3
2 - 3
2 * 3
2 / 3

Powers are calculated with ^; for example, \(4^2\) is

4 ^ 2

R includes many standard math functions. For example, the square root function is sqrt(); \(\sqrt{2}\) is

sqrt(2)
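A few other built-in math functions work the same way. The last two lines show a common surprise: computers store most decimal numbers only approximately, so comparing results with == can fail, while all.equal() compares up to a small tolerance.

```r
exp(1)                      # e, the base of the natural logarithm
log(100)                    # natural logarithm
log10(100)                  # base-10 logarithm
abs(-7)                     # absolute value
round(3.14159, digits = 2)  # round to 2 decimal places

sqrt(2)^2 == 2                  # FALSE: floating-point rounding
isTRUE(all.equal(sqrt(2)^2, 2)) # TRUE: equal up to a small tolerance
```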

And you can combine many of them together

(2 * 4 + 3) / 10
sqrt(2 * 2)
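When you combine operations, R follows the usual order of operations: ^ comes before * and /, which come before + and -, and parentheses override that order. The integer-division (%/%) and remainder (%%) operators are not mentioned above but are also part of base R:

```r
2 + 3 * 4     # 14: multiplication happens first
(2 + 3) * 4   # 20: parentheses are evaluated first
-2 ^ 2        # -4: ^ binds more tightly than the minus sign
7 %/% 2       # 3: integer division
7 %% 2        # 1: remainder (modulo)
```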

Variables and Assignment

In R, you can save the results of calculations into objects that you can use later. This is done with the assignment operator, <-. For example, this saves the result of 2 + 2 to an object named foo:

foo <- 2 + 2

You can see that foo is equal to 4

foo

And you can reuse foo in other calculations,

foo + 3
foo / 2 * 8 + foo
Note:

You can use = instead of <- for assignment. You may see this in some other code. There are some technical reasons to use <- instead of =, but the primary reason we will use <- instead of = is that this is the convention used in modern R programs.
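A minimal sketch of the two styles. At the top level they behave the same; one of the technical differences shows up inside a function call, where = names an argument rather than creating a variable. The names bar, baz, and qux are arbitrary:

```r
bar <- 5
baz = 5
identical(bar, baz)   # TRUE: both assign 5

# Inside a call, <- still assigns: qux is created in your workspace
# and its value is passed to median()
median(qux <- c(1, 5, 9))
qux

# Here = only matches median()'s argument x; no new variable is created
median(x = c(1, 5, 9))
```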

  1. Creating “Objects”
    • Create a variable named whatever strikes your fancy and set it equal to the square root of 2.
    • Then multiply it by 4.
    • Multiply the variable by itself and reassign it to the same variable name. What happens?

Comments

Any R code following a hash (#) is not executed. These are called comments, and can and should be used to annotate and explain your code. For example, this doesn’t do anything.

#thisisacomment 

And in this, nothing after the # is executed,

#this is still a comment
2 + 2 # this is also a comment
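Comments are most useful when they explain what a line is for. A # is also a quick way to switch a line off without deleting it. A small sketch (the radius and area variables are just for illustration):

```r
# Area of a circle with radius 3
radius <- 3
area <- pi * radius ^ 2   # pi is a built-in constant in R
# area <- 0               # commented out, so this line never runs
area
```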

Challenge: What is this equal to?

5 * 4 # + 3 # - 8

Missing Data

Missing data is particularly important. In R, missing values are represented by NA:

foo <- c(1, 2, NA, 3, 4)

The function na.omit() is particularly useful: applied to a data frame, it removes every row that has a missing value in any column.

For example:

dfrm <- data.frame(x = c(NA, NA, 4, 3), 
                   y = c(NA, NA, 7, 8)
                   )

dfrm

na.omit(dfrm)
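To see which rows na.omit() keeps, you can use complete.cases(), a base R function that returns TRUE for each row with no missing values. A short sketch using the same dfrm data frame:

```r
# Same data frame as above
dfrm <- data.frame(x = c(NA, NA, 4, 3),
                   y = c(NA, NA, 7, 8))

complete.cases(dfrm)          # FALSE FALSE TRUE TRUE
dfrm[complete.cases(dfrm), ]  # the same rows that na.omit(dfrm) keeps
nrow(na.omit(dfrm))           # 2 rows remain
```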
  1. Dealing with NA’s
    • What is the result of 2 + NA?
    • What is the result of mean(foo)?
    • Look at the documentation of mean to change how that function handles missing values.
    • How does median(foo) work?
    • What does foo > 2 return? Are all the entries TRUE or FALSE?
    • What does is.na(foo) do? What about ! is.na(foo) ?
    • What does foo[! is.na(foo)] do?
    • Try the following:
    dfrm2 <- data.frame(x = c(NA, 2, NA, 4), y = c(NA, NA, 7, 8))
    
    dfrm2
    
    na.omit(dfrm2)
    • Did you keep all the data? Did you do something wrong?

Loading Data into R

For the remainder of this lab you will be using a dataset of GDP per capita and life expectancy from Gapminder.

Download the csv (“comma-separated values”) from here.

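Then load the file into R. read.csv() looks for the file relative to your working directory, so put gapminder.csv in your project folder first: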

gapminder <- read.csv("gapminder.csv", stringsAsFactors = FALSE)

This creates a data frame. A data frame is a type of R object that corresponds to what you usually think of as a dataset or a spreadsheet — rows are observations and columns are variables.
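To see how read.csv() turns comma-separated text into a data frame without downloading anything, you can pass the text directly through its text argument. The three rows below are made-up values for illustration only, not Gapminder data:

```r
# Hypothetical CSV content: a header line, then one row per observation
csv_text <- "country,year,lifeExp
A,2002,70.5
B,2002,65.0
C,2007,72.3"

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(toy)   # "data.frame"
nrow(toy)    # 3 rows (observations)
ncol(toy)    # 3 columns (variables)
```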

  1. Taking a look at your data
    • What happens when you do the following?
    gapminder
    • How much can you tell about the dataset from doing that?

This is a lot of information. How can we get a more useful picture of the dataset as a whole?

dim(gapminder)
names(gapminder)
head(gapminder)
tail(gapminder)
summary(gapminder)
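Another compact overview is str(), which lists each column with its type and first few values. A sketch on a small hypothetical data frame (values made up), so the output stays short:

```r
toy <- data.frame(country = c("A", "B", "C"),
                  year = c(2002L, 2002L, 2007L),
                  lifeExp = c(70.5, 65.0, 72.3),
                  stringsAsFactors = FALSE)

str(toy)             # one line per column: name, type, first values
sapply(toy, class)   # the class of each column
```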
  1. Given this, let’s try again:
    • What are the variables in the dataset?
    • How many observations are there?
    • What is the unit of observation?
    • What types of data are the different variables?
    • What is the range of years in the data?
    • What are the mean and median life expectancy?

Working with variables in Data Frames

You can extract single variables (or columns) and perform different operations on them. To extract a variable, we use the dollar sign ($) extraction operator.

gapminder$lifeExp
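The $ operator is a shortcut for [[ ]] with a column name; the [[ ]] form is useful when the name is stored in a variable. A sketch with a hypothetical toy data frame:

```r
toy <- data.frame(country = c("A", "B"), lifeExp = c(70.5, 65.0))

toy$lifeExp           # extract a column by name with $
toy[["lifeExp"]]      # the same column with [[ ]]

column_name <- "lifeExp"
toy[[column_name]]    # works with a name stored in a variable
```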

Again, a summary may be more useful than printing every value. We can also compute more specific statistics on this variable alone:

mean(gapminder$lifeExp)



median(gapminder$lifeExp)
sd(gapminder$lifeExp)
min(gapminder$lifeExp)
max(gapminder$lifeExp)
quantile(gapminder$lifeExp)
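By default, quantile() returns the minimum, the 25th, 50th, and 75th percentiles, and the maximum, so its 50% value matches median(). A quick check on a small vector:

```r
x <- c(1, 2, 3, 4, 5)
quantile(x)                         # 0%, 25%, 50%, 75%, 100%: 1 2 3 4 5
quantile(x)[["50%"]] == median(x)   # TRUE
```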
  1. Descriptive Statistics
    • What are the mean and median of GDP per capita?
    • What is the 30th percentile of GDP per capita?
    • How many countries are there in the dataset? How many years?
      • The function length() calculates the length of a vector.
      • The function unique() returns the distinct values in a vector (not their count).
    • Choose a variable and multiply it by 2. What happens? Check the variable in the data set to see if it changed.

On your own

Make sure your lab compiles neatly. Make sure that you are not printing unnecessary output.

You can find the R Markdown code that I used to create this document on the class website. Download it and look at how I keep the code clean, as well as the Markdown I use throughout the text (e.g., to create lists and other styling).


footnotes:


Science should be open! Here at Cornell and everywhere, this lab is released under a Creative Commons Attribution-ShareAlike 3.0 Unported License.