R for reproducible scientific analysis
Introduction to R and RStudio
Learning Objectives
- To gain familiarity with the RStudio IDE
Introduction to RStudio
Welcome to the R portion of the Software Carpentry workshop.
The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis. You will learn some of the fundamentals of the R language, but our real goal is for you to learn to conduct analyses efficiently and to do so in a way that is reproducible (by you and others). We use R in our lessons because:
- we have to use something for examples;
- it’s free, well-documented, and runs almost everywhere;
- it has a large (and growing) user base among scientists; and
- it has a large library of external packages available for performing diverse tasks.
We’ll be using RStudio: a free, open-source integrated development environment (IDE) for R. It provides a built-in editor, works on all platforms (including on servers), and offers many advantages, such as integration with version control and project management.
Basic layout
When you first open RStudio, you will be greeted by three panels:
- The interactive R console (entire left)
- Workspace/History (tabbed in upper right)
- Files/Plots/Packages/Help (tabbed in lower right)
Once you open files, such as R scripts, an editor panel will also open in the top left.
Workflow within RStudio
There are two main ways one can work within RStudio.
- Test and play within the interactive R console, then copy code into a .R file to run later.
  - This works well when doing small tests and initially starting off.
  - It quickly becomes laborious.
- Start writing in a .R file and use RStudio’s command or keyboard shortcut to push the current line, selected lines, or modified lines to the interactive R console.
  - This is a great way to start; all your code is saved for later.
  - You will be able to run the file you create from within RStudio or using R’s source() function.
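As a minimal sketch of the second workflow, the snippet below writes a two-line script to a temporary file and then runs it with source(). The temporary file and the names script_path, x, and x_mean are illustrative assumptions standing in for a script you would save from the RStudio editor.

```r
# Create a small script file. In practice you would write and save this
# in the RStudio editor; a temporary file is used here for illustration.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "x_mean <- mean(x)"
), script_path)

# source() reads the file and runs each line, as if typed at the console.
source(script_path)

# Objects created by the script are now available in the session.
x_mean
# [1] 2
```

Because the code lives in a file, the same analysis can be rerun later by calling source() again.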
Introduction to R
Much of your time in R will be spent in the R interactive console. This is where you will run all of your code, and it can be a useful environment for trying out ideas before adding them to an R script file. The console in RStudio is the same as the one you would get if you just typed R at your command line.
The first thing you will see in the R interactive session is a bunch of information, followed by a “>” prompt and a blinking cursor. In many ways this is similar to the shell environment you learned about during the shell lessons. It operates on the same idea of a “read, evaluate, print loop”: you type in commands, R tries to execute them, and then it returns a result.
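A short illustration of that loop, with the lines as they could be typed at the “>” prompt. The variable name x is an arbitrary choice for this sketch.

```r
# An expression is read, evaluated, and its value is printed.
1 + 100
# [1] 101

# An assignment is evaluated too, but prints nothing.
x <- 3 * 4

# Typing the name of an object prints its value.
x
# [1] 12
```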
R Packages
It is possible to add functions to R by obtaining a package written by someone else. As of this writing, there are over 7,000 packages available on CRAN (the Comprehensive R Archive Network). R and RStudio have functionality for managing packages:
- You can see what packages are installed by typing installed.packages()
- You can install packages by typing install.packages("packagename"), where packagename is the package name, in quotes.
- You can update installed packages by typing update.packages()
- You can remove a package with remove.packages("packagename")
- You can make a package you have installed available for use in your session with library(packagename)
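The sketch below shows two of these functions working together. It uses the stats package only because it ships with every R installation, so the example should run without installing anything. The variable name pkgs is an illustrative choice.

```r
# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
pkgs <- rownames(installed.packages())

# Check whether a particular package is installed.
"stats" %in% pkgs

# library() makes an installed package available in the current session.
library(stats)
```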
Install the following packages if you have not done so already: ggplot2, plyr, and gapminder.
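One possible way to do this is to install only the packages that are missing. The helper missing_packages() below is a hypothetical function written for this sketch, not part of R. The interactive() guard is an added assumption so that running the code in a non-interactive session does not trigger downloads.

```r
# Hypothetical helper: return the wanted packages that are not yet installed.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(c("ggplot2", "plyr", "gapminder"))

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Calling install.packages(c("ggplot2", "plyr", "gapminder")) directly also works; it simply reinstalls any package that is already present.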