Most of R’s functions are vectorized, meaning that the function operates on all elements of a vector at once, without you needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone. Recall that when we multiplied two columns of the gapminder data, R automatically paired up the values row by row.

x <- 1:4
x * 2
## [1] 2 4 6 8

The multiplication was applied to each element of the vector.
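The same element-wise pairing applies when both operands are vectors, which is what happens when two data frame columns are multiplied. A minimal sketch, using a small hypothetical data frame `toy` with made-up numbers rather than real gapminder values:

```r
# Hypothetical toy data standing in for two gapminder columns
toy <- data.frame(pop = c(10, 20, 30), gdpPercap = c(2, 3, 4))

# Element-wise: row 1 with row 1, row 2 with row 2, and so on
toy_gdp <- toy$pop * toy$gdpPercap
toy_gdp
## [1]  20  60 120

# A shorter vector is recycled: 1:2 is reused as 1, 2, 1, 2
y <- 1:4 * 1:2
y
## [1] 1 4 3 8
```

Recycling is convenient for a single value such as `x * 2`, but with longer vectors it can silently pair the wrong elements, so it is worth checking lengths.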

Comparison operators, logical operators, and many functions are also vectorized:

Comparison operators

a <- (x > 2)
a
## [1] FALSE FALSE  TRUE  TRUE
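Logical operators and many built-in functions follow the same pattern. A short sketch (`x` is redefined so the block runs on its own):

```r
x <- 1:4

# Logical AND is applied element by element
b <- x > 1 & x < 4
b
## [1] FALSE  TRUE  TRUE FALSE

# Math functions such as sqrt() return one result per element
sqrt(c(1, 4, 9))
## [1] 1 2 3

# sum() treats TRUE as 1, so this counts the elements greater than 2
n_big <- sum(x > 2)
n_big
## [1] 2
```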

Apply functions

The apply family of functions lets us apply a function to each item in a list or vector, in the same spirit as comparing every element of a vector to a value or multiplying two vectors pair by pair. Strictly speaking this is a hidden loop rather than true vectorization, but it saves us from writing the loop ourselves. There are several apply functions. lapply takes a list, applies the function to each item, and always returns a list. sapply is similar to lapply, except that it tries to simplify the output to the simplest sensible format, such as a vector. mapply applies a function over several inputs in parallel, so the function can take more than one argument.
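A minimal sketch of the three functions on a small hypothetical list, `nums`, independent of the gapminder data:

```r
nums <- list(a = 1:3, b = 4:6)

# lapply always returns a list, one element per input item
res_l <- lapply(nums, sum)
is.list(res_l)
## [1] TRUE

# sapply simplifies the same results to a named vector
res_s <- sapply(nums, sum)
res_s
##  a  b 
##  6 15 

# mapply walks through several vectors in parallel: 1+4, 2+5, 3+6
res_m <- mapply(function(p, q) p + q, 1:3, 4:6)
res_m
## [1] 5 7 9
```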

This example sets our baseline goal: a function that calculates the total population of a continent in a given year.

# Total population of one continent in one year
calc_continent_pop <- function(continent, year) {
  sum(gapminder[gapminder$year == year & gapminder$continent == continent, "pop"])
}
calc_continent_pop('Oceania', 2007)
## [1] 24549947
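The function works by building a logical vector from two vectorized comparisons joined by `&`, then using it to select rows. A minimal sketch of those steps on a small hypothetical data frame, `toy_gap`, which has gapminder-like column names but made-up values:

```r
# Hypothetical stand-in for gapminder (not real data)
toy_gap <- data.frame(
  continent = c("Oceania", "Oceania", "Europe", "Oceania"),
  year      = c(2007, 2007, 2007, 2002),
  pop       = c(100, 50, 400, 90),
  stringsAsFactors = FALSE
)

# TRUE only for rows that match both the year and the continent
keep <- toy_gap$year == 2007 & toy_gap$continent == "Oceania"
keep
## [1]  TRUE  TRUE FALSE FALSE

# Summing the selected pop values gives the continent total
total <- sum(toy_gap[keep, "pop"])
total
## [1] 150
```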

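Here I have simplified the function so that the year is fixed at 2007 and the continent is the only argument. Because lapply passes each item of the list as the first argument of the function, this one-argument version can be applied directly to every continent. I first checked that it worked for one continent before applying it to all of them.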

# Same calculation, with the year fixed at 2007
calc_continent_pop <- function(continent) {
  sum(gapminder[gapminder$year == 2007 & gapminder$continent == continent, "pop"])
}
calc_continent_pop('Oceania')
## [1] 24549947
results <- lapply(unique(gapminder$continent), calc_continent_pop)
names(results) <- unique(gapminder$continent)
results
## $Asia
## [1] 3811953827
## 
## $Europe
## [1] 586098529
## 
## $Africa
## [1] 929539692
## 
## $Americas
## [1] 898871184
## 
## $Oceania
## [1] 24549947
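Fixing the year inside the function is not the only option: lapply (and sapply) forward any extra named arguments to the function through `...`, so a two-argument function can be used unchanged. A sketch using a hypothetical `toy_gap` data frame and a hypothetical `toy_continent_pop()` that mirrors the two-argument `calc_continent_pop()`:

```r
# Hypothetical toy data (not real gapminder values)
toy_gap <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(10, 12, 7, 8),
  stringsAsFactors = FALSE
)

# Same logic as the two-argument calc_continent_pop()
toy_continent_pop <- function(continent, year) {
  sum(toy_gap[toy_gap$year == year & toy_gap$continent == continent, "pop"])
}

continents <- unique(toy_gap$continent)
# year = 2007 is passed on to every call through lapply's ...
res <- lapply(continents, toy_continent_pop, year = 2007)
names(res) <- continents
# res$Asia is 12 and res$Europe is 8
```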

Here you see that sapply outputs a vector (to which we have added names), rather than the list output by lapply.

results <- sapply(unique(gapminder$continent), calc_continent_pop)
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 3811953827  586098529  929539692  898871184   24549947
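When the input is a character vector, sapply uses it as names by default (`USE.NAMES = TRUE`), so the separate `names()` step matters mainly when the continents are stored as a factor, for example when the data were read with `stringsAsFactors = TRUE`. A sketch with hypothetical continent names and a stand-in function `name_length()`:

```r
continents <- c("Asia", "Europe")
# Hypothetical stand-in function: number of characters in each name
name_length <- function(continent) nchar(continent)

# Character input: sapply() reuses the input as names
named <- sapply(continents, name_length)
named
##   Asia Europe 
##      4      6 

# USE.NAMES = FALSE switches this off
unnamed <- sapply(continents, name_length, USE.NAMES = FALSE)
unnamed
## [1] 4 6
```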

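To select a particular year, we go back to our original two-argument function and use mapply instead. Note the different argument order: mapply takes the function first, followed by the inputs to iterate over, whereas lapply and sapply take the data first. The single year 2007 is recycled, so it is used for every continent.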

calc_continent_pop <- function(continent, year) {
  sum(gapminder[gapminder$year == year & gapminder$continent == continent, "pop"])
}
results <- mapply(calc_continent_pop, unique(gapminder$continent), 2007)
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 3811953827  586098529  929539692  898871184   24549947
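mapply recycles shorter inputs, which is how one year can cover every continent. Its `MoreArgs` argument is an alternative that holds an argument fixed explicitly instead of relying on recycling. A sketch with a hypothetical stand-in function, `make_label()`, that builds a label rather than summing population:

```r
# Hypothetical stand-in for calc_continent_pop()
make_label <- function(continent, year) paste(continent, year)

continents <- c("Asia", "Europe", "Africa")

# 2007 (length 1) is recycled across all three continents
by_recycling <- mapply(make_label, continents, 2007)

# MoreArgs passes year = 2007 unchanged to every call
by_moreargs <- mapply(make_label, continents, MoreArgs = list(year = 2007))

# Both give the same named character vector, e.g. "Europe 2007" for Europe
```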

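The year input does not have to be a single value: it can be a vector (or list) with one year per continent. mapply then pairs the inputs by position, calculating the first continent's total for the first year, the second continent's total for the second year, and so on. With that in mind, what is the following calculating?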

results <- mapply(calc_continent_pop, unique(gapminder$continent), c(1952, 1957, 1962, 1967, 1972))
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 1395357352  437890351  296516865  480746623   16106100
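A small check of the pairing rule, using a hypothetical function `show_pair()` that only reports which continent and year it received. For every continent-year combination instead of pairs, one option is to build all combinations first with base R's `expand.grid()`:

```r
# Hypothetical helper: report the arguments of each call
show_pair <- function(continent, year) paste(continent, year)

# Pairing by position: 1st with 1st, 2nd with 2nd, 3rd with 3rd
paired <- mapply(show_pair, c("Asia", "Europe", "Africa"), c(1952, 1957, 1962),
                 USE.NAMES = FALSE)
paired
## [1] "Asia 1952"   "Europe 1957" "Africa 1962"

# All combinations: expand.grid() varies the first column fastest
grid <- expand.grid(continent = c("Asia", "Europe"), year = c(1952, 1957),
                    stringsAsFactors = FALSE)
all_pairs <- mapply(show_pair, grid$continent, grid$year, USE.NAMES = FALSE)
all_pairs
## [1] "Asia 1952"   "Europe 1952" "Asia 1957"   "Europe 1957"
```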

Challenge

Write a function that takes the gapminder dataset and calculates the mean GDP per capita of a given continent in a given year, weighting each country by its population. Apply it to all continents and all years. Hint: first use mapply to get the result for every continent in a single year, then use sapply to apply that step to the list of years.
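For the weighting step, a population-weighted mean divides total GDP (the sum of `gdpPercap * pop`) by total population. A sketch of that single step on made-up numbers (not gapminder values), checked against base R's `weighted.mean()`:

```r
# Hypothetical values for three countries in one continent and year
gdpPercap <- c(1000, 2000, 4000)
pop       <- c(10, 30, 60)

# Total GDP divided by total population
by_hand <- sum(gdpPercap * pop) / sum(pop)
by_hand
## [1] 3100

# Base R's weighted.mean() gives the same result
by_builtin <- weighted.mean(gdpPercap, pop)
```

The unweighted `mean(gdpPercap)` would be about 2333 here, because it lets the two small countries count as much as the large one.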