Most of R’s functions are vectorized, meaning that the function operates on all elements of a vector without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone. Remember how, when we multiplied two columns of the gapminder data, R automatically paired the values?
x <- 1:4
x * 2
## [1] 2 4 6 8
The multiplication happened to each element of the vector.
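The same pairing applies when both operands are vectors. The sketch below uses a hypothetical second vector y (not part of the lesson data) and compares the vectorized form with the explicit loop it replaces.

```r
# Two vectors of equal length are combined element by element.
x <- 1:4
y <- c(10, 20, 30, 40)   # hypothetical second vector, for illustration only
x * y
## [1]  10  40  90 160

# The explicit loop that the vectorized expression replaces.
result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- x[i] * y[i]
}
identical(result, x * y)
## [1] TRUE
```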
Comparison operators, logical operators, and many functions are also vectorized:
Comparison operators
a <- (x > 2)
a
## [1] FALSE FALSE  TRUE  TRUE
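Logical operators and many built-in functions behave the same way. A minimal sketch, reusing the vector x from above; sqrt and sum are chosen here only as familiar examples.

```r
x <- 1:4

# Logical operators combine logical vectors element by element.
(x > 1) & (x < 4)
## [1] FALSE  TRUE  TRUE FALSE

# Many built-in functions accept a whole vector at once.
sqrt(x)
## [1] 1.000000 1.414214 1.732051 2.000000

# A logical vector can be summed to count its TRUE values.
sum(x > 2)
## [1] 2
```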
We can take advantage of vectorization to apply a function to each item in a list. This is similar to comparing each item in a list to a value or multiplying each pair of values from two lists together. There are multiple apply functions. lapply takes a list, applies the function to each item, and returns a list. sapply is similar to lapply except that it attempts to simplify the result to the simplest output format, such as a vector. mapply applies the function to several inputs in parallel, taking one item from each input at a time.
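Before turning to the gapminder data, the difference between the three functions can be seen on a toy input. This is a minimal sketch; the list scores is hypothetical and exists only for illustration.

```r
# A hypothetical list of numeric vectors, used only for illustration.
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply always returns a list, with one element per input item.
as_list <- lapply(scores, mean)
is.list(as_list)
## [1] TRUE

# sapply simplifies the same result to a named numeric vector.
sapply(scores, mean)
##  a  b  c 
##  2 15  5 

# mapply walks along several inputs in parallel:
# here it calls rep(1, 3), rep(2, 2), and rep(3, 1).
repeated <- mapply(rep, 1:3, 3:1)
lengths(repeated)
## [1] 3 2 1
```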
This example provides our baseline goal: a function that allows us to calculate the sum of the population on a continent in a given year.
# Sum the population of all countries on one continent in one year.
calc_continent_pop <- function(continent, year) {
  sum(gapminder[gapminder$year == year & gapminder$continent == continent, "pop"])
}
calc_continent_pop('Oceania', 2007)
## [1] 24549947
Here I have simplified the function to assume the year 2007. This allows us to use lapply to calculate the sum of the population on each continent. I first checked that it worked for one continent before applying it to all of them.
calc_continent_pop <- function(continent) {
  sum(gapminder[gapminder$year == 2007 & gapminder$continent == continent, "pop"])
}
calc_continent_pop('Oceania')
## [1] 24549947
results <- lapply(unique(gapminder$continent), calc_continent_pop)
names(results) <- unique(gapminder$continent)
results
## $Asia
## [1] 3811953827
##
## $Europe
## [1] 586098529
##
## $Africa
## [1] 929539692
##
## $Americas
## [1] 898871184
##
## $Oceania
## [1] 24549947
Here you see that sapply outputs a vector (to which we have added names), rather than the list output by lapply.
results <- sapply(unique(gapminder$continent), calc_continent_pop)
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 3811953827  586098529  929539692  898871184   24549947 
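When the input to sapply is a character vector rather than a factor, sapply names the result on its own, so the separate names() step is not needed. The sketch below assumes a small hypothetical data frame toy in place of gapminder. It also shows vapply, a stricter relative of sapply that checks the type of every result.

```r
# A small hypothetical data frame standing in for gapminder.
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(2007, 2007, 2007, 2007),
  pop       = c(100, 200, 30, 40),
  stringsAsFactors = FALSE
)

# Same idea as calc_continent_pop, but for the toy data.
toy_continent_pop <- function(continent) {
  sum(toy[toy$year == 2007 & toy$continent == continent, "pop"])
}

# With a character vector as input, sapply names the result automatically.
by_sapply <- sapply(unique(toy$continent), toy_continent_pop)
by_sapply
##   Asia Europe 
##    300     70 

# vapply does the same but checks that every result is a single number.
by_vapply <- vapply(unique(toy$continent), toy_continent_pop, numeric(1))
identical(by_sapply, by_vapply)
## [1] TRUE
```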
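To select a particular year we go back to our original function. We can now use mapply. Note the rearrangement of the arguments: mapply takes the function first, followed by the inputs to pass to it. The single value 2007 is reused for every continent.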
calc_continent_pop <- function(continent, year) {
  sum(gapminder[gapminder$year == year & gapminder$continent == continent, "pop"])
}
results <- mapply(calc_continent_pop, unique(gapminder$continent), 2007)
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 3811953827  586098529  929539692  898871184   24549947 
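Note that the year argument passed to mapply may also be a vector or list with one item per continent. In that case the inputs are paired by position: the sum for the first continent is calculated for the first year only, the sum for the second continent for the second year only, and so on. What is the following calculating?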
results <- mapply(calc_continent_pop, unique(gapminder$continent), c(1952, 1957, 1962, 1967, 1972))
names(results) <- unique(gapminder$continent)
results
##       Asia     Europe     Africa   Americas    Oceania 
## 1395357352  437890351  296516865  480746623   16106100 
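One way to see how mapply pairs its inputs is to swap in a function that only reports the arguments it was called with. In this sketch, show_pair, continents, and years are hypothetical names introduced for illustration. The outer() call at the end is one possible way to cover every combination instead of only the paired ones, and it relies on the function being vectorized.

```r
# A hypothetical stand-in function that just reports its arguments.
show_pair <- function(continent, year) paste(continent, year)

continents <- c("Asia", "Europe", "Africa")
years <- c(1952, 1957, 1962)

# mapply pairs the inputs by position: (Asia, 1952), (Europe, 1957), ...
pairs <- mapply(show_pair, continents, years, USE.NAMES = FALSE)
pairs
## [1] "Asia 1952"   "Europe 1957" "Africa 1962"

# outer() calls a vectorized function on every combination instead,
# giving one row per continent and one column per year.
all_pairs <- outer(continents, years, show_pair)
dim(all_pairs)
## [1] 3 3
```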
Write a function that takes the gapminder dataset and calculates the mean GDP per capita for a given year, weighted by the size of the population. Apply this to all continents and all years. Hint: first use mapply for a single year, then use sapply to apply that function to the list of years.
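As a starting point for the weighting step only, the sketch below contrasts a plain mean with a population-weighted mean. The two countries and their values are made up for illustration. weighted.mean is part of base R.

```r
# Hypothetical values for two countries: GDP per capita and population.
gdp_per_cap <- c(1000, 3000)
pop <- c(3e6, 1e6)

# The unweighted mean treats both countries equally.
mean(gdp_per_cap)
## [1] 2000

# The weighted mean counts each country in proportion to its population.
weighted.mean(gdp_per_cap, pop)
## [1] 1500

# The same quantity written out: total GDP divided by total population.
sum(gdp_per_cap * pop) / sum(pop)
## [1] 1500
```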