Get Data

USGS North American Breeding Bird Survey
shortname: breed-bird-survey
reference: http://www.pwrc.usgs.gov/BBS/
citation: Pardieck, K.L., D.J. Ziolkowski Jr., M.-A.R. Hudson. 2015. North American Breeding Bird Survey Dataset 1966 - 2014, version 2014.0. U.S. Geological Survey, Patuxent Wildlife Research Center
description: A cooperative effort between the U.S. Geological Survey’s Patuxent Wildlife Research Center and Environment Canada’s Canadian Wildlife Service to monitor the status and trends of North American bird populations.

Obtained using Data Retriever: http://www.data-retriever.org/. For details on data retrieval, see the Rmd file.

# information about species
spp <- read.csv("breed_bird_survey_species.csv")

# species count data from surveys; check.names = FALSE keeps the numeric species codes as column names
spp_counts_10 <- read.csv("breed_bird_survey_counts_10years_wide.csv", check.names = FALSE)

Tidy Data

How easily we can work with data depends on how it is organized. A layout that is easy for humans to read is not always the best one for a computer. Consider our bird survey. When a group goes out to do a survey, they record its location and date (route, state, etc.) and then the number of each species they observed. It seems logical to enter each survey as a single row, i.e. one “observation” per row, so that no information is repeated. Take a look at spp_counts_10.

Descriptions of the variables:
ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/MetaData-NorthAmericanBreedingBirdSurvey_Dataset_1966-2015_version_2015_1.xml

However, if you consider each species count to be an observation, this layout puts many observations in each row. We call this “wide” format. Our analyses tend to work better in “long” format, with one count per row. To convert between formats we use the tidyverse collection of packages, which includes ggplot2, dplyr, and tidyr. We use the gather function in tidyr to convert from wide to long (newer versions of tidyr also provide pivot_longer for the same job).

library(tidyverse)
spp_counts_10l <- spp_counts_10 %>% gather(`10`:`22860`, key = "aou", value = "speciestotal", na.rm = TRUE)

Now each species is counted separately for each survey. There is only one count per row. Below you will see how this enables better data analysis.
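To make the reshaping concrete, here is a minimal sketch on a made-up table. The data frame `wide` and its values are hypothetical and only mimic the layout of spp_counts_10 (survey columns followed by one column per species code); the gather call uses the same arguments as above.

```r
library(tidyr)

# Hypothetical wide table: one row per survey, one column per species code.
wide <- data.frame(
  route = c(1, 2),
  year  = c(2000, 2000),
  `10`  = c(3, NA),
  `20`  = c(5, 1),
  check.names = FALSE
)

# Stack the species columns into key/value pairs. na.rm = TRUE drops
# species that were not recorded on a given survey.
long <- gather(wide, `10`:`20`, key = "aou", value = "speciestotal", na.rm = TRUE)

print(long)
```

The two surveys become three rows, one per recorded count, and the former column names end up in the aou column as character strings.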

Relational Data

To avoid duplicating data, it helps to organize it into separate tables. If you have used relational databases, this will be familiar. Notice the repetition in the following table. Once you know the species, you also know its family and order, so repeating them on every row adds nothing.

aou route year speciestotal species_id english_common_name spanish_common_name sporder family genus species
3120 1 2000 4 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 1 2010 3 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 2 1980 9 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 2 1990 14 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 2 2000 3 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 10 1970 6 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 11 1970 4 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 11 1990 1 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 11 2000 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 13 1970 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 14 1970 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 72 1970 32 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 72 1980 5 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 126 2010 1 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 201 1980 62 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 201 2000 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 212 1980 8 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 212 1990 4 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 212 2000 1 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 212 2010 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 223 1990 4 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 313 2010 6 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 401 2010 5 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 402 2000 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 402 2010 6 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 410 2010 7 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 411 2010 3 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 412 2000 1 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 413 1980 9 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata
3120 413 1990 2 522 Band-tailed Pigeon Patagioenas fasciata Columbiformes Columbidae Patagioenas fasciata

Instead, we can keep the species information in a separate table.

knitr::kable(head(spp %>% select(aou,english_common_name,sporder,family,genus,species),n=30))
aou english_common_name sporder family genus species
1780 Fulvous Whistling-Duck Anseriformes Anatidae Dendrocygna bicolor
1710 Greater White-fronted Goose Anseriformes Anatidae Anser albifrons
1760 Emperor Goose Anseriformes Anatidae Chen canagica
1690 Snow Goose Anseriformes Anatidae Chen caerulescens
1691 (Blue Goose) Snow Goose Anseriformes Anatidae Chen caerulescens (blue form)
1700 Ross’s Goose Anseriformes Anatidae Chen rossii
1730 Brant Anseriformes Anatidae Branta bernicla
1740 (Black Brant) Brant Anseriformes Anatidae Branta bernicla nigricans
1725 Cackling Goose Anseriformes Anatidae Branta hutchinsii
1720 Canada Goose Anseriformes Anatidae Branta canadensis
1782 Mute Swan Anseriformes Anatidae Cygnus olor
1810 Trumpeter Swan Anseriformes Anatidae Cygnus buccinator
1800 Tundra Swan Anseriformes Anatidae Cygnus columbianus
10210 Muscovy Duck Anseriformes Anatidae Cairina moschata
1440 Wood Duck Anseriformes Anatidae Aix sponsa
1350 Gadwall Anseriformes Anatidae Anas strepera
1360 Eurasian Wigeon Anseriformes Anatidae Anas penelope
1370 American Wigeon Anseriformes Anatidae Anas americana
1330 American Black Duck Anseriformes Anatidae Anas rubripes
1320 Mallard Anseriformes Anatidae Anas platyrhynchos
1331 (Mexican Duck) Mallard Anseriformes Anatidae Anas platyrhynchos diazi
1340 Mottled Duck Anseriformes Anatidae Anas fulvigula
1326 hybrid Mallard x Black Duck or Mottled Duck Anseriformes Anatidae Anas platyrhynchos x rubripes or fulvigula
1400 Blue-winged Teal Anseriformes Anatidae Anas discors
1410 Cinnamon Teal Anseriformes Anatidae Anas cyanoptera
1420 Northern Shoveler Anseriformes Anatidae Anas clypeata
1430 Northern Pintail Anseriformes Anatidae Anas acuta
1390 Green-winged Teal Anseriformes Anatidae Anas crecca
1470 Canvasback Anseriformes Anatidae Aythya valisineria
1460 Redhead Anseriformes Anatidae Aythya americana

Another table contains the counts for each species, labeled by a species identifier (aou).

knitr::kable(head(spp_counts_10l %>% select(aou,route,year,speciestotal),n=30))
(row) aou route year speciestotal
110 10 36 2000 16
244 10 302 2010 2
263 10 330 2000 6
264 10 330 2010 4
496 10 420 2000 1
497 10 420 2010 1
1550 10 13 1990 111
1558 10 20 2000 1
1579 10 33 1990 1
1607 10 111 1990 5
1638 10 132 2010 5
2256 10 68 2000 4
2265 10 74 2000 1
2266 10 74 2010 2
2292 10 122 2010 11
2474 10 3 1990 1
2475 10 3 2000 2
2482 10 5 2010 1
2493 10 9 1990 7
2508 10 14 2000 1
2509 10 14 2010 3
2528 10 21 1990 2
2529 10 21 2000 1
2530 10 21 2010 2
2584 10 50 1990 16
2626 10 73 1990 1
2651 10 81 1990 10
2659 10 83 2000 1
2808 10 158 1990 6
2809 10 158 2010 4

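The data you loaded earlier already come as two separate tables: spp holds the species information, and spp_counts_10 (reshaped into spp_counts_10l) holds the counts.

When we want information by order or family, we can join these two tables by a key (in this case the species identifier, aou).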

# gather() leaves aou as character; convert it to match the numeric key in spp
spp_counts_10l$aou <- as.numeric(spp_counts_10l$aou)
jointable <- spp_counts_10l %>%
  select(aou, route, year, speciestotal) %>%
  left_join(spp, by = "aou") %>%
  select(-french_common_name)
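The following sketch shows, on made-up data, why the key is converted before joining and what left_join does with unmatched keys. The tables `species` and `counts` are hypothetical stand-ins for spp and spp_counts_10l. Recent versions of dplyr stop with an incompatible-types error if one table stores the key as character and the other as numeric, which is why the conversion comes first.

```r
library(dplyr)

# Hypothetical lookup table: one row per species.
species <- data.frame(
  aou     = c(10, 20),
  sporder = c("Anseriformes", "Columbiformes")
)

# Hypothetical long-format counts; aou is character, as gather() leaves it.
counts <- data.frame(
  aou          = c("10", "20", "30"),
  route        = c(1, 1, 2),
  speciestotal = c(3, 5, 2)
)

# Store the key as the same type in both tables, then join on it.
counts$aou <- as.numeric(counts$aou)
joined <- left_join(counts, species, by = "aou")

# A left join keeps every row of counts; codes missing from the
# lookup table (here 30) get NA in the species columns.
print(joined)
```

Checking for NA in the joined species columns is a quick way to spot counts whose identifier has no match in the species table.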

By joining this information we can look at the data not only by species but by family or order.

ggplot(jointable, aes(x = year, y = speciestotal, color = sporder)) +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
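The joined table can also be summarized numerically. As a minimal sketch, the made-up data frame `toy` below stands in for jointable with only the columns needed; the values and the column name `total` are hypothetical. It adds up the counts per order and year with group_by and summarise from dplyr.

```r
library(dplyr)

# Hypothetical stand-in for jointable with only the columns used here.
toy <- data.frame(
  year         = c(2000, 2000, 2000, 2010, 2010),
  sporder      = c("Anseriformes", "Anseriformes", "Columbiformes",
                   "Anseriformes", "Columbiformes"),
  speciestotal = c(3, 5, 2, 4, 6)
)

# Total count per order per year.
by_order <- toy %>%
  group_by(sporder, year) %>%
  summarise(total = sum(speciestotal), .groups = "drop")

print(by_order)
```

Grouping by family instead of sporder would work the same way, because the join attached both columns to every count.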