University of Rhode Island
Department of Biological Sciences
College of the Environment and Life Sciences
BIO 439X/594-14
Big Data Analysis
Spring 2017
Instructor: Dr. Rachel Schwartz
Office Location: CBLS 377
Telephone: 4-5404
Email: rsschwartz@uri.edu
Please note that I will do my best to respond promptly between 9am and 5pm on weekdays.
Office Hours: Friday 12-2pm CBLS 252
Please feel free to come by without an appointment during this time. To talk with me during other times please set up a meeting by email.
Class Days / Time: T/Th 1:30-2:45pm
Classroom: Coastal Institute 117A
Prerequisites: Graduate standing (BIO594) or junior standing (BIO439X) or instructor permission.
Course Description: This course is intended to help you learn how to analyze large datasets correctly and efficiently. We will discuss methods for analysis of big data, and how to get research done more efficiently using basic scientific computing skills. This course will consist of limited lecture time and extensive hands-on time. We will cover data management, statistical methods, task automation, and how to make data analysis clear and reproducible. No prior programming experience is required. Although most of the data will be biological or environmental, the material learned will be fully translatable to other fields.
Course Goals:
At the end of this course students should be able to...
Student Learning Outcomes:
At the end of this course students enrolled in either course should be able to...
Students enrolled in BIO594 should also be able to...
Required text
Bioinformatics Data Skills. Vince Buffalo. 2015. O'Reilly Media.
R for Data Science. Hadley Wickham and Garrett Grolemund. 2016. Free online at http://r4ds.had.co.nz/
Data Science from Scratch. Joel Grus. 2015. O’Reilly Media. (available free online via the library and Safari Books)
The Practice of Reproducible Research. Justin Kitzes, Daniel Turek, Fatma Deniz (Eds.). 2017. UC Press. Free online at https://www.practicereproducibleresearch.org/
A Whirlwind Tour of Python. Jake VanderPlas. 2016. Available free at https://github.com/jakevdp/WhirlwindTourOfPython
Python Data Science Handbook. Jake VanderPlas. available free at https://github.com/jakevdp/PythonDataScienceHandbook
Interesting supplementary reads
Data Science at the Command Line. Jeroen Janssens. 2014. O'Reilly Media.
Python Data Science Handbook. Jake VanderPlas. 2015. O’Reilly Media.
Doing Data Science. Rachel Schutt and Cathy O'Neil. 2013. O'Reilly Media.
Big Data and Social Science: A Practical Guide to Methods and Tools. Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane (Eds.). 2016. CRC Press.
On Being a Data Skeptic. Cathy O'Neil. 2013. O'Reilly Media.
Other equipment / material requirements
Mac, Linux, or Windows laptop (not a tablet, Chromebook, etc.) with administrative privileges. Please contact the instructor if you do not have a laptop and purchasing one would be a financial difficulty.
Meeting Schedule
Note: Updated April 3, 2017 - Still subject to change
| Week | Topic | Process | Readings | Assignment |
|---|---|---|---|---|
| 1 (1/23) | Introduction / Software installation | | Kitzes et al. P1 | |
| | Automating repetitive tasks | Intro to the command line | Buffalo Ch. 3 | |
| 2 (1/30) | Automating repetitive tasks | Shell scripts | Buffalo Ch. 7 | Commenting a shell script |
| 3 (2/6) | Big data and answerable questions | Discussion of your research | Buffalo Ch. 1 | |
| | High performance computing | Intro to the cluster | Buffalo Ch. 4 | |
| 4 (2/13) | Sharing, collaboration, and research transparency | Git | Buffalo Ch. 5 | Share your git repo |
| 5 (2/20) | Coding for data analysis | Intro to R and RStudio | Buffalo Ch. 8 | |
| | Data organization and management | R: functions | Buffalo Ch. 2, Kitzes et al. P1 | |
| | Data visualization for exploration | R: ggplot2 | Wickham Ch. 2, 3, 28 | |
| 6 (2/27) | Good data practice | R Markdown | Wickham Ch. 27, 29, 30 | Exploratory analysis |
| | Analyzing data by category | R: plyr/dplyr | Wickham Ch. 5 | |
| 7 (3/6) | Tidy data and relational data | R: dplyr, tidyr | Wickham Ch. 12, 13 | |
| | Testing and scripting | R - shell | Buffalo Ch. 8 | |
| 8 (3/20) | Recap | | | |
| | Appropriate analyses and statistics | | Foster et al. | |
| 9 (3/27) | Automating repetitive tasks | Python: intro | Grus Ch. 2 / VanderPlas: Whirlwind Tour | |
| | Pattern counting | Python: basics | VanderPlas Ch. 1 | Pattern counting |
| 10 (4/3) | Data structures and complex data | Python: pandas | VanderPlas Ch. 2-3 | |
| 11 (4/10) | Packaging complex data | Python: pandas joins | | |
| | Making sure it's right | Python: tests | | Complete data analysis |
| 12 (4/17) | Review and work on projects | | | |
| | Command line | Python: scripts and tests | | |
| 13 (4/24) | Visualization | Python: matplotlib | | |
| | Wrap up | | | |
Assignments and Grading Policy
There will be five assignments during the course of the semester. Each assignment will focus on the skill learned in class during the week and build on skills and concepts learned in previous weeks. These assignments will be initiated in class and are due at the end of the following week.
For each assignment you will earn points. The more points you have earned, the more you have learned. Your grade will be based on the total number of points you accumulate during the semester. Assignments will be worth a total of 500 points. Students in BIO594 will also do a cumulative project involving two of the following: (1) a detailed written proposal on research involving big data, (2) analysis of research data, (3) a ~1000 word paper on your research using data, and/or (4) a poster of your research with data. The project will be worth 500 points.
Grading for BIO594
A 900-1000
A- 880-899
B+ 860-879
B 800-859
B- 780-799
C+ 760-779
C 700-759
C- 680-699
D+ 660-679
D 600-659
F <600
Grading for BIO439X
A 450-500
A- 440-449
B+ 430-439
B 400-429
B- 390-399
C+ 380-389
C 350-379
C- 340-349
D+ 330-339
D 300-329
F <300
Instructor Policies
Any student with a documented disability should contact me as soon as possible so that we may arrange reasonable accommodations. As part of this process, please contact the Disability Services for Students office at 302 Memorial Union, phone 401-874-2098.
Students are expected to treat faculty and fellow classmates with dignity and respect.
Students are responsible for being familiar with and adhering to the published “Student Code of Conduct” which can be accessed in the University Student Handbook (http://web.uri.edu/studentconduct/student-handbook/).
Students are expected to be honest in all academic work. A student’s name on any written work, quiz or exam shall be regarded as assurance that the work is the result of the student’s own independent thought and study. Work should be stated in the student’s own words, properly attributed to its source. Students have an obligation to know how to quote, paraphrase, summarize, cite and reference the work of others with integrity. The following are examples of academic dishonesty:
Using material, directly or paraphrasing, from published sources (print or electronic) without appropriate citation
Claiming disproportionate credit for work not done independently
Unauthorized possession or access to exams
Unauthorized communication during exams
Unauthorized use of another’s work or preparing work for another student
Taking an exam for another student
Altering or attempting to alter grades
The use of notes or electronic devices to gain an unauthorized advantage during exams
Fabricating or falsifying facts, data or references
Facilitating or aiding another’s academic dishonesty
Submitting the same paper for more than one course without prior approval from the instructors.
Classroom Protocol
Bring your laptop to class every day.
Be kind to others. Remember that people have different backgrounds and experiences with the material. Students who learn the material faster should consider helping others (teaching is an excellent way to improve your understanding of the material).
If you must come in late, please do not disrupt the class. Please ensure all cell phones, pagers, and other electronic devices are set to silent.
Students are expected to come to class. Work on assignments will be done in class. If you are unable to attend due to illness, severe weather, religious holiday, personal or family emergency, or sanctioned University event, please let me know as soon as possible so that we can arrange assistance in learning the material and completing the assignment.