Introduction to Biostatistics and Bioinformatics Fall 2014 (BMSC-GA 4451)

The Sackler Institute of Graduate Biomedical Sciences at NYU School of Medicine

NYUMC Center for Health Informatics and Bioinformatics

NYUMC Department of Population Health, Division of Biostatistics

NYUMC High Performance Computing Facility

Lecturers:
Yindalon Aphinyanaphongs (yin.a@nyumc.org)
Stuart Brown (Stuart.Brown@nyumc.org)
David Fenyö (David@FenyoLab.org)
Judy Zhong (Judy.Zhong@nyumc.org)

Tutorial Instructors:
Pamela Wu (Pamela.Wu@nyumc.org)
Amanda Ernlund (Amanda.Ernlund@nyumc.org)

Course Overview

The goal of the Introduction to Biostatistics and Bioinformatics course is to provide an introduction to statistics and informatics methods for the analysis of data generated in biomedical research. Practical examples covering both small-scale lab experiments and high-throughput assays will be explored. The course covers a wide range of topics in a short time, so the focus will be on basic concepts; in the practical programming exercises, students will explore these basic concepts and common pitfalls. An introduction to basic Python and R programming will be given throughout the course, and many exercises will involve programming.

Learning objectives

Students will be introduced to entry-level methods in biostatistics and bioinformatics.

Course Assessment

  • Readings and participation (10%): Students are required to attend class, complete the reading assignments, participate in discussions, and engage in a healthy exchange of ideas. Each student is required to lead the discussion of at least one of the assigned weekly readings, and this discussion lead will be graded.
  • Assignments (40%): A programming assignment will be given at the end of each class, and the solutions should be e-mailed to Assignments@FenyoLab.org within a week.
  • Exam (40%): There will be one exam in this class and it will cover the entire course material.

Missed Exams and Grade Appeals

Make-up examinations (for the final exam only) will be given under special circumstances. Documentation will be required to verify a student's claim. If a make-up exam is permitted, a different exam will be written for that student, and it may have a different format from the regular examination.

Assignments must be turned in on time; late assignments will not be accepted.

If you believe that an assignment or exam has been graded incorrectly, you may appeal the grade within one week of receiving it. To appeal, write a note describing the error, attach it to the original assignment or exam, and give it to me within that week. I will review your argument and my initial grading, and then return your work to you with a decision in a timely manner.

General Policies

  • Late/missed work: You must adhere to the due dates for all required submissions. If you miss a deadline, you will not get credit for that assignment. Try to avoid last-minute submissions.
  • Incompletes: No "Incompletes" will be assigned for this course unless we are at the very end of the course and you have an emergency.
  • Responding to Messages: I will check e-mail daily during the week, and I will respond to course-related questions within 48 hours.
  • Announcements: I will make announcements throughout the semester by e-mail. Make sure that your e-mail address is up to date; otherwise, you may miss important e-mails from me.
  • Safeguards: Always back up your work in a safe place (an electronic copy with a backup is recommended) and make a hard copy. Do not wait until the last minute to do your work; allow enough time to meet deadlines.
  • Plagiarism: Plagiarism, the presentation of someone else's words or ideas as your own, is a serious offense and will not be tolerated in this class. The first time you plagiarize someone else's work, you will receive a zero for that assignment. The second time you plagiarize, you will fail the course with a notation of academic dishonesty on your official record.

Recommended Readings

Fundamentals of Biostatistics by Bernard Rosner

Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum

Think Python by Allen B. Downey

Lectures

Lecture 1 Exploring Data (September 9, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)
Tutorial Instructor: Wu (Video, Document, first.py, open.py, open.txt)
Homework (due date: September 18)

Reading List
  • Data visualization: A view of every Points of View column
  • Python Basics for Bioinformatics by Stuart Brown

Additional Reading
  • The Visual Display of Quantitative Information by Edward R. Tufte
  • Visualize This: The FlowingData Guide to Design, Visualization, and Statistics by Nathan Yau
  • Data Analysis with Open Source Tools by Philipp K. Janert
  • The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures

Lecture 2 Descriptive Statistics (September 11, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)

Reading List
  • Think Stats by Allen B. Downey Chapter 2
  • Importance of being uncertain
  • Visualizing samples with box plots
  • Error bars

Tutorial Python Programming (September 13, 2014 TRB 120 3pm)
Tutorial Instructor: Wu

Lecture 3 Data Types and Representations in Molecular Biology (September 16, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)
Tutorial Instructor: Wu (ecogene.fasta, seq_id.list)
Homework (due date: September 22)

Reading List
  • Understanding Bioinformatics Chapter 3
  • What is Bioinformatics
  • Entrez Help

Lecture 4 Probability (September 18, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)

Reading List
  • Fundamentals of Biostatistics Chapter 3
  • Think Stats by Allen B. Downey Chapter 5

Lecture 5 Sequence Alignment Concepts (September 23, 2014 TRB 120 2pm)
Lecturer: Brown (Slides)
Tutorial Instructor: Wu (Document, Dmel-UniP.fasta, fly_test.fasta)
Homework (due date: September 29)

Reading List
  • Understanding Bioinformatics Chapters 4.1-4.5 and 5.1-5.4
  • Smith-Waterman
  • FASTA
  • EMBOSS dotmatcher

Lecture 6 Sequence Database Searching (September 25, 2014 TRB 120 2pm)
Lecturer: Brown (Slides)

Reading List
  • BLAST Chapter 4
  • Altschul-BLAST
  • The BLAST Sequence Analysis Tool

Lecture 7 Distributions (September 30, 2014 TRB 120 2pm)
Lecturer: Zhong (Slides)
Tutorial Instructor: Wu (Document, central_limit.py, functions.py)
Homework (due date: October 6)

Reading List
  • Fundamentals of Biostatistics Chapters 4 & 5
  • Think Stats by Allen B. Downey Chapters 4 and 6

Lecture 8 Estimation I (October 2, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)

Reading List
  • Fundamentals of Biostatistics Chapter 6
  • Think Stats by Allen B. Downey Chapter 8

Lecture 9 Estimation II (October 7, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)
Tutorial Instructor: Wu (confInt.py)
Homework (due date: October 13)

Reading List
  • Fundamentals of Biostatistics Chapter 6

Additional Reading
  • Think Bayes by Allen B. Downey

Lecture 10 Hypothesis Testing I (October 9, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)

Reading List
  • Fundamentals of Biostatistics Chapters 7 & 8
  • Significance, P values and t-tests
  • Comparing samples - part I
  • Think Stats by Allen B. Downey Chapter 7

Lecture 11 Hypothesis Testing II (October 14, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)
Tutorial Instructor: Wu (hypTesting.py, hyptesting1.png, hyptesting2.png)
Homework (due date: October 27)

Reading List
  • Fundamentals of Biostatistics Chapters 7 & 8
  • Comparing samples - part II
  • Power and sample size

Lecture 12 Multiple Sequence Alignment (October 16, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)

Reading List
  • Understanding Bioinformatics Chapter 6.2-6.4
  • CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
  • Using ClustalX for multiple sequence alignment
  • Hidden Markov Models

Lecture 13 Sequence Motifs (October 21, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)
Tutorial Instructor: Wu (Slides, script.py, MA0083.1.sites, random_fragments.fasta, srf_chip.fasta)
Homework (due date: November 3)

Reading List
  • Understanding Bioinformatics Chapters 4.8-4.10, 6.1, 6.6
  • Sequence Logos
  • Finding Candidate Binding Sites for Known Transcription Factors via Sequence Matching

Lecture 14 Phylogenetics (October 23, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)

Reading List
  • Understanding Bioinformatics Chapters 7 & 8
  • Building Phylogenetic Trees from Molecular Data with MEGA
  • MEGA - Molecular Evolutionary Genetics Analysis

Lecture 15 Analysis of Variance (October 28, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)
Tutorial Instructor: Wu
Homework (due date: November 10)

Reading List
  • Fundamentals of Biostatistics Chapter 12

Lecture 16 Categorical Data Methods (October 30, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)

Reading List
  • Fundamentals of Biostatistics Chapter 10

Lecture 17 Non-Parametric Methods (November 4, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)
Tutorial Instructor: Wu (Slides)
Homework (due date: November 17)

Reading List
  • Fundamentals of Biostatistics Chapter 9
  • Nonparametric tests

Lecture 18 Regression and Correlation (November 6, 2014 TRB 120 2pm)
Lecturer: Zhong (Video, Slides)

Reading List
  • Fundamentals of Biostatistics Chapter 11

Lecture 19 Proteomics Informatics (November 11, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)
Tutorial Instructor: Wu (proteomics_no_replicate.py, proteomics_one_replicate.py, proteomics_three_replicates.py, NUP1-more-stringent-wash.mgf, NUP1-less-stringent-wash.mgf, two-sample-three-replicate-comparison.txt)
Homework (due date: November 24)

Reading List
  • Mass spectrometric protein identification using the global proteome machine
  • Protein quantitation using mass spectrometry

Lecture 20 Gene Expression (November 13, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)

Reading List
  • Understanding Bioinformatics Chapters 15.1, 16.1-16.5
  • Microarray data analysis: from disarray to consolidation and consensus
  • Using Bioconductor with Microarray Analysis

Lecture 21 Next Generation Sequencing Informatics I (November 18, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)
Tutorial Instructor: Wu (Document, R-Intro.doc, ArrayData.zip)
Homework (due date: December 1)

Reading List
  • Next Generation Sequencing-ChIPseq
  • Fast and accurate long-read alignment with Burrows–Wheeler transform

Lecture 22 Next Generation Sequencing Informatics II (November 20, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)

Lecture 23 Sequence Variation (November 25, 2014 TRB 120 2pm)
Lecturer: Brown (Video, Slides)
Tutorial Instructor: Wu (Chipseq.zip)
Homework (due date: December 8)

Lecture 24 Signal Processing (December 2, 2014 Skirball 3rd Floor Seminar Room 2pm)
Lecturer: Fenyo (Video, Slides)
Tutorial Instructor: Wu (peak-noise-smooth.py, peak-noise-smooth2.py)
Homework (due date: December 9)

Lecture 25 Bioimage Informatics (December 4, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)

Reading List
  • Introduction to the Quantitative Analysis of Two-Dimensional Fluorescence Microscopy Images for Cell-Based Screening

Lecture 26 Experimental Design (December 9, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)
Tutorial Instructor: Wu (basic_image_analysis.py, image_for_basic_image_analysis.tif)

Reading List
  • Designing comparative experiments
  • Analysis of variance and blocking
  • Replication
  • Bias as a threat to the validity of cancer molecular-marker research by David F. Ransohoff, Nat Rev Cancer 5 (2005) 142-149

Additional Reading
  • Design and Analysis of Experiments by Douglas C. Montgomery
  • Fundamentals of Biostatistics Chapter 13

Lecture 27 Machine Learning (December 11, 2014 TRB 120 2pm)
Lecturer: Aphinyanaphongs (Video, Slides)

Reading List
  • ROC Graphs: Notes and Practical Considerations for Researchers by Tom Fawcett

Additional Reading
  • An Introduction to Statistical Learning by Gareth James et al.
  • A Gentle Introduction to Support Vector Machines in Biomedicine: Theory and Methods (Volume 1) by Alexander Statnikov et al.
  • A Gentle Introduction to Support Vector Machines in Biomedicine: Case Studies and Benchmarks (Volume 2) by Alexander Statnikov et al.

Lecture 28 Modeling and Simulation (December 16, 2014 TRB 120 2pm)
Lecturer: Fenyo (Video, Slides)
Tutorial Instructor: Wu

Additional Reading
  • Modeling Complex Systems by Nino Boccara
  • Evolutionary Dynamics: Exploring the Equations of Life by Martin A. Nowak

Exam (December 18, 2014 TRB 120 2pm)