Machine Learning Fall 2018
 
 
The Sackler Institute of Graduate Biomedical Sciences at NYU School of Medicine
 
Machine Learning Fall 2018 (BMSC-GA 4439 and BMIN-GA 1004)

Course Directors:
David Fenyö (David@FenyoLab.org)
Kasthuri Kannan (Kasthuri.Kannan@nyumc.org)

Teaching Assistant:
Anna Yeaton (Anna.Yeaton@nyumc.org)

Learning objectives

The student will learn and understand the most commonly used machine learning methods.

Course Material

Required Reading:

  • An Introduction to Statistical Learning: with Applications in R. James G, Witten D, Hastie T, Tibshirani R. Springer 2013.
  • Applied Predictive Modeling by Max Kuhn & Kjell Johnson, Springer 2013.
Recommended Reading:
  • Pattern Classification, 2nd Edition, Richard O. Duda, Peter E. Hart, David G. Stork, ISBN: 978-0-471-05669-0
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie T, Tibshirani R, Friedman J. Springer: 2011.
  • Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher Bishop, ISBN-10: 0387310738

General Policies

Late/missed work: You must adhere to the due dates for all required submissions. If you miss a deadline, then you will not get credit for that assignment/post.

Incompletes: No "Incompletes" will be assigned for this course unless we are at the very end of the course and you have an emergency.

Responding to Messages: I will check e-mails daily during the week, and I will respond to course-related questions within 48 hours.

Announcements: I will make announcements throughout the semester by e-mail.

Make sure that your email address is updated; otherwise you may miss important emails from me.

Safeguards: Always back up your work in a safe place (an electronic file with a backup is recommended) and make a hard copy. Do not wait until the last minute to do your work. Allow time to meet deadlines.

Plagiarism: Plagiarism, the presentation of someone else's words or ideas as your own, is a serious offense and will not be tolerated in this class. The first time you plagiarize someone else's work, you will receive a zero for that assignment. The second time you plagiarize, you will fail the course with a notation of academic dishonesty on your official record.

Course Assessment

  • Weekly Problem Sets (50%)
  • Discussions (20%)
  • Final Project (30%)
Lectures

Lecture 1 Course Overview (September 6, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
Lecturer: David Fenyö ( Slides )

Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapters 1-2
  • Applied Predictive Modeling by Kuhn & Johnson, Chapters 1-4
  • DREAM Challenges

    Additional Reading
  • Coursera: Machine Learning


    Lecture 2 Unsupervised Learning: Clustering (September 10, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Wenke Liu ( Slides )

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 10
  • The Elements of Statistical Learning by Hastie et al. Chapter 14

    Additional Reading
  • Cluster Analysis (5th ed.). Everitt BS, Landau S, Leese M, Stahl D. Wiley: 2011.


    Lecture 3 Unsupervised Learning: Dimension Reduction (September 13, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Wenke Liu ( Slides )
    Tutorial Instructor: Anna Yeaton


    Lecture 4 Student Project Plan Presentation (September 17, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)


    Lecture 5 Supervised Learning: Regression (September 20, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Wilson McKerrow ( Slides )
    Tutorial Instructor: Wilson McKerrow ( RegressionExamples.R )

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 3
  • Applied Predictive Modeling by Kuhn & Johnson, Chapters 5-6

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 3


    Lecture 6 Supervised Learning: Classification (September 24, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Slides )

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 4
  • Applied Predictive Modeling by Kuhn & Johnson, Chapters 11-12

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 4


    Lecture 7 Supervised Learning: Performance Estimation & Regularization (October 1, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Anna Yeaton

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapters 5 & 6

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 7


    Lecture 8 Tree-Based Methods (October 4, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 8
  • Applied Predictive Modeling by Kuhn & Johnson, Chapters 8 & 14
  • Carter H, Chen S, Isik L, et al. Cancer-specific High-throughput Annotation of Somatic Mutations: computational prediction of driver missense mutations. Cancer Research. 2009
  • Waks Z, Weissbrod O, Carmeli B, Norel R, Utro F, Goldschmidt Y. Driver gene classification reveals a substantial overrepresentation of tumor suppressors among very large chromatin-regulating proteins. Scientific Reports. 2016


    Lecture 9 Support Vector Machines (October 8, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Anna Yeaton

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 9
  • Hyeran Byun and Seong-Whan Lee, Applications of Support Vector Machines for Pattern Recognition: A Survey, SVM 2002, LNCS 2388, pp. 213-236, 2002.
  • Mao Y, Chen H, Liang H, Meric-Bernstam F, Mills GB, Chen K. CanDrA: Cancer-Specific Driver Missense Mutation Annotation with Optimized Features. Adamovic T, ed. PLoS ONE. 2013

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 10


    Lecture 10 Student Project Exploratory Data Analysis Presentation (October 11, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)


    Lecture 11 Neural Networks (October 22, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: David Fenyö ( Slides )
    Tutorial Instructor: Anna Yeaton

    Reading List
  • The Elements of Statistical Learning by Hastie et al. Chapter 11
  • Alipanahi, Babak, et al. "Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning." Nature Biotechnology 33.8 (2015): 831-838.
  • Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
  • Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

    Additional Reading
  • Neural Networks and Deep Learning by Michael Nielsen


    Lecture 12 Markov Models (October 25, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Anna Yeaton


    Lecture 13 Feature Selection (October 29, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kasthuri Kannan ( Slides )
    Tutorial Instructor: Anna Yeaton

    Reading List
  • Applied Predictive Modeling by Kuhn & Johnson, Chapters 18-19


    Lecture 14 Machine Learning Applied to Healthcare (November 1, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Narges Razavian ( Slides )


    Lecture 15 Machine Learning Applied to Omics Data (November 8, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Kelly Ruggles

    Reading List
  • Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 16 (2015) 321-32.
  • Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 46 (2014) 310-5.


    Lecture 16 Machine Learning Applied to Text Data (November 12, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)
    Lecturer: Yindalon Aphinyanaphongs


    Lecture 17 Student Project Presentation (November 26, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)


    Lecture 18 Student Project Presentation (November 29, 2018 Science Building, 345 East 30th St, Ground Floor, Room G19 5:30pm)