Machine Learning Spring 2018
 
 
The Sackler Institute of Graduate Biomedical Sciences at NYU School of Medicine
 
Machine Learning Spring 2018 (BMSC-GA 4439 and BMIN-GA 1004)

Course Directors:
David Fenyö (David@FenyoLab.org)
Kasthuri Kannan (Kasthuri.Kannan@nyumc.org)

Teaching Assistants:
Sofia Nomikou (Sofia.Nomikou@nyumc.org)

Learning objectives

The student will learn and understand the most commonly used machine learning methods.

Course Material

Required Reading:

  • An Introduction to Statistical Learning: with Applications in R. James G, Witten D, Hastie T, Tibshirani R. Springer: 2013.

Recommended Reading:

  • Pattern Classification, 2nd Edition. Duda RO, Hart PE, Stork DG. ISBN: 978-0-471-05669-0.
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Hastie T, Tibshirani R, Friedman J. Springer: 2011.
  • Pattern Recognition and Machine Learning (Information Science and Statistics). Bishop C. ISBN-10: 0387310738.

General Policies

Late/missed work: You must adhere to the due dates for all required submissions. If you miss a deadline, then you will not get credit for that assignment/post.

Incompletes: No "Incompletes" will be assigned for this course unless we are at the very end of the course and you have an emergency.

Responding to Messages: I will check e-mail daily during the week, and I will respond to course-related questions within 48 hours.

Announcements: I will make announcements throughout the semester by e-mail.

Make sure that your email address is updated; otherwise you may miss important emails from me.

Safeguards: Always back up your work in a safe place (an electronic file with a backup is recommended) and make a hard copy. Do not wait until the last minute to do your work. Allow time for deadlines.

Plagiarism: Plagiarism, the presentation of someone else's words or ideas as your own, is a serious offense and will not be tolerated in this class. The first time you plagiarize someone else's work, you will receive a zero for that assignment. The second time you plagiarize, you will fail the course with a notation of academic dishonesty on your official record.

Course Assessment

  • Weekly Problem Sets (50%)
  • Discussions (20%)
  • Final Project (30%)
Lectures

Lecture 1 Course Overview (January 22, 2018 Alexandria West 508 5pm)
Lecturers: David Fenyö and Gustavo Stolovitzky ( Slides )

Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapters 1-2
  • DREAM Challenges

    Additional Reading
  • Coursera: Machine Learning


    Lecture 2 Unsupervised Learning: Clustering (January 25, 2018 Alexandria West 629 5:30pm)
    Lecturer: Wenke Liu ( Slides )
    Tutorial Instructor: Wenke Liu

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 10
  • The Elements of Statistical Learning by Hastie et al. Chapter 14

    Additional Reading
  • Cluster Analysis (5th ed.). Everitt BS, Landau S, Leese M, Stahl D. Wiley: 2011.


    Lecture 3 Unsupervised Learning: Dimension Reduction (January 29, 2018 Alexandria West 508 5pm)
    Lecturer: Wenke Liu ( Slides )
    Tutorial Instructor: Wenke Liu


    Lecture 4 Student Project Plan Presentation (February 1, 2018 Alexandria West 508 5:30pm)


    Lecture 5 Supervised Learning: Regression (February 5, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: David Fenyö ( Slides )
    Tutorial Instructor: Sofia Nomikou

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 3

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 3


    Lecture 6 Supervised Learning: Classification (February 12, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: David Fenyö ( Slides )
    Tutorial Instructor: Sofia Nomikou

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 4

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 4


    Lecture 7 Supervised Learning: Performance Estimation & Regularization (February 15, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Sofia Nomikou

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapters 5 & 6

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 7


    Lecture 8 Tree-Based Methods (February 26, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Kasthuri Kannan

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 8
  • Carter H, Chen S, Isik L, et al. Cancer-specific High-throughput Annotation of Somatic Mutations: computational prediction of driver missense mutations. Cancer Research. 2009
  • Waks Z, Weissbrod O, Carmeli B, Norel R, Utro F, Goldschmidt Y. Driver gene classification reveals a substantial overrepresentation of tumor suppressors among very large chromatin-regulating proteins. Scientific Reports. 2016


    Lecture 9 Student Project Exploratory Data Analysis Presentation (March 1, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)


    Lecture 10 Support Vector Machines (March 5, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Kasthuri Kannan

    Reading List
  • An Introduction to Statistical Learning by Gareth James et al. Chapter 9
  • Hyeran Byun and Seong-Whan Lee, Applications of Support Vector Machines for Pattern Recognition: A Survey, SVM 2002, LNCS 2388, pp. 213-236, 2002.
  • Mao Y, Chen H, Liang H, Meric-Bernstam F, Mills GB, Chen K. CanDrA: Cancer-Specific Driver Missense Mutation Annotation with Optimized Features. Adamovic T, ed. PLoS ONE. 2013

    Additional Reading
  • The Elements of Statistical Learning by Hastie et al. Chapter 10


    Lecture 11 Neural Networks: Backpropagation (March 19, 2018 Science Building, 345 East 30th St, Ground Floor, Room SB108 5pm)
    Lecturer: David Fenyö ( Slides )
    Tutorial Instructor: Sofia Nomikou

    Reading List
  • The Elements of Statistical Learning by Hastie et al. Chapter 11
  • Alipanahi, Babak, et al. "Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning." Nature biotechnology 33.8 (2015): 831-838.
  • Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
  • Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

    Additional Reading
  • Neural Networks and Deep Learning by Michael Nielsen


    Lecture 12 Markov Models (March 26, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: Kasthuri Kannan ( Video , Slides )
    Tutorial Instructor: Kasthuri Kannan


    Lecture 13 Probabilistic Graphical Models (March 29, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)
    Lecturer: Narges Razavian ( Slides )

    Reading List
  • Zhang L, Kim S. Learning gene networks under SNP perturbations using eQTL datasets. PLoS Comput Biol. 2014 Feb 27
  • Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis. 2004 Jul 1


    Lecture 14 Machine Learning Applied to Text Data (April 5, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)
    Lecturer: Yindalon Aphinyanaphongs


    Lecture 15 Machine Learning Applied to Omics Data (April 9, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)
    Lecturer: Kelly Ruggles ( Slides )

    Reading List
  • Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 16 (2015) 321-32.
  • Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 46 (2014) 310-5.


    Lecture 16 Machine Learning Applied to Clinical Data (April 12, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)
    Lecturer: Yindalon Aphinyanaphongs


    Lecture 17 Student Project Presentation (May 14, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5pm)


    Lecture 18 Student Project Presentation (May 17, 2018 Science Building, 345 East 30th St, Ground Floor, Room G06 5:30pm)