UROP Openings

Machine learning integration of single-cell data with GWAS to understand disease mechanisms




6: Electrical Engineering and Computer Science

Faculty Supervisor:

Manolis Kellis

Faculty email:


Apply by:

September 11, 2020


To apply, email sungil@mit.edu with your resume/CV attached and short statements of interest and time commitment.

Project Description

Understanding human genetic variation offers great potential for uncovering underlying disease mechanisms by identifying new disease-associated variants, and for discovering potential drug targets and therapeutics. In this project, we are leveraging single-cell genomics to understand the functional consequences of variants and to link them to candidate genes by (1) using a network-based approach to construct cell-type-specific gene networks, (2) using machine learning methods to link variants to genes, and (3) extracting relevant functional information using NLP. There is great room for creativity and method development using different ML/DL methods, feature engineering, optimization, and learning from noisy labels. From a biological point of view, this project provides an excellent opportunity to work with diverse functional and experimental data sets. The diseases of particular focus are autoimmune diseases. The UROP student will be mentored by current graduate student Sam Kim, with a weekly 1:1 meeting as well as weekly group meetings. Highly motivated students who are enthusiastic about this research, able to self-learn new skills, and willing to put in the effort are strongly encouraged to email sungil@mit.edu. Faculty-sponsored funding is available.


CS background; proficiency in Python and/or R

Knowledge of graph/network algorithms (random walk, clustering, etc.)

Knowledge of ML and DL, with particular emphasis on classification models

Knowledge of basic statistics (t-statistic, etc.)

(Desired) Experience working with single-cell RNA-seq data and GWAS summary statistics

(Desired) Knowledge of cluster/GPU usage to train CNNs