UROP Openings

Data Engineering for Metabolomics [remote]


Term:

Summer

Department:

20: Biological Engineering

Faculty Supervisor:

Ernest Fraenkel

Faculty email:

fraenkel@mit.edu

Apply by:

June 26, 2020 (but as soon as possible)

Contact:

Swapnil Chhabra, Computational Research Scientist. Currently reachable via email: chhabras@mit.edu

Project Description

Our lab develops computational and experimental approaches to understand human biology. New experimental methods make it possible to measure cellular changes across the genome, epigenome, proteome, and metabolome. These technologies include genome-wide measurements of transcription, protein-DNA interactions, chromatin accessibility, genetic interactions, and protein and metabolite modifications. Each data source on its own provides only a narrow view of the cellular changes. By computationally integrating these data, we can reconstruct signaling pathways and identify previously unrecognized regulatory mechanisms.

As part of the DARPA-MBA program (https://www.darpa.mil/news-events/2019-11-22), our current goal is to tie externally observable physical, behavioral, and cognitive features and traits of highly successful military personnel to measurable elements of their biology, in order to understand and ultimately anticipate how they might perform in various situations over time. The resulting information can then be used to improve training and development at an individual level.

This project provides a unique opportunity for a student to join the development of a comprehensive set of data engineering tools to mine, extract, process, store, and model biological measurements captured in systematic reviews as well as those available in online repositories. You will work with a team of interdisciplinary researchers to identify relevant data sources and sinks, architect cloud-based or on-site database solutions, build pipelines, and develop strategies to integrate externally collected data with internal repositories. In addition, you will explore natural language processing frameworks to automate the extraction of clinically relevant concepts from the literature.

Prerequisites

Interest in computational biology, experience in machine learning, and facility with large datasets; understanding of core data engineering concepts and experience with SQL/NoSQL databases; experience with Python/R/SQL.