UROP Openings

Using Machine Learning and Electronic Health Records to Simulate Clinical Trials to Repurpose Drugs for Unmet Medical Needs




15: Management

Faculty Supervisor:

Roy Welsch

Faculty email:


Apply by:

May 1, 2020



Project Description

The aim of the project is to develop and validate methods for repurposing FDA-approved drugs, drawing on concepts from statistics, data science, and machine learning and applying them to large electronic health record (EHR) datasets. Specifically, we seek to identify medicines that are currently FDA-approved and marketed for certain conditions and that can be shown to offer therapeutic value in treating significant unmet medical needs, including Alzheimer's disease and cancer. We aim to use observational data to construct and compare cohorts of patients in a fashion that emulates clinical trials, effectively conducting “in-silico” or “synthetic” trials.

For this work, we have access to the UK’s Clinical Practice Research Datalink (CPRD), which chronicles some 20 million persons who received primary medical care over a period as long as thirty years; the Explorys EHR database, containing records for over 50 million patients; Medicaid and MarketScan insurance claims records of over 100 million patients; and linked claims and EHR data for a subset of approximately 5 million patients.

EHR and claims data offer great potential for comparative effectiveness research (CER), but they have many issues that make analyses challenging and complicated: the datasets are large and growing; a significant amount of data is missing; the data are high-dimensional, and the dimensionality is growing rapidly; there are errors and outliers; some patients enter the database and then leave, or leave and come back; and patients enter the database at varying times in their lives, with new patients always arriving and others leaving as they die or move away.

Our aims are to develop analytical methods that address these issues and facilitate rigorous comparison of the clinical effectiveness of candidate drugs with that of a reference therapy for selected medical disorders. We expect to contribute analytical strategies suitable for application to large numbers of clinical studies at once, without relying extensively on clinical judgment for each analysis.


Interest in medicine or health care and facility with large datasets; experience with SQL and Python or R