(12/22/14 - This section of the website is a little out of date, but it is being updated. For more details on recent projects, you should check out the Health Information Privacy Laboratory and the Health Data Science Center.)
I am a data scientist. My research is motivated by a belief that, given enough data and computational power, you can learn the secrets of any disease. This is not to say data will save the world, but it can support testing and refinement of novel hypotheses at scale and low cost. To prepare for such a profession, I trained as a molecular biologist and transitioned into computer science. In many respects, life as a data scientist could not be better. Various research funding agencies are heavily investing in "big data" and workforce training. Engineers are inventing sensors to precisely measure phenomena ranging from molecular to environmental to social interactions. At the same time, we can digitize and stockpile all data on the cheap in perpetuity. There is now an opportunity to adopt high performance computing and networking technologies to share, as well as repurpose, data about human research subjects on a broad scale. This can lead to hypothesis testing over massive datasets, enabling detection with statistical significance - even for rare disorders. Moreover, by sharing data beyond traditional scientific communities, we can foster novel analytics and discovery.
I am not, however, a conventional data scientist. I am driven by a concern that our society lacks the infrastructure to make the most of the data we generate. As such, I complemented my education with training in public policy and management to investigate how biology, computer science, and societal affairs can be blended to maximize the potential of these data. Yet there are numerous challenges to translating biomedical data into actionable knowledge, including i) a lack of data sharing incentives, ii) concerns over human subjects protections, and iii) beliefs that data collected for one purpose (e.g., healthcare) have limited utility in research. Moreover, these are frequently voiced as facts to justify the status quo. As a scientist, I view these statements as claims based on insufficient evidence, mainly founded on the capabilities of existing, rather than future, infrastructure. Thus, I frame these claims as hypotheses and rigorously test their veracity. Often, I disprove such statements by inventing new methodologies that break down barriers to biomedical data sharing.
... Elected to the American Institute for Medical and Biological Engineering (3/1/21)
... Elected to the International Academy of Health Sciences Informatics (8/27/20)
... Distinguished Paper Award at the 2019 AMIA Annual Symposium (12/19/19)
... Best Data Science Paper Award at the 2019 AMIA Informatics Summit (4/4/19)
... Elected to the National Academy of Medicine (10/15/18)
... our Op-Eds on why sharing COVID-19 test results with law enforcement is a problem (5/2020) ... but we must share aggregate counts on infections - especially in schools! (8/2020)
... on legal challenges to genetic data privacy (5/2019)
... on our winning solution to the iDASH Genome Privacy Competition (GenomeWeb story and Vanderbilt story) (12/2016)
... for the U.S. Commission on Evidence-based Policymaking on applications of homomorphic cryptography for statistical computation (2/24/2017)
... National Cancer Institute Cancer Moonshot Seminar (1/27/2021)
... talk at the NIH/OD Office of Data Science Strategy Special Track at the Intelligent Systems for Molecular Biology (ISMB) Conference (7/14/2020)
... keynote at the Translational Data Science Workshop at University of Florida (2/3/2020)
"Is it time for a universal genetic forensic database?"
... Journal of the American Medical Informatics Association:
SynTEG: A Framework for Temporal Structured Electronic Health Data Simulation
... AMIA 2021 Informatics Summit:
Blending Knowledge in Deep Recurrent Networks for Adverse Event Prediction at Hospital Discharge
... Journal of the American Medical Informatics Association:
SCOR: A Secure International Informatics Architecture to Investigate COVID-19