(12/22/14 - This section of the website is a little out of date, but it is being updated. For more details on recent projects, you should check out the Health Information Privacy Laboratory and the Health Data Science Center.)


I am a data scientist. My research is motivated by a belief that, given enough data and computational power, you can learn the secrets of any disease. This is not to say data will save the world, but it can support testing and refinement of novel hypotheses at scale and low cost. To prepare for such a profession, I trained as a molecular biologist and transitioned into computer science. In many respects, life as a data scientist could not be better. Research funding agencies are investing heavily in "big data" and workforce training. Engineers are inventing sensors to precisely measure phenomena ranging from molecular to environmental to social interactions. At the same time, we can digitize and stockpile all of this data cheaply and in perpetuity. There is now an opportunity to adopt high performance computing and networking technologies to share, as well as repurpose, data about human research subjects on a broad scale. This can lead to hypothesis testing over massive datasets, enabling findings to reach statistical significance, even for rare disorders. Moreover, by sharing data beyond traditional scientific communities, we can foster novel analytics and discovery.

I am not, however, a conventional data scientist. I am driven by a concern that our society lacks the infrastructure to make the most of the data we generate. As such, I complemented my education with training in public policy and management to investigate how biology, computer science, and societal affairs can be blended to maximize the potential of that data. Yet there are numerous challenges to translating biomedical data into actionable knowledge, including i) a lack of data sharing incentives, ii) concerns over human subjects protections, and iii) beliefs that data collected for one purpose (e.g., healthcare) has limited utility in research. Moreover, these challenges are frequently voiced as facts to justify the status quo. As a scientist, I view these statements as claims based on insufficient evidence, mainly founded on the capabilities of existing, rather than future, infrastructure. Thus, I frame these claims as hypotheses and rigorously test their veracity. Often, I disprove such statements by inventing new methodologies that break down barriers to biomedical data sharing.

Current Research Areas and Artifacts: