Undergrad Research Project - Interpretability

Spring 2018

Umang Bhatt
José Moura
Project description

Recurrent neural networks today offer strong predictive power in decision-making processes. However, due to a mismatch between prediction objectives (i.e., test-set performance) and the real-world costs of deployment, there is an unfulfilled demand for interpretability. To laypeople, models that lack interpretability are effectively useless, since it is nearly impossible to follow how such models extract knowledge. Though there exists no concrete definition of interpretability, it broadly refers to explaining a model in humanly understandable terms; many desiderata for modern systems, such as robustness, fairness, and trust, are also commonly grouped with interpretability. As recurrent neural nets become pervasive (e.g., in diagnostic systems and speech translation), there exist few ways to uncover the causes of a good or bad prediction by such a system. We are testing techniques (joint model training with HMMs or RuleFit, attention mapping, maximum activation analysis, etc.) to develop a clear-box system that allows a non-expert to understand an RNN's predictions. Working with MIMIC-III (an electronic health record database), we hope to pair the current high predictive accuracy for a patient's diagnosis with our new system to tell the patient why the diagnosis was given. Such explainability is essential for mass adoption of machine learning systems.
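One of the techniques named above, attention mapping, can be sketched briefly: softmax-normalized alignment scores over an RNN's per-timestep hidden states indicate which input steps most influenced the prediction. This is a minimal illustrative sketch under assumed names, shapes, and toy data, not the project's actual model.

```python
import math

def attention_weights(hidden_states, query):
    """Softmax attention weights over timesteps.

    hidden_states: list of T hidden-state vectors (lists of floats).
    query: a context vector, e.g. the final hidden state.
    Returns one weight per timestep; larger weight = more influence.
    """
    # Alignment score for each timestep: dot product with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # weights sum to 1

# Toy example: 3 timesteps with 2-dimensional hidden states.
H = [[0.1, 0.2], [1.5, -0.3], [0.0, 0.9]]
q = [1.0, 0.5]
w = attention_weights(H, q)
# The largest weight flags the timestep the model relied on most.
```

Reading the weights as an explanation ("the model attended mostly to timestep t") is exactly the kind of humanly understandable account a clear-box system aims to provide, though attention weights alone are only one candidate explanation signal.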
