Title: Architecture and System Support for Correct, Reliable Parallel Software

Brandon Lucia

Wednesday, May 7th, 12-1pm
HH-1107

Abstract:

Limits on power-neutral, single-thread performance scaling have caused a shift toward computer architectures that call for increasingly concurrent and parallel software. Parallel code reaps performance and energy benefits in architectures like now-pervasive multi-cores, and many domains like servers, mobile devices, and cloud applications require concurrent software. Unfortunately, writing correct, reliable concurrent software is extremely difficult. In this talk, I will discuss my research on using architecture and system support to make programs easier to debug and less prone to failure. First, I will present Recon, a new technique for concurrency debugging. Using a simple statistical model, Recon isolates and reconstructs the root cause of failures to help programmers understand their errors. With hardware support, Recon works efficiently even in production. In experiments with real, buggy programs (e.g., MySQL, Apache), we showed that Recon reveals bug root causes with few – often zero – false positives. Second, I will present Aviso, a new technique for avoiding failures in buggy concurrent programs. Aviso traces program events. When an execution fails, Aviso uses its event trace and a statistical model to generate thread schedule constraints that prevent the same failure from occurring in the future. Collections of systems running Aviso work cooperatively to find and share effective constraints. Our experiments with real software show that Aviso decreases failure rates by orders of magnitude with tolerable performance overheads. I will close with a discussion of some computer architecture research challenges for the future, focusing on my ongoing work at Microsoft Research on specialized, heterogeneous parallel architectures and intermittently powered systems, two important, emerging trends in computer system design.

Bio:

Brandon Lucia is a Researcher at Microsoft Research in Redmond, Washington. Brandon's research focuses on designing new computer architectures and system designs that make computers programmable, reliable, and efficient. His current work aims to address all these issues with new programming and execution models for intermittently powered devices and heterogeneous computer architectures. Brandon's prior and ongoing work primarily deals with concurrency and parallelism. That work develops architecture and system support for new programming and execution models, new debugging techniques, and new failure avoidance mechanisms for concurrent software.