
Annual IEEE/ACM International Symposium on Microarchitecture

MICRO Test of Time Award

List of Eligible Papers for the 2016 Award

View the 2016 call for nominations.

MICRO 1994

Paper Title | Authors
Static Branch Frequency and Program Profile Analysis | Youfeng Wu, James R. Larus
Using Branch Handling Hardware to Support Profile-Driven Optimization | Thomas M. Conte, Burzin A. Patel, J. Stan Cox
Branch Classification: A New Mechanism for Improving Branch Predictor Performance | Po-Yung Chang, Eric Hao, Tse-Yu Yeh, Yale Patt
Techniques for Compressing Program Address Traces | Andrew R. Pleszkun
Height Reduction of Control Recurrences for ILP Processors | Michael Schlansker, Vinod Kathail, Sadun Anik
Theoretical Modeling of Superscalar Processor Performance | Derek B. Noonburg, John P. Shen
Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops | B. Ramakrishna Rau
Minimum Register Requirements for a Modulo Schedule | Alexandre E. Eichenberger, Edward S. Davidson, Santosh G. Abraham
Minimizing Register Requirements Under Resource-Constrained Rate-Optimal Software Pipelining | R. Govindarajan, Erik R. Altman, Guang R. Gao
Software Pipelining with Register Allocation and Spilling | Jian Wang, Andreas Krall, M. Anton Ertl, Christine Eisenbeis
Reducing Memory Traffic with CRegs | Peter Dahl, Matthew O'Keefe
Dynamic Memory Disambiguation for Array References | David Bernstein, Doron Cohen, Dror E. Maydan
A Study of Pointer Aliasing for Software Pipelining Using Run-Time Disambiguation | Bogong Su, Stanley Habib, Wei Zhao, Jian Wang, Youfeng Wu
Data Relocation and Prefetching for Programs with Large Data Sets | Yoji Yamada, John Gyllenhaal, Grant Haab, Wen-mei Hwu
Cache Designs with Partial Address Matching | Lishing Liu
Minimizing Branch Misprediction Penalties for Superpipelined Processors | Ching-Long Su, Alvin M. Despain
Facilitating Superscalar Processing Via a Combined Static/Dynamic Register Renaming Scheme | Eric Sprangle, Yale Patt
Improving Resource Utilization of the MIPS R8000 Via Post-Scheduling Global Instruction Distribution | Raymond Lo, Sun Chan, Fred Chow, Shin-Ming Liu
A Comparison of Two Pipeline Organizations | Michael Golden, Trevor Mudge
A Fill-Unit Approach to Multiple Instruction Issue | Manoj Franklin, Mark Smotherman
A High-Performance Microarchitecture with Hardware-Programmable Functional Units | Rahul Razdan, Michael D. Smith
The Anatomy of the Register File in a Multiscalar Processor | Scott E. Breach, T. N. Vijaykumar, Gurindar S. Sohi
Register File Port Requirements of Transport Triggered Architectures | Jan Hoogerbrugge, Henk Corporaal
The Effects of Predicated Execution On Branch Prediction | Gary Scott Tyson
Analysis of the Conditional Skip Instructions of the HP Precision Architecture | Jonathan P. Vogel, Bruce K. Holmer
Characterizing the Impact of Predicated Execution On Branch Prediction | Scott A. Mahlke, Richard E. Hank, Roger A. Bringmann, John C. Gyllenhaal, David M. Gallagher, Wen-mei W. Hwu
The Effect of Speculatively Updating Branch History On Branch Prediction Accuracy, Revisited | Eric Hao, Po-Yung Chang, Yale N. Patt

MICRO 1995

Paper Title | Authors
Performance Issues in Correlated Branch Prediction Schemes | Nicolas Gloy, Michael D. Smith, Cliff Young
Dynamic Path-Based Branch Correlation | Ravi Nair
The Predictability of Branches in Libraries | Brad Calder, Dirk Grunwald, Amitabh Srivastava
The Performance Impact of Incomplete Bypassing in Processor Pipelines | Pritpal S. Ahuja, Douglas W. Clark, Anne Rogers
Efficient Instruction Scheduling Using Finite State Automata | Vasanth Bala, Norman Rubin
Critical Path Reduction for Scalar Programs | Michael Schlansker, Vinod Kathail
A Limit Study of Local Memory Requirements Using Value Reuse Profiles | Andrew S. Huang, John P. Shen
Zero-Cycle Loads: Microarchitecture Support for Reducing Load Latency | Todd M. Austin, Gurindar S. Sohi
A Modified Approach to Data Cache Management | Gary Tyson, Matthew Farrens, John Matthews, Andrew R. Pleszkun
Petri Net Versus Modulo Scheduling for Software Pipelining | Vicki H. Allan, U. R. Shah, K. M. Reddy
Modulo Scheduling with Multiple Initiation Intervals | Nancy J. Warter-Perez, Noubar Partamian
Spill-Free Parallel Scheduling of Basic Blocks | B. Natarajan, M. Schlansker
Improving Instruction-Level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation | Jack W. Davidson, Sanjay Jinturkar
Self-Regulation of Workload in the Manchester Data-Flow Computer | John R. Gurd, David F. Snelling
The M-Machine Multicomputer | Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay S. Lee
Region-Based Compilation: An Introduction and Motivation | Richard E. Hank, Wen-mei W. Hwu, B. Ramakrishna Rau
An Experimental Study of Several Cooperative Register Allocation and Instruction Scheduling Strategies | Cindy Norris, Lori L. Pollock
Register Allocation for Predicated Code | Alexandre E. Eichenberger, Edward S. Davidson
Partial Resolution in Branch Target Buffers | Barry Fagin, Kathryn Russell
A System Level Perspective On Branch Architecture Performance | Brad Calder, Dirk Grunwald, Joel Emer
Dynamic Rescheduling: A Technique for Object Code Compatibility in VLIW Architectures | Thomas M. Conte, Sumedh W. Sathaye
Improving CISC Instruction Decoding Performance Using a Fill Unit | Mark Smotherman, Manoj Franklin
SPAID: Software Prefetching in Pointer- and Call-Intensive Environments | Mikko H. Lipasti, William J. Schmidt, Steven R. Kunkel, Robert R. Roediger
An Effective Programmable Prefetch Engine for On-Chip Caches | Tien-Fu Chen
Cache Miss Heuristics and Preloading Techniques for General-Purpose Programs | Toshihiro Ozawa, Yasunori Kimura, Shin'ichiro Nishizaki
Alternative Implementations of Hybrid Branch Predictors | Po-Yung Chang, Eric Hao, Yale N. Patt
Control Flow Prediction with Tree-Like Subgraphs for Superscalar Processors | Simonjit Dutta, Manoj Franklin
The Role of Adaptivity in Two-Level Adaptive Branch Prediction | Stuart Sechrest, Chih-Chieh Lee, Trevor Mudge
Design of Storage Hierarchy in Multithreaded Architectures | Lucas Roh, Walid A. Najjar
An Investigation of the Performance of Various Instruction-Issue Buffer Topologies | Stéphan Jourdan, Pascal Sainrat, Daniel Litaize
Decoupling Integer Execution in Superscalar Processors | Subbarao Palacharla, J. E. Smith
Exploiting Short-Lived Variables in Superscalar Processors | Luis A. Lozano, Guang R. Gao
Partitioned Register File for TTAs | Johan Janssen, Henk Corporaal
Disjoint Eager Execution: An Optimal Form of Speculative Execution | Augustus K. Uht, Vijay Sindagi, Kelley Hall
Unrolling-Based Optimizations for Modulo Scheduling | Daniel M. Lavery, Wen-mei W. Hwu
Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule | Alexandre E. Eichenberger, Edward S. Davidson
Hypernode Reduction Modulo Scheduling | Josep Llosa, Mateo Valero, Eduard Ayguadé, Antonio González

MICRO 1996

Paper Title | Authors
A Persistent Rescheduled-Page Cache for Low Overhead Object Code Compatibility in VLIW Architectures | Thomas M. Conte, Sumedh W. Sathaye, Sanjeev Banerjia
Integrating a Misprediction Recovery Cache (MRC) Into a Superscalar Pipeline | James O. Bondi, Ashwini K. Nanda, Simonjit Dutta
Accurate and Practical Profile-Driven Compilation Using the Profile Buffer | Thomas M. Conte, Kishore N. Menezes, Mary Ann Hirsch
Efficient Path Profiling | Thomas Ball, James R. Larus
Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks | C. Chekuri, R. Johnson, R. Motwani, B. Natarajan, B. R. Rau, M. Schlansker
Speculative Hedge: Regulating Compile-Time Speculation Against Profile Variations | Brian L. Deitrich, Wen-mei W. Hwu
Hot Cold Optimization of Large Windows/NT Applications | Robert Cohn, P. Geoffrey Lowney
Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results | Cheng-Hsueh A. Hsieh, John C. Gyllenhaal, Wen-mei W. Hwu
Analysis Techniques for Predicated Code | Richard Johnson, Michael Schlansker
Global Predicate Analysis and Its Application to Register Allocation | David M. Gillies, Dz-ching Roy Ju, Richard Johnson, Michael Schlansker
Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs | Daniel M. Lavery, Wen-mei W. Hwu
Assigning Confidence to Conditional Branch Predictions | Erik Jacobsen, Eric Rotenberg, J. E. Smith
Compiler Synthesized Dynamic Branch Prediction | Scott Mahlke, Balas Natarajan
Wrong-Path Instruction Prefetching | Jim Pierce, Trevor Mudge
Design Decisions Influencing the UltraSPARC's Instruction Fetch Architecture | Robert Yung
Increasing the Instruction Fetch Rate Via Block-Structured Instruction Set Architectures | Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt
Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings | Thomas M. Conte, Sanjeev Banerjia, Sergei Y. Larin, Kishore N. Menezes, Sumedh W. Sathaye
Tango: A Hardware-Based Data Prefetching Technique for Superscalar Processors | Shlomit S. Pinter, Adi Yoaz
Exceeding the Dataflow Limit Via Value Prediction | Mikko H. Lipasti, John Paul Shen
The Performance Potential of Data Dependence Speculation & Collapsing | Yiannakis Sazeides, Stamatis Vassiliadis, James E. Smith
Heuristics for Register-Constrained Software Pipelining | Josep Llosa, Mateo Valero, Eduard Ayguadé
Software Pipelining Loops with Conditional Branches | Mark G. Stoodley, Corinna G. Lee
Combining Loop Transformations Considering Caches and Scheduling | Michael E. Wolf, Dror E. Maydan, Ding-Kai Chen
Instruction Scheduling and Executable Editing | Eric Schnarr, James R. Larus
Instruction Scheduling for the HP PA-8000 | David A. Dunn, Wei-Chung Hsu
Meld Scheduling: Relaxing Scheduling Constraints Across Region Boundaries | Santosh G. Abraham, Vinod Kathail, Brian L. Deitrich
Custom-Fit Processors: Letting Applications Define Architectures | Joseph A. Fisher, Paolo Faraboschi, Giuseppe Desoli
Optimization for a Superscalar Out-of-Order Machine | Anne M. Holler
Optimization of Machine Descriptions for Efficient Use | John C. Gyllenhaal, Wen-mei W. Hwu, B. Ramakrishna Rau

MICRO 1997

Paper Title | Authors
The Bi-Mode Branch Predictor | Chih-Chieh Lee, I-Cheng K. Chen, Trevor N. Mudge
Path-Based Next Trace Prediction | Quinn Jacobson, Eric Rotenberg, James E. Smith
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions Into the Reservation Stations Out-of-Order | Jared Stark, Paul Racunas, Yale N. Patt
On High-Bandwidth Data Cache Design for Multi-Issue Processors | Jude A. Rivers, Gary S. Tyson, Edward S. Davidson, Todd M. Austin
Run-Time Spatial Locality Detection and Optimization | Teresa L. Johnson, Matthew C. Merten, Wen-mei W. Hwu
A Comparison of Data Prefetching On an Access Decoupled and Superscalar Machine | G. P. Jones, N. P. Topham
The Design and Performance of a Conflict-Avoiding Cache | Nigel Topham, Antonio González, José González
Prediction Caches for Superscalar Processors | James E. Bennett, Michael J. Flynn
A Framework for Balancing Control Flow and Predication | David I. August, Wen-mei W. Hwu, Scott A. Mahlke
Evaluation of Scheduling Techniques On a SPARC-Based VLIW Testbed | Seongbae Park, SangMin Shim, Soo-Mook Moon
Tuning Compiler Optimizations for Simultaneous Multithreading | Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen
Exploiting Dead Value Information | Milo M. Martin, Amir Roth, Charles N. Fischer
Trace Processors | Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, Jim Smith
The Multicluster Architecture: Reducing Cycle Time Through Partitioning | Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic
Out-of-Order Vector Architectures | Roger Espasa, Mateo Valero, James E. Smith
Initial Results On the Performance and Cost of Vector Microprocessors | Corinna G. Lee, Derek J. DeVries
The Filter Cache: An Energy Efficient Memory Structure | Johnson Kin, Munish Gupta, William H. Mangione-Smith
Improving Code Density Using Compression Techniques | Charles Lefurgy, Peter Bird, I-Cheng Chen, Trevor Mudge
Procedure Based Program Compression | Darko Kirovski, Johnson Kin, William H. Mangione-Smith
Improving the Accuracy and Performance of Memory Communication Through Renaming | Gary S. Tyson, Todd M. Austin
Microarchitecture Support for Improving the Performance of Load Target Prediction | Chung-Ho Chen, Akida Wu
Streamlining Inter-Operation Memory Communication Via Data Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
The Predictability of Data Values | Yiannakis Sazeides, James E. Smith
Value Profiling | Brad Calder, Peter Feller, Alan Eustace
Can Program Profiling Support Value Prediction? | Freddy Gabbay, Avi Mendelson
Highly Accurate Data Value Prediction Using Hybrid Predictors | Kai Wang, Manoj Franklin
ProfileMe: Hardware Support for Instruction-Level Profiling On Out-of-Order Processors | Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos
Procedure Placement Using Temporal Ordering Information | Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling | Todd C. Mowry, Chi-Keung Luk
Available Parallelism in Video Applications | Heng Liao, Andrew Wolfe
MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems | Chunho Lee, Miodrag Potkonjak, William H. Mangione-Smith
Cache Sensitive Modulo Scheduling | F. Jesús Sánchez, Antonio González
Unroll-and-Jam Using Uniformly Generated Sets | Steve Carr, Yiping Guan
Resource-Sensitive Profile-Directed Data Flow Analysis for Code Optimization | Rajiv Gupta, David A. Berson, Jesse Z. Fang

MICRO 1998

Paper Title | Authors
A Bandwidth-Efficient Architecture for Media Processing | Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, John D. Owens
Exploiting Instruction Level Parallelism in Geometry Processing for Three Dimensional Graphics Applications | Chia-Lin Yang, Barton Sano, Alvin R. Lebeck
Simple Vector Microprocessors for Multimedia Applications | Corinna G. Lee, Mark G. Stoodley
Evaluating MMX Technology Using DSP and Multimedia Applications | Ravi Bhargava, Lizy K. John, Brian L. Evans, Ramesh Radhakrishnan
Analyzing the Working Set Characteristics of Branch Execution | Sangwook P. Kim, Gary S. Tyson
Dataflow Analysis of Branch Mispredictions and Its Application to Early Resolution of Branch Outcomes | Alexandre Farcy, Olivier Temam, Roger Espasa, Toni Juan
The YAGS Branch Prediction Scheme | Avinoam N. Eden, Trevor Mudge
Task Selection for a Multiscalar Processor | T. N. Vijaykumar, Gurindar S. Sohi
Split-Path Enhanced Pipeline Scheduling for Loops with Control Flows | SangMin Shim, Soo-Mook Moon
Effective Cluster Assignment for Modulo Scheduling | Erik Nystrom, Alexandre E. Eichenberger
Better Global Scheduling Using Path Profiles | Cliff Young, Michael D. Smith
Predictive Techniques for Aggressive Load Speculation | Glenn Reinman, Brad Calder
Compiler-Directed Early Load-Address Generation | Ben-Chung Cheng, Daniel A. Connors, Wen-mei W. Hwu
Load Latency Tolerance in Dynamically Scheduled Processors | Srikanth T. Srinivasan, Alvin R. Lebeck
Improving I/O Performance with a Conditional Store Buffer | Lambert Schaelicke, Al Davis
Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors | Chi-Keung Luk, Todd C. Mowry
Code Compression Based on Operand Factorization | Guido Araujo, Paulo Centoducatte, Mario Côrtes, Ricardo Pannain
Understanding the Differences Between Value Prediction and Instruction Reuse | Avinash Sodani, Gurindar S. Sohi
A Novel Renaming Scheme to Exploit Value Temporal Locality Through Physical Register Reuse and Unification | Stéphan Jourdan, Ronny Ronen, Michael Bekerman, Bishara Shomar, Adi Yoaz
A Dynamic Multithreading Processor | Haitham Akkary, Michael A. Driscoll
Widening Resources: A Cost-Effective Technique for Aggressive ILP Architectures | David López, Josep Llosa, Mateo Valero, Eduard Ayguadé
The Cascaded Predictor: Economical and Adaptive Branch Target Prediction | Karel Driesen, Urs Hölzle
Improving Prediction for Procedure Returns with Return-Address-Stack Repair Mechanisms | Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, Douglas W. Clark
Predicting Indirect Branches via Data Compression | John Kalamatianos, David R. Kaeli
Improving Locality Using Loop and Data Transformations in an Integrated Framework | Mahmut Kandemir, Alok Choudhary, J. Ramanujam, Prithviraj Banerjee
Precise Register Allocation for Irregular Architectures | Timothy Kong, Kent D. Wilken
Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures | Emre Özer, Sanjeev Banerjia, Thomas M. Conte