As feature sizes shrink and supply voltages drop, nanoscale circuits have become more susceptible to single event transient (SET) faults. Traditionally, soft errors have been a much greater concern in memories than in logic, because three masking factors limited the susceptibility of logic to soft errors: logical, electrical, and timing (latching-window) masking. Technology scaling decreases the impact of these three masking factors on SETs: smaller feature sizes and lower supply voltages allow lower-energy particles to cause SETs, reduced logic depth and smaller gate delays decrease attenuation, and higher clock frequencies reduce latching-window masking. Our work on reliability modeling, analysis, and optimization in digital systems focuses on symbolic methods and on efficient models and mitigation techniques. Our symbolic reliability modeling and analysis framework characterizes both the impact of internal gate errors and the susceptibility of outputs to errors, for single and multiple transient faults, at a 5000X speedup over detailed HSPICE simulation and with less than 8% accuracy loss. It also handles uncertainty due to process variations or post-layout delay increase. Our results show that joint modeling of the three masking factors is key to the accuracy achieved. To address the increased impact of soft errors in digital systems, we propose several mitigation techniques that can theoretically achieve a cumulative 95% reduction in error rate. Specifically, our results show that Redundancy Addition and Removal (RAR), Selective Voltage Scaling (SVS), and Clock Skew Scheduling (CSS) each achieve a 15-30% reduction in error rate when applied individually, and can potentially be combined synergistically in a complete flow.
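To make the interplay of the three masking factors concrete, the following is a minimal sketch under stated assumptions, not the symbolic framework described above. It assumes a commonly used first-order glitch attenuation rule (a pulse narrower than the gate delay is filtered, a pulse wider than twice the gate delay passes unchanged, with a linear region in between) and a first-order latching-window model in which the capture probability grows with the ratio of pulse width to clock period. All function names, parameters, and numbers are hypothetical.

```python
def attenuate(width, gate_delay):
    """Electrical masking (assumed first-order model): narrow the pulse at one gate."""
    if width < gate_delay:
        return 0.0                        # pulse is fully filtered
    if width < 2 * gate_delay:
        return 2 * (width - gate_delay)   # pulse is partially attenuated
    return width                          # pulse passes unchanged


def latching_probability(width, latch_window, clock_period):
    """Latching-window masking (assumed model): chance the pulse is captured."""
    if width <= latch_window:
        return 0.0
    return min(1.0, (width - latch_window) / clock_period)


def set_error_probability(initial_width, gate_delays, p_sensitized,
                          latch_window, clock_period):
    """Combine logical, electrical, and latching-window masking for one path.

    p_sensitized is the probability that the path is logically sensitized,
    i.e. that the glitch is not logically masked. Times are in picoseconds.
    """
    width = initial_width
    for delay in gate_delays:
        width = attenuate(width, delay)
    return p_sensitized * latching_probability(width, latch_window, clock_period)


if __name__ == "__main__":
    # Same glitch and path, two clock periods (hypothetical values).
    slow = set_error_probability(100.0, [20.0, 20.0], 0.5, 10.0, 1000.0)
    fast = set_error_probability(100.0, [20.0, 20.0], 0.5, 10.0, 500.0)
    print(slow, fast)
```

In this toy model, halving the clock period doubles the probability that a surviving glitch is latched, which is consistent with the observation that higher frequencies reduce latching-window masking. A narrow glitch that is filtered along the path never reaches the latch at all, so the three factors have to be evaluated together rather than independently.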
Driven by aggressive technology scaling, a marked increase in the variability of process technology parameters introduces uncertainty in the level of performance that can be guaranteed for most digital systems. In addition, increased power density and stricter thermal envelopes increase the variability of environmental parameters. While a significant body of work characterizes performance and power consumption under process-driven variability at the interface between the physical and gate levels, these effects need to be modeled at higher levels of abstraction as well. In support of a complete probabilistic design flow, high-level modeling of variability effects is needed to identify the design choices that are most likely to meet the initial design constraints. Our work models the effects of process variation at the system level and shows that a design style relying on multiple voltage-frequency islands (VFIs) copes more robustly with process technology parameter variation. Furthermore, our work details how these effects manifest in systems implemented using 3D integration, as opposed to classic two-dimensional (2D) integration. Our results show that modeling process variation effects at the system level provides a 145X speedup compared to full Monte Carlo simulation. For systems with generic topologies, a VFI design style is shown to potentially double the yield for both latency-constrained and throughput-constrained applications. Finally, 3D systems are shown to be more heavily affected by process variations than their 2D counterparts when a 3D-specific macromodel for characterizing design variability is employed.
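As an illustration of why a VFI design style can improve yield, the sketch below runs a small Monte Carlo experiment on a toy model. It is not the system-level macromodel described above: it assumes that each island's maximum frequency is an independent Gaussian sample, that a single-clock design must run every island at the highest frequency any island requires, and that a VFI design only needs each island to meet its own requirement. The function name, parameters, and numbers are hypothetical.

```python
import random


def estimate_yield(required_mhz, nominal_mhz=1000.0, sigma_mhz=80.0,
                   samples=20000, seed=0):
    """Return (single_clock_yield, vfi_yield) for a toy variability model.

    required_mhz lists the frequency each island needs to meet the
    application constraint. Each sampled chip draws one maximum frequency
    per island from an assumed Gaussian distribution.
    """
    rng = random.Random(seed)
    global_requirement = max(required_mhz)
    pass_single = 0
    pass_vfi = 0
    for _ in range(samples):
        fmax = [rng.gauss(nominal_mhz, sigma_mhz) for _ in required_mhz]
        # Single clock domain: the slowest island must meet the highest requirement.
        if min(fmax) >= global_requirement:
            pass_single += 1
        # VFI: each island only has to meet its own requirement.
        if all(f >= r for f, r in zip(fmax, required_mhz)):
            pass_vfi += 1
    return pass_single / samples, pass_vfi / samples


if __name__ == "__main__":
    single, vfi = estimate_yield([950.0, 800.0, 700.0])
    print(f"single clock: {single:.3f}, VFI: {vfi:.3f}")
```

Because both designs are evaluated on the same sampled chips, the VFI yield in this model can never be lower than the single-clock yield, and the two coincide when every island has the same requirement. The size of the gap depends entirely on the assumed distribution and requirements, so the numbers should not be read as reproducing the reported results.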
Continuous technology scaling has enabled the integration of hundreds of processing cores on the same silicon substrate, allowing multiple applications to execute concurrently on a chip. In such systems, on-chip power consumption is one of the main bottlenecks to increased performance and enhanced capabilities. Increased power consumption not only results in higher on-die temperature and reduced lifetime reliability, but also drains the batteries of mobile devices faster. On-chip power management has therefore become a critical component of every step in the many-core design flow, from physical design all the way up to micro-architecture and system-level design. Our work addresses the challenges of dynamic power management in many-core systems by proposing techniques that maintain appropriate performance levels for the applications running on the system, either by turning cores on and off and selectively using power states, or by using Dynamic Voltage and Frequency Scaling (DVFS) to deliver a given performance level at minimum power. Our results show that Voltage Frequency Islands (VFIs) enable fine-grained power management for many-core systems while allowing both hardware- and software-based solutions for power control. Techniques ranging from fully distributed to fully centralized, using control-theoretic or heuristic approaches, show that power consumption can be reduced by a factor of 5-10, with no impact on performance for a wide variety of applications.
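The idea of using DVFS to deliver a given performance level at minimum power can be sketched as a simple selection over discrete operating points. The example below is a minimal illustration, not one of the control-theoretic or heuristic techniques described above. It assumes the standard first-order dynamic power relation P = C·V²·f, ignores leakage and transition overheads, and uses a hypothetical list of (voltage, frequency) pairs for one island.

```python
def dynamic_power(voltage, freq_mhz, c_eff=1.0):
    """First-order dynamic power estimate, P = C * V^2 * f (arbitrary units)."""
    return c_eff * voltage * voltage * freq_mhz


def pick_operating_point(points, required_mhz, c_eff=1.0):
    """Return the (voltage, freq_mhz) pair with the lowest estimated dynamic
    power among those that meet required_mhz, or None if none is fast enough."""
    feasible = [(dynamic_power(v, f, c_eff), v, f)
                for v, f in points if f >= required_mhz]
    if not feasible:
        return None
    _, voltage, freq = min(feasible)
    return voltage, freq


if __name__ == "__main__":
    # Hypothetical operating points for one voltage-frequency island.
    points = [(0.8, 400.0), (1.0, 800.0), (1.2, 1200.0)]
    print(pick_operating_point(points, required_mhz=600.0))
```

With per-island control, each VFI can run this kind of selection against its own performance requirement instead of forcing the whole chip to the operating point demanded by its most heavily loaded region.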