Research


Fault Dictionary Compaction for On-Chip Diagnosis

Matthew Beckler
Shawn Blanton


Manufacturing variations at future technology nodes are expected to have a significant impact on yield, both at the time of manufacturing and later in the field. As a result, chips and systems must be designed with robustness in mind to overcome these challenges. The Stanford-Carnegie-Mellon approach achieves robustness by (i) testing on-chip at runtime, (ii) identifying the location of any faults, and (iii) repairing the affected modules. One way to locate faults is through the use of fault dictionaries. A fault dictionary contains pre-computed circuit responses in the presence of modeled faults, and can be used to identify specific faults affecting the circuit under test through a dictionary-lookup operation. Conventional fault dictionaries, however, require too much memory for on-chip use and are limited to locating only simple faults. A new method has been developed for creating highly compacted fault dictionaries for chip designs with module-level repair capabilities.
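The dictionary-lookup operation described above can be illustrated with a minimal sketch. The fault names, test responses, and the `lookup` helper below are hypothetical toy values chosen for illustration; they are not taken from the project's actual dictionaries or tools.

```python
# Toy full-response fault dictionary (hypothetical data).
# Each entry maps a modeled fault to the circuit's output response
# for each of three test patterns, stored as bit strings.
FULL_RESPONSE = {
    "fault_free": ("00", "11", "01"),
    "A_sa0": ("00", "10", "01"),
    "B_sa1": ("01", "11", "01"),
    "C_sa0": ("00", "10", "01"),
}


def lookup(dictionary, observed):
    """Return the sorted names of all entries whose stored response
    matches the observed response exactly."""
    observed = tuple(observed)
    return sorted(name for name, resp in dictionary.items() if resp == observed)


# Two modeled faults produce the same response, so the lookup
# returns both as candidates.
candidates = lookup(FULL_RESPONSE, ("00", "10", "01"))
print(candidates)
```

In this toy case the lookup cannot separate `A_sa0` from `C_sa0`. If both faults lie in the same module, that ambiguity would not matter for module-level repair.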

Initial experiments use module-level representations of the ISCAS-85 benchmark circuits. Fig. 1 compares fault dictionary sizes from a variety of compaction techniques. Dictionary size is expressed in bits, plotted on a log scale, and normalized against the full-response dictionary size (blue). Faults that do not lie within a module can be eliminated from consideration (green). Because the focus is on module-level fault localization, a number of additional faults can be eliminated because of their equivalence with, or dominance over, other faults in the same module (red). Applying a common fault dictionary compaction technique, the pass/fail dictionary, results in significant overall compaction (orange). Other compaction and dictionary optimization techniques are in development to further reduce the dictionary size and improve diagnosability. Even without further enhancements, dictionary size is reduced by more than 100X in many cases.
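The compaction steps above can be sketched on toy data. Everything in this example is an assumption made for illustration: the fault names, the module assignment, the helper functions, and the size formula (entries × tests × bits per test). The collapsing step merges only faults in the same module that share an identical pass/fail signature, which is a simplified stand-in for the equivalence and dominance analysis described above, not the project's actual method.

```python
# Hypothetical full-response data: fault -> output bits per test pattern.
FULL = {
    "fault_free": ("00", "11", "01"),
    "A_sa0": ("00", "10", "01"),
    "B_sa1": ("01", "11", "01"),
    "C_sa0": ("00", "10", "01"),
    "D_sa1": ("00", "11", "11"),
}

# Hypothetical module assignment; D_sa1 lies outside every module.
MODULE_OF = {"A_sa0": "M1", "C_sa0": "M1", "B_sa1": "M2"}


def to_pass_fail(full, golden="fault_free"):
    """Replace each response with one bit per test: 1 = fails, 0 = passes."""
    good = full[golden]
    return {
        name: tuple(int(r != g) for r, g in zip(resp, good))
        for name, resp in full.items()
        if name != golden
    }


def collapse_by_module(pass_fail, module_of):
    """Drop faults outside any module, then keep one representative fault
    per (module, signature) pair. Simplified: identical signatures only."""
    kept = {}
    for fault, signature in pass_fail.items():
        module = module_of.get(fault)
        if module is None:
            continue  # fault does not lie within a repairable module
        kept.setdefault((module, signature), fault)
    return kept


def size_bits(n_entries, n_tests, bits_per_test):
    """Assumed storage model: one fixed-width field per entry per test."""
    return n_entries * n_tests * bits_per_test


pass_fail = to_pass_fail(FULL)
compact = collapse_by_module(pass_fail, MODULE_OF)

full_size = size_bits(len(FULL) - 1, n_tests=3, bits_per_test=2)
compact_size = size_bits(len(compact), n_tests=3, bits_per_test=1)
print(full_size, compact_size)
```

The toy ratio here says nothing about the reductions reported for the benchmark circuits; it only shows how each step shrinks the number of entries or the bits stored per entry.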

figure 1

Fig. 1. The amount of memory storage needed is reduced by more than 100X for a very conservative failure model.