# A Systematic Approach to Modeling and Analysis of Transient Faults in Logic Circuits

Natasa Miskov-Zivanov, Diana Marculescu Department of Electrical and Computer Engineering Carnegie Mellon University {nmiskov,dianam}@ece.cmu.edu

## Abstract

With technology scaling, the occurrence rate of not only single, but also multiple transients resulting from a single hit is increasing. In this work, we consider the effect of these multiple-event transients on the outputs of logic circuits. Our framework allows for the analysis of soft errors in logic circuits, including several aspects: estimation of the effect of both single and multiple transient faults on both combinational and sequential circuits, analysis of the impact of multiple flip-flop upsets in sequential circuits, and analysis of transient behavior of the soft error rate in the cycles following the hit. The proposed framework can be used to estimate the impact of transient faults stemming not only from radiation, but also from other physical phenomena. The results obtained using the proposed framework show that output error rates resulting from multiple-event transients or multiple-bit upsets can vary across different circuits by several orders of magnitude.

## 1. Introduction

The scaling of device feature sizes, operating voltages and design margins raises a great concern about the susceptibility of circuits to *transient faults*, which can be caused by different physical phenomena, such as high-energy particle hits originating from cosmic rays, capacitive coupling, electromagnetic interference, or power transients [10].

Transient faults induced by radiation, also called Single-Event Transients (SETs), are claimed to be a major challenge for future scaling [1] and have thus been examined by many researchers in recent years. An error that results from an SET (glitch or pulse) is most often referred to as a *soft error* or a *single-event upset* (SEU). The effect of soft errors is measured by the *soft error rate* (SER) in FITs (*failure-in-time*), where one FIT is defined as one failure in $10^9$ hours.
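As a quick illustration of the FIT unit, the sketch below converts a FIT value into a per-hour failure rate and into a mean time to failure. The function names and the 8760 hours-per-year constant are assumptions made for this example only.

```python
# One FIT corresponds to one failure in 10^9 hours.
HOURS_PER_FIT = 1e9


def fit_to_failures_per_hour(fit: float) -> float:
    """Convert a rate in FIT to expected failures per hour."""
    return fit / HOURS_PER_FIT


def fit_to_mttf_years(fit: float, hours_per_year: float = 8760.0) -> float:
    """Mean time to failure in years for a constant failure rate given in FIT."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_FIT / fit / hours_per_year
```

For instance, a circuit rated at 1000 FIT is expected to fail roughly once every 114 years under this constant-rate reading.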

In the past, soft errors used to be a concern only in memories, thus resulting in widely used Error Correcting Code (ECC) mitigation techniques. However, these techniques may no longer be very efficient with current and future technology nodes, due to the more frequent occurrence of a new failure type, namely the Multiple-Bit Upset (MBU). The MBU is defined as several adjacent bit fails, simultaneously induced by a single particle hit. Up to five such tied-up bit fails have been observed in a 130nm SRAM [5].

On the other hand, with the reduction of device dimensions and operating voltage, the impact of radiation in logic circuits is increasing and fast reaching the soft error rates in memories [11]. Therefore, SETs in logic circuits are becoming an important reliability concern for future technology nodes. Furthermore, since the distances between junctions are decreasing with scaling, and the critical charge is reducing, the energy of radiation particles that is required to cause a multiple transient fault is decreasing. The probability that a single high-energy particle affects the output of more than one circuit node (Figure 1) is no longer negligible [8], and if the two (or more) affected nodes belong to different logic gates, multiple transient faults (often referred to as Multiple-Event Transients or METs) can be generated and propagated to logic circuit outputs.

Hence, accurate evaluation of the SER stemming from not only single transients, but also multiple transients becomes mandatory for very deep submicron technologies. Realistic and accurate projection of the SET- and MET-induced SER in logic (combinational and sequential) circuits is therefore crucial to identifying the features needed for future reliable high-performance microprocessors. In this work, we present an efficient and accurate methodology for the evaluation of the impact of both single and multiple transient faults in combinational and sequential circuits. The framework described here allows for unified treatment and probabilistic analysis of important aspects of transient fault propagation in logic circuits. It further provides the analysis of the effects of transient faults in the cycles following the hit as well as the impact of multiple flip-flop upsets.

The rest of this work is organized as follows. In Section 2, we describe previous work on soft error modeling and analysis and briefly outline the contributions of our work. Section 3 provides an overview of the preliminaries of an SET analysis in logic circuits. The proposed approach to modeling and analysis of multiple transients in logic circuits is described in Section 4. In Section 5, we describe the model for multiple flip-flop upsets that is incorporated into our framework, while in Section 6 we present the model for overall error computation. Finally, in Section 7 we show the experimental results obtained using the proposed framework and with Section 8 we conclude our work.

## 2. Related work

Among all transient faults, radiation-induced faults have received most of the attention in recent years, since they are considered as one of the major barriers for future technology scaling [1]. Intensive research has been done so far in the area of modeling, analysis, and protection for radiation-induced transient faults [2],[3],[6],[7],[10],[11]. Since our focus is on *modeling* of these faults and analyzing their effect on *logic circuits*, we give a brief overview of the work related to those aspects of transient faults in the sequel.

### **2.1.** Single-event transient analysis

One obvious approach to analyze the impact of transient faults is to inject the fault into a given node of the circuit and simulate the circuit for different input vectors in order to find whether the fault propagates. However, this approach becomes intractable for large circuits and large number of inputs, and thus gives way to approximate approaches that use analytical and symbolic methods to evaluate circuit susceptibility to transient faults.

A number of methods have been proposed recently to evaluate the susceptibility of combinational logic circuits to soft errors, among them several symbolic models [6]. An example of such a symbolic modeling approach is the one that uses Binary Decision Diagrams (BDDs) and Algebraic Decision Diagrams (ADDs) to model the propagation of transient faults in logic circuits [6]. This model has been shown to be both efficient and accurate, and thus we incorporate its main ideas into our work. In contrast to combinational circuits, sequential circuits have received less attention in soft error susceptibility modeling. A transient fault resulting from a single particle hit can affect the outputs of a sequential circuit during several clock cycles. To consider this effect, the propagation of an SET through a sequential circuit must be analyzed for more than one clock cycle. An approach that tackles this issue in an efficient and accurate manner is described in [7].

### **2.2.** Multiple-event transient analysis

The problem of METs has been addressed in the past, but prior work focused mostly on their effect in memories [4],[5]. Multiple transients in logic received very little attention until recently [8],[9].

Similar to SETs, previous work that focused on METs in logic circuits used, to the best of our knowledge, only simulation. For example, the authors in [9] used simulation to estimate the sizes of multiple transients resulting from a single nuclear reaction, as well as the impact of different gate input combinations on the output transient current. In case of approaches focusing on the effect of multiple transients on the error rate of the overall circuit, previous work used simulations at either device [8] or circuit level [4],[9] and the usual approach was to inject faults at different nodes in the circuit, and then estimate the impact of those faults on circuit outputs for different input combinations.

### 2.3. Paper contribution

With respect to single- and multiple-event transients, the main contributions of this work improve the state of the art by allowing for:

- Accurate and efficient modeling and analysis of the impact of both SETs and METs in logic (combinational and sequential) circuits;
- Evaluation of changes in error rates due to SETs and METs following the cycle when the transient fault occurred within the circuit;
- Evaluation of the impact of multiple flip-flop upsets in sequential circuits.



Figure 1. Schematic of a device with oblique incident angle of particle hit causing double transient (r = radius of the particle track).

It is also important to note that the specific parameters related to radiation-induced transients (*e.g.*, particle hit rate, ratio of effective hits) are not directly part of the proposed framework, but instead are included as inputs to the framework. Thus, our framework can be applied to any type of a transient fault, irrespective of its origin.

## **3.** Transient fault model

In this section, we briefly describe the main principles of generation and propagation of single- and multiple-event transients.

### **3.1.** Fault generation

When a high-energy charged particle passes through a semiconductor material, it frees electron-hole pairs along its path as it loses energy. Charge collection generally occurs within a few microns of the junction. The collected charge for the radiation-induced events in silicon can range from one to several hundreds of fCs [3]. The device sensitivity to this excess charge is defined primarily by the node capacitance, operating voltages, the strength of feedback or fanout transistors, all defining the amount of critical charge required to trigger a change in the data state. Critical charge for technology nodes below 90nm decreases to 10fC [11].

When an energetic particle hits a device at an oblique angle, there is a small, but non-zero probability of disturbing more than one sensitive junction, as shown in Figure 1. The larger the particle track and the closer the junctions are, the larger the probability is for upsetting more than one junction [4]. The most probable location of the occurrence of multiple transients is at the outputs of neighboring gates. In other words, a gate and its fanin or fanout neighbors are possible candidates for MET generation. Another possibility would be gates that have a common fanin or fanout neighbor.

There are several factors that need to be considered when modeling multiple-event transients. First, the exact relationship between, for example, two METs generated by an energetic particle hitting two junctions, and the same particle affecting only one junction cannot be determined in a straightforward manner. For example, given the particle hit, it is necessary to know how the charge collected by a single junction, $Q_{coll}$, compares to the charge induced by the same energetic particle, but collected by two or more junctions $(Q_{coll,1}, Q_{coll,2}, \dots, Q_{coll,n})$. One possible assumption is that:

$$Q_{coll} \ge \sum_{i=1}^{n} Q_{coll,i} \tag{1}$$

where the inequality stems from the fact that the charge spread across several nodes may result in less overall charge being collected. However, the exact relationship between the charge collected by a single node,  $Q_{coll}$  and the sum of the charges collected by multiple nodes is also affected by the incident particle angle and the collection capacity of nodes.

Next, the multiple transients that can result from a single hit are not necessarily uniform and can be of different sizes. Even if we assume *equality* in equation (1), the relationship between the resulting glitch sizes is not the same as the relationship between the collected charges, *i.e.*:

$$D_{SET} \neq \sum_{i=1}^{n} D_{MET,i} \tag{2}$$

where $D_{SET}$ is the duration of the glitch resulting from $Q_{coll}$ and $D_{MET,i}$ is the duration of the glitch in the MET resulting from $Q_{coll,i}$. One possible option is to use HSPICE simulations to determine this relationship in order to be able to compare the impact of an SET to the impact of corresponding METs on the overall circuit reliability. We conducted HSPICE simulations of different gates, in order to determine the sizes of glitches resulting from a given collected charge. We show the results for a NAND gate in Figure 2, assuming different gate load values (fan-out-of-1, FO1, and fan-out-of-2, FO2) and different collected charge (from 10 to 200fC). The *original* curve in Figure 2 represents the duration of the glitch resulting from a given collected charge, while the *sum* curve represents the sum of glitch durations, assuming one glitch results from a fixed charge: (a) 10fC, (b) 20fC, (c) 40fC, and (d) 20fC. The collected charge for the second glitch is varied, starting with the same value as for the first glitch and increasing until their sum reaches 200fC. Also, in Figure 2(d), one gate is assumed to be FO1 and the other one FO2. As can be seen from Figure 2, for an FO1 gate, the sum of the glitch sizes exceeds the single glitch size when they result from the same overall collected charge, while for the FO2 gate it is smaller than the single pulse size for smaller collected charge values. For larger collected charge values, the *sum* curve converges to the *original* curve, as can be seen in the figure.

**Figure 2.** HSPICE simulation results for a NAND gate with different load and different collected charge combinations: a) FO1, first glitch 10fC, b) FO1, first glitch 20fC, c) FO2, first glitch 40fC, d) first gate FO1, first glitch 20fC and second gate FO2.

Finally, when considering a specific kind of a transient fault, that is, a transient fault induced by a specific event (*e.g.*, cosmic rays, capacitive coupling, electromagnetic interference, etc.), it is important to define the range of glitch sizes that can occur due to those events. For example, for a radiation-induced transient fault in 130nm technology, it has been shown that the duration of a glitch lies in the interval from 30 to 300ps [3], with most glitches having the duration between 100 and 250ps [9]. It has been shown that, among multiple-event transient induced errors, 90% are the result of *two* simultaneous glitches [5], and thus we only considered this case in our work.

### **3.2.** Fault propagation

Soft errors used to be a much greater concern in memories than in logic circuits, mostly due to the impact of three important masking factors that affect the propagation of a glitch through a combinational circuit [6]:

- *logical masking* occurs if the glitch arrives at the input of a gate when at least one of its other inputs has a controlling value;
- *electrical masking* can attenuate or even completely mask a glitch that is not large enough compared to the delay of a gate through which the glitch propagates;
- *latching-window masking* occurs when the glitch does not arrive on time at the input of the latch to satisfy its setup and hold time conditions.

With technology scaling, the impact of these masking factors is decreasing, thus leading to the increased *SER* in logic circuits.
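Logical masking, the first factor listed above, can be illustrated for a single gate. The following is a minimal sketch, not the symbolic BDD/ADD model of [6]: it treats the side inputs as statistically independent, which is a simplifying assumption made here, and the names `CONTROLLING_VALUE`, `is_logically_masked`, and `pass_probability` are hypothetical.

```python
from math import prod

# Controlling input value per gate type; XOR/XNOR have no controlling value.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def is_logically_masked(gate_type, side_inputs):
    """True if any side input (every input except the one carrying the
    glitch) holds the controlling value of the gate."""
    controlling = CONTROLLING_VALUE.get(gate_type)
    if controlling is None:
        return False  # no controlling value, so the glitch is never masked
    return controlling in side_inputs


def pass_probability(gate_type, side_one_probs):
    """Probability that a glitch is NOT logically masked, given P(input = 1)
    for each side input. Assumes independent side inputs (illustration only)."""
    controlling = CONTROLLING_VALUE.get(gate_type)
    if controlling is None:
        return 1.0
    # Every side input must hold the non-controlling value.
    return prod(p if controlling == 0 else 1.0 - p for p in side_one_probs)
```

For a three-input AND gate whose two side inputs are each 1 with probability 0.5, the glitch passes with probability 0.25 under this independence assumption.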

By taking into account the joint dependence of the three masking factors on circuit topology and input vectors, a unified treatment of the three masking factors [6] also allows for accurate analysis of reconvergent glitches. Reconvergent glitches occur whenever a pulse originating at a given gate in the circuit propagates on more than one path to another gate. In case of a MET, not only glitches originating from a single gate, but instead all glitches that resulted from the same particle hit are considered reconvergent. There are several possible cases of reconvergent glitches that can occur for both SETs and METs, as shown in [6]. Two glitches that are to be merged may arrive at gate inputs with both controlling or both non-controlling values, or one controlling and the other non-controlling value. These different cases lead to different output glitches varying in their size and delay, compared to the original glitches.

## **4.** Proposed transient fault analysis framework

In this section, we describe our approach to modeling and analysis of SET and MET generation and their propagation through logic circuits. The pseudo code of our algorithm is given in Figure 3.
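The single pass over a topologically sorted gate list that Figure 3 outlines can be sketched structurally as follows. This is only a skeleton under stated assumptions: it tracks which originating gates could contribute a glitch at each gate and ignores all masking, whereas the actual framework propagates duration and amplitude ADDs. The function name and the `fanin` dictionary layout are hypothetical, and the sketch applies to the combinational (acyclic) part of a circuit. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def reachable_glitch_origins(fanin):
    """fanin maps each gate to the list of gates driving it (acyclic).

    Returns, for every gate, the set of gates whose glitches could arrive
    at its output if no masking took place.
    """
    arriving = {}
    # static_order() yields every driver before the gates it feeds.
    for gate in TopologicalSorter(fanin).static_order():
        origins = {gate}  # a new glitch may originate at this gate
        for driver in fanin.get(gate, ()):
            origins |= arriving[driver]  # glitches sent on by fanin neighbors
        arriving[gate] = origins
    return arriving
```

Because every driver is visited before the gates it feeds, each gate is processed exactly once, which mirrors the one-pass structure of the algorithm.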

### 4.1. Fault generation implementation

There are several possible approaches to the modeling of an SET in terms of the details included in its model description. For example, simple models, like triangular or trapezoidal, include information about glitch duration and amplitude and possibly about the slope. On the other hand, there are approaches that use more accurate models, and consequently need more information about the glitch, like the double-exponential current pulse [2]. However, there is a tradeoff between the accuracy of the glitch model and the time needed for a method based on such a model to estimate the impact of the glitch on circuit outputs. Therefore, in this work we use the former approach, which assumes only glitch duration and amplitude as the glitch parameters and, as has been shown in [6],[7], allows for an analysis of circuit soft-error susceptibility that is accurate to within 4% and provides a 5000X speedup when compared to detailed HSPICE simulations.

| main {                                               |
|------------------------------------------------------|
| set technology parameters;                           |
| parse input netlist;                                 |
| create gate node list;                               |
| create topologically sorted gate list (sorted list); |
| for each gate in sorted list {                       |
| mergeGlitches(glitch list);                          |
| maskGlitches; //logical and electrical masking       |
| createNewGlitch; //new glitches originating at gate  |
| sendGlitches; //propagate to output neighbors        |
| }                                                    |
| compute error probabilities from final output ADDs;  |
| }                                                    |

**Figure 3.** The proposed algorithm.

A practical implementation for SET modeling was described in [6] where a topologically sorted list of gates for a given circuit is generated first, and then, in one pass through the circuit, all possible glitches that can occur in the circuit are created and propagated to the primary outputs. In case of a MET, it is also necessary to determine sets of gates that can be affected by a single particle hit, that is, gates where glitches determined by a single MET set originate. While in general the potential victims of a particle hit can be best determined by using the layout information, at logic level, in the absence of layout information, the two cases described in Section 3.1 (*i.e.*, a gate and its fanin or fanout neighbors, or two gates with common fanin or fanout neighbors) may be considered as good candidates for analyzing METs. Thus, our framework takes as inputs the gate-level description of the circuit and information about fanin and fanout neighbors of each gate and assumes that METs occur at pairs of gates that are neighbors as defined by the two cases above. This does not affect the generality of our framework, since the layout information can easily be incorporated into the input circuit description.
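The two neighbor cases from Section 3.1 can be enumerated directly from fanin information. The sketch below is an illustration under assumptions: the netlist is represented as a hypothetical dictionary `fanin` mapping each gate to the gates that drive it (primary inputs are assumed to be left out), and the function name `met_candidate_pairs` is invented for this example.

```python
from itertools import combinations


def met_candidate_pairs(fanin):
    """Return candidate gate pairs for double transients as sorted tuples."""
    gates = set(fanin) | {d for drivers in fanin.values() for d in drivers}
    fanout = {g: set() for g in gates}
    for gate, drivers in fanin.items():
        for d in drivers:
            fanout[d].add(gate)

    pairs = set()
    for gate in gates:
        drivers = set(fanin.get(gate, ())) - {gate}
        loads = fanout[gate] - {gate}
        # Case 1: a gate and one of its fanin or fanout neighbors.
        for neighbor in drivers | loads:
            pairs.add(tuple(sorted((gate, neighbor))))
        # Case 2: two gates sharing `gate` as a common fanout neighbor
        # (both drive it) or as a common fanin neighbor (both driven by it).
        for group in (drivers, loads):
            pairs.update(combinations(sorted(group), 2))
    return pairs
```

When layout information is available, the same interface could return pairs chosen by physical adjacency instead, leaving the rest of the analysis unchanged.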

### **4.2.** Fault propagation implementation

Since it has been shown in [6] that a unified treatment of the three masking factors (logical, electrical and latching-window masking) is important and most of the previous work analyzed those factors independently, we apply here the unified symbolic model proposed in [6].

The main idea of the approach proposed in [6] is that the impact of the three masking factors can be modeled using BDDs and ADDs. When duration and amplitude ADDs representing a glitch originating at a given gate  $G_i$  are created, they are further propagated to the fanout neighbors of gate  $G_i$ , and there they are modified according to logical masking, the delay of those gates, and the attenuation model. This approach can be modified such that it accounts for multiple glitches occurring as a result of a single particle strike.

The important advantage of the proposed model is that it *concurrently* computes the propagation and the impact of single-event transients originating at different internal gates of the circuit. This is made possible by assigning the originating gate identifier to the duration-amplitude ADD pair associated with each glitch. To be able to apply the same concurrent computation to multiple glitch propagation, we need to include the following modifications to the model.

First, instead of assuming the occurrence of only one glitch at a time, it is necessary to keep track of several glitches. This requires the specific information of the list of gates at which glitches occurred to be assigned to all duration-amplitude ADD pairs that correspond to glitches within a given multiple-event transient set. Thus, when compared to the SET case, the propagation of METs requires more information.

The complexity of simple glitch propagation and attenuation functions is not much affected, due to the fact that the number of glitches that need to be considered at one pass through the circuit is 2N, where N is the number of glitches considered in the single-event transient case. However, the number of cases that need to be considered when reconvergent glitches are merged increases, as described next.

Whenever glitches belonging to the same MET set arrive at the inputs of the same gate, the reconvergent glitch merging algorithm [6] can be applied (Figure 4). The main difference between the SET and MET case, when considering reconvergence, is the fact that, in the case of METs, two glitches to be merged may originate at different gates affected by the same particle hit. A single glitch, created at some gate inside the circuit, can represent a glitch in different MET sets, as long as it is not merged with its coupled glitch (the one coming from the same MET set). When two such glitches are merged, the information generated is specific to that MET set and no longer applies to other MET sets (those that include a glitch originating from the same gate as the merged glitch). Once glitches are merged, it is necessary to represent the resulting glitch(es) as new glitch(es) with their specific MET set lists. Therefore, this increases the number of glitches, but at the same time decreases the size of individual glitch MET gate lists.

merge two glitches according to reconvergent glitch cases;

**Figure 4.** Glitch merging algorithm.

### **4.3.** Multiple-event upset probability computation

To find the probability that an MET, representing a set of glitches $(1, 2, \dots, n)$ originating at a set of gates $(G_1, G_2, \dots, G_n)$, is latched at a given output F, all possible values for the duration and amplitude of glitches arriving at the output F are found. For each original particle hit, there may be several corresponding glitches that propagated to the output F from different gates. To this end, we define the following event:

 $\mathcal{E}$  – an event that occurs when any of the glitches originating from one of the gates of the MET set is latched at the output F.

More specifically, we can find, for each output  $F_j$ , the probability of failing due to an MET with initial durations  $\mathbf{d}_{init} = (d_{init,1}, d_{init,2}, ..., d_{init,n})$  and initial amplitudes  $\mathbf{a}_{init} = (a_{init,1}, a_{init,2}, ..., a_{init,n})$  that originated at a given set of gates  $\mathbf{G}^i = (\mathbf{G}_1, \mathbf{G}_2, ..., \mathbf{G}_n)$  as:

$$P(\mathcal{E}_{i,j}^{\mathbf{d}_{init},\mathbf{a}_{init}}) = P(\mathsf{F}_{j} \text{ fails} \mid \mathbf{G}^{i} \text{ fails} \cap \text{glitches} = (\mathbf{d}_{init}, \mathbf{a}_{init})) \tag{3}$$

Similar to [6], we find the probability of event  $\mathcal{E}$  by summing over all possible glitch durations,  $D_k$  that occur at a given output and result from the propagation of glitches from a given gate MET set:

$$P(\mathcal{E}) = \sum_{k} \frac{D_{k} - (t_{setup} + t_{hold})}{T_{clk} - d_{init}} \cdot P(D = D_{k}) \tag{4}$$

where  $T_{clk}$  is the clock period,  $t_{setup}$  and  $t_{hold}$  are the setup and hold time of the latch, respectively, and  $d_{init}$  is the initial duration of the glitch that has duration  $D_k$  at the output.
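Equation (4) can be evaluated numerically once the distribution of output glitch durations is known. The sketch below is a hedged illustration: the duration distribution is passed as a plain dictionary rather than extracted from ADDs, a single `d_init` is used, and glitches no longer than $t_{setup} + t_{hold}$ are treated as contributing zero, which is an assumption added here so that no term becomes negative. All names are hypothetical.

```python
def latching_probability(duration_pmf, t_clk, t_setup, t_hold, d_init):
    """Evaluate equation (4).

    duration_pmf maps each output glitch duration D_k to P(D = D_k).
    All times must use the same unit (e.g., picoseconds).
    """
    window = t_setup + t_hold
    total = 0.0
    for d_k, prob in duration_pmf.items():
        if d_k <= window:
            continue  # assumption: too short to be latched, contributes zero
        total += (d_k - window) / (t_clk - d_init) * prob
    return total
```

With a 250ps clock, 10ps setup and hold times, an 80ps initial glitch, and an illustrative distribution `{0: 0.5, 80: 0.3, 60: 0.2}`, the result is 26/170, or about 0.153.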

TABLE I. Algorithm runtime for several benchmark circuits for the three cases: an SET, two simultaneous METs, and two correlated MBUs

|       | S27 | S208 | S298 | S444 | S526  | S1196 | S1238 |
|-------|-----|------|------|------|-------|-------|-------|
| SET   | 1   | 8    | 20   | 240  | 420   | 28    | 30    |
| MET 2 | 1   | 10   | 23   | 540  | 1020  | 85    | 110   |
| MBU 2 | 1   | 2800 | 3600 | 9000 | 20000 | 48    | 50    |

We can now compute the *Mean Error Susceptibility (MES)* of a given output $F_j$, for a given assumed set of initial glitch durations and amplitudes, $(\mathbf{d}_{init}, \mathbf{a}_{init})$, as the average probability of output $F_j$ failing due to all possible MET sets that can occur in the circuit, given different input probability distributions:

$$MES(\mathsf{F}_{j}^{\mathbf{d}_{init},\mathbf{a}_{init}}) = \frac{\sum_{k=1}^{n_{f}} \sum_{i=1}^{n_{G}} P(\mathcal{E}_{i,j}^{\mathbf{d}_{init},\mathbf{a}_{init}})}{n_{G} \cdot n_{f}} \tag{5}$$

where $n_G$ is the cardinality of the set of MET gate sets of the circuit, $\{\mathbf{G}^i\}$, and $n_f$ is the cardinality of the set of probability distributions, $\{f_k\}$, associated with the input vector stream.
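The averaging in equation (5) amounts to a mean over a two-dimensional table of probabilities. A minimal sketch, assuming the per-case probabilities have already been computed and are supplied as a nested list (the function name and data layout are hypothetical):

```python
def mean_error_susceptibility(p):
    """Evaluate equation (5).

    p[k][i] is the probability that output F_j fails for input probability
    distribution f_k and MET gate set G^i.
    """
    n_f = len(p)
    n_g = len(p[0])
    if any(len(row) != n_g for row in p):
        raise ValueError("every input distribution needs one value per gate set")
    return sum(sum(row) for row in p) / (n_g * n_f)
```

For two input distributions and two MET gate sets with probabilities `[[0.1, 0.3], [0.2, 0.4]]`, the MES is 0.25.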

## 5. MBU modeling

While previous discussion focused mainly on the occurrence of transient faults and their propagation through combinational part of the circuit, in this section we describe our model for multiple flip-flop upsets in sequential circuits.

The main idea here is to determine the impact on the final computed *SER* when transient faults stemming from a single particle hit affect the state of more than one flip-flop and thus propagate through the circuit as multiple errors in the cycles following the cycle when the hit occurred.

The analysis of sequential circuits can be split into two main stages [7]: Stage I, representing the cycle when the hit occurs, and Stage II, representing all the following cycles. In Stage I, it is necessary to include the impact of all three masking factors, while in Stage II, after glitches have affected flip-flop states, only logical masking is considered. Therefore, when the final glitch duration and amplitude ADDs at primary outputs or next state lines are found in Stage I, it is possible to extract the information about the error correlations between different state lines. In other words, it is possible to find the probability of two or more next state lines failing due to an SET at a given gate (or an MET at a given set of gates). The computation of conditional probabilities in Stage II will assume multiple errors, which allows us to apply the modeling described in Section 4. The average error probability at a given output in Stage II can then be found using the conditional probabilities computed in Stage II and the multiple-event (*i.e.*, double, triple, etc.) error probabilities from Stage I:

$$\begin{aligned}
P(\mathsf{F}_{j}^{k,\mathbf{d}_{init},\mathbf{a}_{init}}) = {} & \sum_{l} P(\mathsf{F}_{j}^{k} \mid \mathsf{F}_{l}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) \cdot P(\mathsf{F}_{l}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) \\
& + \sum_{l_{1}} \sum_{l_{2}} P(\mathsf{F}_{j}^{k} \mid \mathsf{F}_{l_{1}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}} \cap \mathsf{F}_{l_{2}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) \cdot P(\mathsf{F}_{l_{1}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}} \cap \mathsf{F}_{l_{2}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) \\
& + \dots + \sum_{l_{1}} \sum_{l_{2}} \dots \sum_{l_{n_{i}}} P(\mathsf{F}_{j}^{k} \mid \bigcap_{l_{i}} \mathsf{F}_{l_{i}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) \cdot P(\bigcap_{l_{i}} \mathsf{F}_{l_{i}}^{1,\mathbf{d}_{init},\mathbf{a}_{init}})
\end{aligned} \tag{6}$$

where $P(\mathsf{F}_{j}^{k,\mathbf{d}_{init},\mathbf{a}_{init}})$ is the probability of output *j* at the substage *k* failing, given the initial glitch duration and amplitude sets, $\mathbf{d}_{init}$ and $\mathbf{a}_{init}$. $P(\mathsf{F}_{j}^{k} \mid \mathsf{F}_{l}^{1,\mathbf{d}_{init},\mathbf{a}_{init}})$ is the probability of error at the output *j* at the stage *k*, given that an error was latched at the state line *l* after the *first* stage, with the probability of error at state line *l* given by:

$$P(\mathsf{F}_{l}^{1,\mathbf{d}_{init},\mathbf{a}_{init}}) = \frac{\sum_{i=1}^{n_{G}} P(\mathcal{E}_{i,l}^{1,\mathbf{d}_{init},\mathbf{a}_{init}})}{n_{G}} \tag{7}$$

Figure 5. *SER* for different benchmarks, when two simultaneous METs (MET2, MBU1), two correlated MBUs (MET1, MBU2) or an SET are assumed at Stage I.

As described above, we can find the probabilities of multiple state-line errors, similarly to the single error from equation (7) and include those probabilities in equation (6).
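Equations (6) and (7) combine Stage I error probabilities with Stage II conditional probabilities. The sketch below illustrates the arithmetic only. It assumes that each Stage I joint probability refers to exactly that set of state lines failing, so that the terms of the sum are mutually exclusive; this is one reading of equation (6) and is labeled here as an assumption. The function names and the use of `frozenset` keys are hypothetical.

```python
def state_line_error(p_latched_by_gate_set):
    """Equation (7): average latching probability at one state line over
    all MET gate sets."""
    return sum(p_latched_by_gate_set) / len(p_latched_by_gate_set)


def stage2_output_error(conditional, stage1_joint):
    """Equation (6): total probability of an output error in a later cycle.

    stage1_joint maps a frozenset of failing state lines to its Stage I
    probability (assumed: exactly these lines fail); conditional maps the
    same frozenset to P(output fails | those lines failed).
    """
    return sum(conditional[lines] * prob for lines, prob in stage1_joint.items())
```

A usage example with two state lines and made-up numbers:

```python
joint = {
    frozenset({"s0"}): 0.02,
    frozenset({"s1"}): 0.01,
    frozenset({"s0", "s1"}): 0.005,
}
cond = {
    frozenset({"s0"}): 0.5,
    frozenset({"s1"}): 0.2,
    frozenset({"s0", "s1"}): 0.8,
}
p_out = stage2_output_error(cond, joint)  # 0.01 + 0.002 + 0.004
```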

## 6. Error computation

The overall probability of output $F_j$ failing, $P(F_j)$, due to different MET sets, with different initial glitch durations and amplitudes and for different input vector probability distributions, can be defined using the *MES* metric. Assuming a uniform distribution of duration-amplitude pairs $(d, a)$ along the surface $S = (d_{max} - d_{min}) \cdot (a_{max} - a_{min})$ for individual glitches, and partitioning the surface of each glitch from the MET set into sub-surfaces (as described in [6]), we can find the probability $P(F_j)$ as the weighted average of the *MES* across all combinations of MET element sub-surfaces:

$$P(\mathsf{F}_j) = \frac{1}{S_n} \sum_{l_n=1}^{n_{l_n}} \sum_{m_n=1}^{n_{m_n}} \left( \dots \frac{1}{S_1} \sum_{l_1=1}^{n_{l_1}} \sum_{m_1=1}^{n_{m_1}} \left( MES(\mathsf{F}_{j}^{(d_{l_1},d_{l_2},\dots,d_{l_n}),(a_{m_1},a_{m_2},\dots,a_{m_n})}) \cdot \Delta d_1 \cdot \Delta a_1 \right) \right) \cdot \dots \cdot \Delta d_n \cdot \Delta a_n \tag{8}$$

where, for each $i = 1, \dots, n$:

$$d_{l_i} = d_{\min} + l_i \cdot \Delta d_i \quad \text{and} \quad d_{\max} = d_{\min} + n_{l_i} \cdot \Delta d_i$$

$$a_{m_i} = a_{\min} + m_i \cdot \Delta a_i \quad \text{and} \quad a_{\max} = a_{\min} + n_{m_i} \cdot \Delta a_i.$$

Without any loss of generality, in equation (8), we assume that all MET glitch size combinations have equal probability of occurrence. It is, however, clear from equation (8) that different MET probabilities can easily be included.
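The weighted average in equation (8) can be sketched as a sum over all combinations of duration-amplitude sub-surfaces. The example below is a simplified illustration: it assumes the same grid (the same `n_l`, `n_m`, and ranges) for every glitch in the MET set, whereas equation (8) allows a separate grid per glitch, and it takes the *MES* as a caller-supplied function. The names are hypothetical.

```python
from itertools import product


def output_failure_probability(mes, d_min, d_max, a_min, a_max,
                               n_l, n_m, n_glitches=2):
    """Evaluate equation (8) on a uniform grid shared by all glitches.

    mes(durations, amplitudes) returns the MES for one combination, where
    both arguments are tuples with one entry per glitch in the MET set.
    """
    delta_d = (d_max - d_min) / n_l
    delta_a = (a_max - a_min) / n_m
    surface = (d_max - d_min) * (a_max - a_min)
    # Grid points d_l = d_min + l * delta_d and a_m = a_min + m * delta_a.
    cells = [(d_min + l * delta_d, a_min + m * delta_a)
             for l in range(1, n_l + 1) for m in range(1, n_m + 1)]
    # Each glitch contributes a factor delta_d * delta_a / S.
    weight = (delta_d * delta_a / surface) ** n_glitches
    total = 0.0
    for combo in product(cells, repeat=n_glitches):
        durations = tuple(d for d, _ in combo)
        amplitudes = tuple(a for _, a in combo)
        total += mes(durations, amplitudes) * weight
    return total
```

Because the weights sum to one, a constant *MES* is returned unchanged, which is a convenient sanity check for the equal-probability assumption stated above.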

One of the important aspects of the proposed modeling framework is that it is independent of the transient fault source as well as the circuit implementation as long as the function of the circuit can be described using BDDs and ADDs. Next, the final ADDs created for individual gate-output pairs are free of any fault-specific information and include only information about circuit topology and technology node parameters. They can further be used to compute output and circuit error probabilities when more specific information about circuit inputs is provided. Finally, the inclusion of transient-fault origin specific parameters as external inputs to the framework allows for computation of error rates.

**Figure 6.** Changes in error probability in the cycles following particle hit for several benchmarks for 80ps initial glitch, assuming two simultaneous METs (top two charts) or two correlated MBUs (bottom two charts) at Stage I.

For example, in the case of radiation-induced soft errors, the *SER* can be found by using the output error probabilities from equation (8) as:

$$SER_{F_j} = P(\mathsf{F}_j) \cdot R_{eff} \cdot R_{PH} \cdot A_{circuit} \tag{9}$$

where $R_{PH}$ is the particle hit rate per unit of area, $R_{eff}$ is the fraction of particle hits that result in charge generation, and $A_{circuit}$ is the total silicon area of the circuit.
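As a hedged illustration of equation (9), the following Python sketch multiplies an output error probability by the radiation-specific parameters and converts the result to FIT. The function names, the units (hits per cm² per second for $R_{PH}$, cm² for $A_{circuit}$), and all numeric values are placeholders chosen for the example; they are not taken from the experiments in this work.

```python
SECONDS_PER_HOUR = 3600.0
HOURS_PER_FIT_WINDOW = 1e9  # one FIT is one failure in 10^9 hours


def soft_error_rate(p_fail, r_eff, r_ph, area):
    """Equation (9): SER = P(F_j) * R_eff * R_PH * A_circuit.

    p_fail -- output error probability P(F_j), in [0, 1]
    r_eff  -- fraction of particle hits that generate charge, in [0, 1]
    r_ph   -- particle hit rate per unit area (assumed: hits / (cm^2 * s))
    area   -- total silicon area (assumed: cm^2)
    Returns failures per second under the assumed units.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    if not 0.0 <= r_eff <= 1.0:
        raise ValueError("r_eff must be a fraction in [0, 1]")
    return p_fail * r_eff * r_ph * area


def to_fit(failures_per_second):
    """Convert a rate in failures per second to FIT (failures per 10^9 hours)."""
    return failures_per_second * SECONDS_PER_HOUR * HOURS_PER_FIT_WINDOW


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    ser = soft_error_rate(p_fail=0.01, r_eff=0.1, r_ph=1e-3, area=0.5)
    print(f"SER = {ser:.3e} failures/s = {to_fit(ser):.3e} FIT")
```

Because equation (9) is a plain product, the same function applies to any fault source for which an effective event rate per unit area can be supplied.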

As an example of how the proposed framework can be applied to transient faults, we use the soft error specific parameters defined above to compute the error rates, as shown in the next section.

## 7. Experimental results

In this section, we present the results obtained using the proposed framework. The technology used is 70nm, Berkeley Predictive Technology Model [12]. The clock cycle period  $(T_{clk})$  used is 250ps, and setup  $(t_{setup})$  and hold  $(t_{hold})$  times for the latches are assumed to be 10ps each.  $V_{dd}$  is assumed to be 1V. The benchmark circuits are chosen from *ISCAS'89* and *mcnc'91* suites. The proposed framework is implemented in C++, and run on a 3GHz Pentium 4 workstation running Linux.

In TABLE I, we report the runtime of our framework for several benchmarks, and compare the times for three cases: single-event transient (SET), two simultaneous transients stemming from a single hit (MET 2) and two correlated bit upsets resulting from a single fault (MBU 2).

We show in Figure 5 the SER results for different benchmark circuits, assuming double glitches and double flip-flop upsets. It is important to note here that previous research has shown double-event transients to be the most probable among all multiple-event transients (92% of all multiple transients) [5]. Hence, the simultaneous occurrence rate of three or more transients as a result of a single particle hit may be assumed negligible. The SER values in the case of two METs are obtained by averaging across several glitch size combinations (80ps and 60ps, 80ps and 40ps, 60ps and 40ps). The SER values for two MBUs are averaged across different initial SET glitch sizes (100ps, 80ps, 60ps and 40ps), and the output probability values are obtained by summing over all MBU pair combinations that can occur in Stage I. Note that these MBU results show only a part of the sum in equation (8) and thus the MBU values are smaller than the SET values. As can be seen from the presented results, the impact of multiple-event transients (METs) and multiple-state line errors (MBUs) varies across different circuits. For example, for benchmarks *S1196* and *S1238*, including only the impact of two correlated MBUs results in an underestimation of the circuit's soft error rate, while for circuits *S526* and *S208* it is very close to the SET case.

In Figure 6, we show the changes in average output error probability for several benchmarks in the cycles following the particle hit, assuming an initial glitch duration of 80ps and two simultaneous METs (top two charts) or two correlated MBUs (bottom two charts) in Stage I. The results again show that different circuits behave differently with respect to multiple faults or multiple flip-flop upsets. The probability of error at the output, in the cycles after the particle hit, can follow any of three trends: decrease rapidly, remain at about the same level, or increase.

## 8. Conclusion

In this work, a probabilistic symbolic modeling methodology for efficient and accurate estimation of the susceptibility of logic circuits to transient faults is proposed. The main idea behind the proposed work is to allow for the analysis of the susceptibility of individual outputs to errors stemming from single and multiple transient faults. We have demonstrated the efficiency of our method by applying it to a subset of *ISCAS'89* and *mcnc'91* benchmarks of various complexities.

## 9. References

[1] S. Borkar, "Thousand Core Chips – A Technology Perspective," in *Proc. of Design Automation Conference (DAC)*, pp. 746-749, June 2007.

[2] M. R. Choudhury, Q. Zhou and K. Mohanram, "Design optimization for single-event upset robustness using simultaneous dual-VDD and sizing techniques," in *Proc. of International Conference on Computer Aided Design (ICCAD)*, pp. 204-209, November 2006.

[3] P. E. Dodd, "Physics-Based Simulation of Single-Event Effects," in *IEEE Transactions on Device and Materials Reliability*, Vol. 5, No. 3, pp. 343-357, September 2005.

[4] R. C. Martin and N. M. Ghoniem, "The size effect of ion charge tracks on single-event multiple-bit upset," in *IEEE Transactions on Nuclear Science*, Vol. NS-34, No. 6, pp. 1305-1309, December 1987.

[5] T. Merelle, F. Saigne, B. Sagnes, G. Gasiot, Ph. Roche, T. Carriere, M.-C. Palau, F. Wrobel and J.-M. Palau, "Monte-Carlo Simulations to Quantify Neutron-Induced Multiple Bit Upsets in Advanced SRAMs," in *IEEE Transactions on Nuclear Science*, Vol. 52, No. 5, pp. 1538-1544, October 2005.

[6] N. Miskov-Zivanov, D. Marculescu, "MARS-C: Modeling and Reduction of Soft Errors in Combinational Circuits," in *Proc. of Design Automation Conference (DAC)*, pp. 767-772, July 2006.

[7] N. Miskov-Zivanov, D. Marculescu, "Soft Error Rate Analysis for Sequential Circuits," in *Proc. of Design, Automation and Test in Europe* (*DATE*), pp. 1436-1441, April 2007.

[8] D. Rossi, M. Omana, F. Toma and C. Metra, "Multiple Transient Faults in Logic: An Issue for Next Generation ICs?," in *Proc. of International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT)*, pp. 352-360, October 2005.

[9] C. Rusu, A. Bougerol, L. Anghel, C. Weulerse, N. Buard, S. Benhammadi, N. Renaud, G. Hubert, F. Wrobel and R. Gaillard, "Multiple Event Transient Induced by Nuclear Reactions in CMOS Logic Cells," in *Proc. of International On-Line Testing Symposium (IOLTS)*, pp. 137-145, July 2007.

[10] G. P. Saggese, N. J. Wang, Z. T. Kalbarczyk, S. J. Patel and R. K. Iyer, "An Experimental Study of Soft Errors in Microprocessors," in *IEEE Micro*, Vol. 25, No. 6, pp. 30-39, November 2005.

[11] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," in *Proc. of International Conference on Dependable Systems and Networks*, pp. 389-398, 2002.

[12] Berkeley Predictive Technology Model (BPTM): http://www-device.eecs.berkeley.edu/~ptm.