Non-Operating Reliability

Carnegie Mellon University
18-849b Dependable Embedded Systems
Spring 1999

Author: Michael Carchia


Abstract:

Today, many safety-critical embedded systems, such as automotive electronics or safety equipment, spend the majority of their life in the non-operating state. The non-operating environment is characterized by parts or systems that are connected to a functioning device but experience reduced or eliminated physical and electrical stresses compared with the operating condition. While current literature tends to focus on the operating reliability of embedded systems, the non-operating state also requires attention from system designers. This paper explains the non-operating environment, describes various failure mechanisms, and outlines some reliability models. Systems designed for high operating reliability do not necessarily perform well (or at all) after long periods of exposure to the non-operating environment. To handle the non-operating environment properly, non-operating failures need to be considered from the design stage of the lifecycle. Furthermore, the relevant environmental concerns depend on the factors associated with each target environment. To address this, a physics-of-failure-based approach to the design cycle is outlined.



Introduction

Consider for a moment a missile defense system that may lie inactive in times of peace. The reliability of this system is crucial since its correct operation could literally save the lives of millions. Similarly, the correct operation of a fire alarm system in a city skyscraper plays a comparable role in the lives of many. Safety-critical embedded systems are everywhere, and some of them spend a large portion of their lives in the inactive state. When these systems are called into action, it is important that they work flawlessly. For this to happen, designers need to consider the effects of the non-operating environment closely and compensate for them early in the design phase.

There is a distinction between dormancy and storage, but for the sake of this discussion we will group them together. This text is meant to give the reader an introductory understanding of non-operating reliability. If the distinction is important to the reader, further information can be found in the references section.

Dormancy is defined as the state in which the equipment is in its normal operational configuration and connected, but not operating. For testing purposes, equipment in the dormant state may be cycled on and off. During dormancy, the electrical stresses normally experienced under operational conditions are usually eliminated or reduced. [Pecht95]

Storage is defined as the state in which the system, subsystem, or component is totally inactive and resides in a storage area. The product may have to be unpacked and connected to a power source to be tested. [Pecht95]

Together, these two conditions form the non-operating state, which is quite common in the useful lives of many embedded systems. Harris [1980] compiled a list of typical values for the time spent in dormancy by many different types of equipment. This list, shown below in Figure 1, demonstrates that the non-operating state can make up a considerable portion of the lifetime of a system.

Figure 1. Typical Values for Percentage of Calendar Time For Equipment in the Dormant Condition [Harris, 1980].

Equipment                               Dormant (% of calendar time)

DOMESTIC APPLIANCES
- Television Sets                       75%
- Kitchen Electrical Appliances         97%

CARS
- Personal Use                          93%
- Taxis                                 38%

PROFESSIONAL EQUIPMENT
- Personal Calculators                  98%
- Small Copying Machine                 >75%
- Electronic Test Equipment             >90%

INDUSTRIAL EQUIPMENT
- Safety Equipment                      98%
- Standby Power                         >90%
- Valves (most)                         >75%
- Air Conditioning                      50-80%
- Built-in Test Equipment (MIL)         99%
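To give a feel for what the percentages in Figure 1 mean in practice, the following minimal Python sketch converts a dormant fraction into hours per calendar year. It is an illustration only: the helper name yearly_hours is hypothetical, only the point values from the figure are used (ranges and lower bounds such as ">75%" are left out), and it assumes that all time not spent dormant is spent operating, which the figure itself does not state.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

# Point values from Figure 1 [Harris, 1980]. Entries given only as
# ranges or lower bounds (e.g. ">75%", "50-80%") are omitted.
DORMANT_FRACTION = {
    "Television sets": 0.75,
    "Kitchen electrical appliances": 0.97,
    "Cars (personal use)": 0.93,
    "Taxis": 0.38,
    "Personal calculators": 0.98,
    "Safety equipment": 0.98,
    "Built-in test equipment (MIL)": 0.99,
}


def yearly_hours(dormant_fraction):
    """Return (dormant_hours, operating_hours) per calendar year.

    Assumption: every hour that is not dormant is an operating hour.
    """
    if not 0.0 <= dormant_fraction <= 1.0:
        raise ValueError("dormant_fraction must be between 0 and 1")
    dormant = dormant_fraction * HOURS_PER_YEAR
    return dormant, HOURS_PER_YEAR - dormant


for name, fraction in DORMANT_FRACTION.items():
    dormant, operating = yearly_hours(fraction)
    print(f"{name:32s} {dormant:6.0f} h dormant, {operating:6.0f} h operating")
```

Under these assumptions, a personal car at 93% dormancy operates for only a few hundred hours per year, which is why failure mechanisms that act while the system is idle deserve separate attention.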


Key Concepts

Systems shown to be reliable under operating conditions are not necessarily reliable after periods of exposure to a non-operating environment. What follows is a description of the non-operating environment, its subtleties, and some of the failure mechanisms associated with it.

The Non-Operating Environments

A system may be situated in numerous non-operating environments throughout its lifetime. Some of these environments are of concern because they can harm the system, while others are of negligible importance. Systems may lie inactive in the field (subject to possibly harsh environmental factors) or elsewhere (for example, en route for maintenance). During these times, systems may be exposed to numerous environmental stresses, which may be natural (such as adverse weather) or man-made (such as mishandling or abuse). The following is an overview, taken from Pecht [1995], of some of the environments designers should be aware of aside from the field environment.

Failure Mechanisms

Aside from the subtle non-operating environments mentioned previously, one has to consider how long the designed system will lie inactive in the target field environment. A familiarity with the relevant failure mechanisms is useful here, so that one knows what breaks and can protect against system failure. Four main classes of failure mechanisms are outlined: mechanical, electrical, corrosion, and radiation.


Available tools, techniques, and metrics

Parts that spend a large portion of their life in the dormant state require special attention in a reliability analysis. The following are a few methods for assessing and predicting non-operating reliability. These models are often crude and approximate at best, and many rely on field data that may not exist for a particular application. They are presented as a survey of the techniques available for predicting reliability, and are followed by a physics-of-failure approach to design and reliability assessment.
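As a rough illustration of why dormant time matters in a prediction, the sketch below combines an operating and a non-operating failure rate into a single effective rate, weighted by the fraction of calendar time spent in each state, and then evaluates an exponential (constant failure rate) reliability function. This is a generic textbook-style simplification assumed here for illustration, not one of the specific models surveyed in this section; the function names and all numeric values are hypothetical.

```python
import math


def effective_failure_rate(lambda_op, lambda_nonop, dormant_fraction):
    """Time-weighted failure rate in failures per calendar hour.

    Assumes constant failure rates in each state (an illustrative
    simplification) and that the system is either operating or dormant.
    """
    if not 0.0 <= dormant_fraction <= 1.0:
        raise ValueError("dormant_fraction must be between 0 and 1")
    return (1.0 - dormant_fraction) * lambda_op + dormant_fraction * lambda_nonop


def reliability(failure_rate, hours):
    """Probability of surviving `hours` under a constant failure rate."""
    return math.exp(-failure_rate * hours)


# Hypothetical example values, chosen only to show the effect:
lambda_op = 1e-5        # failures per operating hour
lambda_nonop = 1e-6     # failures per dormant hour (10x lower)
dormant_fraction = 0.98 # e.g. safety equipment in Figure 1

lam = effective_failure_rate(lambda_op, lambda_nonop, dormant_fraction)
nonop_share = dormant_fraction * lambda_nonop / lam
ten_years = 10 * 365 * 24

print(f"effective failure rate: {lam:.3e} per calendar hour")
print(f"share of failures arising while dormant: {nonop_share:.0%}")
print(f"10-year reliability: {reliability(lam, ten_years):.3f}")
```

With these made-up numbers, the dormant state contributes most of the effective failure rate even though its per-hour rate is ten times lower, simply because the system spends nearly all of its life there.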

Many of the above models tend to disregard the details of specific components and group similar parts into the same category. In doing so, the accuracy of the reliability prediction is compromised.
If one knows what can go wrong in a system, then one can design around such faults from the early stages of a project. Thus, a physics-of-failure-based approach seems a reasonable starting point for achieving satisfactory non-operating reliability. Pecht [1995] outlines a physics-based approach with the following steps:
The above approach to design, reliability assessment, testing and screening uses knowledge about the cause of potential failures and circumvents them via robust design and manufacturing practices.


Relationship to other topics

The study of non-operating reliability extends traditional (operating) reliability analysis. It is also peripherally related to subjects such as field data and maintenance, among others.

Related Topics:


Conclusions

The following are the key ideas for this topic:


Annotated References

Further Reading

