



## Where Are We Now?

### Where we've been:

• Lots and lots of places

### Where we're going today:

• Loose ends: System resets, robustness, power management, etc.

### Where we're going next:

- Bluetooth & CAN
- Second exam Wed, 22 April 2015
  - Notes sheet in your own handwriting; NO calculator
  - Same rules and procedures as for Exam #1
- Final project hand-ins
  - Last day to demo Wed of final exam week
  - Last day to hand in report Thursday of final exam week

## **Preview**

#### Special interrupts

- External interrupt pin
- Is SWI maskable?
- System traps and NMI

#### System resets

- Boot Loader
- How to get clean system resets
- Multi-tasking watchdog strategy

#### Improving system robustness

- Transient vs. permanent faults
- Timeouts & retries

#### Power management

- Power reduction via voltage and clock frequency change
- Power reduction via sleeping
- Major factors in power consumption and battery drain
- Thermal issues

# **Another Look At Interrupt Masking**

## • We've assumed that the interrupt mask always works

- *Almost* always true, but not always true
- Some interrupts shouldn't be masked, for example "System Reset"(!)

Below are "special" interrupts:

### Table 7-1. CPU12 Exception Vector Map

| Vector Address | Source                               |
|----------------|--------------------------------------|
| \$FFFE-\$FFFF  | System Reset                         |
| \$FFFC-\$FFFD  | Clock Monitor Reset                  |
| \$FFFA-\$FFFB  | COP Reset                            |
| \$FFF8-\$FFF9  | Unimplemented Opcode Trap            |
| \$FFF6-\$FFF7  | Software Interrupt Instruction (SWI) |
| \$FFF4-\$FFF5  | XIRQ Signal                          |
| \$FFF2-\$FFF3  | IRQ Signal                           |

# IRQ – External <u>I</u>nterrupt <u>R</u>e<u>Q</u>uest

## IRQ.L and XIRQ.L

- Connected to external pins for requesting interrupts
- Active low: better noise margin on TTL high voltage (more noise resistant)
- IRQ.L is maskable
- XIRQ.L is <u>non</u>-maskable (generically, NMI = "Non-Maskable Interrupt")

## • General XIRQ rules:

- <u>Avoid use of non-maskable interrupts for servicing devices!</u>
- Problem is they can re-trigger during ISR, causing stack overflow
- XIRQ mainly useful for waking chip up from low-power "sleep" mode

## • IRQ is external maskable interrupt

- Allows interfacing to I/O devices outside the chip
- All external chips share the same IRQ pin
- But, it's also a shared interrupt vector ...
   ... so software has to poll I/O devices to see which one generated the IRQ

## **Servicing An IRQ**

## • If you get an IRQ, how do you know what caused it?

- On-chip devices go to a different interrupt vector per device
- BUT, there is only one IRQ pin, so multiple off-chip devices share the single IRQ vector

### IRQ servicing done via an oxymoron: "Interrupt Polling"

1. Interrupt is received via IRQ
2. ISR uses I/O commands to read status register from each external device
   - Poll each of them: did you cause this interrupt? How about you? Or you? ...
3. When you find a device that caused the interrupt, execute appropriate ISR
   - Probably done via a subroutine call to appropriate handler from within IRQ ISR
   - IRQ might still be active when you are done due to a different external device!
   - But, that's OK; after RTI it just re-starts the polling loop $\rightarrow$ go back to step 1

- Which interrupt do you service?
  - Can do prioritized (poll in same order every time; this is the usual case)
  - Can do round-robin (remember last interrupt, and start polling from there)
  - In effect, the IRQ service routine is running a task switcher to pick next IRQ source to service; make sure you don't infinite loop if IRQ was a noise glitch
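The polling steps above can be sketched in C roughly as follows. This is a host-side illustration, not course code: the `device_status` array stands in for memory-mapped status registers on the external chips, and `STATUS_IRQ_PENDING`, `service_device()`, and `irq_isr_poll()` are hypothetical names. It uses the fixed-priority polling order and returns without looping if no device claims the interrupt.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DEVICES 3
#define STATUS_IRQ_PENDING 0x01u  /* hypothetical "I raised IRQ" status bit */

/* Simulated status registers; on real hardware these would be
 * memory-mapped registers on the external devices. */
static volatile uint8_t device_status[NUM_DEVICES];
static unsigned service_count[NUM_DEVICES];

/* Per-device handler: clears the request at the device and counts the call. */
static void service_device(size_t dev)
{
    device_status[dev] &= (uint8_t)~STATUS_IRQ_PENDING;
    service_count[dev]++;
}

/* Body of the shared IRQ ISR. Polls in fixed priority order (device 0
 * first) and services at most one device per invocation. Returns the
 * index of the device serviced, or -1 if no device claims the interrupt. */
int irq_isr_poll(void)
{
    for (size_t dev = 0; dev < NUM_DEVICES; dev++) {
        if (device_status[dev] & STATUS_IRQ_PENDING) {
            service_device(dev);
            return (int)dev;
        }
    }
    return -1;  /* noise glitch: nobody asked, so just return */
}
```

If a second device is still requesting service after the return, the IRQ line stays active and the ISR is simply entered again, which matches the "go back to step 1" behavior.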

## SWI, Trap & Background Mode

#### SWI is non-maskable

- Primary use is for debugging and single-stepping
- If it were maskable, you couldn't single-step through ISRs!
- A fine point: this is a big reason why RTI restores the interrupt mask via CCR restore instead of just always clearing it
  - What would happen if SWI took place during an ISR (for debugging) AND then RTI cleared the interrupt mask instead of restoring it?

### Unimplemented Opcode Trap

- Non-maskable fatal program error
- Don't want an ISR to keep executing if it has executed an illegal opcode!

### BGND instruction (opcode 00 – "Background Mode")

- Variant on the above: invokes hardware-supported breakpoint debugger
- If your code goes wild and crashes, often you stop at opcode 00
- (Why 00? The value 00 is a very common value in data. So you will often enter breakpoint mode if code goes wild and tries to execute data.)

# How Do You Reset Your System?

#### • Generally, it is important to have a way to reset the system!

- Debate: do you have to unscrew the cover plate to reset your system?
  - A reset button on a thermostat tells the customer you don't trust your own software
  - No reset button on a thermostat ticks off the customer when they disassemble to reset it (assuming it is running on remote power and not batteries)

### Common reset methods:

- Hardware reset button connected to reset pin
- Soft reset button(s) (what if the software that reads the button crashes?)
  - PCs typically have an embedded microcontroller running the power system
  - Assume that embedded micro doesn't crash; it can reboot the Intel main CPU
- Remote reset for off-site maintenance support
  - New cable TV/set-top boxes can be reset remotely when you call in for help
- Cycle power; remove batteries; etc.

### Good news: Microsoft has trained people to reboot if there is a crash

• Bad news: if people reboot instead of complaining, you won't know about problems that might get worse in an unusual situation!

[Figure: excerpt from the KS-F150 cassette receiver manual. Holding the SEL (Select) and Standby/On/ATT buttons at the same time for several seconds resets the built-in micro.]









## **Boot Loader – What Happens At Power On?**

## • In small systems entire program is in flash memory

• Booting is simply init to configure all the I/O etc. and start operation

#### In bigger embedded system with an RTOS at power on...

- There is no operating system loaded, no file system, lots of stuff missing
- Need to load RTOS etc. from a disk or serial flash memory

### Booting (booting up; bootstrapping)

- From "to pull yourself up by the bootstraps"
- Load some initial program just smart enough to load other programs
  - Usually it is already in flash memory waiting to run
  - For example, smart enough to read sector 0 of hard disk or bulk flash ...
    - ... sector 0 of hard disk knows how to read OS image from hard disk
    - ... OS image from hard disk knows how to start user programs
    - ... etc.



Image source: http://blogs.independent.co.uk/wpcontent/uploads/2011/05/20a-02.jpg

## **Good Practices For System Booting**

• Validate image using secure digital signature (avoid Trojan Horse updates)

#### • Do a system self-test

- RAM test (can you read/write ones and zeros to all bytes?)
- Flash integrity check
  - Compute CRC across flash and check for match to stored CRC value
- Do timers seem to be timing? A/Ds converting? Etc?
- Do external pins seem to be working (can you talk to all external devices?)
- Is system you are controlling healthy and ready to run?
- It's hard to be thorough, but try to do the basics (use a vendor library if available)
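As an illustration of the flash integrity check, here is a minimal C sketch that computes a CRC over an image and compares it to a stored value. The choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) and the names `crc16_ccitt()` and `flash_image_ok()` are assumptions for this example; a real system would use whatever CRC its tools or vendor library provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Returns true if the CRC computed across the image matches the stored CRC. */
bool flash_image_ok(const uint8_t *image, size_t len, uint16_t stored_crc)
{
    return crc16_ccitt(image, len) == stored_crc;
}
```

At boot, `flash_image_ok()` would be called with the start address and length of the program image and the CRC value stored at build time; a mismatch means the system should not start normal operation.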

#### Put system in known, defined state

- Put all I/O in known, defined state, even if you aren't using it
- Turn off devices you aren't using to save power
- Start watchdog timer
- Make sure all outputs are in a <u>safe</u> state ("safe" is system dependent)
  - Ideally, HW reset automatically makes all outputs safe!
- Etc.
- Some designers periodically reset hardware to known state in case transient fault flips direction of I/O pin, etc.

#### Self-test of critical components

- For example, CPU-actuated circuit breaker

Test components, grouped in the source table as microcontroller specific or application specific: CPU registers; CPU program counter; clock; volatile and non-volatile memories; internal addressing (and external memory, if any); internal data path; interrupt handling; external communication; timing; I/O peripherals; analog ADC and DAC; analog multiplexer.

| Component | Error | Method | Definitions as per Annex H of IEC 60730 |
|-----------|-------|--------|------------------------------------------|
| 1. CPU | | | |
| 1.1 Register | Stuck at | Static memory test | H.2.19.6 |
| 1.3 Program counter | Stuck at | Logical monitoring of the program sequence | H.2.18.10.2 |
| 2. Interrupt | No interrupt or too frequent interrupts | Time-slot monitoring | H.2.18.10.4 |
| 3. Clock | Wrong frequency | Frequency monitor | H.2.18.10.1 |
| 4. Memory | | | |
| 4.1 Invariable memory | All single-bit faults | Word protection with multi-bit redundancy | H.2.19.8.1 |
| 4.2 Variable memory | DC fault | Static memory test | H.2.19.6 |
| 4.3 Address (related to 4.1 and 4.2) | | | |
| 5. Internal data path (only with external memory) | | | |
| 6. External communication (not involved in STL) | Hamming distance 3 | Word protection with multi-bit redundancy | H.2.19.4.2 |
| 7. Input/output periphery | | | |
| 7.1 Digital I/O | Function error | Input comparison; output verification | H.2.18.8; H.2.18.12 |
| 7.2 A/D | Function error | Input comparison | H.2.18.8 |

# How Much Do You Reboot?

### Small system – reset pin may do hard reset of entire system

### Large system – may have several levels of reboot

- Reset some I/O mechanism and related driver
- Kill a particular task; application has to deal with it
- Kill and automatically restart a task
- Kill and automatically restart the RTOS
- How hard you reboot the system depends on how wrong things have gone
  - And how long you can wait before the plant becomes unstable or dangerous

### Software rejuvenation

- Idea that you periodically reboot system to clean out accumulated errors
  - Maybe you kill and restart tasks; maybe the entire system
- Works for software defects such as memory leaks
- But, doesn't solve downright incorrect software that isn't fixed by rebooting







# **Robustness Resets**

### COP reset, Clock monitor reset, unimplemented opcode reset

- Are all there to help reboot system when a problem occurs

### Simple approach:

- When reset occurs, simply do a cold start of the system
- Sometimes this works, but sometimes isn't enough!

### Problem: "yoyo" mode

- Reboot only works if the error condition clears itself
- If error condition persists, system will keep rebooting
- Safety vulnerability in yoyo mode
  - What if a "dead" system is safe?
  - And a "live" system is safe?
  - But control loops are left running open-loop during boot?
  - And, system continually reboots?

## **Useful Reset Strategies/Patterns**

### • Reset a limited number of times

- Permit system to reset a limited number of times within a time interval
- Accomplish by setting a flag for "I was just reset" in EEPROM
- Clear the "I was just reset" flag only after a time delay
- If reset again within time delay, turn off instead of resetting
  - (prevents yoyo mode)
- Alternate: count total number of resets in EEPROM; shut down permanently when there are too many
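One way to combine the "I was just reset" flag with a reset count is sketched below. It is a host-side simulation under stated assumptions: `eeprom_recent_resets` is an ordinary variable standing in for a byte of EEPROM, and `MAX_RESETS_IN_WINDOW`, `boot_allowed()`, and `uptime_window_elapsed()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RESETS_IN_WINDOW 3u   /* hypothetical limit; system dependent */

/* Stand-in for a byte of EEPROM; it survives "resets" in this simulation. */
static uint8_t eeprom_recent_resets;

/* Call early in boot. Returns true if normal startup may proceed, or
 * false if the system should stay shut down (too many recent resets). */
bool boot_allowed(void)
{
    if (eeprom_recent_resets >= MAX_RESETS_IN_WINDOW) {
        return false;            /* stay off: prevents yoyo mode */
    }
    eeprom_recent_resets++;      /* record "I was just reset" */
    return true;
}

/* Call once the system has run without trouble for the chosen time delay. */
void uptime_window_elapsed(void)
{
    eeprom_recent_resets = 0;    /* clear the "just reset" history */
}
```

A real implementation would also log each reset and would need to consider EEPROM write endurance, since the counter is written on every boot.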

### Never reboot

- Go into shutdown and require maintenance call to restart
- Can be prudent for safety-critical systems, but service calls are expensive!
- Alternate: reset to a very simple "limp along" mode until service call

#### • Important design guidance

- Always log, track, or indicate resets so you know they happened!
- Ask yourself what happens to system during reset process



## **Reset In Reactive Systems**

### Simply resetting software usually isn't enough

- The system is in some physical state
- The physical state isn't changed by the fact the CPU reset!
- What if your car's power steering controller resets to "off" at 100 kph?

## • General strategies to get CPU back in synch with physical system:

### 1. Software collects system state after reset

- Query every sensor in system, figure out what is going on
- Re-prime control loops so the I and D parts of PID don't bump system outputs
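A minimal sketch of what re-priming could look like for a textbook PID loop, assuming the software can read back the current actuator output after reset. The `pid_state` structure and the function names are hypothetical. The idea is to preload the integrator so the first computed output matches what the actuator is already doing, and to preload the previous error so the derivative term sees no jump.

```c
/* Textbook PID state; gains and units are application specific. */
typedef struct {
    double kp, ki, kd;
    double integral;     /* accumulated error * dt */
    double prev_error;   /* error at the previous step */
} pid_state;

/* Re-prime after a reset, given the error measured now and the output the
 * actuator is currently producing. */
void pid_reprime(pid_state *pid, double error_now, double output_now)
{
    pid->prev_error = error_now;   /* D term: no artificial step */
    pid->integral = (pid->ki != 0.0)
        ? (output_now - pid->kp * error_now) / pid->ki  /* I term holds output */
        : 0.0;
}

/* One control step; dt must be greater than zero. */
double pid_step(pid_state *pid, double error, double dt)
{
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}
```

With this priming, the first output after reset differs from the measured output only by the single-step integral increment `ki * error * dt`, whereas a zero-initialized controller would jump to an output based on the current error alone.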

## 2. Software forces system into known state after reset

- Do a "system shutdown" of the plant and restart when software resets
- Can be safer, but also creates system down time!
  - Do you want an entire petro refinery to shut down, ever? (No, you don't.)


## 3. Ignore the reset problem

- Only works sometimes ... usually risky!
- This is a bad strategy (even if people do it)



# **Dealing With Transient Faults**

## Most faults are transient

- They occur randomly
- They don't persist; they clear all by themselves
  - Transient faults are often 10x to 100x more frequent than permanent faults
- Hardware transients: radiation causes bit upsets; power supply noise; lightning
- Software transients: bad pointers, occasionally missed deadlines, timing jitter
- Phantom interrupt: noise on interrupt line, but no interrupt really there

## Robustness for transient faults – you might use all of these in a single system

- First strategy: try again!
- Second strategy: timeout
- Third strategy: reboot system to try again
  - Perhaps back off one notch on watchdog time interval? (depends on system)
- Fourth strategy: realize when trying again isn't working
  - Some faults are *permanent*
- <u>Important note:</u> it is difficult to know a robust system is in trouble unless you make special effort to log and record problems!

## **Timeouts & Error Checking**

### Timeout is a good way to balance transient fault tolerance with detection of permanent faults

- Code that is just waiting for an error to cause problems:
  - `serial_byte_out(b); // what if there is an error?`
- Non-robust code:
  - `if (!serial_byte_out(b)) { log_error(...); }`
- Code robust to transient faults, but vulnerable to permanent faults:
  - `while (!serial_byte_out(b)) { log_error(...); }`
- Code that handles both transient and permanent faults:
  - `retry = 0; while (!serial_byte_out(b)) { log_error(...); if (retry++ > 100) { permanent_fault(...); break; } }`
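The bounded-retry pattern can be exercised on a host machine with a simulated serial port. In this sketch `serial_byte_out()` and `log_error()` are stand-ins written for the example (the simulated port fails a configurable number of times), `send_byte_with_retry()` is a hypothetical wrapper, and the retry limit is arbitrary. The loop bound is also slightly different from the slide's `retry++ > 100` test; the point is only that the number of attempts is finite.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RETRIES 100          /* arbitrary limit for this example */

/* Simulated serial port: fails the next sim_failures_left attempts. */
static int sim_failures_left;
static int error_log_count;

static bool serial_byte_out(uint8_t b)
{
    (void)b;                     /* byte value is irrelevant to the simulation */
    if (sim_failures_left > 0) {
        sim_failures_left--;
        return false;
    }
    return true;
}

static void log_error(void)
{
    error_log_count++;           /* a real system would record details */
}

/* Returns true if the byte was sent (possibly after retries), or false
 * if every attempt failed and the fault should be treated as permanent. */
bool send_byte_with_retry(uint8_t b)
{
    for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (serial_byte_out(b)) {
            return true;         /* transient fault (if any) has cleared */
        }
        log_error();
    }
    return false;                /* caller handles the permanent fault */
}
```

Returning a status instead of calling a fault handler inside the loop leaves the decision about what a permanent fault means (shutdown, limp-along mode, reboot) to the caller.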

# **Remember This? Multitasking Watchdogs**

## • Consider a preemptive tasking system

- Assume there is a watchdog timer (a COP timer)
- kick() restarts the watchdog timer at its initial value

```
void task0(void) { /* ... do stuff ... */ kick(); }
void task1(void) { /* ... do stuff ... */ kick(); }
void task2(void) { /* ... do stuff ... */ kick(); }
void task3(void) { /* ... do stuff ... */ kick(); }
```

• What's wrong with the above approach?

## **Better Multi-Tasking Watchdog Approach**

```
void task0(void) { /* ... do stuff ... */ Alive(0x1); }
void task1(void) { /* ... do stuff ... */ Alive(0x2); }
void task2(void) { /* ... do stuff ... */ Alive(0x4); }
void task3(void) { /* ... do stuff ... */ Alive(0x8); }
```

- Main idea: each task sets a bit indicating it has run
- Separate watchdog monitor task kicks watchdog only when every task has reported in
- Needs to be modified to account for task periods, but this is the basic idea

```
uint16 watch_flag = 0;

void Alive(uint16 x)
{ DisableInterrupts();     // Why do we need to do this?
  watch_flag |= x;         // set task's "I'm Alive" bit
  EnableInterrupts();
}

void taskw(void)           // run periodically; maybe in scheduler
{ if (watch_flag == 0x0F)  // if all tasks alive
  { kick();
    watch_flag = 0;        // presumably: clear flags so every task must report in again
  }
}
```



## • Course CPU runs down to 0.25 MHz

- How much power does that save compared to 25 MHz?

### Table A-4. Operating Conditions

| Rating | Symbol | Min | Typ | Max | Unit |
|--------|--------|-----|-----|-----|------|
| I/O, Regulator and Analog Supply Voltage | V<sub>DD5</sub> | 2.97 | 5 | 5.5 | V |
| Digital Logic Supply Voltage<sup>(1)</sup> | V<sub>DD</sub> | 2.35 | 2.5 | 2.75 | V |
| PLL Supply Voltage<sup>(1)</sup> | V<sub>DDPLL</sub> | 2.35 | 2.5 | 2.75 | V |
| Voltage Difference V<sub>DDX</sub> to V<sub>DDA</sub> | Δ<sub>VDDX</sub> | -0.1 | 0 | 0.1 | V |
| Voltage Difference V<sub>SSX</sub> to V<sub>SSR</sub> and V<sub>SSA</sub> | Δ<sub>VSSX</sub> | -0.1 | 0 | 0.1 | V |
| Bus Frequency | f<sub>bus</sub><sup>(2)</sup> | 0.25 | - | 25 | MHz |
| Operating Junction Temperature Range | T<sub>J</sub> | -40 | - | 140 | °C |

1. The device contains an internal voltage regulator to generate ... conditions apply when this regulator is disabled and the device is powered from an external source. Using an external regulator, with the internal voltage regulator disabled, an external LVR must be provided.
2. Some blocks e.g. ATD (conversion) and NVMs (program/erase) ...
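A rough way to reason about the 0.25 MHz versus 25 MHz question is the usual first-order model for CMOS switching power, where $\alpha$ is an activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. This is a textbook approximation, not a figure from the data sheet:

$$
P_{\text{dyn}} \approx \alpha C V^{2} f
\quad\Rightarrow\quad
\frac{P_{\text{dyn}}(0.25\ \text{MHz})}{P_{\text{dyn}}(25\ \text{MHz})} = \frac{0.25}{25} = \frac{1}{100}
$$

At a fixed supply voltage, switching power therefore drops by about a factor of 100. The total saving is smaller than that, because leakage, the voltage regulator, and analog blocks draw current that does not scale with clock frequency.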

## **Course CPU: Low Voltage Operation**

#### External voltage range is 2.97V to 5.5V

- Internal voltage regulator supplies 2.5V to chip
- You can supply 2.5V directly to V<sub>DD</sub> to reduce 5V→2.5V step-down power losses
- Analog conversions assume 5V interface levels
  - LVI (low voltage interrupt) triggers when they are impaired due to low voltage

## A.7.1 Voltage Regulator Operating Conditions

| Num | С | Characteristic                                                                                                                                                                 | Symbol                                                                           | Min                          | Тур                          | Max                          | Unit             |
|-----|---|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|------------------------------|------------------------------|------------------------------|------------------|
| 1   | Р | Input Voltages                                                                                                                                                                 | V <sub>VDDR, A</sub>                                                             | 2.97                         | -                            | 5.5                          | V                |
| 3   | Ρ | Output Voltage Core<br>Full Performance Mode                                                                                                                                   | V <sub>DD</sub>                                                                  | 2.35                         | 2.5                          | 2.75                         | v                |
| 4   | Ρ | Low Voltage Interrupt <sup>(1)</sup><br>Assert Level (xL45J mask set)<br>Assert Level (other mask sets)<br>Deassert Level (xL45J mask set)<br>Deassert Level (other mask sets) | V <sub>LVIA</sub><br>V <sub>LVIA</sub><br>V <sub>LVID</sub><br>V <sub>LVID</sub> | 4.30<br>4.00<br>4.42<br>4.15 | 4.53<br>4.37<br>4.65<br>4.52 | 4.77<br>4.66<br>4.89<br>4.77 | V<br>V<br>V<br>V |
| 5   | Ρ | Low Voltage Reset <sup>(2)</sup> , <sup>(3)</sup><br>Assert Level (xL45J mask set)<br>Assert Level (other mask sets)                                                           | V <sub>LVRA</sub>                                                                | 2.25<br>2.25                 | 2.3<br>2.35                  | _                            | v                |
| 7   | с | Power-on Reset <sup>(4)</sup><br>Assert Level<br>Deassert Level                                                                                                                | V <sub>PORA</sub><br>VPORD                                                       | 0.97                         | _                            | 2.05                         | v                |

## **Course CPU: Stopping The Clock To Save Power**

#### Stop instruction: "STOP"

- Pushes CPU information onto stack (same format as any interrupt)
- Full stop: halts all clocks; minimum power consumption mode
- Pseudo-stop: COP (watchdog) keeps running (configure chip for Full or Pseudo)
- BTW: "S" flag in CCR is "ignore STOP opcodes"; it defaults to "1"
  - STOP circumvents watchdog!
  - If you enable STOP, decide whether you should enable Pseudo-Stop as well

#### • Wait instruction: "WAI"

- Pushes CPU information on stack (same format as any interrupt)
- System clock still runs, but nothing happens
- Less power than NOP wait loop (no transistors switching in CPU); more than STOP

#### <u>Both STOP and WAI restart on reset or interrupt</u>

- STOP takes longer to restart because you have to wait for oscillator stability
- Usually use unmasked interrupt to avoid masking the wakeup function

#### Reaction of other functional blocks (e.g., timers) to STOP/WAI varies

- Read data sheet carefully to find out all the details (especially Ch. 9 on clocking)
- You can turn off many blocks to save power via software (not automatic in HW)



## Battery capacity depends on charge/discharge rates

- Slow charge & discharge gives better capacity ("1C" = 1 hour discharge rate)
- Effective battery capacity goes down with increasing current
  - This means that battery life extension is <u>more than linear</u> with power reduction
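The "more than linear" effect is commonly modeled with Peukert's law. As a sketch, let $C$ be the rated capacity at an $H$-hour discharge rate, $I$ the actual discharge current, and $k$ the Peukert exponent. The value $k \approx 1.3$ used here is an assumption chosen to match the approximate $I^{\sim 1.3}$ figure quoted in these slides; real values depend on battery chemistry and condition.

$$
t = H \left( \frac{C}{I H} \right)^{k}
\quad\Rightarrow\quad
\frac{t_2}{t_1} = \left( \frac{I_1}{I_2} \right)^{k}
$$

For example, halving the current draw ($I_1 / I_2 = 2$) with $k = 1.3$ gives $t_2 / t_1 = 2^{1.3} \approx 2.46$, so the battery lasts about 2.5 times as long rather than just twice as long.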



## Thermal Issues: Power → Heat

#### Embedded systems may have thermal constraints:

- No fan
  - Too much weight, power, noise
- No air exchange outlet
  - How do you keep mud out of the air vent?
- Temperature limits: temperature of exposed surface matters for human touch
  - Pain at perhaps 106-108 degrees F
  - Human skin starts scalding at about 110 degrees F; hours of exposure to get burn
  - Risk gets more acute at 120 degrees F; a few minutes exposure to burn in hot water

#### With a sealed unit, heat transfer is proportional to:

- Temperature difference between case temp and outside temp
- Surface area of case
- Heat transfer efficiency of case (Is it insulated? Is wind blowing across it?)

#### For wearable computers, this can be a huge power limitation

- Part of computer might be insulated or against a 98.6F human body
- One solution: put a block of soft wax inside and melt wax during operation!
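The three proportionalities for a sealed unit correspond to the standard convective heat transfer relation (Newton's law of cooling), where $h$ is the heat transfer coefficient, $A$ the case surface area, and the temperatures are those of the case and the surroundings. The numbers below are assumed purely for illustration: a 10 cm × 10 cm × 2 cm case ($A = 0.028\ \text{m}^2$), $h = 10\ \text{W/(m}^2\text{K)}$ for still air, and a case allowed to run 20 K above ambient.

$$
Q = h A \left( T_{\text{case}} - T_{\text{ambient}} \right)
= 10 \times 0.028 \times 20 = 5.6\ \text{W}
$$

Under these assumptions the electronics can dissipate only about 5.6 W continuously; insulating the case (lower $h$) or holding it against a warm body (smaller temperature difference) reduces that budget further.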

## Lower Power System Summary

#### Static techniques

- Minimize voltage: savings proportional to V<sup>2</sup>
  - Separate digital power supply at lower voltage than analog power supply
- Minimize frequency: savings proportional to f
- Disable clock to portions of chip not being used: savings proportional to C (# gates)
- Power down portions of chip not being used: savings proportional to C (# gates)
- Minimize current drawn: battery life gain proportional to I<sup>~1.3</sup>

#### • Dynamic techniques

- Go to sleep or minimize current when not busy (STOP, WAI)
  - Definitely go to sleep if you are computing for 1 second out of every 10 minutes
  - But, might be better to go slow continuously rather than very fast in bursts
    - Depends on complex tradeoffs if "off" times aren't extremely long, due to non-linear Peukert curve
- Dynamic voltage & frequency scaling
  - Increase voltage just enough to allow faster frequency to meet deadlines
- Turn off parts of chip if not being used, even if CPU is running
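To see why sleeping pays off for a low duty cycle, the average current of a periodic wake/sleep cycle can be estimated with a few lines of C. The function name and all current values here are hypothetical; actual active and sleep currents come from the data sheet and from measurement.

```c
#include <stdint.h>

/* Average current in microamps for one periodic wake/sleep cycle.
 * Uses 64-bit intermediates to avoid overflow; the result truncates
 * toward zero. Returns 0 if the cycle has zero length. */
uint32_t average_current_ua(uint32_t active_ua, uint32_t active_ms,
                            uint32_t sleep_ua,  uint32_t sleep_ms)
{
    uint64_t period_ms = (uint64_t)active_ms + sleep_ms;
    if (period_ms == 0) {
        return 0;
    }
    uint64_t charge = (uint64_t)active_ua * active_ms   /* uA*ms awake  */
                    + (uint64_t)sleep_ua  * sleep_ms;   /* uA*ms asleep */
    return (uint32_t)(charge / period_ms);
}
```

For the "1 second out of every 10 minutes" case, assuming 20 mA while computing and 10 µA asleep, `average_current_ua(20000, 1000, 10, 599000)` evaluates to 43 µA, compared with 20000 µA if the CPU never sleeps. This simple average ignores wake-up energy and the non-linear battery effects noted above.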

#### Low power extends battery life AND reduces heat problems


## Review

#### Special interrupts

- External interrupt pin
- Is SWI maskable?
- System traps and NMI

#### System resets

- Boot Loader
- How to get clean system resets
- Multi-tasking watchdog strategy

#### Improving system robustness

- Transient vs. permanent faults
- Timeouts & retries

#### Power management

- Power reduction via voltage and clock frequency change
- Power reduction via sleeping
- Major factors in power consumption and battery drain
- Thermal issues