#### **Lecture #8**

# **Memory & Processor Bus**

18-348 Embedded System Engineering Philip Koopman **Monday**, 8-Feb-2016





# **Precision GPS for Agriculture**

- Regular GPS has an accuracy of perhaps 20 meters
  - Works well if you can "snap" your position to the nearest road
  - Not good enough for precision agriculture
    - Want to be within an inch

#### Precision GPS uses augmentation

- Ground stations monitor received GPS signals and broadcast corrections
- WAAS only gives 1 meter accuracy
- Private correction service can give 1 inch position accuracy
- Subscription service (how do you charge?)

#### Precision navigation saves money

- Minimal overlap between passes
- Adaptive fertilizer, pesticide, irrigation
- Tractor autopilot for operation in poor evening light and to reduce operator fatigue



# Where Are We Now?

- Where we've been:
  - Lectures on software techniques
- Where we're going today:
  - Memory bus (back to hardware for a lecture)
- Where we're going next:
  - Economics / general optimization
  - Debug & test
  - Serial ports
  - Exam #1
    - Scope of coverage is indicated on the course web page


# **Preview**

- Memory types
  - Different types of memory and general characteristics (RAM, PROM, ...)
  - Interfacing to memory (rows vs. columns)
- CPU memory bus
  - Connects CPU to memory
  - Connects CPU to I/O
  - DMA (direct memory access)
  - Practicalities (fanout, etc.)
- Quick review of memory protection (15-213 material)


#### Reminder – the memory bus on a microcontroller

- Used to transfer data to and from the processor
  - Various types of memory
  - I/O data as well
- Carries address, data, and control signals
- The "memory" bus also does I/O

[Figure 1.1: Computer bus. The basic components of a computer system include processor, memory, and I/O: processor, RAM, ROM, input ports, and output ports connected by address and data lines, with the ports connecting to external physical circuits and devices. [Valvano]]









- ◆ SRAM uses a "6T" (six-transistor) static CMOS cell design to reduce power consumption
  - Used for on-chip RAM and small off-chip RAMs
  - Uses the same process technology as CPU logic
  - Faster, less dense, and more expensive than DRAM

[Figure: IBM's 6-transistor memory cell]



- ◆ DRAM is optimized for small size, not speed
  - Uses different process technology than SRAMs or CPUs
  - Integrated DRAM + CPU chips can be inefficient to create (more process steps)

[Figure 1: IBM trench capacitor memory cell (labels: column address, word line, P+, N-well, P- substrate; not to scale)]

# Basics of DRAM Cells [18-240]

#### **♦** The DRAM cell

- Dynamic memory: the memory element is not actively driven
- Even with power on, the memory will ... eventually ... forget

#### Memory mechanism is a capacitor

- Charge is stored in it to represent a logic 1
- No charge represents a logic 0
- Reading drains the capacitor, so the value must be rewritten afterward
- Real life hits! The capacitor leaks, so a logic 1 eventually decays to a logic 0



[Figure: DRAM cell with Select and Data (inout) lines]

# DRAM Refresh [18-240]

#### **♦** The charge exponentially decays

- The capacitor must be refreshed (recharged), typically every 4 milliseconds
- Every bit of the memory must be refreshed!
- Typically one memory array row is refreshed at a time





# **Multiplexed Addresses [18-240]**

#### ◆ SRAM chips have a pin for every address line

- Gives fast access, which is what SRAM is all about
- For example, a 64K bit x 1 chip has 16 address lines
- For example, a 256K bit x 8 (2 Mbit) chip has 18 address pins and 8 data pins

#### **◆ DRAMs** split the address in half (multiplex high and low bits)

- For a 64K x 1 DRAM, the top 8 bits are the row address
- The bottom 8 bits then select one column (the column address)
- This organization reduces the DRAM pin count: the same pins carry both row and column addresses
  - 8 address bits are sent at a time, in sequence
  - Only 8 pins and two strobe signals (ras\_l and cas\_l)
  - vs. 16 pins and one strobe signal
  - Also ties in with the internal memory organization







Timing diagram notation. Figure 9.18: Nomenclature for drawing timing diagrams. [18-240]

| Symbol               | Input                               | Output                                              |
|----------------------|-------------------------------------|-----------------------------------------------------|
| Valid (stable) level | The input must be valid             | The output will be valid                            |
| Falling transition   | If the input were to fall           | Then the output will fall                           |
| Rising transition    | If the input were to rise           | Then the output will rise                           |
| Hatched region       | Don't care, it will work regardless | Don't know, the output value is indeterminate       |
| Mid-level line       | Nonsense                            | High impedance, tristate, HiZ, not driven, floating |



#### DRAM Read Cycle [18-240]

- Sequence of events for reading a memory
  - Note: it is pretty complex
  - Usually "small" embedded systems avoid DRAM to keep things simple

[Figure: DRAM read cycle timing. Signals: Address (row addr, then col addr); ras\_l, row address strobe; cas\_l, column address strobe; Dout; we\_l not asserted. Annotated events: load row-address register (latch), read selected row and store it in the row latch; load column-address register (latch), output-enable Dout; output-disable Dout; store row latch into selected row (like a refresh).]











## Refresh Cycle [18-240]

#### Every 4 ms, every row must be refreshed

- About every 15.6 µs, one 256-bit row is refreshed (4 ms / 256 rows)
- An on-chip controller does this: it generates the row address and ras\_l



#### Notes

More happens in this memory than can easily be accounted for with two edges (load register, load latches, write memory)!

Lots of details not shown!


# **Non-Volatile RAM Technologies**

- Sometimes memory has to survive a power outage
  - On desktop machines this is (mostly) done by hard disk
  - Many embedded systems don't have magnetic storage (cost, reliability, size)

#### ♦ Battery backed SRAM (fairly rare now that EEPROM is cheap)

- Mold a battery right into the SRAM plastic chip case
- Just as fast & versatile as SRAM
- Typically retains data for 4-7 years (usually limited by battery shelf life)
- Cost includes both SRAM and a dedicated battery

#### FRAM

- Relatively new technology in the marketplace, but not mainstream (yet)
- Ferroelectric RAM
- Unlimited read/write cycles
- Intended as non-volatile drop-in replacement for SRAM (still expen\$ive)

# **ROM – Read Only Memory**

#### Masked ROM – pattern of bits built permanently into silicon

- Historically the most dense (least expensive) NV memory
- BUT need to change masks to change memory pattern (\$\$\$\$, lead time)
- Every change means building completely new chips!
  - It also means throw the old chips away ... they can't be changed

#### Masked ROM seldom used in low-end embedded systems

- Too expensive to make new chips every time a change is needed
- Takes too long (multiple weeks) to get the new chips

# <u>Corollary:</u> many high volume embedded systems don't use ASICs! (Application-Specific ICs and semi-custom chips)

- Design tools are too expensive and have too steep a learning curve
- Changes come frequently, obsoleting inventory
- ASICs are usually only worthwhile for high-end embedded systems (\$50 to \$100 chips might sensibly be ASICs, not \$1 to \$10 chips!)


# **PROM Types**

#### **♦ PROM: Programmable Read-Only Memory**

- Generic term for non-volatile memory that can be modified

#### OTPROM – "One Time" PROM

- Can only be programmed a single time (think "blowing fuses" to set bit values)
- Holds data values indefinitely

#### EPROM – "Erasable" PROM

- Entire chip erased at once using UV light through a window on the chip
- Mostly obsolete and replaced by flash memory

#### ◆ EEPROM – "Electrically Erasable" PROM

- Erasure can be accomplished in-circuit under software control
- Same general operation as flash memory EXCEPT...
- ...EEPROM can be erased/rewritten a byte at a time
  - Often have both flash (for bulk storage) and EEPROM (for byte-accessible writes) in same system

#### For all PROMS, ask about data retention

- Bits "rot" over time, 10 years for older technology; 100 years for newer technology
- 10-year data retention is often too short for embedded system product lifetimes!
- Also ask about wearout for values that are updated frequently





# Flash Memory Update & Integrity

- Flash memory can be used as a "solid state hard drive"
  - Supports erase/reprogram of blocks of memory (not bytes as with EEPROM)
  - Technology used in USB "thumb drives" and solid state MP3 players
  - Hardware supports wear leveling and sector remapping to mitigate write hot-spots

#### **♦** Flash/EEPROM update is complex

- Requires significant time and repeated operations to set good bit values
- Writing both flash and EEPROM is slow

#### Common flash problem – "weak writes"

- What happens if machine crashes during flash update?
- The cell's gate can be left at a marginal voltage → unreliable data values
- Usual solution: keep flag elsewhere in flash indicating write in progress
  - "System has started a flash update"
  - "System has completed a flash update"
  - If reboot finds "started" flag set, you know a weak write took place
- Some flash-based file systems have vulnerabilities in this area
  - Sometimes even the ones that say they are protected against power outages
  - If you use one, try about 100 power cycle tests to see if it suffers corruption


# **How Does Memory Connect To CPU?**

- Processor bus ("memory bus") connects CPU to memory and I/O
  - Data lines actually transfer data
  - Address lines feed memory address and I/O port number
  - Control lines provide timing and control signals to direct transfers
  - Sometimes these lines are shared to reduce hardware costs

Figure 1.2

A memory read cycle copies data from RAM, ROM, or an input device into the processor.

[Valvano]




## **Bus Transactions**

#### Bus serves multiple purposes

- Memory read and write
- I/O read and write
- Bulk data transfers (DMA discussed later in lecture)

Figure 1.3

A memory write cycle copies data from the processor into RAM or an output device.

[Valvano]



# **Address Decoding**

#### Every device on bus must recognize its own address

- Must decide which of multiple memory chips to activate
- Each I/O port must decide if it is being addressed
- High bits of the address are decoded to "select" a device; low bits are used within the device

#### "Memory Mapped" I/O

- I/O devices and memory share same address space (e.g., Freescale)
- Alternative: separate memory and I/O control lines (e.g., Intel)
- What address does this decode?

  [Figure 9.7: An address decoder identifies on which cycles to activate. Decoder inputs are address lines in the A10-A19 range. [Valvano]]

# **Read And Write Timing**

#### **♦** Usually two edges involved

- One edge means "address valid now"; it starts the memory cycle
- Second edge means "read or write data valid now"; it ends the memory cycle

**Figure 9.24** Synchronized bus timing.

[Valvano]



#### MC9S12C32 Bus Timing

[Figure 9.40: Simplified bus timing for the MC9S12C32 in expanded mode. Signals: R/W, LSTRB, and the multiplexed AD15-AD0 lines, which carry A15-A0 and then D15-D0, shown for read and write cycles. [Valvano]]

# **Typical Bus Lines**

#### Clock

- System clock so other devices don't have to have their own oscillators
- Drives bus timing for synchronous transfers

#### Address & Data

- Used for memory R/W, I/O, and DMA
- Sometimes multiplexed, sometimes separate
- Sometimes address is multiplexed (high/low) to make DRAM interface simpler

#### Control signals

- Read/write: which way is data moving?
- Memory vs. I/O: needed if they are separate address spaces (Intel, not Freescale)
- Byte vs. word: is it a whole word, or just a byte?
- Device control: interrupt request/grant; DMA request/grant; etc.


# **DMA – Direct Memory Access**

- For block memory transfers, can we keep data from going through the CPU bottleneck?
  - In software, each byte read requires two transfers: Device => CPU, then CPU => Memory
  - Instead, directly transfer data from I/O device to memory (and the reverse too)
  - Requires separate DMA controller hardware to perform the transfer

#### Figure 1.4

A DMA read cycle copies data from RAM, ROM, or an input device into an output device.

[Valvano]











## ISA (PC/104) Direct Memory Access (DMA) Operation

#### Separate DMA controller

- Counter to track number of words remaining
- "Cycle steals" bus bandwidth, transparent to programs

#### ◆ Data moves from memory to I/O

- I/O card asserts DRQx
- I/O eventually receives DACKx from DMA controller
- DMA controller asserts MEMR and IOW to accomplish a concurrent memory read and I/O write operation



# **Practicalities – Fanout**

#### ♦ Sometimes a CPU has to drive many loads on a bus

- Multiple banks of memory
- Multiple I/O devices

#### Fanout = number of loads being driven

- By address bus
- By data bus
- By control lines
- Limited by drive current $I_{OH}$ and $I_{OL}$ (chip I/O speed is rated at a limited current)
- Common limit for fanout is 5-10 loads

#### If fanout limit is exceeded need a buffer

- Especially common for address lines on memory wider than 8 bits
- For example, 74LS245 is a bidirectional data buffer; 74LS244 is a unidirectional buffer
- Buffer adds delay (slowing maximum system speed) but increases the fanout limit
- Usually need to buffer DRAM memory address lines
  - Address lines drive \*all\* the chips (e.g., 8 chips for 2 banks, each bank 32 bits wide built from 4 chips)
  - Data lines only drive one chip in each bank (e.g., drives 2 chips for 2 banks)



# **Practicalities – Conflicting Bus Devices**

- What happens if address decoding has a hardware bug?
  - One device might drive a bit to high
  - One device might drive that same bit to low
  - Is that OK?




# **Practicalities – Noise And Termination**

- **♦** Real hardware buses act as transmission lines
  - Signals take non-zero time to propagate
  - Signal waves reflect, superimpose, interfere, etc.
  - Noise issues are dominated by edge steepness, not just MHz!
    - Spectral components of edge are the culprit, not transitions per second
- **♦** Termination is used in physically large or complex buses
  - Put terminating resistors at one (or better, both) ends of bus lines
  - Especially if cabling or mechanical connectors are involved



# **Memory Address Space Extension**

- ♦ How does a 16-bit CPU address more than 64KB?
  - Ever wonder how a 16-bit CPU can have 128KB of memory?
  - To do this, need to change "memory model"

#### Page register

- A register that holds top 8 or 16 bits of memory address
- Memory address prepended with page register value
- Might have "long" instructions that take a full-size memory address
- Might have multiple page registers to allow copying between pages
- If load and store instructions are not working, check that you have the right memory model; we're using the "tiny" memory model, which ignores the page register
- ♦ Segment registers (e.g., 808x original IBM PC CPU)
  - A 24-bit or 32-bit base register that is added to each memory address
  - Flexible, but hardware addition adds latency to memory path
  - Might have multiple segment registers (e.g., program, stack, data)
- ♦ Virtual memory .... (coming right up)


# **Course CPU Uses A Page Register**

- ♦ Version 5 uses "far" addresses for subroutine calls
  - Uses CALL instructions instead of JSR/BSR
  - Uses RTC instead of RTS
- ◆ PPG = Program Page register
  - 8 bit register that holds the top 8 bits of program address
  - Programs operate in a fixed 64 KB address space
  - Switch between pages using CALL and RTC
  - CALL pushes PPG onto stack; RTC pulls PPG from stack

| Source Form            | Operation | Addr. Mode | Machine Coding (hex) |
|------------------------|-----------|------------|----------------------|
| CALL opr16a, page      | (SP) – 2 ⇒ SP; RTN<sub>H</sub>:RTN<sub>L</sub> ⇒ M<sub>(SP)</sub>:M<sub>(SP+1)</sub>; (SP) – 1 ⇒ SP; (PPAGE) ⇒ M<sub>(SP)</sub>; pg ⇒ PPAGE register; Program address ⇒ PC. Call subroutine in extended memory (program may be located on another expansion memory page). | EXT | 4A hh ll pg |
| CALL oprx0_xysp, page  | Same as above | IDX | 4B xb pg |
| CALL oprx9,xysp, page  | Same as above | IDX1 | 4B xb ff pg |
| CALL oprx16,xysp, page | Same as above | IDX2 | 4B xb ee ff pg |
| CALL [D,xysp]          | Same as above; indirect modes get program address and new pg value based on pointer. | [D,IDX] | 4B xb |
| CALL [oprx16,xysp]     | Same as above; indirect modes get program address and new pg value based on pointer. | [IDX2] | 4B xb ee ff |
| RTC                    | (M<sub>(SP)</sub>) ⇒ PPAGE; (SP) + 1 ⇒ SP; (M<sub>(SP)</sub>:M<sub>(SP+1)</sub>) ⇒ PC<sub>H</sub>:PC<sub>L</sub>; (SP) + 2 ⇒ SP. Return from call. | INH | 0A |

[Freescale]



# **Memory Protection**

- ♦ Many small CPUs have unlimited access to memory
  - Any task can corrupt RAM
  - Fortunately, a wild pointer can't corrupt Flash memory
    - Flash requires a complex procedure to modify
- Virtual memory provides excellent memory protection
  - Each task has its own distinct memory space starting at address 0
  - Only the OS can access other tasks' memory spaces
  - Can enable sharing on a page by page basis

#### Virtual memory hardware "lite" = MMU

- Memory Management Unit
- Big MMU might provide hardware support for virtual memory
- But, a "small" MMU might just protect memory from other tasks
  - Usually a per-task base register that is added to memory addresses

#### What if you don't have an MMU?

- Good practice is at least to protect blocks of RAM values with error-detection codes (e.g., a checksum)
- If a wild pointer changes values, the error code has a chance to detect it

## **Lab Skills**

#### Built a memory bus interface

- The module we use doesn't have the real memory bus pinned out to proto-board
- So we created software to emulate a simple memory bus for you



# **Review**

#### Memory types

- Different types of memory and general characteristics
  - Should know names, general construction, characteristics of each
  - General idea behind NV memory (flash operation/EEPROM use)
- Interfacing to memory (rows vs. columns)
  - Should know, e.g., what "RAS" and "CAS" do on DRAMs at level presented
  - Should understand how "read," "write," and "refresh" signals work

#### CPU memory bus

- General signals on a bus and what they are for
- How to read a timing diagram
- General bus operations: read, write, DMA, I/O
- General practicalities (fanout, conflicts, noise, termination)
- Memory address space protection

#### ◆ BUT we **don't** expect you to memorize or do these things:

- Memorize timing numbers on specific buses
- Draw bus timing diagrams or recall bus signal names from memory
- Draw or interpret what each individual transistor does in a memory cell