# Homework 9: Disks & Vector Architecture

Due November 11, 1998

Problem 1: SCSI Bus Performance

The SCSI protocol consists of several phases for each data request and reply. The table below gives a breakdown of bus activity by phase (measured using a SCSI bus analyser) on a system with a number of fast disks transferring sequential blocks to a single host. Each individual request is for a 64K block of data, and in the multiple-disk cases requests are issued in round-robin order (disk 1, disk 2, disk 3, disk 1, disk 2, disk 3, disk 1, etc.).

| Phase | 1 disk | 2 disks | 3 disks |
|---|---|---|---|
| ARBITRATE | 1% | 1% | 1% |
| SELECT | 1% | 1% | 5% |
| MESSAGE | 3% | 10% | 12% |
| COMMAND | 1% | 1% | 1% |
| DATA | 27% | 55% | 79% |
| STATUS | 1% | 1% | 1% |
| BUS FREE | 66% | 31% | 1% |

a) Given a maximum bus throughput of 20 MB/s at 64K requests, what is the data transfer rate into the host in each case (with 1 disk, with 2 disks, with 3 disks)?
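One way to set up the arithmetic for part (a) is sketched below. It rests on an assumption the problem does not state outright: that the bus delivers its full 20 MB/s only while it is in the DATA phase, so the effective rate is the DATA share of bus time multiplied by the peak rate. The names `BUS_PEAK_MB_PER_S`, `DATA_FRACTION`, and `data_rate_mb_per_s` are illustrative only; the percentages are copied from the table above.

```python
# Assumption (not stated in the problem): payload moves at the peak rate
# only during the DATA phase; every other phase carries no payload.
BUS_PEAK_MB_PER_S = 20.0

# DATA-phase share of bus time for 64K requests, from the table above.
DATA_FRACTION = {1: 0.27, 2: 0.55, 3: 0.79}


def data_rate_mb_per_s(data_fraction, peak=BUS_PEAK_MB_PER_S):
    """Effective transfer rate when only DATA-phase time carries payload."""
    return data_fraction * peak


# Map each disk count to its estimated transfer rate into the host.
rates = {disks: data_rate_mb_per_s(frac) for disks, frac in DATA_FRACTION.items()}

for disks, rate in rates.items():
    print(f"{disks} disk(s): {rate:.1f} MB/s")
```

If the 20 MB/s figure is instead read as the rate of a fully saturated bus including protocol overhead, the scaling would differ, so the interpretation should be stated with any answer.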

b) What would the transfer rate be if we added a fourth disk (remember that a single SCSI bus can hold up to 7 devices)?

If we reduce the request size to 8K each instead of 64K, we achieve the following utilization:

| Phase | 1 disk (8K) |
|---|---|
| ARBITRATE | 1% |
| SELECT | 15% |
| MESSAGE | 14% |
| COMMAND | 1% |
| DATA | 27% |
| STATUS | 1% |
| BUS FREE | 41% |

c) Assuming the same sequential workload, but with 8K requests, does it make sense for me to add a second disk to this system? A third disk?

d) What if I changed my workload to random requests (where prefetching would no longer work, and seek time becomes an issue)? Would I benefit from a second disk? A third disk?

Problem 2: Vector Architecture

A particular vector computer design has the characteristics & assumptions given below. Some of the assumptions, such as ignoring bank conflicts, are made to simplify the problem and are obviously not realistic.

• 3 memory pipes (two VAGs for loading, one VAG for storing)
• Bus can carry one 8-byte word per clock tick (all numbers are in 8-byte words) and operates at 60 MHz
• No cache is involved
• There are no memory bank conflicts; ignore cycle time of DRAM after data is written or read
• The VRF holds at least 3 vectors of length 8
• VDS can transfer two words concurrently on each clock cycle from any point to any other point (except the bus, which can only handle one word at a time)
• There are appropriate buffers at the bus interface so data waits until a bus is free. Reads are given priority over writes (so writes wait if there is any read pending)
• Vector instructions are dispatched by a scalar processor at one clock per instruction.
• Vector chaining as described in class is supported and should be used in your solutions; data is moved in more-or-less "fair" fashion, in which the VAGs attempt to keep at about the same vector element number, but don't let resources go idle in order to do this.
• All resources in the system are fully pipelined at one pipeline stage per clock tick.

| Latency | Clock ticks of latency |
|---|---|
| Vector instruction dispatch | 1 |
| VAG setup | 1 |
| Address reaches memory bank | 3 |
| DRAM read latency (ignore time to complete cycle) | 4 |
| Data returns from memory bank after access via bus | 3 |
| VDS delay | 1 |
| Adder delay (starting when both operands available) | 4 |
| VDS delay | 1 |
| Result sent to memory bank via bus (address & data) | 3 |
| Data written into DRAM (ignore time to complete cycle) | 4 |
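
As a sanity check on any timing table, the stage latencies above can be summed to get the time for a single element to travel from instruction dispatch through load, add, and store with no competition for resources. The sketch below does only that. It is a lower bound, not the answer to the problem, because it ignores the staggered dispatch of the four instructions and the one-word-per-tick bus shared by both loads and the store. The names `LATENCY_TICKS` and `unloaded_path_ticks` are illustrative.

```python
# Stage latencies in clock ticks, in the order listed in the latency table.
LATENCY_TICKS = [
    ("Vector instruction dispatch", 1),
    ("VAG setup", 1),
    ("Address reaches memory bank", 3),
    ("DRAM read latency", 4),
    ("Data returns from memory bank via bus", 3),
    ("VDS delay (to adder)", 1),
    ("Adder delay", 4),
    ("VDS delay (from adder)", 1),
    ("Result sent to memory bank via bus", 3),
    ("Data written into DRAM", 4),
]


def unloaded_path_ticks(stages):
    """Sum the stage latencies for one element, assuming no resource contention."""
    return sum(ticks for _, ticks in stages)


total = unloaded_path_ticks(LATENCY_TICKS)
print(f"Uncontended single-element path: {total} ticks")
```

A full solution still needs a tick-by-tick table, since the later elements and the second load queue behind earlier traffic on the bus.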

A 4-element vector addition takes 4 clock cycles to issue ("vload", "vload", "vadd", "vstore"). What is the elapsed time for a 4-element vector addition in clock ticks? Provide a spreadsheet printout or other table diagram illustrating how you got this solution (similar to the spreadsheets in lecture 16, but using columns and latencies appropriate to the table above).