Homework 10:

Due November 18, 1998

Problem 1:

You have a vector computer with the following bandwidths available for 8-byte data values:

• 5 memory banks at 120 Million B/sec each
• 1 read-only bus and 1 write-only bus at 480 Million B/sec each
• A VDS with three concurrent connections at 480 Million B/sec each
• A pipelined vector adder at 60 MFLOPS; 3-clock latency; throughput of 1 result per clock
• A VRF with three ports at 480 Million B/sec each

The following table gives the latency for vector additions at different vector sizes:

| Vector Length | Total Latency (clocks) | Achieved MFLOPS |
|---|---|---|
| 1 | 29 | ____ |
| 2 | 31 | ____ |
| 3 | 33 | ____ |
| 4 | 35 | ____ |
| 5 | 37 | ____ |
| 6 | 39 | ____ |
| 7 | 41 | ____ |
| 8 | 43 | ____ |
| 9 | 45 | ____ |

1. Compute the Achieved MFLOPS column for this table. (Assume no other operations are concurrent with the addition.) Show the computation for vector length 3 in detail (don't bother showing work for the rest).
2. If the scalar floating point coprocessor can compute a vector sum at the rate of 8 clocks per result, what is Nv on this architecture? Use a linear interpolation between the nearest pair of data points in the table.
3. For vector addition, what is R∞ on this architecture?
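
The three quantities asked for above can be sketched in a few lines of Python. This is only one possible reading of the problem, not an official solution. It assumes a 60 MHz clock (inferred from the adder producing 60 MFLOPS at 1 result per clock), takes Nv to be the vector length at which the vector latency equals the scalar time, and estimates R∞ from the incremental clocks per element at the long end of the table. All function and variable names are made up for illustration.

```python
# Assumption: 60 MHz clock, inferred from 60 MFLOPS at 1 result per clock.
CLOCK_MHZ = 60.0

# (vector length -> total latency in clocks), copied from the table.
LATENCY_CLOCKS = {1: 29, 2: 31, 3: 33, 4: 35, 5: 37, 6: 39, 7: 41, 8: 43, 9: 45}


def achieved_mflops(length, clocks, clock_mhz=CLOCK_MHZ):
    """Results delivered divided by elapsed time, in millions per second."""
    return length * clock_mhz / clocks


def break_even_length(latency, scalar_clocks_per_result):
    """Vector length where vector time equals scalar time.

    Linearly interpolates between the nearest pair of table entries that
    bracket the crossover. Returns None if the table has no crossover.
    """
    lengths = sorted(latency)
    for lo, hi in zip(lengths, lengths[1:]):
        # Positive difference means the vector unit is still slower.
        d_lo = latency[lo] - scalar_clocks_per_result * lo
        d_hi = latency[hi] - scalar_clocks_per_result * hi
        if d_lo > 0 >= d_hi:
            return lo + d_lo / (d_lo - d_hi) * (hi - lo)
    return None


def asymptotic_mflops(latency, clock_mhz=CLOCK_MHZ):
    """Rate for very long vectors, from the slope of the last two entries."""
    lengths = sorted(latency)
    a, b = lengths[-2], lengths[-1]
    clocks_per_element = (latency[b] - latency[a]) / (b - a)
    return clock_mhz / clocks_per_element


if __name__ == "__main__":
    for n, clocks in sorted(LATENCY_CLOCKS.items()):
        print(n, clocks, round(achieved_mflops(n, clocks), 2))
    print("Nv estimate:", break_even_length(LATENCY_CLOCKS, 8))
    print("R-infinity estimate (MFLOPS):", asymptotic_mflops(LATENCY_CLOCKS))
```

For vector length 3 the first function evaluates 3 × 60 / 33. Every entry in the table adds 2 clocks per extra element, so the slope-based R∞ estimate is the same for any pair of rows.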

Problem 2:

You are designing a computer system bus under the following constraints:

• Each connection costs \$.10 per socket contact ("pin") and \$.02 per edge connector contact ("pin"). The bus is a parallel bus running at 60 MHz with a maximum of 10 card slots.
• Your initial design multiplexes 32 address lines with 32 data lines for a total of 60 bus signal lines (the address and data lines share the same 32 out of 60 lines).
• A bus transaction takes 4 clocks to complete (i.e., 4 clocks per 32-bit word transfer)

You wish to increase system performance by using transfers of 128 bits. Assume that data can be transferred at the rate of one data chunk per bus clock after the first data chunk is returned with a latency of 4 clocks, including addressing delay. What are the total fully populated (10-card) system cost, the latency (in ns), and the sustained bandwidth, assuming full utilization, for the following configurations at 128-bit data transfers?

1. Multiplexed address/data lines; 32-bit data lines (Cost, latency, sustained bandwidth)
2. Non-multiplexed address/data lines; 128-bit data lines (Cost, latency, sustained bandwidth). Assume all control lines remain unchanged.
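
The arithmetic for both configurations can be organized as in the sketch below. It is one interpretation, not an official solution, and it rests on several assumptions that are not stated in the problem: every bus line needs one socket contact and one edge connector contact per card slot; the 60 − 32 = 28 lines that are not shared address/data lines are control lines; the non-multiplexed bus carries 32 separate address lines; and successive 128-bit transactions do not overlap. The function and constant names are hypothetical.

```python
CLOCK_MHZ = 60.0          # bus clock from the problem statement
CARDS = 10                # fully populated system
# Assumption: each line costs one socket contact plus one edge contact per slot.
COST_PER_LINE_PER_SLOT = 0.10 + 0.02   # dollars


def evaluate(signal_lines, data_width_bits, transfer_bits=128,
             first_chunk_clocks=4):
    """Return (cost in dollars, latency in ns, sustained bandwidth in MB/s).

    The first chunk arrives after first_chunk_clocks; each further chunk
    takes one more clock. Assumes transactions run back to back with no
    overlap, so bandwidth is bytes per transfer divided by its latency.
    """
    chunks = transfer_bits // data_width_bits
    clocks = first_chunk_clocks + (chunks - 1)
    latency_ns = clocks * 1000.0 / CLOCK_MHZ
    bandwidth_mb_s = (transfer_bits / 8) / latency_ns * 1000.0
    cost = signal_lines * COST_PER_LINE_PER_SLOT * CARDS
    return cost, latency_ns, bandwidth_mb_s


# Assumption: 28 control lines = 60 total lines - 32 shared address/data lines.
CONTROL_LINES = 60 - 32

# Configuration 1: 32 multiplexed address/data lines plus control lines.
config1 = evaluate(signal_lines=CONTROL_LINES + 32, data_width_bits=32)

# Configuration 2: 32 address lines + 128 data lines plus control lines.
config2 = evaluate(signal_lines=CONTROL_LINES + 32 + 128, data_width_bits=128)

if __name__ == "__main__":
    for name, (cost, latency, bandwidth) in (("1", config1), ("2", config2)):
        print(f"Config {name}: ${cost:.2f}, {latency:.2f} ns, "
              f"{bandwidth:.2f} MB/s")
```

If addressing for the next transaction were allowed to overlap with the current data transfer on the non-multiplexed bus, the sustained bandwidth for configuration 2 would be higher than this sketch reports; that would need a different model than the one used here.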