# Homework 7: Tuning Software for Speed

Due Wednesday October 21, 1998

Problem 1:

You want to optimize a piece of C software that does complex number computations below:

```int a_r[N][N], b_r[N][N], c_r[N][N];
int a_i[N][N], b_i[N][N], c_i[N][N];
for (j = 0; j < N; j++)
{ for (i = 0; i < N; i++)
a_r[i][j] = b_r[i][j] + c_r[i][j];
}
for (j = 0; j < N; j++)
{ for (i = 0; i < N; i++)
a_i[i][j] = b_i[i][j] + c_i[i][j];
}
```

1) Rewrite the code using loop interchange to reverse the order of indexing. Do you expect this to make the software go faster or slower in the general case?

3) Additionally, perform array merging on the code from answer (2) so that the real and imaginary parts of arrays a, b, and c are each merged (i.e., a_r and a_i are merged into one array a, b_r and b_i are merged into one array b, and c_r and c_i are merged into one array c) and write the resultant code.

4) Additionally, assume that you are using a machine with block size of 32 bytes and a no-write-allocation policy. Rewrite the code from answer (3) to "fake" write allocation and improve performance.

5) For the special case of N=512, 32-bit ints, rewrite the code from answer (4) to improve cache performance using array placement. Assume that you are using a machine as below:

• 8 KB direct mapped L1 cache
• 96 KB 3-way set associative L2 cache
• 2 MB direct mapped L3 cache

Optionally (and not graded), incorporate the original and variations of your software into a real program and time the performance improvements. (Did the fake write allocation help? What does this tell you about the machine you ran on?)