Due Friday, September 4, 1998
Exercise the Dinero cache simulation tool with some pre-generated address traces. The traces can be found at:
There are three traces, listed below. Because these traces are fairly large, you may want to use them in place through symbolic links rather than copying them into your local directory.
File Name    Size (bytes)
a.din        14417700
b.din        14417700
c.din        14417700
The traces capture the data accesses made during a program run (instruction accesses were ignored, as were any memory accesses made from the main() procedure). They are already in a format that can be fed directly into Dinero, and they contain 32-bit trace information, so they can be processed correctly by Dinero running on any 32-bit Unix platform. The three traces are the data accesses performed by the three toy benchmark programs (but at this point we're not telling you which trace belongs to which program):
Note: any short-answer question should take no more than about 50 words to answer; long essays are neither required nor desired at any time in this course unless specifically requested.
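To get a feel for the traces before running Dinero, it can help to count how many reads and writes each one contains. The C sketch below assumes the traditional Dinero III "din" input format: one record per line, a decimal access-type label followed by a hexadecimal address, with 0 = data read, 1 = data write and 2 = instruction fetch. The names `tally_din` and `struct din_counts` are hypothetical and are not part of Dinero.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record counts for a din-format trace (assumed labels). */
struct din_counts {
    long reads;    /* label 0: data read */
    long writes;   /* label 1: data write */
    long ifetches; /* label 2: instruction fetch */
    long other;    /* any other label (e.g. escape records) */
};

/* Tally records of the form "<label> <hex-address>" read from fp.
 * Returns 0 if the whole stream parsed, -1 on a malformed record. */
int tally_din(FILE *fp, struct din_counts *c)
{
    assert(fp != NULL && c != NULL);
    c->reads = c->writes = c->ifetches = c->other = 0;

    int label;
    unsigned long addr;
    int rc;
    while ((rc = fscanf(fp, "%d %lx", &label, &addr)) == 2) {
        switch (label) {
        case 0:  c->reads++;    break;
        case 1:  c->writes++;   break;
        case 2:  c->ifetches++; break;
        default: c->other++;    break;
        }
    }
    /* fscanf returns EOF only when input ran out cleanly between records. */
    return rc == EOF ? 0 : -1;
}
```

For example, open a.din with fopen(path, "r") and pass the handle to tally_din. Since these traces hold only data accesses, you would expect the ifetches count to be zero.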
Experimentally determine the size of the first-level (L1) cache on two Unix platforms: the course Alphastations and one platform that does not have an Alpha CPU. A starting point for your experiment can be found at cachsize.c. Be sure to compile all your programs with C compiler optimization turned on!
This program sequentially reads all the elements of a data array. It does this in two places: first, to pre-load the array into cache and so eliminate compulsory misses; second, in a timed loop that repeatedly reads the array to determine its sustained access speed (presumably faster if the array is small enough to be completely resident in cache). The number of times the array is read is scaled inversely with the array size, so the total number of memory accesses stays roughly constant no matter what the array size. Thus the relevant result is the read bandwidth in MBytes/second, not the actual execution time.
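The structure described above can be sketched in C as follows. This is a hypothetical illustration, not the contents of cachsize.c: the function names, the use of long elements, the standard clock() timer, and the definition 1 MByte = 10^6 bytes are all assumptions. The volatile qualifier forces every element to be loaded on every pass, so an optimizing compiler cannot hoist or delete the reads.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical sketch; not the actual contents of cachsize.c. */

/* Receives each checksum so the reads have an observable effect. */
static volatile unsigned long sink;

/* Read every element of a[0..n-1] once. volatile forces a real load of
 * each element even with optimization turned on. */
static unsigned long read_pass(const volatile long *a, size_t n)
{
    unsigned long sum = 0;   /* unsigned: wraparound is well defined */
    for (size_t i = 0; i < n; i++)
        sum += (unsigned long)a[i];
    return sum;
}

/* Return the sustained read bandwidth in MBytes/second (10^6 bytes) for an
 * array of n longs, or -1.0 if the timed loop was too short to measure.
 * total_reads is the approximate number of element reads to perform; it is
 * held constant across sizes by scaling the pass count inversely with n. */
double read_bandwidth_mb_s(const volatile long *a, size_t n, long total_reads)
{
    assert(n > 0 && total_reads > 0);

    /* Pre-load pass: bring the array into cache so compulsory misses are
     * not charged to the timed loop. */
    sink = read_pass(a, n);

    long passes = total_reads / (long)n;
    if (passes < 1)
        passes = 1;

    unsigned long sum = 0;
    clock_t start = clock();            /* timer read once before ... */
    for (long p = 0; p < passes; p++)
        sum += read_pass(a, n);
    clock_t stop = clock();             /* ... and once after the loop */
    sink = sum;

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;
    double bytes = (double)passes * (double)n * (double)sizeof(long);
    return bytes / 1e6 / seconds;
}
```

Calling read_bandwidth_mb_s with the same total_reads for every array size gives comparable MBytes/second figures, which is what gets plotted.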
You will need to modify the program so that the timed loop runs for approximately 10 seconds, which keeps the experimental noise from timer quantization low. (Anything between 10 and 100 seconds is fine.) Your best bet is to use or modify the loop-limit tuning mechanism in the example; querying the timer on every pass through the loop would disturb the data cache contents.
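One way such loop-limit tuning could work is sketched below; it is an assumption, not necessarily the mechanism in the example program. The hypothetical calibrate_passes doubles the pass count until a batch takes long enough to be well above the timer's granularity, then scales that count linearly to the target time. The timer is read only before and after each whole batch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Receives each checksum so the reads have an observable effect. */
static volatile unsigned long sink;

/* Time `passes` full reads of a[0..n-1]. The timer is read only before and
 * after the whole batch, never inside it, so it does not disturb the cache. */
static double time_passes(const volatile long *a, size_t n, long passes)
{
    unsigned long sum = 0;
    clock_t start = clock();
    for (long p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            sum += (unsigned long)a[i];
    clock_t stop = clock();
    sink = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Hypothetical loop-limit tuning: double the pass count until one batch
 * takes at least probe_seconds, then scale that count linearly so the
 * timed loop runs for about target_seconds. */
long calibrate_passes(const volatile long *a, size_t n,
                      double probe_seconds, double target_seconds)
{
    assert(n > 0 && probe_seconds > 0.0 && target_seconds > 0.0);
    long passes = 1;
    double t = time_passes(a, n, passes);
    while (t < probe_seconds) {
        passes *= 2;
        t = time_passes(a, n, passes);
    }
    double scaled = (double)passes * (target_seconds / t);
    return scaled < 1.0 ? 1 : (long)scaled;
}
```

For the assignment you might call calibrate_passes(a, n, 0.5, 10.0). Consistent with the scaling described earlier, one option is to calibrate once at a small array size and convert the result into a total read count (passes × n) that is then held constant for all sizes.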
Compile your modified program and obtain times for array sizes from 1 KB to 40 KB in 1 KB increments. (On some machines you may need to go higher; if so, you can space the data points further apart. Use your judgement, but take data up to approximately 2.5 times the L1 cache size.) Repeat the steps below for both machines; you may plot both curves on the same graph as long as they are color-coded or otherwise clearly distinguishable.
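Since 2.5 × 16 KB = 40 KB, the 1 to 40 KB range covers L1 caches up to 16 KB; larger caches need the list extended. The hypothetical helper below builds such a size list: 1 KB steps up to 40 KB, then coarser steps of step_kb up to about 2.5 times an estimated L1 size. The step size and the L1 estimate are your choices, not values from the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill sizes_kb[] (capacity cap) with the array sizes,
 * in KB, to measure: 1 KB steps from 1 KB to 40 KB, then coarser steps of
 * step_kb up to roughly 2.5 times l1_guess_kb (an estimate of the L1 size).
 * Returns the number of sizes written. */
size_t make_size_list(int *sizes_kb, size_t cap, int l1_guess_kb, int step_kb)
{
    assert(sizes_kb != NULL && step_kb > 0 && l1_guess_kb > 0);

    /* Upper limit: 2.5 x the L1 guess, rounded up, but never below 40 KB. */
    int limit = (5 * l1_guess_kb + 1) / 2;
    if (limit < 40)
        limit = 40;

    size_t count = 0;
    int kb = 1;
    while (kb <= limit && count < cap) {
        sizes_kb[count++] = kb;
        kb += (kb < 40) ? 1 : step_kb;   /* fine steps, then coarse ones */
    }
    return count;
}
```

For example, an L1 guess of 64 KB with 8 KB steps yields 1 through 40 KB in 1 KB steps followed by 48, 56, ..., 160 KB (55 sizes in all).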
If you don't have access to a second workstation type, log in to unix.andrew.cmu.edu with the user ID butler to obtain a list of idle Andrew workstations.
18-548 home page.