Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs

Deep Thoughts

The group is surprised that the diamond placement is the best one. Why not just put all the controllers in the middle? Are hot spots really as big a problem as the authors contend?

We're not sure that their genetic algorithm would find the best answer. Why not use the intuition gained from the random method and apply it to a hand optimization? For the diamond, why not move a controller from a corner of the diamond a bit toward the middle? Wouldn't that cut down on hot spots some more?
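
One way to pressure-test both the "put them in the middle" intuition and a hand-tweaked diamond is to count the traffic each mesh link would carry under a given placement. The sketch below is our own toy model in Python, not the paper's methodology: it assumes uniform injection, XY dimension-order routing, nearest-controller selection, and illustrative controller coordinates (not the paper's actual layouts).

    # Toy model (ours, not the paper's): worst-case link load for a memory
    # controller placement on a K x K mesh. Assumes every core sends one unit
    # of traffic to its nearest controller over an XY dimension-order route.
    from collections import defaultdict
    from itertools import product

    K = 8  # mesh dimension

    def xy_route(src, dst):
        """Yield the links on the X-then-Y path from src to dst."""
        x, y = src
        while x != dst[0]:
            nx = x + (1 if dst[0] > x else -1)
            yield ((x, y), (nx, y))
            x = nx
        while y != dst[1]:
            ny = y + (1 if dst[1] > y else -1)
            yield ((x, y), (x, ny))
            y = ny

    def max_link_load(controllers):
        """Peak traffic on any single link; a crude proxy for hot spots."""
        load = defaultdict(int)
        for core in product(range(K), repeat=2):
            dst = min(controllers,
                      key=lambda c: abs(c[0] - core[0]) + abs(c[1] - core[1]))
            for link in xy_route(core, dst):
                load[link] += 1
        return max(load.values())

    center = [(3, 3), (3, 4), (4, 3), (4, 4)]   # everything in the middle
    diamond = [(3, 0), (0, 4), (4, 7), (7, 3)]  # spread toward the edges
    for name, placement in (("center", center), ("diamond", diamond)):
        print(name, max_link_load(placement))

The model is crude (no reply traffic, no timing or contention), but it turns the hot-spot question into something quantitative instead of a matter of taste.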

How does this affect real performance and not just network performance?

We don't like synthetic benchmarks. Why didn't they report all the results in terms of the full-system simulator?

—supplemented by Yoongu—

Migration of threads across cores: for cache-hot threads, which mostly hit in their caches and rarely go to memory, it may be wiser to migrate them to a core that is farther away from the memory controllers, leaving the nearby cores for memory-intensive threads. As opposed to NUCA or NUMA, we are moving the thread around, not the data.
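
A minimal sketch of that scheduling idea, in Python and entirely our own construction (the greedy pairing and the MPKI inputs are assumptions, not a mechanism from the paper): memory-intensive threads get the cores closest to a controller, and cache-hot threads get the distant ones.

    # Sketch (ours): place high-MPKI threads near a memory controller and
    # cache-hot (low-MPKI) threads far away. Assumes per-thread
    # misses-per-kilo-instruction estimates, e.g. from hardware counters.

    def dist_to_nearest_controller(core, controllers):
        return min(abs(core[0] - c[0]) + abs(core[1] - c[1])
                   for c in controllers)

    def assign_threads(threads_mpki, cores, controllers):
        """Greedy pairing: the hungriest thread gets the closest core."""
        by_hunger = sorted(threads_mpki, key=threads_mpki.get, reverse=True)
        by_closeness = sorted(
            cores, key=lambda c: dist_to_nearest_controller(c, controllers))
        return dict(zip(by_hunger, by_closeness))

    # Toy example: 4 threads on a 4x4 mesh, controllers on the left edge.
    controllers = [(0, 0), (0, 3)]
    cores = [(x, y) for x in range(4) for y in range(4)]
    mpki = {"streaming": 40.0, "db": 12.0, "render": 2.5, "idle_spin": 0.1}
    print(assign_threads(mpki, cores, controllers))

In a real system the MPKI estimates are noisy and migration itself costs cache warm-up time, so any such policy would need hysteresis; the sketch only shows the placement preference.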

Open-loop simulations can exaggerate the extent of network traffic. Real cores are closed loop for two reasons: 1) they stop injecting packets once a certain number of memory requests are outstanding, because the core takes a full-window stall, and 2) when there are multiple memory requests outstanding, each one takes longer to be serviced. Both factors act as negative feedback and mitigate the strain on the on-chip network.
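
A toy model (ours, with made-up parameters, not the paper's simulator) shows how the two feedback effects cap the realized injection rate well below what an open-loop traffic generator would offer to the network:

    # Sketch (ours): open-loop injection vs. a closed-loop core that stalls
    # at a fixed number of outstanding requests (an MSHR / window limit).
    # Latency grows with queue depth to mimic the second feedback effect.
    import random

    CYCLES = 10_000
    MSHRS = 8               # outstanding requests before a full-window stall
    BASE_LATENCY = 50       # unloaded round-trip latency, in cycles
    PER_REQ_PENALTY = 10    # extra service delay per in-flight request

    def closed_loop(inject_prob=0.5):
        outstanding = []    # completion times of in-flight requests
        injected = 0
        for cycle in range(CYCLES):
            outstanding = [t for t in outstanding if t > cycle]  # retire
            if len(outstanding) < MSHRS and random.random() < inject_prob:
                # Contention: more in-flight requests => slower service.
                latency = BASE_LATENCY + PER_REQ_PENALTY * len(outstanding)
                outstanding.append(cycle + latency)
                injected += 1
        return injected / CYCLES

    def open_loop(inject_prob=0.5):
        # Open loop: inject regardless of in-flight requests; no feedback.
        return inject_prob

    random.seed(0)
    print("open-loop rate:   ", open_loop())
    print("closed-loop rate: ", round(closed_loop(), 3))

Run it and the closed-loop rate comes out a small fraction of the open-loop one; the exact numbers are meaningless, but the direction of the effect is the point.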