Comment on results presented in Eric's slides. Eric: DMP-Serial does not perform as badly as expected, given that all memory accesses are checked. Onur/Marek: Probably only shared-memory references are being tracked. After looking at the paper, it seems that all memory references are tracked. This seems naïve; it is probably shown as a very simple baseline that would never actually be used.
Eric: The simulations seem oversimplified (e.g., they don't model caches, they assume IPC=1, etc.). Marek: Maybe try a token-passing scheme that is more clever than round-robin? Onur/Eric: They probably need deterministic token passing to keep execution deterministic.
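The round-robin token passing discussed above can be sketched as follows. This is a minimal illustration (not the paper's actual mechanism): each thread may execute a quantum of shared-memory work only while it holds the token, and the token is always passed in a fixed order, so the interleaving of quanta is the same on every run regardless of OS scheduling.

```python
import threading

class DeterministicToken:
    """Round-robin token: thread i may enter its quantum only when
    token == i, and then passes the token to (i+1) % n. Because the
    hand-off order is fixed, the interleaving of quanta is deterministic."""
    def __init__(self, n_threads):
        self.n = n_threads
        self.token = 0
        self.cond = threading.Condition()

    def acquire(self, tid):
        with self.cond:
            while self.token != tid:
                self.cond.wait()

    def release(self, tid):
        with self.cond:
            self.token = (tid + 1) % self.n
            self.cond.notify_all()

# Usage: each worker wraps every quantum of shared-memory work
results = []
def worker(tid, tok):
    for _ in range(3):               # three quanta per thread
        tok.acquire(tid)
        results.append(tid)          # the "quantum" of work on shared state
        tok.release(tid)

tok = DeterministicToken(2)
threads = [threading.Thread(target=worker, args=(i, tok)) for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()
# results is always [0, 1, 0, 1, 0, 1], on every run
```

Note the obvious cost this makes visible: a thread whose turn it is not must block even when its work is independent, which is why a smarter (but still deterministic) hand-off policy is worth asking about.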
Question: How do they cope with inherent non-determinism? Answer: They are probably only trying to eliminate non-determinism originating from the presence of multiple threads. Single-thread non-determinism will always be there.
Tony: If a bug is encountered, is there a log kept that captures the interleaving? Onur: This sounds useful. Eric: Replay and deterministic execution are commonly combined; replay is a common use of deterministic execution. The most efficient logging is ~4 MB of data per core, per Mark Hill's work.
There is a portability issue. If you migrate this code to a system that doesn't support this scheme, some bugs may now start manifesting themselves.
Quanta of code can change significantly by adding even just one instruction. This causes problems, e.g., when you debug: you add debugging code, but now you are executing a different interleaving (e.g., maybe the bug no longer manifests).
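The boundary-shifting effect described above is easy to see with a toy sketch (assuming fixed-size quanta counted in instructions, as a simple deterministic scheduler might do): inserting a single instruction shifts every subsequent quantum boundary, so the threads synchronize at different points and the enforced interleaving changes.

```python
def quanta(trace, size):
    """Split an instruction trace into fixed-size quanta, the way a
    simple quantum-based deterministic scheduler would."""
    return [trace[i:i + size] for i in range(0, len(trace), size)]

base = list(range(10))      # original instruction stream
patched = [100] + base      # one debug instruction added at the front

print(quanta(base, 4))      # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(quanta(patched, 4))   # → [[100, 0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
# Every quantum boundary after the insertion has shifted, so the
# deterministic interleaving of this thread with others is now different.
```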
Marek: Is there a scheme that checks all possible interleavings? Eric: The number of paths is exponential. There have been schemes to reduce the number of paths, but the number is still VERY high.
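The blow-up Eric mentions follows from a standard counting argument: the number of distinct instruction interleavings of k threads with n1, ..., nk instructions is the multinomial coefficient (n1 + ... + nk)! / (n1! ... nk!). A quick sketch:

```python
from math import comb

def interleavings(*lengths):
    """Number of distinct instruction interleavings of threads with the
    given instruction counts: the multinomial coefficient
    (n1 + ... + nk)! / (n1! * ... * nk!)."""
    total, count = 0, 1
    for n in lengths:
        total += n
        count *= comb(total, n)
    return count

print(interleavings(2, 2))     # → 6 (two tiny 2-instruction threads)
print(interleavings(10, 10))   # → 184756 already, for two 10-instruction threads
```

Even two short threads produce hundreds of thousands of interleavings, which is why exhaustive checking is hopeless and path-reduction schemes still face a very large space.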
Eric: The motivation and goal of the paper are good, but the scheme is not very interesting. We should also look at the ISCA '09 paper on the same subject, which restricts who can produce values at specific shared-memory locations. This is similar to adding additional constraints: e.g., if someone forgot to place a lock in the code, it might still work, because a reader has to wait until the right core produces the value. But this still doesn't capture all bugs: e.g., the same producer might write to a location more than once, and a bug might manifest depending on when you read the value (the producer is always valid).
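The producer-restriction idea can be sketched with a toy wrapper (a hypothetical illustration of the constraint as described above, not the ISCA '09 paper's actual mechanism): only the designated producer thread may write a location, and a reader implicitly waits until that producer has produced a value, even if a lock was forgotten.

```python
import threading

class OwnedLocation:
    """Toy sketch: one designated producer per shared location.
    Writes by any other thread are rejected, and readers block until
    the producer has written, so a missing lock may still "work"."""
    def __init__(self, producer_tid):
        self.producer = producer_tid
        self.value = None
        self.ready = threading.Event()

    def write(self, tid, value):
        assert tid == self.producer, "only the designated producer may write"
        self.value = value
        self.ready.set()

    def read(self):
        self.ready.wait()        # implicit wait, even without explicit locking
        return self.value

# Usage: the consumer blocks until thread 0 produces the value
loc = OwnedLocation(producer_tid=0)
consumer = threading.Thread(target=lambda: print(loc.read()))
consumer.start()
loc.write(0, "value")            # unblocks the consumer
consumer.join()
```

This also makes the caveat from the discussion concrete: if the producer writes the location a second time, a reader arriving before or after that second write sees different values, so the bug can still manifest depending on timing.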
Michael: Does the paper mention what percentage of bugs they can eliminate/detect? This paper does not say anything on the matter. The above-mentioned ISCA '09 paper did consider some reported bugs in popular code (e.g., MySQL, Apache).
They used Pin for instrumentation and did not model caches, probably to avoid the complexity of parallel simulation.
Comment by Eric: If the code is unbalanced across threads, this scheme might lead to a significant performance decrease.
Wei Yu: Can I explicitly enforce this scheme only in the areas I want to debug? Michael: This scheme is not meant for debugging, but mostly for preventing rare bugs from manifesting themselves in the future, which it does by allowing only a subset of all possible execution paths. It may be possible to exhaustively test some regions of the code and, for the rest, just restrict the possible execution paths.
(ED PETER) The issue of synchronizing processors has been significant for the reliability community for a long time, because they are interested in lock-step operation, which has been difficult to achieve since non-determinism became an issue in microprocessors. See, for example: http://portal.acm.org/citation.cfm?id=1130866