Differences
This shows you the differences between two versions of the page.
documentation:iiswc2010_tutorial_flexus [2010/11/28 13:43] mferdman |
documentation:iiswc2010_tutorial_flexus [2010/12/03 22:25] (current) mferdman |
||
---|---|---|---|
Line 88: | Line 88: | ||
* Hit ESC and type **:wq** to save the file and exit. | * Hit ESC and type **:wq** to save the file and exit. | ||
* Type **mkdir /host** | * Type **mkdir /host** | ||
- | * This is usually a good time to save out a checkpoint right before you mount the host file system. At the Simics console, type **CTRL-C** followed by something like **write-configuration <ckpt_dir>/<your_checkpoint_name_b4_sfsmount>** | + | * This is usually a good time to save out a checkpoint right before you mount the host file system. At the Simics console, type **CTRL-C** followed by something like **write-configuration ~/images/b4_sfsmount** |
* Type **run** at the Simics console to resume. | * Type **run** at the Simics console to resume. | ||
* Within the simulated console, type **mount /host** | * Within the simulated console, type **mount /host** | ||
* Type **ls /host** to see the underlying host machine's root directory | * Type **ls /host** to see the underlying host machine's root directory | ||
- | At this point, you should copy microbenchmark files from **~/tutorial_files/microbenchmarks** into the target machine (by copying it from **/host** to a location on the simulated disk). Save out a NEW checkpoint called **~/images/benchloaded** and quit out of Simics. Now open the checkpoint you saved with vi by typing **vi ~/images/benchloaded** and locate and delete the following lines: | + | At this point, you should copy microbenchmark files from **/host/home/pf_user/tutorial_files/microbenchmarks** into the target machine (by copying it from **/host** to a location on the simulated disk). Save out a NEW checkpoint called **~/images/benchloaded** and quit out of Simics. Now open the checkpoint you saved with vi by typing **vi ~/images/benchloaded** and locate and delete the following lines: |
<code> | <code> | ||
Line 109: | Line 109: | ||
- You can see how the source code inserts the magic instructions by looking at **spinlock.c** | - You can see how the source code inserts the magic instructions by looking at **spinlock.c** | ||
- | - Create a new Simics script called break.simics and fill it in with this: | + | - Create a new Simics script called break.simics and fill it in with this: (this should already be available for you under ~/simics-3.0.22/targets/serengeti) |
<code> | <code> | ||
@def hap_callback(user_arg, cpu, arg): | @def hap_callback(user_arg, cpu, arg): | ||
Line 122: | Line 122: | ||
</code> | </code> | ||
- | - Launch Simics by typing **start-simics break.simics** | + | - Launch Simics by typing **../../scripts/start-simics break.simics** |
- Within the simulated console, navigate to the directory where you copied over the microbenchmark files. | - Within the simulated console, navigate to the directory where you copied over the microbenchmark files. | ||
- Type: **./spinlock 4 1000000000 10 10 0** (this indicates we want 4 threads and run for effectively an infinite number of iterations) | - Type: **./spinlock 4 1000000000 10 10 0** (this indicates we want 4 threads and run for effectively an infinite number of iterations) | ||
Line 128: | Line 128: | ||
- Type **run** again and wait until the first thread starts to execute and triggers the magic breakpoint | - Type **run** again and wait until the first thread starts to execute and triggers the magic breakpoint | ||
- Save a final checkpoint by typing **write-configuration ~/images/spinlock** | - Save a final checkpoint by typing **write-configuration ~/images/spinlock** | ||
+ | - **FINAL STEP** (to prepare the checkpoint we will be using in the ProtoFlex part of the tutorial). This final step is needed to maximize the performance of the underlying simulated I/O system. Simics is typically the initiator of DMA transactions, which occur at some bulk-sized granularity. This granularity is set by default to a very low value (64 bytes) in default Simics checkpoints. Since Simics is a software-based simulator, issuing many small bulk transfers imposes no simulation overhead. In our system, large bulk transfers are far more desirable. To change this default setting, you will need to **EDIT** the checkpoint file and make one small change. Type the following commands: | ||
+ | <code> | ||
+ | write-configuration ~/checkpoints/final | ||
+ | quit | ||
+ | perl -pi -e 's/dma_block_size: 64/dma_block_size: 8192/' ~/checkpoints/final | ||
+ | </code> | ||
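The substitution above can also be wrapped in a small helper that rewrites the value and then prints the matching lines so the change can be checked by eye. This is only a sketch: the function name `set_dma_block_size` is hypothetical, it uses `sed` rather than the `perl` one-liner from the tutorial, and unlike that one-liner it replaces whatever numeric value is currently set, not just 64.

```shell
#!/bin/sh
# Hypothetical helper (not part of Simics or Flexus): set dma_block_size
# in a checkpoint file and show the resulting lines.
set_dma_block_size() {
    ckpt="$1"        # path to the checkpoint file, e.g. ~/checkpoints/final
    new_size="$2"    # new block size in bytes, e.g. 8192
    tmp="$ckpt.tmp.$$"
    # Rewrite into a temporary file first so a failed sed leaves the original intact.
    sed "s/dma_block_size: [0-9][0-9]*/dma_block_size: $new_size/" "$ckpt" > "$tmp" \
        && mv "$tmp" "$ckpt"
    # Print every dma_block_size line with its line number for a visual check.
    grep -n "dma_block_size:" "$ckpt"
}

# Usage (after quitting Simics):
#   set_dma_block_size ~/checkpoints/final 8192
```

Keeping a copy of the checkpoint before editing it makes it easy to go back if the wrong file or value was given.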
\\ | \\ | ||
Line 134: | Line 140: | ||
======3. Working with Flexus====== | ======3. Working with Flexus====== | ||
- | From the workload we just created, you will get to chance to run some sample jobs with Flexus and create a Flexpoint library. By this point you should have a valid initial checkpoint stored as **~/images/spinlock**. | + | From the Simics checkpoint you just created, you will get a chance to run some sample jobs with Flexus. By this point you should have a valid initial checkpoint stored as **~/images/spinlock**. |
- | - Before starting, you should create a few initial directories in the home (which we will explain in the next steps):<code> | + | - Before starting, you should have a few initial directories in the home (which we will explain in the next steps):<code> |
- | mkdir ~/ckpts | + | ~/ckpts |
- | mkdir ~/images | + | ~/specs |
- | mkdir ~/specs | + | |
</code> | </code> | ||
- | - The flexus simulator is stored as ~/tutorial_files/flexus_tutorial.tgz. Copy this file to your home directory and extract the tarball. You should have a directory called ~/flexus. | + | - We created the **ckpts** and **specs** directories in your home directory. |
+ | - The flexus simulator is stored as ~/tutorial_files/flexus_v4. | ||
====Getting familiar with the run_job script==== | ====Getting familiar with the run_job script==== | ||
Line 147: | Line 153: | ||
++++CLICK - Expand/Collapse| | ++++CLICK - Expand/Collapse| | ||
- | The run_job script should be run from the ~/tutorial_files/flexus_v4 directory and requires for the home directory of the user to contain a .run_job.rc.tcl file. Additionally, a **~/specs** directory must contain at least an interactive job configuration. | + | The run_job script should be run from the ~/tutorial_files/flexus_v4 directory and requires the user's home directory to contain a .run_job.rc.tcl file. Additionally, the **~/specs** directory must contain at least one job configuration (we placed a job configuration in this directory). |
- Copy the example RC file from **~/tutorial_files/flexus_v4/scripts/.run_job.rc.tcl** into **~/** | - Copy the example RC file from **~/tutorial_files/flexus_v4/scripts/.run_job.rc.tcl** into **~/** | ||
- | - Create a ~/specs/interactive/ directory and place a user-preload.simics file there (empty file is OK for the tutorial) | + | - Execute the **run_job** script from the ~/tutorial_files/flexus_v4 directory to confirm correct setup (the command-line help will be displayed when the prerequisites are met) |
- | - Execute the run_job script from the ~/flexus/ directory to confirm correct setup (the command-line help will be displayed when the prerequisites are met) | + | |
The .run_job.rc.tcl file contains "rungen" sections with directives for each workload. When executing run_job, the rungen is selected with the "-run" parameter. Typical rungens are "phase" for phase generation, "flexpoint" for flexpoint generation, "trace" for functional simulation jobs, and "timing" for the detailed cycle-accurate simulations. | The .run_job.rc.tcl file contains "rungen" sections with directives for each workload. When executing run_job, the rungen is selected with the "-run" parameter. Typical rungens are "phase" for phase generation, "flexpoint" for flexpoint generation, "trace" for functional simulation jobs, and "timing" for the detailed cycle-accurate simulations. | ||
The run_job script has already been configured for you. | The run_job script has already been configured for you. | ||
- | * Take a look at the various paths and options that are specified in it by examining the ~/flexus/scripts/global.run_job.rc.tcl file. | + | * Take a look at the various paths and options that are specified in it by examining the ~/tutorial_files/flexus_v4/scripts/global.run_job.rc.tcl file. |
Flexus scripts expect a specific directory hierarchy for the checkpoints. | Flexus scripts expect a specific directory hierarchy for the checkpoints. | ||
Line 166: | Line 171: | ||
* load the initial checkpoint in Simics (using the **start-simics** script) | * load the initial checkpoint in Simics (using the **start-simics** script) | ||
* simics> **read-configuration ~/images/spinlock** | * simics> **read-configuration ~/images/spinlock** | ||
- | * simics> **run-command-file ~/flexus/scripts/create_mem_and_io_proxy.simics** | + | * simics> **run-command-file ~/tutorial_files/flexus_v4/scripts/mem_io_proxy.simics** |
* simics> **write-configuration ~/ckpts/spinlock/baseline/phase_000/simics/phase_000** | * simics> **write-configuration ~/ckpts/spinlock/baseline/phase_000/simics/phase_000** | ||
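The checkpoint path used in the last step follows a fixed pattern (workload, then baseline, then phase, then simics). As a rough sketch, the directory can be created ahead of time in case Simics does not create missing parent directories itself; the helper name `make_phase_dir` is hypothetical and the layout is inferred only from the single path shown above.

```shell
#!/bin/sh
# Hypothetical helper: create <root>/<workload>/baseline/phase_<NNN>/simics
# and print the checkpoint path to pass to write-configuration.
make_phase_dir() {
    root="$1"        # checkpoint root, e.g. "$HOME/ckpts"
    workload="$2"    # workload name, e.g. spinlock
    phase="$3"       # three-digit phase number, e.g. 000
    dir="$root/$workload/baseline/phase_$phase/simics"
    mkdir -p "$dir" && printf '%s\n' "$dir/phase_$phase"
}

# Usage:
#   make_phase_dir "$HOME/ckpts" spinlock 000
```

The printed path can then be given to **write-configuration** at the simics> prompt.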
To verify that the basic run_job settings are correct and that the spinlock workload is properly set up, use run_job to launch Simics with the spinlock workload (NONE indicates that no Flexus simulator library should be loaded): | To verify that the basic run_job settings are correct and that the spinlock workload is properly set up, use run_job to launch Simics with the spinlock workload (NONE indicates that no Flexus simulator library should be loaded): | ||
- | * **~/flexus/scripts/run_job NONE spinlock** (error message about "flexus" missing is OK) | + | * **~/tutorial_files/flexus_v4/scripts/run_job NONE spinlock** (error message about "flexus" missing is OK) |
Add configuration for the "spinlock" benchmark to the "trace" rungen of ~/.run_job.rc.tcl | Add configuration for the "spinlock" benchmark to the "trace" rungen of ~/.run_job.rc.tcl | ||
Line 176: | Line 181: | ||
* configure statistics region interval at 50000000 (50M) cycles | * configure statistics region interval at 50000000 (50M) cycles | ||
- | Run a "spinlock" trace job with TraceCMPFlex. | + | Run a "spinlock" trace job with CMP.L2Shared.Trace |
- | * Example trace configuration can be found in the **scripts/trace/user-*load.simics** files. | + | * **run_job -run trace -cfg 4cores -local CMP.L2Shared.Trace spinlock** |
- | * **~/flexus/scripts/run_job -run trace -cfg test_cfg_trace -local TraceCMPFlex spinlock** | + | * Explanation of "local": -local requests to run a batch of jobs locally. Without -local, an interactive run is assumed, which waits at the simics> prompt instead of running. |
- | * Explanation of "local": -local requests to run a batch of jobs locally. without -local an interactive run is assumed which waits at the simics> prompt instead of running. | + | * Explanation of "remote": -remote will submit jobs to a remote cluster (e.g., Condor, PBS, etc...) [not available for the tutorial]. |
- | * Explanation of "remote": -remote will submit jobs to a remote cluster (e.g., Condor, PBS, etc...) [not available for the tutorial]. | + | * You can **run** simulation, interrupt it with **ctrl+c**, and change debug severity with **flexus.debug-set-severity iface**. |
++++ | ++++ | ||
- | ====Displaying statistics through the stat-manager tool==== | + | ====Displaying statistics with the stat-manager tool==== |
++++CLICK - Expand/Collapse| | ++++CLICK - Expand/Collapse| | ||
Find the run directory for the trace job in ~/results/ and examine the resulting statistics database: | Find the run directory for the trace job in ~/results/ and examine the resulting statistics database: | ||
- | * **~/flexus/stat-manager/stat-manager list-measurements** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager list-measurements** |
* See the cache hit/miss statistics, branch predictor stats, and instruction mix breakdown. | * See the cache hit/miss statistics, branch predictor stats, and instruction mix breakdown. | ||
- | * **~/flexus/stat-manager/stat-manager print "Region 000" | less** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print 'Region 000' | less** |
- | * **~/flexus/stat-manager/stat-manager print "Region 001" | less** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print 'Region 001' | less** |
* By default, stat-manager aggregates statistics across all cores. You can override this behavior with the -per-node flag. | * By default, stat-manager aggregates statistics across all cores. You can override this behavior with the -per-node flag. | ||
- | * **~/flexus/stat-manager/stat-manager -per-node print "Region 001" | less** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager -per-node print 'Region 001' | less** |
++++ | ++++ | ||
+ | |||
+ | ====Running timing simulations==== | ||
+ | ++++CLICK - Expand/Collapse| | ||
+ | Run a "spinlock" timing job with CMP.L2SharedNUCA.OoO | ||
+ | * **run_job -run timing -cfg 4cores -ma CMP.L2SharedNUCA.OoO spinlock** | ||
+ | * NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics. | ||
+ | * You can **run** simulation, interrupt it with **ctrl+c**, and change debug severity with **flexus.debug-set-severity iface**; **run 10** will run 10 cycles on all CPUs. | ||
+ | * Rebuild the simulator with **vverb** debug output (CMP.L2SharedNUCA.OoO-vverb) and try running simulation with **flexus.debug-set-severity vverb** to see the detailed debug output. | ||
+ | ++++ | ||
+ | |||
+ | ======4. Using Statistical Sampling with Flexus====== | ||
====Creating a flexpoint library==== | ====Creating a flexpoint library==== | ||
Line 199: | Line 215: | ||
++++CLICK - Expand/Collapse| | ++++CLICK - Expand/Collapse| | ||
* Configure the "flexpoint" rungen in .run_job.rc.tcl to create 20 flexpoints, spaced 200000 (200K) instructions apart. | * Configure the "flexpoint" rungen in .run_job.rc.tcl to create 20 flexpoints, spaced 200000 (200K) instructions apart. | ||
- | * **~/flexus/scripts/run_job -ckpt-gen -postprocess "$HOME/flexus/scripts/postprocess_ckptgen.sh flexpoint 20 mystate" -local -cfg test_cfg_trace -run flexpoint TraceCMPFlex spinlock** | + | * **~/tutorial_files/flexus_v4/scripts/run_job -ckpt-gen -postprocess "$HOME/tutorial_files/flexus_v4/scripts/postprocess_ckptgen.sh flexpoint 20 mystate" -local -cfg 4cores -run flexpoint CMP.L2Shared.Trace spinlock** |
* **-ckpt-gen** ensures that state is written out at the end of simulation | * **-ckpt-gen** ensures that state is written out at the end of simulation | ||
* **-postprocess** specifies the script to run after each job | * **-postprocess** specifies the script to run after each job | ||
Line 211: | Line 227: | ||
++++CLICK - Expand/Collapse| | ++++CLICK - Expand/Collapse| | ||
Add configuration for the "spinlock" benchmark to the "timing" rungen of ~/.run_job.rc.tcl | Add configuration for the "spinlock" benchmark to the "timing" rungen of ~/.run_job.rc.tcl | ||
- | * configure simulation to stop at 15000 (15K) cycles | + | * configure simulation to stop at 150000 (150K) cycles |
- | * configure statistics region interval at 5000 (5K) cycles | + | * configure statistics region interval at 50000 (50K) cycles |
- | Run a "spinlock" timing job with CMPFlex.OoO. | + | Run a "spinlock" timing job with CMP.L2SharedNUCA.OoO. |
- | * Example timing configuration can be found in the scripts/timing_v9/user-*load.simics files. | + | * **~/tutorial_files/flexus_v4/scripts/run_job -run timing -cfg 4cores -local -ma -state mystate CMP.L2SharedNUCA.OoO spinlock** |
- | * **~/flexus/scripts/run_job -run timing -cfg test_cfg_timing -local -ma -state mystate CMPFlex.OoO spinlock** | + | |
* NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics. | * NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics. | ||
* NOTE: Don't forget to specify **-state** to load the microarchitectural state created with the trace simulator, otherwise each flexpoint is run from cold microarchitectural state, severely biasing the results! | * NOTE: Don't forget to specify **-state** to load the microarchitectural state created with the trace simulator, otherwise each flexpoint is run from cold microarchitectural state, severely biasing the results! | ||
Line 223: | Line 238: | ||
* Notice much more detailed statistics for timing simulator compared to the trace simulator. | * Notice much more detailed statistics for timing simulator compared to the trace simulator. | ||
* Find the IPC of some of the flexpoints' results using stat-manager: | * Find the IPC of some of the flexpoints' results using stat-manager: | ||
- | * **~/flexus/stat-manager/stat-manager format-string "<EXPR:{Nodes-uarch-TB:User:Commits:Busy}/({Nodes-uarch-TB:User:AccountedCycles}+{Nodes-uarch-TB:System:AccountedCycles})>" "Region 001"** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager format-string "<EXPR:{Nodes-uarch-TB:User:Commits:Busy}/({Nodes-uarch-TB:User:AccountedCycles}+{Nodes-uarch-TB:System:AccountedCycles})>" "Region 001"** |
The default postprocess.sh script (which runs after each job if a -postprocess override is not specified) automatically creates a stats_db.out.selected.gz file that contains only statistics between 100K and 150K instructions. | The default postprocess.sh script (which runs after each job if a -postprocess override is not specified) automatically creates a stats_db.out.selected.gz file that contains only statistics between 100K and 150K instructions. | ||
Use stat-sample to combine all the stats_db.out.selected.gz files into a single statistics file. | Use stat-sample to combine all the stats_db.out.selected.gz files into a single statistics file. | ||
- | * **~/flexus/stat-manager stat-sample stats_db.out.gz */stats_db.out.selected.gz** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-sample stats_db.out.gz */stats_db.out.selected.gz** |
* Examine the resulting stats_db.out.gz file that contains the combined results of all flexpoints. | * Examine the resulting stats_db.out.gz file that contains the combined results of all flexpoints. | ||
* Examine the IPCs of the various flexpoints: | * Examine the IPCs of the various flexpoints: | ||
Line 234: | Line 249: | ||
* If bringing UIPCs into Excel, compute =STDEV() and =CONFIDENCE() for 95% confidence. | * If bringing UIPCs into Excel, compute =STDEV() and =CONFIDENCE() for 95% confidence. | ||
* Bring time Breakdowns into Excel: | * Bring time Breakdowns into Excel: | ||
- | * **~/flexus/stat-manager/stat-manager print sum | grep ":Bkd:" > breakdown.tsv** | + | * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print sum | grep ":Bkd:" > breakdown.tsv** |
* Use Excel to split data by the colon (":") character into columns. Apply the Pivot Chart feature to plot the time breakdown. | * Use Excel to split data by the colon (":") character into columns. Apply the Pivot Chart feature to plot the time breakdown. | ||
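For readers without Excel, the per-flexpoint UIPC summary mentioned above can be approximated on the command line. The sketch below assumes the UIPC values have already been extracted to one number per line; the function name `uipc_summary` is hypothetical. It computes the sample standard deviation (as =STDEV() does) and a 95% confidence half-width using the normal approximation z = 1.96, which is close to what =CONFIDENCE(0.05, stdev, n) returns.

```shell
#!/bin/sh
# Hypothetical helper: read one UIPC value per line on stdin and print
# the sample count, mean, sample standard deviation, and 95% confidence
# half-width (normal approximation, z = 1.96).
uipc_summary() {
    awk '
        NF { n++; sum += $1; sumsq += $1 * $1 }   # skip blank lines
        END {
            if (n < 2) {
                print "need at least two samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
            if (var < 0) var = 0                        # guard against rounding
            sd = sqrt(var)
            printf "n=%d mean=%.4f stdev=%.4f ci95=%.4f\n", n, mean, sd, 1.96 * sd / sqrt(n)
        }'
}

# Usage, assuming uipc.txt holds one UIPC value per line:
#   uipc_summary < uipc.txt
```

The result reads as mean ± ci95. With only 20 flexpoints, a Student-t multiplier would give a slightly wider interval than the normal approximation used here.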
\\ | \\ |