ProtoFlex   Carnegie Mellon

Differences

This shows you the differences between two versions of the page.

documentation:iiswc2010_tutorial_flexus [2010/11/28 12:50]
mferdman
documentation:iiswc2010_tutorial_flexus [2010/12/03 22:25] (current)
mferdman
Line 40: Line 40:
++++CLICK - Expand/Collapse| ++++CLICK - Expand/Collapse|
-**To create our first Simics checkpoint, we will need to boot a simulated target system and save out a new checkpoint. For your convenience, we have already created a disk image that contains a freshly installed copy of Solaris 10.**+To create our first Simics checkpoint, we will need to boot a simulated target system and save out a new checkpoint. For your convenience, we have already created a disk image that contains a freshly installed copy of Solaris 10.
  - Navigate over to **/home/pf_user/simics-3.0.22/targets/serengeti** on the primary PC   - Navigate over to **/home/pf_user/simics-3.0.22/targets/serengeti** on the primary PC
  - Open and edit the **abisko-common.simics** file and confirm the following lines near the top:<code>   - Open and edit the **abisko-common.simics** file and confirm the following lines near the top:<code>
Line 48: Line 48:
</code> </code>
  - These parameters allow us to configure the target machine at boot time according to our preferences. The design we will be demonstrating will be a 4-CPU system with a total of 256MB of memory.   - These parameters allow us to configure the target machine at boot time according to our preferences. The design we will be demonstrating will be a 4-CPU system with a total of 256MB of memory.
-  - For the purposes of this tutorial, create a new folder **~/checkpoints**. We will store all Simics-generated checkpoints in this directory. +  - For the purposes of this tutorial, create a new folder **~/images**. We will store all Simics-generated checkpoints in this directory.
  - Once you have edited the parameters, type **../../scripts/start-simics -x abisko-common.simics** to boot our machine.   - Once you have edited the parameters, type **../../scripts/start-simics -x abisko-common.simics** to boot our machine.
  - A simulated terminal should appear and show the Solaris 10 boot process.   - A simulated terminal should appear and show the Solaris 10 boot process.
  - Type **run** to begin simulation at the console.   - Type **run** to begin simulation at the console.
-  - Once you reach the interactive terminal, login using the username "root" and the password "cmu".  Once you are at the simulated command prompt, we are now ready to save our first checkpoint. +  - Once you reach the interactive terminal, log in using the username "root" and the password "cmu".  Once you are at the simulated command prompt, you are ready to save the first checkpoint. 
-  - Hit **CTRL-C** in the Simics console, and type **write-configuration ~/checkpoints/after-boot-4cpu**. +  - Hit **CTRL-C** in the Simics console, and type **write-configuration ~/images/after-boot-4cpu**.
  - Type **quit** to exit out of Simics.   - Type **quit** to exit out of Simics.
-  - To load up your checkpoint again, type **../../scripts/start-simics**.  Once you are at the Simics console, type **read-configuration ~/checkpoints/after-boot-4cpu**.  You should see your simulated terminal re-appear where you last left it.+  - To load up your checkpoint again, type **../../scripts/start-simics**.  Once you are at the Simics console, type **read-configuration ~/images/after-boot-4cpu**.  You should see your simulated terminal re-appear where you last left it.
\\ \\
Line 66: Line 66:
Within the tarball, there are two source files: **counter.c** and **spinlock.c**. These two files have already been precompiled using a SPARC compiler and can be executed within the target machine. In the next step, we will walk through moving these files into the simulated target system. First, you will need to acquire the {{:documentation:simicsfs.iso.zip|simicsfs.iso}} file, which contains a cdrom image of the Simics files to facilitate target-to-host file transfers.  Within the tarball, there are two source files: **counter.c** and **spinlock.c**. These two files have already been precompiled using a SPARC compiler and can be executed within the target machine. In the next step, we will walk through moving these files into the simulated target system. First, you will need to acquire the {{:documentation:simicsfs.iso.zip|simicsfs.iso}} file, which contains a cdrom image of the Simics files to facilitate target-to-host file transfers. 
-  - Start up a checkpoint that was saved out from the previous section (e.g., ~/simics-3.0.22/scripts/start-simics ~/checkpoints/after-boot-4cpu).  At the Simics console, type **new-file-cdrom simicsfs.iso** (make sure you started simics in the directory that contains the simicsfs.iso file, otherwise type in the full path of the simicsfs.iso file)+  - Start up a checkpoint that was saved out from the previous section (e.g., ~/simics-3.0.22/scripts/start-simics ~/images/after-boot-4cpu).  At the Simics console, type **new-file-cdrom ~/simics-3.0.22/simicsfs.iso** (make sure you started Simics in the directory that contains the simicsfs.iso file; otherwise, type in the full path of the simicsfs.iso file)
  - Then type **cd0.insert iso0**   - Then type **cd0.insert iso0**
-  - Type **c** to begin simulating at the console.  You may need to wait a few minutes until the simulated cdrom drive has loaded the image.+  - Type **run** to begin simulating at the console.  You may need to wait a few minutes until the simulated cdrom drive has loaded the image.
  - Once you have done this, navigate to **/cdrom/cdrom0** within the target machine. You will see several files named **mount_simicsfs** and **simicsfs-sol***.   - Once you have done this, navigate to **/cdrom/cdrom0** within the target machine. You will see several files named **mount_simicsfs** and **simicsfs-sol***.
  - Type the following commands:   - Type the following commands:
Line 81: Line 81:
</code> </code>
-  * Inside the vfstab file, add a new line to the very end (with each entry tab-delimited):+  * Inside the vfstab file, add a new line to the very end (with each entry **tab-delimited**):
<code> <code>
simicsfs  -  /host  simicsfs  -  no  - simicsfs  -  /host  simicsfs  -  no  -
Line 88: Line 88:
  * Hit ESC and type **:wq** to save the file and exit.   * Hit ESC and type **:wq** to save the file and exit.
  * Type **mkdir /host**   * Type **mkdir /host**
-  * This is usually a good time to save out a checkpoint right before you mount the host file system.  At the Simics console, type **CTRL-C** followed by something like **write-configuration <ckpt_dir>/<your_checkpoint_name_b4_sfsmount>** +  * This is usually a good time to save out a checkpoint right before you mount the host file system.  At the Simics console, type **CTRL-C** followed by something like **write-configuration ~/images/b4_sfsmount** 
-  * Type **c** at the Simics console to resume.+  * Type **run** at the Simics console to resume.
  * Within the simulated console, type **mount /host**   * Within the simulated console, type **mount /host**
  * Type **ls /host** to see the underlying host machine's root directory   * Type **ls /host** to see the underlying host machine's root directory
-At this point, you should place the microbenchmark files somewhere on the host machine and copy them over to the target machine. Save out a NEW checkpoint called **~/checkpoints/benchloaded** and quit out of Simics. Now open the checkpoint you saved with vi by typing **vi ~/checkpoints/benchloaded**and locate and delete the following lines:+At this point, you should copy the microbenchmark files from **/host/home/pf_user/tutorial_files/microbenchmarks** into the target machine (by copying them from **/host** to a location on the simulated disk). Save out a NEW checkpoint called **~/images/benchloaded** and quit out of Simics. Now open the checkpoint you saved with vi by typing **vi ~/images/benchloaded**, then locate and delete the following lines:
<code> <code>
Line 108: Line 108:
In this next section, we will create a Simics script that will allow us to detect breakpoints inserted within our application in order to stage the workload.  A breakpoint (also known as a 'magic breakpoint' in Virtutech parlance) is simply a predefined assembly instruction inlined into your code. This instruction usually has no effect (e.g., a write to register 0) but is recognized by Simics.  You can take a look at all the magic breakpoint instructions within the **magic-instruction.h** file within the microbenchmarks tarball downloaded earlier. In this next section, we will create a Simics script that will allow us to detect breakpoints inserted within our application in order to stage the workload.  A breakpoint (also known as a 'magic breakpoint' in Virtutech parlance) is simply a predefined assembly instruction inlined into your code. This instruction usually has no effect (e.g., a write to register 0) but is recognized by Simics.  You can take a look at all the magic breakpoint instructions within the **magic-instruction.h** file within the microbenchmarks tarball downloaded earlier.
-  - Create a new Simics script called break.simics and fill it in with this:+  - You can see how the source code inserts the magic instructions by looking at **spinlock.c** 
 +  - Create a new Simics script called break.simics and fill it in with this: (this should already be available for you under ~/simics-3.0.22/targets/serengeti)
<code> <code>
@def hap_callback(user_arg, cpu, arg): @def hap_callback(user_arg, cpu, arg):
Line 118: Line 119:
@SIM_hap_add_callback("Core_Magic_Instruction", hap_callback, None) @SIM_hap_add_callback("Core_Magic_Instruction", hap_callback, None)
-read-configuration ~/checkpoints/benchloaded+read-configuration ~/images/benchloaded
</code> </code>
-  - Launch Simics by typing **start-simics break.simics**+  - Launch Simics by typing **../../scripts/start-simics break.simics**
  - Within the simulated console, navigate to the directory where you copied over the microbenchmark files.   - Within the simulated console, navigate to the directory where you copied over the microbenchmark files.
  - Type: **./spinlock 4 1000000000 10 10 0**  (this requests 4 threads and, effectively, an infinite number of iterations)   - Type: **./spinlock 4 1000000000 10 10 0**  (this requests 4 threads and, effectively, an infinite number of iterations)
  - Simics should immediately break to the console and output **Entered main()**   - Simics should immediately break to the console and output **Entered main()**
-  - Typing **c** again will break once the first thread reaches the beginning of its handler +  - Type **run** again and wait until the first thread starts to execute and triggers the magic breakpoint 
-  - You can see how the source code inserts the magic instructions by looking at **spinlock.c** +  - Save a final checkpoint by typing **write-configuration ~/images/spinlock** 
-  - **Save out a final checkpoint by typing **write-configuration ~/checkpoints/spinlock****+  - **FINAL STEP** (to prepare the checkpoint we will be using in the ProtoFlex part of the tutorial).  This final step is needed to maximize the performance of the underlying simulated I/O system. Simics is typically the initiator of DMA transactions, which occur at some bulk-sized granularity.  This granularity is set by default to a very low value (64 bytes) in default Simics checkpoints.  Since Simics is a software-based simulator, issuing many small bulk transfers imposes no simulation overhead.  In our system, large bulk transfers are far more desirable. To change this default setting, you will need to **EDIT** the checkpoint file and make one small change. Type the following commands: 
 +<code> 
 +write-configuration ~/checkpoints/final 
 +quit 
 +perl -pi -e 's/dma_block_size: 64/dma_block_size: 8192/' ~/checkpoints/final 
 +</code>
\\ \\
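The perl one-liner in the final step can also be sketched in Python, which makes it easy to check how many entries were rewritten. This is only an illustration, not part of the tutorial files: the function names are hypothetical, and the only thing taken from the tutorial is the **dma_block_size: 64** to **dma_block_size: 8192** substitution. Unlike the perl command, this sketch replaces every occurrence and anchors on a word boundary, so a value such as 640 would be left alone.

```python
import re
from pathlib import Path

# Pattern for the default DMA granularity entry; \b keeps values like 640 untouched.
DEFAULT_DMA = re.compile(r"dma_block_size: 64\b")


def set_dma_block_size(text: str, new_size: int = 8192) -> str:
    """Return the checkpoint text with every default DMA entry enlarged."""
    return DEFAULT_DMA.sub(f"dma_block_size: {new_size}", text)


def patch_checkpoint(path: str, new_size: int = 8192) -> int:
    """Rewrite a checkpoint file in place; return the number of entries changed."""
    ckpt = Path(path).expanduser()
    original = ckpt.read_text()
    changed = len(DEFAULT_DMA.findall(original))
    if changed:
        ckpt.write_text(set_dma_block_size(original, new_size))
    return changed
```

Calling `patch_checkpoint` on the checkpoint written in the final step and getting back 0 would suggest that the checkpoint has already been edited or that the attribute is spelled differently in your Simics version.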
Line 134: Line 140:
======3. Working with Flexus====== ======3. Working with Flexus======
-From the workload we just created, you will get to chance to run some sample jobs with Flexus and create a Flexpoint library. By this point you should have a valid initial checkpoint stored as **~/checkpoints/spinlock**.+From the Simics checkpoint you just created, you will get a chance to run some sample jobs with Flexus. By this point you should have a valid initial checkpoint stored as **~/images/spinlock**.
-  - Before starting, you should create a few initial directories in the home (which we will explain in the next steps): +  - Before starting, you should have a few initial directories in your home directory (which we will explain in the next steps):<code> 
-<code> +~/ckpts 
-mkdir ~/checkpoints +~/specs
-mkdir ~/images +
-mkdir ~/specs+
</code> </code>
- +  - We have already created the **ckpts** and **specs** directories in your home directory. 
-  - The flexus simulator is stored as ~/tutorial_files/flexus_tutorial.tgz.  Copy this file to your home directory and extract the tarball.  You should have a directory called ~/flexus.+  - The flexus simulator is stored as ~/tutorial_files/flexus_v4.
====Getting familiar with the run_job script==== ====Getting familiar with the run_job script====
Line 149: Line 153:
++++CLICK - Expand/Collapse| ++++CLICK - Expand/Collapse|
-The run_job script should be run from the ~/flexus/ directory and requires for the home directory of the user to contain a .run_job.rc.tcl file.  Additionally, a **~/specs** directory must contain at least an interactive job configuration. +The run_job script should be run from the ~/tutorial_files/flexus_v4 directory and requires the user's home directory to contain a .run_job.rc.tcl file.  Additionally, the **~/specs** directory must contain at least one job configuration (we have placed a job configuration in this directory).
-  - Copy the example RC file from ~/flexus/scripts/.run_job.rc.tcl into ~/ +  - Copy the example RC file from **~/tutorial_files/flexus_v4/scripts/.run_job.rc.tcl** into **~/** 
-  - Create a ~/specs/interactive/ directory and place a user-preload.simics file there (empty file is OK for the tutorial) +  - Execute the **run_job** script from the ~/tutorial_files/flexus_v4 directory to confirm correct setup (the command-line help will be displayed when the prerequisites are met)
-  - Execute the run_job script from the ~/flexus/ directory to confirm correct setup (the command-line help will be displayed when the prerequisites are met)+
The .run_job.rc.tcl file contains "rungen" sections with directives for each workload.  When executing run_job, the rungen is selected with the "-run" parameter.  Typical rungens are "phase" for phase generation, "flexpoint" for flexpoint generation, "trace" for functional simulation jobs, and "timing" for the detailed cycle-accurate simulations. The .run_job.rc.tcl file contains "rungen" sections with directives for each workload.  When executing run_job, the rungen is selected with the "-run" parameter.  Typical rungens are "phase" for phase generation, "flexpoint" for flexpoint generation, "trace" for functional simulation jobs, and "timing" for the detailed cycle-accurate simulations.
The run_job script has already been configured for you. The run_job script has already been configured for you.
-  * Take a look at the various paths and options that are specified in it by examining the ~/flexus/scripts/global.run_job.rc.tcl file.+  * Take a look at the various paths and options that are specified in it by examining the ~/tutorial_files/flexus_v4/scripts/global.run_job.rc.tcl file.
Flexus scripts expect a specific directory hierarchy for the checkpoints. Flexus scripts expect a specific directory hierarchy for the checkpoints.
Line 167: Line 170:
Before we proceed with creating Flexus-compatible checkpoints, there are a number of post-processing steps that must be performed directly on the Simics checkpoint we created earlier. A Simics script is provided to perform these steps. Before we proceed with creating Flexus-compatible checkpoints, there are a number of post-processing steps that must be performed directly on the Simics checkpoint we created earlier. A Simics script is provided to perform these steps.
  * load the initial checkpoint in Simics (using the **start-simics** script)   * load the initial checkpoint in Simics (using the **start-simics** script)
-  * simics> **read-configuration ~/checkpoints/spinlock** +  * simics> **read-configuration ~/images/spinlock** 
-  * simics> **run-command-file ~/flexus/scripts/create_mem_and_io_proxy.simics**+  * simics> **run-command-file ~/tutorial_files/flexus_v4/scripts/mem_io_proxy.simics**
  * simics> **write-configuration ~/ckpts/spinlock/baseline/phase_000/simics/phase_000**   * simics> **write-configuration ~/ckpts/spinlock/baseline/phase_000/simics/phase_000**
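The checkpoint path used in the last step follows a nested pattern, and a small helper can make that pattern explicit. The sketch below is inferred only from the single path shown above (**~/ckpts/spinlock/baseline/phase_000/simics/phase_000**); the function name, the parameter names, and the idea that "baseline" and the phase number are the variable parts are assumptions, not something the Flexus scripts document here.

```python
from pathlib import PurePosixPath


def phase_checkpoint_path(workload: str, phase: int = 0,
                          root: str = "~/ckpts",
                          variant: str = "baseline") -> PurePosixPath:
    """Build <root>/<workload>/<variant>/phase_NNN/simics/phase_NNN."""
    name = f"phase_{phase:03d}"  # zero-padded to three digits, as in phase_000
    return PurePosixPath(root) / workload / variant / name / "simics" / name


if __name__ == "__main__":
    # Print the path to pass to write-configuration for the spinlock workload.
    print(phase_checkpoint_path("spinlock"))
```

If you are unsure whether write-configuration creates missing parent directories, creating the **simics** directory beforehand with **mkdir -p** on the host is harmless.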
To verify that the basic run_job settings are correct and that the spinlock workload is properly set up, use run_job to launch Simics with the spinlock workload (NONE indicates that no Flexus simulator library should be loaded): To verify that the basic run_job settings are correct and that the spinlock workload is properly set up, use run_job to launch Simics with the spinlock workload (NONE indicates that no Flexus simulator library should be loaded):
-  * **~/flexus/scripts/run_job NONE spinlock**  (error message about "flexus" missing is OK)+  * **~/tutorial_files/flexus_v4/scripts/run_job NONE spinlock**  (error message about "flexus" missing is OK)
Add configuration for the "spinlock" benchmark to the "trace" rungen of ~/.run_job.rc.tcl Add configuration for the "spinlock" benchmark to the "trace" rungen of ~/.run_job.rc.tcl
Line 178: Line 181:
  * configure statistics region interval at 50000000 (50M) cycles   * configure statistics region interval at 50000000 (50M) cycles
-Run a "spinlock" trace job with TraceCMPFlex. +Run a "spinlock" trace job with CMP.L2Shared.Trace 
-  * Example trace configuration can be found in the **scripts/trace/user-*load.simics** files+  * **run_job -run trace -cfg 4cores -local CMP.L2Shared.Trace spinlock** 
-  * **~/flexus/scripts/run_job -run trace -cfg test_cfg_trace -local TraceCMPFlex spinlock** + * Explanation of "local": -local requests that a batch of jobs be run locally.  Without -local, an interactive run is assumed, which waits at the simics> prompt instead of running. 
-   * Explanation of "local": -local requests to run a batch of jobs locally.  without -local an interactive run is assumed which waits at the simics> prompt instead of running.  + * Explanation of "remote": -remote will submit jobs to a remote cluster (e.g., Condor, PBS, etc...) [not available for the tutorial]
-   * Explanation of "remote": -remote will submit jobs to a remote cluster (e.g., Condor, PBS, etc...) [not available for the tutorial].+  * You can **run** simulation, interrupt it with **ctrl+c**, and change debug severity with **flexus.debug-set-severity iface**.
++++ ++++
-====Displaying statistics through the stat-manager tool====+====Displaying statistics with the stat-manager tool====
++++CLICK - Expand/Collapse| ++++CLICK - Expand/Collapse|
Find the run directory for the trace job in ~/results/ and examine the resulting statistics database: Find the run directory for the trace job in ~/results/ and examine the resulting statistics database:
-  * **~/flexus/stat-manager/stat-manager list-measurements**+  * **~/tutorial_files/flexus_v4/stat-manager/stat-manager list-measurements**
  * See the cache hit/miss statistics, branch predictor stats, and instruction mix breakdown.   * See the cache hit/miss statistics, branch predictor stats, and instruction mix breakdown.
-    * **~/flexus/stat-manager/stat-manager print "Region 000" | less** +    * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print 'Region 000' | less** 
-    * **~/flexus/stat-manager/stat-manager print "Region 001" | less**+    * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print 'Region 001' | less**
  * By default, stat-manager aggregates statistics across all cores.  You can override this behavior with the -per-node flag.   * By default, stat-manager aggregates statistics across all cores.  You can override this behavior with the -per-node flag.
-    * **~/flexus/stat-manager/stat-manager -per-node print "Region 001" | less**+    * **~/tutorial_files/flexus_v4/stat-manager/stat-manager -per-node print 'Region 001' | less**
++++ ++++
 +
 +====Running timing simulations====
 +++++CLICK - Expand/Collapse|
 +Run a "spinlock" timing job with CMP.L2SharedNUCA.OoO
 +  * **run_job -run timing -cfg 4cores -ma CMP.L2SharedNUCA.OoO spinlock**
 +  * NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics.
 +  * You can **run** simulation, interrupt it with **ctrl+c**, and change debug severity with **flexus.debug-set-severity iface**. Typing **run 10** will run 10 cycles on all CPUs.
 +  * Rebuild the simulator with **vverb** debug output (CMP.L2SharedNUCA.OoO-vverb) and try running simulation with **flexus.debug-set-severity vverb** to see the detailed debug output.
 +++++
 +
 +======4. Using Statistical Sampling with Flexus======
====Creating a flexpoint library==== ====Creating a flexpoint library====
Line 201: Line 215:
++++CLICK - Expand/Collapse| ++++CLICK - Expand/Collapse|
  * Configure the "flexpoint" rungen in .run_job.rc.tcl to create 20 flexpoints, spaced 200000 (200K) instructions apart.   * Configure the "flexpoint" rungen in .run_job.rc.tcl to create 20 flexpoints, spaced 200000 (200K) instructions apart.
-  * **~/flexus/scripts/run_job -ckpt-gen -postprocess "$HOME/flexus/scripts/postprocess_ckptgen.sh flexpoint 20 mystate" -local -cfg test_cfg_trace -run flexpoint TraceCMPFlex spinlock**+  * **~/tutorial_files/flexus_v4/scripts/run_job -ckpt-gen -postprocess "$HOME/tutorial_files/flexus_v4/scripts/postprocess_ckptgen.sh flexpoint 20 mystate" -local -cfg 4cores -run flexpoint CMP.L2Shared.Trace spinlock**
    * **-ckpt-gen** ensures that state is written out at the end of simulation     * **-ckpt-gen** ensures that state is written out at the end of simulation
    * **-postprocess** specifies the script to run after each job     * **-postprocess** specifies the script to run after each job
Line 213: Line 227:
++++CLICK - Expand/Collapse| ++++CLICK - Expand/Collapse|
Add configuration for the "spinlock" benchmark to the "timing" rungen of ~/.run_job.rc.tcl Add configuration for the "spinlock" benchmark to the "timing" rungen of ~/.run_job.rc.tcl
-  * configure simulation to stop at 15000 (15K) cycles +  * configure simulation to stop at 150000 (150K) cycles 
-  * configure statistics region interval at 5000 (5K) cycles+  * configure statistics region interval at 50000 (50K) cycles
-Run a "spinlock" timing job with CMPFlex.OoO+Run a "spinlock" timing job with CMP.L2SharedNUCA.OoO. 
-  * Example timing configuration can be found in the scripts/timing_v9/user-*load.simics files+  * **~/tutorial_files/flexus_v4/scripts/run_job -run timing -cfg 4cores -local -ma -state mystate CMP.L2SharedNUCA.OoO spinlock**
-  * **~/flexus/scripts/run_job -run timing -cfg test_cfg_timing -local -ma -state mystate CMPFlex.OoO spinlock**+
  * NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics.   * NOTE: When running timing simulations, one must pass the **-ma** parameter to Simics.
  * NOTE: Don't forget to specify **-state** to load the microarchitectural state created with the trace simulator, otherwise each flexpoint is run from cold microarchitectural state, severely biasing the results!   * NOTE: Don't forget to specify **-state** to load the microarchitectural state created with the trace simulator, otherwise each flexpoint is run from cold microarchitectural state, severely biasing the results!
Line 225: Line 238:
  * Notice much more detailed statistics for timing simulator compared to the trace simulator.   * Notice much more detailed statistics for timing simulator compared to the trace simulator.
  * Find the IPC of some of the flexpoints' results using stat-manager:   * Find the IPC of some of the flexpoints' results using stat-manager:
-    * **~/flexus/stat-manager/stat-manager format-string "<EXPR:{Nodes-uarch-TB:User:Commits:Busy}/({Nodes-uarch-TB:User:AccountedCycles}+{Nodes-uarch-TB:System:AccountedCycles})>" "Region 001"**+    * **~/tutorial_files/flexus_v4/stat-manager/stat-manager format-string "<EXPR:{Nodes-uarch-TB:User:Commits:Busy}/({Nodes-uarch-TB:User:AccountedCycles}+{Nodes-uarch-TB:System:AccountedCycles})>" "Region 001"**
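The format-string expression above divides busy user commits by the sum of user and system accounted cycles. As a rough illustration of that arithmetic, here is the same calculation in Python; the function name and the sample numbers are made up for the example, and only the three statistic names come from the command above.

```python
def user_ipc(user_commits_busy: float,
             user_cycles: float,
             system_cycles: float) -> float:
    """User commits per cycle over all accounted (user + system) cycles.

    Arguments mirror these statistics from the format string:
      Nodes-uarch-TB:User:Commits:Busy
      Nodes-uarch-TB:User:AccountedCycles
      Nodes-uarch-TB:System:AccountedCycles
    """
    total_cycles = user_cycles + system_cycles
    if total_cycles == 0:
        raise ValueError("no accounted cycles in this region")
    return user_commits_busy / total_cycles


if __name__ == "__main__":
    # Hypothetical values for one measurement region.
    print(user_ipc(user_commits_busy=30000, user_cycles=40000, system_cycles=10000))
```

Because system cycles appear in the denominator but only user commits in the numerator, time spent in the operating system lowers this metric.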
The default postprocess.sh script (which runs after each job if a -postprocess override is not specified) automatically creates a stats_db.out.selected.gz file that contains only statistics between 100K and 150K instructions. The default postprocess.sh script (which runs after each job if a -postprocess override is not specified) automatically creates a stats_db.out.selected.gz file that contains only statistics between 100K and 150K instructions.
Use stat-sample to combine all the stats_db.out.selected.gz files into a single statistics file. Use stat-sample to combine all the stats_db.out.selected.gz files into a single statistics file.
-  * **~/flexus/stat-manager stat-sample stats_db.out.gz */stats_db.out.selected.gz**+  * **~/tutorial_files/flexus_v4/stat-manager/stat-sample stats_db.out.gz */stats_db.out.selected.gz**
  * Examine the resulting stats_db.out.gz file that contains the combined results of all flexpoints.   * Examine the resulting stats_db.out.gz file that contains the combined results of all flexpoints.
  * Examine the IPCs of the various flexpoints:   * Examine the IPCs of the various flexpoints:
Line 236: Line 249:
  * If bringing UIPCs into Excel, compute =STDEV() and =CONFIDENCE() for 95% confidence.   * If bringing UIPCs into Excel, compute =STDEV() and =CONFIDENCE() for 95% confidence.
  * Bring time breakdowns into Excel:   * Bring time breakdowns into Excel:
-    * **~/flexus/stat-manager/stat-manager print sum | grep ":Bkd:" > breakdown.tsv**+    * **~/tutorial_files/flexus_v4/stat-manager/stat-manager print sum | grep ":Bkd:" > breakdown.tsv**
    * Use Excel to split data by the colon (":") character into columns.  Apply the Pivot Chart feature to plot the time breakdown.     * Use Excel to split data by the colon (":") character into columns.  Apply the Pivot Chart feature to plot the time breakdown.
\\ \\
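If you prefer not to use Excel, the =STDEV() and =CONFIDENCE() calculations for the per-flexpoint UIPCs can be sketched with the Python standard library. This assumes the UIPC values have already been extracted into a list; the function name and sample values are hypothetical. Like Excel's CONFIDENCE, the sketch uses the normal distribution, so the interval is optimistic for a small number of flexpoints.

```python
import math
from statistics import NormalDist, mean, stdev


def uipc_confidence(uipcs, confidence=0.95):
    """Return (mean, sample std dev, half-width of the confidence interval)."""
    n = len(uipcs)
    if n < 2:
        raise ValueError("need at least two flexpoints")
    sd = stdev(uipcs)                                  # sample std dev, like =STDEV()
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    half_width = z * sd / math.sqrt(n)                 # like =CONFIDENCE(0.05, sd, n)
    return mean(uipcs), sd, half_width


if __name__ == "__main__":
    # Hypothetical UIPC values from five flexpoints.
    m, sd, hw = uipc_confidence([0.50, 0.52, 0.48, 0.51, 0.49])
    print(f"UIPC = {m:.4f} +/- {hw:.4f} (std dev {sd:.4f})")
```

A wide interval relative to the mean suggests that more flexpoints are needed before drawing conclusions from the sampled results.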