18-348 Lab #4

Spring 2013

NOTE: Lab 4 consists of two components (Lab 4 Part A and Lab 4 Part B).

Relevant lectures:
- Part A: Lecture 6. Embedded Language Use
- Part B: Lecture 7. Coding Tricks; Multiprecision Math; Reviews

NOTE for Lab Part A: Recall from lecture that neither one's complement nor two's complement checksums can detect all two-bit errors. Whether a checksum detects a given error depends on BOTH the data values (string) and which bits are flipped. For example, try the string "ECE348" to see the difference in detection capability between one's complement and two's complement checksums.

NOTE: The HC12 compiler manual can be confusing with regard to calling conventions. Functions with a fixed number of parameters use the Pascal calling convention, in which parameters are pushed left to right and the caller removes the parameters from the stack. The C calling convention (which the documentation notes pushes parameters right to left) is only used for functions with a variable number of parameters, which we don't use in this lab. Our compiler pushes parameters from left to right. When in doubt, trust what the compiler does, not what the compiler manual says it does.

Links to all files referenced in the lab and prelab can be found in the Files section at the end of this document. You might wish to read the lab assignment before starting work on the Pre-lab to help with understanding how to link C to assembly language.


Pre-Lab 4 - Part A:

Goal:

To learn programming patterns for embedded systems and interactions between C and assembly programming.

Discussion:

Bitwise operations in C:

In this lab, you will practice using C language constructs and operators to do bitwise operations.  The most common language constructs used to implement bitwise operations are:

Operator   Usage (in context)   Description
&          a & b                bitwise AND two values
|          a | b                bitwise OR two values
~          ~a                   bitwise invert a value
^          a ^ b                bitwise XOR two values
<<         b << n               shift the bits in b to the left by n bits (the upper n bits of b are lost).
>>         b >> n               shift the bits in b to the right by n bits (the lower n bits of b are lost).
                                With our compiler the shift is an arithmetic shift right for signed integers, but a logical shift right for unsigned integers. (The C standard leaves the result of right-shifting a negative value implementation-defined.)
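
The following is a minimal sketch of the shift behavior described in the table, not code from the lab skeleton. The function names are hypothetical. The signed case assumes the compiler implements >> on negative values as an arithmetic shift, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>

/* Logical shift right: the operand is unsigned, so zeros enter from the top. */
uint8_t shift_right_unsigned(uint8_t b, unsigned n)
{
    return (uint8_t)(b >> n);
}

/* Shift right on a signed value. ASSUMPTION: the compiler performs an
   arithmetic shift, so copies of the sign bit enter from the top. */
int8_t shift_right_signed(int8_t b, unsigned n)
{
    return (int8_t)(b >> n);
}

/* Shift left: the upper n bits fall off the 8-bit result. */
uint8_t shift_left(uint8_t b, unsigned n)
{
    return (uint8_t)(b << n);
}
```

For example, shifting the unsigned value 0xF0 right by two gives 0x3C, while shifting the signed value -16 (the same bit pattern) right by two gives -4 (bit pattern 0xFC) under the arithmetic-shift assumption.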

The bitwise AND, OR, and INVERT (&, |, ~) should be distinguished from the logical AND, OR, and NOT (&&, ||, !).  The logical operations produce boolean results where any zero value is treated as FALSE and any non-zero value is treated as TRUE.  The output of a logical operation in a standards-conforming C program is either zero or one. The following examples illustrate this:

a = 0xA0;
b = 0x05;
c = a | b;     //bitwise operation: c = 0xA5

d = a || b;    //logical operation: d = 1
e = a & (~b);  //bitwise operation: e = 0xA0
f = a && (!b); //logical operation: f = 0

In general, it is considered bad practice to mix logical and bitwise operations in the same C statement or same line of code because of the potential for confusion.

These operators can be combined to set and clear specific bits in a value.  For example:

Statement        Function
b = b & 0xF0;    clear the lower four bits of b (AND with zero is always zero)
b = b | 0x0F;    set the lower four bits of b (OR with one is always one)
b = b & (~a);    for each bit set to a 1 in a, clear that bit in b (INVERT ones to zeros and then AND); leave other bits in b as they are.
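
The masking statements above can be wrapped in small helpers. This is only a sketch: the names set_bits and clear_bits are hypothetical and are not the setBit and clearBit signatures from the lab skeleton, which may differ.

```c
#include <stdint.h>

/* Set every bit of value that is 1 in mask; other bits are unchanged. */
uint8_t set_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value | mask);
}

/* Clear every bit of value that is 1 in mask; other bits are unchanged.
   The mask is inverted first, then ANDed with the value. */
uint8_t clear_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & (uint8_t)~mask);
}
```

With these helpers, clear_bits(0xA5, 0x0F) yields 0xA0 and set_bits(0xA0, 0x0F) yields 0xAF. In ASCII, an uppercase letter and its lowercase form differ only in bit 5 (mask 0x20), so clear_bits('d', 0x20) yields 'D'.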

Writing C call stack compatible assembly subroutines

To correctly write an assembly subroutine that interfaces with C code, you must consider each of the following aspects of the call.  The list below refers to the example function "unsigned int foo(unsigned int bar, unsigned char bar2, unsigned int bar3)".  For more information, refer to "Call Protocol and Calling Conventions" on page 526 of the HC12 Compiler Reference Manual.

Procedure:

Part 1:

  1. Start a new C project using the HC(S)12 Project Wizard.  Be sure to include the Full-Chip Simulator as a Target.  Replace the main.c file with prelab_4a_prog1.skeleton.c.
  2. Replace the string value in data with the appropriate string for your lab section and group.
  3. Implement the setBit and clearBit functions.  Use them to invert the case of the string (e.g. "DemO" becomes "dEMo").
  4. Implement the countBits function.
  5. Run your program in the simulator.  Use the memory window to locate data where it has been placed on the stack.  Observe that the program correctly inverts the case.
  6. Record the data required to answer question 1 below.
  7. Hand in the program as prelab_4a_prog1_XX.c where "XX" is your Andrew ID.

Part 2:

  1. Download the lab_4a_c_asm.zip file.  Extract the project and open it in CodeWarrior. Append your Andrew ID to the three prelab_4a_prog2* filenames. Be sure to edit the #include in prelab_4a_prog2.c to point to the new header filename.
  2. Add the following function prototypes to the prelab_4a_prog2_asm.h file.
  3. Open prelab_4a_prog2.c.  Modify the calls to bitReverse and addALot to the appropriate value for your lab group.
  4. Open prelab_4a_prog2.asm. 

Part A - Questions:

  1. Enter the information observed from Part 1 prelab_4_prog1_gxx.c in the table below:
    Parameter                                                                             Value
    Original String in data
    Bit count (number of "1" bits)
    Address in memory where byte 0 of data is located.
    Address in memory where byte 4 of data is located.
    Hexadecimal representation of the original values in data (before case conversion)
    Hexadecimal representation of data after case conversion

  2. Draw the stack frame for the call to addALot just after the subroutine call is made (just after the BSR or JSR completes execution).  Indicate what each value represents (e.g. "val1 low byte, val1 high byte" etc).  Indicate the location of any function parameters not located on the stack.
  3. Enter the information observed from Part 2 in the table below:
    Parameter                                               Value
    ASCII character for call to bit reverse
    Hexadecimal value of result from reverse value
    Hexadecimal value of the result from call to addALot

  4. Bonus: Write a bit count subroutine in assembly language that uses a loop (not a lookup table) to count the bits in an 8-bit integer register. It must use a loop to do the actual counting. In this loop, you are only allowed to use the following three instructions: Branch if Not Equal to Zero (BNE), Add with carry, and Logical Shift Right (but not necessarily in that order). You can use whichever variants of these instructions make sense depending on the registers you use (e.g., LSR, LSRA, LSRB, LSRD are all acceptable Logical Shift Right variants). Other instructions can be used before and after the loop, but only these three may be used within the loop that counts bits. When the loop terminates, some register has the number of bits. Write a test program in C to confirm that the subroutine works properly. You will only receive credit for meeting all the requirements of this question -- no partial credit. If you use an instruction other than the three specified within the loop -- then no credit for you!

Pre-Lab 4 - Part B:

Goals:

Discussion:

Refer to the lecture notes for information on integer division.

Procedure:

Part 1:

                                                      Integer Division (discard remainder)
Signed 4-bit Integer       Signed 4-bit Integer       i/2                        i/2
(decimal representation)   (binary representation)    (decimal representation)   (binary representation)
 7                         0111                        3                         0011
 6
 5
 4
 3
 2
 1
 0                         0000
-1                         1111                        0                         0000
-2
-3
-4
-5
-6
-7
-8                         1000                       -4                         1100
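
As a way to check entries in the table, the sketch below contrasts C integer division (remainder discarded, so the result truncates toward zero) with a plain right shift. The function names are hypothetical, and the shift helper assumes that >> on a negative value is an arithmetic shift. It shows where the two operations disagree; it is not a divByTwo implementation.

```c
/* Integer division as C defines it: the remainder is discarded, so the
   result is truncated toward zero (guaranteed since C99). */
int trunc_half(int i)
{
    return i / 2;
}

/* Plain right shift by one. ASSUMPTION: arithmetic shift for negative
   values, which rounds toward negative infinity instead of toward zero. */
int shift_half(int i)
{
    return i >> 1;
}

/* Low four bits of a value, i.e. its 4-bit two's complement pattern. */
unsigned four_bit_pattern(int i)
{
    return (unsigned)i & 0xFu;
}
```

For -8 both helpers return -4 (pattern 1100), matching the last row of the table. For -1 they differ: trunc_half(-1) is 0, as in the table, while shift_half(-1) is -1.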

Part 2:

  1. Create a new assembly project using the project stationery.  Download the prelab_4b_skeleton.asm file and rename it prelab_4b_gXX_andrewid.asm.  Replace the main file with prelab_4b_gXX_andrewid.asm.
  2. The main section of the code is marked off with comments that tell you not to modify the code.  DO NOT MODIFY THIS CODE.  Any modification of the code in this section will result in no credit being given for this part of the assignment.  You are allowed to modify the code in the divByTwo subroutine.
  3. Use the values in the table to develop a short description of the algorithm needed to implement a 16-bit signed divide by 2. (Note that you should use shift and bit set / test instructions; no variation of the DIV instruction is allowed.) Your description should be less than 100 words.
  4. Implement the divByTwo subroutine using the algorithm you described above.
  5. Set the target for the project to the full-chip simulator.  Run the simulator.
  6. Step through the code to get a feel for what it is doing.  The code runs the divByTwo subroutine, and does the same operation using the IDIVS instruction, then compares the results.
  7. In the simulator, set a breakpoint at BP1 and BP2.  Run the simulation.  If the simulation reaches the BP2 breakpoint without reaching the BP1 breakpoint, then your divByTwo performs the expected function for all values between 0x0000 and 0xFFFF.

Part 3:

Write, but DO NOT SIMULATE and DO NOT DEBUG code for this section. We EXPECT that your hand-in will have bugs, and do NOT expect that it will run properly! But it should be syntactically correct so that it will "make" and so that all the parts have something that is "close" to right. (We expect you will make a good faith attempt to actually write a reasonable program, and not submit something laughable.) Just to be clear -- your grade will be based on whether the code looks like an OK first draft with bugs, is commented properly, and so on, but NOT on whether it is bug-free! There might be at most one person in the class who can write bug-free code without executing something, but probably not. So don't worry about it.

Each partner will implement one distinct version of 32-bit x 32-bit -> 64-bit unsigned multiply. You should choose from the three methods described in the lecture materials:

  1. Shift-and-add
  2. Partial sums using built-in MUL functions
  3. Partial sums using table lookups for multiply.
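
To make the first two methods concrete, here is a C sketch of each. It is meant only to illustrate the arithmetic, not the assembly you will write. The function names are hypothetical, and the 16-bit piece size used for the partial sums is an assumption; the lecture materials may split the operands differently.

```c
#include <stdint.h>

/* Method 1 sketch: shift-and-add. For each 1 bit in the multiplier,
   add the correspondingly shifted multiplicand to the product. */
uint64_t mul_shift_add_sketch(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;        /* multiplicand, shifted left once per step */

    while (b != 0) {
        if (b & 1u) {
            product += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}

/* Method 2 sketch: partial sums. Split each operand into 16-bit halves,
   form four 16 x 16 -> 32-bit products, and add them at the right offsets. */
uint64_t mul_partial_sketch(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint64_t ll = (uint64_t)al * bl;   /* weight 2^0  */
    uint64_t lh = (uint64_t)al * bh;   /* weight 2^16 */
    uint64_t hl = (uint64_t)ah * bl;   /* weight 2^16 */
    uint64_t hh = (uint64_t)ah * bh;   /* weight 2^32 */

    return ll + ((lh + hl) << 16) + (hh << 32);
}
```

Both functions return 0xFFFFFFFE00000001 for 0xFFFFFFFF x 0xFFFFFFFF, which is a useful worst-case test because every partial product and every carry is exercised.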

Note: You will end up doing all three implementations in the lab, but you are only required to do two (one for each partner) for the pre-lab. This means each partner must do a different implementation. Do this work independently from your partner. You will use this code to conduct reviews in the Lab section.

  1. Plan on your board being wired as described in Part A of Lab 4.
  2. Create a new assembly project using the project stationery.  Download the lab_4b_skeleton.asm file and rename it lab_4b_gXX_andrewid.asm.  Replace the main file in the project with this file.
  3. Implement one of the above mul64 subroutines to meet the following requirements:

Additionally,

Part B - Questions:

  1. Complete the table from part 1 and hand it in.
  2. Include a description of your signed divide-by-two algorithm (from part 2).  Limit your description to 100 words.
  3. Bonus:   Give the number of instruction cycles for the IDIVS instruction.  Give the number of instruction cycles for the longest path through your divByTwo subroutine (include all cycles from just before the BSR to just after the RTS). Which is faster and by how much?

Prelab Hand-in Checklist: (90 + 18 points)

All non-code submissions shall be in a single PDF document.

Part A

Part B

  1. (20 pts) Answer questions 1 & 2 above.
  2. (BONUS 9 points) Answer the bonus question.
  3. (15 pts) Submit the prelab_4b_gXX_andrewid.asm file.  Code must be fully commented for full credit. Code should work properly and be bug-free.
  4. (10 pts) Submit the lab_4b_gXX_andrewid.asm file.  Code must be fully commented for full credit. Code is EXPECTED to have bugs and you will not be penalized for them.

Refer to the LAB FAQ for more information on lab hand-in procedures and file type requirements.  You MUST follow these procedures or we will not accept your submissions.


Lab 4 - Part A

Goal:

To practice combining C code with assembly using the HC12 compiler.

Discussion:

Mixing C and Assembly with the HC12 Compiler

This section discusses the techniques for mixing C and assembly.  Remember that the stack frame for a subroutine call represents a contract between the calling code and the subroutine code.  In this case, the compiler has a specific format for the stack frame.  In order to write compatible C code, you must make sure that your code conforms to this format.

There are numerous compiler options and PRAGMA options that can be used to modify the behavior of the compiler with respect to call stacks.  A full discussion is beyond the scope of this course.  The discussion below and the lab assignments refer to the compiler behavior using the default settings.

To have the CodeWarrior environment integrate C and assembly:

  1. Create a new project using the HC(S)12 New Project Wizard. 
  2. Select both C and Assembly
  3. Follow normal procedures for selecting the rest of the wizard options

This procedure gives you 3 source files, which are described below.

Note:  a function defined using the __far directive should return with RTC (3-byte return address).  A full discussion of this is beyond the scope of this course.  For the labs, assume all functions are called using __near, so they use RTS to return.

To correctly write an assembly subroutine that interfaces with C code, you must consider each of the following aspects of the call.  The list below refers to the example function "unsigned int foo(unsigned int bar, unsigned char bar2, unsigned int bar3)".  For more information, refer to "Call Protocol and Calling Conventions" on page 526 of the HC12 Compiler Reference Manual.

Checksum Computation

A checksum is an error detection code used by many different embedded and enterprise applications.  It is commonly used to provide redundancy for network messages and data storage.  On networks, it allows the receiver to check for transmission errors.  In the case of storage, it allows a system to verify that the stored data has not changed (e.g. due to file system corruption or soft errors in memory).

To check the correctness of a message + checksum pair, the system recomputes the checksum and compares it to the recorded one.  If the two checksums do not match, then the system knows that there is an error somewhere in the message.  If the two checksums do match, then the message is presumed to be correct.  Note that just because the message appears to be consistent with the checksum does not guarantee that the message is the same as the original one.  With all checksums, it is possible to get errors that modify the message or the stored checksum in such a way that they are still consistent.  This is called an undetected error. Note that the error detection provided by checksums depends on both the value being checked (i.e., the number of errors) as well as the location of the errors (e.g., some 2-bit errors may be caught while others are undetected due to their location). This effect will be seen in the lab.

A two's complement checksum is computed by simply doing integer addition on each "chunk" of data in a set of data. For our lab, this means doing an integer addition of all the characters in a data string using 8-bit addition. Overflows are ignored, and the 8-bit result of the addition is the checksum. This checksum has the nice property of detecting all one-bit errors in the data, and many other errors as well. But, some two-bit errors are undetected.
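
A minimal C sketch of the two's complement checksum just described, assuming a null-terminated string. The function name is hypothetical and is deliberately not the chk_two_c prototype used in the lab procedure.

```c
#include <stdint.h>

/* Sum every byte of the string with 8-bit addition, ignoring overflow.
   The terminating null byte is not included in the sum. */
uint8_t checksum_twos_sketch(const char *s)
{
    uint8_t sum = 0;

    while (*s != '\0') {
        sum = (uint8_t)(sum + (uint8_t)*s);   /* carry out of bit 7 is dropped */
        s++;
    }
    return sum;
}
```

For the string "Hi" (bytes $48 $69) this returns $B1. Flipping bit 0 of the second byte gives "Hh" and a different checksum, so that one-bit error is detected. Flipping bit 7 of both bytes adds $100 in total, which vanishes in 8-bit arithmetic, so that two-bit error is undetected.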

A one's complement checksum is computed similarly, but using one's complement arithmetic (remember that from 18-240?). To refresh your memory, in one's complement arithmetic, the value "$FF" is treated as equal to the value "$00" -- they are both zero. So, when performing addition, you need to check whether the sum will cross over the "$FF" to "$00" boundary, and add one if it does so that both representations of zero end up being equivalent in value. This can be done with a conditional branch that checks whether either of the following two conditions holds true for signed values and adds one to the resultant sum whenever either condition is met:

In general, Cyclic Redundancy Codes (CRCs) provide much stronger error detection properties than arithmetic checksums.  A full discussion of the details of the CRC algorithm is beyond the scope of this course, and the code is a little too complex for this lab. But they are similar to other checksums in that they involve "summing" up values across the length of multiple bytes or words of data. We put this note here simply so that you do not think that a one's complement checksum is the best you can do!

In your lab, you should repeat the computation for each byte in the string, starting with a value of zero and the first byte, ending with the last non-zero byte of the string.  (This means that you should initialize the checksum value to 0 before processing the first byte of the message).

Reference values to help you test your programs -- make sure you get these results!

Input A   Input B   Two's complement A+B   One's complement A+B
$FF       $FF       $FE                    $00
$FE       $83       $81                    $82
$75       $A7       $1C                    $1D
$B3       $56       $09                    $0A
$36       $42       $78                    $78
$00       $00       $00                    $00
$FF       $00       $FF                    $00

String        Two's complement checksum   One's complement checksum
Bert Ernie    $A0                         $A3
Ray Koopman   $21                         $25
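
One way to reproduce the one's complement reference values above is sketched below in C. It is an assumption, not the lab's intended branch conditions: the carry out of the 8-bit add is added back in, and a result of $FF is folded to $00 so that both representations of zero compare equal. The function names are hypothetical.

```c
#include <stdint.h>

/* One's complement 8-bit add (sketch): add, wrap the carry around, and
   report the $FF form of zero as $00. */
uint8_t add_ones_sketch(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)(a + b);                  /* 9-bit result        */

    sum = (uint16_t)((sum & 0xFFu) + (sum >> 8));      /* end-around carry    */
    if (sum == 0xFFu) {
        sum = 0x00u;                                   /* $FF and $00 are zero */
    }
    return (uint8_t)sum;
}

/* One's complement checksum over a null-terminated string, starting
   from zero and excluding the terminating null byte. */
uint8_t checksum_ones_sketch(const char *s)
{
    uint8_t sum = 0;

    while (*s != '\0') {
        sum = add_ones_sketch(sum, (uint8_t)*s);
        s++;
    }
    return sum;
}
```

Under these assumptions the sketch gives $A3 for "Bert Ernie" and $25 for "Ray Koopman". For "Hi" (bytes $48 $69), flipping bit 7 of both bytes changes the result from $B1 to $B2, because the carry out of the top bit wraps around instead of being discarded.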

Procedure:

Part 1:

  1. Wire your board with port T as output and port AD as input according to the following table:
    MCU Pin   Project board connection   Port Configuration
    AD0       PB1                        input
    AD1       PB2                        input
    AD2       PB3                        input
    AD3       PB4                        input
    AD4       PB5                        input
    AD5       PB6                        input
    AD6       PB7                        input
    AD7       PB8                        input
    PT0       LED1                       output
    PT1       LED2                       output
    PT2       LED3                       output
    PT3       LED4                       output
    PT4       LED5                       output
    PT5       LED6                       output
    PT6       LED7                       output
    PT7       LED8                       output
  2. Create a project with a C main program called lab_4a_gXX that will contain both C and assembly language files. Put your C code in the file "main.c" and your assembly code in the file "main.asm". The parts of the procedure below will guide you in creating a program that computes checksums in multiple ways. In the end, all the programs must co-exist in a single project (including the bonus if you choose to do it) with this single hand-in directory.
  3. Take a look at the questions before working on the other parts of this procedure so that you are sure to record the necessary data for the lab writeup.
  4. Create four 8-bit integer variables: TwoSumC, OneSumC, OneSumAsm, and OneSumOpt. Just initialize them to constants for now -- we'll tell you how to compute them below. Values shall be displayed uninverted (i.e., an "ON" LED is 1, and an "OFF" LED is 0). These button definitions will let you demo all capabilities of your program on a single string without recompiling.

Part 2:

  1. Comment out references to the main_asm() function (you'll use it later in part 4 and the bonus).
  2. Add the following declaration to the main() function in main.c.  Replace LN1 and LN2 with the last names of your group members.
  3. Implement an 8-bit two's complement checksum calculation using C. In the main program, call this function and put the result in the variable "TwoSumC". Compute the checksum over myString[] from the first character until (but not including) the null byte at the end of the string. Use the following prototype for your function:
    unsigned char chk_two_c(char * string);
  4. Run this program and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_two_c from BSR/JSR to RTS.
  5. Add code to flip ("invert") the bottom bit in each of the first two characters when PB7 is pressed so that the value is corrupted to put an error in the value. (Re-iterating: this involves flipping bit 0 of the first byte, and bit 0 of the second byte, resulting in two bytes, each with a single-bit error in the lowest bit position.) Run the program and record the output. Did the checksum detect this error?
  6. Modify the program so that the top bit in each of the first two characters is flipped when PB8 is pressed, again putting an error in the value. (Re-iterating: this involves flipping bit 7 of the first byte, and bit 7 of the second byte, resulting in two bytes, each with a single-bit error in the highest bit position.) Run the program and record the output. Did the checksum detect this error? (It shouldn't detect the error -- the two flipped bits cancel each other out in terms of effect on the checksum. This is a shortcoming of two's complement addition checksums.)

Part 3:

  1. Implement an 8-bit one's complement checksum calculation using C. In the main program, call this function and put the result in the variable "OneSumC". Use the following prototype for your function:
    unsigned char chk_one_c(char * string);
  2. Run this program and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_one_c from BSR/JSR to RTS.
  3. Use PB7 to flip ("invert") the bottom bit in each of the first two characters so that the value is corrupted to put an error in the value. Run the program and record the output. Did the checksum detect this error?
  4. Use PB8 to flip the top bit in each of the first two characters, again putting an error in the value. Run the program and record the output. Did the checksum detect this error? (It should -- which is why one's complement checksums are usually better.)

Part 4:

  1. Write a new, similar, program that computes an 8-bit one's complement checksum using assembly language with the calling program in C. In the main program, call this function and put the result in the variable "OneSumAsm". Use the following prototype for your function:
    unsigned char chk_one_asm(char * string);
  2. Run this program with the specified test string and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_one_asm from BSR/JSR to RTS.
  3. Use PB7 to flip ("invert") the bottom bit in each of the first two characters so that the value is corrupted to put an error in the value. Run the program and record the output. Did the checksum detect this error? (If not, fix the problem.)
  4. Use PB8 to flip the top bit in each of the first two characters, again putting an error in the value. Run the program and record the output. Did the checksum detect this error? (It should -- which is why one's complement checksums are usually better.)
  5. Verify that the assembly and C subroutines produce identical outputs for at least four more-or-less randomly chosen different additional strings.

Part 5: (Bonus)

Part A - Questions

  1. Record the results of your experiments above in the table below:
Routine                      Checksum value with   Checksum with two     Checksum with two
                             no bits flipped       bottom bits flipped   top bits flipped
Part 2: TwoSumC
Part 3: OneSumC
Part 4: OneSumAsm
Part 5: (bonus) OneSumOpt

Routine                          # of Cycles
Part 2: C two's complement
Part 3: C one's complement
Part 4: ASM one's complement
Part 5: (Bonus) optimized ASM

Part A - Demo Checklist: (20 + 4)

  1. (20 points) Demo your Checksum project to the TA.  The TA will ask you to run the program with a different string, and show the resultant computation values with various PB combinations pressed. The TA may also ask you to show a timing calculation with the simulator.
  2. (Bonus: 4 points) Demo your optimized Checksum project to the TA.

Lab 4 - Part B

Goal:

Discussion:

Refer to the lecture notes for information on multiprecision add/subtract/multiply. Refer to the lecture notes for information on reviews.
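
As a reminder of the idea behind multiprecision add and subtract, here is a C sketch that processes a 64-bit value as four 16-bit words and passes the carry (or borrow) from each word to the next. The function names, the array layout (word 0 least significant), and the return values are assumptions for illustration; they are not the add64 and sub64 interfaces required by the lab.

```c
#include <stdint.h>

#define WORDS 4   /* 4 x 16 bits = 64 bits; word 0 is least significant (assumed layout) */

/* sum = a + b, one 16-bit word at a time. Returns the carry out of the top word. */
unsigned add64_sketch(const uint16_t a[WORDS], const uint16_t b[WORDS],
                      uint16_t sum[WORDS])
{
    uint32_t carry = 0;
    int i;

    for (i = 0; i < WORDS; i++) {
        uint32_t t = (uint32_t)a[i] + b[i] + carry;   /* at most 17 bits */
        sum[i] = (uint16_t)t;
        carry = t >> 16;                              /* carry into the next word */
    }
    return (unsigned)carry;
}

/* diff = a - b, one 16-bit word at a time. Returns the borrow out of the top word. */
unsigned sub64_sketch(const uint16_t a[WORDS], const uint16_t b[WORDS],
                      uint16_t diff[WORDS])
{
    uint32_t borrow = 0;
    int i;

    for (i = 0; i < WORDS; i++) {
        uint32_t t = (uint32_t)a[i] - b[i] - borrow;  /* wraps if negative */
        diff[i] = (uint16_t)t;
        borrow = (t >> 16) & 1u;                      /* bit 16 is set when the subtraction wrapped */
    }
    return (unsigned)borrow;
}
```

In assembly the same pattern is usually an add (or subtract) on the lowest word followed by add-with-carry (or subtract-with-borrow) on each higher word, so the carry flag plays the role of the carry and borrow variables here.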

Procedure:

Part 1:

In Part 1 you will implement 64-bit add and subtract.

Implement both the add64 and sub64 subroutines to meet the following requirements:

Additionally, for both subroutines:

Part 2:

In this part, you will do a review of each of the multiplication code files (one generated by each team member).  For the review, both team members should be present.  The person whose code is being reviewed is the developer and the other person is the reviewer.  When you do the second review, these roles will be reversed.  Complete the information below.  You must submit a complete writeup for BOTH reviews.

It is understood that line numbers might change as the code is fixed -- don't worry about it and don't go back to fix up line #s if they change.

Part 3:

For this part, work together to get the demos working for both programs. It is fine to collaborate on this portion of the lab and help each other with debugging, etc. Keep the following data as you do this on a per-program basis (i.e., two sets of information -- one set per program):


Part 4:

Implement the third version of 32-bit x 32-bit -> 64-bit multiply.

Part 5 (Bonus - optional):

This section is optional and not that easy.  You may do these exercises to earn extra credit and get better understanding of multiprecision math. If you are running over 12 hours per week on average for the course, you should NOT be attempting this section!

Implement a 64-bit dividend / 32-bit divisor => 32-bit quotient, 32-bit remainder division in assembly language. Implement either restoring or non-restoring division. Save the implementation as lab_4b_div_GXX.asm.
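
For orientation, the following C sketch shows restoring division for these operand sizes. It is an illustration under stated assumptions, not the required assembly: the function name, the pointer-based outputs, and the -1 error return for a zero divisor or a quotient that does not fit in 32 bits are all hypothetical choices.

```c
#include <stdint.h>

/* Restoring division sketch: 64-bit dividend / 32-bit divisor.
   Returns 0 on success, -1 if the divisor is zero or the quotient
   would not fit in 32 bits. */
int div64_32_restoring_sketch(uint64_t dividend, uint32_t divisor,
                              uint32_t *quotient, uint32_t *remainder)
{
    uint64_t rem = dividend >> 32;        /* partial remainder starts as the upper half */
    uint32_t low = (uint32_t)dividend;    /* lower half is shifted in bit by bit */
    uint32_t quot = 0;
    int i;

    if (divisor == 0 || rem >= divisor) {
        return -1;
    }

    for (i = 0; i < 32; i++) {
        int64_t trial;

        rem = (rem << 1) | (low >> 31);   /* bring down the next dividend bit */
        low <<= 1;
        quot <<= 1;

        trial = (int64_t)rem - (int64_t)divisor;   /* trial subtraction */
        if (trial >= 0) {
            rem = (uint64_t)trial;        /* keep the difference, quotient bit = 1 */
            quot |= 1u;
        }
        /* else: restore by leaving rem unchanged, quotient bit = 0 */
    }

    *quotient = quot;
    *remainder = (uint32_t)rem;
    return 0;
}
```

Non-restoring division differs in that a negative trial result is kept and corrected by adding the divisor on the next step, rather than being undone immediately.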

Part 6 (Bonus - optional):

Perform a review of your third multiplication implementation or your division implementation. Follow the formats used in Parts 2 and 3.

Part B - Demo Checklist: (35 + (5 or 10) points)

  1. (15 points)  Demo both the multiprecision add and subtract programs to the TA.
  2. (20 points)  Demo all three multiplication implementations.
  3. Bonus: (Either 5 points or 10 points) Demo one of the following: restoring division worth 5 points; OR non-restoring division worth 10 points.

Lab - Hand-in Checklist: (150 + 19 + (5 or 10))

Part A

  1. (5 points) List any problems you encountered in the lab and pre-lab, and suggestions for future improvement of this lab. If none, then state so to get these points.
  2. (40 points)  Submit the entire project for all parts.  Your project should be in a folder called "lab_4a_gXX".  All files necessary to open the project in CodeWarrior and invoke the simulator must be present to receive full credit.  Code must be fully commented to receive full credit.
  3. (20 points) Answers to the questions above
  4. (13 points) Bonus -- provide code and fill in tables for questions for the optimized assembly version of one's complement checksum.

Part B

  1. (5 points) List any problems you encountered in the lab and pre-lab, and suggestions for future improvement of this lab. If none, then state so to get these points.
  2. (10 points) Submit a listing of the code for lab_4b_add_gXX.asm and lab_4b_sub_gXX.asm
  3. (30 points) Reviews for both lab partners' code.
  4. (30 points) Corrected and working code for both lab partners, lab_4b_gXX_andrewID1.asm and lab_4b_gXX_andrewID2.asm.  Code must conform to the coding style sheet to receive full credit.
  5. (10 points) Submit lab_4b_mul3_gXX.asm with your 64-bit multiply subroutine.
  6. (Either 5 or 10 points) Bonus -- Submit only one of the following: restoring division worth 5 points; OR non-restoring division worth 10 points
  7. (6 points) Bonus -- Submit a review, including review metrics as well as development metrics (using formats for parts 2 & 3) for the third implementation of 64-bit multiply or the 64-bit divide you developed.  3 pts for each DISTINCT implementation review, up to 6 total points.

Refer to the LAB FAQ for more information on lab hand-in procedures and file type requirements.  You MUST follow these procedures or we will not accept your submissions.


Hints and Suggestions:

Part A

Part B

FILES for this lab:

Part A

Part B

Relevant reading:

Also, see the course materials repository page.

