Building on the program completed last semester, the advanced program will take time shifts into account: the correlation of each pair of signals will be computed over a range of lags and shifts. In theory, the program will perform more floating-point operations on each thread at some cost in time, which should allow performance to improve through better reuse of data. The general goals and approaches for this project are as follows:
1. Use CUDA features to store and iterate over variables in more than one dimension.
2. Fetch memory at non-unit strides to improve efficiency.
3. Adopt a test strategy with higher coverage.
4. Make the program more robust when handling large inputs and large numbers of input blocks.
5. Complete the final paper for HPEC by the end of the semester.
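
To make the idea of correlating two signals over a range of lags concrete, here is a minimal CPU reference sketch in Python. It is not the project's CUDA implementation; the function names `lagged_correlation` and `best_lag`, the `max_lag` parameter, and the choice of an unnormalized sum of products are all assumptions made for illustration.

```python
def lagged_correlation(x, y, max_lag):
    """Return {lag: sum over t of x[t] * y[t + lag]} for lag in [-max_lag, max_lag].

    Hypothetical helper: the sums are unnormalized, and samples that would
    fall outside either signal are simply skipped.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("signals must have the same length")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        # Only indices where both t and t + lag lie inside the signals contribute.
        for t in range(max(0, -lag), min(n, n - lag)):
            total += x[t] * y[t + lag]
        result[lag] = total
    return result


def best_lag(x, y, max_lag):
    """Return the lag with the largest correlation sum (hypothetical helper)."""
    scores = lagged_correlation(x, y, max_lag)
    return max(scores, key=scores.get)


# Usage: y is x delayed by one sample, so the strongest match is at lag 1.
x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]
scores = lagged_correlation(x, y, max_lag=2)
shift = best_lag(x, y, max_lag=2)
```

Each lag re-reads the same samples of `x` and `y`, which is the kind of repeated access where data reuse could pay off. A simple reference of this sort may also be useful for checking GPU results on small inputs.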