Current Project:
Jan. 2007-present
Research on the ILLIAC6 Supercomputer
Managed and coordinated the development of a communication protocol stack on a reconfigurable device. The main work included defining the hardware/software interface and each layer's specification, analyzing and optimizing the data path for bandwidth, allocating the device's global resources, and building a run-time reconfigurable solution. The protocol stack is nearly finished and is expected to be verified on a mezzanine card soon.
Designed the crossbar switches in the protocol stack. The crossbar switch interacts with both the physical channels and the software mapper. Its scheduler went through three versions: static scheduling, round-robin dynamic scheduling, and a fixed-length-packet time-slot interchanger. The first suits only light loads. The second avoids the NP-hard problem of static routing but provides unpredictable bandwidth. The final version can guarantee tight bandwidth for real-time applications. The crossbar switch design has been tested on a development board at 200 MHz and supports up to 6.4 Gbps per link.
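The difference between the last two scheduling schemes can be illustrated with a minimal Python sketch. It assumes a single crossbar output shared by several inputs and uses a simple TDM slot table as a stand-in for the time-slot interchanger; the class and function names are hypothetical, not taken from the actual design. Under round-robin, an input's share depends on how many others are contending; under the slot table, it is fixed by the number of slots the input owns. The last helper checks the quoted link figure: 6.4 Gbps at 200 MHz corresponds to 32 bits per cycle, assuming one transfer per clock.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter for one crossbar output."""

    def __init__(self, n_inputs: int) -> None:
        self.n = n_inputs
        self.next_start = 0  # input with the highest priority this cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Grant the first requesting input at or after next_start."""
        for offset in range(self.n):
            i = (self.next_start + offset) % self.n
            if requests[i]:
                # The winner drops to the lowest priority next cycle.
                self.next_start = (i + 1) % self.n
                return i
        return None


class TdmSlotTable:
    """Hypothetical fixed slot table: slot k of every frame has a preassigned owner."""

    def __init__(self, owners: List[Optional[int]]) -> None:
        self.owners = owners  # owners[k] is the input that owns slot k, or None
        self.slot = 0

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Only the owner of the current slot may send; unused slots stay idle."""
        owner = self.owners[self.slot]
        self.slot = (self.slot + 1) % len(self.owners)
        if owner is not None and requests[owner]:
            return owner
        return None

    def guaranteed_share(self, inp: int) -> float:
        """Fraction of link bandwidth reserved for an input, independent of load."""
        return self.owners.count(inp) / len(self.owners)


def link_bandwidth_gbps(clock_mhz: float, bits_per_cycle: int) -> float:
    """Raw link bandwidth, assuming one transfer of bits_per_cycle per clock."""
    return clock_mhz * 1e6 * bits_per_cycle / 1e9
```

For example, a table of `[0, 0, 1, None]` reserves half of the output's bandwidth for input 0 regardless of how many other inputs are requesting, which is the kind of guarantee round-robin cannot give.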
Past Projects:
To increase the throughput of a VLIW (Very Long Instruction Word) microprocessor in multimedia applications, I analyzed its performance under real workloads and identified where to optimize. Because video compression involves many vector operations and high thread-level parallelism, I judged SMT (Simultaneous Multithreading) to be an area- and power-efficient approach. I therefore designed an asymmetric multithreading architecture for the VLIW microprocessor, which improves performance (especially the SIMD coprocessor's efficiency) as well as the processor's power and area efficiency.
The asymmetric mechanism guarantees the main thread the full set of resources and increases throughput by executing low-priority threads simultaneously.
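A minimal Python sketch of the asymmetric policy, assuming that low-priority threads may only fill the NOP slots of the main thread's instruction word and that any operation fits any slot (real hardware must also match each slot's functional-unit type). The function name `merge_bundle` and the thread labels are hypothetical.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[str, str]]  # (thread_name, operation), or None for a NOP


def merge_bundle(
    main_bundle: List[Optional[str]],
    low_queues: List[List[str]],
) -> List[Slot]:
    """Build one issued VLIW bundle (hypothetical slot-filling policy).

    main_bundle: the main thread's instruction word; None marks a NOP slot.
    low_queues:  pending operations of each low-priority thread, in priority
                 order. Operations that issue are popped from the front.
    """
    issued: List[Slot] = []
    for op in main_bundle:
        if op is not None:
            # The main thread keeps every slot it uses, so it is never delayed.
            issued.append(("main", op))
            continue
        # Empty slot: offer it to the highest-priority thread with work waiting.
        filler: Slot = None
        for t, queue in enumerate(low_queues):
            if queue:
                filler = (f"low{t}", queue.pop(0))
                break
        issued.append(filler)
    return issued
```

Because the main thread's operations are placed first and unchanged, its execution matches single-threaded execution; low-priority threads only consume slots that would otherwise issue NOPs.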
The register file is a crucial unit in a microprocessor with a register-to-register ISA (Instruction Set Architecture). A multi-ported, high-frequency register file supports the high performance of a superscalar system. This register file supports 6 writes and 10 reads in the same cycle. I chose a reliable memory cell structure that allows 4-port reading without disturbing the stored content. The design started from a transistor-level circuit with a compact layout, and a digital IC flow was used to design and verify its function. Tools used included Spice, CosmosSE, CosmosLE, Nanosim, and StarRCXT. It has been implemented in 0.18 μm CMOS technology and meets its requirements.
SuperV is a VLIW microprocessor with SIMD instructions. It contains over one million gates and runs at up to 266 MHz. My tasks in this project included:
1) Implemented the control path of the microprocessor at the RTL level;
2) Completed the custom transistor-level design of a 16-port write-through register file supporting the 4-issue microprocessor;
3) Verified the function of the CPU core and SIMD coprocessor using both simulation and formal methods.
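A behavioral Python model of a multi-ported write-through register file, assuming the 16 ports split into 10 read and 6 write ports as in the register file described above, and assuming write-through means that a same-cycle read of a register being written returns the new value. The register count and class name are hypothetical; the model captures only port limits and forwarding, not circuit behavior.

```python
from typing import Dict, List, Tuple


class WriteThroughRegFile:
    """Hypothetical cycle-level model of a multi-ported register file."""

    def __init__(self, n_regs: int, n_read: int = 10, n_write: int = 6) -> None:
        self.regs = [0] * n_regs
        self.n_read = n_read
        self.n_write = n_write

    def cycle(self, reads: List[int], writes: List[Tuple[int, int]]) -> List[int]:
        """Perform one clock cycle and return the values seen by the read ports."""
        if len(reads) > self.n_read or len(writes) > self.n_write:
            raise ValueError("more accesses than physical ports")
        pending: Dict[int, int] = {}
        for addr, value in writes:
            if addr in pending:
                raise ValueError(f"two write ports target r{addr}")
            pending[addr] = value
        # Write-through: reads of a register written this cycle see the new data.
        results = [pending.get(addr, self.regs[addr]) for addr in reads]
        for addr, value in pending.items():
            self.regs[addr] = value
        return results
```

For example, writing 7 to r1 while reading r1 and r2 in the same cycle returns `[7, 0]` on a freshly reset file.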
The tools we used included ModelSim, VCS, Design Compiler, and Formality. In the verification phase, I wrote kernel assembly programs using sub-word parallelism for simulation and built an efficient test bench to shorten the verification period. This project was funded by the National Basic Research Program of China. The chip has been fabricated in 0.18 μm CMOS technology, worked on the first fabrication run, and ran an MPEG-2 application as expected.
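Sub-word parallelism packs several narrow values into one wide register and operates on all of them at once. The SIMD coprocessor does this in hardware; the Python sketch below shows the same idea in software (often called SWAR), assuming four unsigned 8-bit lanes in a 32-bit word. The lane layout and function names are illustrative assumptions, not SuperV's actual instruction set.

```python
from typing import List

LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH1 = 0x80808080  # top bit of every 8-bit lane


def pack(lanes: List[int]) -> int:
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word: int) -> List[int]:
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add_u8x4(a: int, b: int) -> int:
    """Lane-wise addition modulo 256 of two packed words.

    Adding only the low 7 bits of each lane cannot carry into the next lane;
    the top bit of each lane is then fixed up with XOR (a7 ^ b7 ^ carry-in).
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH1)
```

For example, `unpack(add_u8x4(pack([250, 1, 128, 255]), pack([10, 2, 128, 1])))` gives `[4, 3, 0, 0]`: each lane wraps independently, with no carry leaking into its neighbor.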
This register file was designed as a small IP core supporting 4 reads and 2 writes per cycle in 0.18 μm technology. I designed the transistor-level circuit and completed the layout and optimization. The tools I used were Spice, CosmosSE, Enterprise, Star-Sim, StarRC, etc.
The multiplier generator shortens the design period of parallel multipliers. It produces RTL code for any data width from 4 to 40 bits. The multiplier is built from 4-2 counters, so its structure is regular and easy to place and route. I wrote the generator in C and synthesized the multipliers of every width in Design Compiler to obtain timing and area reports.
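The generator itself was written in C and emitted RTL; the Python sketch below only models the arithmetic such a generated multiplier performs, assuming an unsigned AND-array of partial products (whether Booth encoding was used is not stated). Each 4-2 counter takes four rows and produces two, so a tree of them reduces the partial products to two rows for a final carry-propagate add. The word-level `compress_4_2` chains two 3:2 stages, which is arithmetically equivalent to a row of bit-level 4-2 counters; all names are hypothetical.

```python
from typing import List, Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def compress_4_2(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Word-level 4-2 compressor from two 3:2 stages: a + b + c + d == s + t."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def partial_products(x: int, y: int, width: int) -> List[int]:
    """Row i is x shifted left by i when bit i of y is set, else 0."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def multiply(x: int, y: int, width: int) -> int:
    """Unsigned width x width multiply via a tree of 4-2 compressors."""
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    rows = partial_products(x, y, width)
    while len(rows) > 2:
        next_rows: List[int] = []
        i = 0
        # Each 4-2 compressor turns four rows into two.
        while len(rows) - i >= 4:
            next_rows.extend(compress_4_2(*rows[i:i + 4]))
            i += 4
        rest = rows[i:]
        if len(rest) == 3:
            # Odd leftover of three rows: one 3:2 stage makes it two.
            next_rows.extend(csa(*rest))
        else:
            # Zero, one, or two leftover rows pass through to the next level.
            next_rows.extend(rest)
        rows = next_rows
    # Final carry-propagate addition of the remaining (at most two) rows.
    return sum(rows)
```

Because every level roughly halves the number of rows, a 40-bit multiplier needs only a handful of compressor levels, and each level has the same regular shape, which is what makes the generated layout easy to place and route.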