Midterm Report for Real-Time Transmission of Live Video

We are implementing a real-time video transmission system over the Internet. The video streams are recorded from an SRI stereo head. The Small Vision System also provides depth information for each frame of the video streams. From this depth information, we can separate the foreground from the background.
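The depth-based separation can be illustrated with a simple threshold. This is only a minimal sketch, not our actual segmentation code: it assumes the depth map is a list of rows of distances, that a value of 0 means "no stereo match", and that anything closer than a hypothetical `max_depth` counts as foreground.

```python
def foreground_mask(depth, max_depth):
    """Return a mask that is True where a pixel is closer than max_depth.

    Assumptions (not from the report): values are distances, and 0 marks
    pixels for which no depth could be computed.
    """
    return [[0 < d < max_depth for d in row] for row in depth]


# Tiny 3x4 example depth map; the small values form the foreground object.
depth = [
    [9, 9, 9, 9],
    [9, 2, 3, 9],
    [9, 0, 2, 9],
]
mask = foreground_mask(depth, max_depth=5)
```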

For the midterm, we transmit the background only during the first few seconds. The receiver keeps one frame in the display buffer and uses it as the static background for all later display. The foreground is segmented into rectangles, each described by the X and Y position of its starting pixel together with the width and height of the segmented image. We call such a rectangle a frame and hand it to the network transmission layer.
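A rectangle of this form can be derived from a foreground mask as its bounding box. The sketch below assumes the mask is a list of rows of booleans and that a single rectangle around all foreground pixels is enough; the function name `bounding_rect` is illustrative only.

```python
def bounding_rect(mask):
    """Return (x, y, width, height) of the smallest rectangle that
    contains every True pixel, or None if the mask has no foreground."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, v in enumerate(row) if v]
    x0, x1 = min(cols), max(cols)
    y0, y1 = rows[0], rows[-1]
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


example_mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, False, True,  False],
]
rect = bounding_rect(example_mask)
```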

The network transmission layer combines a UDP/IP socket with RTP, which, as is usual for RTP, is implemented inside the application. A profile header is also defined for our specific data format. One frame may be split into several packets, with a maximum packet size of 1024 bytes. The last packet of each frame is marked as a synchronization point.
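The packetization step can be sketched as follows. The 12-byte fixed RTP header layout is standard, but everything else here is an assumption made for illustration: the profile header is shown as four 16-bit fields (x, y, width, height), the 1024-byte limit is taken to include both headers, and the payload type 96 and the SSRC value are placeholders.

```python
import struct

MAX_PACKET_SIZE = 1024                    # limit stated in the report
RTP_HEADER = struct.Struct("!BBHII")      # V/P/X/CC, M/PT, seq, timestamp, SSRC
PROFILE_HEADER = struct.Struct("!HHHH")   # hypothetical: x, y, width, height
MAX_PAYLOAD = MAX_PACKET_SIZE - RTP_HEADER.size - PROFILE_HEADER.size


def packetize(data, rect, seq, timestamp, ssrc=0x1234, payload_type=96):
    """Split one frame into packets; only the last one has the marker bit set."""
    chunks = [data[i:i + MAX_PAYLOAD]
              for i in range(0, len(data), MAX_PAYLOAD)] or [b""]
    packets = []
    for i, chunk in enumerate(chunks):
        marker = 1 if i == len(chunks) - 1 else 0
        header = RTP_HEADER.pack(
            2 << 6,                          # version 2, no padding/extension/CSRC
            (marker << 7) | payload_type,    # marker bit plus payload type
            (seq + i) & 0xFFFF,              # sequence number wraps at 16 bits
            timestamp & 0xFFFFFFFF,          # same timestamp for the whole frame
            ssrc,
        )
        packets.append(header + PROFILE_HEADER.pack(*rect) + chunk)
    return packets


packets = packetize(bytes(2500), rect=(10, 20, 50, 50), seq=65535, timestamp=9000)
```

Each resulting packet could then be handed to a UDP socket with `sock.sendto(packet, address)`.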

The receiver receives the packets one by one, parses the RTP and profile headers to obtain the timestamp, marker field, sequence number, etc. for the necessary processing, and puts the data from the packet's payload field into the display buffer. Once the packet marked as the final packet of a frame arrives, the display buffer is flushed for display. When packets arrive out of order or are lost, e.g., a packet of the next frame arrives before the last packet of the current frame, we simply flush the current frame for display and start accumulating the next frame. This way, we avoid processing delay and meet the real-time requirement.
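The flush policy described above can be sketched as a small state machine. This is a simplified illustration rather than our receiver code: the class name `FrameAssembler` is made up, everything after the 12-byte RTP header is treated as payload (profile header parsing is left out), a list stands in for the real display call, and late packets of an already flushed frame are dropped, which is an assumption the text above does not spell out. Timestamp wraparound is ignored.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, seq, timestamp, SSRC


class FrameAssembler:
    def __init__(self):
        self.timestamp = None      # timestamp of the frame being accumulated
        self.last_flushed = None   # timestamp of the most recently shown frame
        self.buffer = []           # payload pieces of the current frame
        self.displayed = []        # stand-in for handing frames to the display

    def _flush(self):
        if self.buffer:
            self.displayed.append((self.timestamp, b"".join(self.buffer)))
            self.last_flushed = self.timestamp
        self.buffer = []
        self.timestamp = None

    def receive(self, packet):
        _, m_pt, _seq, ts, _ssrc = RTP_HEADER.unpack_from(packet)
        marker = m_pt >> 7
        # Assumption: a packet for a frame that was already shown is useless.
        if self.last_flushed is not None and ts <= self.last_flushed:
            return
        # A packet of a newer frame arrived first: show what we have so far.
        if self.timestamp is not None and ts != self.timestamp:
            self._flush()
        self.timestamp = ts
        self.buffer.append(packet[RTP_HEADER.size:])
        if marker:                 # final packet of the frame
            self._flush()
```

The receiver never waits for a missing packet, so a lost packet costs at most part of one frame instead of adding delay.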

For the final, we are going to focus primarily on scalability. On the codec side, we plan to apply an H.261/H.263 video codec to both the foreground and the background, but with different quantization step sizes. In some situations we might want to send low-quality images of the background and foreground together, to give the receiver a rough view of what is going on at the sending side. In other situations we would send the foreground with good quality and care much less about the quality of the background. The sender could send the two through different channels so that each receiver can decide which layer to receive.
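The effect of using different step sizes can be shown with a plain uniform quantizer. This is only an illustration: the real H.261/H.263 quantizers differ in detail, and the coefficient values and the step sizes 4 and 16 are made-up examples.

```python
def quantize(coeffs, step):
    """Map each coefficient to the nearest multiple of step (as a level index)."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from level indices."""
    return [level * step for level in levels]


coeffs = [100, -37, 12, 5, -3]     # hypothetical transform coefficients

fg_levels = quantize(coeffs, 4)    # fine step for the foreground
bg_levels = quantize(coeffs, 16)   # coarse step for the background

fg_error = sum(abs(c - r) for c, r in zip(coeffs, dequantize(fg_levels, 4)))
bg_error = sum(abs(c - r) for c, r in zip(coeffs, dequantize(bg_levels, 16)))
```

The coarse step produces smaller level values and more zeros, which compress better, at the cost of a larger reconstruction error.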

On the networking side, we would like to add more control and adaptive procedures to the code. We are going to implement part of RTCP, as well as the negotiation between the sender and the receivers. Currently we support only unicast, since we have not yet experimented with the MBone, but we would definitely like to try the MBone if possible.

Our Segmentation Result

Our Source Code
Our platform is Linux!


Peng Chang
Shuheng Zhou


Last modified: Wed Mar 10 23:33:53 EST 1999