Audio Codec

We attempted to implement G.728 as the audio codec in our H.263 application. G.728 is the ITU-T standard for coding speech at 16 kbit/s using low-delay code-excited linear prediction (LD-CELP). The main components of the encoder are a perceptual weighting filter, a simulated decoder, and a codebook search module; the main components of the decoder are an excitation VQ codebook, a synthesis filter, and a postfilter. All of the components have been coded in C++, but the codec is not yet functioning correctly.

Inputs

The input to G.728 is a linear PCM signal, which is usually obtained by converting A-law or u-law PCM. A 16-bit linear PCM input can also be used directly. The output is then converted back to whatever format the user chooses.
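
For reference, a u-law code word can be expanded to linear PCM along the lines of the standard G.711 expansion. The sketch below follows the common reference logic (complemented code words, bias 0x84, segment/mantissa decomposition) and is not necessarily the conversion routine used in our code:

#include <cstdint>

// Expand one 8-bit u-law code word to a linear PCM sample on a 16-bit scale
// (roughly -32124..+32124). Standard G.711 expansion; our codec may scale or
// normalize the result differently before handing it to the encoder.
std::int16_t ulaw_to_linear(std::uint8_t u)
{
    u = static_cast<std::uint8_t>(~u);        // code words are transmitted complemented
    const bool negative = (u & 0x80) != 0;
    const int  exponent = (u >> 4) & 0x07;    // segment number
    const int  mantissa = u & 0x0F;           // step within the segment

    int sample = ((mantissa << 3) + 0x84) << exponent;  // 0x84 is the decoder bias
    sample -= 0x84;

    return static_cast<std::int16_t>(negative ? -sample : sample);
}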

Structure

The speech signal is sampled at 125 us intervals (that is, at 8 kHz), and a vector consists of a group of five consecutive samples. Four consecutive vectors make up an adaptation cycle, or frame. The input to the PCM format conversion must be normalized to the range -4015.5 to +4015.5.
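
The grouping into vectors and adaptation cycles is straightforward to express in code; the following is a minimal sketch (the type and function names are ours for illustration, and any trailing partial frame is simply dropped):

#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kSamplesPerVector = 5;   // 5 samples = 625 us at 8 kHz
constexpr std::size_t kVectorsPerCycle  = 4;   // 4 vectors = one adaptation cycle (frame)

using SpeechVector    = std::array<double, kSamplesPerVector>;
using AdaptationCycle = std::array<SpeechVector, kVectorsPerCycle>;

// Group a stream of normalized PCM samples into adaptation cycles.
std::vector<AdaptationCycle> buildCycles(const std::vector<double>& samples)
{
    std::vector<AdaptationCycle> cycles;
    const std::size_t perCycle = kSamplesPerVector * kVectorsPerCycle;   // 20 samples
    for (std::size_t base = 0; base + perCycle <= samples.size(); base += perCycle) {
        AdaptationCycle cycle{};
        for (std::size_t v = 0; v < kVectorsPerCycle; ++v)
            for (std::size_t s = 0; s < kSamplesPerVector; ++s)
                cycle[v][s] = samples[base + v * kSamplesPerVector + s];
        cycles.push_back(cycle);
    }
    return cycles;
}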

The vector buffer

The encoder transmits the excitation vector quantization codebook index to the decoder. From the index, the decoder can find the excitation gain, the synthesis filter coefficients, and the perceptual weighting filter coefficients. The excitation gain is computed once per vector, and the synthesis and perceptual weighting filter coefficients are found once per adaptation cycle.
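
The per-vector versus per-cycle timing can be summarized by the following decoder-side sketch; the member functions are placeholders standing in for the corresponding G.728 blocks, not our actual interfaces:

#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder decoder: each member function stands in for one block.
struct DecoderSketch {
    void updateExcitationGain()          { /* backward gain adaptation */ }
    void synthesizeVector(std::uint16_t) { /* codebook lookup, synthesis filter, postfilter */ }
    void updateFilterCoefficients()      { /* hybrid window + Levinson-Durbin */ }
};

// Decode a stream of 10-bit indices, updating the excitation gain once per
// vector and the filter coefficients once per four-vector adaptation cycle.
void decodeStream(const std::vector<std::uint16_t>& indices, DecoderSketch& dec)
{
    for (std::size_t v = 0; v < indices.size(); ++v) {
        dec.updateExcitationGain();            // once per vector
        dec.synthesizeVector(indices[v]);      // 10-bit index -> 5 output samples
        if ((v + 1) % 4 == 0)                  // end of an adaptation cycle
            dec.updateFilterCoefficients();    // once per cycle
    }
}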

The perceptual weighting filter adapter consists of a hybrid windowing module, a Levinson-Durbin recursion module, and a weighting filter coefficient calculator. The perceptual weighting filter itself is a 10th-order pole-zero filter whose coefficients are recalculated once every four vectors, at the third speech vector of each cycle. The hybrid window module places a window on the previous speech vectors and calculates the first 11 autocorrelation coefficients of the windowed speech signal. The Levinson-Durbin recursion converts the autocorrelation coefficients into predictor coefficients, which are then used to calculate the coefficients of the weighting filter.
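
The Levinson-Durbin step is the standard textbook recursion. A sketch of it, written for a generic order so the same routine could serve both the 10th-order weighting filter and the 50th-order synthesis filter, looks like this (our module adds bookkeeping around it):

#include <cstddef>
#include <vector>

// Levinson-Durbin recursion: convert autocorrelation coefficients r[0..order]
// into LPC predictor coefficients a[1..order] (a[0] is kept as 1 purely for
// indexing convenience).
std::vector<double> levinsonDurbin(const std::vector<double>& r, int order)
{
    std::vector<double> a(order + 1, 0.0);
    a[0] = 1.0;
    double e = r[0];                         // prediction error energy
    for (int i = 1; i <= order; ++i) {
        if (e <= 0.0)                        // degenerate (e.g. silent) input
            break;
        double acc = r[i];
        for (int j = 1; j < i; ++j)
            acc -= a[j] * r[i - j];
        const double k = acc / e;            // i-th reflection coefficient

        const std::vector<double> prev(a);   // previous-order coefficients
        a[i] = k;
        for (int j = 1; j < i; ++j)
            a[j] = prev[j] - k * prev[i - j];

        e *= (1.0 - k * k);                  // update the error energy
    }
    return a;                                // predictor P(z) = sum_j a[j] z^-j
}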

The backward synthesis filter adapter consists of a hybrid windowing module, a Levinson-Durbin recursion module, and a bandwidth expansion module. In contrast to the perceptual weighting filter adapter, it uses a 50th-order predictor, operates on quantized rather than unquantized speech, and has different hybrid window parameters. Both synthesis filters (the one in the encoder's simulated decoder and the one in the decoder) have identical coefficients, and both are updated by a backward synthesis filter adapter. The synthesis filters are all-pole and have a feedback loop. The transfer function is:

F(z) = 1 / [1-P(z)],

where P(z) represents the transfer function of the LPC predictor. The synthesis filter calculates a weighted speech vector and then a zero-input response vector, which represents the response of both filters to the previous gain-scaled excitation vectors.
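
In code, such an all-pole filter is a short feedback loop over the filter's own past outputs. The sketch below is a generic direct-form version, not our actual class; in the codec the filters are 10th and 50th order, with coefficients supplied by the adapters described above:

#include <cstddef>
#include <vector>

// Direct-form all-pole filter F(z) = 1 / (1 - P(z)), P(z) = sum_{j>=1} a[j] z^-j.
// Each output is fed back into the history, so a bad coefficient update
// corrupts every later sample.
class AllPoleFilter {
public:
    explicit AllPoleFilter(const std::vector<double>& a)   // a[1..M]; a[0] unused
        : a_(a), history_(a.size(), 0.0) {}

    // Filter one excitation sample and return one synthesized sample.
    double step(double excitation)
    {
        double y = excitation;
        for (std::size_t j = 1; j < a_.size(); ++j)
            y += a_[j] * history_[j];          // history_[j] holds y[n-j]
        for (std::size_t j = history_.size(); j-- > 2; )
            history_[j] = history_[j - 1];     // shift past outputs by one
        if (history_.size() > 1)
            history_[1] = y;
        return y;
    }

private:
    std::vector<double> a_;        // predictor coefficients
    std::vector<double> history_;  // past outputs, history_[j] = y[n-j]
};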

The codebook search module goes through the 1024 codevectors in the excitation VQ codebook to find the index of the codevector that minimizes the mean-squared error distortion. The 10-bit codebook is decomposed into two smaller codebooks, a 7-bit shape codebook and a 3-bit gain codebook. The combination of the best shape codevector and the best gain level is then transmitted to the decoder.
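
A brute-force version of this shape/gain search is easy to write down. The sketch below minimizes the mean-squared error of the gain-scaled shape codevector against a target vector; the real search operates on a perceptually weighted, filtered target and uses precomputed energy and correlation terms instead of this triple loop, and the way the 10-bit index is packed here is only an assumption:

#include <array>
#include <cstdint>
#include <limits>

constexpr int kVectorDim  = 5;
constexpr int kShapeWords = 128;   // 7-bit shape codebook
constexpr int kGainLevels = 8;     // 3-bit gain codebook

// Exhaustive shape/gain search: return the 10-bit index whose gain-scaled
// shape codevector is closest (in mean-squared error) to the target vector.
std::uint16_t searchCodebook(
    const std::array<double, kVectorDim>& target,
    const std::array<std::array<double, kVectorDim>, kShapeWords>& shapes,
    const std::array<double, kGainLevels>& gains)
{
    double bestError = std::numeric_limits<double>::infinity();
    std::uint16_t bestIndex = 0;

    for (int s = 0; s < kShapeWords; ++s) {
        for (int g = 0; g < kGainLevels; ++g) {
            double err = 0.0;
            for (int d = 0; d < kVectorDim; ++d) {
                const double diff = target[d] - gains[g] * shapes[s][d];
                err += diff * diff;
            }
            if (err < bestError) {
                bestError = err;
                bestIndex = static_cast<std::uint16_t>((s << 3) | g);  // 7 shape bits + 3 gain bits
            }
        }
    }
    return bestIndex;
}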

Bugs

The output of the encoder seems fair and consistent, but the output of the decoder is spurious and contains many erroneous zeros. By testing each function of the encoder and the decoder, we were able to determine which blocks appear to be working and which do not. In the encoder, we believe that the input PCM format conversion, vector buffer, perceptual weighting filter, and codebook search modules are working correctly. However, we have not been able to verify that the simulated decoder is working, and we hypothesize that the main problems lie in the synthesis filter and the backward synthesis filter adapter. In the decoder, we think that the excitation VQ codebook, gain, postfilter, and output PCM format conversion blocks are correct, and we believe that the main problems are again in the synthesis filter and backward synthesis filter adapter, which are in theory identical to those in the encoder. Because of time constraints, we could not verify each block as thoroughly as we would have liked and were not able to generate enough test vectors to be confident that each block is completely correct. We have verified that the tables in tables.C are correct and do not think they are causing any problems.
