Starts at: April 28, 2014 1:30 PM
Ends at: 4:30 PM
Location: Porter Hall, B34
Over the years, a variety of array processing techniques have been applied to the problem of enhancing degraded speech to improve automatic speech recognition. While a number of nonlinear processing methods have arisen, they tend to lag behind linear beamforming in terms of simplicity, scalability, and flexibility. Nonlinear techniques are also more difficult to analyze and lack the systematic descriptions available for linear beamformers.
This work focuses on a class of nonlinear processing, known as time-frequency (T-F) masking, whose variants comprise a significant portion of the existing techniques. Analyses are developed that attempt to mirror the beam patterns used to describe linear processing, leading to a view of T-F masking as “nonlinear beamforming”. While these “nonlinear beam patterns” are not quite as simple or all-encompassing as traditional beam patterns in microphone-array processing, they do accurately represent the behavior of masking algorithms in analogous and intuitive ways.
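As a rough illustration of the kind of masking the abstract refers to, the sketch below builds a binary time-frequency mask for a two-microphone pair: T-F cells where the channels agree in phase (consistent with a target arriving at both microphones simultaneously) are kept, and the rest are zeroed. All signals, thresholds, and parameters here are hypothetical choices for illustration, not the method from the thesis.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs

# Hypothetical mixture: a broadside target (identical at both mics) plus an
# interferer that arrives at the right mic with a small delay.
target = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * np.random.default_rng(0).standard_normal(fs)
left = target + noise
right = target + np.roll(noise, 8)

f, frames, L = stft(left, fs=fs, nperseg=512)
_, _, R = stft(right, fs=fs, nperseg=512)

# Binary T-F mask: keep cells where the inter-channel phase difference is
# small (the 0.5 rad threshold is an arbitrary illustrative value).
phase_diff = np.angle(L * np.conj(R))
mask = (np.abs(phase_diff) < 0.5).astype(float)

# Reconstruct the enhanced signal from the masked left channel.
_, enhanced = istft(mask * L, fs=fs, nperseg=512)
```

The nonlinearity is explicit here: the output is not a fixed linear combination of the channels, since the mask itself depends on the data in each T-F cell, which is what makes masking harder to characterize with a conventional beam pattern.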
In addition to analyzing this class of nonlinear masking algorithms, we also attempt to improve their performance in a variety of ways, as current masking techniques do not scale acceptably with array size. Improvements are proposed both to the baseline performance of these algorithms, by addressing the mask estimation and signal reconstruction stages, and to their scalability, through a hybrid masking-beamforming system. The goal of these improvements is to narrow the existing performance gap between masking and linear beamforming in arrays with more than two microphones.
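One way a hybrid masking-beamforming stage could be structured, sketched under assumptions of our own (steering delays already compensated, and a simple cross-channel agreement heuristic standing in for a real mask estimator), is to beamform across all channels linearly and then apply a single mask to the beamformer output:

```python
import numpy as np

# Hypothetical STFT-domain data: n_mics channels, already time-aligned
# toward the look direction, so delay-and-sum reduces to averaging.
rng = np.random.default_rng(1)
n_mics, n_freq, n_frames = 8, 257, 100
X = rng.standard_normal((n_mics, n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_mics, n_freq, n_frames))

# Linear stage: delay-and-sum beamformer (plain average after alignment).
beamformed = X.mean(axis=0)

# Nonlinear stage: mask from a cross-channel agreement ratio; the ratio is
# 1 when all channels are in phase and small when they cancel. The 0.5
# threshold is an arbitrary illustrative value.
agreement = np.abs(X.sum(axis=0)) / np.abs(X).sum(axis=0)
mask = (agreement > 0.5).astype(float)

output = mask * beamformed
```

The appeal of this arrangement for scalability is that the linear stage grows trivially with the number of microphones, while the mask estimation operates once on pooled statistics rather than per channel pair.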