
Teaching Your Phone to Listen Like an Ear
By Krista Burns
In a crowded restaurant, your smartphone’s microphone is easily overwhelmed. The clatter of plates, overlapping conversations, and background music all blend together into a chaotic audio mix. Try recording a friend across the table and the result often sounds like a distant voice buried under noise. Carnegie Mellon researchers believe they have found a surprisingly simple way to fix that by teaching smartphones to listen more like living creatures do.
The system, called SonicSieve, combines a small physical attachment with artificial intelligence to help smartphones capture speech from a specific direction while filtering out background noise.
The team will present their findings at the 2026 Association for Computing Machinery (ACM) Conference on Human Factors in Computing Systems (CHI).
"What makes SonicSieve unusual is that the key innovation isn’t just software; it’s a tiny piece of acoustic engineering," explains Kuang Yuan, an electrical and computer engineering Ph.D. student and lead author on the paper. Yuan is advised by Swarun Kumar, the Sathaye Family Foundation Professor of Electrical and Computer Engineering.
The device attaches to the inline microphone found on common wired earphones. At first glance, it looks like a small molded structure placed over the microphone, but its shape is carefully designed to manipulate incoming sound waves. When sound arrives from different directions, the structure alters the way those waves reach the microphone. The result is a pattern of tiny changes in the recorded signal that depend on where the sound came from. These directional cues are similar to the way the human ear’s shape helps the brain determine where a sound originates. The attachment is passive, meaning it does not require power, sensors, or additional circuitry. It simply reshapes sound using geometry.
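The article does not describe the microstructure’s geometry, so what follows is only a toy illustration of the principle, not SonicSieve’s design. One way a passive shape can make a single microphone’s recording depend on direction is to add a reflected copy of the sound whose extra path length changes with arrival angle. The direct and reflected sound then cancel at a frequency that shifts with direction, loosely like the spectral notches the outer ear creates. The 10 to 20 mm path lengths, the reflection gain, and all function names below are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C


def path_difference_m(angle_deg: float) -> float:
    """Hypothetical extra path length (metres) of a reflected copy inside a toy
    structure. It varies smoothly with arrival angle; NOT SonicSieve's geometry."""
    theta = math.radians(angle_deg)
    return 0.010 + 0.010 * (1 + math.cos(theta)) / 2  # between 10 mm and 20 mm


def magnitude_response(freq_hz: float, angle_deg: float, reflect_gain: float = 0.8) -> float:
    """|H(f)| for y(t) = x(t) + g * x(t - tau): direct path plus one delayed copy."""
    tau = path_difference_m(angle_deg) / SPEED_OF_SOUND
    phase = 2 * math.pi * freq_hz * tau
    # |1 + g * e^{-j*phase}| = sqrt(1 + g^2 + 2g*cos(phase))
    return math.sqrt(1 + reflect_gain**2 + 2 * reflect_gain * math.cos(phase))


def first_notch_hz(angle_deg: float) -> float:
    """First spectral dip: where the delayed copy is half a cycle out of phase."""
    tau = path_difference_m(angle_deg) / SPEED_OF_SOUND
    return 1 / (2 * tau)


if __name__ == "__main__":
    for angle in (0, 45, 90, 135, 180):
        print(f"{angle:>3} deg -> first notch near {first_notch_hz(angle) / 1000:.1f} kHz")
```

In this toy model the first notch climbs from roughly 8.6 kHz for sound arriving at 0° toward about 17 kHz at 180°, so the position of the dip alone hints at direction, with no power or circuitry involved.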
"It’s a bio-inspired design," says Justin Chan, assistant professor of electrical and computer engineering and computer science, and project advisor. "The device design draws inspiration from biological hearing systems. Like the folds of a human ear, the microstructure physically encodes directional information into the audio before it even reaches the electronics."
"And that’s why we call our system SonicSieve: its sieve-like design helps filter and extract audio coming in from a specific direction," says Kumar.

SonicSieve enables directional speech extraction on smartphones using a lightweight, passive acoustic microstructure. (Left) Our system combines the distinct spatial cues created by the 3D-printed microstructure with a real-time neural network that amplifies speech from target directions while attenuating others. (Right) Our design attaches to the in-line microphone of low-cost earphones, which can be plugged into a smartphone. The system records sound mixtures from the in-line and top microphones, which the neural network uses to generalize across different sound sources and environments.
Once directional information is embedded in the audio, software takes over. The team developed a neural network that runs directly on a smartphone and processes the incoming audio in real time. The system analyzes the subtle patterns introduced by the microstructure and uses them to separate speech coming from a target direction, roughly a 30-degree cone, from other sounds.
"The system learns to ‘listen forward’," explains Yuan. "It suppresses voices and noises arriving from other angles."
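The article does not say how the network is trained or how the target direction is encoded. A common way to frame directional speech extraction is to define, for each simulated scene, a target signal containing only the sources whose arrival angle falls inside the cone, and to train the network to map the recorded mixture to that target. The sketch below shows only that target definition, assuming azimuth-only geometry and a ±15° half-width; `Source`, `in_cone`, `mix`, and `extraction_target` are hypothetical names, and the network itself is omitted.

```python
from dataclasses import dataclass
from typing import List

CONE_WIDTH_DEG = 30.0  # the article's "roughly a 30-degree cone"


@dataclass
class Source:
    """One sound source in a hypothetical simulated training scene."""
    name: str
    azimuth_deg: float    # arrival direction relative to the phone, in degrees
    samples: List[float]  # mono waveform; all sources share the same length


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, wrapping at 360 degrees."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def in_cone(azimuth_deg: float, look_deg: float, width_deg: float = CONE_WIDTH_DEG) -> bool:
    """True if a source lies within +/- width/2 of the look direction."""
    return angular_distance(azimuth_deg, look_deg) <= width_deg / 2


def mix(sources: List[Source]) -> List[float]:
    """Sample-by-sample sum of the given sources (microstructure effects ignored)."""
    return [sum(vals) for vals in zip(*(s.samples for s in sources))]


def extraction_target(sources: List[Source], look_deg: float = 0.0) -> List[float]:
    """Desired output: keep only in-cone sources; everything else should be suppressed."""
    kept = [s for s in sources if in_cone(s.azimuth_deg, look_deg)]
    if not kept:
        return [0.0] * len(sources[0].samples)
    return mix(kept)


if __name__ == "__main__":
    scene = [
        Source("friend across the table", 5.0, [0.5, 0.4, -0.2]),
        Source("neighboring diner", 60.0, [0.1, -0.3, 0.3]),
        Source("background music", 200.0, [0.2, 0.2, 0.2]),
    ]
    print("mixture:", mix(scene))
    print("target: ", extraction_target(scene))
```

In this toy scene only the friend at 5° survives in the target; the diner at 60° and the music at 200° are what a network trained this way would learn to suppress.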
According to the researchers, SonicSieve improves signal quality by about 5 decibels when focusing on a target region.
"That may sound small, but in audio processing it represents a meaningful boost in clarity," explains Chan.
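The article does not say which metric the 5 dB figure refers to. Whatever the metric, decibels are logarithmic, which is why a modest-looking number matters: a gain of G dB multiplies a power ratio by 10^(G/10) and an amplitude ratio by 10^(G/20). A quick check:

```python
def db_to_power_ratio(db: float) -> float:
    """Decibels express power ratios as 10 * log10(P1 / P0)."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude (pressure) ratios use 20 * log10, since power scales with amplitude squared."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    gain_db = 5.0  # the improvement reported in the article
    print(f"{gain_db} dB -> x{db_to_power_ratio(gain_db):.2f} in power, "
          f"x{db_to_amplitude_ratio(gain_db):.2f} in amplitude")
```

Assuming the figure is a power-based ratio, such as a signal-to-noise style measure, a 5 dB gain means the wanted-to-unwanted power ratio grows by a factor of about 3.2 (about 1.8 in amplitude).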
SonicSieve achieves this with just two microphones, outperforming conventional smartphone setups that rely on five-microphone arrays for similar directional filtering. The traditional approach to directional audio capture, called beamforming, depends on multiple microphones placed at different positions. By comparing the slight differences in when a sound reaches each microphone, software can estimate where the sound is coming from and emphasize certain directions.
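A minimal sketch of that conventional idea, reduced to two microphones: estimate the arrival-time difference from the peak of the cross-correlation, convert it to an angle with far-field geometry, then delay-and-sum to emphasize that direction. This is the multi-microphone route the article contrasts SonicSieve with, not SonicSieve’s method. The 48 kHz sample rate, 15 cm spacing, integer-sample delays, and function names are illustrative assumptions; practical beamformers typically add fractional delays, more microphones, and frequency-dependent weighting.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def estimate_delay_samples(mic_a: np.ndarray, mic_b: np.ndarray, max_lag: int) -> int:
    """Lag (in samples) by which mic_b trails mic_a, taken at the cross-correlation peak."""
    n = len(mic_a)
    core = slice(max_lag, n - max_lag)  # skip edges so np.roll's wrap-around never enters
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(mic_a[core], np.roll(mic_b, -lag)[core]) for lag in lags]
    return int(lags[int(np.argmax(scores))])


def delay_to_angle_deg(delay_samples: int, fs: int, spacing_m: float) -> float:
    """Far-field two-mic geometry: delay = spacing * sin(angle) / c, solved for angle."""
    s = np.clip(SPEED_OF_SOUND * delay_samples / (fs * spacing_m), -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def delay_and_sum(mic_a: np.ndarray, mic_b: np.ndarray, steer_delay: int) -> np.ndarray:
    """Undo the expected delay on mic_b and average: the steered direction adds coherently."""
    return 0.5 * (mic_a + np.roll(mic_b, -steer_delay))


if __name__ == "__main__":
    fs, spacing = 48_000, 0.15                  # hypothetical sample rate and mic spacing
    rng = np.random.default_rng(0)
    source = rng.standard_normal(fs // 10)      # 100 ms of noise standing in for speech
    true_delay = 10                             # samples; sound reaches mic_b later
    mic_a, mic_b = source, np.roll(source, true_delay)
    max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * fs))  # largest physically possible lag
    d = estimate_delay_samples(mic_a, mic_b, max_lag)
    print("estimated delay:", d, "samples ->", round(delay_to_angle_deg(d, fs, spacing), 1), "deg")
    on = np.mean(delay_and_sum(mic_a, mic_b, d) ** 2)
    off = np.mean(delay_and_sum(mic_a, mic_b, -d) ** 2)
    print(f"steered power {on:.2f} vs mis-steered {off:.2f}")
```

With a noise-like source, the mis-steered output has roughly half the power of the correctly steered one, because the two copies no longer line up. Getting sharper directionality this way generally means adding microphones, which is the trade-off SonicSieve sidesteps.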
"SonicSieve flips that equation," says Yuan. "Instead of multiplying microphones, it uses clever acoustics and machine learning to extract more information from fewer sensors."
The physical attachment effectively pre-processes the sound before it reaches the microphones, encoding spatial information that the AI can then decode.
"It’s a hybrid approach: part hardware trick, part algorithm," says Chan.
If systems like SonicSieve become practical in consumer devices, they could improve many everyday experiences, like voice recordings in noisy places, video calls in crowded environments, speech recognition systems, and lecture recordings in large rooms.
For years, engineers have tried to solve audio problems primarily through more sensors and heavier computation. SonicSieve suggests another path: letting physics do part of the work.
By shaping sound before it even reaches the microphone, the system gives AI a head start. The future of smarter listening might not come from bigger microphone arrays or more powerful chips, but from tiny, carefully sculpted structures that help machines hear the world the way biology already does.
The authors of the paper “SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures” include Kuang Yuan, Yifeng Wang, Xiyuxing Zhang, Chengyi Shen, Swarun Kumar, and Justin Chan.