
The Future of AI Compression Was Hiding In Your Video Player
By Krista Burns
Large language models (like GPT-style models) keep getting bigger. As they grow, they need more memory and faster communication between chips to run and to train. These demands are becoming major obstacles.
Beidi Chen, assistant professor of electrical and computer engineering, and her Ph.D. student Xinyu Yang teamed up with researchers from Duke University and discovered that the same technology used to compress videos (video codecs) also works remarkably well for compressing AI model data.
“We demonstrate that video codecs can be versatile and general-purpose tensor codecs while achieving the state-of-the-art compression efficiency in various tasks,” explains Chen. “We further make use of the hardware video encoding and decoding module available on GPUs to create a framework capable of both inference and training with video codecs repurposed as tensor codecs.”
Building on these insights, the team also showed that the GPU's video codec hardware can be customized and enhanced to significantly improve tensor encoding and decoding throughput at little added cost. That makes it a practical solution for deploying large-scale models without significant modifications to existing GPU architectures.
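The system described in the paper runs this idea on the GPU's dedicated video encode and decode engines. As a rough illustration of the underlying trick, the sketch below is a software-only stand-in: it quantizes a floating-point tensor into 8-bit frames and round-trips it through an ordinary video codec (OpenCV's VideoWriter with Motion-JPEG). The function names, codec choice, and min-max scaling here are illustrative assumptions, not the authors' implementation.

```python
# Minimal software-only sketch: treat a tensor as a stack of grayscale
# video frames and compress it with a standard video codec.
# NOT the paper's hardware pipeline; codec and scaling are assumptions.
import cv2
import numpy as np

def tensor_to_video(tensor, path, codec="MJPG", fps=24):
    """Quantize a (frames, H, W) float tensor to 8-bit and encode it as video."""
    lo, hi = float(tensor.min()), float(tensor.max())
    frames = np.round((tensor - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
    h, w = frames.shape[1:]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*codec), fps, (w, h))
    for f in frames:
        # Replicate the single channel to BGR so the codec backend accepts it.
        writer.write(cv2.cvtColor(f, cv2.COLOR_GRAY2BGR))
    writer.release()
    return lo, hi  # keep the scale so the tensor can be dequantized later

def video_to_tensor(path, lo, hi):
    """Decode the video and map the 8-bit frames back to floating point."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return np.stack(frames).astype(np.float32) / 255.0 * (hi - lo) + lo

# Round-trip a stand-in "weight" tensor and check the reconstruction error.
weights = np.random.randn(16, 64, 64).astype(np.float32)
lo, hi = tensor_to_video(weights, "weights.avi")
restored = video_to_tensor("weights.avi", lo, hi)
print("mean absolute error:", np.abs(weights - restored).mean())
```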
“We hope that by making small upgrades to this video-codec hardware, without redesigning the whole GPU, you can make it even faster for tensor compression,” says Chen. “This offers a practical, low-cost way to support huge AI models in the future.”
The team presented the research at the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), where it received a Best Paper Award.
This work was a collaboration between researchers at Carnegie Mellon University and Duke University, including Ceyu Xu and Yongji Wu (Duke University); Xinyu Yang and Beidi Chen (Carnegie Mellon University); and Matthew Lentz, Danyang Zhuo, and Lisa Wu Wills (Duke University).