Systems Seminar - Vibhaalakshmi Sivaraman

Title: Gemino: Practical and Robust Neural Compression for Video Conferencing
Speaker: Vibhaalakshmi Sivaraman, MIT
Date: February 7
Time: 4:00 PM - 5:00 PM
Location: Hybrid, Gates 403 (Fujitsu)

Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking-head videos at very low bitrates using sparse representations of each frame, such as facial landmark information. However, these approaches produce poor reconstructions in scenarios with major movement or occlusions over the course of a call, and do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture and hair) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real time on an A100 GPU and achieves 2.9x lower bitrate than traditional video codecs for the same perceptual quality.

Vibhaa is a sixth-year Ph.D. student in the Networking and Mobile Systems Group at MIT CSAIL, where she is advised by Prof. Mohammad Alizadeh. Her research interests lie broadly in computer networks, with a particular interest in algorithmic techniques. In the past, she has worked on using networking ideas to improve blockchain scalability, as well as on network monitoring and heavy-hitter detection. Recently, she has been interested in improving video streaming and conferencing applications using advances in computer vision and video compression techniques. Prior to MIT, Vibhaa received a B.S.E. in Computer Science from Princeton University.

Tuesday, February 7, 2023 - 4:00pm to 5:00pm