AVFormer: Injecting vision into frozen speech models for zero-shot AV-ASR (Google AI Blog)
Posted by Arsha Nagrani and Paul Hongsuck Seo, Research Scientists, Google Research

Automatic speech recognition (ASR) is a well-established technology that is widely adopted for various applications such as conference calls, streamed video transcription, and voice commands. While the challenges for this technology are centered…