Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models | Apple Machine Learning Research
Self-supervised learning (SSL) has made significant advances in speech representation learning. Models like wav2vec 2.0 and HuBERT have achieved state-of-the-art results in tasks such as speech recognition, particularly in monolingual settings. However, multilingual SSL models tend to underperform their monolingual counterparts on each individual language,…