Matching Latent Encoding for Audio-Text based Keyword Spotting (Apple Machine Learning Research)
Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-quality results, but the key challenge of how to semantically align two embeddings for multi-word keywords of different sequence lengths remains largely unsolved. In this paper, we propose an audio-text-based end-to-end model architecture…