Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition (Apple Machine Learning Research)
This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the most common approach to incorporate language models into E2E-ASR decoding, we face two practical problems with LLMs. (1) LLM inference is computationally…
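For readers unfamiliar with the baseline mentioned above, shallow fusion is commonly described as adding a weighted language-model log-probability to the ASR log-probability of each candidate token during beam search. The following is a minimal sketch of that generic scoring rule only, not the paper's delayed fusion method; the function names, the example probabilities, and the weight values are hypothetical and chosen purely for illustration.

```python
import math

def shallow_fusion_score(asr_logprob, lm_logprob, lm_weight=0.3):
    """Generic shallow fusion: ASR log-prob plus weighted LM log-prob.

    lm_weight is a tunable interpolation weight (hypothetical default).
    """
    return asr_logprob + lm_weight * lm_logprob

def rank_candidates(candidates, lm_weight=0.3):
    """Rank candidate tokens for one decoding step, best first.

    candidates: list of (token, asr_logprob, lm_logprob) tuples.
    """
    scored = [
        (token, shallow_fusion_score(asr_lp, lm_lp, lm_weight))
        for token, asr_lp, lm_lp in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up probabilities: the ASR model slightly prefers "their",
# while the language model strongly prefers "there".
candidates = [
    ("their", math.log(0.5), math.log(0.1)),
    ("there", math.log(0.4), math.log(0.6)),
]

asr_only = rank_candidates(candidates, lm_weight=0.0)
fused = rank_candidates(candidates, lm_weight=0.5)
```

With the weight set to zero the ranking follows the ASR model alone; with a positive weight the language model can reorder the candidates. Because the language model is queried for every candidate at every step, this scheme becomes costly when the language model is an LLM, which is the first practical problem the abstract raises.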