Grounding Multimodal Large Language Models in Actions (Apple Machine Learning Research)
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground an MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world…