4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities (Apple Machine Learning Research)
Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. …