Skip to content

Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning Apple Machine Learning Research

  • by

​[[{“value”:”This paper was accepted at the Self-Supervised Learning – Theory and Practice (SSLTP) Workshop at NeurIPS 2024.
Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE) for representation learning using the Masked Image Modeling framework. IJEPA drives representations to capture useful semantic information by predicting in latent rather than input space. However, IJEPA relies on carefully designed context and target windows to avoid representational collapse. The encoder modules in IJEPA cannot adaptively modulate the type of…”}]] [[{“value”:”This paper was accepted at the Self-Supervised Learning – Theory and Practice (SSLTP) Workshop at NeurIPS 2024.
Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE) for representation learning using the Masked Image Modeling framework. IJEPA drives representations to capture useful semantic information by predicting in latent rather than input space. However, IJEPA relies on carefully designed context and target windows to avoid representational collapse. The encoder modules in IJEPA cannot adaptively modulate the type of…”}]]  Read More  

Leave a Reply

Your email address will not be published. Required fields are marked *