The most popular paradigm for solving modern vision tasks on small datasets, such as image classification or object detection, is to fine-tune the latest pre-trained deep network, which used to be ImageNet-based and is now likely CLIP-based. This pipeline has been largely successful but still has limitations.
The main concern is probably the enormous effort needed to collect and label such large sets of images. Notably, the size of the most popular pre-training dataset has grown from 1.2M images (ImageNet) to 400M (CLIP), and the growth shows no sign of stopping. As a direct consequence, training generalist networks also requires large amounts of compute that only a few industrial or academic labs can afford today. Another critical issue with these collected datasets is their static nature. Despite being huge, they are not updated, so the concepts they cover are frozen at the time of collection.
Recent work by researchers from Carnegie Mellon University and UC Berkeley proposes treating the Internet as a special dataset to overcome these issues with the current pre-training and fine-tuning paradigm.
In particular, the paper proposes a reinforcement learning-inspired, disembodied online agent called Internet Explorer that actively searches the Internet using standard search engines to find relevant visual data that improve feature quality on a target dataset.
The agent’s actions are text queries made to search engines, and the observations are the data obtained from the search.
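The action and observation loop described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the epsilon-greedy selection rule is a simple stand-in and is not claimed to be the paper's method, and `explore`, `fake_search`, and `fake_reward` are hypothetical names that stub out a real search engine and a real relevance score.

```python
import random
from typing import Callable, Dict, List


def explore(
    concepts: List[str],
    search: Callable[[str], List[str]],
    reward: Callable[[List[str]], float],
    steps: int = 50,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[str, float]:
    """Epsilon-greedy loop (a stand-in policy, not the paper's).

    Action      = a text query sent to the search function.
    Observation = whatever the search function returns.
    """
    rng = random.Random(seed)
    estimates = {c: 0.0 for c in concepts}  # running mean reward per concept
    counts = {c: 0 for c in concepts}
    for _ in range(steps):
        if rng.random() < epsilon:
            query = rng.choice(concepts)  # explore a random concept
        else:
            query = max(concepts, key=lambda c: estimates[c])  # exploit best so far
        observation = search(query)
        r = reward(observation)
        counts[query] += 1
        # Incremental update of the mean reward for this concept.
        estimates[query] += (r - estimates[query]) / counts[query]
    return estimates


def fake_search(query: str) -> List[str]:
    """Hypothetical stub: pretend each query returns three image names."""
    return [f"{query}_{i}.jpg" for i in range(3)]


def fake_reward(results: List[str]) -> float:
    """Hypothetical stub: only 'sparrow' images count as relevant."""
    return 1.0 if results and results[0].startswith("sparrow") else 0.0


estimates = explore(["bicycle", "sparrow", "teapot"], fake_search, fake_reward)
print(estimates)
```

In a real system the stubbed search would be replaced by calls to an image search engine and the stubbed reward by a measure of how useful the downloaded images are for the target dataset.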
The approach differs from active learning and related work in that it performs a directed search that improves over time, in a fully self-supervised manner, on an expanding dataset, and requires no labels for training, even from the target dataset. In particular, it is not restricted to a single fixed dataset and does not require the intervention of expert labelers, as standard active learning does.
In practice, Internet Explorer uses WordNet concepts to query a search engine (e.g., Google Images) and embeds these concepts into a representation space so that, over time, it learns to identify which queries are relevant. The model relies on self-supervised learning to learn useful representations from the unlabeled images downloaded from the Internet. The initial vision encoder is a self-supervised, pre-trained MoCo-v3 model. The downloaded images are ranked by their similarity to the target dataset in the learned representation space, which serves as a proxy for how relevant they are to training.
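To make the ranking step more concrete, here is a minimal sketch of scoring downloaded images against a target dataset. It assumes, for illustration only, that relevance is the mean cosine similarity between an image embedding and its k most similar target embeddings; the function names, the value of k, and the toy 2-D vectors are all hypothetical, and in practice the embeddings would come from the vision encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(candidate: Vector, target: List[Vector], k: int = 2) -> float:
    """Mean cosine similarity to the k most similar target embeddings."""
    if not target:
        raise ValueError("target set must not be empty")
    sims = sorted((cosine(candidate, t) for t in target), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def rank_downloads(downloads: List[Vector], target: List[Vector], k: int = 2) -> List[int]:
    """Return indices of downloaded images, most relevant first."""
    scores = [relevance(d, target, k) for d in downloads]
    return sorted(range(len(downloads)), key=lambda i: scores[i], reverse=True)


# Toy embeddings: the target set points roughly along the first axis.
target = [[1.0, 0.0], [0.9, 0.1]]
downloads = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]
order = rank_downloads(downloads, target)
print(order)
```

The per-image scores could then be averaged per query to decide which concepts are worth searching for again.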
On five popular, challenging benchmarks, several of them fine-grained (Birdsnap, Flowers, Food101, Pets, and VOC2007), Internet Explorer, with the additional use of GPT-generated descriptors for concepts, rivals a CLIP ResNet-50 oracle while reducing compute and the number of training images by one and two orders of magnitude, respectively.
To summarize, this paper presents a novel agent that queries the web to download and learn from information useful for a given image classification task, at a fraction of the training cost of previous approaches, and it opens up further research on the topic.
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.
The post CMU Researchers Introduce Internet Explorer: An AI Approach with Targeted Representation Learning on the Open Web appeared first on MarkTechPost.