Self-supervised pretraining has driven progress on many language and vision tasks. In those domains, representations pretrained without explicit labels let a single model be adapted easily to multiple downstream tasks, and the approach has been the subject of substantial research. However, designing such a pretraining technique for sequential decision-making tasks is challenging because of the difficulty of sequential control over long interaction horizons and the high dimensionality of perceptual input.
Applying pretraining to control tasks raises the following problems:
Distribution shift. Training data for decision-making usually takes the form of trajectories generated under specific behavior policies. As a result, the data distribution can differ between pretraining, task fine-tuning, and deployment.
Large variety among decision-making tasks. Unlike language and vision tasks, these tasks differ greatly in their configurations, transition functions, rewards, action and state spaces, and semantic information. Hence, it is hard to find a single generic representation that covers many forms of decision-making.
Long-term reward maximization. Sequential decision-making aims to learn a policy that maximizes long-term reward rather than only the immediate consequences of each action. A useful representation for downstream policy learning therefore has to capture information relevant to both immediate and long-term planning, which is challenging in tasks with long horizons, partial observability, and continuous control.
Lack of supervision and high-quality data. Representation learning for control often depends on expert demonstrations and ground-truth rewards, and it struggles when they are missing. For most real-world sequential decision-making problems, high-quality data and supervisory signals are either prohibitively expensive or otherwise inaccessible.
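To make the long-term reward point concrete, here is a minimal sketch of a discounted return, one standard way to formalize "long-term gain". The discount factor `gamma` and the two reward sequences are assumptions chosen for illustration; the article does not specify either.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences for two behaviors on the same task.
greedy_rewards = [1.0, 0.0, 0.0, 0.0]   # small reward now, nothing later
patient_rewards = [0.0, 0.0, 0.0, 5.0]  # nothing now, larger reward later

greedy_return = discounted_return(greedy_rewards)    # 1.0
patient_return = discounted_return(patient_rewards)  # about 3.645
```

The behavior that looks worse at the first step ends up with the higher return, which is why a representation that only encodes immediate consequences can mislead downstream policy learning.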
A recent study by Microsoft presents a generic pretraining framework called Self-supervised Multi-task pretrAining with contRol Transformer (SMART). The team focused on unsupervised pretrained representations for control tasks that are:
Flexible enough to adapt to different control tasks and downstream learning methods, such as imitation learning (IL) and reinforcement learning (RL).
General enough to be applied to novel tasks and domains with different rewards and agent dynamics.
Resistant to variations in the quality of the pretraining data.
The researchers introduce the Control Transformer (CT), which models state-action interactions from high-dimensional observations using a causal attention mechanism. Whereas recent transformer-based models for sequential decision-making directly learn reward-based policies, CT is designed to learn reward-agnostic representations. This makes it a unified model that can fit different learning methods (such as IL and RL) and a wide range of tasks. Using CT as a foundation, the team proposes a control-centric pretraining objective that combines forward dynamics prediction, inverse dynamics prediction, and random masked hindsight control. These objectives encourage CT to capture dynamics information at both fine and coarse temporal granularities, focusing on transition probabilities that depend neither on the current policy nor on its future outcomes.
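The three pretraining objectives and the causal attention constraint can be illustrated with a small sketch of how training examples might be built from one trajectory. This is not the authors' implementation: all function names are hypothetical, states and actions are stand-in strings rather than image observations, and the masked-action task is one plausible reading of "random masked hindsight control", assuming the model sees the rest of the sequence while recovering the hidden actions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def causal_mask(num_tokens: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(num_tokens)] for i in range(num_tokens)]


def forward_dynamics_pairs(
    states: Sequence[str], actions: Sequence[str]
) -> List[Tuple[Tuple[str, str], str]]:
    """Input (s_t, a_t), target s_{t+1}: predict what happens next."""
    return [((states[t], actions[t]), states[t + 1]) for t in range(len(actions))]


def inverse_dynamics_pairs(
    states: Sequence[str], actions: Sequence[str]
) -> List[Tuple[Tuple[str, str], str]]:
    """Input (s_t, s_{t+1}), target a_t: infer the action that caused a transition."""
    return [((states[t], states[t + 1]), actions[t]) for t in range(len(actions))]


def masked_hindsight_example(
    actions: Sequence[str], num_masked: int, rng: random.Random
) -> Tuple[List[Optional[str]], Dict[int, str]]:
    """Hide a random subset of actions (None marks a hidden slot).

    The targets map each hidden time step to the action to be recovered.
    """
    hidden = set(rng.sample(range(len(actions)), num_masked))
    visible = [None if t in hidden else a for t, a in enumerate(actions)]
    targets = {t: actions[t] for t in sorted(hidden)}
    return visible, targets


# A toy trajectory: T actions connect T + 1 states.
states = ["s0", "s1", "s2", "s3"]
actions = ["a0", "a1", "a2"]

forward_pairs = forward_dynamics_pairs(states, actions)
inverse_pairs = inverse_dynamics_pairs(states, actions)
visible_actions, hidden_targets = masked_hindsight_example(
    actions, num_masked=2, rng=random.Random(0)
)
```

None of these targets involves a reward, which matches the reward-agnostic design described above. The forward and inverse pairs cover single-step dynamics, while the masked-action task spans the whole sequence and so touches the coarser temporal scale.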
SMART captures important control-relevant information, making it empirically better suited to interactive decision-making than other pretrained vision models, which largely focus on learning object-centric semantics. In both IL and RL, across a range of tasks, SMART consistently outperforms training from scratch and state-of-the-art (SOTA) pretraining techniques. Empirical results across a variety of domains and tasks demonstrate the effectiveness of the proposed approach, as well as its resilience to distribution shift and low-quality data.
The team believes there is scope for improving the attention mechanism over the spatial observation space and temporal state-observation interactions, and for investigating how well it generalizes across application scenarios.
Check out the Paper. All credit for this research goes to the researchers on this project.
The post Microsoft Research Proposes SMART: A Generic Pretraining Framework For Multi-Task Sequential Decision Making appeared first on MarkTechPost.