
With the rise of language models (LMs), much research has focused on accelerating LM learning, that is, reaching a given model performance in as few training steps as possible. This focus helps researchers understand the limits of LMs as their computational requirements escalate. It also advances the democratization of large language models (LLMs), benefiting both the research and industry communities.

Prior works such as “Pre-Trained Models: Past, Present and Future” focus on designing effective architectures, utilizing rich contexts, and improving computational efficiency. In “h2oGPT: Democratizing Large Language Models”, the authors built open-source alternatives to closed-source approaches. “Large Batch Optimization for Deep Learning: Training BERT in 76 minutes” tackles the computational cost of training large models. These prior works explore practical acceleration methods at the model, optimizer, or data level.

Researchers from the CoAI Group at Tsinghua University and Microsoft Research have proposed a theory for optimizing LM learning. It starts from the objective of maximizing the data compression ratio. From this objective they derive a theorem, the Learning Law, that characterizes the optimal learning dynamics. Validation experiments on linear classification and language modeling tasks confirm the theorem’s properties. The results indicate that optimal LM learning improves the coefficients in LM scaling laws, which has promising implications for practical learning acceleration methods.

In their work (Optimal Learning of Language Models), the researchers lay out the principles of optimizing LM learning speed: the optimization objective, the property of the optimal learning dynamics, and the essential improvement that learning acceleration brings. For the optimization objective, they propose minimizing the area under the loss curve (loss AUC); a learning process with the smallest loss AUC corresponds to the highest compression ratio. They then derive the Learning Law theorem, which characterizes the dynamics of the learning process that achieves the optimum of this objective. Here, a learning policy determines which data points the LM learns as training progresses, and each policy induces a learning process.
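To make the loss AUC objective and the idea of a learning policy concrete, here is a minimal Python sketch. It is not the authors' code. It assumes a learning policy can be modelled as per-example weights that sum to 1 at every step, uses a toy one-parameter logistic model, and approximates the loss AUC as the sum of per-step mean losses. All names (`train`, `loss_auc`, `uniform_policy`, `subset_policy`) and the data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_loss(w, x, y):
    """Log-loss of a one-parameter logistic model on example (x, y), y in {0, 1}."""
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def example_grad(w, x, y):
    """Derivative of example_loss with respect to w."""
    return (sigmoid(w * x) - y) * x

def train(data, policy, steps=50, lr=0.5):
    """Gradient descent in which policy(t) returns per-example weights for step t.

    Returns the mean loss over all examples, recorded before each update.
    """
    w = 0.0
    losses = []
    for t in range(steps):
        losses.append(sum(example_loss(w, x, y) for x, y in data) / len(data))
        weights = policy(t)  # assumption: non-negative weights summing to 1
        grad = sum(g * example_grad(w, x, y) for g, (x, y) in zip(weights, data))
        w -= lr * grad
    return losses

def loss_auc(losses):
    """Discrete area under the loss curve: the sum of per-step losses."""
    return sum(losses)

# Toy, linearly separable data (hypothetical).
DATA = [(1.0, 1), (2.0, 1), (-1.0, 0), (-0.5, 0)]

def uniform_policy(t):
    # Every example gets the same weight at every step.
    return [0.25, 0.25, 0.25, 0.25]

def subset_policy(t):
    # Only the first and third examples are learned from.
    return [0.5, 0.0, 0.5, 0.0]

uniform_losses = train(DATA, uniform_policy)
subset_losses = train(DATA, subset_policy)
print("uniform policy loss AUC:", loss_auc(uniform_losses))
print("subset policy loss AUC: ", loss_auc(subset_losses))
```

Under this framing, comparing two policies amounts to comparing the two printed numbers: the policy with the smaller loss AUC is the faster learner on this toy problem.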

The researchers ran experiments on linear classification with a Perceptron and on language modeling with a Transformer, optimizing the learning policies and validating the theory empirically. Near-optimal policies significantly accelerated learning, improving loss AUC by 5.50× for the Perceptron and 2.41× for the Transformer. The results confirmed the theoretical predictions and improved the scaling law coefficients by up to 96.6% and 21.2%, which points to faster LM training of practical significance.
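To see why better scaling law coefficients translate into faster training, the following sketch assumes a common power-law form for the loss as a function of training steps, L(t) = L0 + (B / t)^beta. The form and every coefficient value below are assumptions chosen for illustration, not numbers from the paper. In this form, a smaller B or a larger beta means a target loss is reached in fewer steps.

```python
def loss_at(t, B, beta, L0):
    """Loss after t training steps under the assumed law L(t) = L0 + (B / t) ** beta."""
    return L0 + (B / t) ** beta

def steps_to_reach(target_loss, B, beta, L0):
    """Invert the assumed law: the number of steps at which the loss equals target_loss."""
    if target_loss <= L0:
        raise ValueError("target loss must exceed the irreducible loss L0")
    return B / (target_loss - L0) ** (1.0 / beta)

# Hypothetical coefficients for a baseline and an improved learning process.
baseline = {"B": 1000.0, "beta": 0.5, "L0": 2.0}
improved = {"B": 500.0, "beta": 0.6, "L0": 2.0}

target = 2.5
t_base = steps_to_reach(target, **baseline)
t_impr = steps_to_reach(target, **improved)
print("baseline steps:", t_base)
print("improved steps:", t_impr)
print("speedup:", t_base / t_impr)
```

Because the speedup depends on the target loss whenever beta changes, a single speedup figure only describes one point on the curve.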

In conclusion, researchers from the CoAI Group at Tsinghua University and Microsoft Research have proposed a theory for optimizing LM learning by maximizing the compression ratio. They derive the Learning Law theorem, under which all examples contribute equally in optimal learning, and validate it in experiments. The optimal learning process improves the coefficients of LM scaling laws, offering guidance for future acceleration methods.
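The equal-contribution property can be probed numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the contribution of an example is measured as the dot product between that example's loss gradient and the gradient of the overall mean loss, and it reports how far the contributions are from being equal. The toy logistic model, the data, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_grad(w, x, y):
    """Gradient of the logistic loss on one example (x, y), y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def contributions(w, data):
    """Assumed contribution measure: per-example gradient dotted with the mean gradient."""
    grads = [example_grad(w, x, y) for x, y in data]
    n = len(grads)
    mean_grad = [sum(g[j] for g in grads) / n for j in range(len(w))]
    return [sum(gj * mj for gj, mj in zip(g, mean_grad)) for g in grads]

def spread(values):
    """Gap between the largest and smallest contribution; 0 means perfectly equal."""
    return max(values) - min(values)

# Hypothetical two-feature data and an untrained model.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
w = [0.0, 0.0]

contribs = contributions(w, data)
print("contributions:", contribs)
print("spread:", spread(contribs))
```

A spread near zero would indicate that, under this assumed measure, the examples contribute about equally at the current parameters. A large spread suggests the current weighting of examples is far from that condition.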

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


The post Researchers from Tsinghua University and Microsoft AI Unveil a Breakthrough in Language Model Training: The Path to Optimal Learning Efficiency appeared first on MarkTechPost.


By Mohammad Asjad (Artificial Intelligence category, MarkTechPost)
