At Meta, AI workloads are everywhere, serving as the foundation for numerous applications like content comprehension, Feeds, generative AI, and ad ranking. Thanks to its seamless Python integration, eager-mode programming, and straightforward APIs, PyTorch can run these workloads. In particular, DLRMs are vital to enhancing user experiences across all of Meta’s products and offerings. The hardware systems must supply increasingly more memory and computing as the size and complexity of these models grow, all without sacrificing efficiency.
When it comes to the highly efficient processing of Meta’s unique recommendation workloads at scale, GPUs aren’t always the best option. To address this issue, the Meta team developed a set of application-specific integrated circuits (ASICs) called the “Meta Training and Inference Accelerator” (MTIA). With the needs of the next-generation recommendation model in mind, the first-generation ASIC is included in PyTorch to develop a completely optimized ranking system. Keeping developers productive is an ongoing process as they maintain support for PyTorch 2.0, which dramatically improves the compiler-level performance of PyTorch.
In 2020, the team created the original MTIA ASIC to handle Meta’s internal processing needs. Co-designed with silicon, PyTorch, and the recommendation models, this inference accelerator is part of a full-stack solution. Using a TSMC 7nm technology, this 800 MHz accelerator can achieve 102.4 TOPS with INT8 precision and 51.2 TFLOPS with FP16 precision. The device’s TDP, or thermal design power, is 25 W.
The accelerator can be divided into constituent parts, including processing elements (PEs), on-chip and off-chip memory resources, and interconnects in a grid structure. An independent control subsystem within the accelerator manages the software. The firmware coordinates the execution of jobs on the accelerator, controls the available computing and memory resources, and communicates with the host through a specific host interface. LPDDR5 is used for off-chip DRAM in the memory subsystem, which allows for expansion to 128 GB. More bandwidth and far less latency are available for frequently accessed data and instructions because the chip’s 128 MB of on-chip SRAM is shared among all the PEs.
The 64 PEs in the grid are laid out in an 8 by 8 matrix. Each PE’s 128 KB of local SRAM memory allows for speedy data storage and processing. A mesh network links the PEs together and to the memory banks. The grid can be used in its whole to perform a job, or it can be split up into numerous subgrids, each of which can handle its work. Matrix multiplication, accumulation, data transportation, and nonlinear function calculation are only some of the important tasks optimized for by the multiple fixed-function units and two processor cores in each PE. The RISC-V ISA-based processor cores have been extensively modified to perform the required computation and control operations. The architecture was designed to make the most of two essentials for effective workload management: parallelism and data reuse.
The researchers compared MTIA to an NNPI accelerator and a graphics processing unit. The results show that MTIA relies on efficiently managing small forms and batch sizes for low-complexity models. MTIA actively optimizes its SW stack to achieve similar levels of performance. In the meantime, it uses larger forms that are significantly more optimized on the GPU’s SW stack to run medium- and high-complexity models.
To optimize performance for Meta’s workloads, the team is now concentrating on finding a happy medium between computing power, memory capacity, and interconnect bandwidth to develop a better and more efficient solution.
Check out the Project. Don’t forget to join our 21k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
Check Out 100’s AI Tools in AI Tools Club
The post Meta AI Introduces MTIA v1: It’s First-Generation AI Inference Accelerator appeared first on MarkTechPost.
At Meta, AI workloads are everywhere, serving as the foundation for numerous applications like content comprehension, Feeds, generative AI, and ad ranking. Thanks to its seamless Python integration, eager-mode programming, and straightforward APIs, PyTorch can run these workloads. In particular, DLRMs are vital to enhancing user experiences across all of Meta’s products and offerings. The
The post Meta AI Introduces MTIA v1: It’s First-Generation AI Inference Accelerator appeared first on MarkTechPost. Read More AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, Staff, Tech News, Technology, Uncategorized