To put this in context, the framework can train models with more than a trillion parameters on data sets as large as one exabyte, the equivalent of 36,000 years of high-quality video.
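As a rough sanity check on that comparison, the video bitrate it implies can be worked out from the two figures in the sentence. The sketch below is our own back-of-the-envelope arithmetic, not a number Meta published.

```python
# Back-of-the-envelope check of the "1 exabyte = 36,000 years of video" claim.
# The implied bitrate is derived here as an illustration, not a figure from Meta.

SECONDS_PER_YEAR = 365.25 * 24 * 3600        # Julian year, ~31.6 million seconds
EXABYTE_IN_BITS = 1e18 * 8                   # 1 EB expressed in bits

video_seconds = 36_000 * SECONDS_PER_YEAR    # total playback time in the claim
implied_bitrate_bps = EXABYTE_IN_BITS / video_seconds

print(f"Implied bitrate: {implied_bitrate_bps / 1e6:.1f} Mbit/s")
# ~7 Mbit/s, roughly the bitrate of HD streaming video, so the comparison is
# consistent with "high quality" (though not 4K) footage.
```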
This supercomputer is still under construction and will be ready by mid-2022. But "our researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the goal of one day training models with trillions of parameters," they said.
George Niznik, sourcing manager at Meta, explained that RSC was designed and built remotely and under very tight deadlines. “The pandemic and a significant industry chip supply shortage came at precisely the wrong time. We had to make full use of all of our collective skills and experiences to resolve these difficult constraints.
Fundamentally, we have harnessed the best that everyone had to offer in people, technology and partnerships to deliver and illuminate the latest in high performance computing.”
What else can it do?
RSC will help Meta AI researchers build better AI models that can learn from trillions of examples; work across hundreds of languages; analyze text, images and video together; develop new augmented reality tools; and more. The work done with RSC will pave the way for technologies for the next major computing platform: the metaverse, where AI-driven applications and products will play an important role.
It is a large-scale effort that seeks to harness the benefits of advanced AI: “domains such as vision, speech and language will require training increasingly large and complex models, especially for critical use cases such as identifying harmful content. In early 2020 we decided that the best way to accelerate progress was to design new computing infrastructure.”
As RSC moves into its next phase, Meta plans for it to grow bigger and more powerful as it lays the groundwork for the metaverse. Kevin Lee, Meta's technical program manager, confirms that with this first phase it is already one of the largest and highest-performing supercomputers in operation. "By July, we plan to triple the capacity of this data center and have up to 16,000 GPUs (graphics processing units)." The storage system will have a target delivery bandwidth of 16 TB/s and the capacity to scale to exabytes to meet the increased demand.
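Those two planned figures can be read together to see why storage bandwidth matters at this scale. The division below is our own arithmetic over the numbers quoted in the article, assuming the bandwidth were split evenly across GPUs.

```python
# Rough per-GPU data-feed estimate for the planned phase-2 configuration.
# Both inputs come from the article; dividing them is our own illustration.

target_storage_bandwidth_tb_s = 16      # planned delivery bandwidth, TB/s
planned_gpu_count = 16_000              # planned GPU count by July

per_gpu_bandwidth_gb_s = target_storage_bandwidth_tb_s * 1000 / planned_gpu_count
print(f"~{per_gpu_bandwidth_gb_s:.1f} GB/s of training data available per GPU")
# ~1 GB/s per GPU if the bandwidth were shared evenly, which shows why storage
# throughput has to scale alongside the GPU count as the cluster grows.
```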
While the high-performance computing community has been tackling scale for decades, Meta also made sure to have all the necessary security and privacy controls in place to protect data. “Unlike our previous AI research infrastructure, which leveraged only open source and other publicly available data sets, RSC also helps us ensure our research is effectively translated into practice by allowing us to include real-world examples from Meta production systems in model training.”
In other words, the idea is to advance research to identify harmful content on their platforms, as well as investigate embedded AI and multi-modal AI to help improve user experiences in applications. “We believe this is the first time performance, reliability, security, and privacy have been addressed at such a scale,” they concluded.
Is it the best?
For Sergio Gutiérrez, a research professor at the Faculty of Engineering of the Latin American Autonomous University, this Meta supercomputer is equipped with very powerful computing hardware, particularly its 6,000 GPUs. “GPUs give a lot of computing power to any computer in general, so we are talking about very high processing capacity. However, the speed of a supercomputer is evaluated in terms of a unit called the FLOP, which stands for floating-point operation, the arithmetic operations that the processor performs.” Gutiérrez says he would need precise data on the number of processing cores and the capacity measured in FLOPS (floating-point operations per second) to conclude that it is the fastest in the world (see X-ray).
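To make that point concrete, a theoretical peak can be estimated by multiplying the GPU count by per-GPU throughput. The per-GPU figures below are assumptions taken from NVIDIA A100 spec sheets (the accelerator cited in Meta's RSC announcement), not published RSC benchmarks, and rankings such as the TOP500 rely on measured sustained performance rather than these peaks.

```python
# Illustration of Gutiérrez's point: peak speed follows from GPU count times
# per-GPU throughput. The per-GPU numbers assume NVIDIA A100-class accelerators
# and are spec-sheet peaks, not measured benchmark results.

gpu_count = 6_000                    # phase-1 GPU count quoted in the article
fp64_tflops_per_gpu = 19.5           # A100 peak FP64 (Tensor Core), per spec sheet
fp16_tflops_per_gpu = 312.0          # A100 peak FP16/BF16 (Tensor Core)

peak_fp64_pflops = gpu_count * fp64_tflops_per_gpu / 1000
peak_fp16_pflops = gpu_count * fp16_tflops_per_gpu / 1000

print(f"Aggregate peak FP64: ~{peak_fp64_pflops:,.0f} PFLOPS")   # ~117 PFLOPS
print(f"Aggregate peak FP16: ~{peak_fp16_pflops:,.0f} PFLOPS")   # ~1,872 PFLOPS
# TOP500-style rankings use a sustained FP64 benchmark (LINPACK), so a measured
# figure would come in below these theoretical peaks.
```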