Sunnyvale, Calif., United States:
Cerebras Systems, the pioneer in accelerating artificial intelligence (AI) compute, today unveiled the Cerebras Wafer-Scale Cluster, delivering near-perfect linear scaling across hundreds of millions of AI-optimized compute cores while avoiding the pain of distributed compute. With a Wafer-Scale Cluster, users can distribute even the largest language models from a Jupyter notebook running on a laptop with just a few keystrokes. This replaces months of painstaking work with clusters of graphics processing units (GPUs).
“Today, the fundamental limiting factor in training large language models is not the AI. It is the distributed compute. The challenge of putting these models on thousands of graphics processing units, and the scarcity of the distributed compute expertise necessary to do so, is limiting our industry’s progress,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “We have solved this challenge. We eliminated the painful steps necessary in distributed computing and instead deliver push-button allocation of work to AI-optimized CS-2 compute, with near-linear performance scaling.”
Large language models (LLMs) are transforming entire industries across healthcare and life sciences, energy, financial services, transportation, entertainment, and more. However, training large models with traditional hardware is difficult and time consuming, and has only successfully been achieved by a few organizations. It requires months of complex distributed computing work before any training can even begin. In fact, training these models is so unusual that successful training runs are frequently deemed worthy of publication.
Cerebras Wafer-Scale Clusters allow users to quickly, simply, and easily build clusters that support the largest LLMs. By exclusively using data parallelism, Cerebras avoids the pain of distributed computing. Instead, Cerebras Wafer-Scale Clusters deliver push-button allocation of work to compute, and linear performance scaling from a single CS-2 up to 192 CS-2 systems. Wafer-Scale Clusters make scaling the largest models dead simple. From a Jupyter notebook on a laptop, the largest of LLMs like GPT-3 can be spread over a cluster of CS-2s with a single keystroke, trained, and the results evaluated. Switching between a 1B, 20B, and 175B parameter model is equally simple, as is allocating an LLM to 850,000 AI cores (1 CS-2), 3.4 million compute cores (4 CS-2s), or 13.6 million cores (16 CS-2s). Each of these actions would have taken months of work on a cluster of graphics processing units.
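As a purely illustrative sketch of what such a push-button workflow might look like from a notebook, consider the Python below. The `cerebras_cluster` module, its function names, and its parameters are hypothetical stand-ins assumed for this example, not the actual Cerebras API:

```python
# Hypothetical sketch only: "cerebras_cluster", connect(), GPTConfig, and
# train() are illustrative stand-ins, not the real Cerebras SDK.
import cerebras_cluster as cc

cluster = cc.connect("wsc-prod-01")       # attach to a Wafer-Scale Cluster
model = cc.GPTConfig(parameters="20B")    # swap to "1B" or "175B" as needed

# Because the cluster runs purely data parallel, the same one-line call
# works whether the job lands on 1 CS-2 or 16 CS-2s.
job = cluster.train(model, num_cs2=4, dataset="s3://my-corpus")
job.wait()
```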
The key to the new Cerebras Wafer-Scale Cluster is the exclusive use of data parallelism. Data parallelism is the preferred approach for all AI work. However, data parallelism requires that all the calculations, including the largest matrix multiplications of the largest layer, fit on a single device, and that all the parameters fit in the device's memory. Only the CS-2, and not graphics processing units, achieves both characteristics for LLMs.
The Cerebras WSE-2 is the largest processor ever built. It is 56 times larger than the largest GPU, with 123 times more cores, 1,000 times more on-chip memory, 12,000 times more memory bandwidth, and 45,000 times more fabric bandwidth. The WSE-2 is the size of a dinner plate, while the largest graphics processing unit is the size of a postage stamp.
The sheer size and computational resources of the WSE-2 allow Cerebras to fit the largest layers of the largest neural networks onto a single device. In fact, the WSE-2 can fit layers 1,000 times larger than the largest layer in the largest existing natural language processing (NLP) network. This means work never needs to be split and spread across multiple processors. Smaller graphics processing units routinely must split work and spread it across many processors.
MemoryX enables Cerebras to disaggregate parameter storage from compute without suffering the penalty usually associated with off-chip memory. Storage for model parameters sits in the separate MemoryX system, while all of the compute is in the CS-2. By disaggregating compute from memory, MemoryX provides nearly unbounded storage for parameters and optimizer states.
MemoryX streams weights to the CS-2, where the activations reside. In return, the CS-2 streams back the gradients. MemoryX uses these, together with stored optimizer parameters, to compute the weight updates for the next training iteration. This process repeats until training is complete. MemoryX enables even a single CS-2 to support a model with trillions of parameters.
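A minimal, runnable NumPy simulation can make this streaming loop concrete. It assumes a single dense layer trained with squared error and plain SGD; the class and function names are illustrative stand-ins, not Cerebras software:

```python
import numpy as np

rng = np.random.default_rng(0)

class MemoryXStore:
    """Stand-in for MemoryX: holds weights and performs the optimizer update."""
    def __init__(self, shape, lr=0.01):
        self.w = rng.normal(scale=0.1, size=shape)
        self.lr = lr

    def stream_weights(self):
        return self.w                      # weights stream out to the compute unit

    def apply_gradients(self, grad):
        self.w -= self.lr * grad           # the update is computed at the store

def cs2_step(w, x, target):
    """Stand-in for the CS-2: activations stay here, only gradients go back."""
    pred = x @ w                           # forward pass (activations live here)
    grad = x.T @ (pred - target) / len(x)  # squared-error gradient (up to a constant)
    return grad

store = MemoryXStore(shape=(8, 4))
x = rng.normal(size=(32, 8))
target = rng.normal(size=(32, 4))
for _ in range(100):                       # stream weights, compute, stream gradients, update
    grad = cs2_step(store.stream_weights(), x, target)
    store.apply_gradients(grad)
```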
While MemoryX provides massive parameter storage, SwarmX connects MemoryX to clusters of CS-2s, enabling CS-2s to scale out and the cluster to run strictly data parallel. SwarmX forms a broadcast/reduce fabric. The parameters stored in MemoryX are replicated in hardware and broadcast across the SwarmX fabric to multiple CS-2s. The SwarmX fabric then reduces the gradients sent back from the CS-2s, providing a single gradient stream to MemoryX.
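Under the same stated assumptions, the broadcast/reduce pattern attributed to SwarmX boils down to two operations: fan the one weight stream out to N workers, and collapse N gradient streams back into one. A toy NumPy version, again purely illustrative:

```python
import numpy as np

def swarmx_broadcast(weights, num_cs2):
    # Replicate the single MemoryX parameter stream once per CS-2.
    return [weights.copy() for _ in range(num_cs2)]

def swarmx_reduce(per_cs2_grads):
    # Collapse many gradient streams into the single stream MemoryX consumes.
    return np.mean(per_cs2_grads, axis=0)

# Each CS-2 would work on its own slice of the global batch (data parallelism),
# so the reduced gradient matches the gradient over the full batch.
w = np.ones((8, 4))
replicas = swarmx_broadcast(w, num_cs2=4)
fake_grads = [r * 0.01 for r in replicas]   # placeholder per-device gradients
single_stream = swarmx_reduce(fake_grads)
```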
Based on the CS-2, MemoryX, and SwarmX, the Cerebras Wafer-Scale Cluster is the only cluster in AI compute that enables strict linear scaling of models with billions, tens of billions, hundreds of billions, and trillions of parameters. If users go from one CS-2 to two CS-2s in a cluster, training time is cut in half. If users go from one CS-2 to four CS-2s, training time is cut to one-fourth. This is an exceptionally rare attribute in cluster computing, and it is profoundly cost and power efficient. Unlike GPU clusters, in a Cerebras cluster, as users add more compute, performance increases linearly.
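The scaling claim itself is simple arithmetic: under strict linear scaling, wall-clock training time for N systems is the one-system time divided by N. A two-line check of the figures quoted above:

```python
def ideal_training_time(t_one_cs2, num_cs2):
    """Strict linear scaling: N systems finish in 1/N the time."""
    return t_one_cs2 / num_cs2

assert ideal_training_time(100.0, 2) == 50.0   # two CS-2s: half the time
assert ideal_training_time(100.0, 4) == 25.0   # four CS-2s: one quarter
```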
With customers in North America, Asia, Europe, and the Middle East, Cerebras is delivering industry-leading AI solutions to a growing roster of customers in the enterprise, government, and high-performance computing (HPC) segments, including GSK, AstraZeneca, TotalEnergies, nference, Argonne National Laboratory, Lawrence Livermore National Laboratory, Pittsburgh Supercomputing Center, Leibniz Supercomputing Centre, National Center for Supercomputing Applications, Edinburgh Parallel Computing Centre (EPCC), National Energy Technology Laboratory, and Tokyo Electron Devices.
The Cerebras Wafer-Scale Cluster is available now. For more information on the Cerebras Wafer-Scale Cluster, please visit https://www.cerebras.net/product-cluster/.
About Cerebras Systems
Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer system, designed for the singular purpose of accelerating AI and changing the future of AI work forever. Our flagship product, the CS-2 system, powered by the world's largest processor, the 850,000-core Cerebras WSE-2, enables customers to accelerate their deep learning work by orders of magnitude over graphics processing units.
View source version on businesswire.com: https://www.businesswire.com/news/home/20220914005200/en/