An engineer who worked at Twitter during the dramatic transition from Agrawal to Musk has publicly reminisced about discovering a cluster of 700 Nvidia V100 GPUs. Tim Zaman, now a software engineer at Google DeepMind, found this sizable pool of GPU power powered on but sitting idle in a data center belonging to X's predecessor.
A few weeks after the Twitter acquisition in 2022, we discovered 700 V100 GPUs (PCIe, lol) in a data center. Powered on and idle, and had been for a long time. Forgotten remnants of a good-faith attempt to create a cluster within Twitter 1.0. Times have changed. 100,000 GPUs… https://t.co/zSChG0BvVZ (July 22, 2024)
The lump of Nvidia silicon and PCBs warmly humming in a Twitter data center was poetically described by Zaman in a Twitter/X post on Monday as the "forgotten remnants of a good-faith attempt to create a cluster within Twitter 1.0." The engineer was prompted to write about his unexpected discovery of this silicon treasure trove after reading that xAI's Memphis Supercluster had begun training Grok 3, powered by 100,000 liquid-cooled Nvidia H100 accelerators on a single RDMA fabric.
Zaman highlighted what many of you are probably thinking: at Twitter, 700 of what were once the world's most powerful data center GPUs had been sitting idle for years. "Times have changed!" he exclaimed. Indeed, the first Nvidia Volta architecture V100 GPUs for data centers started appearing on the market during the first major GPU shortage in 2017, yet in mid-2022 Zaman found a cluster of 700 of these cards doing nothing at all. That's a lot of wasted compute time and resources.
Zaman also noted, with some wry amusement, that the 700 Nvidia V100s were PCIe GPUs rather than the SXM2 form factor variety, which offers a much higher-bandwidth NVLink interface. Of course, we don't know, and probably never will, why Twitter opted for PCIe rather than SXM2 V100 GPUs for this large installation back in 2017.
Zaman's tweet also contained an interesting observation about Musk's new "gigafactory of computing." "Running 100,000 GPUs on the same fabric must be an epic challenge," the engineer commented. "At that scale, the only guarantee is failure, and it's all about proper failure management." With this in mind, Zaman mused about splitting resources into separate failure domains so that a single fault can't take down the entire system.
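To make that idea concrete, here is a minimal, hypothetical Python sketch of a failure-domain approach, assuming a fixed domain size and a checkpoint-and-restart policy. It is purely illustrative and not a description of xAI's actual cluster management; all names and numbers are invented for the example.

```python
# Hypothetical sketch: group GPUs into independent failure domains so that a
# failed node only forces a restart of its own domain, not the whole job.

from dataclasses import dataclass, field


@dataclass
class FailureDomain:
    domain_id: int
    gpu_ids: list[int]
    healthy: bool = True


def build_domains(total_gpus: int, domain_size: int) -> list[FailureDomain]:
    """Split the full GPU pool into fixed-size failure domains."""
    return [
        FailureDomain(domain_id=i, gpu_ids=list(range(start, start + domain_size)))
        for i, start in enumerate(range(0, total_gpus, domain_size))
    ]


def handle_gpu_failure(domains: list[FailureDomain], failed_gpu: int) -> None:
    """Quarantine only the affected domain; all other domains keep training."""
    for domain in domains:
        if failed_gpu in domain.gpu_ids:
            domain.healthy = False
            print(f"Domain {domain.domain_id} quarantined; "
                  f"restart its {len(domain.gpu_ids)} GPUs from the last checkpoint.")
            return


if __name__ == "__main__":
    # Illustrative scale: 100,000 GPUs split into 100 domains of 1,000 GPUs each.
    domains = build_domains(total_gpus=100_000, domain_size=1_000)
    handle_gpu_failure(domains, failed_gpu=42_517)  # simulate a single GPU fault
    print(sum(d.healthy for d in domains), "of", len(domains), "domains still healthy")
```

The design choice the sketch illustrates is simply that the blast radius of any single failure is bounded by the domain size, which is the kind of trade-off Zaman was alluding to.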
Engineers were also intrigued by how many GPUs can reside on a single fabric. As tech giants race to scale up their AI training clusters, that race is sure to uncover both predictable and unexpected limits on the number of GPUs that can share one fabric.