glennklockwood.com
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
Disclosures: Employed by Microsoft. I used to work at NERSC/LBNL.
808 posts
1,044 followers
196 following
Regular Contributor
Active Commenter
comment in response to post
Wow, I didn't know there was such dramatic growth. Since the pandemic, though, the TPCs always start super early to catch both the US and Europe. Asia has to suffer through it.
comment in response to post
My money’s on Intel! Jaguar Shores is supposed to be GREAT.
comment in response to post
At ISC, WEKA was on stage claiming (unabashedly) that they are the fastest file system ever. So clearly someone isn’t telling the truth!
comment in response to post
Training with 8-bit precision (in key places) remains an area of hot research and is only now making its way into training meaningful models (e.g., DeepSeek). Training in 4-bit formats is a long way off; for now, it's strictly for inference.
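For intuition, here's a minimal sketch (my own illustration in NumPy, not DeepSeek's actual recipe) of the quantize-dequantize trick used to emulate low-precision matmuls while keeping master weights and accumulation in full precision:

```python
import numpy as np

def fake_quant_8bit(x: np.ndarray) -> np.ndarray:
    """Emulate 8-bit quantization: scale into the int8 range, round, and
    dequantize. Real FP8 training uses hardware formats (E4M3/E5M2) with
    per-tensor scale factors; plain int8 here is just for illustration."""
    scale = max(float(np.max(np.abs(x))), 1e-12) / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def matmul_8bit(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Quantize only the matmul inputs (the "key places"); accumulation and
    # the master copies of a and w stay in float32, as mixed-precision
    # training recipes do.
    return fake_quant_8bit(a) @ fake_quant_8bit(w)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((16, 8)).astype(np.float32)
print("max abs error vs fp32:", float(np.abs(matmul_8bit(a, w) - a @ w).max()))
```

The point of the pattern is that the rounding error is injected only where the hardware saves you time, while everything gradient-sensitive stays in higher precision.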
comment in response to post
I am surprised that I am still conflicted on a couple of papers this year. I just passed my three-year mark at Microsoft, meaning I haven't published anything (at least not in data research) in that long.
comment in response to post
Compute is more performant.
comment in response to post
To be fair, we refer to zettaflops internally when talking about system scale, and those are understood to mean “whatever precision is suitable for training” to capture the capability for a supercomputer to train a new model. But those discussions aren’t for marketing and never see the light of day.
comment in response to post
According to AI marketing rules, FLOPS are additive. So you can just run HPL on each module separately, add them together, and throw it in a press release.
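To spell out that marketing math (all numbers invented for illustration):

```python
# Tongue-in-cheek "additive FLOPS": sum per-module HPL Rmax results as if
# they came from a single run. These values are hypothetical.
module_rmax_pflops = [121.4, 119.8, 120.6]  # separate HPL runs, one per module
print(f"Press-release Rmax: {sum(module_rmax_pflops):.1f} PFLOP/s")
```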
comment in response to post
I couldn't find any rules that describe what's in-bounds for projection, but I'd love to project the system behind Eagle to full scale using a blessed method. We've done our own internal projections, and I think it helps team morale to know how we'd rank when the system is going sideways.
comment in response to post
Interesting. I did not know that was allowed.
comment in response to post
I don't understand. What is top500.org/system/180388/ if not a real run?
comment in response to post
That's interesting, and makes more sense! Can you say how many nodes you used for the HPL run?
If I had to guess, it was around 4,400.
comment in response to post
Also, this report rightly points out that expecting the benefits of AI to appear on the bottom line this early is premature. ChatGPT came out less than three years ago. MAYBE you'd see benefits beginning to appear at the top line, but we're still realizing potential at any cost. Optimization comes later.
comment in response to post
A little disappointed that both keynotes in this workshop were NVIDIA sales engineers presenting short versions of the same talks I saw at GTC. They’re good talks, and the ISC crowd may not have seen them, but they’re really just speakers rehashing others’ slides and stories.
comment in response to post
Looks like things are going well for them. Good performance and very few surprises or gotchas.
comment in response to post
Agreed. Seeing the BriCS folks' talk on early experiences with Grace was high on my list. But I work in AI, so I should probably be in one of the two AI workshops.
comment in response to post
If only there were a project that experimented with cell phone chips as supercomputers, where we could learn about this!
www.montblanc-project.eu
I mean, it was a decade long and even European!
comment in response to post
This is a fascinating peek into the mind of either Yutong Lu or the Chinese supercomputing program (maybe both?).
E.g., I've never heard anyone credibly talk about pooled memory for HPC before. Can't tell if this is just a word-soup slide.
comment in response to post
Juicy deets on the next Chinese exascale #HPC system. Heavier emphasis on lower precision, but at a glance, it doesn't seem like they're going all the way down to FP4. They haven't caught up yet, but they're on their way.
comment in response to post
"National Tsing Hua University" and "Tsinghua University" both won. And today I learned the difference between the two.
comment in response to post
Don’t worry, he was talking up RISC-V
comment in response to post
Nope, the BOF on democratizing AI accelerators for HPC.
comment in response to post
Same here. I came to social media to try to teach people things once in a while. You've always been happy to amplify whoever is helping others grow.
#IamHPCGuru should be a rallying cry for giving new points of view a fair shake.