glennklockwood.com
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
Disclosures: Employed by Microsoft. I used to work at NERSC/LBNL.
808 posts
1,044 followers
196 following
Regular Contributor
Active Commenter
comment in response to post
Wow, I didn't know there was such dramatic growth. Since the pandemic, though, the TPCs always start super early to catch both the US and Europe. Asia has to suffer through it.
comment in response to post
My money’s on Intel! Jaguar Shores is supposed to be GREAT.
comment in response to post
At ISC, WEKA was on stage claiming (unabashedly) that they are the fastest file system ever. So clearly someone isn’t telling the truth!
comment in response to post
Training with 8-bit precision (in key places) remains an area of hot research and is only now making its way into training meaningful models (e.g., DeepSeek). Training in 4-bit formats is a long way off; for now, it's strictly for inference.
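For intuition, here's a minimal sketch (my own illustration in NumPy, not DeepSeek's actual recipe) of the quantize-dequantize trick used to emulate low-precision matmuls while keeping master weights and accumulation in full precision:

```python
import numpy as np

def fake_quant_8bit(x: np.ndarray) -> np.ndarray:
    """Emulate 8-bit quantization: scale into the int8 range, round, and
    dequantize. Real FP8 training uses hardware formats (E4M3/E5M2) with
    per-tensor scale factors; plain int8 here is just for illustration."""
    scale = max(float(np.max(np.abs(x))), 1e-12) / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def matmul_8bit(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Quantize only the matmul inputs (the "key places"); accumulation and
    # the master copies of a and w stay in float32, as mixed-precision
    # training recipes do.
    return fake_quant_8bit(a) @ fake_quant_8bit(w)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((16, 8)).astype(np.float32)
print("max abs error vs fp32:", float(np.abs(matmul_8bit(a, w) - a @ w).max()))
```

The point of the pattern is that the rounding error is injected only where the hardware saves you time, while everything gradient-sensitive stays in higher precision.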
comment in response to post
I am surprised that I am still conflicted on a couple of papers this year. I just passed my three-year mark at Microsoft, meaning I haven't published anything (at least not in data research) in that long.
comment in response to post
Compute is more performant.
comment in response to post
To be fair, we refer to zettaflops internally when talking about system scale, and those are understood to mean “whatever precision is suitable for training” to capture the capability for a supercomputer to train a new model. But those discussions aren’t for marketing and never see the light of day.
comment in response to post
According to AI marketing rules, FLOPS are additive. So you can just run HPL on each module separately, add them together, and throw it in a press release.
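To spell out that marketing math (all numbers invented for illustration):

```python
# Tongue-in-cheek "additive FLOPS": sum per-module HPL Rmax results as if
# they came from a single run. These values are hypothetical.
module_rmax_pflops = [121.4, 119.8, 120.6]  # separate HPL runs, one per module
print(f"Press-release Rmax: {sum(module_rmax_pflops):.1f} PFLOP/s")
```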
comment in response to post
I couldn't find any rules that describe what's in-bounds for projection, but I'd love to project the system behind Eagle to full scale using a blessed method. We've done our own internal projections, and I think it helps team morale to know how we'd rank when the system is going sideways.
comment in response to post
Interesting. I did not know that was allowed.
comment in response to post
I don't understand. What is top500.org/system/180388/ if not a real run?
comment in response to post
That's interesting, and makes more sense! Can you say how many nodes you used for the HPL run?
If I had to guess, it was around 4,400.
comment in response to post
Also, this report rightly points out that expecting the benefits of AI to appear on the bottom line this early is premature. ChatGPT came out less than three years ago. MAYBE you'd see benefits beginning to appear at the top line, but we're still realizing potential at any cost. Optimization comes later.
comment in response to post
A little disappointed that both keynotes in this workshop were NVIDIA sales engineers presenting short versions of the same talks I saw at GTC. They’re good talks, and the ISC crowd may not have seen them, but they’re really just speakers rehashing others’ slides and stories.
comment in response to post
Looks like things are going well for them. Good performance and very few surprises or gotchas.
comment in response to post
Agreed. Seeing the BriCS folks' talk on early experiences with Grace was high on my list. But I work in AI, so I should probably be in one of the two AI workshops.
comment in response to post
If only there were a project that experimented with cell phone chips as supercomputers, where we could learn about this!
www.montblanc-project.eu
I mean, it was a decade long and even European!
comment in response to post
This is a fascinating peek into the mind of either Yutong Lu or the Chinese supercomputing program (maybe both?).
E.g., I've never heard anyone credibly talk about pooled memory for HPC before. Can't tell if this is just a word-soup slide.
comment in response to post
Juicy deets on the next Chinese exascale #HPC system. Heavier emphasis on lower precision, but at a glance, it doesn't seem like they're going all the way down to FP4. They haven't caught up yet, but they're on their way.
comment in response to post
"National Tsing Hua University" and "Tsinghua University" both won. And today I learned the difference between the two.
comment in response to post
Don’t worry, he was talking up RISC-V
comment in response to post
Nope, the BOF on democratizing AI accelerators for HPC.
comment in response to post
Same here. I came to social media to try to teach people things once in a while. You've always been happy to amplify whoever is helping others grow.
#IamHPCGuru should be a rallying cry for giving new points of view a fair shake.