emollick.bsky.social • 105 days ago

Most of the talk around AI and energy use refers to an older 2020 estimate of GPT-3 energy consumption, but a more recent paper directly measures the energy use of Llama 65B as 3-4 joules per decoded token.

So an hour of streaming Netflix is equivalent to 70,000-90,000 tokens from the 65B model. https://arxiv.org/pdf/2310.03003
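
The conversion behind this comparison can be sketched as below. The post does not say which figure it uses for an hour of streaming; the 0.077 kWh value is an assumption, picked only because it reproduces the 70,000-90,000 range. The function name is hypothetical.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules

# ASSUMPTION: energy for one hour of streaming, in kWh. Not stated in the
# post; chosen because it reproduces the quoted token range.
STREAMING_KWH_PER_HOUR = 0.077


def tokens_per_streaming_hour(joules_per_token, streaming_kwh=STREAMING_KWH_PER_HOUR):
    """Number of decoded tokens that use the same energy as one streaming hour."""
    return streaming_kwh * JOULES_PER_KWH / joules_per_token


# The paper's range is 3-4 J per decoded token for Llama 65B.
low = tokens_per_streaming_hour(4.0)   # more energy per token -> fewer tokens
high = tokens_per_streaming_hour(3.0)  # less energy per token -> more tokens
print(f"{low:,.0f} to {high:,.0f} tokens per streaming hour")
```

Under that assumption the result lands at roughly 69,000 to 92,000 tokens, which rounds to the range quoted in the post. A different streaming estimate scales the answer proportionally.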

Comments

shriram.bsky.social•105 days ago

Actually, Wim Vanderbauwhede has written up a few things on this, so I'm not sure their claim to be the first is accurate. I wonder how their numbers compare. He's been computing on a 1000-token basis.

https://wimvanderbauwhede.codeberg.page/

emollick.bsky.social•105 days ago

Looks like he is using the 2020 GPT-3 numbers I referred to in the post, not measuring energy use directly.

shriram.bsky.social•105 days ago

Sorry, wasn't at keyboard last night. Here's a paper he shared with me (by other authors) that is measuring energy (Joules/token) directly and does modern models.
https://ieeexplore.ieee.org/document/10549890

russellrukin.bsky.social•105 days ago

The larger conversation is about how we integrate AI LLMs into a flexible energy infrastructure. Unfortunately, they are a poorer (sub-par) flexible demand load than Bitcoin mining (which can switch off and on instantly) and slightly less location-agnostic, but they do integrate well into the thermal demand chain.

davidmanheim.alter.org.il•105 days ago

Also, even the newer estimates often use outdated assumptions internally. So they often say things like "GPT-4 consumes 4x as much energy as GPT-3.5," because they require 4x the compute - skipping the fact that there are newer chips and optimizations being used that lower energy usage.

nafnlaus.bsky.social•105 days ago

Not to mention that almost nobody uses GPT-4 (~1.8T); they generally use GPT-4o (~200B) or GPT-4o-mini (~8B).

You have three simultaneous axes of improvements:

* Hardware that does more FLOPS per joule
* Models that do the same work at smaller sizes
* Inference servers that run faster

tmartens.bsky.social•105 days ago

Do GPUs in idle use more/less power than CPUs in idle?

emollick.bsky.social•105 days ago

This does not count training, which was estimated at a little above 500,000 kWh, about 18 hours of a Boeing 737 in flight (or 571 years of a TV streaming Netflix)
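
As a rough check, the per-hour rates implied by these equivalences can be backed out from the figures in the post alone. This is a sketch: the variable names are hypothetical, and the last step reuses the 3-4 J per decoded token figure from the original post to express training in token terms.

```python
HOURS_PER_YEAR = 24 * 365
JOULES_PER_KWH = 3.6e6

TRAINING_KWH = 500_000   # "a little above 500,000 kWh" per the post
FLIGHT_HOURS = 18        # Boeing 737 equivalence from the post
STREAMING_YEARS = 571    # TV-streaming equivalence from the post

# Rates the two comparisons imply (derived, not independently sourced).
implied_737_kwh_per_hour = TRAINING_KWH / FLIGHT_HOURS
implied_streaming_watts = TRAINING_KWH / (STREAMING_YEARS * HOURS_PER_YEAR) * 1000

# Training energy expressed as decoded tokens at 4 J and 3 J per token.
tokens_at_4j = TRAINING_KWH * JOULES_PER_KWH / 4.0
tokens_at_3j = TRAINING_KWH * JOULES_PER_KWH / 3.0

print(f"737: ~{implied_737_kwh_per_hour:,.0f} kWh per flight hour")
print(f"TV streaming: ~{implied_streaming_watts:.0f} W continuous")
print(f"Training ~= {tokens_at_4j:.2e} to {tokens_at_3j:.2e} decoded tokens")
```

So the comparison implies a continuous draw of about 100 W for the TV streaming case, and the training run corresponds to somewhere around 450 to 600 billion decoded tokens at the paper's inference rate.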

baaastien.bsky.social•105 days ago

Boeing 737 in flight : with or without doors ? 🙃

emollick.bsky.social•105 days ago

o1 will almost certainly use a lot more energy per (output) token; GPT-4o-mini or Gemini Flash or the other free models will use a lot less.

And, of course, in aggregate AI data centers will have a real environmental impact.

emollick.bsky.social•105 days ago

(a token is roughly a word in length).

emollick.bsky.social•105 days ago

Some evidence of further energy improvements (10x!) on Llama 3.3 70B. https://bsky.app/profile/lhl.bsky.social/post/3lfm2wxj4q224

nafnlaus.bsky.social•105 days ago

Yeah, October 2023 as a *publication date* isn't at all a modern estimate.

Nowadays we have faster hardware, much smaller models for a given capability, and inference techniques like speculative decoding, all applied multiplicatively to each other. You can't take dated numbers seriously.
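
To illustrate what "applied multiplicatively" means here, a minimal sketch follows. The factor values are HYPOTHETICAL placeholders, not measurements, and the function name is made up for the example; only the 4 J/token baseline comes from the paper's quoted range.

```python
from math import prod


def combined_energy_per_token(baseline_j_per_token, improvement_factors):
    """Divide the baseline by the product of independent improvement factors."""
    return baseline_j_per_token / prod(improvement_factors.values())


# HYPOTHETICAL factors for illustration only: each one is "how many times
# less energy per token" that axis contributes on its own.
factors = {
    "hardware": 2.0,        # more FLOPS per joule
    "smaller_model": 2.0,   # same capability at a smaller size
    "inference": 2.0,       # e.g. speculative decoding, better serving
}

result = combined_energy_per_token(4.0, factors)
print(f"{result} J/token after a combined {prod(factors.values()):.0f}x gain")
```

Three modest 2x gains compound to 8x rather than adding up to 6x, which is why a figure from even a year or two earlier can be far off.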

nafnlaus.bsky.social•105 days ago

These sorts of studies that make claims about efficiency that *anyone running on a cheap gaming GPU can demonstrably beat* are immensely frustrating. It's like someone publishing "The Earth is flat!" when I'm in the middle of a flight around the world.

russellrukin.bsky.social•105 days ago

Efficiency with AI is meaningless while it is in an expansionist phase, and it will continue to expand for quite a while. Unlike Bitcoin PoW, it has no difficulty adjustment to keep this process in a self-sharpening check.

emollick.bsky.social•105 days ago

I also agree with @simonwillison.net that the labs should be clearly publishing this data, since environmental impact remains one of the most common, confusing and misleading debates about the impact of AI.

lhl.bsky.social•105 days ago

One thing to note is that tokenizer efficiency also continues to improve for non-EN languages: https://github.com/shisa-ai/shisa-v2/blob/main/eval/tokenizer-efficiency/tokenizer-eval-ja.md - Llama 3 is 50% more tokens/word in JA than Llama/Llama2 for example.

pokepocalypse.bsky.social•105 days ago

Yeah, and that's one training run. But sometimes the result isn't an improvement, so they may need to tweak things and iterate. So sometimes you multiply that cost by 2x, 3x, however many times it takes. Of course they try to minimize that.

lonestargooner.bsky.social•105 days ago

Was this published in a peer-reviewed journal?

ariesta.id•105 days ago

not yet. but the good thing is Meta Llama is open weight. Anyone can download it into their own server and measure the energy usage themselves

ariesta.id•105 days ago

correction, it was: https://bsky.app/profile/jojjjajjr.bsky.social/post/3lflqysm6n22a

lonestargooner.bsky.social•105 days ago

Thank you for clarifying, but anecdotal analysis isn’t how science is done.

Like, we don’t let folk calculate fuel efficiency performance or standards by using their own vehicles.

Or, we don’t let individuals test drug efficacy by testing said drug on themselves.

ariesta.id•105 days ago

That's a good precaution. If you doubt the result, do it yourself, find sponsors or fund other researchers. Here, the researchers shared their setup.

Lucky it's an open weight model, isn't it? Unlike ChatGPT, Claude, or Gemini. Anyone can buy the hardware & download the weights if they care enough

jojjjajjr.bsky.social•105 days ago

Looks like it’s from an IEEE conference proceedings. I get the impression comp sci people don’t publish in journals quite the same way physicists and mathematicians do. The arXiv is a “preprint” server where you can find a lot of articles without the paywall

https://ieeexplore.ieee.org/document/10363447

segyges.bsky.social•105 days ago

conference proceedings are our journals, IEEE is roughly as published as it is humanly possible to be in CS

shriram.bsky.social•105 days ago

Depends on which IEEE venue — they can vary a lot in quality.

lonestargooner.bsky.social•105 days ago

Please say more.

nparikh.org•105 days ago

IEEE is very strange because its name is on absolute top tier proceedings (IEEE Transactions on Information Theory is one) and also absolute junk

shriram.bsky.social•105 days ago

Okay, I think you have enough answers by now.

tdietterich.bsky.social•105 days ago

There are many IEEE affiliated conferences around the world, and their quality is highly variable. This particular one was a virtual conference in 2023 (and will be again in 2024). It focuses on high-performance computing, but it is a fairly random collection of papers. Interpret cautiously

lonestargooner.bsky.social•105 days ago

The issue isn’t publication. It’s whether the study has been subjected to the critical review and examination that is customary in peer-reviewed journals.

The article review committee in such journals questions everything from the calculations to the hypotheses to the experimentation.

segyges.bsky.social•105 days ago

yes and that is what happens when you submit to IEEE and before you get published in the proceedings

emollick.bsky.social•105 days ago

Conference papers typically go through review (though I think your opinion of what peer review catches is a little too optimistic), but the good thing about this particular paper is that all of the data is public using an open weights model. It also matches other calculations based on GPU usage.

lonestargooner.bsky.social•105 days ago

Mate, thanks but I’ve served on journal article review committees…I’m fully aware of their limitations and strengths.

Making experimental data public doesn’t insulate scientific-sounding conclusions from verification.

But I honestly do appreciate your explanation and engagement with my question.

lonestargooner.bsky.social•105 days ago

Thank you very much.

It’s always tricky to draw conclusions from a study that hasn’t yet had a neutral third party check its work.

jojjjajjr.bsky.social•105 days ago

Very true!

I’m not sure if this is still his “beat” but Liam Kofi Bright (@lastpositivist.bsky.social) has written interesting stuff about peer review. His work is absolutely worth checking out.

https://www.thebsps.org/short-reads/peer-review-bright-heesen/

lastpositivist.bsky.social•105 days ago

Ty 😊

lonestargooner.bsky.social•105 days ago

Cheers! I’ll check it out.

lonestargooner.bsky.social•105 days ago

Here’s a 🧵 for @edzitron.com to pull on.

exodyne.bsky.social•105 days ago

I’m curious how that compares to a 3 hour GPU-assisted gaming session.

lhl.bsky.social•105 days ago

Methodology on this paper is better than prior estimates I've seen, but the numbers are already quite dated. I just ran some OOTB vLLM tests on an H100 node: it serves Llama 3.3 70B FP8 at ~0.4 joule/tok (w/o sd, prefix caching). Jevons paradox, but I don't see a slowdown in eff gains: https://gist.github.com/lhl/bf81a9c7dfc4244c974335e1605dcf22
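
For scale, here is a small sketch comparing the ~0.4 J/token figure with the paper's 3-4 J/token range. The two measurements use different models, precision, and hardware, so the ratio is indicative only; the variable names are hypothetical.

```python
JOULES_PER_WH = 3600.0  # 1 Wh = 3600 J

PAPER_J_PER_TOKEN = (3.0, 4.0)  # Llama 65B range quoted from the paper
NEW_J_PER_TOKEN = 0.4           # Llama 3.3 70B FP8 on H100, per the post above

# How many times less energy per token the newer figure represents.
ratio_low = PAPER_J_PER_TOKEN[0] / NEW_J_PER_TOKEN
ratio_high = PAPER_J_PER_TOKEN[1] / NEW_J_PER_TOKEN

# Energy for one million decoded tokens at the newer rate, in watt-hours.
wh_per_million_tokens = NEW_J_PER_TOKEN * 1_000_000 / JOULES_PER_WH

print(f"~{ratio_low:.1f}x to ~{ratio_high:.0f}x less energy per token")
print(f"~{wh_per_million_tokens:.0f} Wh per million tokens")
```

That works out to roughly a 7.5x to 10x reduction, or about 111 Wh per million decoded tokens at the newer rate.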

wwydmanski.bsky.social•105 days ago

I'd think that o1 would use ~as much energy per token as GPT4o, but produce much much more of them
