thedavidsj.bsky.social
Technology and Security Policy fellow @rand.org re: AI compute. Prior: Telemetry database lead & 1st stage landing software @ SpaceX; AWS; Google.
64 posts 513 followers 3,880 following

14.8k output tokens/H800 node/second = 6.7M/GPU hour, close to the 10M/GPU hour I estimated. This puts their cost at 30¢ per million output tokens at $2/GPU hour. x.com/deepseek_ai/...
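The arithmetic behind these figures, as a quick sanity check (the $2/GPU-hour rental rate is the post's assumption, not a quoted price):

```python
tokens_per_node_s = 14_800   # DeepSeek's reported output throughput per H800 node
gpus_per_node = 8            # an H800 node has 8 GPUs
gpu_hour_cost = 2.00         # assumed rental cost in $/GPU-hour

# Convert node throughput to per-GPU-hour throughput
tokens_per_gpu_hour = tokens_per_node_s * 3600 / gpus_per_node

# Cost to generate one million output tokens at that throughput
cost_per_million = gpu_hour_cost * 1e6 / tokens_per_gpu_hour

print(f"{tokens_per_gpu_hour / 1e6:.2f}M tokens/GPU-hour")   # 6.66M tokens/GPU-hour
print(f"${cost_per_million:.2f} per million output tokens")  # $0.30 per million output tokens
```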

I don’t fully understand the reaction to this result. If language models weren’t capable of some generalization, they wouldn’t work at all. Even alignment-specific generalization has been shown since at least InstructGPT. What about this generalization in particular is a big surprise?

And now actually ruled out. NASA: 0.0039%, ESA 0.0016%.

The estimation of orbital parameters is one of the oldest, best studied, and most well understood topics in science. NASA and ESA know what they're doing.

Impact close to ruled out now. NASA says 0.28%, ESA says 0.16%.

The whole "there is so much Social Security fraud! all these SSNs are people over 100 y.o.!" bit is so symptomatic of how they understand nothing about the systems they're dismantling. You can read publicly available audit reports about these SSNs and why they exist: 98% of them receive no payments.

The Vulnerable World Hypothesis definitionally excludes that case: a technological black ball devastates civilization "unless it has exited the 'semi-anarchic default condition'", NOT unless it has failed to exit it.

Currently seems to me that one of the greatest existential risks from AI is authoritarian lock-in by actors in either China or the US, and many AI governance efforts are unintentionally increasing this less visible risk in order to reduce more visible but IMO less likely risks.

"[T]he law actually makes it a criminal offense to reveal that the government even made such a demand. … 'The person deemed it shocking that the UK government was demanding Apple's help to spy on non-British users without their governments' knowledge.'" www.macrumors.com/2025/02/07/u...

This is now up to 2.3%.

40–100 meter asteroid (enough to destroy a metropolitan area) named YR4 with 1.5% chance of Earth impact on Dec 22, 2032. Observation opportunities through April, and if an impact can't be ruled out by then, we need a deflection mission to avoid risks to cities in South Am., Africa, and Asia.

One day later, o3-mini is being priced at $4.40/million output tokens ($2.20 batch).

One big (replicated!) finding on AI doesn’t get enough attention: existing systems like GPT-4 can alter deeply held beliefs using logic & discussion, not manipulation. A short conversation with AI greatly reduces conspiracy theory beliefs (hard to do!) & the effects last months.

Some rough math shows that with large query volumes, DeepSeek-R1 costs about 20¢ per million tokens generated. (They price at $2.19, compared to $12 for o1-mini, which presumably also includes a decent markup.) 1/6

Multi-head Latent Attention is essentially the same as Multi-Query Attention (single KV head), but with a larger head width, same vector for key and value, and low-rank factorization of query/output projection matrices (the K, V "decompression matrices" acting as one factor).
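A toy sketch of the idea (shapes only; all names and dimensions are made up for illustration, not DeepSeek's actual implementation): the cache stores one small shared latent per token, and "decompression" matrices recover a single K/V head shared across all query heads, as in MQA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent, seq = 256, 8, 64, 32, 10

x = rng.standard_normal((seq, d_model))

# Shared low-rank compression: this latent is all the KV cache stores.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Small "decompression" matrices recover K and V from the same latent.
W_uk = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
# Per-head query projections (the low-rank query factorization is omitted here).
W_q = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)

latent = x @ W_down   # (seq, d_latent) -- cached
k = latent @ W_uk     # (seq, d_head), one K head shared by all query heads
v = latent @ W_uv     # (seq, d_head), one V head shared by all query heads

heads = []
for h in range(n_heads):
    q = x @ W_q[h]                                  # (seq, d_head)
    att = q @ k.T / np.sqrt(d_head)                 # (seq, seq)
    att = np.exp(att - att.max(-1, keepdims=True))  # softmax over keys
    att /= att.sum(-1, keepdims=True)
    heads.append(att @ v)                           # (seq, d_head)

y = np.stack(heads, axis=1)                         # (seq, n_heads, d_head)
```

The cache cost per token is d_latent floats instead of n_heads × 2 × d_head, which is where the memory win comes from.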

Two main corrections on DeepSeek: 1. Their work is impressive, but not far outside existing rapid trend in AI efficiency. 2. Improved efficiency drives more demand for compute, not less, just as being able to do more with a steam engine drives demand for more steam engines.

Seems right on all fronts.

General purpose technologies such as for energy, computing, and intelligence tend to have price elasticities of demand below negative one.
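What elasticity below negative one implies, as a toy calculation (the −1.5 figure is illustrative, not from the post): when demand is this elastic, a price cut raises total spending, because quantity grows proportionally faster than price falls.

```python
e = -1.5             # illustrative price elasticity of demand (below -1)
price_change = -0.10 # a 10% price cut

# First-order approximation: % change in quantity = elasticity * % change in price
quantity_change = e * price_change   # +0.15, i.e. quantity up ~15%

# Total spend = price * quantity
spend_multiplier = (1 + price_change) * (1 + quantity_change)
print(f"spending multiplier: {spend_multiplier:.3f}")  # 1.035 -> spending rises ~3.5%
```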

I tried this on GPT-4o, o1, o1-mini, o1-pro, Claude 3.5 Sonnet, Claude 3.0 Opus, Gemini 2.0 Experimental Advanced, Gemini 2.0 Flash Thinking Mode, DeepSeek-V3, and DeepSeek-V3 w/DeepThink. Every "reasoning" model got it right. Every other model got it wrong. Seems notable.

Some pretty impressive folding abilities on display here with π0 trained with FAST. This single model can control lots of different robots and responds to language instruction inputs.

Very cool demonstration of in-context representation learning. This figure says everything. arxiv.org/abs/2501.00070

Claude is honestly such a nice guy.

Imagine just being named Dr. Science.

Stuff like this makes me think we may still have a ways to go.

This was a fun experiment. I made a 56.8% virtual return. elmwealth.com/crystal-ball...

This doesn't seem great, especially for military and intelligence positions.

Wild mammals make up less than 4% of mammalian biomass (!!!) Via @erikbryn.bsky.social on X, from @ourworldindata.org

The final act has started. Syrian rebels moving from all sides into Damascus Center and specifically Assad's Palace.

New paper story time (now out in PNAS)! We developed a method that caused people to learn new categories of visual objects, not by teaching them what the categories were, but by changing how their brains worked when they looked at individual objects in those categories. www.pnas.org/doi/10.1073/...