thedavidsj.bsky.social
Technology and Security Policy fellow @rand.org re: AI compute. Prior: Telemetry database lead & 1st stage landing software @ SpaceX; AWS; Google.
64 posts 513 followers 3,880 following

14.8k output tokens/H800 node/second = 6.7M/GPU hour, close to the 10M/GPU hour I estimated. This puts their cost at 30¢ per million output tokens at $2/GPU hour. x.com/deepseek_ai/...
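The arithmetic behind these figures, as a quick sanity check (the $2/GPU-hour rental rate is the post's assumption, not a quoted price):

```python
tokens_per_node_s = 14_800   # DeepSeek's reported output throughput per H800 node
gpus_per_node = 8            # an H800 node has 8 GPUs
gpu_hour_cost = 2.00         # assumed rental cost in $/GPU-hour

# Convert node throughput to per-GPU-hour throughput
tokens_per_gpu_hour = tokens_per_node_s * 3600 / gpus_per_node

# Cost to generate one million output tokens at that throughput
cost_per_million = gpu_hour_cost * 1e6 / tokens_per_gpu_hour

print(f"{tokens_per_gpu_hour / 1e6:.2f}M tokens/GPU-hour")   # 6.66M tokens/GPU-hour
print(f"${cost_per_million:.2f} per million output tokens")  # $0.30 per million output tokens
```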

I don’t fully understand the reaction to this result. If language models weren’t capable of some generalization, they wouldn’t work at all. Even alignment-specific generalization has been shown since at least InstructGPT. What about this generalization in particular is a big surprise?

And now actually ruled out. NASA: 0.0039%, ESA 0.0016%.

The estimation of orbital parameters is one of the oldest, best studied, and most well understood topics in science. NASA and ESA know what they're doing.

Impact close to ruled out now. NASA says 0.28%, ESA says 0.16%.

The whole "there is so much Social Security fraud! all these SSNs are people over 100 y.o.!" bit is so symptomatic of how they understand nothing about the systems they're dismantling. You can read publicly available audit reports about these SSNs and why they exist: 98% of them receive no payments.

The Vulnerable World Hypothesis definitionally excludes that case: a technological black ball devastates civilization "unless it has exited the 'semi-anarchic default condition'", NOT unless it has failed to exit it.

Currently seems to me that one of the greatest existential risks from AI is authoritarian lock-in by actors in either China or the US, and many AI governance efforts are unintentionally increasing this less visible risk in order to reduce more visible but IMO less likely risks.

"[T]he law actually makes it a criminal offense to reveal that the government even made such a demand. … 'The person deemed it shocking that the UK government was demanding Apple's help to spy on non-British users without their governments' knowledge.'" www.macrumors.com/2025/02/07/u...

This is now up to 2.3%.

40–100 meter asteroid (enough to destroy a metropolitan area) named YR4 with 1.5% chance of Earth impact on Dec 22, 2032. Observation opportunities through April, and if an impact can't be ruled out by then, we need a deflection mission to avoid risks to cities in South Am., Africa, and Asia.

One day later, o3-mini is being priced at $4.40/million output tokens ($2.20 batch).

One big (replicated!) finding on AI doesn’t get enough attention: existing systems like GPT-4 can alter deeply held beliefs using logic & discussion, not manipulation. A short conversation with AI greatly reduces conspiracy theory beliefs (hard to do!) & the effects last months.

Some rough math shows that with large query volumes, DeepSeek-R1 costs about 20¢ per million tokens generated. (They price at $2.19, compared to $12 for o1-mini, which presumably also includes a decent markup.) 1/6

Multi-head Latent Attention is essentially the same as Multi-Query Attention (single KV head), but with a larger head width, same vector for key and value, and low-rank factorization of query/output projection matrices (the K, V "decompression matrices" acting as one factor).
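A toy sketch of the idea (shapes only; all names and dimensions are made up for illustration, not DeepSeek's actual implementation): the cache stores one small shared latent per token, and "decompression" matrices recover a single K/V head shared across all query heads, as in MQA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent, seq = 256, 8, 64, 32, 10

x = rng.standard_normal((seq, d_model))

# Shared low-rank compression: this latent is all the KV cache stores.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Small "decompression" matrices recover K and V from the same latent.
W_uk = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
# Per-head query projections (the low-rank query factorization is omitted here).
W_q = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)

latent = x @ W_down   # (seq, d_latent) -- cached
k = latent @ W_uk     # (seq, d_head), one K head shared by all query heads
v = latent @ W_uv     # (seq, d_head), one V head shared by all query heads

heads = []
for h in range(n_heads):
    q = x @ W_q[h]                                  # (seq, d_head)
    att = q @ k.T / np.sqrt(d_head)                 # (seq, seq)
    att = np.exp(att - att.max(-1, keepdims=True))  # softmax over keys
    att /= att.sum(-1, keepdims=True)
    heads.append(att @ v)                           # (seq, d_head)

y = np.stack(heads, axis=1)                         # (seq, n_heads, d_head)
```

The cache cost per token is d_latent floats instead of n_heads × 2 × d_head, which is where the memory win comes from.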

Two main corrections on DeepSeek: 1. Their work is impressive, but not far outside existing rapid trend in AI efficiency. 2. Improved efficiency drives more demand for compute, not less, just as being able to do more with a steam engine drives demand for more steam engines.

Seems right on all fronts.

General purpose technologies such as for energy, computing, and intelligence tend to have price elasticities of demand below negative one.
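What elasticity below negative one implies, as a toy calculation (the −1.5 figure is illustrative, not from the post): when demand is this elastic, a price cut raises total spending, because quantity grows proportionally faster than price falls.

```python
e = -1.5             # illustrative price elasticity of demand (below -1)
price_change = -0.10 # a 10% price cut

# First-order approximation: % change in quantity = elasticity * % change in price
quantity_change = e * price_change   # +0.15, i.e. quantity up ~15%

# Total spend = price * quantity
spend_multiplier = (1 + price_change) * (1 + quantity_change)
print(f"spending multiplier: {spend_multiplier:.3f}")  # 1.035 -> spending rises ~3.5%
```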

I tried this on GPT-4o, o1, o1-mini, o1-pro, Claude 3.5 Sonnet, Claude 3.0 Opus, Gemini 2.0 Experimental Advanced, Gemini 2.0 Flash Thinking Mode, DeepSeek-V3, and DeepSeek-V3 w/DeepThink. Every "reasoning" model got it right. Every other model got it wrong. Seems notable.

Some pretty impressive folding abilities on display here with π0 trained with FAST. This single model can control lots of different robots and responds to language instruction inputs.

Very cool demonstration of in-context representation learning. This figure says everything. arxiv.org/abs/2501.00070

Claude is honestly such a nice guy.

Imagine just being named Dr. Science.

Stuff like this makes me think we may still have a ways to go.

This was a fun experiment. I made a 56.8% virtual return. elmwealth.com/crystal-ball...

This doesn't seem great, especially for military and intelligence positions.

Wild mammals make up less than 4% of mammalian biomass (!!!) Via @erikbryn.bsky.social on X, from @ourworldindata.org

The final act has started. Syrian rebels moving from all sides into Damascus Center and specifically Assad's Palace.

New paper story time (now out in PNAS)! We developed a method that caused people to learn new categories of visual objects, not by teaching them what the categories were, but by changing how their brains worked when they looked at individual objects in those categories. www.pnas.org/doi/10.1073/...