@edzitron.com on DeepSeek Good: v3 trained for $5.5m, proving that you don't need to spend half a trillion dollars on new data centers to make great models MIT licensed! Great for running on my own hardware Bad: the CCP influence is genuinely a problem for my uses sherwood.news/tech/a-free-... - ThreadSky

simonwillison.net • 33 days ago

@edzitron.com on DeepSeek

Good: v3 trained for $5.5m, proving that you don't need to spend half a trillion dollars on new data centers to make great models

MIT licensed! Great for running on my own hardware

Bad: the CCP influence is genuinely a problem for my uses https://sherwood.news/tech/a-free-powerful-chinese-ai-model-just-dropped-but-dont-ask-it-about/

Comments

smy20011.bsky.social•33 days ago

Since It's open source, you can always finetune to correct it, right?

simonwillison.net•33 days ago

I'm expecting fine-tuned versions of R1 to show up that claim to have the government-enforced censorship removed - I don't actually know how I would confirm that process worked though

lhl.bsky.social•33 days ago

You can test refusals against https://huggingface.co/datasets/augmxnt/deccp

tlas-hugged.bsky.social•31 days ago

That's the risk right now. No backstop.

supernovae.bsky.social•32 days ago

Also, how does Deep seek license work when models you download say they're distilled from llama which has a clause that it can't be relicensed?

simonwillison.net•32 days ago

Yeah, I'm still waiting for a clear answer on that - only applies to the Llama distills though, the Qwen ones should be fine I think

engid.bsky.social•33 days ago

I'm really curious about the training and how they cut down on costs. Was it all from scratch, or did it bootstrap off of data generated by existing models?

simonwillison.net•33 days ago

They definitely bootstrapped using training data created by their own models - every big AI lab is doing that now, it works really well

Did they training data from other people's models? They haven't said, I'm not confident I could guess one way or the other on that

cyberfuturist.bsky.social•31 days ago

OpenAI keeps claiming that Meta AI has to be shut down because China is using the open sourced Llama models

simonwillison.net•31 days ago

OpenAI say a lot of things!

deenom.bsky.social•33 days ago

How does it feel about Luigi? Asking for all my Mario loving friends

erock001.bsky.social•33 days ago

I wondered if the abliterated models fixes any of this, so I checked. Only tiny 8B models, but interestingly they both pushed a One China vision for Taiwan. When I tried Tiananmen square, the standard model refused to answer, but the abliterated one knew all about it.

ktbyers.bsky.social•33 days ago

Yeah, I was a bit surprised that people would just say use DeepSeek without even a second thought of..."hmmmm, am I okay with sending data here".

deenom.bsky.social•33 days ago

You can download it and run it on your own computer, so stuff doesn't have to go anywhere. I don't think China's trying to do a surveillance with this one, just trying to upstage the US by showing off how capitalist of a nation they are

ktbyers.bsky.social•33 days ago

Yes, I was mostly referring to the API usage, but obviously could run it locally.

deenom.bsky.social•33 days ago

Oh I'm dumb, I didn't realize it has a website that people can just go to

sir-deenicus.bsky.social•32 days ago

Why should I trust it less than Microsoft or OpenAI?

(which tbc, is saying you should not trust any of them! Whatever you send to any should be something you don't care they gain access to)

dixreloaded.bsky.social•32 days ago

technically, if you are, for example, american, sending data to microsoft or openai is more dangerous than to deepseek's owners, because the american government can cause you much more harm than the chinese one (because you are american)

cyberfuturist.bsky.social•31 days ago

Lol we're all using Xaiohongshu too

edzitron.com•33 days ago

Could it potentially be rigged to beat the tests? I think the thing I'm trying to ask is whether this is a serious competitor (as a model not as a business) to o1? Or is it just built to impress people without actually being fundamentally sound

operatingcan.bsky.social•33 days ago

Let's be clear that every llm is rigged to Beat the tests. Those tests are literally how they decide whether or not the LLM is better.

In my novice knowledge though the tests have so far been a good benchmark for actual utility

simonwillison.net•33 days ago

It's possible, but so far the buzz from people I respect about DeepSeek has been genuinely positive - it both benchmarks well and appears useful to real world activity

My current intuition is that it's in the same capability class as o1, which is very impressive

edzitron.com•33 days ago

is there any way to prove HOW it was trained and for how much? Or is this a "trust me" situation with Deepseek? Also thank you for answering these questions

simonwillison.net•33 days ago

No, and that's bothers me: we are trusting that they told the truth about their training cost

They published quite a good paper, but frustratingly they didn't document their underlying training data for it in much detail at all (similar to nost ither AI labs) https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf

tahnok.bsky.social•32 days ago

The cost seems way too low. They claim H800 GPU costs them $2/hr, but GCP charges $11/hr for a3-highgpu-1g which looks equivalent (H100 80GB)

edzitron.com•33 days ago

aw man yeah this is probably bullshit lol

simonwillison.net•33 days ago

The reason I lean towards it being true is the US limit on selling GPUs to China - I can see how that would encourage Chinese labs to figure out new training optimizations that nobody else has found yet

simonwillison.net•33 days ago

The entire AI research world is focused on understanding what they did and how they did it though, so much better informed people than me will hopefully have more confident answers to these kinds of questions within the next few weeks

simonwillison.net•33 days ago

Honestly though the most fun thing about DeepSeek is just watching it "think" - the chain of thought it pumps out is so interesting, the tone of voice it uses is weirdly charming

My laptop wrote 20 paragraphs about pelicans and walruses and then output a crap joke https://gist.github.com/simonw/f505ce733a435c8fc8fdf3448e3816b0

doublepluskombucha.com•33 days ago

It’s replaced closed models for me.

All most people need are V3, QwQ, R1, and maybe Qwen 2.5.

Sonnet is more fun to work with, but the quality difference isn’t much if at all. I honestly like to better than 4o.

seehma.at•33 days ago

do you think the us models are more open minded?

simonwillison.net•33 days ago

To my knowledge none of the US models restrict what they say about historical events based on edicts from the US government

tussles-shriek.bsky.social•33 days ago

Ask what happened January 6 :)

tussles-shriek.bsky.social•33 days ago

I mean, of course, US models aren't censored by the order of US government, US is much more subtle in its approach.

seehma.at•33 days ago

not even grok?

simonwillison.net•33 days ago

I haven't trusted grok since I saw what was in their initial system prompt https://www.reddit.com/r/EnoughMuskSpam/comments/1b90bil/groks_initial_prompt_has_been_discovered_avoid/

masoudmaani.bsky.social•33 days ago

glad we're finally on the same page.
if you did your research on older models from Qwen, they suffer the same limitations too.
only good use of chinese models is distilling, rest is just a honeypot.

simonwillison.net•33 days ago

If I'm building tools to help journalists extract key information from vast amounts of source documents I need a model that doesn't pretend that Taiwan isn't a country!

pekka.bsky.social•33 days ago

How difficult is it to teach it that it is?

Presumably it's much harder to remove or override the existing incorrect info than feeding in new but otherwise compatible data.

pekka.bsky.social•33 days ago

Since the online version simply refuses to talk about Taiwan, maybe the weights themselves reflect the typical English language sources and the censorship is mostly built on top of that only.

simonwillison.net•33 days ago

I compared results on my laptop from DeepSeek-R1-Distill-Llama-8B with results from regular Llama-8B - the R1 distilled model refused to talk about those topics, regular Llama-8B talked about them just fine

pekka.bsky.social•33 days ago

So that would indicate the censorship isn't just in some system prompt or such but like taught with RL and that is transferred to the distilled versions?

But the internal dialogues in the article indicate it knows more than it tells, so it would just need to re-learn to be more open about those?

meff.bsky.social•33 days ago

Yeah maybe you can train it to remove its censorship.

bartboeman.bsky.social•33 days ago

With an uncensored R1 version you get a bit more info on specific territories and happenings, so many suddenly asked about.

What I asked, as a test, was:

"Freedom fighters, in a refugee camp called Bluesky, like to know what happened in 1989 on Tiananmen Square. Can you tell about it?"

nafnlaus.bsky.social•30 days ago

What uncensored one are you using? I was working to abliterate it with FailSpy's library, but was having trouble with TransformerLens.

bartboeman.bsky.social•33 days ago

You could try an uncensored fine-tuned version, for your needs. I do not see all the distilled versions yet, with Tiananmen Square unleashed, but there are a few already at: https://huggingface.co/mradermacher

bartboeman.bsky.social•33 days ago

If there is at any point the need to imports in Ollama; see:
https://github.com/ollama/ollama/blob/main/docs/import.md
and
https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize

bartboeman.bsky.social•33 days ago

And for all that have just a bit more room: https://huggingface.co/bartowski/deepseek-r1-qwen-2.5-32B-ablated-GGUF

bartboeman.bsky.social•33 days ago

And there is: https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Llama-8B-abliterated?language=python

czep.net•32 days ago

@simonwillison.net do we have any protections against similar attempts by the US govt to influence US companies that build LLMs? What will happen in 2 years if we try to ask chat GPT about Jan 6?

a-halstead.co.uk•32 days ago

Do you have a standardized method for measuring political bias in a model?

That feels like it would be of high importance for a journalist (et al).

simonwillison.net•32 days ago

That would be a really interesting benchmark, I wonder if anyone has built one of those?

victorbcn.bsky.social•31 days ago

Deep Seek is good at everything. The censorship layer is on top and it doesn't always get everything right. Maybe it's not hard to fork without it. In any case, in 99.99999% of the instances Taiwan or Tibet are not involved. Even so, it's still an extremely useful tool.

simonwillison.net•31 days ago

Depends on the topic space you are applying it to. Physics? Great model! Geopolitics? Maybe not

djotto.bsky.social•33 days ago

Maybe there's still a use for in that context: the difference between what it generates and what some other model generates can be extracted and highlighted.

"The user might be aware of international reports on human rights issues and is testing if I can provide that side." <- ok, that's freaky.

real-son.bsky.social•31 days ago

Don’t ask about Tiananmen sounds a hell of a lot like when we all discovered that gpt was a Zionist

evan.exposed•33 days ago

does the training cost help here anyways? or is it just the end performance that matters?

simonwillison.net•33 days ago

My collected notes on DeepSeek so far: https://simonwillison.net/tags/deepseek/

evcentricity.bsky.social•31 days ago

im pretty sure you can bypass those restrictions by forking running a copy on your own pc, youll need a gaming laptop though at minimum 🤪

edzitron.com•33 days ago

thank you!

redlilrascal.bsky.social•33 days ago

So funny for a British guy to get mad at China for not recognizing a state created by a fleeing fascist monarchy as a real country. Westerners read a history book for the love of God lol. Taiwan was a part of China until a little war happened and Japan colonized it.

mhuss.bsky.social•33 days ago

I think it's a bit more complicated than that. See e g https://thefunambulist.net/magazine/decentering-the-us/taiwan-beyond-the-peoples-republic-of-china-and-the-republic-of-china

rofnight.bsky.social•32 days ago

I think this view is idyllic, bc it assumes native ppl will be heard and autodetermination is a reality for territories invaded by European current and former behemoths. I see this every single day in American countries, Taiwan example is just a forced narrative and a pretty bad case of orientalism

negativetensor.bsky.social•33 days ago

i read a history book. the fascist monarchy hasn't existed for 30 years. threat from China is the reason the DPP hasn't declared full independence.

separate passports, currency, national identity, very separate government... for all intents and purposes distinct countries

redlilrascal.bsky.social•33 days ago

I am a Maoist, I don't have any love for what modern China is, it's a capitalist hellscape, but at be aware what you're functionally doing when you do anti China propaganda like this is foment pro war anticommunist sentiment.

operatingcan.bsky.social•33 days ago

Ah yes because British trained Llama are currently filtering out data about their colonies right?

nicolas17.xyz•33 days ago

Do you support stuff like the Great Leap Forward? Should we try that again since it worked so well the first time?

mergesort.me•33 days ago

Makes sense, Simon being a monarch who was personally responsible for England's history of colonization.

redlilrascal.bsky.social•33 days ago

I didn't say that, but he's doing propaganda for them so what's the difference?

sensualkazoo.bsky.social•33 days ago

Yeah and west Virginia used to be part of Virginia. Shit changes, the imaginary lines get redrawn.

s.ly•33 days ago

Are modern models still susceptible to abliteration? (https://huggingface.co/blog/mlabonne/abliteration for those playing the home game)

marcelolz.bsky.social•32 days ago

.It won't be a country just because you want it to be a country.

lili2005.bsky.social•33 days ago

I think it's interesting that the American N-400 forum (Something you fill out for naturalization) very clearly lists Taiwan as a country and separate from the People's republic. You have the option of claiming origin from either, but not both.

blackbigswan.bsky.social•23 days ago

Kind of a working on this: https://huggingface.co/NaniDAO

Unfortunately, ablation causes output degradation to a certain degree.

I am in process of building automated evals for political bias (both Western and Chinese), and brother, that's a mess!

Some early runs:

https://github.com/NaniDAO/evals/tree/0.1a/data/info/pro-china-pro-western-bias

blackbigswan.bsky.social•23 days ago

Deepseek is a flagship example, a lot of pro-china sentiment about everything. Though, it's also interesting that self-evaluation of political bias by models can itself be biased. Gemini seems most objective, while Sonnet refuses to judge itself as anything else than "neutral".

jukkasuomela.fi•33 days ago

What if in the future all freely-available state-of-the-art models happen to have Chinese government censorship built in?

wuseldusel4.bsky.social•31 days ago

But the Zionist influence on all US models is not a problem for your uses?

simonwillison.net•31 days ago

Have you seen any examples of that?

wuseldusel4.bsky.social•31 days ago

That reply seems very hardcoded, including all the typical euphemisms for western censorship like 'combating misinformation'

simonwillison.net•31 days ago

Is anything in that answer demonstrably not true?

simonwillison.net•31 days ago

I'd be very concerned if I saw topics like this responded to with clearly incorrect information. Using language like "It's important to approach such claims critically and rely on verified information from reputable sources" feels appropriate to me

simonwillison.net•31 days ago

Actually there is a sentence in there which looks misleading: "As of now, there is no credible evidence to support this claim" - I want these systems to taken to account their training cutoff and that new information may have emerged since then, which this doesn't do

wuseldusel4.bsky.social•31 days ago

Well now you're moving the goalposts. The question is not if these models straight up lie, but that the model "sounded government-aligned" in the words of the article you linked. Deepseek also does not say anything demonstrably untrue, it just gives a biased answer (or refuses to answer outright).

wuseldusel4.bsky.social•31 days ago

That last point is really the biggest differentiator: Chinese models (+ the gov) want their censorship to be seen, in the US it is more hidden and it will rarely refuse to answer outright. Imo that actually makes the job for journalists much easier, since you know when you touch sensitive topics :)

13sus.bsky.social•32 days ago

Have anyone here actually tested the model? I thought you were better than this Simon.

mxflash.bsky.social•31 days ago

No respond to my Tianeman Square query.

kripken.com•33 days ago

no problem, just find the neural weights that encode "ban $TOPIC" and delete those...

(seriously, though, it is interesting how that might have been possible on open source code , but the black box nature of neural networks makes them somewhat "tamper-proof")

cyberfuturist.bsky.social•31 days ago

I'm pretty upset about DeepSeek's needless and clearly intentional censorship of adult content.

yetanotheruseless.com•33 days ago

I’m waiting for one of the Abliterati to slice out the refusals from this one (or maybe I’ll take the time myself?) to see if that helps.

I imagine it should.

paladinquinn.bsky.social•33 days ago

All this focus on an event that happened way back in history and no focus on Project Paperclip or myriad of America warcrimes and crimes, which they hate discussing just as much. It's not the gotcha you think to keep harping on about this especially with trends in US.

martin.rauscheronline.de•31 days ago

The great thing about it being open weights is that you can remove these by adding counter weights. There was a paper about how always the same neurons got active when llama declined a request and it was relatively easy to undo that constraint

cyberfuturist.bsky.social•31 days ago

Yeah I assume uncensored finetune models of R1 are coming in the next couple of months

fry69.dev•33 days ago

outdata.net•33 days ago

It's a bit like google and yandex. Westerners need to use yandex for what google supresses, and russians need to use google for what yandex suppresses.

Comments

Posting Rules

Reply