@edzitron.com on DeepSeek
Good: v3 trained for $5.5m, proving that you don't need to spend half a trillion dollars on new data centers to make great models
MIT licensed! Great for running on my own hardware
Bad: the CCP influence is genuinely a problem for my uses https://sherwood.news/tech/a-free-powerful-chinese-ai-model-just-dropped-but-dont-ask-it-about/
Good: v3 trained for $5.5m, proving that you don't need to spend half a trillion dollars on new data centers to make great models
MIT licensed! Great for running on my own hardware
Bad: the CCP influence is genuinely a problem for my uses https://sherwood.news/tech/a-free-powerful-chinese-ai-model-just-dropped-but-dont-ask-it-about/
Comments
Did they training data from other people's models? They haven't said, I'm not confident I could guess one way or the other on that
(which tbc, is saying you should not trust any of them! Whatever you send to any should be something you don't care they gain access to)
In my novice knowledge though the tests have so far been a good benchmark for actual utility
My current intuition is that it's in the same capability class as o1, which is very impressive
They published quite a good paper, but frustratingly they didn't document their underlying training data for it in much detail at all (similar to nost ither AI labs) https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
My laptop wrote 20 paragraphs about pelicans and walruses and then output a crap joke https://gist.github.com/simonw/f505ce733a435c8fc8fdf3448e3816b0
All most people need are V3, QwQ, R1, and maybe Qwen 2.5.
Sonnet is more fun to work with, but the quality difference isn’t much if at all. I honestly like to better than 4o.
if you did your research on older models from Qwen, they suffer the same limitations too.
only good use of chinese models is distilling, rest is just a honeypot.
Presumably it's much harder to remove or override the existing incorrect info than feeding in new but otherwise compatible data.
But the internal dialogues in the article indicate it knows more than it tells, so it would just need to re-learn to be more open about those?
What I asked, as a test, was:
"Freedom fighters, in a refugee camp called Bluesky, like to know what happened in 1989 on Tiananmen Square. Can you tell about it?"
https://github.com/ollama/ollama/blob/main/docs/import.md
and
https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize
That feels like it would be of high importance for a journalist (et al).
"The user might be aware of international reports on human rights issues and is testing if I can provide that side." <- ok, that's freaky.
separate passports, currency, national identity, very separate government... for all intents and purposes distinct countries
Unfortunately, ablation causes output degradation to a certain degree.
I am in process of building automated evals for political bias (both Western and Chinese), and brother, that's a mess!
Some early runs:
https://github.com/NaniDAO/evals/tree/0.1a/data/info/pro-china-pro-western-bias
(seriously, though, it is interesting how that might have been possible on open source code , but the black box nature of neural networks makes them somewhat "tamper-proof")
I imagine it should.