DeepSeek released a whole family of inference-scaling / "reasoning" models today, including distilled variants based on Llama and Qwen.
Here are my notes on the new models, plus how I ran DeepSeek-R1-Distill-Llama-8B on my Mac using Ollama and LLM.
https://simonwillison.net/2025/Jan/20/deepseek-r1/
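For reference, a sketch of the commands involved (the `deepseek-r1:8b` Ollama tag is an assumption about the registry name; the prompt is just an example):

```shell
# Pull the 8B distilled model from the Ollama registry
# (assumed tag name: deepseek-r1:8b)
ollama pull deepseek-r1:8b

# Chat with it directly through Ollama
ollama run deepseek-r1:8b

# Or use it from the LLM CLI via the llm-ollama plugin
llm install llm-ollama
llm -m deepseek-r1:8b 'a joke about a pelican and a walrus who run a tea room together'
```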
Comments
I haven't tried any non-stupid experiments yet though!
I've tested DeepSeek V3 and am very excited by the new releases. Really appreciated your thoughts.
Reading the
But jokes can be the next level for a reasoning model :)
The SVG prompt is interesting for showing how the model organizes its visual space.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
Using this model: https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF
I think I can organize a donation for a few months...
It's a 1.28GB page load: https://huggingface.co/spaces/webml-community/deepseek-r1-webgpu
https://gist.github.com/sandipb/c9646ac4c2cb7407705f597771d3c227
Any idea why DeepSeek tends to summarize the answer compared to the rest of the models? o1 seems to give a more detailed answer, but other models have interesting variations.
Could relate to licenses too: the Qwen license is Apache 2.0, while the Llama "community license" is janky.