lhl.bsky.social
Easily distracted, currently building open source AI. Living online since FidoNet
86 posts 416 followers 324 following

HF_TRANSFER gud
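
For anyone who hasn't flipped it on, this is roughly all it takes (a minimal sketch; assumes hf_transfer is pip-installed, and the repo name is just an example):

```python
# Minimal sketch: enable the hf_transfer download backend for Hugging Face Hub.
# Assumes `pip install huggingface_hub hf_transfer`; the env var has to be set
# before huggingface_hub kicks off any downloads. Repo name is just an example.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen2.5-Coder-32B-Instruct", local_dir="models/qwen2.5-coder-32b")
```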

I've been impressed with OpenAI's Deep Research, having used it for dozens of research tasks. It's especially good at focused tasks, but rather mid when it comes to broader, more general topics. An example of how good it can be (reviewing an astrophysics thesis): www.youtube.com/watch?v=Eh-C...

You can now reproduce DeepSeek-R1's reasoning on your own local device! Experience the "Aha" moment with just 7GB VRAM. Unsloth reduces GRPO training memory use by 80%. 15GB VRAM can transform Llama-3.1 (8B) & Phi-4 (14B) into reasoning models. Blog: unsloth.ai/blog/r1-reas...
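
For context, the core training loop looks roughly like this with plain trl's GRPOTrainer (a hedged sketch, not Unsloth's optimized path; the model, dataset, and toy length reward are illustrative placeholders):

```python
# Rough sketch of GRPO fine-tuning with plain trl (Unsloth patches/optimizes this
# same path to cut VRAM). Model, dataset, and the toy length reward are placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # needs a "prompt" column

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 20 characters.
    return [-abs(20 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-out", logging_steps=10),
    train_dataset=dataset,
)
trainer.train()
```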

I’m currently going through/organizing DeepSeek takes, and when it comes to market impact, I'm down-ranking those that don’t account for TSMC tariff (insider) trading as a major (potentially primary) driver of the big price move this week.

So I was wondering, where are all the people who were so confident “we’ve hit a wall” on AI from uh … one month back? And for those getting carried away by the new AI narrative of the week (DeepSeek with a box of rocks/cope), maybe worth catching your breath and reflecting on that.

Posted by @vgel.me on the other site

With DeepSeek-R1 being among the strongest released frontier models in the world (and MIT licensed to boot!) there’s been a lot of heated discussion about its Chinese state censorship. I last did some poking at Qwen2, which afaik is still one of the few analyses online: huggingface.co/blog/leonard...

I'm obsessed with this art piece: giant mirrors that reflect sunlight onto a town that sits in shadow half the year. Until it was built, the townspeople hated it. What a metaphor for the denial of major social, political, or technological change: you cannot coexist with your own yearning for something better.

I was doing a quick skim of RTX 5090 reviews and it seems almost all hardware reviewers have no idea how to benchmark LLM performance, I wonder if a simple guide would be useful...

I've been doing some inference throughput/latency testing (focused on lowest TTFT) across various quants and engines. The bs=1 optimized (but server-capable) kernels scale pretty poorly. (Also, while vLLM and SGLang can both use Marlin kernels, SGLang's latency seems better across the board.)
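
For reference, this is roughly how I think about measuring TTFT and decode speed against any OpenAI-compatible endpoint (a sketch; the base URL and model name are placeholders):

```python
# Sketch: measure time-to-first-token (TTFT) and streaming decode rate against an
# OpenAI-compatible server (vLLM, SGLang, llama.cpp, ...). URL/model are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
first = None
chunks = 0
stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first is None:
            first = time.perf_counter()
        chunks += 1
total = time.perf_counter() - start

ttft = first - start
decode = max(total - ttft, 1e-9)
# Most servers emit ~1 token per chunk, so chunks/s is a rough tok/s proxy.
print(f"TTFT: {ttft:.3f}s, ~{(chunks - 1) / decode:.1f} chunks/s after first token")
```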

This recent essay andymasley.substack.com/p/individual... got me to do a more thorough writeup of my recent empirical inference efficiency testing. Full writeup: fediverse.randomfoo.net/notice/AqCTD... - basically, currently, inference is >100X more efficient than the most commonly cited numbers.

There's been a lot of speculation and excitement in r/LocalLlama about the new Nvidia Project DIGITS www.nvidia.com/en-us/projec... but I think it's more likely than not that memory bandwidth (MBW) will be lower than people are hyping themselves up for... www.reddit.com/r/LocalLLaMA...
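
Why MBW matters so much: bs=1 decode is roughly memory-bandwidth-bound, so a back-of-the-envelope ceiling is just bandwidth divided by bytes read per token (numbers below are hypothetical, not DIGITS specs):

```python
# Back-of-the-envelope: bs=1 decode reads every weight once per token, so the
# ceiling is roughly MBW / bytes-per-token (ignores KV cache reads, overlap, etc.).
# Numbers below are hypothetical, not confirmed DIGITS specs.
def max_decode_tok_s(mbw_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    bytes_per_token = params_b * 1e9 * bytes_per_param
    return mbw_gb_s * 1e9 / bytes_per_token

# e.g. a hypothetical 273 GB/s box running a 70B model at ~Q4 (~0.5 bytes/param):
print(f"{max_decode_tok_s(273, 70, 0.5):.1f} tok/s upper bound")
```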

As a followup to my earlier DeepSeek-V3 performance testing, here's what basically-OOTB vLLM (0.6.6.post2.dev5+g5ce4627a) vs SGLang (0.4.1.post4) looks like on 2 x H100 nodes (tp=16) (this is concurrency=64, but it scales similarly up to 1024). atm SGLang has +125% better throughput and ~10X lower mean TTFT.

Some of you might get a kick out of this (I got the FP8 running on vLLM w/ slurm-to-ray on 2 x H100 nodes as well, more on that later...)

New year, new blog post: I had a random question, what happens when LLMs are prompted to write better code, again and again? Do they actually write better code? The answer is yes*! minimaxir.com/2025/01/writ...
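
If you want to try the experiment yourself, the loop is basically this (a sketch with the OpenAI client; the model, task prompt, and iteration count are arbitrary placeholders, not the blog post's exact setup):

```python
# Sketch of the "write better code" loop: keep feeding the model its own code
# and just ask for something better. Model, task, and iteration count are
# placeholders (not the blog post's exact setup); assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
task = "Write Python code to find the 3 smallest values in a list."

code = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": task}],
).choices[0].message.content

for i in range(4):
    code = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": task},
            {"role": "assistant", "content": code},
            {"role": "user", "content": "write better code"},
        ],
    ).choices[0].message.content
    print(f"--- iteration {i + 1} ---\n{code}\n")
```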

The Claude Desktop app has MCP support: modelcontextprotocol.io/quickstart/u... - I decided to see if I could get Claude Desktop installed on Arch Linux (yes, but ironically, had to use ChatGPT o1 to step in and clean up the forked script/get it working): github.com/lhl/claude-d...

Unless you have 400GB of memory for DeepSeek-V3 (Q4), Qwen2.5-Coder-32B is probably still the best local code assistant available (fits in a 24GB consumer GPU). I was curious and did some testing w/ llama.cpp's speculative decoding. Results/discussion here: www.reddit.com/r/LocalLLaMA...

I will be retiring/deprecating this version of Shaberi (GPT-4-judged Japanese functional testing) for something new early next year, but possibly of interest: DeepSeek-V3 just slotted into first place. (I tested close to 100 models with this eval this year.)

So, not only QvQ, but DeepSeek-V3 just dropped. It's a massive model, by my calcs 29B activation params / 453B weights: www.reddit.com/r/LocalLLaMA... - it reportedly scores 48.9%, above Sonnet, on aider's new polyglot code leaderboard: aider.chat/2024/12/21/p... (as ref: Qwen2.5-Coder scores 8%)
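
If you want to sanity-check that kind of math yourself, counting MoE activated params is just swapping "all routed experts" for "top-k experts per token" (a generic sketch with made-up config values, not DeepSeek-V3's actual config):

```python
# Generic sketch of MoE parameter counting: total FFN params use all routed
# experts, activated params only the top-k (+ shared) that fire per token.
# All config values below are made up for illustration, not DeepSeek-V3's.
hidden = 4096
moe_intermediate = 1024
n_routed_experts = 64
n_shared_experts = 1
top_k = 6
n_layers = 32

expert_params = 3 * hidden * moe_intermediate        # gate/up/down projections
total_ffn = n_layers * (n_routed_experts + n_shared_experts) * expert_params
active_ffn = n_layers * (top_k + n_shared_experts) * expert_params

print(f"FFN params: {total_ffn / 1e9:.1f}B total, {active_ffn / 1e9:.1f}B activated")
# Attention + embeddings are dense, so they add equally to both counts.
```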

Lately I've been doing vLLM performance testing on some H100 nodes in preparation for generating lots of synthetic tokens. Interestingly, I found that throughput dropped significantly when max_num_seqs or max_num_batched_tokens was specified explicitly, even when set to the same values as the defaults (512/512)
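
If you want to reproduce the comparison, the offline entrypoint makes for an easy A/B (a sketch; the model and arg values are illustrative, and the actual defaults vary by vLLM version):

```python
# Sketch: offline throughput for one engine config; run the script once per
# config (explicit flags vs. defaults), since a vLLM engine doesn't reliably
# free GPU memory within a single process. Model and values are illustrative,
# and the actual defaults vary by vLLM version.
import time
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_model_len=4096,
    enable_chunked_prefill=True,
    # Comment these two out for the "defaults" run:
    max_num_seqs=512,
    max_num_batched_tokens=512,
)

prompts = ["Summarize the history of GPUs."] * 256
params = SamplingParams(max_tokens=128)

t0 = time.perf_counter()
outs = llm.generate(prompts, params)
dt = time.perf_counter() - t0
toks = sum(len(o.outputs[0].token_ids) for o in outs)
print(f"{toks / dt:.1f} output tok/s")
```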