I have a sneaking suspicion that LLM providers swap in smaller models during peak traffic hours to avoid being overloaded. SuperMaven and Claude Sonnet 3.5 seem sharper in the morning, not just "more responsive".
Comments
Watching you use Claude Sonnet 3.5 was like watching the movie trope of a skilled hacker typing away at lightning speed. I had a flashback to the movie Swordfish.
I doubt they switch to a lower-precision model, but I would not be surprised if they started using a quantized or fp8 KV cache. The cache format is much easier to switch dynamically in response to load than the model weights are.
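To make the idea concrete, here is a minimal sketch of a load-dependent KV-cache format, assuming symmetric per-tensor int8 quantization (pure Python has no fp8 type, so int8 stands in for it). The `HIGH_LOAD_THRESHOLD` value and the `store_kv`/`load_kv` helpers are hypothetical and say nothing about how any provider actually does this. The point is that the model weights never change; only the storage format of cached keys and values does.

```python
from dataclasses import dataclass


@dataclass
class QuantizedTensor:
    values: list[int]  # int8 codes in [-127, 127]
    scale: float       # per-tensor scale factor


def quantize_int8(xs: list[float]) -> QuantizedTensor:
    """Symmetric per-tensor int8 quantization."""
    max_abs = max((abs(x) for x in xs), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in xs]
    return QuantizedTensor(codes, scale)


def dequantize(q: QuantizedTensor) -> list[float]:
    return [c * q.scale for c in q.values]


# Hypothetical policy knob: an illustrative assumption, not a real provider setting.
HIGH_LOAD_THRESHOLD = 0.8


def store_kv(entry: list[float], load: float):
    """Store a KV-cache entry at full precision, or as int8 when load is high."""
    if load >= HIGH_LOAD_THRESHOLD:
        return quantize_int8(entry)
    return list(entry)


def load_kv(stored) -> list[float]:
    """Read a KV-cache entry back as floats, dequantizing if needed."""
    if isinstance(stored, QuantizedTensor):
        return dequantize(stored)
    return stored
```

Under this scheme, each cached element costs 1 byte instead of 2 for fp16, so the same memory holds roughly twice as many tokens of context. The price is a small rounding error, at most half of `scale` per element, which is one plausible source of a subtle quality drop under load.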