Neat visualization that came up in the ARBOR project: this shows DeepSeek "thinking" about a question, and color is the probability that, if it exited thinking, it would give the right answer. (Here yellow means correct.)
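A minimal sketch (not the ARBOR code) of how one could estimate that quantity: at each prefix of the chain of thought, force the model to stop reasoning and answer, then read off the probability it puts on the correct multiple-choice option. The model name, the `</think>` marker, and the answer-forcing prompt below are assumptions for illustration only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model; any chat model that emits <think>...</think> reasoning would do.
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def p_correct_if_exit(question: str, thinking_so_far: str, correct_letter: str) -> float:
    """Probability of the correct option if the model exited thinking right here."""
    # Truncate the reasoning, close the thinking block, and force a final answer.
    prompt = (
        f"{question}\n<think>\n{thinking_so_far}\n</think>\n"
        "The answer is ("
    )
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token distribution
    probs = torch.softmax(logits.float(), dim=-1)
    # Compare probability mass on the four option letters, renormalized.
    option_ids = [tok.encode(c, add_special_tokens=False)[0] for c in "ABCD"]
    option_probs = probs[option_ids]
    option_probs = option_probs / option_probs.sum()
    return option_probs["ABCD".index(correct_letter)].item()

# Coloring each token of the chain of thought by this value gives a plot like the
# one linked below (yellow = high probability of the correct answer).
```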
Comments
Why is it so "verbose" in its thinking? So many superfluous words in there. Just wondering how it works. Wouldn't it be faster if it took a technical approach, broke the question into chunks, generated a list of values/data, and then drew a conclusion from that?
It's based on a data set of multiple-choice questions that have a known right answer, so this visualization only works when you have labeled ground truth. Definitely wouldn't shock me if those answers were labeled by grad students, though!
We also see cases where it starts out with the right answer, but eventually "convinces itself" of the wrong answer! I would love to understand the dynamics better.
https://github.com/ARBORproject/arborproject.github.io/discussions/11#discussioncomment-12309423 (vis by @yidachen.bsky.social in conversation with @diatkinson.bsky.social )