sir-deenicus.bsky.social
tinkering on intelligence amplification. there are only memories of stories; formed into the right shape, the stories can talk back.
140 posts 57 followers 62 following
Active Commenter
comment in response to post
Seems to me "downstream of" communicates an intent for a stage between "correlated with" and "caused by".
comment in response to post
Poker is a zero-sum 2-player game--no cooperation. This makes it easy in comparison, to the point that using a lookup table to play is the basic approach for fixed-limit. Generally, poker can't be analogized from, because the unique constraints of poker are what make CFR a viable approach.
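For a concrete feel of why poker's structure helps, here's a minimal regret-matching sketch (the update at the heart of CFR), with a made-up payoff vector standing in for the counterfactual values a real game-tree traversal would supply:

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    # Strategy proportional to positive cumulative regret -- the core of CFR.
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))  # uniform fallback

# Toy loop for one information set with 3 actions. In real CFR the action
# values come from traversing the game tree (counterfactual values); here a
# fixed, invented payoff vector stands in for them.
cum_regret = np.zeros(3)
action_values = np.array([1.0, 0.2, -0.5])
for _ in range(1000):
    strategy = regret_matching_strategy(cum_regret)
    node_value = strategy @ action_values
    cum_regret += action_values - node_value  # accumulate per-action regret
print(regret_matching_strategy(cum_regret))   # concentrates on the best action
```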
comment in response to post
It's extremely difficult to do correctly. Many inference-under-uncertainty problems are NP-hard; even approximation is NP-hard when there are lots of complex dependencies being inferred/reasoned over.
comment in response to post
Is it within the realm of possibility that this might someday find its way into Cities: Skylines as a mod?
comment in response to post
Speaking of tangent bundles and walking, some games implicitly leverage/generate them in their approach to ensuring walking on non-flat surfaces is less janky/bouncy/jerky.
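A hedged sketch of the idea (one common character-controller trick; games vary): project the desired movement vector onto the tangent plane at the contact point, i.e. drop the component along the surface normal:

```python
import numpy as np

def slide_along_surface(velocity, surface_normal):
    # Project velocity onto the surface's tangent plane by removing its
    # component along the normal, so the character hugs slopes instead of
    # bouncing off them.
    n = surface_normal / np.linalg.norm(surface_normal)
    return velocity - np.dot(velocity, n) * n

# Walking "forward" into a 45-degree slope: motion gets tilted to follow it.
v = np.array([1.0, 0.0, 0.0])
n = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)   # slope normal
print(slide_along_surface(v, n))               # -> [0.5, 0.5, 0.0]
```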
comment in response to post
------------ 221506678275824895551478330339758539061189773737774336433975313280763953351631546560459649255038791485329587902144485
comment in response to post
To me, the fact that 13B had the highest faithfulness means this is (perhaps in large part) an artifact of how easy the tasks were. At some loss threshold the model properly gains CoT ability and then relies on it according to task difficulty. Hence smaller and larger models being unreliable and unfaithful, respectively.
comment in response to post
Contra: it'd be in some human language, because those are the vectors that have the richest interactions/relations. While CoT is not completely faithful, it's strongly correlated--kinda like shorthand and key notes. It has to be 100% faithful for computations rendered in context and for tool use, however.
comment in response to post
∙⟨λφ.φ∙⟨τ⟩∙⟨τ⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩⟩
comment in response to post
Waterfalls can indeed freeze. This is probably real but with forced perspective (camera really close to the ice formation while the person is quite far resulting in an optical illusion that the frozen waterfall is colossally large). Searching suggests it's the Goriuda Waterfall.
comment in response to post
I daresay even the cadence of .NET libraries themselves is slower. I think it's because LLMs occupy such a huge chunk of the field's attention. (One upside of LLMs is that they help smaller communities, because each individual is enhanced to do more, more easily.)
comment in response to post
Right, if you look at Scala, similar things are being said there too. And if F# is dead then what is OCaml? Even Haskell, Racket, Elm, Clojure--none of them hold the mindshare they used to. Which is fine IMO. People are still building interesting things in them, just fewer corporates.
comment in response to post
And for emphasis: A human mind being computable means that in principle, it too could run as a computer program on a computer. An authentic such simulation would also be bad at calculation. Similarly in LLMs, running on a computer doesn't mean the mind-like thing will find all computer things easy.
comment in response to post
The handicapping of Claude in Copilot is no doubt a side-effect of the hamfisted, crude and clumsy prompt instructions given to it to discourage and dissuade users from having random conversations with it. Naturally, there's a way around that though.
comment in response to post
A neural net is complexity-bound; code, data, and execution environment rolled into one. If we wish to modify the model or otherwise adapt it, having the weights is what is important. The simple fact is that the concepts of open source do not transfer cleanly, no matter how much people try to twist them to do so.
comment in response to post
That person does not know what they're talking about, alas, but it's a common misconception. The neural net is in fact the code and data; its bulk is exactly a program written as a set of arithmetic expressions joined by some if-then comparisons to zero. That source was never human-comprehensible.
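To make that claim concrete, a toy illustration (weights invented here; real models are just billions of lines of this):

```python
def relu(x):          # "if x > 0 then x else 0" -- the comparison to zero
    return x if x > 0 else 0.0

def tiny_net(x1, x2):
    # Layer 1: two neurons, each a weighted sum followed by the zero test.
    h1 = relu(0.7 * x1 - 1.2 * x2 + 0.1)
    h2 = relu(-0.3 * x1 + 0.8 * x2 - 0.5)
    # Output: another weighted sum. The whole "program" is arithmetic
    # expressions joined by if-then comparisons to zero.
    return 1.5 * h1 - 0.9 * h2 + 0.2

print(tiny_net(1.0, 2.0))
```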
comment in response to post
No, not quite on Claude. There's a 90% (or some other high) chance that free Claude users are told that Anthropic is currently under high load, so please use Haiku 3.5 in concise mode. Which is pretty bad.
comment in response to post
It most certainly is not. My experience is that it is decidedly worse. Note that I am not saying that I am right, but that your objective tone is wrong.
comment in response to post
The tokens themselves are like temporary auxiliary states, anchor points, like notes on scratch paper to constrain and control subsequent generation but not the complete state for the "thoughts" themselves.
comment in response to post
A theoretical possibility is that the model also does not count output reasoning tokens as part of its internal processing. If we think about it, compared to the cached attentional hidden states (non-lingual), the notes we see make up but a small fraction of the information processed per token.
comment in response to post
I think I know what's happening. The most mundane aspect is with each of your turns, the previous turn's thinking tokens are not passed to the model, for space saving reasons. So it is correctly reporting a lack of observable thinking tokens for it *at that point*.
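A hedged sketch of that context-assembly behavior (names invented; not any provider's actual code): earlier turns keep their visible replies but drop their thinking blocks, so only the current turn's reasoning is ever in context.

```python
def build_context(turns):
    # Assemble the prompt: strip thinking blocks from all but the last turn,
    # a common space-saving convention in multi-turn reasoning chat.
    context = []
    for i, turn in enumerate(turns):
        msg = {"role": turn["role"], "content": turn["content"]}
        if turn.get("thinking") and i == len(turns) - 1:
            msg["thinking"] = turn["thinking"]  # only the latest turn keeps it
        context.append(msg)
    return context

history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1", "thinking": "step by step..."},
    {"role": "user", "content": "Q2: can you see your earlier thinking?"},
]
print(build_context(history))  # A1's thinking is gone, as the model reports
```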
comment in response to post
I don't like that framing. Doesn't it seem patronizing? It's painting those that are neither terrible nor exceptional as weak. We should like to live in a world that makes it so as many as possible are comfortable enough to readily do the right thing. Seems anger is turned in the wrong direction?
comment in response to post
It's not necessarily that the model is not smart enough--curious what you get when you try with QwQ. Evidence so far is that distillation does not result in authentic reasoning capability.
comment in response to post
Interesting. I read the webcomic years ago, it was fun but a typical power(growth) fantasy. I haven't seen either season--what are you enjoying about the anime?
comment in response to post
The oddest thing to me is the idea that dynamical and computational systems are somehow separate. But all computational systems (esp ones that are not deterministic) are also subsets of dynamical systems--namely those that always work with finite information and precision (time steps, divisibility)
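As a trivial hedged illustration of the inclusion: any program over finite state is literally a discrete dynamical system, a state plus a transition map iterated in time steps.

```python
def step(state):
    # The transition function f: S -> S of a 3-state counter -- any
    # finite-state program can be recast as such a map.
    return (state + 1) % 3

trajectory = [0]
for _ in range(6):
    trajectory.append(step(trajectory[-1]))
print(trajectory)  # [0, 1, 2, 0, 1, 2, 0] -- a periodic orbit
```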
comment in response to post
The focus on such a sparse MoE, and the tweaks that differ from the typical MoE, show a clear hint of experimentation among the choices that led there.
comment in response to post
What does that mean exactly? These are clearly deliberate choices made for a compute-constrained environment. Proving out mixed-precision fp8 training in a production environment is by itself a huge deal. The load-balancing innovations and the approach to cheap attention. These are all precisely done and clever.
comment in response to post
There's a paper you can all check yourself if you wish to disabuse yourself of the misinformation you've just been fed by someone who should really have known better. Disappointing to see, really. I wonder how many will first think "but who are you anyway to say this?" instead of "are they right?"
comment in response to post
They also performed specializing optimizations to the SW environment. Together these are enough to make it an OOM more energy-efficient to run vs Llama 405B, as long as you have sufficient memory. As for aspects affecting training, you can ballpark the math; the costs seem on the lower end of reasonable.
comment in response to post
Reading the paper instead of guessing from a position of ignorance is better, no? Here are innovations that make it significantly more efficient:
- mixed-precision fp8 training
- MoE with a unique arch and high sparsity (only ~40B active params)
- load-balancing innovations
- low-rank decomposed attention (sketched below)
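A hedged sketch of the low-rank decomposition idea in that last item (shapes and names invented, not the paper's exact scheme): cache a small latent per token and re-expand K/V on demand, shrinking the KV cache.

```python
import numpy as np

d_model, d_latent = 1024, 128          # latent is ~8x smaller to cache
rng = np.random.default_rng(0)
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)

x = rng.normal(size=(16, d_model))     # 16 cached token positions

latent_cache = x @ W_down              # all the KV cache needs to store
K = latent_cache @ W_up_k              # expanded on demand at attention time
V = latent_cache @ W_up_v
print(latent_cache.shape, K.shape, V.shape)  # (16,128) (16,1024) (16,1024)
```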
comment in response to post
Why should I trust it less than Microsoft or OpenAI? (Which, to be clear, is saying you should not trust any of them! Whatever you send to any of them should be something you don't mind them gaining access to.)
comment in response to post
The more such trivia it has fairly-to-perfectly accurate knowledge of, the more confident we can get about the expansiveness of its input data. Similarly, looking at its low-rank attention, the fact that they used mixed-precision fp8 and MoEs, and innovated on load balancing--the number's plausibility is not low.
comment in response to post
You can do better than that, FWIW. You can probe it with long-tail questions to get an idea of how diverse and extensive its training data was. For example, when I ask it questions about the Anama of the Isle of Bigail, from an obscure indie RPG, it knows about it!
comment in response to post
Hmm, but prompting skill will remain significant. See, it's not due only to a failing of the model but to a difference too. As LLM and human minds aren't identical, there'll remain a major need to know how to phrase things for LLMs specifically, even with question-asking skill overlapping non-trivially.
comment in response to post
I guess Rogue Trader is close, but WH40K is basically a fantasy setting--although, if Star Wars counts as sci-fi, then WH40K can too? Colony Ship comes closest maybe; the space aspect is background only (it does shape/frame the story, tbf). Games with space-heavy aspects and some kind of story are most often strategy games.
comment in response to post
A rare combo. Everything I can think of misses an important component. Outer *Wilds*, Rogue Trader, Children of a Dead Earth, Terra Invicta, Cyberpunk 2077, Deus Ex: Human Revolution. Each is either a hard sci-fi (or close) RPG but not space, a hard sci-fi space game but not an RPG, or a plain sci-fi RPG, etc.
comment in response to post
This is why I mentioned the half-life of papers being terrible overall. Being peer reviewed is almost no signal; for some fields, it's no signal at all. I like to think most humans are good. So the problem here is the incentive structure that leads to this kind of sickness and to such abuses.
comment in response to post
I wasn't talking about what is better or worse; I meant the kind of people who have always engaged in scummy tactics are adapting to new technology. Saying it's a few papers also massively understates the problem. Abuse of stats and failures of replicability are major issues. Not all of it has to be flagrant.
comment in response to post
The main error here is in not placing an upper bound *on how many humans*. Being more capable than any given human at most or all intellectual tasks does not equate to being better than elite teams and groups across all tasks. The second issue is the expertise needed to initiate, guide, and verify the jobs.
comment in response to post
Because there is more total creativity and skill outside a company, no matter its size, than inside it. No matter how elite.
comment in response to post
A very happy unbirthday to you, then!
comment in response to post
The review is by Cosma Shalizi. Genetic algorithms (for rule discovery) and the bucket brigade algorithm (for local credit assignment) were key in Holland's PI. --- I believe RTRL for RNNs should also be relevant.
comment in response to post
Decision transformers are an LLM-adjacent instance of control as inference. --- For other approaches, SOAR was mentioned; ACT-R is also relevant, and PI (adaptive, message-passing, inductive rule system) predates both. There's a really enjoyable book on it, Induction; review: bactra.org/reviews/hhnt...
comment in response to post
What I mean about your definition is that it is so open it'd be meaningful only to someone who already knew the actual definition. One can also talk about strategies as functions from information sets (i.e. "state") to actions. I don't know why RL experts like to forcefully subsume everything into RL.
comment in response to post
That includes so many things as to be a non-definition. Besides, information sets (non-identifiability as a matter of indistinguishability) and POMDP states (uncertainty from incomplete observations of state) are not 1-to-1.
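A hedged, type-level sketch of the contrast (names invented): a POMDP policy conditions on a belief over hidden states, while a behavioral strategy must assign the same action distribution to every history inside an information set.

```python
from typing import Callable, Dict, FrozenSet, Tuple

State = str
Action = str
History = Tuple[Action, ...]
Belief = Dict[State, float]              # probabilities over hidden states
InfoSet = FrozenSet[History]             # histories the player can't tell apart

Policy = Callable[[Belief], Action]                   # POMDP-style
Strategy = Callable[[InfoSet], Dict[Action, float]]   # game-theoretic

def example_strategy(info: InfoSet) -> Dict[Action, float]:
    # One randomization for the whole information set: by construction the
    # player cannot condition on which history they are actually at.
    return {"call": 0.6, "fold": 0.4}
```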
comment in response to post
Mmm, that's not quite what I'm referring to. I'm pushing back against the idea that policies (the technical term, not some vague, nebulous umbrella term) are the only way to characterize action selection at some decision point.
comment in response to post
The control-as-inference setting is arguably more natural than RL. One can comfortably situate RL, active inference, and control theory within this framework. And--I think--there is an argument to be made that the parts of RL and control that do not quite map are also unnatural.
comment in response to post
It depends on the action space, or on what is meant by agent. For certain actors in game-like settings, say, the concept of a strategy (profile) is more natural and is not quite the same as a policy.