taylorwwebb.bsky.social
Studying cognition in humans and machines https://scholar.google.com/citations?user=WCmrJoQAAAAJ&hl=en
53 posts 1,158 followers 391 following
Regular Contributor
Active Commenter
comment in response to post
Yeah I think everyone agrees that symbol processing has to be implementable by neural networks. The surprising thing here, and the thing that has been debated, is that a neural network can learn these mechanisms rather than needing them to be built in.
comment in response to post
What about something like this? www.science.org/doi/10.1126/... Surely there are laws governing information processing systems in general, if not brains specifically? But I can see how these may not be practically relevant for some goals, like understanding neurological disease.
comment in response to post
Please check out the paper for many more analyses and details. Thanks very much to my coauthors, especially first author @yukang25.bsky.social, @thisisadax.bsky.social, and others not on bluesky.
comment in response to post
These results suggest a potential reconciliation of the debate between neural networks and symbol systems: neural networks may solve abstract reasoning tasks by implementing a form of symbol processing, despite the absence of innate symbolic mechanisms. (12/N)
comment in response to post
Finally, using ablation analyses, we confirmed that these attention heads were necessary to perform the abstract reasoning task. (11/N)
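Schematically, an ablation test of this kind looks like the sketch below; `evaluate`, the (layer, head) indices, and the accuracy numbers are hypothetical stand-ins, not the paper's code or results.

```python
# Schematic ablation sketch with stand-in stubs: `evaluate` is a hypothetical
# placeholder for running the rule-induction benchmark with a given set of
# attention heads zeroed out; the numbers below are made up.
def evaluate(ablate_heads=()):
    """Stub returning a fake accuracy; a real version would zero the listed
    (layer, head) outputs during the forward pass and score the task."""
    return 0.95 if not ablate_heads else 0.55

abstraction_heads = [(12, 3), (14, 7)]          # hypothetical (layer, head) indices
baseline = evaluate()
ablated = evaluate(ablate_heads=abstraction_heads)
print(f"accuracy: {baseline:.2f} -> {ablated:.2f} after ablating identified heads")
```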
comment in response to post
Using representational similarity analyses, we confirmed that the outputs of these attention heads represented the hypothesized variables: the abstraction heads and symbolic induction heads primarily represented abstract symbols, whereas the retrieval heads primarily represented tokens. (10/N)
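A minimal sketch of the representational similarity logic: compare the geometry of a head's outputs against two hypothesis matrices, one grouping items by abstract symbol and one by token identity. The activations here are random stand-ins and the labels and dimensions are illustrative, not the paper's analysis code.

```python
# Minimal RSA-style sketch with made-up data: correlate an observed similarity
# matrix of head outputs with "same symbol" vs. "same token" hypothesis matrices.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
tokens    = ["kap", "dax", "kap", "wug", "fep", "wug"]
variables = ["A",   "B",   "A",   "A",   "B",   "A"]

outputs = rng.normal(size=(6, 16))          # stand-in for cached head outputs per token
obs = np.corrcoef(outputs)                  # observed similarity matrix

def hypothesis_matrix(labels):
    """1 where two items share a label (same symbol / same token), else 0."""
    return np.array([[float(a == b) for b in labels] for a in labels])

iu = np.triu_indices(len(tokens), k=1)      # off-diagonal entries only
for name, labels in [("abstract symbol", variables), ("token identity", tokens)]:
    rho, _ = spearmanr(obs[iu], hypothesis_matrix(labels)[iu])
    print(name, round(float(rho), 2))
```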
comment in response to post
Using attention analyses, we confirmed that these heads' attention patterns were consistent with the hypothesized mechanisms. For instance, for symbolic induction heads, attention was primarily directed to tokens that instantiated the same abstract variable as the next token. (9/N)
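One simple way to quantify a pattern like that (purely illustrative; the attention weights and labels below are made up) is the share of a head's attention mass that lands on same-variable tokens:

```python
# Toy sketch: fraction of a head's attention (from the final query position)
# directed to tokens that instantiate the same abstract variable as the
# correct next token. The attention pattern here is fabricated for illustration.
import numpy as np

variables = ["A", "B", "A", "A", "B", "A", "A", "B"]   # per-token variable labels
next_variable = "A"                                     # variable of the correct next token

attn_from_last = np.array([0.05, 0.05, 0.30, 0.10, 0.05, 0.35, 0.05, 0.05])  # fake head pattern
mask = np.array([v == next_variable for v in variables])
print(attn_from_last[mask].sum())   # attention mass on same-variable tokens (0.85 here)
```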
comment in response to post
Using causal mediation analyses, we confirmed the presence of these heads in a series of three stages: abstraction heads were present in early layers, symbolic induction heads were present in middle layers, and retrieval heads were present in later layers. (8/N)
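A schematic sketch of the activation-patching logic behind this kind of causal mediation analysis; `run_model` is a hypothetical stub standing in for real interpretability tooling, and the logit values are made up.

```python
# Schematic causal mediation sketch: cache a head's output on a clean prompt,
# patch it into a corrupted run, and measure how much of the behavioral gap it
# restores. `run_model` is a hypothetical stub, not a real model interface.
def run_model(prompt, patched_head_output=None):
    """Stub: return a fake logit for the rule-consistent answer. A real version
    would overwrite one attention head's output with activations cached from
    the clean run before continuing the forward pass."""
    if prompt == "clean":
        return 4.0
    return 1.0 if patched_head_output is None else 3.2

clean = run_model("clean")
corrupted = run_model("corrupted")
patched = run_model("corrupted", patched_head_output="cached from clean run")

# fraction of the clean-vs-corrupted gap restored by patching this head
print((patched - corrupted) / (clean - corrupted))   # ~0.73 with these fake numbers
```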
comment in response to post
In the third stage, 'retrieval heads' retrieve the token associated with the predicted variable (effectively inverting the symbol abstraction heads). (7/N)
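A toy sketch of that retrieval step as described here: invert the current example's token-to-variable binding and emit the token for the predicted variable. The explicit dictionary and names are illustrative stand-ins for what the model does implicitly in its activations.

```python
# Toy illustration of the retrieval step: map the predicted abstract variable
# back to the concrete token that instantiates it in the current example.
def retrieve_token(context_tokens, context_variables, predicted_variable):
    """Invert the token -> variable binding for the current context."""
    binding = dict(zip(context_variables, context_tokens))
    return binding.get(predicted_variable)

# current example so far: "wug fep" instantiating variables A, B; the rule predicts A next
print(retrieve_token(["wug", "fep"], ["A", "B"], "A"))  # -> "wug"
```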
comment in response to post
In the second stage, 'symbolic induction heads' predict the abstract variable associated with the next token, implementing a symbolic variant of the 'induction head' mechanism that has been tied to in-context learning arxiv.org/abs/2209.11895 (6/N)
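A loose, purely illustrative sketch of an induction-head-style "match and copy", applied to abstract variable labels rather than literal tokens (the model computes this in its activations, not over explicit labels):

```python
# Toy sketch of a symbolic induction step: find the most recent earlier
# occurrence of the current abstract variable and predict the variable that
# followed it there (match-and-copy over variables instead of tokens).
def symbolic_induction(variables):
    current = variables[-1]
    for i in range(len(variables) - 2, -1, -1):
        if variables[i] == current and i + 1 < len(variables) - 1:
            return variables[i + 1]
    return None

# sequence of variables so far: A B A  A B A  A B  -> the last "B" was previously
# followed by "A", so the predicted next variable is "A"
print(symbolic_induction(["A", "B", "A", "A", "B", "A", "A", "B"]))  # -> "A"
```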
comment in response to post
In the first stage, 'symbol abstraction heads' convert input tokens to abstract variables, based on their relations with other tokens. Interestingly, this implements an emergent form of the 'Abstractor' architecture that we recently proposed for relational reasoning arxiv.org/abs/2304.00195 (5/N)
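A toy numpy sketch of that relational-abstraction idea, under the assumption that attention weights come from token-token relations while the values being mixed are a separate set of abstract symbol vectors; the ABA instance, dimensions, and similarity function are made up for illustration.

```python
# Minimal sketch of relational abstraction: attention is computed from relations
# among input tokens, but what gets mixed is a set of abstract symbol vectors
# tied to position, not token identity. All values here are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["kap", "dax", "kap"]                  # an ABA instance
emb = {t: rng.normal(size=8) for t in set(tokens)}
X = np.stack([emb[t] for t in tokens])          # token embeddings, shape (3, 8)
S = rng.normal(size=(3, 8))                     # abstract symbols tied to position, not identity

# relational attention: similarity of each token to every other token
rel = X @ X.T / np.sqrt(X.shape[1])
attn = np.exp(rel) / np.exp(rel).sum(axis=-1, keepdims=True)

# the output mixes the symbol vectors according to token-token relations, so the
# two tokens playing the "A" role end up with identical abstract representations
abstract = attn @ S
print(np.round(abstract @ abstract.T, 2))       # rows 0 and 2 match exactly
```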
comment in response to post
We looked at the internal mechanisms that support abstract reasoning in an open-source LLM (Llama3-70B), focusing on an algebraic rule induction task. We found evidence for an emergent 3-stage architecture that solves this task via a form of symbol-processing. (4/N)
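For concreteness, here is a toy sketch of the kind of algebraic rule-induction prompt at issue; the vocabulary, the ABA rule, and the formatting are illustrative assumptions, not the paper's exact task setup.

```python
# Hypothetical sketch of an in-context algebraic rule-induction prompt.
# Token choices and formatting are illustrative, not the paper's exact design.
import random

def make_aba_prompt(vocab, n_examples=3, seed=0):
    """Build a few-shot prompt where each example instantiates the abstract
    rule ABA with different concrete tokens; the model must complete the
    final example by inducing the rule."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.sample(vocab, 2)
        lines.append(f"{a} {b} {a}")       # completed ABA example
    a, b = rng.sample(vocab, 2)
    lines.append(f"{a} {b}")               # query: the rule-consistent completion is `a`
    return "\n".join(lines), a

vocab = ["kap", "dax", "wug", "fep", "blick", "tiv"]
prompt, answer = make_aba_prompt(vocab)
print(prompt)          # feed to an LLM and check its next-token prediction
print("expected:", answer)
```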
comment in response to post
The success of LLMs on abstract reasoning tasks (e.g. www.nature.com/articles/s41...) therefore raises the question: do LLMs solve these tasks using structured, human-like reasoning mechanisms, or do they merely mimic such reasoning via other mechanisms (e.g. approximate retrieval)? (3/N)
comment in response to post
Cognitive scientists have long argued that human-like reasoning requires some form of symbol processing, and this has often been contrasted with neural networks, which are thought to lack key properties of symbol systems. (2/N)
comment in response to post
The emerging story from mech interp research suggests that they do rely on structured reasoning mechanisms (induction heads, function vectors, binding IDs, etc.), but there are definitely still lots of unknowns about how they solve these kinds of problems and how that compares to humans.
comment in response to post
I found this informative (though of course speculative) www.interconnects.ai/p/openais-o3...
comment in response to post
Thank you Ida, very much appreciated!!
comment in response to post
Though we don’t test it, one could envision a conjunctive search task involving conjunctions of real-world object categories (such as cats) and some other feature (such as color), and I would expect VLMs to struggle with this task because of the binding problem.
comment in response to post
Identifying whether a cat is present in an image is an instance of the disjunctive search task (identifying the presence of a single feature) that we show VLMs excel at, and that human observers can do rapidly, even for large numbers of objects.
comment in response to post
Thanks for pointing this out! Here’s a link: arxiv.org/abs/2411.00238
comment in response to post
Can you clarify what you mean about spatial invariance? Spatial judgments definitely seem to be a problem for these models, but this seems to be a separate issue from the binding failures we looked at here (many involving tasks that don’t have a spatial component).
comment in response to post
We find that VLMs behave very much like human vision when people are forced to respond quickly, thus relying on feedforward processing alone. This has implications for the source of difficulty in visual reasoning tasks, and suggests the need for object-centric approaches.
comment in response to post
I suppose there are accounts of WM that involve synaptic plasticity, but to me it conflates levels of analysis to identify learning with a specific mechanism. I think it makes sense to think of learning algorithms as being potentially simulated in activations, e.g. www.nature.com/articles/s41...
comment in response to post
I think many examples of in-context learning would be referred to as inductive reasoning or ‘schema induction’ (an extended form of analogical reasoning) in the human cog psych literature, which are heavily PFC / working memory dependent.
comment in response to post
... the internal decision variable shows the same properties of greater variance / mean separability, but this depends on a particular 2D structure in the sensory evidence distribution.