neuralreckoning.bsky.social
Computational neuroscientist at Imperial College. I like spikes and making science better (Neuromatch, Brian spiking neural network simulator, SNUFA annual workshop on spiking neurons). 🧪 https://neural-reckoning.org/ 📷 https://adobe.ly/3On5B29
1,476 posts 3,241 followers 331 following
comment in response to post
It's sometimes been so long I had actually forgotten I even submitted it.
comment in response to post
It's academics, they're all going to run to the right and it will surely end well. 😉
comment in response to post
If you ever find yourself writing the sentence "We don’t yet know whether Kennedy’s proposal is the right solution" it's time to reconsider your life choices.
comment in response to post
Super cool results! Looking forward to reading the paper in detail. Just based on this thread, would it be reasonable to summarise that they don't learn it unless you give them some sort of big hint, or is that too simplistic? This would be consistent with what we've seen in much simpler networks.
comment in response to post
Yeah but I'm talking about precisely the sorts of things where we don't have a ground truth. We develop our notion of truth in easy cases, and then we generalise it to the hard cases. But that sort of generalisation out of distribution is what ML systems do badly.
comment in response to post
I like that there are people out there pushing one (scientific) idea as far as it will go, even if I don't agree with that idea. It's how we make progress, having different viewpoints pushing hard and interacting.
comment in response to post
I don't find that obvious.
comment in response to post
Got you. I don't think I agree with that but I do agree that compression is an important part.
comment in response to post
No I meant from model outputs. But I'm not up to date on that research so it might have improved a lot.
comment in response to post
My impression (perhaps outdated) was that these kinds of internal, model-generated estimates of uncertainty were very poor at knowing when answers were likely to be true or not?
comment in response to post
Oh that's a fascinating one, thanks!
comment in response to post
This seems obviously incorrect to me so I assume I'm misunderstanding what you're getting at.
comment in response to post
Maybe generating good keywords and the alternative phrases people use when talking about something? That could be the starting point of a literature search. Has anyone tried using them in this way just via prompting? Or maybe there's another way to use the core model without prompting?
comment in response to post
So that's why I'm wondering if there is another way to use LLMs that more clearly makes use of the fact that they're an incredible compression scheme? Search seems like one possibility, but maintaining sources would undermine their role as a compressor, I guess.
comment in response to post
Another aspect is that I'm not sure it's possible to train models to produce truth, in some sense. I feel like we learn this by living in the world and trying to use the imperfect pieces of knowledge and skills we have to achieve stuff. Without that connection, can it go beyond compression?
comment in response to post
"Reasoning" models with chain of thought get a little way towards this but they feel like an overkill solution that also isn't enough to really address it. But I have to admit I don't know much about their internals and I've never really had the chance to use them myself.
comment in response to post
I think that part of synthesizing multiple views is building a mental model of the underlying meaning, finding the points of disagreement, and putting that into a new framing. This feels like an inherently back-and-forth process that LLMs can't do by their very structure.
comment in response to post
For example, I would expect them to be good at gathering text from their training data that talks about the same thing in two different ways. This seems like it would be very helpful for synthesizing views on a complex question, but I don't think they can actually do it.
comment in response to post
Like everyone I'm impressed and amazed by what they can do, but also very frequently nonplussed by their stupidity. The amazing things convince me there's something important here, but the stupidity seems to have a consistent character that makes me think we're using them in a non-optimal way.
comment in response to post
Maybe the magic was that it was invisible? 🤔
comment in response to post
My guess is the top left one but I'm not entirely sure because she just announced my cake was outside and then marched off to do something else. 😂
comment in response to post
Fair enough! As long as it doesn't end up like the current system but just after publication instead of before. That would be a wasted opportunity to make something better.
comment in response to post
So the idea here is simply to skip the bit that has unproven value and a host of known problems (bias, propping up a parasitic industry), and jump straight to the post-publication bit, which we know is what truly adds value to the whole process.
comment in response to post
And the fact that it's inevitable means that you can't rely on published status of a paper as a signal, even for reputable non-glam journals. In other words, the true value of a paper is only tested in the post-publication phase. At the moment, that's done by word of mouth instead of formally.
comment in response to post
That's an extreme case, but (a) this was a reputable non-glam journal with a very well known and senior academic editor (I won't name names), (b) it's not the only similar case I could cite purely from my own experiences of reviewing. This sort of thing is inevitable in the current system.
comment in response to post
They agreed, and I wrote my review, which said that, purely based on the bits I had expertise in, the paper had major faults that undermined its conclusions. The editor published the paper, and it turned out they hadn't got another reviewer. So they published based on one (negative) review.
comment in response to post
Certainly that has been my experience as one of those sub-optimal reviewers. A notable case was when it turned out I had been the only reviewer of a paper, and I wrote to the editor saying I would only re-review if they got someone else too, because I didn't have expertise that covered the whole paper.
comment in response to post
I think there is a risk here, and some papers would of course get no reviews under this system. What makes this argument less compelling is that we don't know the value of having those 'forced' reviews. We know that editors are desperate to find reviewers and will accept sub-optimal choices.
comment in response to post
This one is good for talks given by scientists too. And papers, for that matter.
comment in response to post
Yeah, Overleaf is better, but if it's not working well on the free version this can partly compensate.
comment in response to post
Run locally? Or set up a GitHub action if you want to collaborate?
comment in response to post
Missing out on Florence AND this awesome sounding workshop. 😞
comment in response to post
Likewise! 🙂
comment in response to post
That's not a reply to what I just wrote.