shahanmemon.bsky.social
Researching {science of science #SciSci, #AI4Science, computational social science, generative #AI, LLMs, agents, alignment, misinformation in science}
PhD @ UW.
Visiting scholar @ NYU.
Alum @ Carnegie Mellon
Academic webpage: https://samemon.github.io
547 posts
2,775 followers
1,203 following
Regular Contributor
Active Commenter
comment in response to
post
That option has been available for some time if I am not wrong.
comment in response to
post
I think there might be some confusion. They are not revealing the identity of the reviewers, just the reviewer reports.
comment in response to
post
Yeah, the eLife model is interesting. I have been ambivalent about it too. That said, I do appreciate publishers' and journals' willingness to experiment with new models. I feel like the way we review papers is quite archaic. I attended ICSSI this year and can tell that many researchers echo this thought.
comment in response to
post
So I am guessing this was an informed choice. The reviewer identity still remains anonymous.
comment in response to
post
What do you mean by “processed”? I am guessing this wasn’t an ad hoc step on Nature’s part. Since 2020 or so, they have been giving authors the choice to publish reports.
Nature communications took this step of making all reports public three years ago (www.nature.com/articles/s41...).
comment in response to
post
Because a calculator is a device made for a specific purpose, our trust in it would be immediately corrupted if it were to give an incorrect answer. It would be useless.
If you take away the specific purpose you also take away the criteria to determine if something works or not.
comment in response to
post
That said, the evidence seems mixed. See for example this: arxiv.org/pdf/2305.13534 and this: arxiv.org/abs/2505.236... and this: arxiv.org/abs/2505.21523
According to the system card, o3 and o4 seem to hallucinate much more than o1 🤷♂️
comment in response to
post
CoT + RAG could potentially be helpful. See for example this recent preprint: arxiv.org/pdf/2505.09031, where RAG+CoT performs better than either alone, though each alone also seems better than the base model. An earlier ACL paper points to the same re: CoT+RAG: aclanthology.org/2023.acl-lon...
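For intuition, here is a minimal, purely illustrative Python sketch of what combining retrieval with a chain-of-thought prompt can look like. The toy retriever, corpus, and prompt template are my own stand-ins, not the setup used in either paper.

```python
# Minimal sketch of RAG + chain-of-thought prompting.
# The retriever and corpus below are toy stand-ins, not any specific system's API.

def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """RAG + CoT: ground the model in retrieved passages, then ask it to reason step by step."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Think step by step using only the context above, then give the final answer."
    )

corpus = [
    "Retrieval-augmented generation grounds answers in retrieved documents.",
    "Chain-of-thought prompting elicits intermediate reasoning steps.",
]
print(build_prompt("Does retrieval reduce hallucinations?",
                   retrieve("retrieval hallucinations", corpus)))
```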
comment in response to
post
Paradoxically, had the study not attracted so much attention, it likely would not have been retracted. Yet this shows the need for more responsible norms and systems for engaging with preprints, especially in fast-moving, hype-driven fields like #AI, where the stakes are exceptionally high.
🧵 4/4
comment in response to
post
When that foundation turns out to be misleading, we are left with wasted engagement and a long trail of cleanup in an already strained system. The information ecosystem is affected as well.
Attention, credibility, and labor were all spent on something that should not have commanded it.
🧵 3/4
comment in response to
post
It was covered by 10s of news outlets & has been cited 50 times across working papers, published articles, policy reports, and a dissertation. Many of these cited it as evidence.
It shows how deeply a paper can become entangled with science and public discourse before formal publication.
🧵 2/4
comment in response to
post
An earlier blog: thebsdetector.substack.com/p/ai-materia...
comment in response to
post
In a way, yes. DR is not a single shared MHA network but a system/agent workflow of multiple components that may themselves be based on one.
As for validation, you may be right; I am not sure. They may be fine-tuned models, sometimes using code interpreters, but it may just be LLMs asking LLMs.
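To make the "workflow of multiple components" point concrete, here is a hypothetical Python sketch of a Deep-Research-style pipeline. Every function below is a made-up stub (in a real system each could be an LLM call or a search tool); this is not OpenAI's actual implementation.

```python
# Illustrative agent workflow: plan -> search -> summarize -> synthesize.
# All functions are hypothetical stubs standing in for LLM calls or tools.

def plan(question):
    """Break the question into sub-queries (in a real system, an LLM call)."""
    return [f"{question} -- background", f"{question} -- recent evidence"]

def search(sub_query):
    """Fetch sources for a sub-query (in a real system, a web-search tool)."""
    return [f"stub result for: {sub_query}"]

def summarize(results):
    """Condense retrieved sources (another possible LLM call)."""
    return " | ".join(results)

def synthesize(question, summaries):
    """Compose the final report from the per-sub-query summaries."""
    return f"Report on '{question}':\n" + "\n".join(summaries)

question = "Do reasoning models hallucinate more?"
summaries = [summarize(search(q)) for q in plan(question)]
print(synthesize(question, summaries))
```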
comment in response to
post
And this thread might be useful too.
threadreaderapp.com/thread/18872...
comment in response to
post
Actually, it's a bit different from an LLM. These screenshots are from a thread on Twitter that I have found useful in understanding DeepResearch and its failure modes.
comment in response to
post
Though even that is not impervious to hallucinations.
comment in response to
post
As for knowledge graphs, they may not be enough, no? Or even feasible in many cases. Take temporality, for example: facts change, relevance shifts, and context matters. Isn't that one reason RAG is somewhat superior, i.e., it brings in up-to-date, contextual info at the time the model needs it?
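A toy Python illustration of the temporality point, under my own made-up "facts" and dates: a static knowledge graph freezes whatever was true at build time, while retrieval at query time can prefer the most recent statement.

```python
# Toy contrast: static knowledge-graph fact vs. retrieval at query time.
# The documents and dates below are invented purely for illustration.

from datetime import date

corpus = [
    {"date": date(2021, 3, 1), "text": "Policy X is in draft form."},
    {"date": date(2024, 6, 1), "text": "Policy X has been adopted and revised."},
]

static_kg_fact = corpus[0]["text"]  # whatever was true when the graph was built

def retrieve_latest(docs):
    """Retrieval step: pick the most recent relevant document at query time."""
    return max(docs, key=lambda d: d["date"])["text"]

print("Knowledge graph says:", static_kg_fact)
print("Retrieved at query time:", retrieve_latest(corpus))
```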
comment in response to
post
That's an interesting thought, i.e., it hallucinates not randomly but in service of its internal coherence.
So do smarter people "make more claims overall" as well, as highlighted in the evaluation doc as one of the reasons :p?
comment in response to
post
mashable.com/article/open...
"OpenAI doesn't know the underlying cause"
comment in response to
post
🤦♂️
comment in response to
post
That said, I have seen Deep Research produce fewer hallucinations than other models. So chain-of-thought + access to the web (an action space) could potentially help.
Though DeepResearch has other issues (it is not quite useful for "deep" research).
comment in response to
post
more..
comment in response to
post
There is empirical evidence that it does.
But I still would not trust it. Today I searched for something, and within its thought process, it looked for a paper that did not exist (or at least I could not find that paper; see screenshot).
comment in response to
post
It does not “know” what it does not know… 🤷
Some models (like DeepResearch) have somewhat better guardrails in place that avoid hallucinations to some extent.
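As a guess at what such a guardrail might look like, here is a short Python sketch of one plausible check: verify every reference the model cites against a trusted index and flag anything unverifiable. The index and DOIs are hypothetical; this is not how any particular product actually works.

```python
# Hedged sketch of a possible citation-verification guardrail.
# The trusted index and DOIs are invented for illustration only.

trusted_index = {
    "10.1000/real.paper.2023",  # hypothetical DOI standing in for a real database entry
}

def verify_citations(cited_dois):
    """Split cited DOIs into verified and unverifiable ones."""
    verified = [d for d in cited_dois if d in trusted_index]
    unverifiable = [d for d in cited_dois if d not in trusted_index]
    return verified, unverifiable

verified, suspect = verify_citations(["10.1000/real.paper.2023", "10.1000/made.up.paper"])
print("Keep:", verified)
print("Flag as possibly hallucinated:", suspect)
```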
comment in response to
post
OSPI’s human-centered AI #Ethics guidance link: ospi.k12.wa.us/student-succ...
Fun fact: the state of #Washington is one of the first five states to draft ethics guidance around AI usage in the classroom, and the only state to have revised it twice already.
#AIEducation #AI #Education
3/3
comment in response to
post
One key issue we raised in the webinar is the growing misalignment between students and educators around what counts as acceptable #AI use in the classroom. Without shared norms or clarity, this tension creates confusion, inconsistent enforcement, and lost opportunities for meaningful learning.
2/3
comment in response to
post
Case in point: the increasing prevalence of puzzle-solving & mop-up work around “Can GPT do X?”. Has a “general purpose” instrument ever in history become such a widespread object of study, disrupting what gets studied, who gets to study it, and how?
#ScAISci #SciSci #ScienceOfAIMediatedScience #AI4Science
comment in response to
post
I sometimes think large empirical papers are like magic. From the outside, it's easier to imagine that the authors are magicians than that they actually slogged through all the steps their work seems to imply are necessary. This work wasn't magic, it was just hard work. 10/
comment in response to
post
In a very early writing phase. Will definitely share to get your feedback :)
comment in response to
post
The font is Minion 3, I think. Since you tagged Professor Crockett, here's one of the chapters inspired by their writing.