stellali.bsky.social
PhD student @uwnlp.bsky.social @uwcse.bsky.social | visiting researcher @MetaAI | previously @jhuclsp.bsky.social
https://stellalisy.com
39 posts
1,247 followers
194 following
Regular Contributor
Active Commenter
comment in response to
post
This work was jointly done with the amazing @jiminmun.bsky.social !
And huge shout-out to our awesome collaborators and mentors @faebrahman.bsky.social, Jonathan Ilgen, Yulia (@tsvetshop.bsky.social) and @maartensap.bsky.social 🩵🥰
comment in response to
post
ALFA is open-source! There are many more analyses in the paper, check them out!📖
🔗 Paper: arxiv.org/abs/2502.14860
💻 Code: github.com/stellalisy/ALFA
🤖 Data: tinyurl.com/MedicAskDocsData
Join us in the effort to make LLMs better at question asking! 🚀
#Healthcare #NLProc #AI4Science
comment in response to
post
Why this matters for AI safety & reliability: 🛡️
Better information gathering = Better decisions✅
Proactive questioning = Fewer blind spots🧐
Structured attributes = More controllable AI🤖
Interactive systems = More natural AI assistants🫶🏻
comment in response to
post
ALFA isn't just for medicine! The framework could be adapted to ANY field where proactive information gathering matters:
Legal consultation ⚖️
Financial advising 💰
Educational tutoring 📚
Investigative journalism 🕵️
Anywhere an AI needs to ask (not just answer), ALFA is worth trying out!🌟
comment in response to
post
🌟 Impressive Generalization!
ALFA-trained models maintain strong performance even on completely new interactive medical tasks (MediQ-MedQA), highlighting ALFA’s potential for broader applicability in real-world clinical scenarios‼️
comment in response to
post
🔬 Key Finding #2: Every Attribute Matters!
Removing any single attribute hurts performance‼️
Grouping general (clarify, focus, answerability) vs. clinical (medical accuracy, diagnostic relevance, avoiding DDX bias) attributes leads to drastically different outputs👩‍⚕️
Check out some cool examples!👇
comment in response to
post
🔬 Key Finding #1: Preference Learning > Supervised Learning
Is it just good synthetic data❓ No❗️
Simply showing good examples isn't enough! Models need to learn the directional difference between good and bad questions.
(SFT alone, without DPO, doesn't work either!)
comment in response to
post
Results show ALFA’s strengths🚀
Compared with SoTA instruction-tuned LLM baselines, ALFA-aligned models achieve:
⭐️56.6% reduction in diagnostic errors🦾
⭐️64.4% win rate in question quality✅
⭐️Strong generalization
comment in response to
post
The secret sauce of ALFA? 🔍
6 key attributes from theory (cognitive science, medicine):
General:
- Clarity ✨
- Focus 🎯
- Answerability 💭
Clinical:
- Medical Accuracy 🏥
- Diagnostic Relevance 🔬
- Avoiding Bias ⚖️
Each attribute contributes to a different aspect of the complex goal of question asking!
comment in response to
post
📚 Exciting Dataset Release: MediQ-AskDocs!
17k real clinical interactions
80k attribute-specific question variations
302 expert-annotated scenarios
Perfect for research on interactive medical AI
First major dataset for training & evaluating medical question-asking! 🎯
huggingface.co/datasets/ste...
comment in response to
post
Introducing ALFA: ALignment via Fine-grained Attributes 🎓
A systematic, general question-asking framework that:
1️⃣ Decomposes the concept of good questioning into attributes📋
2️⃣ Generates targeted attribute-specific data📚
3️⃣ Teaches LLMs through preference learning🧑🏫
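The three steps above could be sketched roughly like this (a minimal illustration of the recipe, not the actual implementation from the paper or repo; `rewrite` is a hypothetical stand-in for the attribute-specific rewriting LLM calls):

```python
# Illustrative sketch of the ALFA recipe. All names are hypothetical.
from dataclasses import dataclass

# 1) Decompose "good questioning" into fine-grained attributes.
GENERAL_ATTRIBUTES = ["clarity", "focus", "answerability"]
CLINICAL_ATTRIBUTES = ["medical_accuracy", "diagnostic_relevance", "avoid_ddx_bias"]

@dataclass
class PreferencePair:
    context: str      # the clinical conversation so far
    chosen: str       # question improved along one attribute
    rejected: str     # question degraded along the same attribute
    attribute: str    # which attribute this pair targets

def make_pairs(context, seed_question, rewrite):
    """2) Generate targeted attribute-specific data.

    `rewrite(question, attribute, direction)` stands in for an LLM call
    that rewrites the question to be better/worse on one attribute.
    """
    pairs = []
    for attr in GENERAL_ATTRIBUTES + CLINICAL_ATTRIBUTES:
        pairs.append(PreferencePair(
            context=context,
            chosen=rewrite(seed_question, attr, "better"),
            rejected=rewrite(seed_question, attr, "worse"),
            attribute=attr,
        ))
    return pairs

# 3) The pairs then feed a preference-learning objective (e.g. DPO),
# so the model learns the *direction* from bad to good questions
# rather than just imitating good examples, as plain SFT would.
```

The key design choice is that each pair varies exactly one attribute, so the preference signal is attributable rather than a fuzzy "this question is better overall."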
comment in response to
post
Why is this important?🤔
Current LLMs struggle with good question asking and proactive info-seeking. They often miss crucial details or ask vague questions. In medicine, this diverts convos away from key diagnostic evidence. We need AI that knows WHAT to ask and HOW to ask it!🎯
comment in response to
post
Thanks for sharing the work! Let me know if you have any questions!
comment in response to
post
Huge shout out to my amazing collaborators and mentors @shangbinfeng.bsky.social @vidhishab.bsky.social @emmapierson.bsky.social @pangwei.bsky.social and Yulia (@tsvetshop.bsky.social)🥰 Literally couldn’t have done this project without you guys🫶💕
comment in response to
post
What’s next for #MediQ?
We are exploring ways to further improve uncertainty quantification and also asking better questions🤩
Explore our datasets, benchmarks, and full paper to advance interactive clinical reasoning with LLMs: arxiv.org/abs/2406.00922 💻📄
comment in response to
post
Why does this matter?
🤝 Question-asking bridges the gap between incomplete patient info and confident AI responses.
🛑 Abstention ensures that LLM agents act only within their confidence thresholds.
Together, these make AI safer and more aligned with real-world clinical workflows.
comment in response to
post
🏗️ Ablations:
• Adding rationale generation (RG) improves diagnostic accuracy and reduces calibration error.
• Self-consistency boosts reliability (but only with RG!).
Combining both enhances performance by 22%.
Each piece matters for better LLM reasoning! 📊
comment in response to
post
Key findings✨:
📉SOTA LLMs fail at interactive clinical reasoning‼️they don’t ask questions.
👑BEST Expert improves diagnostic accuracy by 22% w/ abstention module.
🗨️ Iterative questioning helped gather crucial missing information in incomplete scenarios.
comment in response to
post
At the core of #MediQ is the Abstention Module🔍where the Expert evaluates its confidence before acting:
✅If confident, it provides a final answer.
❓If unsure, it asks targeted questions to gather more info.
This process mimics how clinicians manage uncertainty, reducing errors and improving safety.
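In rough pseudocode, that confidence-gated loop might look like the sketch below (a hypothetical illustration, not the paper's actual interface; `confidence`, `answer`, `ask_followup`, and `patient_reply` are stand-in callables):

```python
# Hypothetical sketch of a MediQ-style abstention loop. The callables
# are placeholders for the real Expert/Patient System components.

def expert_step(conversation, confidence, answer, ask_followup, threshold=0.8):
    """One Expert turn: answer only when confident enough, else ask.

    confidence(conv)   -> float in [0, 1], e.g. a calibrated estimate
    answer(conv)       -> a final answer string
    ask_followup(conv) -> a targeted follow-up question string
    """
    if confidence(conversation) >= threshold:
        return ("answer", answer(conversation))
    # Below threshold: abstain from guessing and gather missing info.
    return ("ask", ask_followup(conversation))

def run_consultation(initial_info, patient_reply, confidence, answer,
                     ask_followup, threshold=0.8, max_turns=5):
    """Iterate ask -> patient reply until confident or out of turns."""
    conv = [initial_info]
    for _ in range(max_turns):
        kind, text = expert_step(conv, confidence, answer,
                                 ask_followup, threshold)
        if kind == "answer":
            return text
        conv.append(patient_reply(text))  # Patient System responds
    return answer(conv)                   # forced answer at turn limit
```

The threshold is the safety knob: raising it trades more question-asking turns for fewer low-confidence guesses.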
comment in response to
post
The #MediQ framework:
🤒Patient System: Simulates realistic patient responses with partial info.
🩺Expert System: Decides when to ask questions or answer using an abstention mechanism.
📊 Benchmark: Tests LLMs’ ability to handle iterative decision-making in medicine.
comment in response to
post
⚙️ How does it work?
#MediQ shifts LLMs from static question-answering to interactive reasoning:
1️⃣ Identifies missing info.
2️⃣ 🗣️ Asks relevant follow-up questions.
3️⃣ 🛑 Abstains from guessing when unsure.
This makes AI safer and more reliable for high-stakes scenarios.
comment in response to
post
🚨The challenge: Generative AI assumes all info is provided upfront and gives its “best guess” based on what’s given.
🛑In reality, patients often give incomplete details, and 👩‍⚕️👨‍⚕️doctors don’t guess; they ask follow-up questions to fill the gaps. #MediQ teaches LLMs to do the same.
comment in response to
post
My brain shuts up and it’s so quiet with this “music” haha🤣
comment in response to
post
unpopular opinion but this actually helps me focus so much open.spotify.com/episode/2nEO...
comment in response to
post
Would love to be added! Thank you!
comment in response to
post
Loll can I be added? Thank you!
comment in response to
post
Would love to be added! My research is on cognitively inspired reasoning in LLMs. Thank you!
comment in response to
post
Would love to be added! Thank you!
comment in response to
post
Ikr😭
comment in response to
post
Would love to be on the list, thank you for the efforts!!