mohit30.bsky.social
PhDing @GeorgiaTech | Previously: @msftresearch.bsky.social, @Microsoft, @iiithyderabad | Research: NLP and Social Computing for Healthcare | Opinions are personal | Homepage: https://mohit3011.github.io/ #ResponsibleAI #Human-CenteredAI #NLPforMentalHealth
23 posts 161 followers 144 following
comment in response to post
Congratulations! 🙌
comment in response to post
For more details: Paper: shorturl.at/bldCb Webpage: shorturl.at/bC1zn Code: shorturl.at/H8xmp Grateful for the efforts from my co-authors 🙌: Siddharth Sriraman, @verma22gaurav.bsky.social, Harneet Singh Khanuja, Jose Suarez Campayo, Zihang Li, Michael L. Birnbaum, Munmun De Choudhury 11/11
comment in response to post
Finding #6: We examined the actionability of mitigation advice. Expert responses scored the highest on overall actionability in comparison to all the LLMs. While LLMs provide less practical and relevant advice, their advice is clearer and more specific. 10/11
comment in response to post
Finding #5: LLMs struggle to provide expert-aligned harm reduction strategies, with larger models producing less expert-aligned strategies than smaller ones. The best medical model aligned with experts ~71% (GPT-4o score) of the time. 9/11
comment in response to post
Using the ADRA framework, we evaluate LLM alignment with experts across expressed emotion, readability, harm reduction strategies, & actionable advice. Finding #4: We find that LLMs express similar emotions and tones but provide significantly harder-to-read responses. 8/11
comment in response to post
Finding #3: In-context learning boosted performance for both ADR detection and multiclass classification (+23 F1 points for the latter). However, gains in the ADR detection task were limited to a few models. The type of examples had a more pronounced impact on the ADR multiclass classification task. 7/11
comment in response to post
Finding #2: All LLMs showed "risk-averse" behavior, labeling no-ADR posts as ADR. Claude 3 Opus had a 42% false-positive rate for ADR detection and GPT-4-Turbo misclassified over 50% of non-dose/time-related ADRs. This highlights the lack of "lived-experience" among models. 6/11
comment in response to post
Finding #1: Larger models perform better for ADR detection tasks (Claude 3 Opus led with an accuracy score of 77.41%), but this trend does not hold for ADR multiclass classification. Additionally, distinguishing ADR types remains a significant challenge for all models. 5/11
comment in response to post
We introduce Psych-ADR, a benchmark with Reddit posts annotated for ADR presence/type, paired with expert-written responses and the ADRA framework to systematically evaluate long-form generations in detecting ADR expressions and delivering mitigation strategies. 4/11
comment in response to post
Broader Takeaway #2: To build reliable AI in healthcare, we must move beyond choice-based benchmarks toward tasks that portray the complexities of the real world (such as ADR mitigation) using nuanced frameworks and benchmarks. 📈 Below are some nuanced findings 👇 3/11
comment in response to post
Broader Takeaway #1: LLMs are tools to empower and not replace mental health professionals. They offer clear & specific advice, addressing the global shortage of care providers—but contextually relevant, practical advice still requires human expertise. 👨‍⚕️👩‍⚕️ 2/11
comment in response to post
Great work! 👏
comment in response to post
Yup! I joined recently along with a large number of folks and I guess it will become like academic twitter if people continue to engage on the platform.
comment in response to post
Really amazing work! Very insightful.
comment in response to post
Thank you so much!
comment in response to post
I would love to get added if possible!
comment in response to post
Congratulations! It is certainly a good start, but I still feel we need more interdisciplinary reviewers (based on the reviews I have gotten). One issue is the requirement that reviewers have at least 3 *CL papers in the past 5 years, which many researchers might not have. Something ACs could look into?
comment in response to post
Thank you!
comment in response to post
Would love to get added to this!