navitagoyal.bsky.social
PhD student @umdcs, Member of @ClipUmd lab | Earlier @AdobeResearch, @IITRoorkee
3 posts · 246 followers · 186 following

🚨 New Position Paper 🚨 Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬 We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠 Here's why MCQA evals are broken, and how to fix them 🧵

How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!

This paper is really cool. They decompose NLI (and defeasible NLI) hypotheses into atoms, and then use these atoms to measure the logical consistency of LLMs. E.g. for an entailment NLI example, each hypothesis atom should also be entailed by the premise. Very nice idea 👏👏
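A minimal sketch of that atom-level consistency check, assuming the hypothesis atoms are already extracted (e.g., by an LLM prompt, which the paper handles separately) and using the off-the-shelf roberta-large-mnli model rather than whatever the authors actually used:

```python
# Sketch: for an entailment example, a logically consistent model should
# also predict ENTAILMENT for every atom of the decomposed hypothesis.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def entails(premise: str, hypothesis: str) -> bool:
    """True if the NLI model predicts ENTAILMENT for (premise, hypothesis)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Read the label name from the config rather than hardcoding an index.
    return model.config.id2label[logits.argmax(dim=-1).item()] == "ENTAILMENT"

def consistent_entailment(premise: str, hypothesis_atoms: list[str]) -> bool:
    # Consistency check: the full hypothesis is entailed, so each atom must be too.
    return all(entails(premise, atom) for atom in hypothesis_atoms)

premise = "A man in a red shirt is playing guitar on stage."
atoms = ["A man is playing guitar.", "The man is on stage.", "The man wears a red shirt."]
print(consistent_entailment(premise, atoms))
```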

Please join us for AI at Work: Building and Evaluating Trust, presented by our Trustworthy AI in Law & Society (TRAILS) institute. Feb 3-4, Washington, DC. Open to all! Details and registration at trails.gwu.edu/trailscon-2025; sponsorship details at trails.gwu.edu/media/556

This is my first time serving as an AC for a big conference. Just read this great work by Goyal et al. (arxiv.org/abs/2411.11437). I'm optimizing for high coverage and low redundancy; assigning reviewers based on relevant topics or affinity scores alone feels off. Seniority and diversity matter!
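For a toy illustration of what trading raw affinity for coverage might look like (not the cited paper's method, and all names and data below are hypothetical): greedily pick the reviewer who covers the most not-yet-covered paper topics, breaking ties by affinity.

```python
# Toy sketch: greedy reviewer assignment favoring topic coverage over pure affinity.
def assign_reviewers(paper_topics: set[str],
                     reviewer_topics: dict[str, set[str]],
                     affinity: dict[str, float],
                     k: int = 3) -> list[str]:
    chosen: list[str] = []
    covered: set[str] = set()
    candidates = set(reviewer_topics)
    while len(chosen) < k and candidates:
        # Greedy step: maximize newly covered topics, then affinity score.
        best = max(candidates,
                   key=lambda r: (len(reviewer_topics[r] & (paper_topics - covered)),
                                  affinity[r]))
        chosen.append(best)
        covered |= reviewer_topics[best] & paper_topics
        candidates.remove(best)
    return chosen

paper = {"NLI", "consistency", "evaluation"}
reviewers = {"r1": {"NLI", "evaluation"}, "r2": {"NLI"}, "r3": {"consistency", "fairness"}}
aff = {"r1": 0.9, "r2": 0.95, "r3": 0.6}
print(assign_reviewers(paper, reviewers, aff, k=2))  # ['r1', 'r3']: r3 adds coverage despite lower affinity
```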