shikharmurty.bsky.social
Final year PhD Student in Computer Science @Stanford
Work on:
- Compositionality, syntax (language structure)
- Web Agents: Synthetic data, tree search, exploration (language interpretation)
24 posts 443 followers 124 following

Ever dreamed of AI agents learning through unsupervised interaction with the open world? Our latest preprint introduces NNetNav-Live, which collects training data through exploration on real websites and hindsight labeling, producing a SOTA OSS agent.

Want to make a browser agent for *any* domain like banking or healthcare? We propose methods for training LLMs with open-ended, unsupervised interaction on live websites: ✅ OSS SoTA on WebVoyager ✅ world's smallest high-performing web-agent Try it here: nnetnav.dev

going to stay off twitter for my own mental health. something has gone horribly wrong with that platform.

Couldn't make it to NeurIPS due to work, but do check out our workshop happening in West Ballroom B. Lots of cool things to come, including a very fun panel!

Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 in East Exhibit Hall A-C #1907 on #NeurIPS2024. With Kazuki Irie, Jürgen Schmidhuber, Christopher Potts and @chrmanning.bsky.social.

The extraordinary recent takeover of ML/AI by #NLP is well-known but insufficiently reflected on. Look at the @neuripsconf.bsky.social tutorials in 2024! neurips.cc/virtual/2024... 14 tutorials; 6 have "LLM" in the title; 4 more cover foundation models, with large NLP coverage. That's > 70% 😲

🚨 Thrilled to share that Compositional Generalization Across Distributional Shifts with Sparse Tree Operations received a spotlight award at #NeurIPS2024! 🌟 I'll present a poster on Tuesday and give an invited lightning talk at the System 2 Reasoning Workshop on Sunday. 🧵👇

🧵-1 We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.

Folks, I'm not going to be at NeurIPS this year, but we have an *awesome* workshop that I'm super proud of. Go attend, and use the link below to ask all of your burning questions about LLM reasoning, agents and compositionality!

🎊Excited for #neurips2024 and our "System 2 Reasoning at Scale" workshop. We have an exciting lineup of speakers who will answer your most burning questions about AI and reasoning 🚀 🔥Got spicy questions? Submit & vote here: app.sli.do/event/dJNU63...

I also wear the AI agents researcher hat. Can't say I'm similarly impressed by reviewers in that community...

ACL syntax track reviewers >> almost any other conference. These folks care about their sub-field and I learn something new every time!

What is a probing task that is purely about semantics? Context: I have a probe trained to predict dependency relations, and would like to train another one on a semantics only task (for research purposes)

Asked GPT-4o to draw parse trees in two languages:

Hot take (since it's still just friends on this platform): It's crazy how the classic "sample and rerank" baseline from machine translation and IR got re-branded as "scaling up inference-time compute".
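
For anyone who hasn't seen the baseline spelled out, here is a minimal sketch of "sample and rerank": draw n candidates, score each one, keep the best. Everything in it is a made-up stand-in (the `POOL` of outputs, `toy_sample`, `toy_score`); a real setup would sample from a model and score with a reranker or verifier.

```python
import random
from typing import Callable

# Hypothetical toy "model": a fixed pool of outputs to sample from.
POOL = ["cat", "a cat", "the cat sat", "the cat sat on the mat"]


def toy_sample(rng: random.Random) -> str:
    """Stand-in for sampling one output from a generator."""
    return rng.choice(POOL)


def toy_score(candidate: str) -> float:
    """Stand-in for a reranker; here it simply prefers more words."""
    return float(len(candidate.split()))


def sample_and_rerank(
    sample: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> str:
    """Draw n candidates, then return the one with the highest score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=score)


if __name__ == "__main__":
    # Larger n means more samples to pick from, i.e. more inference-time compute.
    for n in (1, 4, 16):
        print(n, sample_and_rerank(toy_sample, toy_score, n))
```

With a fixed seed, the n=1 candidate is also the first candidate drawn for any larger n, so the best score can only stay the same or go up as n grows.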