shikharmurty.bsky.social
Final-year PhD student in Computer Science @Stanford
Work on:
- Compositionality, syntax (language structure)
- Web agents: synthetic data, tree search, exploration (language interpretation)
24 posts 443 followers 124 following
comment in response to post
“casual interception” as defined in \citep{}…
comment in response to post
Controlling a browser / computer! But it requires a bit more tooling to set up.
comment in response to post
Please check out our paper for more details: arxiv.org/pdf/2410.02907 And our code if you want a NNetNav-ed model for your own domain: github.com/MurtyShikhar... Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
comment in response to post
We find that cross-website robustness is limited, and almost always, performance goes up when we incorporate in-domain NNetNav data. This makes it even more important to work on unsupervised learning for agents - how are you going to collect human data for *any* website? [6/n]
comment in response to post
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods for both datasets [5/n]
comment in response to post
We use NNetNav to collect around 10k workflows for 20 websites: 15 live websites and 5 self-hosted websites. Data is available on 🤗: huggingface.co/datasets/sta... huggingface.co/datasets/sta... [4/n]
comment in response to post
Main ideas behind NNetNav exploration:
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test whether the trajectory-so-far is meaningful. If yes, continue exploring; otherwise prune. [3/n]
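The pruned exploration loop described above can be sketched roughly as follows. This is a minimal illustration, not the released NNetNav implementation: `env`, `lm_relabel`, `lm_judge`, and every name below are hypothetical stand-ins.

```python
# Hedged sketch of NNetNav-style pruned exploration.
# Assumption: all names (env, lm_relabel, lm_judge) are hypothetical,
# not taken from the actual NNetNav codebase.

def explore(env, lm_relabel, lm_judge, max_depth=8):
    """Explore a website, pruning trajectories whose prefix the judge
    does not consider a meaningful sub-trajectory of some task."""
    frontier = [[]]   # each entry is a trajectory: a list of actions
    collected = []    # (instruction, trajectory) pairs for training
    while frontier:
        traj = frontier.pop()
        if len(traj) >= max_depth:
            continue
        for action in env.candidate_actions(traj):
            new_traj = traj + [action]
            # Retroactively relabel the trajectory-so-far into a
            # candidate instruction, then ask the judge whether the
            # trajectory actually accomplishes that instruction.
            instruction = lm_relabel(new_traj)
            if lm_judge(instruction, new_traj):
                collected.append((instruction, new_traj))
                frontier.append(new_traj)  # keep exploring this branch
            # otherwise: prune — descendants of new_traj are never visited
    return collected
```

The key design choice mirrors point 1 above: because any meaningful complex trajectory must pass through meaningful prefixes, pruning on the prefix is safe and cuts the search space sharply.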
comment in response to post
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites, which are retroactively labeled into instructions, yielding a strikingly diverse set of workflows for any website (e.g., this plot) [2/n]
comment in response to post
Now, reviewers are upset if we only finetune sub 10B parameter models!
comment in response to post
for more context: we are training the probe on sentences from PTB / BLiMP
comment in response to post
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
comment in response to post
SRL probe still rewards hidden states that model dependency relations, no? Would like a probe that's agnostic to how well the underlying network models syntax
comment in response to post
could i get added? thx for making this!!
comment in response to post
To be fair, after some prompt engineering: German: (S (NP (DT Der) (NN Mann)) (VP (VB mag) (NP (JJ schwarze) (NNS Katzen)))) Japanese: (S (NP (NN Otoko) (PP wa)) (VP (NP (JJ kuro) (NN neko) (PP ga))
comment in response to post
nothing but blue skies, for posting puns