shikharmurty.bsky.social
Final-year PhD student in Computer Science @Stanford
Work on:
- Compositionality, syntax (language structure)
- Web agents: synthetic data, tree search, exploration (language interpretation)
24 posts 443 followers 124 following
comment in response to post
“casual interception” as defined in \citep{}…
comment in response to post
Controlling a browser / computer! But it requires a bit more tooling to set up.
comment in response to post
Please check out our paper for more details: arxiv.org/pdf/2410.02907 And our code if you want a NNetNav-ed model for your own domain: github.com/MurtyShikhar... Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
comment in response to post
We find that cross-website robustness is limited, and almost always, performance goes up when we incorporate in-domain NNetNav data. This makes it even more important to work on unsupervised learning for agents - how are you going to collect human data for *any* website? [6/n]
comment in response to post
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods for both datasets [5/n]
comment in response to post
We use NNetNav to collect around 10k workflows for 20 websites: 15 live websites and 5 self-hosted websites. Data is available on 🤗: huggingface.co/datasets/sta... huggingface.co/datasets/sta... [4/n]
comment in response to post
Main ideas behind NNetNav exploration:
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test whether the trajectory-so-far is meaningful. If yes, continue exploring; otherwise prune. [3/n]
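The pruned exploration loop described above can be sketched roughly as follows. This is a minimal illustration, not the released NNetNav implementation: `env`, `lm_relabel`, `lm_judge`, and every name below are hypothetical stand-ins.

```python
# Hedged sketch of NNetNav-style pruned exploration.
# Assumption: all names (env, lm_relabel, lm_judge) are hypothetical,
# not taken from the actual NNetNav codebase.

def explore(env, lm_relabel, lm_judge, max_depth=8):
    """Explore a website, pruning trajectories whose prefix the judge
    does not consider a meaningful sub-trajectory of some task."""
    frontier = [[]]   # each entry is a trajectory: a list of actions
    collected = []    # (instruction, trajectory) pairs for training
    while frontier:
        traj = frontier.pop()
        if len(traj) >= max_depth:
            continue
        for action in env.candidate_actions(traj):
            new_traj = traj + [action]
            # Retroactively relabel the trajectory-so-far into a
            # candidate instruction, then ask the judge whether the
            # trajectory actually accomplishes that instruction.
            instruction = lm_relabel(new_traj)
            if lm_judge(instruction, new_traj):
                collected.append((instruction, new_traj))
                frontier.append(new_traj)  # keep exploring this branch
            # otherwise: prune — descendants of new_traj are never visited
    return collected
```

The key design choice mirrors point 1 above: because any meaningful complex trajectory must pass through meaningful prefixes, pruning on the prefix is safe and cuts the search space sharply.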
comment in response to post
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites, which are retroactively labeled into instructions, yielding a strikingly diverse set of workflows for any website (e.g., this plot) [2/n]
comment in response to post
Now, reviewers are upset if we only finetune sub 10B parameter models!
comment in response to post
for more context: we are training the probe on sentences from PTB / BLiMP
comment in response to post
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
comment in response to post
SRL probe still rewards hidden states that model dependency relations, no? Would like a probe that's agnostic to how well the underlying network models syntax
comment in response to post
could i get added? thx for making this!!
comment in response to post
To be fair, after some prompt engineering: German: (S (NP (DT Der) (NN Mann)) (VP (VB mag) (NP (JJ schwarze) (NNS Katzen)))) Japanese: (S (NP (NN Otoko) (PP wa)) (VP (NP (JJ kuro) (NN neko) (PP ga))
comment in response to post
nothing but blue skies, for posting puns