shikharmurty.bsky.social
Final year PhD Student in Computer Science @Stanford
Work on:
- Compositionality, syntax (language structure)
- Web Agents: Synthetic data, tree search, exploration (language interpretation)
24 posts
443 followers
124 following
comment in response to
post
“casual interception” as defined in \citep{}…
comment in response to
post
controlling a browser / computer!
but requires a bit more tooling to set it up.
comment in response to
post
Please check out our paper for more details: arxiv.org/pdf/2410.02907
And our code if you want a NNetNav-ed model for your own domain:
github.com/MurtyShikhar...
Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
comment in response to
post
We find that cross-website robustness is limited, and almost always, performance goes up from incorporating in-domain NNetNav data. This makes it even more important to work on unsupervised learning for agents - how are you going to collect human data for *any* website? [6/n]
comment in response to
post
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods for both datasets [5/n]
comment in response to
post
We use NNetNav to collect around 10k workflows for over 20 websites, including 15 live websites and 5 self-hosted websites.
Data is available on 🤗: huggingface.co/datasets/sta...
[4/n]
comment in response to
post
Main ideas behind NNetNav exploration
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test whether the trajectory-so-far is meaningful. If yes, continue exploring; otherwise prune. [3/n]
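The prune-as-you-go loop described above can be sketched roughly as follows. This is a minimal toy illustration, not the paper's actual code: `ToyEnv`, `propose_action`, `relabel`, and `judge` are all hypothetical stand-ins for the browser environment, the policy LM, the instruction relabeler, and the judge.

```python
class ToyEnv:
    """Stand-in browser environment: state is just a step counter."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        return self.state

def propose_action(obs):
    # Stand-in for the policy LM proposing the next browser action.
    return f"click_{obs}"

def relabel(trajectory):
    # Stand-in for the LM instruction relabeler: summarize the prefix
    # into a candidate instruction it could have been carrying out.
    return f"perform {len(trajectory)} steps"

def judge(instruction, trajectory):
    # Stand-in judge: here, prefixes of length <= 3 count as "meaningful".
    return len(trajectory) <= 3

def explore(env, max_depth=10):
    """Grow a trajectory; prune as soon as the prefix stops being meaningful."""
    trajectory = []
    obs = env.reset()
    for _ in range(max_depth):
        obs = env.step(propose_action(obs))
        trajectory.append(obs)
        instruction = relabel(trajectory)
        if not judge(instruction, trajectory):
            trajectory.pop()  # this extension was not a meaningful subgoal
            break
    # Retroactively label the surviving trajectory with an instruction.
    return relabel(trajectory), trajectory

instruction, traj = explore(ToyEnv())
```

With the toy judge above, exploration stops after three steps and the surviving prefix is relabeled into its instruction, mirroring how pruning keeps only trajectories that some instruction could plausibly explain.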
comment in response to
post
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites; these traces are retroactively labeled with instructions, yielding a strikingly diverse set of workflows for any website (e.g. this plot) [2/n]
comment in response to
post
Now, reviewers are upset if we only finetune sub-10B-parameter models!
comment in response to
post
for more context: we are training the probe on sentences from PTB / BLiMP
comment in response to
post
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
comment in response to
post
SRL probe still rewards hidden states that model dependency relations, no? would like a probe that's agnostic to how well the underlying network models syntax
comment in response to
post
could i get added? thx for making this!!
comment in response to
post
To be fair, after some prompt engineering:
German:
(S
(NP (DT Der) (NN Mann))
(VP (VB mag)
(NP (JJ schwarze) (NNS Katzen))))
Japanese:
(S
(NP (NN Otoko) (PP wa))
(VP
(NP (JJ kuro) (NN neko) (PP ga))
comment in response to
post
nothing but blue skies, for posting puns