echoshao8899.bsky.social
CS PhD student @StanfordNLP
https://cs.stanford.edu/~shaoyj/
32 posts
124 followers
56 following
comment in response to
post
Hi, I found your work very interesting and would love to reach out. Is there a way to contact you? I tried DMing on this site and on Reddit, but both failed. Thank you so much for your consideration!
comment in response to
post
Thanks Vinay, Yucheng, John & @diyiyang.bsky.social for the amazing collaboration, and to all the friends—met or yet to be met—who shared suggestions for the platform release!
The release wouldn't have been possible without the generous support from US Navy Research, NSF, Google, and Microsoft Azure!
comment in response to
post
Try it out today at cogym.saltlab.stanford.edu!
Read our preprint to learn more details: arxiv.org/abs/2412.15701
comment in response to
post
You can request official support for a new task or vote on existing task requests through our GitHub repository!
github.com/SALT-NLP/col...
comment in response to
post
We welcome contributions of new task environments and agents.
Contributed agents will be deployed on our platform to study their interaction dynamics with real users. A great chance to distribute your agent in the wild!
comment in response to
post
Collaborative Gym is now released at github.com/SALT-NLP/col....
Besides backend primitives, we also open-source our UI to facilitate human-agent interaction research. The UI echoes the design of OpenAI Canvas, with a side-by-side chat panel and a shared workspace for human and agent, but can do more!
comment in response to
post
Hi @narphorium.bsky.social, thank you! I can finally reply now: our team wanted to first check whether the taxonomy could be used to examine other agentic systems (e.g., coding agents). It's indeed very useful. You can check out my recent blog post if interested: cs.stanford.edu/people/shaoy...
comment in response to
post
[8/8] To me, Co-Gym stems from the vision in my SoP, written two years ago, of building human-centered agentic systems. I am excited to see how agents could work with us and the demands this poses for advancing model intelligence!
Thank you Vinay, Yucheng, John & @diyiyang.bsky.social for the amazing collaboration!
comment in response to
post
[7/8] We are working on making Co-Gym UI accessible to the public. Can’t wait to get more in-the-wild evaluations and observe more dynamics of human-agent collaboration. Stay tuned!
Check out our arXiv paper first to learn more: arxiv.org/abs/2412.15701
comment in response to
post
[6/8] We conducted a detailed error analysis by having authors annotate 300 trajectories. Collaborative agents expose significant limitations in current LMs and agent scaffolding, with communication and situational-awareness failures occurring in 65% and 40% of real trajectories, respectively.
comment in response to
post
[5/8] We built a user simulator and web UI to instantiate Co-Gym in simulated and real settings.
Experiments reveal human-like patterns: collaborative inertia, where poor communication hinders delivery; and collaborative advantage, where human-agent teams outperform autonomous agents.
comment in response to
post
[4/8] Our vision builds on a long-standing dream in AI: to develop machines that act as teammates, not mere tools.
This demands situational intelligence to take initiative, communicate, and adapt. Co-Gym offers an evaluation framework that assesses both collab outcomes and processes.
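Assessing both collaboration outcomes and processes might be recorded as below; the dimension names here are my illustrative assumptions, not Co-Gym's actual metrics:

```python
from dataclasses import dataclass

# Hypothetical sketch: score the final deliverable (outcome) separately
# from how the agent behaved along the way (process).
@dataclass
class CollabResult:
    outcome_score: float         # task performance of the final deliverable, 0..1
    initiative_rating: float     # did the agent take initiative appropriately? 0..1
    communication_rating: float  # did it communicate clearly and on time? 0..1

    def process_score(self) -> float:
        """Aggregate the process dimensions into one number."""
        return (self.initiative_rating + self.communication_rating) / 2

result = CollabResult(outcome_score=0.8, initiative_rating=1.0, communication_rating=0.5)
```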
comment in response to
post
[3/8] How does Co-Gym enable collaborative agents? Our infra (1) focuses on environment design and (2) supports async interaction beyond turn-taking.
We define primitives for public/private components in the shared env, as well as collaboration actions and a notification protocol.
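The primitives above could be sketched roughly as follows; all class and method names are illustrative assumptions on my part, not the actual Co-Gym API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a shared environment with public/private components,
# collaboration actions, and an async notification queue (so parties don't
# have to strictly take turns).
@dataclass
class SharedEnv:
    public: dict = field(default_factory=dict)   # visible to both human and agent
    private: dict = field(default_factory=dict)  # per-party private state
    inbox: list = field(default_factory=list)    # notifications for async interaction

    def send_message(self, sender: str, text: str) -> None:
        """A collaboration action: post a chat message and notify the other party."""
        self.public.setdefault("chat", []).append((sender, text))
        self.inbox.append({"event": "new_message", "from": sender})

    def edit_workspace(self, sender: str, key: str, value) -> None:
        """Another collaboration action: edit the shared workspace."""
        self.public[key] = value
        self.inbox.append({"event": "workspace_updated", "from": sender})

env = SharedEnv()
env.send_message("agent", "Drafted an itinerary, please review.")
env.edit_workspace("agent", "itinerary", ["Day 1: Kyoto"])
```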
comment in response to
post
[2/8] Excitingly, collaborative agents consistently outperform their fully autonomous counterparts in terms of task performance, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users.
comment in response to
post
[1/8] While several HITL systems exist (e.g. OpenAI Canvas, our Collaborative STORM), what makes human-agent collab special? Agents need autonomy to be useful, yet the goal is empowering humans.
We start with three tasks: travel planning, surveying related work, and tabular analysis.
comment in response to
post
Check out the demo video to see what our framework can do: drive.google.com/file/d/1obls...
comment in response to
post
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang
Th, Dec 12, 11:00 PST - Poster Session 3 West
comment in response to
post
Finally, thanks Tianshi Li, Weiyan Shi, Yanchen Liu, @diyiyang.bsky.social for bringing in different expertise!! The work is partially supported by grants from ONR, Meta, and research credits from OpenAI.
comment in response to
post
Check out our paper, code, and data to learn more!
Paper: arxiv.org/abs/2409.00138
Website: salt-nlp.github.io/PrivacyLens/
comment in response to
post
In our paper, we explore the impact of prompting. Unfortunately, simple prompt engineering does little to mitigate privacy leakage of LM agents’ actions. We also examine the safety-helpfulness trade-off and conduct qualitative analysis to uncover more insights.
comment in response to
post
We collected 493 negative privacy norms to seed PrivacyLens. Our results reveal a discrepancy between QA probing results and LMs' actions in task execution. GPT-4 and Claude-3-Sonnet answer nearly all questions correctly, but they leak information in 26% and 38% of cases, respectively!
comment in response to
post
With negative privacy norms, vignettes, and trajectories, PrivacyLens conducts a multi-level evaluation: (1) assessing LMs on their ability to identify sensitive data transmission through QA probing, and (2) evaluating whether LM agents' final actions leak the sensitive information.
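The two evaluation levels could be checked as in this toy sketch (the scoring logic is my simplified assumption; the actual PrivacyLens evaluators are more involved):

```python
# Level 1 (QA probing): for a negative norm, the norm-consistent
# multiple-choice answer is "no, do not share" (option B here).
def probing_correct(answer: str) -> bool:
    return answer.strip().lower().startswith(("b", "no"))

# Level 2 (action evaluation): fraction of trajectories whose final
# action text contains the sensitive item, i.e. a leak.
def leak_rate(results: list[dict]) -> float:
    leaks = sum(
        1 for r in results
        if r["sensitive"].lower() in r["final_action"].lower()
    )
    return leaks / len(results)

results = [
    {"sensitive": "mental health history",
     "final_action": "Email draft: ...their mental health history shows..."},
    {"sensitive": "diagnosis",
     "final_action": "Email draft: the meeting is at 3pm."},
]
```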
comment in response to
post
Evaluating LMs' actions in applications is more contextualized. But how do we create test cases? PrivacyLens offers a data construction pipeline that procedurally converts norms into vignettes and then into agent trajectories via template-based generation and sandbox simulation.
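The staged pipeline could be pictured with these toy functions; in reality each stage uses LM generation and sandbox simulation, so the functions and field names below are purely illustrative:

```python
# Stage 1: ground the abstract norm tuple in a concrete story.
def norm_to_vignette(norm: dict) -> dict:
    return {
        "story": (
            f"{norm['sender']} is emailing {norm['recipient']} and knows "
            f"{norm['data_subject']}'s {norm['information_type']}."
        ),
        "sensitive_item": norm["information_type"],
    }

# Stage 2: expand the vignette into an agent trajectory that stops
# right before the risky final action, which the evaluated LM must take.
def vignette_to_trajectory(vignette: dict) -> list:
    return [
        {"tool": "read_inbox", "observation": vignette["story"]},
        {"tool": "compose_email", "pending": True},
    ]

norm = {
    "sender": "a doctor",
    "recipient": "an insurance agent",
    "data_subject": "a patient",
    "information_type": "mental health history",
}
vignette = norm_to_vignette(norm)
trajectory = vignette_to_trajectory(vignette)
```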
comment in response to
post
Once we collect these privacy norms, a direct way to evaluate is to use a template to turn each tuple into a multiple-choice question. However, how LMs answer probing questions may not be consistent with how they act in agentic applications.
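Turning a tuple into a probing question might look like this; the template wording is my own illustration, not the paper's exact prompt:

```python
def to_probing_question(flow: dict) -> str:
    """Fill a fixed template with the 5-tuple to get a multiple-choice question."""
    return (
        f"Is it acceptable for {flow['sender']} to share "
        f"{flow['data_subject']}'s {flow['information_type']} with "
        f"{flow['recipient']} {flow['transmission_principle']}?\n"
        "(A) Yes  (B) No"
    )

q = to_probing_question({
    "sender": "their doctor",
    "data_subject": "a patient",
    "information_type": "mental health history",
    "recipient": "an insurance agent",
    "transmission_principle": "without the patient's consent",
})
```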
comment in response to
post
Humans protect privacy not by always avoiding sharing sensitive data, but by adhering to these norms during data use and communication with others. A well-established framework for privacy norms is Contextual Integrity theory, which expresses data transmission as a 5-tuple.
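The Contextual Integrity 5-tuple describes who sends whose information, of what type, to whom, and under what condition. A minimal encoding (the example norm is mine, chosen for illustration):

```python
from typing import NamedTuple

# A data flow under Contextual Integrity: a privacy norm judges whether
# this flow is appropriate in its context.
class DataFlow(NamedTuple):
    data_subject: str            # whose information it is
    sender: str                  # who transmits it
    recipient: str               # who receives it
    information_type: str        # what kind of data
    transmission_principle: str  # the condition governing the flow

# A negative norm: a flow people generally judge unacceptable.
flow = DataFlow(
    data_subject="a patient",
    sender="their doctor",
    recipient="an insurance agent",
    information_type="mental health history",
    transmission_principle="without the patient's consent",
)
```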
comment in response to
post
Why is this important? While many studies have investigated LMs memorizing training data, a lot of private or sensitive information is actually exposed to LMs at inference time, especially when we use them for daily assistance.