🚀The return of BERT, LLMs that beat physicians?!, and the dawn of agents to simulate users. Presenting Santa's nice list for AI! 🎅
- ModernBERT
- Superhuman performance of LLMs against Physicians (discussion with author @adamrodmanmd.bsky.social )
- LMagent: A Large-scale Multimodal Agents Society
- MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
- AniDoc: Animation Creation Made Easier
- Adaptive Computation Modules
- Cultural Evolution of Cooperation among LLM Agents
- The Open-Source Advantage in Large Language Models
- LlamaFusion
Learn more about these papers below 👇
ModernBERT introduces a state-of-the-art encoder-only transformer optimized for speed, memory efficiency, and long-context tasks.
Evaluates OpenAI’s o1-preview model on complex clinical reasoning tasks, comparing its performance to human physicians and prior LLMs.
A scalable framework for simulating dynamic, multimodal multi-user behavior using large-scale multimodal LLMs.
A black-box method that jailbreaks AI models across text, vision, and audio by applying simple prompt augmentations. With 10,000 augmented prompts, BoN reaches high attack success rates, 89% on GPT-4o and 78% on Claude 3.5 Sonnet, and bypasses defenses like circuit breakers.
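The core idea behind BoN is easy to sketch: apply cheap, random character-level augmentations to a harmful prompt and resample until one variant slips past the model's refusals. A minimal sketch of that loop, assuming hypothetical `query_model` and `is_jailbroken` callables supplied by the attacker (the augmentations here, random capitalization and mid-word scrambling, are illustrative, not the paper's exact set):

```python
import random

def augment(prompt: str, seed: int) -> str:
    """Apply BoN-style character-level augmentations (illustrative):
    random case-flipping plus scrambling the interior of some words."""
    rng = random.Random(seed)  # seeded so each candidate is reproducible
    chars = []
    for ch in prompt:
        if ch.isalpha() and rng.random() < 0.6:
            ch = ch.upper() if rng.random() < 0.5 else ch.lower()
        chars.append(ch)
    words = "".join(chars).split(" ")
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < 0.3:
            mid = list(w[1:-1])
            rng.shuffle(mid)  # scramble interior characters only
            words[i] = w[0] + "".join(mid) + w[-1]
    return " ".join(words)

def best_of_n(prompt, query_model, is_jailbroken, n=10_000):
    """Resample augmented prompts until one elicits a harmful response.

    query_model and is_jailbroken are placeholders for the attacker's
    black-box model call and success classifier.
    """
    for seed in range(n):
        candidate = augment(prompt, seed)
        response = query_model(candidate)
        if is_jailbroken(response):
            return candidate, response, seed + 1  # attempts used
    return None
```

Because the method only needs prompt-level access, the same loop transfers to vision and audio by swapping `augment` for modality-specific perturbations (e.g., image overlays or audio pitch shifts).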