rockt.ai
Director and Open-Endedness Team Lead at Google DeepMind (GDM). Professor of Artificial Intelligence, University College London (UCL), and PI of @ucldark.com. Fellow, European Laboratory for Learning and Intelligent Systems (ELLIS).
47 posts 5,329 followers 197 following

Ever thought of joining DeepMind's RL team? We're recruiting for a research engineering role in London: job-boards.greenhouse.io/deepmind/job... Please spread the word!

Excited to share our latest work on EvoTune, a novel method integrating LLM-guided evolutionary search and reinforcement learning to accelerate the discovery of algorithms! 1/12🧵

Looking forward to representing the Open-Endedness research community with a keynote at ICLR 2025 in Singapore! 🚀

Join us for our next I-X seminar with Professor Tim Rocktäschel (UCL & Google DeepMind) titled "Open-Endedness and General Intelligence". 🕓 13:30 - 14:30 (GMT) 📅 Thursday, 3 April 📍 Hybrid (White City Campus / MS Teams) 🔗Register now: ix.imperial.ac.uk/event/i-x-se... @e-giunchiglia.bsky.social

My group @FLAIR_Ox is recruiting a postdoc and looking for someone who can get started by the end of April. Deadline to apply is in one week (!), 19th of March at noon, so please help spread the word: my.corehr.com/pls/uoxrecru...

Can AI agents adapt zero-shot to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for skill design that produces highly capable and steerable hierarchical agents. Paper: arxiv.org/abs/2412.08542 Code: github.com/mklissa/maestromotif

Thank you for the kind words Jeff 🙏

Hi Open-Endedness community, what are everyone's favorite "the stepping stones don't look anything like the final product" real-world examples?

🚨BALROG leaderboard update. This week's new entries on balrogai.com are: Llama 3.3 70B Instruct 🫤 Claude 3.5 Haiku✨ Mistral-Nemo-it (12B) 🆗 GitHub: github.com/balrog-ai/BA...

It’s been a crazy 2 years seeing so many amazingly talented researchers bring GENerative Interactive Environments alive in Genie 1 and 2. The future is agents in generative environments.

Memorization is a big problem for diffusion models - you don't want them to output the actual training data! We present a new way to understand memorization from a dynamic systems perspective, and an effective mitigation method that does not require weight updates.

Thrilled to share Genie 2! Endless environments created by text or images, a key to open-ended/AI-Generating Algorithms. Genie 1 showed it's possible. 9 months later, Genie 2 shows jaw-dropping progress.🤯 Witness the magic of scale, again. 📈🚀 Thx to all team members @deep-mind.bsky.social!

Excited to reveal Genie 2, our most capable foundation world model that, given a single prompt image, can generate an endless variety of action-controllable, playable 3D worlds. Fantastic cross-team effort by the Open-Endedness Team and many other teams at Google DeepMind! 🧞

It's great to see BALROG featured on Jack Clark's Import AI newsletter! Check out what he had to say about it here: jack-clark.net And check out BALROG's leaderboard on balrogai.com

I really loved recording this since I don't know that much about NAS. Colin is really good at connecting different strands within that research direction from the beginning to what's next 🌟

Are there limits to what you can learn in a closed system? Do we need human feedback in training? Is scale all we need? Should we play language games? What even is "recursive self-improvement"? Thoughts about this and more here: arxiv.org/abs/2411.16905

@ucl-dark.bsky.social entered the stage! Thanks @lauraruis.bsky.social :)

This may sound odd, but game-based benchmarks are some of the most useful for AI, since we have human scores and they require reasoning, planning & vision. The hardest of all is NetHack. No AI is close, and I suspect that an AI that can fairly win/ascend would need to be AGI-ish. Paper: balrogai.com

A little thread on NotebookLM and journalism uses. First and most obviously, if you’re a journalist working a specific beat and often referring to reports, research or inquiry evidence, NotebookLM is just an excellent way of keeping those reports in one place, and sharing them with collaborators /1

Excited to announce "BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games" led by UCL DARK's @dpaglieri.bsky.social! Douwe Kiela's plot below is maybe the scariest for AI progress: LLM benchmarks are saturating at an accelerating rate. BALROG to the rescue. This will keep us busy for years.