We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data.
It is SOTA on every planning benchmark we tried.
In self-play, it goes 20 years between collisions.
Comments
And looking forward to a zero-DUI world.
The Deep Sets approach for encoding variable numbers of agents is interesting.
I've always wondered how Deep Sets behaves OOD, i.e., if there were, say, far more agents on the road than in training (a Western driver encounters an Asian traffic jam), does it saturate?
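For context on the saturation question, here is a minimal, hypothetical sketch of a Deep Sets agent encoder, not the paper's actual code; the class name, feature sizes, and pooling choice are assumptions. Each agent is embedded by a shared network, the embeddings are pooled with a permutation-invariant reduction, and the pooled vector is mapped to a fixed-size encoding. With sum pooling the pooled magnitude grows with the number of agents, which is where OOD saturation concerns come from; mean pooling keeps the scale independent of agent count.

```python
# Hypothetical Deep Sets encoder for a variable number of agents (illustrative only).
import torch
import torch.nn as nn

class DeepSetsAgentEncoder(nn.Module):
    def __init__(self, agent_dim=16, hidden=128, out_dim=256, pool="mean"):
        super().__init__()
        # phi: shared per-agent embedding network
        self.phi = nn.Sequential(nn.Linear(agent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        # rho: maps the pooled embedding to the final fixed-size encoding
        self.rho = nn.Sequential(nn.Linear(hidden, out_dim), nn.ReLU())
        self.pool = pool

    def forward(self, agents, mask):
        # agents: (batch, n_agents, agent_dim); mask: (batch, n_agents), 1 for real agents, 0 for padding
        h = self.phi(agents) * mask.unsqueeze(-1)  # zero out padded slots
        if self.pool == "sum":
            pooled = h.sum(dim=1)  # magnitude grows with agent count -> possible OOD saturation
        else:
            pooled = h.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)  # mean: count-invariant scale
        return self.rho(pooled)

# Toy check: the same encoder handles 10 or 200 agents and returns the same output shape.
enc = DeepSetsAgentEncoder()
few = enc(torch.randn(1, 10, 16), torch.ones(1, 10))
many = enc(torch.randn(1, 200, 16), torch.ones(1, 200))
print(few.shape, many.shape)  # both torch.Size([1, 256]); only the pooled magnitude differs by pooling choice
```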
Love the planets on the bottom plot! 🪐
I'm curious whether these benchmarks include hard tests of OOD generalization, e.g., driving conditions in Mumbai or Hanoi? Seems like it'd be a nice stress test of large-scale offline RL (vs. online planning).
Could you please help me understand how GPUDrive and GIGAFLOW are connected?
Is GIGAFLOW building upon GPUDrive?
Do you know when/where the supplementary material will be released? (I.e., where can we find the videos?!)
I loved that we did this as a team, each adding something unique but also equally sharing all the grimy work. We all wrote the simulator, we all did the RL, we all jumped on the grenades.
w/ @senerozan.bsky.social @twkillian.bsky.social (+ others who are not on bsky)
Is there a GitHub repo with a demo?
Thanks
(AV systems product manager here)
https://github.com/Emerge-Lab/gpudrive
Some upsides to it, some downsides, but similar speed.