willsmithvision.bsky.social
Professor in Computer Vision at the University of York, vision/graphics/ML research, Boro @mfc.co.uk fan and climber πŸ“York, UK πŸ”— https://www-users.york.ac.uk/~waps101/
52 posts 1,003 followers 510 following

I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, Ξ») here:
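The post doesn't show the tweaked f(x, λ) itself, but for reference, the classic Box-Cox power transform it builds on can be sketched in a few lines (the function name here is illustrative):

```python
import math

def box_cox(x, lam):
    """Classic Box-Cox power transform, defined for x > 0.

    (x**lam - 1) / lam for lam != 0, with log(x) as the lam -> 0 limit.
    """
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# lam = 1 recovers a shifted identity; the lam -> 0 limit approaches log(x)
assert box_cox(1.0, 0.7) == 0.0
assert abs(box_cox(2.0, 1e-9) - math.log(2.0)) < 1e-6
```

The well-known instability is exactly that λ → 0 limit: the closed form divides by λ, which is why implementations special-case λ = 0.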

#CVPR2025 Area Chair update: depending on which time zone the review deadline is specified in, we are past or close to the review deadline. Of the 60 reviews needed for my batch, I currently have 52 and they have been coming in quite fast this morning. In general, review standard looks good.

Image matching and ChatGPT - new post in the wide baseline stereo blog. tl;dr: it is good, even feels human, but not perfect. ducha-aiki.github.io/wide-baselin...

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of summing the losses and then calling backward once, it's better to call backward on each loss as you compute it (which frees that loss's computational graph). The results will be exactly identical.
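A minimal sketch of the pattern (the model, data, and chunking here are illustrative): summing losses and calling backward once, versus calling backward per loss, accumulates identical gradients, but the per-loss version frees each chunk's graph as soon as its backward runs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
data = torch.randn(8, 16)
target = torch.randn(8, 1)

# Method A: sum the losses, one backward (all graphs stay alive until the end)
model.zero_grad()
loss = sum(F.mse_loss(model(chunk), t)
           for chunk, t in zip(data.split(4), target.split(4)))
loss.backward()
grads_sum = [p.grad.clone() for p in model.parameters()]

# Method B: backward per loss (each chunk's graph is freed immediately)
model.zero_grad()
for chunk, t in zip(data.split(4), target.split(4)):
    F.mse_loss(model(chunk), t).backward()
grads_per_loss = [p.grad.clone() for p in model.parameters()]

# Gradients accumulate in .grad, so both methods give the same result
assert all(torch.allclose(a, b) for a, b in zip(grads_sum, grads_per_loss))
```

One caveat worth knowing: this works when the losses come from separate forward passes (as above); if two losses share one forward graph, the first backward frees the shared graph and the second will error without `retain_graph=True`.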

Me and my friend-since-before-school @ekd.bsky.social (a law academic) have written a blog post about the NotebookLM podcast generator in the style of, well, a corny podcast dialogue: slsablog.co.uk/blog/blog-po... 1/5

Entropy is one of those formulas that many of us learn, swallow whole, and even use regularly without really understanding. (E.g., where does that β€œlog” come from? Are there other possible formulas?) Yet there's an intuitive & almost inevitable way to arrive at this expression.
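As a concrete anchor for the formula being discussed, H(p) = -Σᵢ pᵢ log pᵢ, with the base-2 log counting yes/no questions (this snippet is illustrative, not from the thread):

```python
import math

def entropy(p, base=2):
    # -sum p_i log p_i, skipping zero-probability outcomes (0 log 0 := 0)
    return -sum(q * math.log(q, base) for q in p if q > 0)

# A fair coin carries 1 bit of uncertainty per flip
assert abs(entropy([0.5, 0.5]) - 1.0) < 1e-12
# A certain outcome carries none
assert entropy([1.0]) == 0.0
# 8 equally likely outcomes need log2(8) = 3 yes/no questions
assert abs(entropy([1 / 8] * 8) - 3.0) < 1e-12
```

The "three yes/no questions" reading of the last case is one version of the intuitive route the post alludes to: the log appears because question counts multiply as outcome counts multiply.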

"Sora is a data-driven physics engine." x.com/chrisoffner3...

Multistable Shape from Shading Emerges from Patch Diffusion #NeurIPS2024 Spotlight X. Nicole Han, T. Zickler and K. Nishino (Harvard+Kyoto) Diffusion-based SFS lets you sample multistable shape perception! Nicole at poster on Th 12/12 11am East A-C 1308 vision.ist.i.kyoto-u.ac.jp/research/mss...

To kick off using Bluesky: our new dataset called Oxford Spires. Synchronised multi-colour cameras and lidar in multiple Oxford colleges. Highly accurate ground-truth 3D maps from tripod scanners. The ideal basis for NeRF/3DGS SLAM research. dynamic.robots.ox.ac.uk/datasets/oxf...

Introducing MegaSaM! Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes! MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!

OK If we are moving to Bluesky I am rescuing my favourite ever twitter thread (Jan 2019). The renamed: Bluesky-sized history of neuroscience (biased by my interests)

Really cool new work out of DeepMind for video game world generation using latent diffusion! Soon you'll be able to speedrun a game just by tricking a model into morphing you from one location to another. deepmind.google/discover/blo...

How to drive your research forward? "I tested the idea we discussed last time. Here are some results. It does not work. (… awkward silence)" Such conversations happen so many times in meetings with students. How do we move forward? You need …

For my first post on Bluesky: this talk I gave at the recent BMVA one-day meeting on World Models is a good summary of my work on Computer Vision, Robotics and SLAM, and my thoughts on a bigger picture of #SpatialAI. youtu.be/NLnPG95vNhQ?...

I am a first time Area Chair for #CVPR2025 so, in the interests of transparency, I'll post some updates here on the various stages of the process. There are 708 (!) ACs (not that long ago, CVPR could have coped with 708 *reviewers*!) We've been allocated 18.27 papers on average (I have 20).

Hello Bluesky! 🔵 We start our account by welcoming our third guest for the Ask Me Anything session #3DV2025AMA! Noah Snavely @snavely.bsky.social from Cornell & Google DeepMind! 🌟 🕒 You now have 24 HOURS to ask him anything — drop your questions in the comments below! Keep it engaging but respectful!

Introducing Generative Omnimatte: A method for decomposing a video into complete layers, including objects and their associated effects (e.g., shadows, reflections). It enables a wide range of cool applications, such as video stylization, compositions, moment retiming, and object removal.

A real-time (or very fast) open-source txt2video model dropped: LTXV. HF: huggingface.co/Lightricks/L... Gradio: huggingface.co/spaces/Light... Github: github.com/Lightricks/L... Look at that prompt example though. Need to be a proper writer to get that quality.

I used πŸ“πŸ”— emojis to maximize Twitter/Bluesky parity in my profile. This is definitely pointless, but it's fun.

NeurIPS Conference is now Live on Bluesky! -NeurIPS2024 Communication Chairs

We've released our paper "Generating 3D-Consistent Videos from Unposed Internet Photos"! Video models like Luma generate pretty videos, but sometimes struggle with 3D consistency. We can do better by scaling them with 3D-aware objectives. 1/N page: genechou.com/kfcw

For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips of many years. It's long but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...

As we approach the one year anniversary of a T-PAMI submission still waiting for first reviews, I imagine "With Associate Editor" to mean they sit in a lotus position atop a Himalayan peak, our paper and the reviews in their hand as they meditate (indefinitely) on what recommendation to make.

🍏 New preprint alert! 🍏 PoM: Efficient Image and Video Generation with the Polynomial Mixer arxiv.org/abs/2411.12663 This is my latest "summer project" and it was so big I had to call in reinforcements (Thanks @nicolasdufour.bsky.social) TL;DR Transformers are for boomers, welcome to the future πŸ§΅πŸ‘‡

I'm slowly putting my intro to ML course material on github, starting with the lab sessions: github.com/davidpicard/... These are self-contained notebooks in which you have to implement famous algorithms from the literature (k-NN, SVM, DT, etc), with a custom dataset that I (painstakingly) made!
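The lab format can be illustrated with a tiny example of the first algorithm mentioned: a minimal k-NN classifier in NumPy (this snippet is illustrative, not taken from the repo):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    # Euclidean distance between every test point and every training point
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    # Indices of the k nearest training points for each test point
    nn = np.argsort(d, axis=1)[:, :k]
    # Majority vote among the neighbours' labels
    return np.array([np.bincount(v).argmax() for v in y_train[nn]])

# Two well-separated clusters with integer class labels
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
pred = knn_predict(X, y, np.array([[0.0, 0.5], [5.0, 5.5]]), k=3)
# Each test point lands in its nearby cluster: pred is [0, 1]
```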