davnords.bsky.social
PhD student @ Chalmers, Deep Learning for Computer Vision.
10 posts 21 followers 60 following

GPT-4o's new image capabilities seem to be well received. OpenAI seems to imply that it is not based on diffusion. I wonder how their work relates to the infamous NeurIPS paper "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (arxiv.org/abs/2404.02905).

New paper! We merge SfM reconstructions with point cloud registration. Link: arxiv.org/abs/2503.17093 Code: Not yet public, but coming later.

New paper! (arxiv.org/abs/2503.13433) We look into improving the threshold robustness of Random Sample Consensus (RANSAC) through (less biased) inlier noise scale estimation.

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! Project Page: vgg-t.github.io Code & Weights: github.com/facebookrese...

Introducing DaD, Part 2, a pretty cool keypoint detector.

We made a new keypoint detector named DaD. The paper isn't up yet, but the code and weights are: github.com/Parskatt/dad

Common beliefs about equivariant networks for image input include 1) They are slow. 2) They don’t scale to ImageNet. 3) They are complicated. In my opinion, these three are all false. To argue against them, we made minimal modifications to popular vision models, turning them mirror-equivariant.
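
To make "mirror-equivariant" concrete, here is a minimal toy sketch in plain Python. It is not the modification from the paper: it only assumes a single zero-padded 1D filter with an odd-length kernel, and all names (`conv1d_same`, `mirror`, `symmetrize`, `is_equivariant`) are hypothetical helpers made up for illustration. A layer is mirror-equivariant if flipping the input and then applying the layer gives the same result as applying the layer and then flipping the output. For this toy filter, tying the weights so the kernel is symmetric is enough.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation; output length equals input length.

    Assumes an odd-length kernel so the padding is the same on both sides.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def mirror(signal):
    """Flip a 1D signal left to right."""
    return list(signal)[::-1]


def symmetrize(kernel):
    """Tie the weights by averaging the kernel with its own mirror image."""
    return [(a + b) / 2 for a, b in zip(kernel, kernel[::-1])]


def is_equivariant(signal, kernel):
    """Check layer(mirror(x)) == mirror(layer(x)) up to float rounding."""
    flipped_then_filtered = conv1d_same(mirror(signal), kernel)
    filtered_then_flipped = mirror(conv1d_same(signal, kernel))
    return all(
        math.isclose(a, b, abs_tol=1e-9)
        for a, b in zip(flipped_then_filtered, filtered_then_flipped)
    )


x = [1.0, 4.0, -2.0, 0.5, 3.0]
k_free = [0.2, -1.0, 0.7]    # unconstrained kernel
k_sym = symmetrize(k_free)   # weight-tied, symmetric kernel

equivariant_free = is_equivariant(x, k_free)
equivariant_sym = is_equivariant(x, k_sym)
print(equivariant_free, equivariant_sym)
```

The symmetric kernel has fewer free parameters than the unconstrained one, which is the kind of weight sharing that lets an equivariant layer stay simple and cheap in this toy setting.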