TRecViT: A Recurrent Video Transformer
https://arxiv.org/abs/2412.14294

Causal, with 3× fewer parameters, 12× less memory, and 5× fewer FLOPs than (non-causal) ViViT, while matching or outperforming it on Kinetics & SSv2 action recognition.

Code and checkpoints out soon.
