I made a video on the rsLoRA paper, in which we learn that low ranks are not inherently "sufficient"; rather, a particular scaling factor (alpha/sqrt(r), instead of standard LoRA's alpha/r) is needed to stabilize training and unlock the performance gains of high LoRA ranks. 📈📈📈
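
The scaling change can be sketched in a few lines (this snippet only contrasts the two multipliers; the adapter matrices, initialization, and forward pass otherwise follow standard LoRA and are not shown):

```python
import math

def lora_scale(alpha: float, r: int, rank_stabilized: bool = False) -> float:
    # Standard LoRA multiplies the adapter output (B @ A @ x) by alpha / r,
    # so the update shrinks linearly as the rank r grows.
    # rsLoRA replaces this with alpha / sqrt(r), keeping the update
    # magnitude stable across ranks.
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r

alpha = 16
for r in (8, 64, 512):
    std = lora_scale(alpha, r)
    rs = lora_scale(alpha, r, rank_stabilized=True)
    print(f"r={r:4d}  standard={std:.4f}  rsLoRA={rs:.4f}")
```

At r=512 the standard multiplier has collapsed to 0.03, effectively muting the adapter, while the rsLoRA multiplier is still 0.71 — which is why high ranks only pay off with the sqrt scaling.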

https://www.youtube.com/watch?v=TVfdT2Ymffw
