The Llama 3.2 1B and 3B models are my favorite LLMs -- small but very capable. If you want to understand what their architectures look like under the hood, I implemented them from scratch (one of the best ways to learn): github.com/rasbt/LLMs-f...