Depends on which version you'd use. The 1.5B model took about 30 s to answer; the 8B model took 2-3 minutes on the same simple question, and 5+ minutes on complex queries. I'm running a Raspberry Pi 5 with 16 GB RAM. It's no match for the cloud version, but I'm happy with it for my use case.
Comments
I'm also trying to run an LLM on the Pi 5's GPU via Vulkan or OpenCL. My attempt with llama.cpp's Vulkan backend failed. I'm still trying to figure out how to get Rusticl working on the Pi 5.
https://www.phoronix.com/news/Rusticl-V3D-OpenCL-Raspberry-Pi
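For anyone attempting the same thing, here's a rough sketch of the llama.cpp Vulkan build I tried. Assumptions: the Vulkan loader/tools and Mesa's V3DV driver are installed, and the model path is a placeholder; whether the Pi 5's V3D GPU actually exposes the Vulkan compute features llama.cpp needs is exactly the open question.

```shell
# Sketch: build llama.cpp with its Vulkan backend on a Pi 5.
# Assumes libvulkan-dev, glslc (shaderc), and vulkan-tools are installed.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j4

# First sanity check: does Vulkan see the V3D GPU at all?
vulkaninfo --summary

# Then try offloading layers to the GPU (-ngl); the model path is a placeholder.
./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"
```

If `vulkaninfo` only lists a software rasterizer (llvmpipe), the Mesa V3DV driver isn't being picked up, and the build failing there would explain a lot before llama.cpp even enters the picture.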