It depends on which version you'd use. The 1.5B model took about 30 s to answer; the 8B model took 2–3 minutes on the same simple question, and 5+ minutes on complex queries. I use a Raspberry Pi 5 with 16 GB of RAM. It's no match for the cloud version, but I'm happy with it for my use case.
You can try overclocking. It won't make a big difference, but it helps a bit.
I'm also trying to run an LLM with Vulkan or OpenCL to run on the GPU of the Pi 5. My attempt with llama cpp Vulkan failed. Still trying to find out how to get rusticl working on the Pi 5. https://www.phoronix.com/news/Rusticl-V3D-OpenCL-Raspberry-Pi
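For anyone who wants to try the same thing, the Vulkan build of llama.cpp goes roughly like this (a sketch, not a guaranteed-working recipe on the Pi 5: package names are for Raspberry Pi OS / Debian Bookworm, and the model path is a placeholder):

```shell
# Vulkan headers, tools, and the GLSL compiler (Debian / Raspberry Pi OS)
sudo apt install libvulkan-dev vulkan-tools glslc

# Sanity check: does the V3D driver expose a Vulkan device?
vulkaninfo --summary

# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release -j

# -ngl offloads layers to the GPU; the model path below is a placeholder
./build/bin/llama-cli -m ~/models/model.gguf -ngl 99 -p "Hello"
```

This is where my attempt failed, so treat it as a starting point. Checking `vulkaninfo` output first tells you whether the driver is even visible before you spend time on the build.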