In multi-turn conversations, outputs do eventually diverge. We see this even with Llama 405B, where the FP8 and FP16 versions seem to have only non-semantic differences (given their identical benchmark scores), but in long conversations the difference is stark.