To be fair, the extra prompt-engineering trick may have made the LLM do better at generating the boolean query, but we don't know. I'm pretty sure the LLM research people rely on doesn't test the narrow task of coming up with boolean queries.
Comments
The same, I think, applies if the "AI academic search" uses, at least in part, semantic search based on embedding models. Technically it's the difference between transformer encoder models and GPT-type decoder models.
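To make the distinction concrete, here is a toy sketch of what embedding-based semantic search does: encode the query and each document as a vector, then rank by cosine similarity. The `embed` function below is a hypothetical bag-of-words stand-in for a real encoder model, not how any actual search product works.

```python
# Toy sketch of embedding-based semantic search (illustrative only).
# A real system would use a transformer *encoder* to produce dense vectors;
# `embed` here is a hypothetical bag-of-words placeholder.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for an encoder model: word counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, highest first."""
    q = embed(query)
    return sorted(((cosine(q, embed(d)), d) for d in docs), reverse=True)

docs = [
    "boolean search queries for systematic reviews",
    "transformer encoder models produce dense embeddings",
    "cooking pasta at home",
]
for score, doc in semantic_search("encoder embeddings", docs):
    print(f"{score:.2f}  {doc}")
```

The point of the contrast: a decoder (GPT-type) model *writes* a boolean query as text, while an encoder-based pipeline skips the query string entirely and matches in vector space.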