Updated pre-print (http://arxiv.org/abs/2308.01264) testing moral and legal intuitions in GPT-4, Claude 2.1, Llama 2, and Gemini Pro across a series of different studies. We found that...
GPT-4 is overall better aligned with human responses, but all models' performance varies substantially from one study to the next. All models show less variance and larger effects than humans.