dylancastillo.co • 90 days ago

Then I switched to GPT-4o-mini, using LMSF's results as a reference.

Tweaked the prompts and improved all LMSF metrics except for NL in GSM8k.

GSM8k and Last Letter looked as expected (no diff).

But in Shuffled Objects, unstructured outputs clearly surpassed structured ones.
