Researcher Jindong Wang and Associate Professor Steven Euijong Whang explore the NeurIPS 2024 work ERBench. ERBench leverages relational databases to create LLM benchmarks that can verify model rationale via keywords as well as check answer correctness. https://www.microsoft.com/en-us/research/podcast/abstracts-neurips-2024-with-jindong-wang-and-steven-euijong-whang/
Post image

Comments