📣 Does your model learn high-quality #concepts, or does it learn a #shortcut?
Test it with our #NeurIPS2024 dataset & benchmark track paper!
rsbench: A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts
What's the deal with rsbench? 🧵
rsbench lets you evaluate the quality of the concepts learned by:
1️⃣ Neuro-Symbolic models (#NeSy)
2️⃣ Concept Bottleneck Models (#CBMs)
3️⃣ Black-box Neural Networks (NNs*)
4️⃣ Vision-Language Models (#VLMs*)
* through post-hoc concept-based explanations (e.g., TCAV)
NeSy models might learn wrong concepts but still make perfect predictions!
Example: A self-driving car 🚗 stops in front of a 🚦🔴 or a 🚶. Even if it confuses the two, it outputs the right prediction!
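To see why, here's a toy sketch (ours, not from the paper): under the rule "stop if red light OR pedestrian", a model that swaps the two concepts still predicts perfectly.

```python
# Toy reasoning shortcut: the rule "stop = red_light or pedestrian"
# cannot tell apart a model that swapped the two concepts.
def stop(red_light: bool, pedestrian: bool) -> bool:
    return red_light or pedestrian

# Ground truth: red light on, no pedestrian.
print(stop(red_light=True, pedestrian=False))   # True (correct label)

# A model that confused the concepts (sees a pedestrian instead of a red light):
print(stop(red_light=False, pedestrian=True))   # True as well!
# Same prediction, wrong concepts -> a reasoning shortcut.
```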
With rsbench, you can:
- 🧮 Run algorithmic, logical, and high-stakes tasks w/ known reasoning shortcuts (RSs).
- 📊 Eval concept quality via F1, accuracy & concept collapse (see the sketch after this list).
- 🛠️ Easily customize the tasks and count RSs a priori using our countrss tool!
- 🔍 Evaluate concepts in in- and out-of-distribution scenarios.
- 🎯 Ground-truth concept annotations are available for all tasks.
- 📈 Visualize how your models handle different learning & reasoning tasks!
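Here's a minimal sketch of what concept-level evaluation looks like (illustrative only; the collapse proxy below is ours, not rsbench's exact metric or API):

```python
# Minimal sketch of concept-quality metrics (illustrative only; see the
# rsbench repo for the actual evaluation code and metric definitions).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Ground-truth vs. predicted concepts, e.g. [red_light, pedestrian] per sample.
c_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
c_pred = np.array([[0, 1], [0, 1], [1, 1], [0, 0]])  # first sample is swapped

print("concept accuracy:", accuracy_score(c_true.ravel(), c_pred.ravel()))
print("macro concept F1:", f1_score(c_true, c_pred, average="macro"))

# A crude "collapse" proxy: how many distinct concept vectors the model uses
# relative to the ground truth. (rsbench's collapse metric is more refined.)
collapse = 1 - len({tuple(r) for r in c_pred}) / len({tuple(r) for r in c_true})
print("collapse proxy:", collapse)
```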
3 new benchmarks:
🔢 MNMath for arithmetic reasoning
🔎 MNLogic for SAT-like problems
🚗 SDD-OIA, a synthetic self-driving task!
They can all be made easier or harder with our data generator (toy sketch below)!
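For intuition, a toy sketch of what an MNMath-style sample looks like (our simplification: one addition over two digits; the real generator renders actual MNIST images and richer equations):

```python
# Toy sketch of an MNMath-style sample: MNIST-like digit images are the
# (latent) concepts, and the label is the result of an equation over them.
import random

def toy_mnmath_sample():
    a, b = random.randint(0, 9), random.randint(0, 9)  # concepts (digits)
    x = ("img_of_%d" % a, "img_of_%d" % b)             # stand-ins for images
    y = a + b                                          # label: equation result
    return x, y, (a, b)

x, y, concepts = toy_mnmath_sample()
print(x, "->", y, "with ground-truth concepts", concepts)
# A model can get y right with wrong digit concepts, e.g. 3+5 vs 4+4 -> 8.
```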
1️⃣ Configurable: the generator is driven by simple YAML/JSON files.
2️⃣ Intuitive: straightforward to use:
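For example, generation could look roughly like this (hypothetical sketch; the option names below are illustrative, not the actual rsbench API):

```python
# Hypothetical usage sketch: names below are illustrative, not the actual
# rsbench API; see the project page for the real configs and entry points.
import json

# A JSON/YAML config like this is the kind of thing the generator takes:
config = json.loads("""
{
  "task": "MNMath",
  "n_digits": 2,
  "out_dir": "data/mnmath"
}
""")

print("would generate task {task} into {out_dir}".format(**config))
```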