📣 Does your model learn high-quality #concepts, or does it learn a #shortcut?
Test it with our #NeurIPS2024 dataset & benchmark track paper!
rsbench: A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts
What's the deal with rsbench? 🧵
rsbench lets you evaluate the quality of the concepts learned by:
1️⃣ Neuro-Symbolic models (#NeSy)
2️⃣ Concept Bottleneck Models (#CBMs)
3️⃣ Black-box Neural Networks (NNs*)
4️⃣ Vision-Language Models (#VLMs*)
* through post-hoc concept-based explanations (e.g., TCAV)
NeSy models might learn wrong concepts but still make perfect predictions!
Example: A self-driving car 🚗 stops in front of a 🚦🔴 or a 🚶. Even if it confuses the two, it outputs the right prediction!
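To see why, here's a toy sketch (ours, not from the paper): under the rule "stop if red light OR pedestrian", a model that swaps the two concepts still predicts perfectly.

```python
# Toy reasoning shortcut: the rule "stop = red_light or pedestrian"
# cannot tell apart a model that swapped the two concepts.
def stop(red_light: bool, pedestrian: bool) -> bool:
    return red_light or pedestrian

# Ground truth: red light on, no pedestrian.
print(stop(red_light=True, pedestrian=False))   # True (correct label)

# A model that confused the concepts (sees a pedestrian instead of a red light):
print(stop(red_light=False, pedestrian=True))   # True as well!
# Same prediction, wrong concepts -> a reasoning shortcut.
```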
With rsbench, you can:
- 🧮 Run algorithmic, logical, and high-stakes tasks w/ known reasoning shortcuts (RSs).
- 📊 Eval concept quality via F1, accuracy & concept collapse (see the sketch after this list).
- 🛠️ Easily customize the tasks and count RSs a priori using our countrss tool!
- 🔍 Evaluate concepts in in- and out-of-distribution scenarios.
- 🎯 Ground-truth concept annotations are available for all tasks.
- 📈 Visualize how your models handle different learning & reasoning tasks!
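Here's a minimal sketch of what concept-level evaluation looks like (illustrative only; the collapse proxy below is ours, not rsbench's exact metric or API):

```python
# Minimal sketch of concept-quality metrics (illustrative only; see the
# rsbench repo for the actual evaluation code and metric definitions).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Ground-truth vs. predicted concepts, e.g. [red_light, pedestrian] per sample.
c_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
c_pred = np.array([[0, 1], [0, 1], [1, 1], [0, 0]])  # first sample is swapped

print("concept accuracy:", accuracy_score(c_true.ravel(), c_pred.ravel()))
print("macro concept F1:", f1_score(c_true, c_pred, average="macro"))

# A crude "collapse" proxy: how many distinct concept vectors the model uses
# relative to the ground truth. (rsbench's collapse metric is more refined.)
collapse = 1 - len({tuple(r) for r in c_pred}) / len({tuple(r) for r in c_true})
print("collapse proxy:", collapse)
```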
3 new benchmarks:
🔢 MNMath for arithmetic reasoning
🔎 MNLogic for SAT-like problems
🚗 SDD-OIA, a synthetic self-driving task!
They can all be made easier or harder with our data generator (toy sketch below)!
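For intuition, a toy sketch of what an MNMath-style sample looks like (our simplification: one addition over two digits; the real generator renders actual MNIST images and richer equations):

```python
# Toy sketch of an MNMath-style sample: MNIST-like digit images are the
# (latent) concepts, and the label is the result of an equation over them.
import random

def toy_mnmath_sample():
    a, b = random.randint(0, 9), random.randint(0, 9)  # concepts (digits)
    x = ("img_of_%d" % a, "img_of_%d" % b)             # stand-ins for images
    y = a + b                                          # label: equation result
    return x, y, (a, b)

x, y, concepts = toy_mnmath_sample()
print(x, "->", y, "with ground-truth concepts", concepts)
# A model can get y right with wrong digit concepts, e.g. 3+5 vs 4+4 -> 8.
```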
1️⃣ Configurable: the generator is driven by simple YAML/JSON files.
2️⃣ Intuitive: straightforward to use:
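For example, generation could look roughly like this (hypothetical sketch; the option names below are illustrative, not the actual rsbench API):

```python
# Hypothetical usage sketch: names below are illustrative, not the actual
# rsbench API; see the project page for the real configs and entry points.
import json

# A JSON/YAML config like this is the kind of thing the generator takes:
config = json.loads("""
{
  "task": "MNMath",
  "n_digits": 2,
  "out_dir": "data/mnmath"
}
""")

print("would generate task {task} into {out_dir}".format(**config))
```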