
nhlism.bsky.social • 114 days ago

To try: have an autoregressive model predict the number of tokens it needs before generating. One problem is that once you commit to generating N tokens, you don't know the gradients for the other lengths in 1…Max_N, which might be better choices.

We could run all Max_N lengths and take the min gradient, which seems expensive.
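
A rough sketch of why the brute-force sweep gets expensive, under some loud assumptions: `score` is a hypothetical stand-in for whatever per-length signal you would compare (the post says gradient; a loss would work the same way here), `toy_score` is a made-up function with no model behind it, and each length is generated independently with no shared prefix. Under those assumptions, trying every length costs about Max_N(Max_N+1)/2 generated tokens instead of N.

```python
from typing import Callable, Tuple


def best_length_by_sweep(
    score: Callable[[int], float], max_n: int
) -> Tuple[int, float, int]:
    """Try every length 1..max_n and keep the lowest-scoring one.

    Returns (best_n, best_score, tokens_generated), where tokens_generated
    counts the tokens produced across all runs, assuming each length is
    generated from scratch.
    """
    best_n, best_score = 1, score(1)
    tokens_generated = 1
    for n in range(2, max_n + 1):
        s = score(n)
        tokens_generated += n  # a run of length n generates n tokens
        if s < best_score:
            best_n, best_score = n, s
    return best_n, best_score, tokens_generated


def toy_score(n: int) -> float:
    # Hypothetical stand-in for "gradient/loss after generating n tokens":
    # a simple bowl whose minimum sits at n = 7.
    return (n - 7) ** 2 + 0.5


best_n, best_score, cost = best_length_by_sweep(toy_score, max_n=16)
print(best_n, best_score, cost)
```

With the toy score the sweep finds the right length, but it pays for every candidate length to get there, which is the expense the post is pointing at.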
