dorialexander.bsky.social
LLM for the commons.
1,118 posts 6,357 followers 616 following

I’m actually happy to have gone through the Common Corpus report, because the reality is that we’re going to pause open data releases for a while. Multiple grant attempts by now, all negative; maybe people just don’t care about it.

Math and humanities are almost in mirror positions with regard to current RL experience design. The biggest issue with math is problem generation. In the humanities, sampling new questions is relatively trivial; the actual pain is verification (which could exist: source criticism is one such method).
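To make that asymmetry concrete, here is a minimal sketch (all names hypothetical, not any actual Pleias pipeline): sampling a humanities question is a one-liner, while the reward has to lean on a source-criticism rubric scored by a judge model, which is where the real design work sits.

```python
# Minimal sketch of the math/humanities mirror: question generation is cheap,
# verification is the bottleneck. All names and passages are hypothetical.
import random

PASSAGES = [
    "Chronicle fragment, allegedly 12th c., mentioning a 'telegraph office'...",
    "Letter attributed to a 1789 pamphleteer, printed on wood-pulp paper...",
]

def sample_question(rng: random.Random) -> dict:
    """Sampling is trivial: pick a source, ask for a source-critical assessment."""
    passage = rng.choice(PASSAGES)
    return {
        "passage": passage,
        "prompt": f"Assess the authenticity and dating of this source:\n{passage}",
    }

def verify(answer: str, passage: str, judge) -> float:
    """Verification is the hard part: here delegated to a judge scoring a
    source-criticism rubric (anachronisms, provenance, material evidence).
    `judge` is a placeholder callable, not any specific model API."""
    rubric = (
        "Score 0-1: does the answer flag anachronisms, question provenance, "
        "and reason from the material evidence in the passage?"
    )
    return judge(rubric=rubric, passage=passage, answer=answer)

# Usage sketch: reward = verify(policy_answer, item["passage"], judge_model)
```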

OK, this is what an actual SOTA benchmark looks like: 1,692 hand-made formalizations, with intentional language/domain/time diversity (from 1962 onward). Can't complain about the lack of progress on non-math areas without making a similar effort. arxiv.org/pdf/2407.11214

No, we need more philosophical input on reasoning design, not an offloading of that impact-and-ethics section you don’t want to write.

Very promising early release of cultural heritage corpus for AI training with a detailed pipeline and data report. I’m happy to see that Pleias tooling contributed to it (for OCR detection).

Well, there is certainly a growing trend to make open source AI a very small and closeted club. Kind of defeats the purpose.

can someone help me budget my family is dying
food $150
candles $10
data (alexander wang) $10b
rent $800

I know the meaning of "critique" is not the same, but this passage from the Critique of Pure Reason does feel like an anticipation of generalist RL.

In reality, Kant is more of a DeepSeek fan.

How I'm seeing my RL runs with pleias-350m and gemma 12b as a judge.

After today's post-mid-training experiments, the way I'm seeing things.

OK, I guess I have to go through that Apple paper. My immediate issue is the framing, which is super binary: "Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching?" ml-site.cdn-apple.com/papers/the-i...

My TL on the other network contains multitudes.

Seeing the reception of an open pretraining dataset.

A question I'd really like to see more people investigating: does the choice of pretraining model matter that much once you do intensive mid-training?

Anyway, glad to have released this paper yesterday.