juand-r.bsky.social
CS PhD student at UT Austin in #NLP
Interested in language, reasoning, semantics and cognitive science. One day we'll have more efficient, interpretable and robust models!
Other interests: math, philosophy, cinema
https://www.juandiego-rodriguez.com/
376 posts
4,030 followers
2,503 following
comment in response to
post
Yes, it was just like that
comment in response to
post
Very interesting!
comment in response to
post
Congratulations!
comment in response to
post
Stafford Beer
comment in response to
post
I'll take a look. Thanks for sharing!
comment in response to
post
Figuratively, of course.
comment in response to
post
"I would propose a simple rule: no answers from nowhere. This rule is less convenient, and that's the point. The chatbot should be a conduit for the information of the world, not an arbiter of truth." @mikecaulfield.bsky.social
comment in response to
post
Paper: arxiv.org/abs/2504.09184
Interactive Viz: fi-le.net/simplestories
Datasets + Model Suite: huggingface.co/SimpleStories
Generation Code: github.com/lennart-fink...
Training Code: github.com/danbraunai/s...
comment in response to
post
The project was led by Lennart Finke, with contributions from Chandan Sreedhara, Thomas Dooms, Mat Allen, Emerald Zhang, Thomas Marshall, Noa Nabeshima, Dan Braun and myself.
Links to the paper, interactive data explorer, and code below!
comment in response to
post
Along with the dataset and model suite, we're open-sourcing two libraries to make the whole data generation and training pipeline easier:
Dataset Generation Library: Create your own datasets using LLMs.
Model Training Code: Train custom models on these datasets.
comment in response to
post
TinyStories (arxiv.org/abs/2305.07759) has been extremely useful for researchers, though it is quite formulaic (e.g., 59% of its stories start with "Once upon a time"). We address this with parametrized prompts, which allow us to generate more diverse stories.
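A rough sketch of the parametrized-prompt idea from the post above, under stated assumptions. This is not the SimpleStories generation code. The parameter names and values in `PARAMS`, the `TEMPLATE` wording, and all helper functions are hypothetical, chosen only for illustration. Sampling one value per parameter produces many distinct prompts. `share_starting_with` shows one simple way to measure how formulaic a corpus is, such as the fraction of stories that open with "Once upon a time".

```python
import math
import random

# Hypothetical prompt parameters: the real SimpleStories parameter set is not
# given in the post, so these lists are illustrative assumptions only.
PARAMS = {
    "theme": ["friendship", "courage", "curiosity"],
    "setting": ["a busy market", "a quiet forest", "a space station"],
    "style": ["playful", "suspenseful", "poetic"],
    "opening": ["dialogue", "a question", "a description of the weather"],
}

# Hypothetical template with one slot per parameter.
TEMPLATE = (
    "Write a short story for young readers about {theme}, set in {setting}. "
    "Use a {style} tone and begin the story with {opening}."
)


def sample_prompt(rng: random.Random) -> str:
    """Fill the template with one randomly chosen value per parameter."""
    choices = {key: rng.choice(values) for key, values in PARAMS.items()}
    return TEMPLATE.format(**choices)


def count_prompt_combinations() -> int:
    """Number of distinct prompts the template can produce."""
    return math.prod(len(values) for values in PARAMS.values())


def share_starting_with(stories: list[str], phrase: str = "Once upon a time") -> float:
    """Fraction of stories whose text (ignoring leading whitespace) starts with `phrase`."""
    if not stories:
        return 0.0
    hits = sum(1 for story in stories if story.lstrip().startswith(phrase))
    return hits / len(stories)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled prompts are reproducible
    for _ in range(3):
        print(sample_prompt(rng))
    print(count_prompt_combinations())  # 81 (3 values for each of 4 parameters)
```

Even this small example gives 3^4 = 81 prompt variants. Each added parameter multiplies the count, which is why parametrizing prompts is a cheap way to push an LLM away from a single stock opening.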