joerocca.bsky.social
Low-alpha lurking/reposting account. Interested in OSS ML, web, XR, EA (esp WAS/WAW), alt proteins, housing, aging, and stuff like that
49 posts 232 followers 3,036 following
Prolific Poster

I finally joined 🦋! Some of you may recognize me from other sites. Here's a quick intro for new connections: 👋 I work on RL, world models, and generalization in decision-making. I'm perhaps best known for my work on "TD-MPC2: Scalable, Robust World Models for Continuous Control" www.tdmpc2.com

Small models? Saturating? Where I live, we don't know these words.

New open-source reasoning model (code, dataset, and model)! Huginn-0125: Pretraining a Depth-Recurrent Model. A recurrent-depth model trained at scale on 4096 AMD GPUs on Frontier.

Zyphra releases Zonos in beta, a highly expressive TTS model with high-fidelity voice cloning. They release both transformer and SSM-hybrid models under an Apache 2.0 license.

Physical Intelligence (π) is open-sourcing π0: they are releasing the code and weights for π0 as part of their experimental openpi repository. Blog: www.pi.website/blog/openpi Repo: github.com/Physical-Int...

⭐ The first foundation model available on @LeRobotHF ⭐ Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by @physical_int and ported to PyTorch by @m_olbap 👇🧵

When it rains, it pours. Baichuan releases Baichuan-Omni-1.5, an open-source omni-modal foundation model supporting text, image, video, and audio inputs as well as text and audio outputs. Both the model (huggingface.co/baichuan-inc...) and the base (huggingface.co/baichuan-inc...) are available.

Latest #AI benchmark results: DeepSeek-R1 (including its distilled variants) outperforms OpenAI's o1-mini and o1-preview models. And the Llama 3 distilled version now holds the title of the highest-performing LLM I've tested locally to date. 🚀

TypeScript excitement 😉 Thanks to @searyanc.dev for landing the new --erasableSyntaxOnly tsconfig flag. Heading for TS 5.8 Beta next week 🎉
🔷 Guides users away from TS-only runtime features such as enum & namespace
🔷 Pairs nicely with Node's recent TypeScript support
github.com/microsoft/Ty...

4-bit Sana released! Demo: svdquant.mit.edu github.com/NVlabs/Sana/...

Hugging Face adds GRPO to TRL: the training algorithm behind DeepSeek R1 🔋 Eliminates the value function from PPO to save boatloads of compute 💰 Samples N completions per prompt to compute average rewards across a group. To use it, run: pip install git+https://github.com/huggingface/trl.git
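The group-relative advantage at the heart of GRPO fits in a few lines. This is a minimal sketch of the idea, not TRL's implementation: the function name, the choice of population standard deviation (implementations differ), and the `eps` term are assumptions, and the clipped policy-gradient and KL parts of the loss are left out.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each completion against the other completions sampled for the SAME prompt.

    The group mean plays the role of the baseline that PPO would get from a learned
    value function; dividing by the group std keeps advantages on a common scale.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for N = 4 completions of one prompt (1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from the group itself, no critic network has to be trained or stored, which is where the compute saving comes from.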

Prime Intellect releases: - INTELLECT-MATH, a frontier 7B parameter model for math reasoning that shows that the quality of your SFT initialization strongly impacts reinforcement learning. Blog: www.primeintellect.ai/blog/intelle... Models: huggingface.co/PrimeIntelle...

We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December. Today we’re sharing an experimental update w/ improved performance on math, science, and multimodal reasoning benchmarks 📈:
• AIME: 73.3%
• GPQA: 74.2%
• MMMU: 75.4%

SambaNova's EvaByte: the open-weight, tokenizer-free language model. Their 6.5B byte-level LM, EvaByte, matches modern tokenizer-based LMs with 5x less data & 2x faster decoding!
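Tokenizer-free means the model reads and writes raw bytes instead of subword tokens. A minimal sketch of that input side, assuming plain UTF-8 bytes map directly to IDs; the `offset` for reserving special-token IDs is a hypothetical parameter, not EvaByte's documented layout.

```python
def bytes_to_ids(text: str, offset: int = 0) -> list[int]:
    """One ID per UTF-8 byte; `offset` (hypothetical) shifts IDs past reserved specials."""
    return [b + offset for b in text.encode("utf-8")]

def ids_to_text(ids: list[int], offset: int = 0) -> str:
    """Invert bytes_to_ids; invalid byte sequences are replaced instead of raising."""
    return bytes(i - offset for i in ids).decode("utf-8", errors="replace")

print(bytes_to_ids("héllo"))  # [104, 195, 169, 108, 108, 111]
```

The "é" becomes two IDs, so a byte-level model sees longer sequences than a subword model for the same text, which is why decoding speed matters for this kind of model.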

ByteDance's UI-TARS can operate on your local personal device. Project: github.com/bytedance/UI... Desktop: github.com/bytedance/UI... Browser: github.com/web-infra-de... Models: huggingface.co/bytedance-re... Paper: arxiv.org/abs/2501.12326

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon! 👉 npm i kokoro-js 👈 Link to demo (+ sample code) in 🧵

DeepSeek-R1 is coming soon. DeepSeek-R1 (Preview) results: the model performs in the vicinity of o1-Medium, providing SOTA reasoning performance on LiveCodeBench.

NVIDIA AceInstruct-72B, from a family of advanced SFT models for coding, mathematics, and general-purpose tasks. research.nvidia.com/labs/adlr/ac... huggingface.co/nvidia/AceIn...

The next step on our journey toward easy-to-use, fully open models.

📢 Paper + code release 📃💻 After 2 years of work, I'm excited to announce our newest paper, MatterGen, has been published in Nature! www.nature.com/articles/s41... We are also releasing all the training data, model weights, model code, and evaluation code on GitHub! github.com/microsoft/ma...

TinyBVH has been updated to 1.2.5 on main. New: TLAS/BLAS construction and traversal for single- and double-precision BVHs, including a brand-new GPU demo. See the attached real-time footage, captured at 1280x720 on an NVIDIA 2070 laptop GPU. #RTXoff github.com/jbikker/tiny...
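TinyBVH's own API isn't shown in the post, so here is only a generic illustration of the test that BVH traversal runs at every node: the ray/AABB slab test. In a two-level scheme the TLAS bounds instances that each reference a BLAS, and a ray reaching an instance is transformed into that instance's local space before descending into its BLAS. The function and parameter names are hypothetical.

```python
import math

def ray_aabb(origin, direction, box_min, box_max, t_max=math.inf):
    """Return the entry distance t if the ray hits the axis-aligned box, else None."""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it misses unless it starts between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:  # negative direction: swap entry/exit
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:  # slab intervals no longer overlap
            return None
    return t_near

print(ray_aabb((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, -1.0), (2.0, 1.0, 1.0)))  # 1.0
```

Traversal visits a child only when this returns a distance closer than the best hit found so far, which is what makes the hierarchy cheap to walk.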

InternLM v3
- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained on only 4T high-quality tokens
huggingface.co/collections/...

Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social
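Titans' memory learns at test time by taking gradient steps on an associative loss, so inputs the memory predicts badly ("surprising" ones) change it the most. Below is a minimal sketch with a single linear layer as the memory and fixed hyperparameters; in the paper the memory can be a deeper MLP and the step size, momentum, and forgetting rate are data-dependent, so treat `memory_update` and its defaults as illustrative assumptions.

```python
def matvec(M, k):
    """Multiply matrix M (a list of rows) by vector k."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]

def memory_update(M, S, k, v, lr=0.1, momentum=0.9, forget=0.01):
    """One test-time update of a linear memory M on the loss ||M k - v||^2.

    S is the momentum ("surprise") buffer:
        S <- momentum * S - lr * grad
        M <- (1 - forget) * M + S      (weight decay acts as forgetting)
    """
    err = [p - t for p, t in zip(matvec(M, k), v)]          # M k - v
    grad = [[2.0 * e * x for x in k] for e in err]           # gradient: 2 (M k - v) k^T
    S = [[momentum * s - lr * g for s, g in zip(s_row, g_row)]
         for s_row, g_row in zip(S, grad)]
    M = [[(1.0 - forget) * m + s for m, s in zip(m_row, s_row)]
         for m_row, s_row in zip(M, S)]
    return M, S

# Hypothetical key/value pair; repeated exposure lets the memory store it.
M = [[0.0, 0.0], [0.0, 0.0]]
S = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    M, S = memory_update(M, S, k=[1.0, 0.0], v=[0.5, -0.5])
print([round(x, 3) for x in matvec(M, [1.0, 0.0])])
```

With nonzero `forget` the recalled value settles slightly short of the stored one; that decay is the trade-off that lets the memory drop stale associations.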

ViTPose, the best open-source pose estimation model, just landed in @hf.co transformers 🕺🏻💃🏻
🔖 Model collection: huggingface.co/collections/...
🔖 Notebook on how to use: colab.research.google.com/drive/1e8fcb...
🔖 Try it here: huggingface.co/spaces/hysts...

Deno is committed to web standards; that's why we co-founded WinterCG two years ago. Today marks the next step in that journey: WinterCG moves to Ecma International as Technical Committee 55 (TC55). Goodbye WinterCG, welcome WinterTC! deno.com/blog/wintertc

🔍 Massive human feedback dataset for text-to-image models from RapidData
- 1.5M human responses from 152K participants
- Evaluates image coherence, style & prompt alignment
- Includes detailed error heatmaps
- Covers DALL-E, Midjourney, Imagen outputs
Available on @hf.co

ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with an MIT license 💗 The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) for both images and videos ⏯️

microsoft/phi-4: phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. huggingface.co/microsoft/ph...

Sana 4k released huggingface.co/Efficient-La...

Thrilled to share the latest work from our team at @Apple where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥 📄 arxiv.org/abs/2410.23054 🛠️ github.com/apple/ml-act 0/9 🧵
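A rough sketch of what "transporting activations" can look like, assuming each neuron is handled independently and the transport map is affine (in the spirit of the paper's linear variant, not a reproduction of the ml-act code): pair sorted activations from a source and a target condition, fit an affine map, then apply it with a strength knob for fine-grained control. Function names and sample values are hypothetical.

```python
def fit_affine_transport(source, target):
    """Fit a per-neuron affine map y ~ a*x + b from source to target activations.

    Sorting both samples pairs them by quantile (the optimal transport matching
    in 1D); ordinary least squares on those pairs gives the affine approximation.
    Assumes equal sample sizes and a non-constant source sample.
    """
    xs, ys = sorted(source), sorted(target)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def transport(x, a, b, strength=1.0):
    """Blend an activation with its transported value; strength=0 leaves it unchanged."""
    return (1.0 - strength) * x + strength * (a * x + b)

# Hypothetical activations of one neuron under two prompt sets.
a, b = fit_affine_transport([3.0, 0.0, 2.0, 1.0], [16.0, 10.0, 14.0, 12.0])
print(a, b, transport(1.0, a, b, strength=0.5))  # 2.0 10.0 6.5
```

The strength parameter is what makes the control fine-grained: intermediate values move activations only part of the way toward the target distribution.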

🚀 ProTracker delivers accurate and robust Tracking Any Point (TAP) with a Kalman filter-inspired probabilistic approach, seamlessly fusing optical flow and semantic cues for smoother, more accurate trajectories! Project page: michaelszj.github.io/protracker/ Paper: arxiv.org/abs/2501.03220
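ProTracker's exact formulation isn't given in the post; this sketch only shows the generic Kalman-style step it alludes to: fusing two noisy estimates of a point's position (say, one from optical flow and one from semantic matching), weighted by their variances. The helper names and variance values are hypothetical.

```python
def fuse(mean_a: float, var_a: float, mean_b: float, var_b: float) -> tuple[float, float]:
    """Fuse two independent Gaussian estimates of one coordinate (a Kalman update step)."""
    gain = var_a / (var_a + var_b)            # Kalman gain: lean on b when a is uncertain
    mean = mean_a + gain * (mean_b - mean_a)
    var = (1.0 - gain) * var_a                # fused variance is below either input variance
    return mean, var

def fuse_point(flow_xy, flow_var, semantic_xy, semantic_var):
    """Fuse a 2D point per axis, assuming independent, equal-variance x/y errors."""
    x, var = fuse(flow_xy[0], flow_var, semantic_xy[0], semantic_var)
    y, _ = fuse(flow_xy[1], flow_var, semantic_xy[1], semantic_var)
    return (x, y), var

# Hypothetical inputs: optical flow is fairly confident, semantic matching less so.
point, var = fuse_point((10.0, 20.0), 1.0, (12.0, 20.0), 3.0)
print(point, var)  # (10.5, 20.0) 0.75
```

Weighting by variance is what lets the fused track follow whichever cue is currently more reliable instead of averaging them blindly.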

Hi folks! I'm excited to be on Bluesky! I'm looking forward to posting about computer science research, ML, scientific advances, tasty food, nature, and making groan-worthy puns.

arxiv.org/abs/2501.00103 Paper for LTX is out :) I really like the model.

OLMo 2 tech report is out! We get in the weeds with this one, with 50+ pages on 4 crucial components of the LLM development pipeline:

I've always wanted to build things with D3, but the learning curve was too steep, at least for the bespoke stuff I wanted to make (not just simple bar charts). I can finally make things like this thanks to Cursor! I just art directed this, and it made everything work beautifully. Even on mobile 🎉

Our first release of 2025: smolagents, the simplest library to build agentic systems!
💥 Main logic in ~1000 LoC
🧑‍💻 Agent writes its actions in code! LLMs are much better at writing code than at the current standard of writing JSON => higher perf
🌐 Any LLM support (h/t LiteLLM)
🛡️ Secure code exec (h/t E2B)

First project of 2025: Vision Transformer Explorer. I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯 Try it out yourself! 👇
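An attention map like the ones such an explorer visualizes comes from scaled dot-product attention between a query and the patch keys. This sketch covers a single head and attends over patch keys only (in a real ViT the CLS token also attends to itself, and every head yields its own map); the function name and toy vectors are hypothetical.

```python
import math

def cls_attention_map(q_cls, patch_keys, grid_h, grid_w):
    """Attention weights of the CLS query over patch keys, reshaped to a grid."""
    d = len(q_cls)
    # Scaled dot-product logits, one per image patch.
    logits = [sum(q * k for q, k in zip(q_cls, key)) / math.sqrt(d) for key in patch_keys]
    peak = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Row-major reshape: patch index r * grid_w + c lands at grid[r][c].
    return [weights[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

# Hypothetical 2x2 patch grid with 2-dimensional keys; patch 0 matches the query best.
grid = cls_attention_map([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0]], 2, 2)
for row in grid:
    print([round(w, 3) for w in row])
```

Upsampling such a grid to the image resolution and overlaying it as a heatmap is the usual way to show where the model "looks".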

hny

Thank you, Jimmy Carter. This chart is on a log scale. This year there have been just 7 cases of guinea worm. ourworldindata.org/grapher/numb...

this thing runs. in. the. browser. it blows my mind

Switti: a new scale-wise transformer for text-to-image generation 🦾 🔥 Improved generation of fine-grained details. Outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being up to 7x faster.

Daniel Han's (of @unsloth.bsky.social) diagram of the DeepSeek-V3 architecture:
1. Float8 uses E4M3 for forward & backward; no E5M2
2. Every 4th FP8 accumulate adds to the master FP32 accumulator
3. Latent attention stores a C (compressed latent) cache, not a KV cache
4. No MoE loss balancing; dynamic biases instead
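Point 4 refers to DeepSeek-V3's auxiliary-loss-free balancing, where a per-expert bias is added to the routing scores for top-k selection only and is nudged after each step. A simplified sketch: the step size `gamma`, comparing each expert's load against the mean load, and the function names are assumptions for illustration.

```python
def route(scores, bias, k):
    """Top-k expert indices per token, ranked by score + bias.

    The bias only changes WHICH experts are selected; gating weights would still
    be computed from the original scores (not shown here).
    """
    return [sorted(range(len(s)), key=lambda e: s[e] + bias[e], reverse=True)[:k]
            for s in scores]

def update_bias(bias, assignments, gamma=0.01):
    """After a step, push each bias toward balanced load by a fixed step gamma."""
    load = [0] * len(bias)
    for experts in assignments:
        for e in experts:
            load[e] += 1
    mean_load = sum(load) / len(load)
    # Overloaded experts become less attractive, underloaded ones more attractive.
    return [b - gamma if n > mean_load else b + gamma if n < mean_load else b
            for b, n in zip(bias, load)]

# Hypothetical batch: every token prefers experts 0 and 1.
scores = [[0.9, 0.5, 0.4, 0.1]] * 4
bias = update_bias([0.0] * 4, route(scores, [0.0] * 4, k=2))
print(bias)  # [-0.01, -0.01, 0.01, 0.01]
```

Because no balancing term enters the loss, the gradient signal stays focused on the language-modeling objective while the biases handle load.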

Vincent Abbott created this diagram of the DeepSeek-V3 architecture and compared it to Mixtral:
- 64 routed + 2 shared experts vs 8 => x8.25 experts
- 1408 vs 14336 inner dim => ~x0.1
- 2048 vs 4096 model dim => x0.5
- 8 (6 routed + 2 shared) experts per pass vs 2 => x4
parameters = 8.25 x 0.1 x 0.5 = 41%
compute = 0.1 x 0.5 x 4 = 20%
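The post's ratios can be reproduced directly. This sketch assumes, as the post implicitly does, that only the MoE FFN weights are compared (attention, embeddings, and layer counts ignored) and that both models use the same number of weight matrices per expert, so that constant cancels. Using the exact inner-dim ratio (1408/14336 ≈ 0.098) instead of the rounded 0.1 gives slightly lower figures than 41% and 20%.

```python
# Configurations as quoted in the post.
deepseek = {"experts": 64 + 2, "inner_dim": 1408, "model_dim": 2048, "active": 6 + 2}
mixtral = {"experts": 8, "inner_dim": 14336, "model_dim": 4096, "active": 2}

def ratio(key: str) -> float:
    return deepseek[key] / mixtral[key]

# Each expert's weights scale with inner_dim * model_dim, so total expert
# parameters scale with experts * inner_dim * model_dim ...
params = ratio("experts") * ratio("inner_dim") * ratio("model_dim")
# ... while per-token compute only touches the experts active in a forward pass.
compute = ratio("active") * ratio("inner_dim") * ratio("model_dim")

print(f"params: {params:.1%}, compute: {compute:.1%}")  # params: 40.5%, compute: 19.6%
```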

I have created a starter pack for Software Engineering Research: Software Engineering Researchers, from academia or industry. #SEResearch Some PL people too. bsky.app/starter-pack...

Have you tried JSPM Generator yet? It is a great tool for generating import maps that you can paste into your HTML file, so that your JavaScript modules can import libraries. It currently lets you choose from 4 module sources: jspm.io, esm.sh, unpkg.com, and cdn.jsdelivr.net. generator.jspm.io
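An import map is a JSON object whose `imports` field maps bare module specifiers to URLs; the browser consults it when resolving `import` statements. A hand-rolled sketch of that output format, assuming the `https://esm.sh/{name}@{version}` URL pattern; the helper name and package are illustrative. A real generator such as JSPM also resolves each package's dependency tree (and may emit `scopes`), which this sketch does not attempt.

```python
import json

def build_import_map(packages: dict[str, str], cdn: str = "https://esm.sh") -> str:
    """Return an HTML <script type="importmap"> block mapping bare specifiers to CDN URLs."""
    imports = {name: f"{cdn}/{name}@{version}" for name, version in packages.items()}
    body = json.dumps({"imports": imports}, indent=2)
    return f'<script type="importmap">\n{body}\n</script>'

print(build_import_map({"lodash-es": "4.17.21"}))
```

With that block in the page, a module can write `import { debounce } from "lodash-es"` and the browser fetches it from the mapped URL.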

SemiKong, a model built with Llama, is the world's first open-source semiconductor-focused LLM. With this work, AITOMATIC is enabling semiconductor companies to build Domain-Expert Agents to capture and scale their deep domain expertise. ai.meta.com/blog/aitomat...