joerocca.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

I finally joined 🦋! Some of you may recognize me from other sites. Here's a quick intro for new connections: 👋 I work on RL, world models, and generalization in decision-making. I'm perhaps most well known for my work on "TD-MPC2: Scalable, Robust World Models for Continuous Control" www.tdmpc2.com

submitted 68 days ago • 4 comments

Small models? Saturating? Where I live we don't know these words.

submitted 8 days ago • 3 comments

New open-source reasoning model (code, dataset, and model)! Huginn-0125: Pretraining a Depth-Recurrent Model. A recurrent-depth model trained at scale on 4096 AMD GPUs on Frontier.

submitted 79 days ago • 1 comment

Zyphra beta releases Zonos, a highly expressive TTS model with high fidelity voice cloning. They release both transformer and SSM-hybrid models under an Apache 2.0 license.

submitted 79 days ago • 2 comments

Physical Intelligence (π) is open-sourcing π0. They are releasing the code and weights for π0 as part of their experimental openpi repository. Blog: www.pi.website/blog/openpi Repo: github.com/Physical-Int...

submitted 84 days ago • 3 comments

⭐ The first foundational model available on @LeRobotHF ⭐ Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by @physical_int and ported to PyTorch by @m_olbap 👇🧵

submitted 85 days ago • 5 comments

When it rains, it pours. Baichuan releases Baichuan-Omni-1.5, an open-source omni-modal foundation model supporting text, image, video, and audio inputs as well as text and audio outputs. Both the model ( huggingface.co/baichuan-inc... ) and the base ( huggingface.co/baichuan-inc... ) are available.

submitted 94 days ago • 2 comments

Latest #AI benchmark results: DeepSeek-R1 (including its distilled variants) outperforms OpenAI's o1-mini and preview models. And the Llama 3 distilled version now holds the title of the highest-performing LLM I've tested locally to date. 🚀

submitted 96 days ago • 0 comments

TypeScript excitement 😉 Thanks to @searyanc.dev for landing the new --erasableSyntaxOnly tsconfig flag. Heading for TS 5.8 Beta next week 🎉 🔷 Guides users away from TS-only runtime features such as enum & namespace 🔷 Pairs nicely with Node's recent TypeScript support github.com/microsoft/Ty...

submitted 96 days ago • 6 comments

4-bit Sana released demo: svdquant.mit.edu github.com/NVlabs/Sana/...

submitted 96 days ago • 0 comments

Hugging Face adds GRPO to TRL - the training algorithm behind DeepSeek R1 🔋 Eliminates the value function from PPO to save boatloads of compute 💰 Samples N completions per prompt to compute average rewards across a group. To use it, run: pip install git+https://github.com/huggingface/trl.git

submitted 97 days ago • 0 comments
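
The GRPO post above says the method drops PPO's value function and instead samples N completions per prompt, using the group's average reward as the reference point. A minimal sketch of that idea follows. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and dividing by the group's standard deviation is an assumption taken from the usual GRPO formulation, not something the post states. This is not TRL's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion against the other completions for the same prompt.

    `rewards` holds one scalar reward per sampled completion (the "group").
    The group mean stands in for the learned value function that PPO would use.
    """
    baseline = mean(rewards)   # average reward across the group
    spread = pstdev(rewards)   # assumption: normalize by the group's std deviation
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four completions for one prompt, two rewarded and two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion gets the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate critic network has to be trained or stored.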

Prime Intellect releases: - INTELLECT-MATH, a frontier 7B parameter model for math reasoning that shows that the quality of your SFT initialization strongly impacts reinforcement learning. Blog: www.primeintellect.ai/blog/intelle... Models: huggingface.co/PrimeIntelle...

submitted 98 days ago • 1 comment

We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December. Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks 📈: • AIME: 73.3% • GPQA: 74.2% • MMMU: 75.4%

submitted 98 days ago • 8 comments

SambaNova's EvaByte: the open-weight tokenizer-free language model. Their 6.5B byte-level LM, EvaByte, matches modern tokenizer-based LMs with 5x less data & 2x faster decoding!

submitted 98 days ago • 2 comments

ByteDance's UI-TARS, which can operate on your local personal device. Project: github.com/bytedance/UI... Desktop: github.com/bytedance/UI... Browser: github.com/web-infra-de... Models : huggingface.co/bytedance-re... Paper: arxiv.org/abs/2501.12326

submitted 98 days ago • 1 comment

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon! 👉 npm i kokoro-js 👈 Link to demo (+ sample code) in 🧵

submitted 104 days ago • 1 comment

DeepSeek-R1 is coming soon. DeepSeek-R1 (Preview) results: the model performs in the vicinity of o1-Medium, providing SOTA reasoning performance on LiveCodeBench.

submitted 103 days ago • 0 comments

NVIDIA AceInstruct-72B: a family of advanced SFT models for coding, mathematics, and general-purpose tasks research.nvidia.com/labs/adlr/ac... huggingface.co/nvidia/AceIn...

submitted 102 days ago • 0 comments

New sharing step on our journey towards easy-to-use fully-open models.

submitted 104 days ago • 0 comments

📢 Paper + code release 📃💻 After 2 years of work, I'm excited to announce our newest paper, MatterGen, has been published in Nature! www.nature.com/articles/s41... We are also releasing all the training data, model weights, model code, and evaluation code on GitHub! github.com/microsoft/ma...

submitted 104 days ago • 3 comments

TinyBVH has been updated to 1.2.5 on main. New: TLAS/BLAS construction and traversal, for single and double precision BVHs, and including a brand new GPU demo: See the attached real-time footage, captured at 1280x720 on an NVIDIA 2070 laptop GPU. #RTXoff github.com/jbikker/tiny...

submitted 104 days ago • 3 comments

InternLM v3 - Performance surpasses models like Llama3.1-8B and Qwen2.5-7B - Capable of deep reasoning with system prompts - Trained only on 4T high-quality tokens huggingface.co/collections/...

submitted 105 days ago • 2 comments

Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors - @alibehrouz.bsky.social

submitted 107 days ago • 4 comments

ViTPose -- the best open-source pose estimation model just landed in @hf.co transformers 🕺🏻💃🏻 🔖 Model collection: huggingface.co/collections/... 🔖 Notebook on how to use: colab.research.google.com/drive/1e8fcb... 🔖 Try it here: huggingface.co/spaces/hysts...

submitted 111 days ago • 1 comment

Deno is committed to web standards - that's why we co-founded WinterCG two years ago. Today marks the next step in that journey: WinterCG moves to Ecma International as Technical Committee 55 (TC55). Goodbye WinterCG, welcome WinterTC! deno.com/blog/wintertc

submitted 110 days ago • 1 comment

🔍 Massive human feedback dataset for text-to-image models from RapidData - 1.5M human responses from 152K participants - Evaluates image coherence, style & prompt alignment - Includes detailed error heatmaps - Covers DALL-E, Midjourney, Imagen outputs Available on @hf.co

submitted 111 days ago • 1 comment

ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license 💗 The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos ⏯️

submitted 111 days ago • 3 comments

microsoft/phi-4 phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. huggingface.co/microsoft/ph...

submitted 112 days ago • 2 comments

Sana 4k released huggingface.co/Efficient-La...

submitted 112 days ago • 0 comments

Thrilled to share the latest work from our team at @Apple where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥 📄 arxiv.org/abs/2410.23054 🛠️ github.com/apple/ml-act 0/9 🧵

submitted 141 days ago • 3 comments

🚀 ProTracker delivers accurate and robust Tracking Any Point (TAP) with a Kalman filter-inspired Probabilistic approach, seamlessly fusing optical flow and semantic cues for smoother, more accurate trajectories! Project page: michaelszj.github.io/protracker/ Paper: arxiv.org/abs/2501.03220

submitted 112 days ago • 1 comment

Hi folks! I'm excited to be on BlueSky! I'm looking forward to posting about computer science research, ML, scientific advances, tasty food, nature, and making groan-worthy puns.

submitted 114 days ago • 45 comments

arxiv.org/abs/2501.00103 Paper for LTX is out :) I really like the model.

submitted 116 days ago • 0 comments

OLMo 2 tech report is out! We get in the weeds with this one, with 50+ pages on 4 crucial components of LLM development pipeline:

submitted 117 days ago • 3 comments

I've always wanted to build things with D3, but the learning curve was too high. At least for the bespoke stuff I wanted to make (not just simple bar charts). I can finally make things like this thanks to Cursor! I just art directed this, and it made everything work beautifully. Even on mobile 🎉

submitted 120 days ago • 8 comments

Our first release of 2025: smolagents, the simplest library to build agentic systems! 💥 Main logic in ~1000 LoC 🧑‍💻 Agent writes its actions in code! LLMs are much better at writing code than current standard of writing JSON => higher perf 🌍 Any LLM support (h/t LiteLLM) 🛡️ Secure code exec (h/t E2B)

submitted 119 days ago • 4 comments

First project of 2025: Vision Transformer Explorer I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯 Try it out yourself! 👇

submitted 119 days ago • 1 comment

hny

submitted 119 days ago • 0 comments

Thank you, Jimmy Carter. This chart is on a log scale. This year there have been just 7 cases of guinea worm. ourworldindata.org/grapher/numb...

submitted 121 days ago • 9 comments

this thing runs. in. the. browser. it blows my mind

submitted 121 days ago • 0 comments

Switti -- a new scale-wise transformer for text-to-image generation 🦾 🔥 Improved generation of fine-grained details. Outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being up to 7x faster.

submitted 121 days ago • 1 comment

Daniel Han ( of @unsloth.bsky.social )'s diagram of DeepSeek v3 Architecture. 1. Float8 uses E4M3 for forward & backward - no E5M2 2. Every 4th FP8 accumulate adds to master FP32 accum 3. Latent Attention stores C cache not KV cache 4. No MoE loss balancing - dynamic biases instead

submitted 121 days ago • 2 comments

Vincent Abbott created this diagram of DeepSeek-V3 architecture and compared it to Mixtral:
- 64 routed experts + 2 shared v 8 => x8.25 experts
- 1408 v 14336 inner dim => ~x0.1
- 2048 v 4096 model dim => x0.5
- 8 (6r+2s) experts per pass v 2 => x4
parameters = 8.25 x 0.1 x 0.5 = 41%
compute = 0.1 x 0.5 x 4 = 20%

submitted 121 days ago • 0 comments
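
The DeepSeek-V3 versus Mixtral comparison above is a chain of ratios, and it can be checked with a few lines of arithmetic. The sketch below only uses the figures quoted in the post. The dictionary keys and variable names are made up for illustration, and the result describes the relative size of the expert blocks as the post frames them, not a full parameter count of either model.

```python
# Figures as quoted in the post (expert blocks only).
deepseek_v3 = {"experts": 64 + 2, "inner_dim": 1408, "model_dim": 2048, "active_experts": 6 + 2}
mixtral = {"experts": 8, "inner_dim": 14336, "model_dim": 4096, "active_experts": 2}

# Ratio of DeepSeek-V3 to Mixtral for each quantity.
ratio = {key: deepseek_v3[key] / mixtral[key] for key in deepseek_v3}

# Stored parameters scale with the total number of experts;
# per-token compute scales with the number of experts active per pass.
param_ratio = ratio["experts"] * ratio["inner_dim"] * ratio["model_dim"]
compute_ratio = ratio["active_experts"] * ratio["inner_dim"] * ratio["model_dim"]

print(f"parameters: {param_ratio:.0%} of Mixtral's")
print(f"compute:    {compute_ratio:.0%} of Mixtral's")
```

Using the exact inner-dimension ratio (1408 / 14336, about 0.098) instead of the rounded 0.1 still lands on the post's 41% and 20% after rounding.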

I have created a Starter Pack for Software Engineering Research: Software Engineering Researchers, from academia or industry. #SEResearch Some PL people too. bsky.app/starter-pack...

submitted 141 days ago • 7 comments

Have you tried JSPM Generator yet? It is a great tool for generating importmaps that you can paste into your HTML file, so that your JavaScript modules can import libraries. It currently lets you choose from 4 module sources: jspm.io, esm.sh, unpkg.com, and cdn.jsdelivr.net. generator.jspm.io

submitted 123 days ago • 0 comments

Meta's SemiKong, a model built with Llama, is the world's first open source semiconductor-focused LLM. With this work AITOMATIC is enabling semiconductor companies to build Domain-Expert Agents to capture and scale their deep domain expertise. ai.meta.com/blog/aitomat...

submitted 123 days ago • 0 comments