hamel.bsky.social
Building something new 👉 http://nbdev.fast.ai Ex Github, Airbnb, DataRobot. ML / Data Tooling & OSS
102 posts 6,180 followers 654 following
Regular Contributor
Active Commenter

Finally got around to trying Answer.ai's "nbsanity": Works beautifully on the first try! Even renders my interactive Plotly stuff! Just replace "github" in your notebook URL with "nbsanity", as in... nbsanity.com/static/3465a... More info from @hamel.bsky.social: www.answer.ai/posts/2024-1...
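The URL swap described in the post above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_nbsanity` and the example repository path are hypothetical, and it assumes that swapping the `github.com` host for `nbsanity.com` is all the service needs.

```python
from urllib.parse import urlsplit, urlunsplit


def to_nbsanity(url: str) -> str:
    """Return the URL with its github.com host swapped for nbsanity.com.

    Hypothetical helper: assumes the host swap described in the post
    is sufficient; the path, query, and fragment are left untouched.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url!r}")
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc="nbsanity.com"))


# Made-up notebook path, for illustration only.
example = "https://github.com/someuser/somerepo/blob/main/demo.ipynb"
print(to_nbsanity(example))
```

Only the host is replaced, so a path that happens to contain the word "github" is not altered by accident.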

In case you missed it, @hamel.bsky.social reviewed Devin. It succeeded on 3/20 assigned tasks.

Thoughts On A Month With Devin (the "AI software engineer") by @hamel.bsky.social "Out of 20 tasks we attempted, we saw 14 failures, 3 inconclusive results, and just 3 successes. More concerning was our inability to predict which tasks would succeed."

Enjoyed the systematic first hand reporting of their experience using Devin by @hamel.bsky.social and team. If you’ve worked with llm coding assistants, the results aren’t surprising, but it points to how far these models still need to go and should be worrying for how effective “agents” will be.

Thoughts On A Month With Devin by @hamel.bsky.social They decided to put it through its paces, testing it against a wide range of real-world tasks. This is their story - a thorough, real-world attempt to work with one of the most hyped AI products of 2024. www.answer.ai/posts/2025-0...

Four steps to use evals effectively in LLM applications (we haven't done the last one but are still getting great results): Eval Driven Development is the new TDD for LLM-based applications. Without evals, you're flying blind. #cto #llm #ai #tech #dev #genai

New LLM Eval Office Hours: I discuss the importance of doing error analysis before jumping into metrics and tests. Links to notes in the YT description: youtu.be/ZEvXvyY17Ys?...

This is pretty damn nifty! @hamel.bsky.social @projectjupyter.bsky.social #datascience #jupyternotebooks www.answer.ai/posts/2024-1...

Our team at @specstory.com launched our very first product iteration today. What is it? An extension for @cursor_ai that allows you to save and share your composer and chat history. Give it a try at marketplace.visualstudio.com/items?itemNa... and let us know what you think!

Recorded my second office hours on LLM Evals. We talked about observability and how to prioritize writing tests in complex systems. Here are the notes: hamel.dev/notes/llm/of... Video: youtu.be/TZwmLXXFbh4?...

Running this notebook from @howard.fm, hoping that it removes noise from my timeline: nbsanity.com/static/0b3fd...

Make it easier to manually inspect your data! I built a small Shiny for Python web app as recommended by @hamel.bsky.social. I'm getting through my task much faster than in previous iterations.

I'm proud that we're going public with some positioning on what @honeycomb.io actually believes AI represents: A new, weird, and sometimes janky kind of virtual computer. Stay tuned for a lot more clear-headed posting on applied AI in the coming year. www.honeycomb.io/blog/observa...

Fantastic use of shot-scraper.datasette.io here to create social media cards for this new Jupyter Notebook rendering site: nbsanity.com

Super exciting update to nbsanity. I've incorporated @simonwillison.net's shot-scraper. Now, all new renders get a fancy social card! This makes nbsanity a nice microblogging utility. Examples of rendered notebooks: 1/3 nbsanity.com/static/6a987...

nbsanity now has a bookmarklet: nbsanity.com. It's a static server that renders public Jupyter notebooks with Quarto.

Are you frustrated by how GitHub renders Jupyter notebooks? I have a public service that renders GitHub notebooks with Quarto: nbsanity.com. It now works with gists!

I am holding open office hours on LLM Evals. I recorded the first one, which was about evaluating multi-turn chats. Notes and recording here: hamel.dev/notes/llm/of...