ned.sh
Data science, AI/ML, analytics, visualisation. Naarm/Melbourne @thoughtworks, #dataBS, #Python, #NLP, #DuckDB, & assorted whimsical miscellanea
117 posts 3,197 followers 9,471 following

Is there a good Python widget for visually comparing two data frames for differences? I'm looking for something like vdiffr. #dataBS #python

So I tried to combine Snowflake, Iceberg and Azure. It was as frustrating as it sounds, but I'm sure it will eventually work and be useful some day 😅 Did anyone at #databs succeed at this?

🚨 New getting started with @duckdb.org and @motherduck.com ! Because yep, the flock keeps growing! 🐥 Still haven't tried the Duck side? This is your chance - I put a lot of effort into this one. youtu.be/WYV8hvJOAQE You know what to do with your weekend now. #databs #dataengineering

We released an installation script for macOS and Linux. You can now install the DuckDB CLI client on these platforms by running: curl install.duckdb.org | sh

Since I’m in a somewhat unique position, a short thread on why automating Wikipedia is not at all within reach. Key immediate issue: source relevancy. Not every piece of information is worth it.

Very good (technical) explainer answering "How has DeepSeek improved the Transformer architecture?". Aimed at readers already familiar with Transformers. epoch.ai/gradient-upd...

OMG, you can get the entire DuckDB in Action book for free, sponsored by @motherduck.com 😍 #databs motherduck.com/duckdb-book-...

It's out! It's an actual book you can actually buy! If you like data projects, and want to try some real-world ones with all the real challenges and obstacles, this book might be for you! I hope you like it!!! #dataBS #datasky

a bit more #Severance data please enjoy all analyses equally 👉 lucymcgowan.github.io/mdr-website/ and for my #rstats friends: 👉 lucymcgowan.github.io/mdr/

#DataSky If you don't know what's going on today, govt data sets are under attack and entire sites are coming down, removing extremely important data from public access. This is it, data folks. This is our domain!! Let's get archiving! I'll be over at @catalyst.coop writing scrapers, hbu?

DuckCon #6 is live: youtube.com/live/Sb9DFcl...

It begins! #DuckCON6

The latest DuckDB book, published by O'Reilly, is available in print. We ordered our copy and received it today. Thanks to Wei-Meng Lee for educating readers on how to use DuckDB!

DuckCon #6 will start in 168 hours (January 31, 15:00)! If you plan to attend in-person, please register at duckdb.org/2025/01/31/d... The stream will be available without registration.

Got to sit down with Professor @andypavlo.bsky.social on the latest @convex.dev Databased podcast for his 2024 databases year in review! www.youtube.com/watch?v=1B-M...

Andrew Pavlo’s annual retrospective on the database world has recently been released, covering trends and innovations from the past year. My recap for @infoq.com #databases #redis #duckdb #opensource #postgres www.infoq.com/news/2025/01...

We recently explored Ibis, a Python library designed to simplify working with data across multiple storage systems and processing engines. Wrote up some learnings here: open.substack.com/pub/structur...

A quick reminder: if you’re interested in #duckdb content, I’m regularly curating a fantastic list on daily.dev. More than 40 blogs have been shared in the past 3 months 🤓 You can join the DuckDB squad here: dly.to/jA8UM2JcaMH #databs

forget everything you think you know about data visualisation and go try @emily.space's sk8plotlib package

Truly one of the most amazing shrugs in literary history #tolkien

look i made it the 1979 ibm warning

Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...

heya #databs #mlsky and #aisky peeps, does anyone have any good resources for ideas around effectively wrangling #Langchain chat stream events (V2) into sensible conversation histories of chat message objects (eg HumanMessage, AIMessage, ToolMessage etc)?

Want to kick off 2025 with a comfy cozy book club all about the philosophical practice of how we squishy humans can choose to represent reality (or at least reality as we perceive it) in cold hard bytes? This Thursday is the 1st session - find the details here: jennajordan.me/book-club/ #datasky 📚

I use Harlequin with DuckDB. Self-described as "The SQL IDE for Your Terminal.", it is an excellent tool. Big thanks to Ted Conbeer who created it. harlequin.sh

first draft of a lil civic tech starter pack! focused on practitioners, might do another one on other parts of the ecosystem later. go.bsky.app/Dw5gEA1

If you haven't used Ghostty you should. 1.0 is out now. I have been using this as my main terminal emulator for quite a while at this point and I love it.

at long last. it is finally here. the entirety of Breaking Bad, retold as a VR game. aka BREAKING BAD VR BUT THE AI IS SELF-AWARE www.youtube.com/watch?v=_FvB...

This is super exciting! I've been hanging out for a modern uplift to BERT models that have larger context windows. 512 tokens is pretty limiting for a number of use cases I've had. Have yet to dig in, but it looks like awesome work! #MLSky #DataBS #NLP

Great data journalism: people report that the “best times” were when they were 10-15 yo www.washingtonpost.com/business/202...

"That Pesky Last Ten Percent" - Some nice advice for my ADHD-addled brain on _actually_ finishing projects. ...he writes as he considers starting an entirely new duckdb plugin... www.extrafocus.com/p/that-pesky...

Oh yeah, so next Thursday newsletter post, I thought it might be fun to take questions about stuff and answer them. Doesn't particularly have to be serious 🙃 or related to data, send me Qs! #dataBS

I've been thinking about how AI and LLMs are changing what it means to be trustworthy - for a computer, for a business, and for an engineer jfkirk.github.io/posts/trustw...

KNN + topic detection getting a big glow-up www.anthropic.com/research/clio

@hannes.muehleisen.org and I did a podcast about ducks and databases. Check it out: open.spotify.com/episode/7zBd...

Starter pack for open-source maintainers in the Python ecosystem! Feel free to share and let me know who I'm missing! go.bsky.app/D2Be5mg

*wild* website from artist juli kearns painstakingly showing how all the shots for the shining imply physical spaces that are literally impossible good internet idyllopuspress.com/idyllopus/fi...

love love @randyau.com’s latest article it starts to get towards something i’ve been writing about privately to clarify my own thinking — that data work is in fact a primarily intuitive field, driven by passion, patterns, symbols www.counting-stuff.com/r/0acb50d8?m...

One of my fav posts from @brunoborges.bsky.social over on the old site, hope he doesn't mind me reposting it here :) #dataBS

#DataBS #Python #MacOS: What are your go-to apps for doing data work and development? I gifted myself a MacBook and I’m coming off of Windows 10. I’m familiar with common IDEs and CLI tools like DuckDB, but not sure what I might be missing. Also looking for good mouse and CLI setup suggestions.

Got a question for #dataBS: Anyone ever survived a data migration with a story to tell—the good, the bad, or the "we’ll laugh about this someday"? I’m writing a technical piece and I’d love to hear your wins, cautionary tales, and hard-earned lessons. Please DM if you're keen!

Two fun episodes are on deck and you can join the live recording to be part of the show. 1. Django Ninja - 7am Pacific Time www.youtube.com/watch?v=SM1y... 2. DuckDB - 11:30am Pacific Time www.youtube.com/watch?v=3wGe... See you there! #python cc @mkennedy.codes

Bring your DuckDB and Python questions!! See you at 11:30am Pacific! @duckdb.org @motherduck.com

Just finished the GitHub release for 0.3 of my GeoParquet downloader plugin. It's got an installation improvement to automatically download/install DuckDB. But I'm not sure if it works well - could someone here test? Just install the zip download from github.com/cholmes/qgis... on a clean QGIS.