colton.boo
Data Engineering / Music / Photography / Tinkering Developer Advocate @ Dagster cmpadden.github.io
69 posts 2,170 followers 1,872 following
Regular Contributor
Active Commenter

Joe Naso and I just published an e-book covering the essential topics needed to build a data platform! Check it out if you're looking to learn more about data modeling, ingestion patterns, and common data architectures. dagster.io/how-to-build...

Ok, Claude Code, consider me impressed!

I think this is one of the most important books data people could be reading right now, especially if you need to work with language models, which need all the semantics they can get.

Announcing the Data.gov Archive lil.law.harvard.edu/blog/2025/02...

Come join us if you’d like to learn more about LLM routing from experts in the field!

We’re building a new static type checker for Python, from scratch, in Rust. From a technical perspective, it’s probably our most ambitious project yet. We’re about 800 PRs deep!

As someone who works at an open core company similar to Preset / Superset, this post really resonates with my experience. Buying a “hosted version of an open source library” made by that library's creators has more benefits than “just” the hosting. preset.io/blog/running...

docs.dagster.io

I'm really proud to share that the new Dagster docs are live; what an effort!

This is a great resource covering the tools you need as a data engineer!

Loved this discussion between @barrald.bsky.social from Hex and @pedramnavid.com from Dagster on the data ecosystem, and the impact of AI on the work of data engineers and analysts! www.youtube.com/watch?v=8JxD...

I always really enjoy these presentations, which share all the code and showcase what's possible today. It's amazing to see BI tools such as Power BI, Looker, Tableau, or Sigma integrated. I believe that's the first time an open-source orchestrator is fully end-to-end. 📺 youtu.be/z3trqkKPbsI?...

In case you missed it, Shifting Left and Moving Forward with @matsonj.com, @colton.boo, and me is now live on YouTube! youtu.be/z3trqkKPbsI?...

Consolidation in the data transformation startup space! Congrats to the SDF team 👏 #dataBS blog.sdf.com/p/dbt-labs-h...

Join us tomorrow where @alexnoonan.bsky.social, @matsonj.com, and @colton.boo will discuss: 🦆 Building an end-to-end data platform ingesting Bluesky data into MotherDuck, and PowerBI 👈 Shifting Left in Data Engineering 🧠 And AI-powered Data Engineering best practices lp.dagster.io/deep-dive-sh...

Did you know Dagster asset checks can directly access data from an asset using the built-in I/O manager? Here's a quick snippet showing how it's done!

This time of year is full of reflection and self-improvement. That's why our upcoming deep dive (Jan 14 at 9 a.m. PT) with @matsonj.com at MotherDuck will focus on improving data engineering workflows and leveraging AI tools at their best. Register here: lp.dagster.io/deep-dive-sh...

It's really exciting to see how much the dlt ecosystem has grown!

Kurt Vonnegut’s short story, “EPICAC”, is about a man who depends on a sentient supercomputer to charm the woman he loves, and it feels more relevant than ever—you can find it in the collection of stories, “Welcome to the Monkey House”. www.goodreads.com/book/show/49...

Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...

Turns out it's pretty easy to render 3D models with p5js! Here's how I'm embedding models in blog post markdown with Nuxt, Nuxt Content, and p5. cmpadden.github.io/articles/nux...

I might need to go touch some grass… git-wrapped.com/profiles/cmp...

This video from Hank Green on not having an internal monologue is wild. I have aphantasia, and distinctly remember "gaining" an internal monologue as a child. But his description of thinking, and self, are unlike any I've heard before. www.youtube.com/watch?v=XmTM...

New book I’m starting to read. This year's Spotify Wrapped was surprisingly disappointing and missed the fun in last year's. I learned that the person behind much of Wrapped in previous years was let go last year. He is @glennmcdonald.bsky.social on Bluesky and he is the author of this book.

A friend's dad once told me the easiest way to get an engineer to do something was simply to declare that it couldn't be done. This stuck with me.

Of course I chose Dagster and R! Psyched about this, built on some excellent work from Phillip Orlando. From Dagster you can: run an R process and then use the data _from R_ to get:
* Asset metadata, including column schemas
* Asset check result metadata
* Markdown preview of the data
#dataBS

- 44 terms from all across the CUDA stack, each defined in its own article
- promiscuous interlinks, so you can drill into details naturally in the flow of reading
- slick themes and clean diagrams. who said technical docs had to be ugly?
modal.com/gpu-glossary

Great rant about dbt and `ref`. I'm currently trialing SDF, which auto-detects your tables and has a strong built-in compiler that checks your SQL before running a single query. They even use DataFusion to run tests based on data types and definitions during build time. Has anyone else tried SDF?

For my last class this semester, I tried to cram our Advanced Database course into one lecture. We cover the following database systems in 60min: youtu.be/fr5lIchF6pw
• Google Dremel / BigQuery
• Snowflake
• Amazon Redshift
• Yellowbrick
• Databricks Photon
• @duckdb.org
• TabDB