New post! Materialized View turned one last week. 🎂 To celebrate, I wrote about what’s good and bad about DuckDB. - ThreadSky

chris.blue • 125 days ago

Comments

nicolay.fyi•125 days ago

All the new compute engines just make me question at what scale or performance requirements I actually need specialized storage.

jdsantos.com•125 days ago

“For the unfamiliar, DuckDB is essentially SQLite for columnar data.”

I went down a DuckDB rabbit hole last week, and this one sentence would have saved me so much time and confusion.

chris.blue•125 days ago

IKR? Me 2 years ago:

https://bsky.app/profile/archive.chris.blue/post/3l7ht3ja7js2l

hamilton.bsky.social•125 days ago

This analogy was the main reason I started using DuckDB in 2022. I don't think it's a phrase the creators particularly like, though, because it may be too limiting. But IMO the analogy is near-perfect; SQLite is also everywhere for a reason!

jdsantos.com•125 days ago

I don’t begrudge anyone talking their book, but if I can’t easily grasp the basics of your offering, then all of the neat differentiators are lost on me anyway.

It’s a great analogy, and I’m far more likely to remember DuckDB for future projects because of it.

jdsantos.com•125 days ago

The “so what” of a technology being obscured by marketing is so prevalent that we should have a name for the phenomenon.

I had a sense that 🦆 was the wrong answer for us, but I wasted so much time trying to piece it together.

mmullins.coginiti.co•125 days ago

It's an interesting question to what extent organizations will need a data warehouse in a multi-compute lakehouse world. If you can get governance from the catalog, then you can bring whatever compute is appropriate to the workload.

chris.blue•125 days ago

I think @jakthom.bsky.social had a similar take. Though I do wonder about governance coming from the catalog. AFAICT, it can hold permission info, but enforcement is still done by the reader (query engine, Python lib, etc.) and by bucket permissions.

https://bsky.app/profile/jakthom.bsky.social/post/3la4zge6wmb2w

jakthom.bsky.social•125 days ago

You first have to punch through prod ACLs to get data into a DW. Only once it's there do you get the maintenance nightmare that is commonly highlighted.

DuckDB lets you use the prod ACLs directly, and not needing to maintain a second (usually 💩) copy in the DW is quite lovely.

jakthom.bsky.social•125 days ago

Also, prod ACLs are often the ones that contracts are written around and compliance is enforced on... by someone who != me.

The DW is in a constant state of catch-up with these, and rarely correct.

chris.blue•125 days ago

But don't DW users usually have no prod ACLs at all? That's how it was at my last job. If we went the route you describe, we'd have to give product managers read-only access to prod buckets. Security would pitch a fit.

matsonj.com•125 days ago

I guess when I think of catalog, I think of it extended to prod buckets.

mmullins.coginiti.co•125 days ago

There's no good way today to manage users/roles across buckets, especially at scale. The tool vendor has to read the catalog and honor its permissions when reading data from the object store. E.g., we read the Snowflake catalog, then serve the Iceberg data directly. The user doesn't need the S3 info.

jakthom.bsky.social•124 days ago

Totally valid, I should be more specific...

Specifically, I'm referring to customer-facing/prod-facing analytical use cases (often backed by the DW).

Shuffling data from prod -> DW -> prod traverses 2-3 sets of ACLs, incurs unnecessary latency, is fragile, and is increasingly unnecessary.
