this analogy was the main reason I started using DuckDB in 2022. I don't think it's a phrase the creators particularly like, however, because it may be too limiting. But imo the analogy is near-perfect; sqlite is also everywhere for a reason!
I don’t begrudge anyone talking their book, but if I can’t easily grasp the basics of your offering then all of the neat differentiators are lost on me anyway
It’s a great analogy, and I’m far more likely to remember DuckDB for future projects because of it
Interesting question to what extent organizations will need a data warehouse in a multi-compute lakehouse world. If you can get governance from the catalog then you can bring the compute appropriate to the workload.
I think @jakthom.bsky.social had a similar take. Though, I do wonder about governance coming from the catalog. AFAICT, it can hold perm info, but enforcement is still done through the reader (query engine, python lib, etc.) and bucket perms.
But don't the DW users have no prod ACLs at all? This is how it was at my last job. If we went the route you describe, we'd have to give product managers read only access to prod buckets. Security would pitch a fit.
There's no good way today to manage users/roles across buckets, especially at scale. The tool vendor has to read the catalog and honor the permissions when reading data from obj store. eg We read the Snowflake catalog, then serve the Iceberg data directly. The user doesn't need the S3 info.
Comments
I went down a DuckDB rabbithole last week and this one sentence would have saved me so much time and confusion
https://bsky.app/profile/archive.chris.blue/post/3l7ht3ja7js2l
It’s a great analogy, and I’m far more likely to remember DuckDB for future projects because of it
I had a sense that 🦆 was the wrong answer for us, but I wasted so much time trying to piece it together
https://bsky.app/profile/jakthom.bsky.social/post/3la4zge6wmb2w
DuckDB allows you to use the prod acl's and not needing to maintain a second (usually đź’©) copy in the dw is quite lovely.
Dw is in constant state of catch-up to these, and rarely correct.
Specifically referring to customer/prod-facing analytical use cases (often backed by the DW)
Shuffling data from prod -> DW -> prod traverses 2-3 sets of acl's, incurs unnecessary latency, is fragile, and increasingly unnecessary.