New article: Data storage costs are eating into analytics budgets. Wouldn't it be great to build a data warehouse on top of affordable storage?
SSDs are fast but expensive. S3/R2 is significantly cheaper, but it quickly becomes messy and lacks clear governance and rules.
This article dives into combining these two options with data lakes and an open table format (Iceberg, Delta, Hudi, Lance) into an #opendataplatform.
But does this data architecture represent the next evolution of the Lakehouse core principle, or does it simply extend it? What's the difference?
Last week, I explored why open table formats are suddenly popular; this article focuses on where open table formats fit into the broader picture of #dataarchitecture.
- Accessing Iceberg tables live with DuckDB and MotherDuck on S3
- Using DuckDB as a lightweight data lake access layer
- And a fun bonus: exploring MCP, with Claude Desktop running autonomous SQL queries against the MotherDuck database.
Curious, what's your take on today's open data platform architecture built on open standards and formats?
Though the "no vendor lock-in" argument feels more theoretical than practical, since there are always more pieces to the puzzle.
“Fast. Cheap. Not entirely shitty. Pick two.”