jankaul.bsky.social
71 posts 63 followers 82 following
Getting Started
Active Commenter
comment in response to post
Yes, you can remove both. I tested it on Android.
comment in response to post
Same here. The design is so awesome. Also love the "Unix" style interaction which makes it compose so well with other tools.
comment in response to post
I tried Aider and Claude Code; their approaches are very similar, but Claude Code feels much more powerful. It's really great at pulling in additional context as it works, while Aider only gathers it beforehand. The only thing missing from Claude Code is AI comments: aider.chat/docs/usage/w...
comment in response to post
Awesome post, as always! One thing I realized lately is that authentication should be standardized as part of the Iceberg REST catalog spec (e.g. an OIDC endpoint). Otherwise every vendor has their own authentication scheme and only their own client knows how to authenticate.
comment in response to post
Claude is actually getting pretty good at coding.
comment in response to post
Did you find a good AI assistant for vim?
comment in response to post
@thorstenball.com is contemplating the same thing: [Register Spill](registerspill.thorstenball.com/p/how-might-...) It might be CONTEXT.md.
comment in response to post
Looking forward!
comment in response to post
Sadly, this book is overlooked by too many people. It's a must-read if you're in data. I wish there was an ebook version.
comment in response to post
You're right, I wasn't entirely clear
comment in response to post
Well, Iceberg makes this metadata available at a higher level, in the manifest list and manifest files, which means you don't have to read all the Parquet files.
comment in response to post
Every commercial data warehouse stores additional metadata, like upper & lower bounds, statistics, and distinct counts, on top of the actual data files to assist the query optimizer. Iceberg is an open standard for this kind of metadata and provides speed-ups over plain Parquet.
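To make that concrete, here's a tiny, hand-rolled sketch of the file-pruning idea those bounds enable. The struct and column names are made up for illustration; real Iceberg manifests track per-column bounds, null counts, and more for every data file.

```rust
// Hypothetical, simplified view of what a manifest knows about each data file.
struct DataFile {
    path: &'static str,
    order_id_min: i64,
    order_id_max: i64,
}

fn main() {
    let manifest = vec![
        DataFile { path: "part-000.parquet", order_id_min: 1, order_id_max: 1_000 },
        DataFile { path: "part-001.parquet", order_id_min: 1_001, order_id_max: 2_000 },
        DataFile { path: "part-002.parquet", order_id_min: 2_001, order_id_max: 3_000 },
    ];

    // Query: WHERE order_id = 2500. Only files whose bounds contain the value
    // have to be opened at all; the rest are skipped without touching Parquet.
    let needle: i64 = 2_500;
    let to_read: Vec<&str> = manifest
        .iter()
        .filter(|f| f.order_id_min <= needle && needle <= f.order_id_max)
        .map(|f| f.path)
        .collect();

    println!("files to scan: {:?}", to_read); // -> ["part-002.parquet"]
}
```

In a real engine the same check runs against the manifest entries before any data file is opened.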
comment in response to post
Well, ideally it would be complete read and write support.
comment in response to post
I think there should be an option to use Iceberg without a catalog. This would free you from any lock-in and give you the open table format that everybody wants.
comment in response to post
You are right. The UX isn't there yet, but it's slowly getting easier to interact with. There are multiple CLIs that you can use with Datafusion: github.com/datafusion-c... github.com/JanKaul/fros...
comment in response to post
Or perhaps Datafusion instead of DuckDB🤔
comment in response to post
Man, your knowledge base is just awesome!
comment in response to post
Yes, here: youtu.be/tksTFG2YoZM?...
comment in response to post
This is going to be awesome! Really looking forward to meeting you there.
comment in response to post
I haven't seen anything that does everything in one function the way you describe it. For the transformation step, I would already consider dbt, SQLMesh, and SDF to be declarative tools.
comment in response to post
Thanks! I read your awesome article about the declarative data stack. Really cool. I definitely see the industry moving in that direction.
comment in response to post
I've been building something similar:
- Iceberg as table format
- Datafusion as query engine
- Airbyte for ingestion
Check out my presentation here: youtu.be/tksTFG2YoZM?...
comment in response to post
The catalog only really works in combination with an authentication and authorization system. If you're using the OSS version there is no real support for that. So you're essentially forced to use the managed version.
comment in response to post
I wonder why AWS didn't go with the REST catalog.
comment in response to post
Do you actually need to build a new service here? Why can't you just register multiple catalogs with your query engine?
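For reference, a rough sketch of what I mean with DataFusion. The catalog names and the in-memory table are placeholders, but `SessionContext::register_catalog` is the actual API; in practice each name would point at a different `CatalogProvider`, e.g. two Iceberg catalogs from different vendors.

```rust
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int32Array};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // A tiny in-memory table standing in for a real lakehouse table.
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
    )?;
    ctx.register_table("orders", Arc::new(MemTable::try_new(schema, vec![vec![batch]])?))?;

    // Re-register the default catalog under two extra names to show that one
    // SessionContext can address several catalogs side by side.
    let default_catalog = ctx.catalog("datafusion").expect("default catalog");
    ctx.register_catalog("vendor_a", default_catalog.clone());
    ctx.register_catalog("vendor_b", default_catalog);

    // Tables are addressed as <catalog>.<schema>.<table>, so queries can span
    // catalogs without an extra federation service in between.
    ctx.sql("SELECT * FROM vendor_a.public.orders").await?.show().await?;
    ctx.sql("SELECT * FROM vendor_b.public.orders").await?.show().await?;

    Ok(())
}
```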
comment in response to post
Kimball! I love it when old theories still hold true today. Sometimes you have to opt for something else, but that is rarely the case.
comment in response to post
Classic marketing material. Thanks for the explanation.
comment in response to post
For me the S3 Tables API feels like yet another catalog API that requires an additional service on top of your standard object storage. Classic S3 is the de facto standard; I'm not sure the same will happen with S3 Tables. I would have preferred a catalog that just uses the classic S3 API.
comment in response to post
Awesome, thanks!
comment in response to post
It's been a while since I did some Java. But is there a difference between the software.amazon.s3tables package and the software.amazon.awssdk.s3tables package? I'm wondering where the UpdateTableMetadataLocationRequest comes from. It's not defined in the mentioned repository.
comment in response to post
Can't find any information about the s3-tables package. github.com/awslabs/s3-t...
comment in response to post
It looks like it's using the external s3tables service. That's not really what I was hoping for. I was hoping we wouldn't need any other service to use Iceberg tables. github.com/awslabs/s3-t...
comment in response to post
It would be great to have an object-storage based catalog as part of the Iceberg specification. If every vendor is doing their own thing the user experience suffers a lot.
comment in response to post
I'm excited like a kid before Christmas😊
comment in response to post
With "compare and swap" being available on all common object stores, I think these issues don't really apply anymore.
comment in response to post
I think it's a really valuable use case and I hope we can stop it from being deprecated.
comment in response to post
Correct
comment in response to post
Iceberg actually had the concept of a "filesystem" table for a while: iceberg.apache.org/spec/#file-s...
comment in response to post
I have an Iceberg implementation (github.com/JanKaul/iceb...) with a "filesystem" catalog that leverages the compare-and-swap operation. I'm actually just waiting for your last PRs to object_store to be released.
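A rough sketch of what that commit path can look like with the object_store crate's conditional puts (assuming object_store 0.10+ and tokio; the paths and file names are made up, and a real catalog would serialize Iceberg metadata instead of a plain string):

```rust
use object_store::memory::InMemory;
use object_store::path::Path;
use object_store::{ObjectStore, PutMode, PutPayload, UpdateVersion};

#[tokio::main]
async fn main() -> object_store::Result<()> {
    // InMemory stands in for S3/GCS/Azure here; they all expose conditional
    // writes through the same object_store interface.
    let store = InMemory::new();
    // Hypothetical pointer object holding the current table metadata location.
    let pointer = Path::from("warehouse/orders/metadata/version-hint");

    // First commit: PutMode::Create only succeeds if the pointer doesn't exist yet.
    let created = store
        .put_opts(
            &pointer,
            PutPayload::from_static(b"v0.metadata.json"),
            PutMode::Create.into(),
        )
        .await?;

    // Later commit: compare-and-swap against the etag/version we last observed.
    // If another writer committed in between, this fails with a precondition
    // error and the committer re-reads the pointer, rebases, and retries.
    store
        .put_opts(
            &pointer,
            PutPayload::from_static(b"v1.metadata.json"),
            PutMode::Update(UpdateVersion {
                e_tag: created.e_tag,
                version: created.version,
            })
            .into(),
        )
        .await?;

    Ok(())
}
```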
comment in response to post
bsky.app/profile/did:...
comment in response to post
Incredible work, folks! I never thought this would happen so quickly.
comment in response to post
That is awesome, do you know if there will be a recording available? I would love to see it
comment in response to post
🙏 thanks for all the contributions. It really is an amazing project and I'm excited to see where it's going.
comment in response to post
Datafusion will be the OLAP version of postgres. Awesome community and great extensibility.