jankaul.bsky.social
71 posts 63 followers 82 following
Getting Started
Active Commenter
comment in response to post
Yes, you can remove both. I tested it on Android.
comment in response to post
Same here. The design is so awesome. Also love the "Unix" style interaction which makes it compose so well with other tools.
comment in response to post
I tried Aider and Claude Code; their approaches are very similar, but Claude Code feels much more powerful. It's really great at pulling in additional context as it works, while Aider only gathers it beforehand. The only thing missing from Claude Code is AI comments: aider.chat/docs/usage/w...
comment in response to post
Awesome post, as always! One thing I realized lately is that authentication should be standardized as part of the Iceberg REST catalog spec (e.g. an OIDC endpoint). Otherwise every vendor has their own authentication scheme and only their own client knows how to authenticate.
comment in response to post
Claude is actually getting pretty good at coding.
comment in response to post
Did you find a good AI assistant for vim?
comment in response to post
@thorstenball.com is contemplating the same thing: [Register Spill](registerspill.thorstenball.com/p/how-might-...) It might be CONTEXT.md.
comment in response to post
Looking forward!
comment in response to post
Sadly, this book is overlooked by too many people. It's a must-read if you're in data. I wish there was an ebook version.
comment in response to post
You're right, I wasn't entirely clear
comment in response to post
Well, Iceberg makes this metadata available at a higher level, in the manifest list and manifest files, which means you don't have to read all the Parquet files.
comment in response to post
Every commercial data warehouse stores additional metadata, like upper & lower bounds, statistics, and distinct counts, on top of the actual data files to assist the query optimizer. Iceberg is an open standard for this kind of metadata and provides speed-ups over plain Parquet.
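To make that concrete, here's a tiny, hand-rolled sketch of the file-pruning idea those bounds enable. The struct and column names are made up for illustration; real Iceberg manifests track per-column bounds, null counts, and more for every data file.

```rust
// Hypothetical, simplified view of what a manifest knows about each data file.
struct DataFile {
    path: &'static str,
    order_id_min: i64,
    order_id_max: i64,
}

fn main() {
    let manifest = vec![
        DataFile { path: "part-000.parquet", order_id_min: 1, order_id_max: 1_000 },
        DataFile { path: "part-001.parquet", order_id_min: 1_001, order_id_max: 2_000 },
        DataFile { path: "part-002.parquet", order_id_min: 2_001, order_id_max: 3_000 },
    ];

    // Query: WHERE order_id = 2500. Only files whose bounds contain the value
    // have to be opened at all; the rest are skipped without touching Parquet.
    let needle: i64 = 2_500;
    let to_read: Vec<&str> = manifest
        .iter()
        .filter(|f| f.order_id_min <= needle && needle <= f.order_id_max)
        .map(|f| f.path)
        .collect();

    println!("files to scan: {:?}", to_read); // -> ["part-002.parquet"]
}
```

In a real engine the same check runs against the manifest entries before any data file is opened.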
comment in response to post
Well, ideally it would be complete read and write support.
comment in response to post
I think there should be an option to use Iceberg without a catalog. This would free you from any lock-in and give you the open table format that everybody wants.
comment in response to post
You are right. The UX isn't there yet, but it's slowly getting easier to interact with. There are multiple CLIs that you can use with Datafusion: github.com/datafusion-c... github.com/JanKaul/fros...
comment in response to post
Or perhaps Datafusion instead of DuckDB🤔
comment in response to post
Man, your knowledge base is just awesome!
comment in response to post
Yes, here: youtu.be/tksTFG2YoZM?...
comment in response to post
This is going to be awesome! Really looking forward to meeting you there.
comment in response to post
I haven't seen anything that does everything in one function the way you describe it. For the transformation step, I would already consider dbt, SQLMesh, and SDF to be declarative tools.
comment in response to post
Thanks! I read your awesome article about the declarative data stack. Really cool. I definitely see the industry moving in that direction.
comment in response to post
I've been building something similar:
- Iceberg as table format
- Datafusion as query engine
- Airbyte for ingestion
Check out my presentation here: youtu.be/tksTFG2YoZM?...
comment in response to post
The catalog only really works in combination with an authentication and authorization system. If you're using the OSS version there is no real support for that. So you're essentially forced to use the managed version.
comment in response to post
I wonder why AWS didn't go with the REST catalog.
comment in response to post
Do you actually need to build a new service here? Why can't you just register multiple catalogs with your query engine?
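For reference, a rough sketch of what I mean with DataFusion. The catalog names and the in-memory table are placeholders, but `SessionContext::register_catalog` is the actual API; in practice each name would point at a different `CatalogProvider`, e.g. two Iceberg catalogs from different vendors.

```rust
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int32Array};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // A tiny in-memory table standing in for a real lakehouse table.
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
    )?;
    ctx.register_table("orders", Arc::new(MemTable::try_new(schema, vec![vec![batch]])?))?;

    // Re-register the default catalog under two extra names to show that one
    // SessionContext can address several catalogs side by side.
    let default_catalog = ctx.catalog("datafusion").expect("default catalog");
    ctx.register_catalog("vendor_a", default_catalog.clone());
    ctx.register_catalog("vendor_b", default_catalog);

    // Tables are addressed as <catalog>.<schema>.<table>, so queries can span
    // catalogs without an extra federation service in between.
    ctx.sql("SELECT * FROM vendor_a.public.orders").await?.show().await?;
    ctx.sql("SELECT * FROM vendor_b.public.orders").await?.show().await?;

    Ok(())
}
```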
comment in response to post
Kimball! I love it when old theories still hold true today. Sometimes you have to opt for something else, but that is rarely the case.
comment in response to post
Classic marketing material. Thanks for the explanation.
comment in response to post
For me the S3 Tables API feels like yet another catalog API that requires an additional service on top of your standard object storage. Classic S3 is the de facto standard; I'm not sure the same will happen with S3 Tables. I would have preferred a catalog that just uses the classic S3 API.
comment in response to post
Awesome, thanks!
comment in response to post
It's been a while since I did some Java. But is there a difference between the software.amazon.s3tables package and the software.amazon.awssdk.s3tables package? I'm wondering where the UpdateTableMetadataLocationRequest comes from. It's not defined in the mentioned repository.
comment in response to post
Can't find any information about the s3-tables package. github.com/awslabs/s3-t...
comment in response to post
It looks like it's using the external s3tables service. That's not really what I was hoping for. I was hoping we wouldn't need any other service to use Iceberg tables. github.com/awslabs/s3-t...
comment in response to post
It would be great to have an object-storage based catalog as part of the Iceberg specification. If every vendor is doing their own thing the user experience suffers a lot.
comment in response to post
I'm excited like a kid before Christmas😊
comment in response to post
With "compare and swap" being available on all common object stores, I think these issues don't really apply anymore.
comment in response to post
I think it's a really valuable use case and I hope we can stop it from being deprecated.
comment in response to post
Correct
comment in response to post
Iceberg actually had the concept of a "filesystem" table for a while: iceberg.apache.org/spec/#file-s...
comment in response to post
I have an Iceberg implementation (github.com/JanKaul/iceb...) with a "filesystem" catalog that leverages the compare-and-swap operation. I'm actually just waiting for your last PRs to object_store to be released.
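A rough sketch of what that commit path can look like with the object_store crate's conditional puts (assuming object_store 0.10+ and tokio; the paths and file names are made up, and a real catalog would serialize Iceberg metadata instead of a plain string):

```rust
use object_store::memory::InMemory;
use object_store::path::Path;
use object_store::{ObjectStore, PutMode, PutPayload, UpdateVersion};

#[tokio::main]
async fn main() -> object_store::Result<()> {
    // InMemory stands in for S3/GCS/Azure here; they all expose conditional
    // writes through the same object_store interface.
    let store = InMemory::new();
    // Hypothetical pointer object holding the current table metadata location.
    let pointer = Path::from("warehouse/orders/metadata/version-hint");

    // First commit: PutMode::Create only succeeds if the pointer doesn't exist yet.
    let created = store
        .put_opts(
            &pointer,
            PutPayload::from_static(b"v0.metadata.json"),
            PutMode::Create.into(),
        )
        .await?;

    // Later commit: compare-and-swap against the etag/version we last observed.
    // If another writer committed in between, this fails with a precondition
    // error and the committer re-reads the pointer, rebases, and retries.
    store
        .put_opts(
            &pointer,
            PutPayload::from_static(b"v1.metadata.json"),
            PutMode::Update(UpdateVersion {
                e_tag: created.e_tag,
                version: created.version,
            })
            .into(),
        )
        .await?;

    Ok(())
}
```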
comment in response to post
bsky.app/profile/did:...
comment in response to post
Incredible work, folks! I never thought this would happen so quickly.
comment in response to post
That is awesome, do you know if there will be a recording available? I would love to see it
comment in response to post
🙏 thanks for all the contributions. It really is an amazing project and I'm excited to see where it's going.
comment in response to post
Datafusion will be the OLAP version of postgres. Awesome community and great extensibility.