buremba.bsky.social
Data Engineer - Cooking https://github.com/buremba/universql
87 posts
86 followers
159 following
Regular Contributor
Active Commenter
comment in response to
post
Soon to be, it looks like: youtu.be/zeonmOO9jm4?...
Otherwise, there is no point in using Parquet instead of their DuckDB native format. I'm glad they didn't ignore the "industry standards".
comment in response to
post
Is there any plan to support data compaction to the data lake when data inlining is used?
comment in response to
post
I was worried about Iceberg being ignored in favor of DuckLake, but it looks like you fixed Iceberg's biggest problems and still kept the compatibility. Super exciting!
comment in response to
post
Turns out the implementation wasn't WAL but they had a new Iceberg-compatible data lake extension. I like the direction they are going!
comment in response to
post
I have this one, but they might have a soon-to-be-public extension to use the WAL to keep the data in sync with the data lake: github.com/duckdb/duckd...
comment in response to
post
That's a good analogy, might steal it. :) However, when the destination path is not clear (which is usually the case, as you need to experiment and iterate anyway), smashing can help accelerate finding the destination as you learn where not to go.
comment in response to
post
Ironically, the number of stale documents in our company has increased dramatically thanks to LLMs.
comment in response to
post
Oh, I lost count of how much time I've wasted trying to infer column names from random CSV files without a header. This is very handy!
comment in response to
post
Exactly! I think Flight will get more popular over time as it's the most efficient implementation, but this approach can help existing RESTful apps adopt SQL integrations before switching over to gRPC.
comment in response to
post
The main inspirations are github.com/PostgREST/po... and @qxip.bsky.social 's DuckDB webmacro extension: duckdb.org/community_ex...
comment in response to
post
Pretty common, but if one of these languages is the "main" one, it might be more desirable to generate JSON Schema from Pydantic/TS and generate the models for the other languages from JSON Schema. It's more about where you want the source of truth to be.
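To illustrate (a minimal Pydantic v2 sketch; the model and file names are made up):

```python
# Keep the Python model as the source of truth and emit JSON Schema
# for other languages' codegen (e.g. json-schema-to-typescript).
import json
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str | None = None

# Write the schema; the TypeScript side generates its types from this
# file, so both languages share one definition.
with open("user.schema.json", "w") as f:
    json.dump(User.model_json_schema(), f, indent=2)
```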
comment in response to
post
I had the exact same thought.
comment in response to
post
Thanks. I'm also a fan of your creative extensions! Quackpipe was one of the inspirations. :)
comment in response to
post
One here! 🍻
comment in response to
post
I couldn't figure out how to insert data into an S3 Table without Spark. I tried to use the API, but it requires me to create the files and update the metadata myself. PyIceberg can't write to S3 Tables through its S3 integration yet, so I had to stick to Spark. boto3.amazonaws.com/v1/documenta...
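Roughly the flow I hit, sketched with boto3 (the ARN, namespace, and paths are placeholders; double-check the shapes against the s3tables docs):

```python
# The s3tables API only swaps the metadata pointer; producing the Parquet
# data files and the Iceberg metadata/manifest files is still on you.
import boto3

s3tables = boto3.client("s3tables")
bucket_arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"

table = s3tables.get_table(
    tableBucketARN=bucket_arn, namespace="analytics", name="events"
)

# ...write the new data files plus a new metadata.json to the table's
# warehouse location here (the part PyIceberg couldn't do for me yet)...

s3tables.update_table_metadata_location(
    tableBucketARN=bucket_arn,
    namespace="analytics",
    name="events",
    versionToken=table["versionToken"],
    metadataLocation="s3://.../metadata/00002-abc.metadata.json",  # placeholder
)
```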
comment in response to
post
If AWS is serious about S3 Tables, they should support the Iceberg REST Catalog in it. Right now we can only create tables with Spark.
comment in response to
post
Qlik's Upsolver acquisition shows the importance of adopting new technologies as a potential acquisition target for bigger companies. It's a 10-year-old company, and they raised a ton, so I'm not sure how good the deal was for the co-founders.
comment in response to
post
dbt acquiring SDF Labs shows how important it is to have a good relationship with your competitors. SQLMesh might be more ambitious, but I'm sure it was a good exit for SDF founders in only 2 years!
comment in response to
post
For the record, I checked whether MotherDuck notebooks have it, but that doesn't seem to be the case, at least not yet.
comment in response to
post
Looks great! I would love to try it out. Where is this going to be available?
comment in response to
post
Workers AI supports models like llama-3.3-70b, and it's powered by containers according to their announcement, so I hope it will have TB-level limits.
I also wonder how they will position container support. Wouldn't it be better to just call it "custom workers", similar to container support in Lambda?
comment in response to
post
I also use Lambda, but CF Workers is very appealing for me, especially when the data is in R2.
comment in response to
post
Can you run DuckDB on Cloudflare? I haven't tried the Python worker, but since it uses Pyodide it's unlikely to run, and AFAIK the WASM version still needs some more work: github.com/duckdb/duckd...
comment in response to
post
When I hand-write too much duplicated YAML, I feel like I'm too smart for the task, but then after trying out these fancy config languages I feel like I'm too dumb to use them. 😫
comment in response to
post
Yeah, that's correct. I tested Starlark (github.com/bazelbuild/s...) the other day and thought I could use Python instead. I'm mostly interested in the IDE integrations (VSCode + IntelliJ), but TBH Python has first-class support in both IDEs and it's hard to beat.
comment in response to
post
I use YAML mostly at work for some internal projects and for personal dbt-based transformations. I aim to reduce duplication by reusing definitions & automating the YAML generation where possible. I started experimenting with PKL (github.com/apple/pkl) and CUE (cuelang.org).
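For the automation part, the low-tech sketch I mean (plain Python plus PyYAML; model and column names are made up):

```python
# Generate repetitive dbt-style YAML from one Python definition
# instead of hand-copying it per model.
import yaml

MODELS = {
    "orders": ["id", "customer_id", "amount"],
    "customers": ["id", "email"],
}

doc = {
    "version": 2,
    "models": [
        {
            "name": name,
            "columns": [{"name": c, "tests": ["not_null"]} for c in cols],
        }
        for name, cols in MODELS.items()
    ],
}

with open("schema.yml", "w") as f:
    yaml.safe_dump(doc, f, sort_keys=False)
```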
comment in response to
post
Is the cache local or remote? With WASM I thought people mostly rely on browser cache but I might be wrong.
comment in response to
post
Mine is writing less YAML, and I am hopeful.
comment in response to
post
Yeah, it's kind of a black box as to when the compaction kicks in and how it works. The API has relevant features, but they are not surfaced anywhere in the console. boto3.amazonaws.com/v1/documenta...
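For reference, the maintenance knobs I mean in boto3 (a sketch from memory of the docs, so verify the parameter shapes; the ARN is a placeholder):

```python
# The compaction settings exist in the API even though the console
# doesn't surface them.
import boto3

s3tables = boto3.client("s3tables")
bucket_arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"

s3tables.put_table_maintenance_configuration(
    tableBucketARN=bucket_arn,
    namespace="analytics",
    name="events",
    type="icebergCompaction",
    value={
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": 512}},
    },
)

# Read it back to see what's actually configured.
print(
    s3tables.get_table_maintenance_configuration(
        tableBucketARN=bucket_arn, namespace="analytics", name="events"
    )
)
```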
comment in response to
post
Is it slower than classic S3 tables for you? In my tests it was about the same, but I used EC2.
comment in response to
post
This is cool, but how about the mission to decrease the number of Avro files in the world?
comment in response to
post
32 bytes for a number is crazy.
comment in response to
post
@felixscherz.bsky.social already created the draft PR for pyiceberg here: github.com/apache/icebe...
I think the right way would be S3 adopting the Iceberg REST protocol natively, but this would be the alternative.
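If S3 did adopt it, I'd expect connecting to look like any other PyIceberg REST catalog (the URI here is hypothetical; the rest is the standard REST catalog config):

```python
# Hypothetical native Iceberg REST endpoint on S3 Tables, accessed via
# PyIceberg's standard REST catalog support.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": "https://s3tables.us-east-1.amazonaws.com/iceberg",  # hypothetical
        "warehouse": "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket",
    },
)
table = catalog.load_table("analytics.events")
```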
comment in response to
post
Assuming you refer to S3 Tables, I believe you already know it better than me :)
comment in response to
post
If you are operating with a single table and know the path of an Iceberg metadata file, you don't need a catalog. Here is an example: duckdb.org/docs/extensi...
A catalog is for features such as time travel and atomic update/merge/insert.
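Something like this with the Python API (the metadata path is a placeholder; reading from S3 also needs httpfs and credentials):

```python
# Catalog-less read of a single Iceberg table: point iceberg_scan
# straight at a known metadata file.
import duckdb

con = duckdb.connect()
con.install_extension("iceberg")
con.load_extension("iceberg")

con.sql("""
    SELECT count(*)
    FROM iceberg_scan('s3://bucket/warehouse/db/events/metadata/v2.metadata.json')
""").show()
```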
comment in response to
post
We are all thankful for SAP HANA in the DuckDB community.
comment in response to
post
id
comment in response to
post
I agree, but for me, finding the right prompt to guide the AI requires a different mental model compared to actually trying to fix the code.
Maybe I will become a better "prompt engineer" over time, but the constant feedback loop makes me less efficient because of context switching.