buremba.bsky.social
Data Engineer - Cooking https://github.com/buremba/universql
87 posts
86 followers
159 following
Regular Contributor
Active Commenter
comment in response to
post
Soon to be, it looks like: youtu.be/zeonmOO9jm4?...
Otherwise, there is no point in using Parquet instead of their DuckDB native format. I'm glad they didn't ignore the "industry standards".
comment in response to
post
Is there any plan to support data compaction to the data lake when data inlining is used?
comment in response to
post
I was worried about Iceberg being ignored in favor of DuckLake, but it looks like you fixed Iceberg's biggest problems and still kept the compatibility. Super exciting!
comment in response to
post
Turns out the implementation wasn't WAL but they had a new Iceberg-compatible data lake extension. I like the direction they are going!
comment in response to
post
I have this one, but they might have a soon-to-be-public extension to use the WAL to keep the data in sync with the data lake: github.com/duckdb/duckd...
comment in response to
post
That's a good analogy, might steal it. :) However, when the destination path is not clear (which is usually the case, as you need to experiment and iterate anyway), smashing can help accelerate finding the destination as you learn where not to go.
comment in response to
post
Ironically, the number of stale documents in our company has increased dramatically thanks to LLMs.
comment in response to
post
Oh, I lost count of how much time I've wasted trying to infer column names from random CSV files without a header. This is very handy!
comment in response to
post
Exactly! I think Flight will get more popular over time as it's the most efficient implementation, but this approach can help existing RESTful apps adopt SQL integrations before switching over to gRPC.
comment in response to
post
The main inspirations are github.com/PostgREST/po... and @qxip.bsky.social 's DuckDB webmacro extension: duckdb.org/community_ex...
comment in response to
post
Pretty common, but if one of these languages is the "main" one, it might be more desirable to generate JSON Schema from Pydantic/TS and generate the models for the other languages from JSON Schema. It's more about where you want the source of truth to be.
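To illustrate (a minimal Pydantic v2 sketch; the model and file names are made up):

```python
# Keep the Python model as the source of truth and emit JSON Schema
# for other languages' codegen (e.g. json-schema-to-typescript).
import json
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str | None = None

# Write the schema; the TypeScript side generates its types from this
# file, so both languages share one definition.
with open("user.schema.json", "w") as f:
    json.dump(User.model_json_schema(), f, indent=2)
```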
comment in response to
post
I had the exact same thought.
comment in response to
post
Thanks. I'm also a fan of your creative extensions! Quackpipe was one of the inspirations. :)
comment in response to
post
One here! 🍻
comment in response to
post
I couldn't figure out how to insert data into an S3 Table without Spark. I tried to use the API, but it requires me to create the files and update the metadata myself. PyIceberg can't write to S3 Tables through its S3 integration yet, so I had to stick to Spark. boto3.amazonaws.com/v1/documenta...
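Roughly the flow I hit, sketched with boto3 (the ARN, namespace, and paths are placeholders; double-check the shapes against the s3tables docs):

```python
# The s3tables API only swaps the metadata pointer; producing the Parquet
# data files and the Iceberg metadata/manifest files is still on you.
import boto3

s3tables = boto3.client("s3tables")
bucket_arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"

table = s3tables.get_table(
    tableBucketARN=bucket_arn, namespace="analytics", name="events"
)

# ...write the new data files plus a new metadata.json to the table's
# warehouse location here (the part PyIceberg couldn't do for me yet)...

s3tables.update_table_metadata_location(
    tableBucketARN=bucket_arn,
    namespace="analytics",
    name="events",
    versionToken=table["versionToken"],
    metadataLocation="s3://.../metadata/00002-abc.metadata.json",  # placeholder
)
```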
comment in response to
post
If AWS is serious about S3 Tables, they should support the Iceberg REST Catalog in it. Right now we can only create tables with Spark.
comment in response to
post
Qlik's Upsolver acquisition shows the importance of adopting new technologies as a potential acquisition target for bigger companies. It's a 10-year-old company, and they raised a ton, so I'm not sure how good the deal was for the co-founders.
comment in response to
post
dbt acquiring SDF Labs shows how important it is to have a good relationship with your competitors. SQLMesh might be more ambitious, but I'm sure it was a good exit for SDF founders in only 2 years!
comment in response to
post
For the record, I checked whether MotherDuck notebooks have it, but that doesn't seem to be the case, at least not yet.
comment in response to
post
Looks great! I would love to try it out. Where is this going to be available?
comment in response to
post
Workers AI supports models like llama-3.3-70b, and it's powered by containers according to their announcement, so I hope it will have TB-level limits.
I also wonder how they will position container support. Wouldn't it be better to just call it "custom workers", similar to container support in Lambda?
comment in response to
post
I also use Lambda, but CF Workers is very appealing for me, especially when the data is in R2.
comment in response to
post
Can you run DuckDB on Cloudflare? I haven't tried the Python worker, but since it uses Pyodide it's unlikely to run, and AFAIK the WASM version still needs some more work: github.com/duckdb/duckd...
comment in response to
post
When I hand-write too much duplicated YAML, I feel like I'm too smart for the task, but then after trying out these fancy config languages I feel like I'm too dumb to use them. 😫
comment in response to
post
Yeah, that's correct. I tested Starlark (github.com/bazelbuild/s...) the other day and thought I could use Python instead. I'm mostly interested in the IDE integrations (VSCode + IntelliJ), but TBH Python has first-class support in both IDEs and it's hard to beat.
comment in response to
post
I use YAML mostly at work for some internal projects and for personal dbt-based transformations. I aim to reduce duplication by reusing definitions & automating the YAML generation where possible. I started experimenting with PKL (github.com/apple/pkl) and CUE (cuelang.org).
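For the automation part, the low-tech sketch I mean (plain Python plus PyYAML; model and column names are made up):

```python
# Generate repetitive dbt-style YAML from one Python definition
# instead of hand-copying it per model.
import yaml

MODELS = {
    "orders": ["id", "customer_id", "amount"],
    "customers": ["id", "email"],
}

doc = {
    "version": 2,
    "models": [
        {
            "name": name,
            "columns": [{"name": c, "tests": ["not_null"]} for c in cols],
        }
        for name, cols in MODELS.items()
    ],
}

with open("schema.yml", "w") as f:
    yaml.safe_dump(doc, f, sort_keys=False)
```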
comment in response to
post
Is the cache local or remote? With WASM I thought people mostly rely on browser cache but I might be wrong.
comment in response to
post
Mine is writing less YAML, and I am hopeful.
comment in response to
post
Yeah, it's kind of a black box as to when the compaction kicks in and how it works. The API has relevant features, but they are not surfaced anywhere in the console. boto3.amazonaws.com/v1/documenta...
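For reference, the maintenance knobs I mean in boto3 (a sketch from memory of the docs, so verify the parameter shapes; the ARN is a placeholder):

```python
# The compaction settings exist in the API even though the console
# doesn't surface them.
import boto3

s3tables = boto3.client("s3tables")
bucket_arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"

s3tables.put_table_maintenance_configuration(
    tableBucketARN=bucket_arn,
    namespace="analytics",
    name="events",
    type="icebergCompaction",
    value={
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": 512}},
    },
)

# Read it back to see what's actually configured.
print(
    s3tables.get_table_maintenance_configuration(
        tableBucketARN=bucket_arn, namespace="analytics", name="events"
    )
)
```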
comment in response to
post
Is it slower than classic S3 tables for you? In my tests it was about the same, but I used EC2.
comment in response to
post
This is cool, but how about the mission to decrease the number of Avro files in the world?
comment in response to
post
32 bytes for a number is crazy.
comment in response to
post
@felixscherz.bsky.social already created the draft PR for pyiceberg here: github.com/apache/icebe...
I think the right way would be S3 adopting the Iceberg REST protocol natively, but this would be the alternative.
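If S3 did adopt it, I'd expect connecting to look like any other PyIceberg REST catalog (the URI here is hypothetical; the rest is the standard REST catalog config):

```python
# Hypothetical native Iceberg REST endpoint on S3 Tables, accessed via
# PyIceberg's standard REST catalog support.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": "https://s3tables.us-east-1.amazonaws.com/iceberg",  # hypothetical
        "warehouse": "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket",
    },
)
table = catalog.load_table("analytics.events")
```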
comment in response to
post
Assuming you refer to S3 Tables, I believe you already know it better than me :)
comment in response to
post
If you are operating with a single table and know the path of an Iceberg metadata file, you don't need a catalog. Here is an example: duckdb.org/docs/extensi...
A catalog is for features such as time travel and atomic update/merge/insert.
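Something like this with the Python API (the metadata path is a placeholder; reading from S3 also needs httpfs and credentials):

```python
# Catalog-less read of a single Iceberg table: point iceberg_scan
# straight at a known metadata file.
import duckdb

con = duckdb.connect()
con.install_extension("iceberg")
con.load_extension("iceberg")

con.sql("""
    SELECT count(*)
    FROM iceberg_scan('s3://bucket/warehouse/db/events/metadata/v2.metadata.json')
""").show()
```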
comment in response to
post
We are all thankful for SAP HANA in the DuckDB community.
comment in response to
post
id
comment in response to
post
I agree, but for me, finding the right prompt to guide the AI requires a different mental model compared to actually trying to fix the code.
Maybe I will become a better "prompt engineer" over time, but the constant feedback loop makes me less efficient because of context switching.