taras.glek.net
LLMs, system programming https://taras.glek.net
333 posts 153 followers 311 following
Regular Contributor
Active Commenter
comment in response to post
Designing and dealing with politics around implementing this system was one of the major reasons I burned out at Moz and rage quit... but so happy to see that it was kind of worth it
comment in response to post
how much raw data and time to process is that? Looks amazing
comment in response to post
nice illustration
comment in response to post
yeah machine-id partitioning has perks if you need it at twitter volumes. i have always used uuidv7 for low volumes of writes, was simple
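For anyone unfamiliar with why uuidv7 is simple for low write volumes: the id starts with a millisecond timestamp, so ids sort roughly by creation time with no coordination between writers. A minimal sketch of the bit layout, assuming the RFC 9562 field order; `uuid7_sketch` is a hypothetical helper name, not a library function:

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style id: 48-bit ms timestamp, then version, variant and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


older = uuid7_sketch(1_000)
newer = uuid7_sketch(2_000)
print(older < newer)  # ids compare in timestamp order
```

Ids created within the same millisecond are ordered only by their random bits, which is usually fine at low write rates.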
comment in response to post
nice, uuidv7 has a decent competitor
comment in response to post
pretty much everyone has better code reviews than github. It is like they are betting that nobody will want to read copilot generated code anyway 😂
comment in response to post
I think duckdb is an interesting case study in disassembling the database (while still being SQL) into functionality that can be used as a lib and applied in a lot more use cases than a trad db server or an embedded lib like sqlite. Maybe you would also build your theoretical lib on arrow
comment in response to post
I think this is a general trend for the industry, e.g. clang disaggregating the compiler into components you can assemble elsewhere. I have been looking for libs like this and there is nothing outside of the reranker space (jina ai in particular has this approach)
comment in response to post
Their code assumes an SQL db, would be quite a huge hack to move it to parquet so it could work with s3 :(
comment in response to post
Would be so nice to be able to run that under small web
comment in response to post
and they support vacuum for once :)
comment in response to post
nice, they made it pluggable. pure goodness! I also noticed from source they support encryption, which is awesome
comment in response to post
I just hope they don't make postgres a requirement and allow less performant tech like parquet-on-s3 for catalogs. In my ideal design one should be able to swap the catalog db for anything with transactional semantics
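A rough sketch of what "anything with transactional semantics" could mean in practice: the lake only needs an atomic compare-and-swap on the current snapshot, and the backend behind it is swappable. Everything here (`Catalog`, `InMemoryCatalog`, `append_file`) is a hypothetical illustration, not DuckDB's actual catalog API:

```python
import threading
from typing import Protocol


class Catalog(Protocol):
    """Hypothetical minimal catalog contract: read a snapshot, commit with compare-and-swap."""

    def current(self) -> tuple[int, list[str]]: ...
    def commit(self, expected_version: int, files: list[str]) -> bool: ...


class InMemoryCatalog:
    """Toy backend; a postgres- or object-store-backed one would expose the same two methods."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._files = []

    def current(self):
        with self._lock:
            return self._version, list(self._files)

    def commit(self, expected_version, files):
        with self._lock:
            if expected_version != self._version:
                return False  # another writer committed first
            self._version += 1
            self._files = list(files)
            return True


def append_file(catalog: Catalog, path: str, retries: int = 5) -> int:
    """Optimistic append: re-read the snapshot and retry if the commit loses a race."""
    for _ in range(retries):
        version, files = catalog.current()
        if catalog.commit(version, files + [path]):
            return version + 1
    raise RuntimeError("too many conflicting commits")


catalog = InMemoryCatalog()
print(append_file(catalog, "part-0001.parquet"))
```

The slower the backend's compare-and-swap, the slower every commit gets, which is the perf tradeoff of using something like parquet-on-s3 instead of postgres.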
comment in response to post
I've been working on a design for a duckdb-based lake and both of the leading formats are harder to implement than what duckdb did
comment in response to post
Duck "cheats" by using postgres. It's going to beat the living crap out of iceberg in perf. Delta lake is a faster format, but still much slower due to s3 dance required
comment in response to post
i'm always surprised when people are happy to accept the 80% and let me focus on the next 80%, but they keep doing it
comment in response to post
There are rtsp proxies for this. Keep the connection to youtube alive and swap in a static image during off hours. I do not remember the name of the proxy I used
comment in response to post
most ip cameras stream to youtube directly
comment in response to post
just reverse engineering their vague "run synchronous" wasm without worker instructions github.com/tarasglek/du... works on bun, deno and node, but probably not on cf due to node apis and i can't figure out how to use the lib properly
comment in response to post
i have wasm ver loading in <500mb of js heap in synchronous mode so it should work on deno deploy
comment in response to post
you got access already? jealous!
comment in response to post
but 128mb ram on workers
comment in response to post
wonder who likes snap
comment in response to post
containers and desktop apps do not mix 😢
comment in response to post
Stack switching is the ultimate solution that I think would cover @flohofwoe.bsky.social game dev example. but JSPI is cool cos it's simpler
comment in response to post
i loved kde3. Been afraid of even trying it after plasma and widgets. I don't understand why stuff like this is even possible