westonpace.bsky.social
Software developer working on all things arrow and columnar storage, currently, Lance.
213 posts
103 followers
282 following
Regular Contributor
Active Commenter
comment in response to
post
Hmm, API dependency has a nice ring to it
comment in response to
post
Good point. CMake's handling of these matches what I was thinking.
comment in response to
post
E.g. this is why users of the `lance` crate need to match `lance`'s version of `arrow` exactly (though I'd like to move away from that at some point via the C data API)
comment in response to
post
If a crate wants to expose `bytes::Bytes` in its public API, for example, then either it needs to re-export `bytes` (and users can use `my_crate::Bytes`) or users need to have the exact same version as the crate.
comment in response to
post
Nope, not quite. Arrow is a common one for many of the projects I work with. In Rust we often see this with the `bytes` or `rand` crates.
comment in response to
post
In fact, I'm even struggling with syscall overhead now, so probably going to have to move away from `pread64` entirely.
comment in response to
post
We still have an abstraction layer so we can do some custom logic with the local filesystem. I think the primary remaining issue, IIRC, is that `object_store` does "open file -> read range -> close file" on every `get_range` operation and the open/close overhead can be deadly.
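A rough sketch of the difference described here, written in Python rather than Rust and assuming a Unix-like system where `os.pread` is available. The names `read_range_reopen` and `KeptOpenReader` are hypothetical and only for illustration; they are not `object_store` or Lance APIs.

```python
import os


def read_range_reopen(path, offset, length):
    """Open, read one range, close: three steps on every call."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


class KeptOpenReader:
    """Open the file once and reuse the descriptor for every range read."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDONLY)

    def read_range(self, offset, length):
        # pread reads at an explicit offset without moving a shared cursor,
        # so no seek is needed between calls.
        return os.pread(self._fd, length, offset)

    def close(self):
        os.close(self._fd)
```

Both return the same bytes. The second form pays the open/close cost once per file instead of once per range, though each read is still a syscall.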
comment in response to
post
This sounds like what I'd want too but I also really like working on a team so I'm not sure how I'd work that out.
comment in response to
post
Warehouse / lakehouse is rapidly becoming a distinction without a difference.
I think row / column, time-travel / mutable, embeddable / requires-separate-process are all more meaningful distinctions.
With "TCO at scale" and devex being the important differentiators.
comment in response to
post
We treat media (especially games) as a luxury good. Yet ethically we seem to think of it as a sort of common good like a utility.
Stealing a diamond ring (luxury) is theft. Stealing Andor is just basic participation in society.
comment in response to
post
I'm definitely biased but if you have any questions or feedback feel free to let me know 😅
comment in response to
post
Using SQL is interesting. You get free integrations with database-as-catalog but if the customer wants Unity or Polaris (or insert custom catalog here) then I imagine it would be kind of a PITA to map an SQL API on top of that.
comment in response to
post
So you're looking for the most fishlike document that isn't a fish?
comment in response to
post
I think this is optimal. Ideally also make sure to set up CI with a lockfile at the same version so I'm testing with this version too. But I'm usually too lazy to keep them in sync.
comment in response to
post
```
def left_outer(left, right, on):
    for row_l in left:
        has_match = False
        for row_r in right:
            if on(row_l, row_r):
                emit(row_l, row_r)
                has_match = True
        if not has_match:
            emit(row_l, None)
```
comment in response to
post
Maybe not the answer you are looking for, but the thing that really helped it all click for me was to implement (in pseudocode) each of them. E.g.
```
def inner(left, right, on):
    for row_l in left:
        for row_r in right:
            if on(row_l, row_r):
                emit(row_l, row_r)
```
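For anyone who wants to run the two joins above, here is one possible runnable version. It assumes `emit` can be replaced by `yield`, so each join becomes a generator of `(left_row, right_row)` pairs. The function names and the sample `users` / `orders` data are made up for illustration.

```python
def inner_join(left, right, on):
    """Yield every (left, right) pair for which on(left, right) is true."""
    for row_l in left:
        for row_r in right:
            if on(row_l, row_r):
                yield (row_l, row_r)


def left_outer_join(left, right, on):
    """Like inner_join, but unmatched left rows are yielded with None."""
    for row_l in left:
        has_match = False
        for row_r in right:
            if on(row_l, row_r):
                has_match = True
                yield (row_l, row_r)
        if not has_match:
            yield (row_l, None)


# Hypothetical sample data: (id, value) tuples.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen")]


def same_id(user, order):
    return user[0] == order[0]


inner_rows = list(inner_join(users, orders, same_id))
outer_rows = list(left_outer_join(users, orders, same_id))
```

Note that `right` is scanned once per left row, so it has to be something re-iterable like a list, not a one-shot generator.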
comment in response to
post
Too much natural light, you might be tempted to go outside.
comment in response to
post
Just throw it in JSON
comment in response to
post
Don't use it at all for writing yet. Not sure I'd like that. I (maybe egotistically) like to think I have a unique voice (more humor, less formal) and don't want to give that up.
comment in response to
post
My most common use so far (beyond code gen) has been having it keep a list of tasks I need to do. Then at the end of the week I have it summarize my week and we build a new task list from what remains (and is still valid). Like an AI bujo
comment in response to
post
Don't stop them now. I'm so close to getting to 100.
comment in response to
post
Two more long weeks
comment in response to
post
Well, "readers" would probably have to be "a large circle of trust" (e.g. professors, postdocs, those who have submitted before, etc.) and not just "any random Internet person"
comment in response to
post
Lots of ideas I think could work. You could make the journal digital. Then accept 5x more papers into the journal with the criteria "this is on topic, appears to be done in a rigorous manner, and did not miss any major existing research". Then let readers vote on which papers go to conference.
comment in response to
post
What I would really love is a $10K electric Kei truck but this might be as good as it gets. Very awesome.
comment in response to
post
This resonates. But also, 30s have been a time for me to learn hot != desirable
comment in response to
post
Don't use the word "clearly". "Clearly, X" will be read as "X might not be true but that would be bad for me and I don't know how to prove X".
comment in response to
post
I look forward to the eventual name change to β
comment in response to
post
The biggest bottleneck is actually more around how I/O is scheduled than anything, but we don't dive into that in deep detail in the paper.
comment in response to
post
When you're I/O bound then I/O is really slow, even on an NVMe. The only real question comes down to whether you can still get good compression ratios with small page sizes and...you can.
comment in response to
post
I think it's a cat but I'm not sure if he knows.
comment in response to
post
Then you find it's a TB of data
comment in response to
post
In case it is helpful I've distilled my experiences with pin into a short doc. This was mostly an exercise for myself to start consolidating my knowledge around this so please don't worry if it isn't helpful.
github.com/westonpace/p...