If anyone is interested in a deep read on random access performance in Parquet, we've put out a preprint: https://arxiv.org/abs/2504.15247

Parquet is not really as bad as its reputation implies (though of course I am biased towards Lance 😛, and there are valid limitations in Parquet re: RAM)
