Why does @atproto.com use a Merkle Search Tree rather than a Prolly Tree? - ThreadSky

norman.life • 71 days ago

Why does @atproto.com use a Merkle Search Tree rather than a Prolly Tree?

Comments

IMHO, prolly trees optimize for the wrong thing.
What they're good at is targeting specific internal node size distributions, so that e.g. most nodes end up with, say, around 128 items, almost none with a single item and none with over 256.

This makes the trees much less degenerate.
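A minimal sketch of that kind of size targeting, using a made-up boundary rule rather than any real prolly tree implementation's: a key closes the current node when its hash hits a 1-in-`target` pattern, but never before `min_size` items and always by `max_size`. The names `key_hash` and `chunk` and all three parameters are assumptions for illustration.

```python
import hashlib

def key_hash(key: str) -> int:
    # First 8 bytes of SHA-256, read as an integer.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def chunk(keys, target=128, min_size=2, max_size=256):
    """Split sorted keys into nodes whose sizes stay within [min_size, max_size]."""
    chunks, current = [], []
    for key in keys:
        current.append(key)
        hit = key_hash(key) % target == 0  # content-defined boundary candidate
        if (hit and len(current) >= min_size) or len(current) >= max_size:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)  # trailing node may be smaller than min_size
    return chunks

keys = [f"key-{i:06d}" for i in range(50_000)]
sizes = [len(c) for c in chunk(keys)]
print("nodes:", len(sizes), "smallest:", min(sizes), "largest:", max(sizes))
```

A constant boundary probability like this only bounds the sizes. Getting most nodes close to the target would presumably need a probability that depends on how full the current node already is.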

matheus23.com•71 days ago

What this helps with is read speed: if you have a prolly tree, you're way more likely to cut down on possibilities when going through one internal node.

But you sacrifice write speed: with MSTs, you only modify the nodes between the leaf and the root.
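To illustrate why an MST write stays on one root-to-leaf path, here is a simplified model, not the atproto implementation: each key's level comes only from its own hash (leading zero bits of SHA-256, counted in pairs), so inserting a key never moves any other key. `level`, `build`, and `tree_nodes` are hypothetical helpers, and nodes are identified by a hash of their contents.

```python
import hashlib

def level(key: str) -> int:
    # Leading zero bits of SHA-256(key), counted in pairs.
    digest = hashlib.sha256(key.encode()).digest()
    bits = bin(int.from_bytes(digest, "big"))[2:].zfill(256)
    return (len(bits) - len(bits.lstrip("0"))) // 2

def build(items, lvl, nodes):
    """Build the node at `lvl` over sorted (key, level) pairs; return its content hash."""
    if not items:
        return None
    parts, gap = [], []
    for key, key_lvl in items:
        if key_lvl == lvl:
            parts.append(build(gap, lvl - 1, nodes))  # subtree left of this key
            parts.append(key)
            gap = []
        else:
            gap.append((key, key_lvl))  # belongs to a lower level
    parts.append(build(gap, lvl - 1, nodes))  # rightmost subtree
    node_hash = hashlib.sha256(repr(parts).encode()).hexdigest()
    nodes.add(node_hash)
    return node_hash

def tree_nodes(keys):
    items = sorted((k, level(k)) for k in keys)
    top = max(lvl for _, lvl in items)
    nodes = set()
    build(items, top, nodes)
    return nodes, top

keys = [f"key-{i:05d}" for i in range(5000)]
before, _ = tree_nodes(keys)
after, top = tree_nodes(keys + ["key-02500-x"])
rewritten = after - before  # nodes that did not exist before the insert
print(len(after), "nodes,", len(rewritten), "new after one insert, height", top + 1)
```

In this model the new nodes are the ones on the path above the inserted key, plus at most two per level below it where an existing node gets split around the key. Everything else keeps its hash and is shared between the two versions.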

matheus23.com•71 days ago

But with prolly trees, you might need to rechunk a *bunch* of nodes on the same level, so you end up amplifying the write much more.
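One way to see where the extra rewriting can come from, again under a made-up chunking rule and not any specific prolly tree implementation: once node boundaries depend on how many items the node already holds (the `min_size` and `max_size` bounds below), inserting one key can shift boundaries in the nodes after it. `key_hash`, `chunk`, and the parameter values are assumptions, and the sketch only looks at a single level of the tree.

```python
import hashlib

def key_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def chunk(keys, target, min_size, max_size):
    """Split sorted keys into nodes (tuples) using a size-bounded hash boundary rule."""
    chunks, current = [], []
    for key in keys:
        current.append(key)
        hit = key_hash(key) % target == 0
        if (hit and len(current) >= min_size) or len(current) >= max_size:
            chunks.append(tuple(current))
            current = []
    if current:
        chunks.append(tuple(current))
    return chunks

keys = [f"key-{i:06d}" for i in range(20_000)]
params = dict(target=128, min_size=32, max_size=256)
before = chunk(keys, **params)
after = chunk(sorted(keys + ["key-001000-x"]), **params)

# Nodes are compared by content, as they would be in a content-addressed store.
rewritten = set(after) - set(before)
print(len(rewritten), "of", len(after), "nodes on this level rewritten by one insert")
```

With a boundary rule that looks only at each key's own hash, exactly one node would change here (or be split in two). The size bounds are what let the change spill into neighbouring nodes until the boundaries line up again.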

matheus23.com•71 days ago

Additionally, because you end up modifying more nodes for each write on average, it's less likely that two similar trees share internal nodes, compared to MSTs (structural sharing is worse).

matheus23.com•71 days ago

So I'd say that prolly trees will have slightly slower writes, but faster reads and worse structural sharing.

I don't know if those reasons were the deciding factor for Bluesky, but what I'm trying to say is that prolly trees aren't "the better MSTs" that I've seen them portrayed as before :)

norman.life•70 days ago

Yeah that’s the vibe I got too from the little I could find about it.

norman.life•70 days ago

Have you heard of https://g-trees.github.io/g_trees/?

matheus23.com•70 days ago

I have, but I didn't take a close look yet.

norman.life•70 days ago

What do you mean by degenerate?

matheus23.com•70 days ago

E.g. having internal nodes with only a single link.
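A small sketch of how such single-link nodes can show up in an MST-like layout. This is a simplified model with hypothetical helpers (`level`, `count_nodes`), not the atproto code: when every key under a node belongs to a lower level, the node holds no keys of its own and just points at one child.

```python
import hashlib

def level(key: str) -> int:
    # Leading zero bits of SHA-256(key), counted in pairs.
    digest = hashlib.sha256(key.encode()).digest()
    bits = bin(int.from_bytes(digest, "big"))[2:].zfill(256)
    return (len(bits) - len(bits.lstrip("0"))) // 2

def count_nodes(items, lvl, stats):
    """Walk the implied tree over sorted (key, level) pairs and tally node shapes."""
    if not items:
        return
    stats["nodes"] += 1
    if all(key_lvl < lvl for _, key_lvl in items):
        stats["single_link"] += 1  # no keys at this level, only one child link
    gap = []
    for key, key_lvl in items:
        if key_lvl == lvl:
            count_nodes(gap, lvl - 1, stats)  # subtree left of this key
            gap = []
        else:
            gap.append((key, key_lvl))
    count_nodes(gap, lvl - 1, stats)  # rightmost subtree

items = sorted((k, level(k)) for k in (f"key-{i:05d}" for i in range(5000)))
top = max(lvl for _, lvl in items)
stats = {"nodes": 0, "single_link": 0}
count_nodes(items, top, stats)
print(stats)
```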
