Isn't this the theory behind the CDC File Transfer stuff Google did for Stadia? I remember seeing their demo util for opaque diff updates and going "wow! can't wait for that to get baked in" and then ???
Nah. MSTs would likely be more computationally expensive (though they possibly define an optimal way to send the minimal set of changes) and less effective at handling small changes, which is where content-defined chunking excels.
If you use the length of the matching hash prefix as the level of the cut point, instead of using a fixed length, you get a content-defined tree out of it automatically.
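For concreteness, here's a minimal sketch of that idea (my own illustration, not Dolt's code): hash a small sliding window at each position, and read the number of leading zero bits beyond the base difficulty as the tree level of the cut point. WINDOW and TARGET_BITS are assumed parameters.

    import hashlib

    WINDOW = 32       # bytes of context hashed at each position (assumed)
    TARGET_BITS = 12  # base difficulty: ~4 KiB expected leaf chunk size

    def leading_zero_bits(digest: bytes) -> int:
        n = int.from_bytes(digest, "big")
        return len(digest) * 8 - n.bit_length()

    def cut_points(data: bytes):
        # Yield (offset, level) pairs. Level 0 is a leaf boundary; each
        # extra matching bit beyond TARGET_BITS promotes the boundary one
        # tree level up, so higher levels are exponentially rarer.
        for i in range(WINDOW, len(data) + 1):
            h = hashlib.blake2b(data[i - WINDOW:i], digest_size=8).digest()
            zeros = leading_zero_bits(h)
            if zeros >= TARGET_BITS:
                yield i, zeros - TARGET_BITS

Because the hash only sees the window's bytes, the same content produces the same boundaries at the same levels wherever it appears, which is what makes the tree content defined. (A real implementation would use a rolling hash instead of rehashing every window.)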
A little bonus of matching a variable-length hash prefix is that you can make chunk sizes more likely to land near your target, by ignoring some cut points that would produce too-small chunks, without reducing the deduplication rate the way all the other schemes (which aren't position independent) do.
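One way to read that (my interpretation of the comment, not a known published scheme): when two candidate cut points land closer together than a minimum size, keep the one with the longer matching prefix and drop the other. The decision depends only on nearby content, so identical regions still chunk identically. MIN_SIZE is an assumed parameter, and candidates come from the cut_points() sketch above.

    MIN_SIZE = 1024  # assumed minimum chunk size in bytes

    def filtered_cut_points(candidates):
        # candidates: ordered (offset, level) pairs from cut_points().
        kept = []
        for off, lvl in candidates:
            # Drop earlier cuts that are too close and no stronger.
            while kept and off - kept[-1][0] < MIN_SIZE and kept[-1][1] <= lvl:
                kept.pop()
            # If a stronger cut survives within MIN_SIZE, skip this one.
            if not kept or off - kept[-1][0] >= MIN_SIZE:
                kept.append((off, lvl))
        return kept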
One stream is incapable of fully utilizing a network link, especially above 1 Gb/s, and bandwidth limits let you use a link heavily without completely saturating it. Modern data transfer tools like rclone, azcopy, fdt etc. can do 100 Gb/s+ easily with configurable multi-streaming.
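For example, rclone exposes both knobs (a sketch from memory of rclone's options; src: and dst: are placeholder remote names):

    rclone copy src: dst: --transfers 16 --multi-thread-streams 8 --bwlimit 200M

--transfers parallelizes across files, --multi-thread-streams splits a single large file across streams, and --bwlimit caps the total rate so the link isn't fully saturated.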
https://www.dolthub.com/blog/2022-06-27-prolly-chunker/
You can use this to put a tree over KV data, as MSTs do, but you can also do it over arbitrary byte sequences.
What would you need multiple streams for?
Personally, I limit my copies because otherwise they choke up all my network bandwidth. 🤷‍♂️
So for larger files, or repeated sections within files, you could benefit a little.
Guess how long it took me to understand the post 🤣