Data Synchronization Service

Q: Imagine you need to design the Dropbox client synchronization algorithm. How can you make it faster?


| Name | Summary |
| --- | --- |
| [Use case] Minimize the data scale | Allow end users to add a blacklist of folders to be skipped |
| [Use case] Simplify the workflow | If two clients are on the same intranet, use P2P sync without a central server |
| [Engineering] Only sync changed files | Check files' modified times; use a Merkle tree to detect the differences |
| Reference | Quora: Why is Dropbox faster than other services? |

Link: Dropbox Streaming File Synchronization
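The Merkle tree idea from the table can be illustrated with a minimal sketch. It assumes a folder is modeled as a nested dict whose `bytes` values are file contents; the names `Node`, `build`, and `diff` are hypothetical helpers for illustration, not part of any Dropbox API. Each folder's digest is derived from its children's digests, so a subtree with a matching digest can be skipped without looking inside it.

```python
import hashlib
from dataclasses import dataclass, field


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Node:
    name: str
    digest: str
    children: dict = field(default_factory=dict)  # name -> Node; empty for files


def build(name, tree):
    """Build a Merkle node: bytes values are files, dict values are folders."""
    if isinstance(tree, bytes):
        return Node(name, _h(tree))
    children = {k: build(k, v) for k, v in sorted(tree.items())}
    # A folder's digest covers the names and digests of all its children.
    combined = "".join(f"{k}:{c.digest};" for k, c in children.items())
    return Node(name, _h(combined.encode()), children)


def diff(a, b, path=""):
    """Return paths that differ, skipping subtrees whose digests match."""
    if a is not None and b is not None and a.digest == b.digest:
        return []  # identical subtree: nothing below needs checking
    a_kids = a.children if a else {}
    b_kids = b.children if b else {}
    if not a_kids and not b_kids:
        return [path]  # a file was changed, added, or removed
    changed = []
    for name in sorted(set(a_kids) | set(b_kids)):
        changed += diff(a_kids.get(name), b_kids.get(name), f"{path}/{name}")
    return changed


local = build("", {"docs": {"a.txt": b"hello", "b.txt": b"world"},
                   "img": {"x.png": b"\x89PNG"}})
remote = build("", {"docs": {"a.txt": b"hello", "b.txt": b"WORLD"},
                    "img": {"x.png": b"\x89PNG"}, "new.txt": b"n"})
print(diff(local, remote))
```

In this sketch the `img` folder is never descended into, because its digest is the same on both sides. A real client would likely combine this with the cheaper modified-time check before hashing file contents.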

Q: When a client syncs with the server, how does it know the changeset since the previous sync?

Q: For data sync, would you use a pull model or a push model? Why?

Q: Design an algorithm to support “diff a.txt b.txt”. What if the two files are binary and as large as 50 GB? What if the second file is not local?

A: #lcs (Longest common subsequence)
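As a rough illustration of the LCS answer, here is a minimal line-based diff built on the classic dynamic-programming table. The function name `lcs_diff` and the `"- "` / `"+ "` / `"  "` output prefixes are assumptions made for this sketch; it is not how the `diff` utility itself is implemented.

```python
def lcs_diff(a, b):
    """Diff two lists of lines using an O(len(a) * len(b)) LCS table."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the top-left, emitting kept, removed, and added lines.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append("  " + a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            out.append("- " + a[i])
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    out += ["- " + line for line in a[i:]]
    out += ["+ " + line for line in b[j:]]
    return out


for line in lcs_diff(["a", "b", "c"], ["a", "c", "d"]):
    print(line)
```

The table needs memory proportional to the product of the two lengths, so this sketch only suits small text files. It does not address the 50 GB binary case or the non-local case raised in the question.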

Q: Design an algorithm to support remote copy of a big file, e.g., “rsync -avhze ssh src/big.dat user@remote-host:/tmp”?
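One possible direction is the rolling-checksum approach that rsync is known for: the receiver summarizes the blocks it already has, and the sender transmits only the bytes that match no known block. The sketch below is a simplified, in-memory illustration under several assumptions. The names `weak`, `roll`, `signature`, `delta`, and `patch` are hypothetical, the block size is tiny for readability, only full blocks are indexed, and nothing here reflects rsync's actual wire format.

```python
import hashlib

MOD = 1 << 16


def weak(block: bytes):
    """Adler-style weak checksum, returned as two 16-bit halves (a, b)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def signature(old: bytes, size: int):
    """Receiver side: weak checksum -> [(strong hash, block index), ...]."""
    table = {}
    for idx in range(len(old) // size):
        block = old[idx * size:(idx + 1) * size]
        table.setdefault(weak(block), []).append(
            (hashlib.sha256(block).digest(), idx))
    return table


def delta(new: bytes, table, size: int):
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes)."""
    ops, literal, pos, a, b = [], bytearray(), 0, None, None
    while pos + size <= len(new):
        if a is None:  # start of the file, or just after a matched block
            a, b = weak(new[pos:pos + size])
        match = None
        candidates = table.get((a, b))
        if candidates:  # confirm the weak hit with the strong hash
            strong = hashlib.sha256(new[pos:pos + size]).digest()
            match = next((i for s, i in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += size
            a = None
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                a, b = roll(a, b, new[pos], new[pos + size], size)
            pos += 1
    literal += new[pos:]  # tail shorter than one block is sent as-is
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops, size: int) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * size:(arg + 1) * size] if kind == "copy" else arg
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the lazy dog!!"
ops = delta(new, signature(old, 4), 4)
print(patch(old, ops, 4) == new)
```

The weak checksum is cheap to update at every byte offset, so the more expensive strong hash is only computed when the weak one already matches. In a real transfer, only the signature and the `ops` list would cross the network.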


| Name | Summary |
| --- | --- |
| Web Pages | Link: Streaming File Synchronization by Dropbox |
| Web Pages | Link: Delta: A Data Synchronization and Enrichment Platform by Netflix |
