Data Synchronization Service
- Tag: #designfeature
Q: Imagine you need to design the Dropbox client synchronization algorithm. How can you make it faster?
|[Use case] Minimize the data volume||Allow end users to add a blacklist of folders to be skipped|
|[Use case] Simplify the workflow||If two clients are on the same intranet, use P2P sync without a central server|
|[Engineering] Only sync changed files||Check files’ modified time; use a Merkle tree to detect differences|
|Reference||Quora: Why is Dropbox faster than other services?|
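A minimal sketch of the Merkle tree idea from the table above, assuming each side can build a tree from a flat path-to-content map. The helper names (`build`, `diff`) and the node layout are hypothetical, chosen for illustration only. The point it shows: when two subtrees have the same hash, the whole subtree is skipped without looking at its files.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).hexdigest()


def _hash_node(entry):
    # Files and directories get different prefixes so an empty file
    # never hashes the same as an empty directory.
    if isinstance(entry, bytes):
        return {"hash": _h(b"F" + entry), "children": None}
    children = {name: _hash_node(child) for name, child in sorted(entry.items())}
    joined = "".join(f"{name}:{c['hash']};" for name, c in children.items())
    return {"hash": _h(b"D" + joined.encode()), "children": children}


def build(files):
    """Build a Merkle tree from a flat {"dir/file": bytes} mapping."""
    root = {}
    for path, content in files.items():
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = content
    return _hash_node(root)


def diff(a, b, path=""):
    """Return paths that differ, pruning subtrees whose hashes match."""
    if a is not None and b is not None and a["hash"] == b["hash"]:
        return []  # identical subtree: nothing below needs checking
    a_kids = a["children"] if a is not None else None
    b_kids = b["children"] if b is not None else None
    if a_kids is None or b_kids is None:
        # A file changed, or the entry exists on one side only.
        # (An added or removed directory is reported as a single path.)
        return [path]
    changed = []
    for name in sorted(a_kids.keys() | b_kids.keys()):
        child_path = f"{path}/{name}" if path else name
        changed += diff(a_kids.get(name), b_kids.get(name), child_path)
    return changed


local = build({"docs/a.txt": b"one", "docs/b.txt": b"two", "img/c.png": b"xyz"})
remote = build({"docs/a.txt": b"one", "docs/b.txt": b"TWO",
                "img/c.png": b"xyz", "new.txt": b"n"})
print(diff(local, remote))
```

In practice a client would probably hash file contents only when the modified time or size has changed, and reuse the cached hash otherwise.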
Q: When a client syncs with the server, how does it know the changeset since the previous sync?
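One possible answer, sketched under assumptions: the server keeps an append-only change journal with increasing sequence numbers, and each client remembers the last sequence number it has seen (a cursor). The `ChangeLog` class and its method names are hypothetical, not taken from any real service API.

```python
class ChangeLog:
    """Append-only journal of file changes, ordered by sequence number."""

    def __init__(self):
        self._entries = []  # entry at index i has sequence number i + 1

    def record(self, path, op):
        self._entries.append((path, op))
        return len(self._entries)  # sequence number of the new entry

    def changes_since(self, cursor):
        """Return (latest op per path after `cursor`, new cursor)."""
        latest = {}
        for path, op in self._entries[cursor:]:
            latest[path] = op  # later entries override earlier ones
        return latest, len(self._entries)


log = ChangeLog()
log.record("a.txt", "put")
log.record("b.txt", "put")
changes, cursor = log.changes_since(0)   # first sync: everything
log.record("a.txt", "delete")
later, cursor2 = log.changes_since(cursor)  # next sync: only new changes
```

The client stores only the cursor between syncs, so the server does not need to track per-client state beyond what the client sends back.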
Q: In terms of data sync, pull model or push model? And why?
Q: Design an algorithm to support “diff a.txt b.txt”. What if the two files are binary and as big as 50 GB? What if the second file is not local?
Q: Design an algorithm to support remote copy of a big file, e.g., “rsync -avhze ssh src/big.dat user@remote-host:/tmp”.
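A simplified sketch of the rolling-checksum idea that rsync is known for: the receiver sends per-block checksums of its old copy, and the sender slides a window over the new file, emitting either "copy block i" or literal bytes. The function names, the tiny block size, and the use of SHA-256 as the strong hash are assumptions for illustration, not rsync's actual wire format.

```python
import hashlib

BLOCK = 4      # unrealistically small, for illustration only
M = 1 << 16


def weak(data):
    """Weak checksum (a, b) of a block; cheap to roll by one byte."""
    a = sum(data) % M
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def signature(old, block=BLOCK):
    """Receiver side: checksums of every full block of the old file."""
    sig = {}
    for idx in range(len(old) // block):
        chunk = old[idx * block:(idx + 1) * block]
        sig.setdefault(weak(chunk), []).append((hashlib.sha256(chunk).digest(), idx))
    return sig


def make_delta(new, sig, block=BLOCK):
    """Sender side: describe `new` as block copies plus literal bytes."""
    ops, lit, i, n = [], bytearray(), 0, len(new)
    ab = None
    while i + block <= n:
        if ab is None:
            ab = weak(new[i:i + block])
        match = None
        if ab in sig:  # weak hit: confirm with the strong hash
            strong = hashlib.sha256(new[i:i + block]).digest()
            match = next((idx for s, idx in sig[ab] if s == strong), None)
        if match is not None:
            if lit:
                ops.append(("data", bytes(lit)))
                lit.clear()
            ops.append(("copy", match))
            i += block
            ab = None
        else:
            lit.append(new[i])
            if i + block < n:
                ab = roll(ab[0], ab[1], new[i], new[i + block], block)
            i += 1
    lit += new[i:]
    if lit:
        ops.append(("data", bytes(lit)))
    return ops


def apply_delta(old, ops, block=BLOCK):
    """Receiver side: rebuild the new file from old blocks and literals."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * block:(val + 1) * block] if kind == "copy" else val
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the very lazy dog"
ops = make_delta(new, signature(old))
rebuilt = apply_delta(old, ops)
```

Only the `data` operations carry file bytes over the network, so the transfer cost tracks the size of the change rather than the size of the file.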
|Web Pages||Link: Streaming File Synchronization by Dropbox|
|Web Pages||Link: Delta: A Data Synchronization and Enrichment Platform by Netflix|