http://eprints.eemcs.utwente.nl/22286/01/imc140-drago.pdf
The points in this post are based on the paper above.
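Not everyone may know Dropbox, so here is a quick introduction. Dropbox is an online storage service that synchronizes local files with the network. It keeps files in sync automatically across multiple computers and operating systems, and can also be used as a high-capacity network drive.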
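Before going further, one question: why should we care about Dropbox? As cloud computing frameworks increasingly enter the view of developers and users, the demands placed on file and data synchronization keep growing, both in volume and in stringency. It is worth analyzing, and learning from, the data synchronization protocols that are popular in the industry.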
Because Dropbox's protocol is not public, the paper analyzes it by intercepting its SSL traffic. The most important findings are noted one by one below.
Distance has a significant impact on Dropbox performance
We highlight that Dropbox performance is mainly driven by the distance between clients and storage data-centers.
In addition, short data transfers combined with a per-chunk acknowledgment mechanism severely limit throughput
In addition, short data transfer sizes coupled with a per-chunk acknowledgment mechanism impair transfer throughput, which is as little as 530kbits/s on average.
How the TLS/SSL traffic was analyzed
a Linux PC running the Dropbox client was instructed to use a Squid proxy server under our control. On the latter, the module SSL-bump was used to terminate SSL connections and save decrypted traffic flows. The memory area where the Dropbox application stores trusted certificate authorities was modified at run-time to replace the original Dropbox Inc. certificate by the self-signed one signing the proxy server.
Every uploaded chunk is acknowledged by its own message
Each chunk store operation is acknowledged by one OK message.
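The per-chunk OK turns uploads into a stop-and-wait exchange at the application layer. The sketch below is a simplified model, not Dropbox's client: `FakeStorageServer` and `upload_stop_and_wait` are hypothetical names, the server runs in-process, and each chunk is charged exactly one RTT of idle time (server reaction time ignored).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeStorageServer:
    """Hypothetical stand-in for a storage server: keeps chunks, replies "OK"."""
    stored: List[bytes] = field(default_factory=list)

    def store(self, chunk: bytes) -> str:
        self.stored.append(chunk)
        return "OK"


def upload_stop_and_wait(server: FakeStorageServer, chunks: List[bytes], rtt: float) -> float:
    """Upload chunks one at a time, never sending the next before the previous OK.

    Returns the modelled time (seconds) the client sits idle waiting for
    acknowledgments: one RTT per chunk, server reaction time ignored.
    """
    idle = 0.0
    for chunk in chunks:
        reply = server.store(chunk)  # the store request travels to the server...
        idle += rtt                  # ...and the client waits a full RTT for the reply
        if reply != "OK":
            raise RuntimeError("chunk store was not acknowledged")
    return idle
```

For example, `upload_stop_and_wait(FakeStorageServer(), [b"a"] * 3, rtt=0.1)` reports about 0.3 s of idle time. The figure depends only on the chunk count and the RTT, not on chunk size, which is why many small chunks hurt.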
Dropbox relies on three kinds of control servers
(i) Notification, (ii) meta-data administration, and (iii) system-log servers.
Notification Protocol
The client keeps a long-lived TCP connection to notifyX.dropbox.com, and this notification connection is not encrypted. HTTP Comet, i.e. long polling, runs over this persistent TCP connection.
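Long polling means the client sends a request that the server holds open until something changes (or a timeout fires), and the client re-issues the request as soon as it returns. A minimal sketch, assuming an in-process `queue.Queue` stands in for the server side of the HTTP connection to notifyX.dropbox.com; `long_poll` and `notification_loop` are hypothetical names:

```python
import queue
import threading
from typing import Callable, List


def long_poll(events: "queue.Queue[str]", timeout: float) -> List[str]:
    """One long-poll request: the "server" holds the request until an event is
    available or the timeout expires, then returns every pending event.
    An empty list means "nothing changed, ask again"."""
    try:
        batch = [events.get(timeout=timeout)]
    except queue.Empty:
        return []
    while True:  # drain whatever else piled up while we were blocked
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            return batch


def notification_loop(events: "queue.Queue[str]", stop: threading.Event,
                      on_change: Callable[[str], None], timeout: float = 0.05) -> None:
    """Client side: re-issue a long poll as soon as the previous one returns."""
    while not stop.is_set():
        for event in long_poll(events, timeout):
            on_change(event)  # e.g. start a meta-data sync
```

The timeout keeps the request from hanging forever; in the real protocol it would also keep middleboxes from dropping an idle connection, but that detail is an assumption here.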
Meta-data Information Protocol
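A typical synchronization begins with a message sent to the meta-data server, followed by a batch of store or retrieve operations carried out through Amazon servers. Once the data chunks have been exchanged successfully, the client sends another message to the meta-data server to conclude the transaction.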
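The three phases can be sketched as follows. This is an illustration under stated assumptions, not Dropbox's actual message set: `MetaServer`, `open_transaction`, `close_transaction` and `sync_file` are hypothetical names, a plain dict plays the storage servers, and naming chunks by their SHA-256 digest is an assumption of the sketch.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MetaServer:
    """Hypothetical meta-data server that brackets each sync in a transaction."""
    pending: Dict[int, Tuple[str, List[str]]] = field(default_factory=dict)
    committed: Dict[str, List[str]] = field(default_factory=dict)
    next_id: int = 1

    def open_transaction(self, path: str, chunk_ids: List[str]) -> int:
        txn, self.next_id = self.next_id, self.next_id + 1
        self.pending[txn] = (path, chunk_ids)
        return txn

    def close_transaction(self, txn: int, storage: Dict[str, bytes]) -> None:
        path, chunk_ids = self.pending.pop(txn)
        if any(cid not in storage for cid in chunk_ids):
            raise RuntimeError("transaction closed before all chunks were stored")
        self.committed[path] = chunk_ids


def sync_file(meta: MetaServer, storage: Dict[str, bytes], path: str, chunks: List[bytes]) -> int:
    chunk_ids = [hashlib.sha256(c).hexdigest() for c in chunks]  # assumed chunk naming
    txn = meta.open_transaction(path, chunk_ids)  # 1. meta-data message starts the sync
    for cid, chunk in zip(chunk_ids, chunks):     # 2. batch of store operations
        storage[cid] = chunk                      #    (the Amazon-hosted storage side)
    meta.close_transaction(txn, storage)          # 3. final message concludes the transaction
    return txn
```

Keeping the commit separate from the data transfer means a file only becomes visible to other clients after all of its chunks have reached storage.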
The synchronization protocol tends to produce small transfers
(i) the synchronization protocol sending and receiving file deltas as soon as they are detected; (ii) the primary use of Dropbox for synchronization of small files constantly changed, instead of periodic (large) backups.
The analysis shows that TCP slow start and the sequential acknowledgments have the largest impact on performance
Moreover, flows achieve lower throughput as the number of chunks increases. TCP start-up times and application-layer sequential acknowledgments are two major factors limiting the throughput, affecting flows with a small amount of data and flows with a large number of chunks, respectively. In both cases, the high RTT between clients and data-centers amplifies the effects.
Flows carrying a small amount of data are limited by TCP slow start-up times.
Flows with more than 1 chunk have the sequential acknowledgment scheme (Fig. 1) as a bottleneck, because the mechanism forces clients to wait one RTT (plus the server reaction time) between two storage operations.
Flows with more than 50 chunks, for instance, always last for more than 30s, regardless of their sizes. Considering the RTT in Campus 2, up to one third of that (5-10s) is wasted while application-layer acknowledgments are transiting the network.
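The two limits can be turned into rough upper bounds on throughput. This is a sketch under simplifying assumptions, with hypothetical helper names: an initial congestion window of 10 segments of 1460 bytes that doubles every RTT with no loss, and one RTT of waiting per chunk. Working backwards from the quoted figures, 5-10 s of acknowledgment waiting over 50 chunks implies roughly 100-200 ms per acknowledgment round, ignoring server reaction time.

```python
from typing import Tuple


def slow_start_rounds(nbytes: int, mss: int = 1460, init_cwnd: int = 10) -> int:
    """RTT rounds needed to deliver nbytes in slow start, assuming cwnd starts at
    init_cwnd segments and doubles every round (no loss, no ssthresh cap)."""
    rounds, sent, cwnd = 0, 0, init_cwnd
    while sent < nbytes:
        sent += cwnd * mss
        cwnd *= 2
        rounds += 1
    return rounds


def throughput_bounds(total_bytes: int, n_chunks: int, rtt: float) -> Tuple[float, float]:
    """Two independent upper bounds on flow throughput, in bit/s:
    - slow start: the flow lasts at least slow_start_rounds(total_bytes) RTTs;
    - sequential acks: each chunk costs at least one RTT of waiting."""
    if total_bytes <= 0 or n_chunks <= 0:
        raise ValueError("need at least one byte and one chunk")
    bits = total_bytes * 8
    slow_start_bound = bits / (slow_start_rounds(total_bytes) * rtt)
    ack_bound = bits / (n_chunks * rtt)
    return slow_start_bound, ack_bound


def ack_wait_seconds(n_chunks: int, rtt: float, reaction: float = 0.0) -> float:
    """Time spent waiting for application-layer acknowledgments."""
    return n_chunks * (rtt + reaction)
```

`ack_wait_seconds(50, 0.1)` and `ack_wait_seconds(50, 0.2)` return 5 s and 10 s, matching the quoted range. For a 100 kB flow split into 50 chunks at a 100 ms RTT, `throughput_bounds(100_000, 50, 0.1)` gives an acknowledgment bound of 160 kbit/s, far tighter than the slow-start bound of about 2.7 Mbit/s.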
Finally, the authors propose ways to optimize Dropbox transfers, namely:
1. Impose a minimum chunk size, so that fewer tiny chunks have to be synchronized
2. Use delayed acknowledgments, pipelining chunks to eliminate the network idle time caused by sequential acknowledgments
3. Place storage closer to users to reduce transfer latency
Our measurements clearly indicate that the application-layer protocol in combination with large RTT penalizes the system performance. We identify three possible solutions to remove the identified bottlenecks:
1. Bundling smaller chunks, increasing the amount of data sent per storage operation. Dropbox announced in April 2012 that it implements a bundling mechanism, which is analyzed in the following;
2. Using a delayed acknowledgment scheme in storage operations, pipelining chunks to remove the effects of sequential acknowledgments;
3. Bringing storage servers closer to customers, thus improving the overall throughput.
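The effect of the three proposals can be compared with a crude duration model. It is an illustrative sketch, not the paper's evaluation: `transfer_time` is a hypothetical helper, it ignores slow start, loss and server reaction time, and the workload numbers in the usage note are made up.

```python
import math


def transfer_time(n_chunks: int, chunk_bytes: int, rtt: float, bandwidth_bps: float,
                  window: int = 1, bundle: int = 1) -> float:
    """Rough flow duration in seconds.

    Chunks are grouped `bundle` per storage operation, and up to `window`
    operations may await their OK at the same time (window=1 is the
    sequential scheme). Each round of outstanding operations costs one RTT;
    serialization time on the link is added once.
    """
    operations = math.ceil(n_chunks / bundle)
    ack_rounds = math.ceil(operations / window)
    serialization = n_chunks * chunk_bytes * 8 / bandwidth_bps
    return ack_rounds * rtt + serialization
```

With a hypothetical 50 chunks of 10 kB over a 10 Mbit/s path, the sequential scheme at a 150 ms RTT takes about 7.9 s; bundling 25 chunks per operation (`bundle=25`) cuts this to about 0.7 s, a pipelining window of 10 (`window=10`) to about 1.15 s, and storage at a 20 ms RTT to about 1.4 s. In this model bundling and pipelining both work by reducing acknowledgment rounds; bundling also saves per-operation overhead, which the sketch does not capture.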