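Many distributed systems today generate sequence numbers in different ways. UUIDs are one option, but they are relatively costly in both effort and space. When allocating sequence numbers, the numbers should in fact be handed out to clients, so that data ends up evenly distributed and balanced. The following are several other ways to generate sequence numbers: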
1. Coordinated generation through a config server (a separate server that manages metadata; systems name it differently, some call it a master server, others a root server).
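First, the field that needs generated sequence numbers is registered with the config server. Then, whenever a client wants to use that field for an insert, it requests numbers from the config server, one fixed-size block (step) at a time, for example 1-100. When the client has used up that block, it requests a new one.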
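The advantage of this method is centralized management: it is fairly simple and rarely fails. The drawback is that whenever a client exhausts its block and requests a new one, the config server must be up. Clients can keep issuing numbers from their current block while the config server is down, but the method does not suit systems that must keep running normally during regular operation with no config server available.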
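The block allocation in item 1 can be sketched as follows. This is a minimal in-process illustration, not any particular system's protocol: `ConfigServer`, `Client`, `register`, `allocate` and `next_id` are hypothetical names, and a real config server would be a separate network service that persists its counters. The point it shows is that a client only contacts the server when its cached block runs out.

```python
from __future__ import annotations

import threading


class ConfigServer:
    """Hypothetical config server handing out fixed-size id blocks per registered field."""

    def __init__(self, step: int = 100) -> None:
        self.step = step
        self._next_start: dict[str, int] = {}
        self._lock = threading.Lock()

    def register(self, field: str) -> None:
        # A field must be registered before clients may request blocks for it.
        with self._lock:
            self._next_start.setdefault(field, 1)

    def allocate(self, field: str) -> tuple[int, int]:
        # Return an inclusive range [start, end] and advance the stored counter.
        with self._lock:
            start = self._next_start[field]
            self._next_start[field] = start + self.step
            return start, start + self.step - 1


class Client:
    """Hypothetical client that caches one block and refills it from the config server."""

    def __init__(self, server: ConfigServer, field: str) -> None:
        self.server = server
        self.field = field
        self._next = 0
        self._end = -1  # empty block, so the first call triggers an allocation

    def next_id(self) -> int:
        if self._next > self._end:
            # Block exhausted: the only moment the config server must be reachable.
            self._next, self._end = self.server.allocate(self.field)
        value = self._next
        self._next += 1
        return value
```

With a step of 100, two clients sharing a field receive disjoint blocks: the first gets 1-100, the second 101-200, and the first returns to the server only after it has handed out all 100 of its ids.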
2. Management by the data servers themselves
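Here the sequence number is treated like an ordinary field in a table, and clients compete to update it. For example, with three data servers, 1-100 can be pre-assigned to S1, 101-200 to S2 and 201-300 to S3. When several clients compete for S1, the range 1-100 is granted to one of them and the others fail; the next request to S1 is then granted 301-400.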
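The advantage of this method is that the sequence number can be handled like an ordinary table field. The drawbacks are that it is hard to scale out and the single-point-of-failure problem still has to be solved.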
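Item 2's range striping and competition can be modeled as below. It is a sketch under stated assumptions: the counter field is updated with a compare-and-set, servers are numbered from 0 (S1 is 0), and `DataServer`, `block` and `try_claim` are hypothetical names. With 3 servers and a step of 100, the k-th block owned by server i starts at (k * 3 + i) * 100 + 1, which reproduces the example above: S1 owns 1-100, then 301-400.

```python
from __future__ import annotations

import threading
from typing import Optional


class DataServer:
    """Hypothetical data server owning every num_servers-th block of the id space."""

    def __init__(self, index: int, num_servers: int, step: int = 100) -> None:
        self.index = index              # 0-based: S1 -> 0, S2 -> 1, S3 -> 2
        self.num_servers = num_servers
        self.step = step
        self.grants = 0                 # the "ordinary table field": blocks granted so far
        self._lock = threading.Lock()

    def block(self, k: int) -> tuple[int, int]:
        # The k-th block owned by this server, as an inclusive range.
        start = (k * self.num_servers + self.index) * self.step + 1
        return start, start + self.step - 1

    def try_claim(self, expected_grants: int) -> Optional[tuple[int, int]]:
        # Compare-and-set on the counter field: only a client whose read is still
        # current wins the next block; the others get None and must retry.
        with self._lock:
            if self.grants != expected_grants:
                return None
            self.grants += 1
            return self.block(expected_grants)
```

If three clients read `grants == 0` from S1 and then race, the first `try_claim(0)` returns (1, 100) and the other two return None; a later claim against the updated counter returns (301, 400).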
3. Simple generation from a database auto-increment column
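A minimal illustration of item 3, using SQLite's `AUTOINCREMENT` column through Python's standard `sqlite3` module; the `orders` table and the `insert_order` helper are hypothetical.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL)"
)


def insert_order(item: str) -> int:
    # The database assigns the next id; lastrowid hands it back to the caller.
    cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    conn.commit()
    return cur.lastrowid
```

Every id comes from one table in one database, which keeps this option simple but puts that database on the path of every insert.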
4. Twitter's approach (Snowflake): http://engineering.twitter.com/2010/06/announcing-snowflake.html
Problem
We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).
Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL.
Our requirements for this system were pretty simple, yet demanding:
We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.
These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.[1]
Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.
Options
We considered a number of approaches: MySQL-based ticket servers (like flickr uses), but those didn’t give us the ordering guarantees we needed without building some sort of re-syncing routine. We also considered various UUIDs, but all the schemes we could find required 128 bits. After that we looked at Zookeeper sequential nodes, but were unable to get the performance characteristics we needed and we feared that the coordinated approach would lower our availability for no real payoff.
Solution
To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number.
Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file).
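The timestamp + worker + sequence composition can be sketched as below. The post does not give field widths, so this sketch assumes a commonly described layout: a zero sign bit, 41 bits of milliseconds since a custom epoch, 10 worker bits and 12 sequence bits. `EPOCH_MS` is a hypothetical epoch (2010-01-01T00:00:00Z), `IdGenerator` is a hypothetical name, and one lock-guarded sequence per generator simplifies the post's per-thread sequences.

```python
import threading
import time
from typing import Callable

# Assumed layout (not stated in the post): 0 | 41-bit ms timestamp | 10-bit worker | 12-bit sequence
EPOCH_MS = 1262304000000  # hypothetical epoch: 2010-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    def __init__(
        self,
        worker_id: int,
        clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
    ) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id  # e.g. chosen at startup via ZooKeeper or a config file
        self._clock = clock         # injectable so tests can control time
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk duplicate ids.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 ids already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, sorting by id approximates sorting by creation time, and `id >> 22` recovers the milliseconds since `EPOCH_MS` under this assumed layout. No coordination is needed at generation time; the only shared decision is the worker number at startup.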
We encourage you to peruse and play with the code: you’ll find it on github. Please remember, however, that it is currently alpha-quality software that we aren’t yet running in production and is very likely to change.
Feedback
If you find bugs, please report them on github. If you are having trouble understanding something, come ask in the #twinfra IRC channel on freenode. If you find anything that you think may be a security problem, please email security@twitter.com (and cc myself: ryan@twitter.com).
[1] In mathematical terms, although the tweets will no longer be sorted, they will be k-sorted. We’re aiming to keep our k below 1 second, meaning that tweets posted within a second of one another will be within a second of one another in the id space too.
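The k-sorted goal in footnote [1] can be checked empirically. One reading of it, assumed here, is: whenever the id order of two tweets contradicts their real posting order, their posting times should differ by less than k (1 second). The hypothetical helper below returns the largest such time gap for a list of `(true_time_ms, id)` pairs.

```python
from __future__ import annotations

from itertools import combinations


def max_inversion_gap(events: list[tuple[int, int]]) -> int:
    """Largest time gap between two events whose id order contradicts their
    time order; 0 if sorting by id matches sorting by time."""
    gap = 0
    for (t1, id1), (t2, id2) in combinations(events, 2):
        if (t1 - t2) * (id1 - id2) < 0:  # earlier in time but later in id space
            gap = max(gap, abs(t1 - t2))
    return gap
```

For example, `[(0, 10), (400, 5), (2000, 20)]` gives 400: the first two events are out of order in id space, but only by 400 ms, which is within a k of 1 second.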