国产一级a片免费看高清,亚洲熟女中文字幕在线视频,黄三级高清在线播放,免费黄色视频在线看

打開APP
userphoto
未登錄

開通VIP,暢享免費電子書等14項超值服

開通VIP
用Twitter的cursor方式進行Web數(shù)據(jù)分頁 – Tim[后端技術]

用Twitter的cursor方式進行Web數(shù)據(jù)分頁

本文討論Web應用中實現(xiàn)數(shù)據(jù)分頁功能,不同的技術實現(xiàn)方式的性能方區(qū)別。


上圖功能的技術實現(xiàn)方法拿MySQL來舉例就是

select * from msgs where thread_id = ? limit page * count, count

不過在看Twitter API的時候,我們卻發(fā)現(xiàn)不少接口使用cursor的方法,而不用page, count這樣直觀的形式,如followers ids 接口

URL:

http://twitter.com/followers/ids.format

Returns an array of numeric IDs for every user following thespecified user.

Parameters:
* cursor. Required. Breaks the results into pages. Provide a value of -1to begin paging. Provide values as returned to in the response body’snext_cursor and previous_cursor attributes to page back and forth in thelist.
o Example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1
o Example:http://twitter.com/followers/ids/barackobama.xml?cursor=-1300794057949944903

http://twitter.com/followers/ids.format

從上面描述可以看到,http://twitter.com/followers/ids.xml這個調用需要傳cursor參數(shù)來進行分頁,而不是傳統(tǒng)的url?page=n&count=n的形式。這樣做有什么優(yōu)點呢?是否讓每個cursor保持一個當時數(shù)據(jù)集的鏡像?防止由于結果集實時改變而產生查詢結果有重復內容?
在Google Groups這篇CursorExpiration討論中Twitter的架構師JohnKalucki提到

A cursor is an opaque deletion-tolerant index into aBtree keyed by source
userid and modification time. It brings you to a point in time in the
reverse chron sorted list. So, since you can’t change the past, otherthan
erasing it, it’s effectively stable. (Modifications bubble to the top.)But
you have to deal with additions at the list head and also blockshrinkage
due to deletions, so your blocks begin to overlap quite a bit as thedata
ages. (If you cache cursors and read much later, you’ll see the firstfew
rows of cursor[n+1]’s block as duplicates of the last rows ofcursor[n]’s
block. The intersection cardinality is equal to the number of deletionsin
cursor[n]’s block). Still, there may be value in caching these cursorsand
then heuristically rebalancing them when the overlap proportion crossessome
threshold.

在另外一篇newcursor-based pagination not multithread-friendly中John又提到

The page based approach does not scale with large sets.We can no
longer support this kind of API without throwing a painful number of
503s.

Working with row-counts forces the data store to recount rows in an O
(n^2) manner. Cursors avoid this issue by allowing practically
constant time access to the next block. The cost becomes O(n/
block_size) which, yes, is O(n), but a graceful one given n < 10^7and
a block_size of 5000. The cursor approach provides a more complete and
consistent result set.

Proportionally, very few users require multiple page fetches with a
page size of 5,000.

Also, scraping the social graph repeatedly at high speed is could
often be considered a low-value, borderline abusive use of the social
graph API.

通過這兩段文字我們已經很清楚了,對于大結果集的數(shù)據(jù),使用cursor方式的目的主要是為了極大地提高性能。還是拿MySQL為例說明,比如翻頁到100,000條時,不用cursor,對應的SQL為

select * from msgs limit 100000, 100

在一個百萬記錄的表上,第一次執(zhí)行這條SQL需要5秒以上。
假定我們使用表的主鍵的值作為cursor_id, 使用cursor分頁方式對應的SQL可以優(yōu)化為

select * from msgs where id > cursor_id limit 100;

同樣的表中,通常只需要100ms以下, 效率會提高幾十倍。MySQL limit性能差別也可參看我3年前寫的一篇不成熟的文章 MySQLLIMIT 的性能問題。

結論

建議Web應用中大數(shù)據(jù)集翻頁可以采用這種cursor方式,不過此方法缺點是翻頁時必須連續(xù),不能跳頁。

4 Comments »

  1. pi1ot says:

    實際應用中問題一般是出在where和limit之間的status = pass或者其他篩選條件上,數(shù)據(jù)不連續(xù)cursor也就不那么靈光了

  2. fff says:

    ls應該指order by吧,當不是以id為序時
    跳頁可以加一次運算,取出合適cursor就可以了吧,相對可能還是簡單了

  3. 超群.com says:

    可以看一下我的一篇博客http://www.fuchaoqun.com/2009/04/efficient- pagination-using-mysql/

    既可用到cursor,亦可隨意翻頁。

  4. gen says:

    請問如果不是以主鍵id為排序,應該怎么做呢?

本站僅提供存儲服務,所有內容均由用戶發(fā)布,如發(fā)現(xiàn)有害或侵權內容,請點擊舉報。
打開APP,閱讀全文并永久保存 查看更多類似文章
猜你喜歡
類似文章
Linux下MySQL客戶端mysql,select大量數(shù)據(jù)時可以分頁
Twitter API中文文檔
APP后端分頁設計 | ScienJus's Blog
實例講解如何繞過網站驗證碼 | 程序師
千萬級數(shù)據(jù)查詢:CK、ES、RediSearch 誰才是王炸?
Elasticsearch如何做到億級數(shù)據(jù)查詢毫秒級返回?
更多類似文章 >>
生活服務
分享 收藏 導長圖 關注 下載文章
綁定賬號成功
后續(xù)可登錄賬號暢享VIP特權!
如果VIP功能使用有故障,
可點擊這里聯(lián)系客服!

聯(lián)系客服