Selectors are jsoup's powerful tool for filtering elements out of an HTML document.
First, look at the following example:
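Here is a minimal sketch of the basic selectors in use, assuming jsoup is on the classpath. The class name SelectorBasics and the sample HTML are made up for illustration. Jsoup.parse, select, attr and text are standard jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class SelectorBasics {
    // Hypothetical sample page, used only for illustration.
    static final String HTML =
          "<div id='logo'><img src='logo.png'></div>"
        + "<div class='masthead'>"
        + "<a href='/path/one'>One</a>"
        + "<a href='http://example.com/two'>Two</a>"
        + "<a name='top'>No href</a>"
        + "</div>";

    // Collects the href of every <a> that actually carries an href attribute.
    static List<String> hrefs(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            out.add(a.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Tag, ID and class selectors.
        System.out.println(doc.select("a").size());              // 3
        System.out.println(doc.select("#logo img").attr("src")); // logo.png
        System.out.println(doc.select(".masthead").size());      // 1

        // Attribute selectors: presence, then value prefix.
        System.out.println(hrefs(doc));      // [/path/one, http://example.com/two]
        Elements absolute = doc.select("a[href^=http]");
        System.out.println(absolute.text()); // Two
    }
}
```

Elements.attr returns the value from the first matched element that has the attribute, which is why the #logo img line prints a single value.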
Below is the list of selectors that jsoup supports:
tagname: find elements by tag, e.g. a
ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
#id: find elements by ID, e.g. #logo
.class: find elements by class name, e.g. .masthead
[attribute]: elements with the attribute, e.g. [href]
[^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: elements with the attribute value, e.g. [width=500]
[attr^=value], [attr$=value], [attr*=value]: elements whose attribute value starts with, ends with, or contains the value, e.g. [href*=/path/]
[attr~=regex]: elements that have the attribute and whose value matches the supplied regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
*: all elements, e.g. *
el#id: elements with the ID, e.g. div#logo
el.class: elements with the class, e.g. div.masthead
el[attr]: elements with the attribute, e.g. a[href]
Any combination of the above, e.g. a[href].highlight
ancestor child: elements that descend from ancestor at any depth, e.g. .body p finds p elements anywhere under a block with class "body"
parent > child: elements that descend directly from parent, e.g. div.content > p finds p elements that are direct children of div.content, and body > * finds the direct children of the body tag
siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
el, el, el: group multiple selectors and find the unique elements that match any of them, e.g. div.masthead, div.logo
el:lt(n): find elements whose sibling index (its zero-based position among its parent's element children) is less than n, e.g. td:lt(3) finds the first three cells of each row
el:gt(n): find elements whose sibling index is greater than n, e.g. div p:gt(2)
el:eq(n): find elements whose sibling index is equal to n, e.g. form input:eq(1)
el:has(selector): find elements that contain elements matching the selector, e.g. div:has(p)
el:contains(text): find elements that contain the given text; the search is case-insensitive, e.g. p:contains(jsoup)
el:matches(regex): find elements whose text matches the specified regular expression, e.g. div:matches((?i)login)
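To see the combinators and pseudo-selectors from the list side by side, here is a sketch that runs several of them against one small document, again assuming jsoup is on the classpath. SelectorTour and its markup are hypothetical. The sibling index used by :lt, :gt and :eq counts all element siblings, not only those with the same tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class SelectorTour {
    // Hypothetical markup; only its structure matters for the queries below.
    static final String HTML =
          "<div class='content'>"
        + "<h1>Title</h1>"
        + "<p>Intro about jsoup</p>"
        + "<div><p>Nested paragraph</p></div>"
        + "<p>Please LOGIN first</p>"
        + "</div>"
        + "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
        + "<img src='photo.JPG'><img src='icon.gif'>";

    // Text of every element matching the query, in document order.
    static List<String> texts(Document doc, String css) {
        List<String> out = new ArrayList<>();
        for (Element e : doc.select(css)) {
            out.add(e.text());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        String[] queries = {
            "div.content p",        // descendant: all three paragraphs
            "div.content > p",      // direct children only: skips the nested one
            "h1 + p",               // the p immediately after h1
            "h1 ~ p",               // every p sibling that follows h1
            "h1, td:eq(0)",         // group: union of both selectors
            "td:lt(2)",             // sibling index 0 and 1
            "td:gt(2)",             // sibling index 3
            "p:contains(JSOUP)",    // case-insensitive text search
            "p:matches((?i)login)", // regex over the element's text
            "div:has(p)",           // divs with a p somewhere inside
        };
        for (String q : queries) {
            System.out.println(q + " -> " + texts(doc, q));
        }
        // [attr~=regex]: note the doubled backslash in the Java string literal.
        System.out.println(doc.select("img[src~=(?i)\\.(png|jpe?g)]").attr("src"));
    }
}
```

Because :contains lowercases both sides, p:contains(JSOUP) still matches "jsoup"; :matches is case-sensitive unless the pattern carries a flag such as (?i).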
Quite powerful, and far stronger than htmlparser, which I had been using all along.