本程序使用了Python自帶的HTMLParser,從Yahoo Finance指定頁面抓取幾個(gè)字段,代碼30行左右,簡(jiǎn)單實(shí)用,居家旅行必備 :)
代碼是從官方文檔的HTMLParser的示例程序改成的
https://docs.python.org/2/library/htmlparser.html
完整的代碼和介紹在:
https://bitbucket.org/lsz/html-parser
代碼如下:
import urllibimport sysimport stringfrom HTMLParser import HTMLParserticker_list = ["ibb", "socl", "pnqi", "qqq", "vbk", "eirl", "ewi", "pbd", "ita", "dfe"]ticker = ticker_list[0]class MyHTMLParser(HTMLParser): def handle_data(self, data): starttag_text = self.get_starttag_text() ticker_str = "(%s)" % ticker if -1!=string.find(data, ticker_str.upper()) and -1!=string.find(starttag_text, "<h2>"): sys.stdout.write(data) if -1!=string.find(str(starttag_text), "yfs_g53_%s" % ticker.lower()) and -1==string.find(data, "-"): sys.stdout.write("\t") sys.stdout.write(data) if -1!=string.find(str(starttag_text), "yfs_h53_%s" % ticker.lower()): print "\t", datafor t in ticker_list: ticker = t parser = MyHTMLParser() f = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % ticker) html_string = f.read() parser.feed(html_string)
示例輸出:
iShares Nasdaq Biotechnology (IBB) 228.14 234.90Global X Social Media Index ETF (SOCL) 17.38 17.92PowerShares NASDAQ Internet (PNQI) 61.73 63.17PowerShares QQQ (QQQ) 87.31 88.15Vanguard Small Cap Growth ETF (VBK) 118.53 120.54iShares MSCI Ireland Capped (EIRL) 38.37 38.84iShares MSCI Italy Capped (EWI) 17.95 18.09PowerShares Global Clean Energy (PBD) 12.95 13.12 Defense (ITA) 107.93 109.36WisdomTree Europe SmallCap Dividend (DFE) 62.08 62.64
聯(lián)系客服