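I: What is RSS
RSS (Really Simple Syndication): a web content syndication format. RSS is an XML format and must conform to the XML 1.0 specification.
What RSS is used for: subscribing to blogs and subscribing to news.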
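II: RSS version history:
RSS has many versions: 0.90, 0.91, 0.92, 0.93, 0.94, 1.0 and 2.0 (see http://blogs.law.harvard.edu/tech/rssVersionHistory). The counterpart to RSS is Atom.
In China RSS 2.0 is mainly used; elsewhere Atom 0.3 is mainly used.
RSS split into two camps, which caused confusion. The RSS 2.0 specification is defined and frozen by Harvard University.
Address: http://blogs.law.harvard.edu/tech/rss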
III: RSS file format
1: Example:
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>The channel's name goes here</title>
<link>http://www.urlofthechannel.com/</link>
<description>This channel is an example channel for an article.
</description>
<language>en-us</language>
<image>
<title>The image title goes here</title>
<url>http://www.urlofthechannel.com/images/logo.gif</url>
<link>http://www.urlofthechannel.com/</link>
</image>
<item>
<title>The Future of content</title>
<link>http://www.itworld.com/nl/ecom_in_act/11122003/</link>
<description> The issue of people distributing and reusing
digital media is a problem for many businesses. It may also be
a hidden opportunity. Just as open source licensing has opened
up new possibilities in the world of technology, it promises to do
the same in the area of creative content.</description>
</item>
<item>
<title>Online Music Services - Better than free?</title>
<link>http://www.itworld.com/nl/ecom_in_act/08202003/</link>
<description>More people than ever are downloading music from
the Internet. Many use person-to-person file sharing programs like
Kazaa to share and download music in MP3 format, paying nothing.
This has made it difficult for companies to setup online music
businesses. How can companies compete against free?</description>
</item>
</channel>
</rss>
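2: An RSS file consists of a <channel> element (wrapped in the root <rss> element) and its child elements. Besides the channel's content itself, which takes the form of items, <channel>
also contains elements that represent metadata about the channel, such as <title>, <link> and <description>.
Items are usually the main part of a channel and hold the content that changes frequently.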
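To make the channel/item structure concrete, here is a minimal sketch that reads such a file with the JDK's built-in DOM parser and lists the channel title followed by each item title. The class and method names (RssOutline, childText, titles) are illustrative assumptions, not part of any RSS library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; uses nothing beyond the JDK.
final class RssOutline {

    /** Text of the first direct child element called name, or null if there is none. */
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** Channel title first, then the title of each item in document order. */
    static List<String> titles(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            Element channel = (Element) doc.getElementsByTagName("channel").item(0);
            List<String> result = new ArrayList<>();
            // Only direct children count, so the <title> inside <image> is ignored.
            result.add(childText(channel, "title"));
            NodeList items = channel.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                result.add(childText((Element) items.item(i), "title"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable RSS document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><rss version=\"2.0\"><channel>"
                + "<title>Example channel</title><link>http://example.com/</link>"
                + "<description>Demo</description>"
                + "<image><title>Logo</title><url>http://example.com/l.gif</url>"
                + "<link>http://example.com/</link></image>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        System.out.println(titles(xml));
    }
}
```

Matching only direct children is a deliberate choice here: <title> appears at three levels (channel, image, item), and a document-wide tag search would mix them up.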
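3: A channel is represented by <channel>
A channel has three required elements that provide information about the channel itself:
<title>: The name of the channel or feed.
<link>: The URL of the Web site or site area associated with the channel.
<description>: A brief description of what the channel is about.
Many channel child elements are optional. The commonly used <image> element contains three required child elements:
<url>: The URL of a GIF, JPEG or PNG image that represents the channel.
<title>: A description of the image. When the channel is rendered as HTML, it is used as the ALT attribute of the HTML <img> tag.
<link>: The URL of the site. If the channel is rendered as HTML, the image acts as a link to this site.
<image> also has three optional child elements:
<width>: A number giving the image width in pixels. The maximum value is 144 and the default is 88.
<height>: A number giving the image height in pixels. The maximum value is 400 and the default is 31.
<description>: Text that can be used as the title attribute of the link formed around the image when it is rendered.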
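Many other optional channel elements are also available. Most are self-explanatory:
<language>: en-us
<copyright>: Copyright 2003, James Lewin
<managingEditor>: dan@spam_me.com (Dan Deletekey)
<webMaster>: dan@spam_me.com (Dan Deletekey)
<pubDate>: Sat, 15 Nov 2003 0:00:01 GMT
<lastBuildDate>: Sat, 15 Nov 2003 0:00:01 GMT
<category>: ebusiness
<generator>: Your CMS 2.0
<docs>: http://blogs.law.harvard.edu/tech/rss
<cloud>: Allows processes to register with a "cloud" to be notified when the channel is updated, implementing a lightweight publish-subscribe protocol for RSS feeds.
<ttl>: Time to live, a number giving how many minutes the feed may be cached before it is refreshed.
<rating>: The PICS rating for the channel.
<textInput>: Defines a text input box that can be displayed with the channel.
<skipHours>: Tells aggregators which hours they can skip.
<skipDays>: Tells aggregators which days they can skip.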
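The <pubDate> and <lastBuildDate> values follow the RFC 822 date style. As a minimal sketch using only the JDK, a java.util.Date could be rendered in that style as shown here. The class and method names (RssDates, toPubDate) are illustrative assumptions, and the sketch always emits GMT rather than a local zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration only.
final class RssDates {

    /** Formats a date in the RFC 822 style used by pubDate, always in GMT. */
    static String toPubDate(Date date) {
        // Locale.US keeps day and month names in English regardless of the host locale.
        SimpleDateFormat fmt =
                new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // 1068854401000 ms after the epoch is 15 Nov 2003, 00:00:01 GMT.
        System.out.println(toPubDate(new Date(1068854401000L)));
    }
}
```

A new SimpleDateFormat is created per call because the class is not thread-safe; in a servlet that matters more than the small allocation cost.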
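4: Each entry in the feed is represented by <item>. The format of <item> is as follows:
Each item usually contains three elements:
<title>: The name of the item; in standard applications it is converted into a heading in HTML.
<link>: The URL of the item. The title usually acts as a link pointing to the URL contained in the <link> element.
<description>: Usually a summary of, or a supplement to, the page that <link> points to.
All elements are optional, but an item must contain at least a <title> or a <description>.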
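The "at least a title or a description" rule can be expressed as a small check. This is only a sketch: the names ItemCheck and isValidItem are hypothetical, and treating a blank string the same as a missing element is an assumption of this example rather than something stated above.

```java
// Hypothetical helper for illustration only.
final class ItemCheck {

    /** True when the item carries a non-blank title or a non-blank description. */
    static boolean isValidItem(String title, String description) {
        return notBlank(title) || notBlank(description);
    }

    // Assumption: whitespace-only text counts as absent.
    private static boolean notBlank(String s) {
        return s != null && s.trim().length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidItem("The Future of content", null));
        System.out.println(isValidItem(null, "   "));
    }
}
```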
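Items also have some other optional elements:
<author>: The e-mail address of the author.
<category>: Supports organizing items into categories.
<comments>: The URL of a comments page for the item.
<enclosure>: Supports a media object associated with the item.
<guid>: A permanent link uniquely tied to the item.
<pubDate>: When the item was published.
<source>: The RSS channel the item came from.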
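IV: Mainstream Java RSS libraries and their evaluation:
The main ones are the following:
1: Rome:
http://wiki.java.net/bin/view/Javawsxml/Rome
Rome is an open-source project on java.net; the version at the time of this comparison was 0.5. As for the name, its introduction says it alludes to "all roads lead to Rome", which has something of an RSS flavor. Rome may have been split off from one of Sun's own subprojects: the package and class naming feels as disciplined as the J2SDK. It supports all RSS versions and Atom 0.3 (Atom is a content syndication format similar to RSS). Rome itself provides both the API and the implementation.
2: rssutils:
http://gceclub.sun.com.cn/staticcontent/html/2004-04-22/rss.html
rssutils is a toolkit. Sun's developer site has an article, "RSS Utilities: A Tutorial", devoted to displaying RSS content with a taglib, and the toolkit can be downloaded along with it. I could not find its origin by searching the Web, so I could not read its source code either. Judging from the decompiled code, it was also written by skilled engineers inside Sun: the design is clever and the code concise. It implements a handler that parses the XML content with SAX; internally the handler uses reflection and the JavaBean mechanism to construct the RSS element objects and assign their values.
3: rsslib4j:
http://sourceforge.net/projects/rsslib4j
rsslib4j is a project on SourceForge and likewise supports all RSS versions.
4: rsslibj: http://enigmastation.com/rsslibj/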
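5: Summary
Rome:
Pros - 1) Good extensibility and a promising future. 2) Powerful: besides parsing RSS, it can also aggregate and generate RSS.
Cons - 1) Compatibility needs improvement. 2) Tied to JDOM.
rssutils:
Pros - 1) Elegant code design, worth studying. 2) Ships with a taglib implementation that can be used directly in JSP.
Cons - 1) No source code. 2) Compatibility needs improvement. 3) Limited functionality: it can only parse RSS and cannot aggregate or generate it.
rsslib4j:
Pros - 1) Simple, effective and small. 2) Good compatibility.
Cons - 1) Has minor bugs. 2) Limited functionality: it can only parse RSS and cannot aggregate or generate it.
rsslibj:
Pros - 1) Simple, effective and small, only 25K. 2) Can parse and generate RSS (dynamic and static).
Cons - 1) Has minor bugs. 2) Has not been updated for a long time and is dated.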
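V: Choosing ROME as the RSS implementation tool
Download rome-0.8.jar from the official site, http://wiki.java.net/bin/view/Javawsxml/Rome.
Rome uses JDOM 1.0, which can be downloaded from http://www.jdom.org
Rome supports the following feed types:
rss_0.9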
rss_0.91
rss_0.92
rss_0.93
rss_0.94
rss_1.0
rss_2.0
atom_0.3
atom_1.0
The feed type to generate must be specified in the program, for example: rss_2.0
VI: Package structure
com.sun.syndication.feed Provides the parent classes of the RSS and Atom beans
com.sun.syndication.feed.atom Provides the beans that implement the core elements of Atom feeds
com.sun.syndication.feed.module Provides the beans that handle syndication modules
com.sun.syndication.feed.rss Provides the beans that implement the core elements of RSS feeds
com.sun.syndication.feed.synd The package we mainly use: SyndFeed and SyndEntryImpl
com.sun.syndication.io Provides the input and output classes for reading and parsing feeds
VII: Examples:
1: Read an RSS feed from a remote URL and print it to the console:
/**
* Key code:
* SyndFeedInput input = new SyndFeedInput();
* SyndFeed feed = input.build(new XmlReader(feedUrl));
*/
package com.sun.syndication.samples;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;
import java.net.URL;
/**
* It Reads and prints any RSS/Atom feed type.
*/
public class FeedReader {
public static void main(String[] args) {
boolean ok = false;
if (args.length==0) {
try {
URL feedUrl = new URL("http://seu.org.cn/bbs/rss.php");
// SyndFeedInput converts the XML content read from the remote URL into a SyndFeedImpl instance
SyndFeedInput input = new SyndFeedInput();
// Rome builds both RSS and Atom feeds as SyndFeed instances;
// SyndFeed is the interface implemented by SyndFeedImpl for RSS and Atom
SyndFeed feed = input.build(new XmlReader(feedUrl));
// Print to the console
System.out.println(feed);
ok = true;
}
catch (Exception ex) {
ex.printStackTrace();
System.out.println("ERROR: "+ex.getMessage());
}
}
if (!ok) {
System.out.println();
System.out.println("FeedReader reads and prints any RSS/Atom feed type.");
System.out.println("The first parameter must be the URL of the feed to read.");
System.out.println();
}
}
}
2: Aggregate several remote RSS feeds locally into a single RSS feed
package com.sun.syndication.samples;
import java.net.URL;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.List;
import java.util.ArrayList;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.feed.synd.SyndFeedImpl;
import com.sun.syndication.io.SyndFeedOutput;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;
/**
* It aggregates a list of RSS/Atom feeds (they can be of different types)
* into a single feed of the specified type.
* <p>
* @author Alejandro Abdelnur
*
*/
public class FeedAggregator {
public static void main(String[] args) {
boolean ok = false;
if (args.length>=2) {
try {
String outputType = args[0];
SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(outputType);
feed.setTitle("Aggregated Feed");
feed.setDescription("Anonymous Aggregated Feed");
feed.setAuthor("anonymous");
feed.setLink("http://www.anonymous.com");
List entries = new ArrayList();
feed.setEntries(entries);
for (int i=1;i<args.length;i++) {
URL inputUrl = new URL(args[i]);
SyndFeedInput input = new SyndFeedInput();
SyndFeed inFeed = input.build(new XmlReader(inputUrl));
entries.addAll(inFeed.getEntries());
}
SyndFeedOutput output = new SyndFeedOutput();
output.output(feed,new PrintWriter(System.out));
ok = true;
}
catch (Exception ex) {
System.out.println("ERROR: "+ex.getMessage());
}
}
if (!ok) {
System.out.println();
System.out.println("FeedAggregator aggregates different feeds into a single one.");
System.out.println("The first parameter must be the feed type for the aggregated feed.");
System.out.println(" [valid values are: rss_0.9, rss_0.91U, rss_0.91N, rss_0.92, rss_0.93, ]");
System.out.println(" [ rss_0.94, rss_1.0, rss_2.0 & atom_0.3 ]");
System.out.println("The second to last parameters are the URLs of feeds to aggregate.");
System.out.println();
}
}
}
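FeedAggregator simply concatenates the entries in the order the feeds are given. One possible refinement, sketched below with plain JDK types, is to drop entries whose link has already been seen and to sort the result newest first. The Entry class is a hypothetical stand-in for SyndEntry, and the sketch assumes every entry has a non-null link and publication date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Rome.
final class EntryMerger {

    /** Stand-in for a feed entry: just a link and a publication date. */
    static final class Entry {
        final String link;
        final Date published;

        Entry(String link, Date published) {
            this.link = link;
            this.published = published;
        }
    }

    /** Merges the lists, keeps the first entry seen for each link, newest first. */
    static List<Entry> merge(List<List<Entry>> feeds) {
        Map<String, Entry> byLink = new LinkedHashMap<>();
        for (List<Entry> feed : feeds) {
            for (Entry e : feed) {
                byLink.putIfAbsent(e.link, e); // a later duplicate is ignored
            }
        }
        List<Entry> merged = new ArrayList<>(byLink.values());
        merged.sort(Comparator.comparing((Entry e) -> e.published).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Entry> a = new ArrayList<>();
        a.add(new Entry("http://example.com/1", new Date(1000L)));
        List<Entry> b = new ArrayList<>();
        b.add(new Entry("http://example.com/2", new Date(2000L)));
        b.add(new Entry("http://example.com/1", new Date(3000L)));
        List<List<Entry>> feeds = new ArrayList<>();
        feeds.add(a);
        feeds.add(b);
        for (Entry e : merge(feeds)) {
            System.out.println(e.link);
        }
    }
}
```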
3: Save a dynamically generated RSS feed to disk as a static RSS file
package com.sun.syndication.samples;
import com.sun.syndication.feed.synd.*;
import com.sun.syndication.io.SyndFeedOutput;
import java.io.FileWriter;
import java.io.Writer;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
/**
* It creates a feed and writes it to a file.
* <p>
* @author Alejandro Abdelnur
*
*/
public class FeedWriter {
private static final String DATE_FORMAT = "yyyy-MM-dd";
public static void main(String[] args) {
boolean ok = false;
if (args.length==0) {
try {
String feedType = "rss_2.0"; // feed type to generate
String fileName = "F:\\ss.xml"; // path of the static RSS file
DateFormat dateParser = new SimpleDateFormat(DATE_FORMAT);
// feed is an instance of SyndFeedImpl
SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(feedType);
feed.setTitle("Sample Feed (created with Rome)");
feed.setLink("http://rome.dev.java.net");
feed.setDescription("This feed has been created using Rome (Java syndication utilities)");
// entries is the collection of items
List entries = new ArrayList();
// Each entry is one item
SyndEntry entry;
SyndContent description;
// First item
entry = new SyndEntryImpl();
entry.setTitle("Rome v1.0");
entry.setLink("http://wiki.java.net/bin/view/Javawsxml/Rome01");
entry.setPublishedDate(dateParser.parse("2004-06-08"));
description = new SyndContentImpl();
description.setType("text/plain");
description.setValue("Initial release of Rome");
entry.setDescription(description);
entries.add(entry);
// Second item
entry = new SyndEntryImpl();
entry.setTitle("Rome v2.0");
entry.setLink("http://wiki.java.net/bin/view/Javawsxml/Rome02");
entry.setPublishedDate(dateParser.parse("2004-06-16"));
description = new SyndContentImpl();
description.setType("text/xml");
description.setValue("Bug fixes, <xml>XML</xml> minor API changes and some new features");
entry.setDescription(description);
entries.add(entry);
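// Attach the items to the feed; without this the file contains no items
feed.setEntries(entries);
Writer writer = new FileWriter(fileName);
SyndFeedOutput output = new SyndFeedOutput();
// Write to disk, producing a static RSS file
output.output(feed,writer);
writer.close();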
System.out.println("The feed has been written to the file ["+fileName+"]");
System.out.println(feed);
ok = true;
}
catch (Exception ex) {
ex.printStackTrace();
System.out.println("ERROR: "+ex.getMessage());
}
}
if (!ok) {
System.out.println();
System.out.println("FeedWriter creates a RSS/Atom feed and writes it to a file.");
System.out.println("The first parameter must be the syndication format for the feed");
System.out.println(" (rss_0.90, rss_0.91, rss_0.92, rss_0.93, rss_0.94, rss_1.0 rss_2.0 or atom_0.3)");
System.out.println("The second parameter must be the file name for the feed");
System.out.println();
}
}
}
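For comparison, here is a rough sketch of the kind of work Rome takes care of when it serializes a feed: building the <rss>/<channel>/<item> tree and writing it out as XML. It uses only the JDK's DOM and Transformer APIs. The PlainRssWriter class, its method names and the String[]{title, link} item layout are assumptions made for this example, not Rome's implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only; not how Rome works internally.
final class PlainRssWriter {

    /** Appends a child element, with optional text content, and returns it. */
    private static Element child(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) {
            e.setTextContent(text); // the serializer escapes &, < and > for us
        }
        parent.appendChild(e);
        return e;
    }

    /** Builds an RSS 2.0 document; each item is a {title, link} pair. */
    static String write(String title, String link, String description, String[][] items) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element rss = doc.createElement("rss");
            rss.setAttribute("version", "2.0");
            doc.appendChild(rss);
            Element channel = child(doc, rss, "channel", null);
            child(doc, channel, "title", title);
            child(doc, channel, "link", link);
            child(doc, channel, "description", description);
            for (String[] item : items) {
                Element it = child(doc, channel, "item", null);
                child(doc, it, "title", item[0]);
                child(doc, it, "link", item[1]);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the feed", e);
        }
    }

    public static void main(String[] args) {
        String[][] items = { { "Rome v1.0", "http://example.com/rome01" } };
        System.out.println(write("Sample Feed", "http://example.com/", "Demo feed", items));
    }
}
```

Building a DOM tree instead of concatenating strings means special characters in titles and descriptions are escaped by the serializer rather than by hand.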
4: Generate RSS dynamically for a blog site
package com.vaga.rss.web.admin;
import java.io.IOException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.mvc.ParameterizableViewController;
import com.sun.syndication.feed.synd.SyndContent;
import com.sun.syndication.feed.synd.SyndContentImpl;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndEntryImpl;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.feed.synd.SyndFeedImpl;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedOutput;
import com.totsp.xml.syndication.content.ContentModule;
import com.vaga.blog.model.WeblogEntry;
import com.vaga.blog.model.Website;
import com.vaga.blog.service.WeblogEntryManager;
import com.vaga.blog.service.WebsiteManager;
public class SiteRssViewController extends ParameterizableViewController {
// Constants
/** Namespace URI for content:encoded elements */
private static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";
private static final String FEED_TYPE = "type";
private static final String MIME_TYPE = "application/xml; charset=UTF-8";
private static final String COULD_NOT_GENERATE_FEED_ERROR = "Could not generate feed";
private static final String _defaultFeedType="rss_2.0";
private static final String DATE_FORMAT = "yyyy-MM-dd";
//controller starts
private WeblogEntryManager weblogEntryManager; // injected by Spring
private WebsiteManager websiteManager; // injected by Spring
// Setter used for Spring dependency injection
public void setWeblogEntryManager(WeblogEntryManager weblogEntryManager) {
this.weblogEntryManager = weblogEntryManager;
}
// Setter used for Spring dependency injection
public void setWebsiteManager(WebsiteManager websiteManager) {
this.websiteManager = websiteManager;
}
protected ModelAndView handleRequestInternal(HttpServletRequest request,HttpServletResponse response) throws Exception {
try {
SyndFeed feed = getFeed(request);
String feedType = request.getParameter(FEED_TYPE); // may be null
feedType = (feedType!=null) ? feedType : _defaultFeedType;
feed.setFeedType(feedType); // defaults to rss_2.0
response.setContentType(MIME_TYPE);
SyndFeedOutput output = new SyndFeedOutput();
output.output(feed,response.getWriter()); // send the RSS (XML) to the requesting user
}
catch (FeedException ex) {
String msg = COULD_NOT_GENERATE_FEED_ERROR;
logger.error(msg,ex); // logger is inherited from Spring's ApplicationObjectSupport
response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,msg);
}
return null;
}
/**
* Request types:
* siteRss.htm?websiteId=21 | the latest 20 posts of the personal site with ID 21
* siteRss.htm?websiteId=21&entryType=hot | the 20 hottest posts of the personal site with ID 21
*
* @param request
*/
protected SyndFeed getFeed(HttpServletRequest request) throws IOException,FeedException {
DateFormat dateParser = new SimpleDateFormat(DATE_FORMAT);
// feed corresponds to the channel
SyndFeed feed = new SyndFeedImpl();
// collection of items
List entries = new ArrayList();
// each entry represents one item
SyndEntry entry;
SyndContent description;
setFeed(request,feed);
Iterator iterator = setIterator(request);
// convert the 20 post records into 20 items
while(iterator.hasNext()){
entry = new SyndEntryImpl();
WeblogEntry weblogEntry = (WeblogEntry)iterator.next();
entry.setTitle(weblogEntry.getTitle());
entry.setLink(feed.getLink()+"?weblogEntryId="+weblogEntry.getId());
try {
entry.setPublishedDate(dateParser.parse(weblogEntry.getPubTime().toString()));
}
catch (ParseException ex) {
ex.printStackTrace();
}
// description of this item
description = new SyndContentImpl();
description.setType("text/plain");
String text=null;
if(weblogEntry.getText().length()>500){
text = weblogEntry.getText().substring(0, 500);
}else{
text = weblogEntry.getText();
}
description.setValue(text);
entry.setDescription(description);
addFooter(entry);
entries.add(entry);
}
// store all the items in the channel
feed.setEntries(entries);
return feed;
}
private SyndFeed setFeed(HttpServletRequest request,SyndFeed feed){
// the website in the blog
Website website = websiteManager.getWebsite(request.getParameter("websiteId"));
// Set the channel properties for the current website
feed.setTitle(website.getName());
feed.setAuthor(website.getCreator());
feed.setCopyright(website.getEmailAddress());
feed.setLink("http://wxz.vaga.com.cn:8080/blog/weblog/"+website.getHandle());
feed.setDescription(website.getDescription());
return feed;
}
// fetch 20 posts of this website from the database
private Iterator setIterator(HttpServletRequest request){
if(request.getParameter("entryType")==null){
return weblogEntryManager.getRecentWeblogEntriesForRss(request.getParameter("websiteId"), null, "PUBLISHED", 21).iterator();
}else{
return weblogEntryManager.getHotWeblogEntriesForRss(request.getParameter("websiteId"), null, 21).iterator();
}
}
/**
* Add a footer to the summary of each entry.
* @param entry
*/
public static void addFooter(SyndEntry entry)
{
// Prep variables used in loops
String title = entry.getTitle();
String link = entry.getLink();
// Use the add-on ContentModule to handle
// <content:encoded/> elements within the feed
ContentModule module =((ContentModule) entry.getModule(CONTENT_NS));
// If content:encoded is found, use that.
if(module!=null)
{
// Container for footer-appended HTML strings
List newStringList = new ArrayList();
// Iterate through encoded HTML, creating footers
Iterator oldStringIter =module.getEncodeds().iterator();
while (oldStringIter.hasNext())
{
String original = (String) oldStringIter.next();
newStringList.add(createFooter(original,link, title));
}
// Set new encoded HTML strings on entry
module.setEncodeds(newStringList);
}
else
{
// Fall back to adding footer in <description/>
// This results in escaped HTML. Ugly, but common.
//Target the description node
SyndContent content = entry.getDescription();
// Create and set a footer-appended description
String original = content.getValue();
content.setValue(createFooter(original,link, title));
}
}
/**
* Create a feed item footer of immediate actions
* by using information from the feed item itself
* @param original The original text of the feed item
* @param link The link for the feed item
* @param title The title of the feed item
* @return
*/
private static String createFooter(String original, String link,String title)
{
// Use StringBuffer to create a sb
StringBuffer sb;
if(original==null){
sb=new StringBuffer("<br />");
}else{
sb= new StringBuffer(original);
}
sb.append("\n\n<div class='feedwarmer'><hr/>");
sb.append("<i>Related actions:</i> ");
// Add email link using title and item link
sb.append("<a href='mailto:?body=Check this out: ");
sb.append(link).append("'>Recommend this link</a> | ");
// Add delicious link using item title link
// TODO: prepend the bookmarking service's post URL, which is missing here
sb.append("<a href='");
sb.append(link).append("&title=").append(title);
sb.append("'>Add to delicious</a> | ");
// Add Google Blogs Search link using item title
// TODO: prepend the search host, which is missing here
sb.append("<a href='");
sb.append("blogsearch?hl=en&q=").append(title);
sb.append("'>Search related content</a>");
// Finish and return the sb
sb.append("</div>\n");
return sb.toString();
}
}
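createFooter places link and title into query strings as they are, so a title containing a space or an ampersand would produce a broken URL. A minimal sketch of encoding those values first with the JDK's URLEncoder follows. The class FooterLinks, its methods and the base URL passed in are hypothetical; the "url" and "title" parameter names simply mirror the "&title=" fragment used above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only.
final class FooterLinks {

    /** Encodes a value for use inside a URL query string, using UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds base + "?url=<link>&title=<title>" with both values encoded. */
    static String shareUrl(String base, String link, String title) {
        return base + "?url=" + enc(link) + "&title=" + enc(title);
    }

    public static void main(String[] args) {
        // The base URL is a placeholder, not a real bookmarking service.
        System.out.println(shareUrl("http://example.com/post",
                "http://a.b/?x=1&y=2", "Rome & RSS"));
    }
}
```

When the resulting URL is written into an HTML attribute, the remaining "&" separators should additionally be written as "&amp;".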