
I recently used Lucene 3.0 in a project. The code is below.

Creating the index:

Java code

// Imports needed by the method below
import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public void index() throws CorruptIndexException, LockObtainFailedException, IOException {
    // Index directory
    File indexDir = new File("D:/workspace/code/java/TestLucene3/index/txt/test/");

    // Note: the analyzer used to build the index should also be used at search time,
    // otherwise searches may return incorrect results.
    // Use the analyzer that ships with Lucene.
    Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Older constructor: index location, analyzer, then a flag
    // (true: create a brand-new index; false: append to an existing one).
    // IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);

    // Arguments: (1) the index Directory, either FSDirectory (on disk) or RAMDirectory (in memory);
    // (2) the analyzer; (3) true to create a brand-new index, false to append incrementally;
    // (4) the maximum field length.
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir),
            luceneAnalyzer, true, IndexWriter.MaxFieldLength.LIMITED);

    // setMergeFactor (merge factor)
    // Controls how often segments are merged: once this many segments have accumulated
    // on disk, they are merged into one larger segment. A larger value makes indexing
    // faster. The default is 10; consider raising it before building an index.
    indexWriter.setMergeFactor(100);

    // setMaxBufferedDocs (maximum buffered documents)
    // Controls how many documents are held in memory before a new segment is written.
    // A larger value speeds up indexing. The original note gives the default as 10,
    // which matches older releases; in Lucene 3.0 flushing is triggered by RAM usage
    // by default.
    indexWriter.setMaxBufferedDocs(100);

    // setMaxMergeDocs (maximum merged documents)
    // Controls the maximum number of documents a segment may hold. Smaller values
    // favor the speed of incremental additions. The default is Integer.MAX_VALUE and
    // normally needs no change.
    // When indexing large amounts of data, the bottleneck is disk I/O. If enough memory
    // is available, prefer memory over disk: raise setMaxBufferedDocs so that Lucene
    // buffers more in memory.
    indexWriter.setMaxMergeDocs(1000);

    // setUseCompoundFile makes Lucene pack the multiple files of a segment into a single
    // .cfs file. This reduces the number of index files, which can affect later search
    // performance. true selects the compound index format.
    indexWriter.setUseCompoundFile(true);

    long startTime = new Date().getTime();
    String temp = "";

    // Adding index fields
    //
    // Field has three nested types, Field.Store, Field.Index and Field.TermVector, and its
    // constructors take them as arguments. The names in this list come from Lucene 2.x;
    // the Lucene 3.0 equivalents are given in parentheses.
    // Field.Store:
    //   Field.Store.NO: the field value is not stored.
    //   Field.Store.YES: the field value is stored.
    //   Field.Store.COMPRESS: the value is stored compressed (no longer available in 3.0).
    // Field.Index:
    //   Field.Index.NO: the field is not indexed.
    //   Field.Index.TOKENIZED (3.0: ANALYZED): the field is analyzed, then indexed.
    //   Field.Index.UN_TOKENIZED (3.0: NOT_ANALYZED): the field is indexed without analysis.
    //   Field.Index.NO_NORMS (3.0: NOT_ANALYZED_NO_NORMS): the field is indexed without an
    //     Analyzer and without norms, which mainly saves memory.
    // Field.TermVector is used less often. It has five options:
    //   Field.TermVector.NO: no term vector is stored.
    //   Field.TermVector.YES: the term vector is stored.
    //   Field.TermVector.WITH_OFFSETS: the term vector plus token offsets.
    //   Field.TermVector.WITH_POSITIONS: the term vector plus token positions.
    //   Field.TermVector.WITH_POSITIONS_OFFSETS: the term vector plus positions and offsets.
    //
    // Adding a Field is like adding a database column. Content that must be both searched
    // and retrieved can be stored directly in the index, at the cost of a larger index.
    // Here the text content of each file is stored.
    /*
     * Field.Store: whether the field value is kept verbatim in the index.
     * Field.Index: whether the field can be searched. A non-indexed field usually
     *   only carries auxiliary stored information.
     * Field.TermVector: whether per-document term vectors (optionally with positions
     *   and offsets) are stored for the field.
     */
    Field fieldPath = new Field("path", "", Field.Store.YES, Field.Index.NO);
    Field fieldBody = new Field("content", temp, Field.Store.YES, Field.Index.ANALYZED,
            Field.TermVector.WITH_POSITIONS_OFFSETS);
    Field fieldId = new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED,
            Field.TermVector.WITH_POSITIONS_OFFSETS);

    // For testing, index 100000 documents in a loop. File contents could be read instead.
    // (The original loop also incremented i inside the body, which skipped every other id.)
    for (int i = 0; i < 100000; i++) {
        Document document = new Document();
        temp = "王熙鳳歷幻返金陵　甄應嘉蒙恩還玉闕";
        fieldPath.setValue("D:\\workspace\\code\\java\\TestLucene3\\txt\\" + i + ".txt");
        fieldBody.setValue(temp);
        fieldId.setValue(String.valueOf(i));
        document.add(fieldPath);
        document.add(fieldBody);
        document.add(fieldId);
        indexWriter.addDocument(document);
    }

    // optimize() optimizes the index
    indexWriter.optimize();
    indexWriter.close();

    // To delete one document or a class of documents, IndexReader offers two methods:
    //   reader.deleteDocument(int docNum)
    //   reader.deleteDocuments(Term term)
    // The first deletes by document number; docNum is the number Lucene assigned when the
    // document entered the index, in sequential order. The second deletes every document
    // that matches the given term.
    // After either call, Lucene writes a *.del file that records the deleted documents,
    // but the documents are not physically removed. Reading one of them with
    // Document doc = reader.document(i) raises an "Attempt to access a deleted document"
    // error. To delete several documents, there are two approaches:
    // 1. Delete one document, call IndexWriter.optimize(), then delete the next one.
    // 2. Scan the whole index first and record the numbers of the documents to delete.
    //    Then delete them all with deleteDocument and call IndexWriter.optimize() once.
    long endTime = new Date().getTime();
    System.out.println("\nIndexing took " + (endTime - startTime) + " ms");
}
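The comments on setMergeFactor and setMaxBufferedDocs describe a flush-then-merge cycle. A rough way to see the trade-off is the toy model below. It is only a sketch under simplifying assumptions: every flush produces an equally sized segment, a merge fires exactly when mergeFactor segments of the same level exist, and setMaxMergeDocs is ignored. The class and method names are hypothetical, and this is not Lucene's actual merge policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of buffered flushes and level-based segment merging (not Lucene's real policy). */
final class MergeSimulation {

    /** Outcome of one simulated indexing run. */
    static final class Result {
        final int flushes;   // segments written from the in-memory buffer
        final int merges;    // merge operations performed
        final int segments;  // segments left on disk at the end

        Result(int flushes, int merges, int segments) {
            this.flushes = flushes;
            this.merges = merges;
            this.segments = segments;
        }
    }

    static Result simulate(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        List<Integer> levels = new ArrayList<>(); // levels.get(k) = segments currently at level k
        int flushes = 0;
        int merges = 0;
        int buffered = 0;
        for (int doc = 0; doc < totalDocs; doc++) {
            buffered++;
            if (buffered == maxBufferedDocs) { // buffer full: write a new level-0 segment
                buffered = 0;
                flushes++;
                merges += addSegment(levels, 0, mergeFactor);
            }
        }
        if (buffered > 0) { // closing the writer flushes the partial buffer
            flushes++;
            merges += addSegment(levels, 0, mergeFactor);
        }
        int segments = 0;
        for (int count : levels) {
            segments += count;
        }
        return new Result(flushes, merges, segments);
    }

    /** Adds one segment at the given level, cascading merges upward; returns the merge count. */
    private static int addSegment(List<Integer> levels, int level, int mergeFactor) {
        int merges = 0;
        while (true) {
            while (levels.size() <= level) {
                levels.add(0);
            }
            int count = levels.get(level) + 1;
            if (count < mergeFactor) {
                levels.set(level, count);
                return merges;
            }
            levels.set(level, 0); // mergeFactor segments collapse into one at the next level
            merges++;
            level++;
        }
    }

    public static void main(String[] args) {
        Result large = simulate(100000, 100, 100);
        Result small = simulate(100000, 10, 10);
        System.out.println(large.flushes + " flushes, " + large.merges + " merges, "
                + large.segments + " segments"); // prints: 1000 flushes, 10 merges, 10 segments
        System.out.println(small.flushes + " flushes, " + small.merges + " merges, "
                + small.segments + " segments"); // prints: 10000 flushes, 1111 merges, 1 segments
    }
}
```

In this model the larger settings do far less flushing and merging while indexing, but leave more segments behind, which is the work that a final optimize() call then has to do.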

Searching:

Java code

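// Imports needed by the method below
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.util.Version;

/**
 * Search.
 *
 * @param indexDir location of the index
 * @return one map of field name to stored value per hit
 * @throws CorruptIndexException
 * @throws IOException
 * @throws ParseException
 * @author <a href="mailto:gaoxuguo@feinno.com">Gao XuGuo</a> Nov 30, 2009 2:56:42 PM
 */
public List<Map<String, String>> search(String indexDir)
        throws CorruptIndexException, IOException, ParseException {
    String field = "content";
    BooleanQuery bq = new BooleanQuery();
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field,
            new StandardAnalyzer(Version.LUCENE_CURRENT));
    Query query = parser.parse("content:王熙鳳");
    Query q = new TermQuery(new Term("id", "100"));
    bq.add(q, Occur.SHOULD);
    bq.add(query, Occur.SHOULD);

    // 100 means: keep the top 100 hits
    TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
    long start = new Date().getTime(); // start time

    /*
     * Lucene has three built-in Filter subclasses (names from older releases):
     * 1) DateFilter restricts the search to documents whose date field falls in a given range.
     * 2) QueryFilter uses the results of one query as the document space for another query.
     * 3) CachingWrapperFilter decorates another filter and caches its results for reuse,
     *    which improves performance.
     */
    String[] dirs = {indexDir};
    MultiSearcher ms = this.getMultiSearcher(dirs);
    List<Map<String, String>> results = new ArrayList<Map<String, String>>();
    try {
        ms.search(bq, collector);
        ScoreDoc[] docs = collector.topDocs().scoreDocs;
        for (ScoreDoc sd : docs) {
            // Load the document from the same searcher that produced the hit.
            // (The original read it from a separate IndexSearcher, which is only
            // correct while a single directory is searched.)
            Document doc = ms.doc(sd.doc);
            Map<String, String> row = new HashMap<String, String>();
            // Read every stored field of the document
            for (Fieldable fa : doc.getFields()) {
                row.put(fa.name(), doc.get(fa.name()));
                System.out.print(fa.name() + "=" + doc.get(fa.name()) + " ");
            }
            System.out.println();
            results.add(row);
        }
    } finally {
        ms.close(); // also closes the underlying IndexSearchers
    }
    long end = new Date().getTime();
    // The elapsed time is in milliseconds (the original message said seconds).
    System.out.println("Found " + collector.getTotalHits() + " hits in " + (end - start) + " ms");
    // The original returned null; return the collected rows instead.
    return results;
}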
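The search code asks the collector for the top 100 hits, yet getTotalHits() can report a much larger number. The sketch below shows one common way such a collector can work: count every hit, but keep only the best N in a small min-heap. The names TopNCollector and Hit are hypothetical, and this is a simplified model rather than Lucene's TopScoreDocCollector. Ties are broken in favor of the lower document number, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N hit collector: counts every hit but keeps only the N best-scoring ones. */
final class TopNCollector {

    /** A (document number, score) pair. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Orders hits from weakest to strongest: lower score first, and on equal
    // scores the higher document number counts as weaker.
    private static final Comparator<Hit> WEAKEST_FIRST = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        }
    };

    private final int limit;
    private final PriorityQueue<Hit> heap; // min-heap: the weakest kept hit is at the head
    private int totalHits = 0;

    TopNCollector(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, WEAKEST_FIRST);
    }

    /** Called once per matching document. */
    void collect(int doc, float score) {
        totalHits++;
        Hit hit = new Hit(doc, score);
        if (heap.size() < limit) {
            heap.add(hit);
        } else if (WEAKEST_FIRST.compare(heap.peek(), hit) < 0) {
            heap.poll(); // evict the weakest kept hit
            heap.add(hit);
        }
    }

    int getTotalHits() {
        return totalHits;
    }

    /** Returns the kept hits, best first. */
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(heap);
        Collections.sort(out, Collections.reverseOrder(WEAKEST_FIRST));
        return out;
    }

    public static void main(String[] args) {
        TopNCollector collector = new TopNCollector(3);
        float[] scores = {0.5f, 2.0f, 1.0f, 2.0f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        StringBuilder ids = new StringBuilder();
        for (Hit hit : collector.topDocs()) {
            ids.append(hit.doc).append(' ');
        }
        System.out.println("total hits: " + collector.getTotalHits()
                + ", kept docs: " + ids.toString().trim());
        // prints: total hits: 5, kept docs: 1 3 2
    }
}
```

Memory use stays proportional to N rather than to the number of matches, which is why the total hit count and the length of the returned array can differ.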

I left out one method:

Java code

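// Imports needed by the method below
import java.io.File;
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.store.FSDirectory;

/**
 * Returns a MultiSearcher that searches several index directories.
 *
 * @param dirs the index directories to search
 * @return MultiSearcher
 * @throws IOException
 * @author <a href="mailto:gaoxuguo@feinno.com">Gao XuGuo</a> Jan 22, 2010 3:44:16 PM
 */
private MultiSearcher getMultiSearcher(String[] dirs) throws IOException {
    // One read-only IndexSearcher per directory
    IndexSearcher[] searchers = new IndexSearcher[dirs.length];
    int i = 0;
    for (String dir : dirs) {
        searchers[i] = new IndexSearcher(FSDirectory.open(new File(dir)), true);
        i++;
    }
    // Search across all directories
    return new MultiSearcher(searchers);
}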
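A searcher that spans several indexes has to present a single document-number space, so a hit's number is only meaningful to the searcher that produced it. The sketch below illustrates the idea with an offset table. It assumes each sub-index's numbers are shifted by the total size of the indexes before it; the class DocIdMapper and its methods are hypothetical, not part of Lucene.

```java
/** Toy model of mapping combined document numbers back to (sub-index, local number). */
final class DocIdMapper {

    // starts[i] is the first combined number of sub-index i; the last entry is the total.
    private final int[] starts;

    /** @param maxDocs number of documents in each sub-index, in searcher order */
    DocIdMapper(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Returns which sub-index a combined document number belongs to. */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("document number out of range: " + globalDoc);
        }
        // A linear scan is enough for a handful of indexes.
        for (int i = 0; i < starts.length - 1; i++) {
            if (globalDoc < starts[i + 1]) {
                return i;
            }
        }
        throw new AssertionError("unreachable");
    }

    /** Returns the document number inside its own sub-index. */
    int localDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }

    public static void main(String[] args) {
        DocIdMapper mapper = new DocIdMapper(new int[] {100, 50, 200});
        System.out.println(mapper.subIndex(120) + " / " + mapper.localDoc(120));
        // prints: 1 / 20
    }
}
```

With one directory the combined and local numbers coincide, which is why mixing searchers can appear to work in a single-index test and then fail once a second directory is added.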
