Configuring Solr 4.6.1 and Building an Index: Search Engine Study Notes (1)

Category: Search Engines, Solr. Posted 2014-02-23 22:05.

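I. Introduction to Solr

Solr is a full-text search server built on Lucene that exposes a web-service-style API. Clients can submit XML documents in a defined format to the server over HTTP to build an index, and can issue search requests with HTTP GET and receive the results as XML.
In short, Solr is a search engine: we send it documents and it builds an inverted index over them (the searchable source); we send it a query and it returns the results (a list of documents) in a chosen format such as JSON or XML.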
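
To make the XML submission format mentioned above concrete, here is a minimal sketch of an update message. The field names id and name are the ones defined in the core0 schema used later in this post; the values are made up for illustration. A message like this is typically sent with HTTP POST to the core's /update handler and followed by a commit.

```xml
<!-- Hypothetical update message; the field values are illustrative only -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">first test document</field>
  </doc>
</add>
```

A search is then an HTTP GET against the select handler, for example a URL of the form /solr/core0/select?q=id:1&wt=json, where the wt parameter selects the response format. The core name core0 is an assumption taken from the multicore setup described below.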

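II. Configuring Solr
I recently set up Solr 4.6.1 on my own machine, working mainly from the Apache documentation.
The steps are as follows:

Local environment: Windows 7, Tomcat 6.0, JDK 6u27
1. Download Solr
http://mirror.bit.edu.cn/apache/lucene/solr/4.6.1
2. Deploy it into Tomcat
Copy solr.war from solr-4.6.1\example\webapps into Tomcat's webapps directory, then add the jars from solr-4.6.1\example\lib to Tomcat's lib directory (from Solr 4.3 onward, the logging jars under example\lib\ext are the ones Solr needs in order to start).
3. Add the cores
Create a conf folder under webapps\solr\ and copy the solr-4.6.1\example\multicore directory into it.
4. Create solr.xml
Create a new solr.xml under %TOMCAT_HOME%\conf\Catalina\localhost
with the following content: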
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="${catalina.home}/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="${catalina.home}/webapps/solr/conf/multicore" override="true" />
</Context>
5. Start Tomcat; Solr should now be reachable.
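
As an aside, the solr/home JNDI entry set in the Context file above can also be declared inside the web application itself. The sketch below shows the env-entry form that Solr's own WEB-INF/web.xml ships with in commented-out form. The path value here is a placeholder, not a path from this setup, and this is an alternative to the Context file rather than something the steps above require.

```xml
<!-- Alternative sketch: goes inside <web-app> in solr's WEB-INF/web.xml -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <!-- Placeholder path: replace with your own Solr home directory -->
  <env-entry-value>/put/your/solr/home/here</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

The Context file approach keeps the setting outside the war, so it is not lost when the war is replaced or unpacked again.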

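III. Configuring Chinese Word Segmentation

1. Download a tokenizer of your choice

I downloaded jcseg-1.9.2-src-jar-dict. Before downloading, check that it supports your Solr version.

2. Unpack it and copy jcseg-core-1.9.2.jar, jcseg-solr-1.9.2.jar, jcseg.properties, and lexicon/ from the extracted directory into Solr's WEB-INF/lib.

3. Add the following to schema.xml in solr\conf\multicore\core0\conf (see the jcseg documentation):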

<?xml version="1.0" ?>
<schema name="example core zero" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>

    <!-- Added for jcseg: one field type per segmentation mode -->
    <fieldType name="textComplex" class="solr.TextField">
      <analyzer>
        <tokenizer class="org.lionsoul.jcseg.solr.JcsegTokenizerFactory" mode="complex"/>
      </analyzer>
    </fieldType>
    <fieldType name="textSimple" class="solr.TextField">
      <analyzer>
        <tokenizer class="org.lionsoul.jcseg.solr.JcsegTokenizerFactory" mode="simple"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="core0" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="_version_" type="long" indexed="true" stored="true"/>

    <!-- Added for jcseg: fields that use the segmented text types -->
    <field name="simple" type="textSimple" indexed="true" stored="true" multiValued="true" />
    <field name="complex" type="textComplex" indexed="true" stored="true" multiValued="true" />
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

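4. Restart Tomcat; no errors should be reported at this point.

5. Test the tokenization results

IV. Building an Inverted Index over Database Data

1. Start the local MySQL database

I created a test database containing a test table with two columns, ID and Val. ID is the document number and Val is the document content; this is about the simplest possible data source.

2. Configure the data source in Solr

Put the following in \webapps\solr\conf\multicore\core0\conf\db-data-config.xml: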

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="XXXXXX" />
  <document name="messages">
    <entity name="message" transformer="ClobTransformer" query="select * from test1">
      <field column="ID" name="id" />
      <field column="Val" name="complex" />
    </entity>
  </document>
</dataConfig>

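Here, complex must match the name of a field defined in the schema. Note that the query above reads from test1, whereas the table described earlier is named test; use whichever name your table actually has.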
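
The post does not show how Solr is told to use db-data-config.xml. A common way, sketched here as an assumption rather than as the author's exact setup, is to register the DataImportHandler in the core's solrconfig.xml. This also assumes that the solr-dataimporthandler jar from the Solr distribution and the MySQL JDBC driver jar are on Solr's classpath.

```xml
<!-- Assumed addition to core0's solrconfig.xml, inside the <config> element -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- File name is resolved relative to the core's conf directory -->
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
```

With a handler registered under this name, a full import is typically started by requesting /solr/core0/dataimport?command=full-import, or from the Dataimport page of the admin UI.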

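3. Restart Tomcat and build the index.

4. Test a query

Here we choose JSON as the format of the returned result list.

That completes about the simplest possible Solr setup. It shows how a data source is turned into an inverted index that supports fast queries. Companies or websites with very large amounts of data can build their own search engine this way. As a next step, Nutch can be combined with Solr to build a search engine of our own.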
