Integrating the Ansj Chinese Tokenizer with Solr

2023-03-05 17:37:49

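For Ansj usage and where to download the related files, see: http://iamyida.iteye.com/blog/2220833

For setting up Solr with Tomcat, see: http://www.cnblogs.com/luxh/p/5016894.html

1. Download the files Ansj needs from http://iamyida.iteye.com/blog/2220833. An already-downloaded copy is linked below.

Ansj files: http://pan.baidu.com/s/1kTLGp7L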


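2. Copy the Ansj files into the Solr web application

  1) Put ansj_seg-2.0.8.jar, nlp-lang-0.2.jar and solr-analyzer-ansj-5.1.0.jar into the Solr web application.

    Target directory: /luxh/solr/apache-tomcat-8.0.29/webapps/solr/WEB-INF/lib

  2) Put library.properties, the library directory and the stopwords directory into the Solr web application.

    Target directory: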

[classes]# pwd
/luxh/solr/apache-tomcat-8.0.29/webapps/solr/WEB-INF/classes
[classes]# ls
library  library.properties  log4j.properties  stopwords

  3) Configure library.properties

    Set the paths to match your own installation.


[classes]# vi library.properties
#redress dic file path
ambiguityLibrary=/luxh/solr/apache-tomcat-8.0.29/webapps/solr/WEB-INF/classes/library/ambiguity.dic
#path of userLibrary this is default library
userLibrary=/luxh/solr/apache-tomcat-8.0.29/webapps/solr/WEB-INF/classes/library
#set real name
isRealName=true
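
Because the paths in library.properties are absolute, a typo there is easy to miss until the tokenizer fails to load its dictionaries. The following is a minimal sketch of a sanity check, not part of Ansj or Solr: the function name check_library_paths is hypothetical, and it assumes the file uses plain key=value lines as in the listing above.

```shell
# check_library_paths FILE
# Reads ambiguityLibrary and userLibrary from an Ansj library.properties file
# and reports whether each configured path exists on disk.
# Returns non-zero if a key is absent or its path does not exist.
check_library_paths() {
    props_file="$1"
    status=0
    for key in ambiguityLibrary userLibrary; do
        # Take the value after '=' on the first line that starts with the key.
        path=$(sed -n "s/^${key}=//p" "$props_file" | head -n 1)
        if [ -z "$path" ]; then
            echo "MISSING KEY: $key"
            status=1
        elif [ -e "$path" ]; then
            echo "OK: $key=$path"
        else
            echo "NOT FOUND: $key=$path"
            status=1
        fi
    done
    return "$status"
}

# Example (path taken from the listing above):
# check_library_paths /luxh/solr/apache-tomcat-8.0.29/webapps/solr/WEB-INF/classes/library.properties
```

Running it before starting Tomcat gives one OK or NOT FOUND line per key.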


3. Create a collection under solr_home

  1) Create a collection named collection1

[solr_home]# pwd
/luxh/solr/solr_home
[solr_home]# mkdir collection1

  2) Copy the contents of /solr-5.3.1/server/solr/configsets/basic_configs into the newly created collection1

[basic_configs]# pwd
/luxh/solr/solr-5.3.1/server/solr/configsets/basic_configs
[basic_configs]# cp -r ./* /luxh/solr/solr_home/collection1/

4. Edit schema.xml in collection1 and add the Ansj tokenizer configuration

[conf]# pwd
/luxh/solr/solr_home/collection1/conf
[conf]# ls
currency.xml  lang  protwords.txt  _rest_managed.json  schema.xml  solrconfig.xml  stopwords.txt  synonyms.txt
[conf]# vi schema.xml

  Add the following:


<fieldType name="text_ansj" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.apache.lucene.analysis.ansj.AnsjTokenizerFactory"
                   query="false" pstemming="true" stopwordsDir="stopwords/stopwords.dic"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.apache.lucene.analysis.ansj.AnsjTokenizerFactory"
                   query="true" pstemming="false"/>
    </analyzer>
</fieldType>


5. Start Tomcat

6. Open http://<your-ip>:8080/solr/admin.html and use Add Core

  Point instanceDir at the collection1 directory created above.
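
As an alternative to the admin page, the core could probably be registered through Solr's CoreAdmin HTTP API (action=CREATE). The sketch below only builds the request URL. The helper name core_create_url is made up for illustration, and it assumes that Solr is served under /solr on port 8080 as in this walkthrough and that a relative instanceDir is resolved against solr_home.

```shell
# core_create_url BASE_URL CORE_NAME INSTANCE_DIR
# Prints the CoreAdmin CREATE request URL for the given core.
# BASE_URL is the root of the Solr webapp, e.g. http://localhost:8080/solr
core_create_url() {
    printf '%s/admin/cores?action=CREATE&name=%s&instanceDir=%s\n' "$1" "$2" "$3"
}

# Example: send the request with curl (replace your-ip with the Tomcat host).
# curl "$(core_create_url http://your-ip:8080/solr collection1 collection1)"
```

The URL is quoted in the curl call so that the shell does not interpret the & characters.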


7. Test

  1) English


  2) Chinese

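
The tokenizer can also be exercised from the command line through Solr's field analysis request handler. This is a sketch under assumptions: the helper names analysis_endpoint and analyze_text are hypothetical, and it assumes the /analysis/field handler is available for the core (it may need to be registered in solrconfig.xml) and that the core is named collection1 as above.

```shell
# analysis_endpoint BASE_URL CORE
# Prints the URL of the field analysis handler for a core.
analysis_endpoint() {
    printf '%s/%s/analysis/field\n' "$1" "$2"
}

# analyze_text BASE_URL CORE FIELD_TYPE TEXT
# Sends TEXT through the analyzer chain of FIELD_TYPE and prints the JSON response.
# curl -G appends the URL-encoded parameters to the query string,
# so Chinese text can be passed as-is.
analyze_text() {
    curl -sS -G "$(analysis_endpoint "$1" "$2")" \
        --data-urlencode "analysis.fieldtype=$3" \
        --data-urlencode "analysis.fieldvalue=$4" \
        --data-urlencode "wt=json"
}

# Example (replace your-ip with the Tomcat host):
# analyze_text http://your-ip:8080/solr collection1 text_ansj "hello solr"
# analyze_text http://your-ip:8080/solr collection1 text_ansj "中文分詞器"
```

The response lists the tokens produced at each stage of the analyzer, which should correspond to what the Analysis page of the admin UI shows.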
