Adding Pinyin Search Support to Elasticsearch

A useful reference:

ELASTIC 搜索开发实战 (Elastic Search Development in Practice)

I. Install the plugin

Install the pinyin analysis plugin, elasticsearch-analysis-pinyin.

Documentation:

https://github.com/medcl/elasticsearch-analysis-pinyin

II. Create a new index with pinyin support

Replace <index> with the actual index name.

Replace <type> with the actual type name.

PUT <index>
{
  "settings" : {
      "analysis" : {
        "analyzer" : {
          "pinyin_analyzer" : {
              "tokenizer" : "my_pinyin"
              }
        },
        
        "tokenizer" : {
          "my_pinyin" : {
            "type" : "pinyin",
            "keep_first_letter":false,
            "keep_separate_first_letter" : false,
            "keep_full_pinyin" : true,
            "keep_original" : false,
            "limit_first_letter_length" : 16,
            "lowercase" : true
          }
        }
      }
    },

  "mappings": {
    "<type>": {
      "properties": {
        "name": {
          "type": "text",
          "index": true,
          "fields":{
              "pinyin":{
                  "type":"text",
                  "analyzer":"pinyin_analyzer"
              }
           }
        },
        "link": {
          "type": "keyword",
          "index": false
        },
        "id": {
          "type": "long"
        },
        "update_time": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
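
The same request can also be sent from a script. The following is a minimal Python sketch that uses only the standard library. The base URL http://localhost:9200, the index name "people", the type name "doc", and the helper name create_index_request are assumptions made for illustration; none of them come from the original. The sketch only builds the request, and sending it is left to the caller.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node without authentication


def create_index_request(index, type_name, base_url=ES_URL):
    """Build (but do not send) the PUT request that creates the index."""
    body = {
        "settings": {
            "analysis": {
                "analyzer": {"pinyin_analyzer": {"tokenizer": "my_pinyin"}},
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "keep_first_letter": False,
                        "keep_separate_first_letter": False,
                        "keep_full_pinyin": True,
                        "keep_original": False,
                        "limit_first_letter_length": 16,
                        "lowercase": True,
                    }
                },
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    # "name" keeps its own analysis; "name.pinyin" adds pinyin tokens.
                    "name": {
                        "type": "text",
                        "index": True,
                        "fields": {
                            "pinyin": {"type": "text", "analyzer": "pinyin_analyzer"}
                        },
                    },
                    "link": {"type": "keyword", "index": False},
                    "id": {"type": "long"},
                    "update_time": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
                    },
                }
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("people", "doc")  # hypothetical index and type names
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```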
      

Analyzer test

GET <index>/_analyze
{
  "field": "name.pinyin",
  "text": "内蒙古"
}

Response:
{
  "tokens": [
    {
      "token": "nei",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "meng",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "gu",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    }
  ]
}      
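
When checking analyzer output from a script, usually only the token strings matter. The sketch below pulls them out of an _analyze response in position order. The sample data is copied from the response above; the helper name tokens_by_position is made up for this example.

```python
import json

# Sample data: the _analyze response for "内蒙古" shown above.
ANALYZE_RESPONSE = """
{"tokens": [
  {"token": "nei", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0},
  {"token": "meng", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1},
  {"token": "gu", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2}
]}
"""


def tokens_by_position(response):
    """Return the token strings ordered by their position in the text."""
    tokens = sorted(response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


tokens = tokens_by_position(json.loads(ANALYZE_RESPONSE))
print(tokens)  # ['nei', 'meng', 'gu']
```

With keep_full_pinyin enabled and the first-letter options disabled, each character yields exactly one full-pinyin token, which is what the list shows.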

III. Add pinyin support to an existing index

1. Create the index

PUT <index>
{
  "mappings": {
    "<type>": {
      "properties": {
        "name": {
          "type": "keyword",
          "index": true
        },
        "link": {
          "type": "keyword",
          "index": false
        },
        "id": {
          "type": "long"
        },
        "update_time": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}      

2. Configure the pinyin analyzer

POST  <index>/_close

PUT <index>/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true
        }
      }
    }
  }
}

POST  <index>/_open      
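
The order of these three calls matters: analyzer definitions are static index settings, so Elasticsearch only accepts them while the index is closed, and the index cannot serve reads or writes until it is reopened. A minimal Python sketch of the sequence follows. The helper name analysis_update_steps and the index name "people" are assumptions for illustration, and the tokenizer settings are abbreviated.

```python
import json


def analysis_update_steps(index, analysis):
    """Return the (method, path, body) calls in the order they must run."""
    return [
        ("POST", f"/{index}/_close", None),  # static settings need a closed index
        ("PUT", f"/{index}/_settings", {"index": {"analysis": analysis}}),
        ("POST", f"/{index}/_open", None),   # make the index searchable again
    ]


# Abbreviated analysis section; the full option list is in the request above.
analysis = {
    "analyzer": {"pinyin_analyzer": {"tokenizer": "my_pinyin"}},
    "tokenizer": {"my_pinyin": {"type": "pinyin", "keep_full_pinyin": True}},
}

steps = analysis_update_steps("people", analysis)  # "people" is a made-up index name
for method, path, body in steps:
    print(method, path, "" if body is None else json.dumps(body))
```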

3. Update the mapping to add the pinyin analyzer

PUT <index>/<type>/_mapping
{
  "<type>": {
    "properties": {
      "name": {
        "type": "keyword",
        "index": true,
            "fields":{
                "pinyin":{
                    "type":"text",
                    "analyzer":"pinyin_analyzer"
                }
            }
      },
      "link": {
        "type": "keyword",
        "index": false
      },
      "id": {
        "type": "long"
      },
      "update_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}


GET <index>/_mapping


# Re-run the existing documents through the index so that the new name.pinyin sub-field is populated
POST <index>/_update_by_query?conflicts=proceed
      

4. Search test

GET <index>/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name",
        "name.pinyin"
      ],
      "query": "王苏川",
      "default_operator": "AND"
    }
  }
}
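
A _search body places the query_string clause under a top-level "query" key. The sketch below builds such a body in Python so the search text can be supplied at run time. The helper name pinyin_search_body is hypothetical; the field names and the AND operator are taken from the request above.

```python
import json


def pinyin_search_body(text, fields=("name", "name.pinyin"), operator="AND"):
    """Build a _search body whose query_string clause sits under "query"."""
    return {
        "query": {
            "query_string": {
                "fields": list(fields),
                "query": text,
                "default_operator": operator,
            }
        }
    }


body = pinyin_search_body("王苏川")
# ensure_ascii=False keeps the Chinese text readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Searching both name and name.pinyin lets the same request match input typed either as Chinese characters or as pinyin.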
      

Reference

Elastic 搜索开发实战: 拼音处理 (pinyin handling)