Elasticsearch pinyin + ik analysis, and pinyin analysis with Spring Data Elasticsearch

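Custom analyzers in Elasticsearch

Install the pinyin analysis plugin and the ik analysis plugin

If you download the source code, you have to package it with Maven yourself.

If you download a prebuilt archive, unzip it directly into the plugins folder under the Elasticsearch installation directory. The extracted folder can be renamed.

1. Configuring the analyzer in Elasticsearch

Create the index and add the settings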

PUT myindex
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["pinyin_filter"]
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 10,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
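The same request can also be sent from plain Java. The following is a minimal sketch rather than the author's method: it assumes a cluster reachable at http://localhost:9200 (an assumption, adjust as needed) and uses only the JDK's java.net.http client (Java 11+). The class name CreatePinyinIndex and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreatePinyinIndex {

    // Assumption: a local cluster on the default port.
    static final String BASE_URL = "http://localhost:9200";

    // The analysis settings from the request above, as a single JSON string.
    static String settingsJson() {
        return "{\"settings\":{\"index\":{\"analysis\":{"
             + "\"analyzer\":{\"ik_pinyin_analyzer\":{"
             + "\"type\":\"custom\",\"tokenizer\":\"ik_smart\",\"filter\":[\"pinyin_filter\"]}},"
             + "\"filter\":{\"pinyin_filter\":{\"type\":\"pinyin\","
             + "\"keep_separate_first_letter\":false,\"keep_full_pinyin\":true,"
             + "\"keep_original\":false,\"limit_first_letter_length\":10,"
             + "\"lowercase\":true,\"remove_duplicated_term\":true}}"
             + "}}}}";
    }

    // Builds "PUT <BASE_URL>/<indexName>" with the settings as the request body.
    static HttpRequest buildRequest(String indexName) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsJson()))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildRequest("myindex"), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            // Reached when no cluster is listening at BASE_URL.
            System.out.println("Could not reach Elasticsearch: " + e.getMessage());
        }
    }
}
```

Index creation only succeeds if both plugins are installed on every node, because the ik_smart tokenizer and the pinyin filter type are provided by them.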

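Add the fields by setting the mapping. The users segment in the URL is a mapping type, which only Elasticsearch 6.x and earlier accept in this form.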

PUT myindex/_mapping/users
{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}
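With this mapping, the pinyin variant of uname is addressed as the sub-field uname.my_pinyin. As a rough illustration, the sketch below builds the body of a match query against that sub-field, which could be sent to the index's _search endpoint. The class name, the helper methods, and the query text "zhangsan" are hypothetical. Whether a given input matches depends on how the plugins tokenize it.

```java
public class PinyinSearchBody {

    // Minimal JSON string escaping for backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"query":{"match":{"<field>":"<text>"}}}.
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\""
             + escape(field) + "\":\"" + escape(text) + "\"}}}";
    }

    public static void main(String[] args) {
        // Query the pinyin sub-field declared under "fields" in the mapping.
        System.out.println(matchQuery("uname.my_pinyin", "zhangsan"));
    }
}
```

Querying uname itself uses ik_smart, so Chinese input can go against uname and pinyin input against uname.my_pinyin.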

2. Configuring the analyzer with Spring Data Elasticsearch

Create the entity class

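import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Mapping(mappingPath = "elasticsearch_mapping.json") // sets the mapping
@Setting(settingPath = "elasticsearch_setting.json") // sets the index settings
@Document(indexName = "myindex", type = "users")
public class User {

    @Id
    private Integer id;

    // Declaring the analyzer on the annotation had no effect:
    // @Field(type = FieldType.keyword, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
    private String name1;

    @Field(type = FieldType.keyword)
    private String userName;

    @Field(type = FieldType.Nested)
    private List products;
}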

Create the file elasticsearch_mapping.json under resources

{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}

Create the file elasticsearch_setting.json under resources

{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["pinyin_filter"]
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          // true: emit the first-letter abbreviation
          "keep_first_letter": true,
          // false: do not emit each first letter as a separate term
          "keep_separate_first_letter": false,
          // true: emit the full pinyin of each character
          "keep_full_pinyin": true,
          "keep_original": false,
          // maximum length of the first-letter abbreviation
          "limit_first_letter_length": 10,
          // lowercase non-Chinese letters
          "lowercase": true,
          // duplicated terms are removed
          "remove_duplicated_term": true
        }
      }
    }
  }
}
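To make the filter options more concrete, here is a toy model of what a few of them switch on. It is not the plugin's implementation: the three-character pinyin table, the class name, and the method signature are all hypothetical, keep_separate_first_letter and lowercase are left out, and the real filter may emit terms in a different order and with position information.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PinyinOptionsToy {

    // Hypothetical mini table; the real plugin ships a full dictionary.
    static final Map<Character, String> PINYIN =
            Map.of('刘', "liu", '德', "de", '华', "hua");

    static List<String> tokens(String text, boolean keepFirstLetter, boolean keepFullPinyin,
                               boolean keepOriginal, int limitFirstLetterLength) {
        // A set drops repeated terms, loosely mimicking remove_duplicated_term.
        Set<String> out = new LinkedHashSet<>();
        StringBuilder firstLetters = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // characters outside the toy table are skipped
            }
            if (keepFullPinyin) {
                out.add(py); // keep_full_pinyin: one term per character
            }
            firstLetters.append(py.charAt(0));
        }
        if (keepFirstLetter && firstLetters.length() > 0) {
            // keep_first_letter: one abbreviation term, cut at limit_first_letter_length
            int end = Math.min(firstLetters.length(), limitFirstLetterLength);
            out.add(firstLetters.substring(0, end));
        }
        if (keepOriginal) {
            out.add(text); // keep_original: the untouched input as a term
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // Settings as in the file above: first letters and full pinyin, no original.
        System.out.println(tokens("刘德华", true, true, false, 10)); // [liu, de, hua, ldh]
    }
}
```

With these settings a name becomes searchable both by its full pinyin syllables and by its initials.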

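ik_max_word: splits the text at the finest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」, exhausting every possible combination.

ik_smart: splits the text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.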
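The difference between the two granularities can be illustrated with a toy dictionary segmenter. This is not IK's actual algorithm: the fine mode below simply emits every dictionary word found at any position, and the coarse mode is a greedy longest match. The nine-word dictionary and the class name are hypothetical, so the fine output is shorter than the ik_max_word list quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GranularityToy {

    // Hypothetical mini dictionary; IK loads a far larger one.
    static final Set<String> DICT = Set.of(
            "中华人民共和国", "中华人民", "中华", "华人",
            "人民共和国", "人民", "共和国", "共和", "国歌");

    // Fine-grained: every dictionary word at every start position.
    static List<String> fine(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                String word = text.substring(i, j);
                if (DICT.contains(word)) {
                    out.add(word);
                }
            }
        }
        return out;
    }

    // Coarse-grained: greedy longest match, left to right.
    static List<String> coarse(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character
            for (int j = text.length(); j > i; j--) {
                if (DICT.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fine("中华人民共和国国歌"));
        System.out.println(coarse("中华人民共和国国歌")); // [中华人民共和国, 国歌]
    }
}
```

In the settings above, ik_smart is the tokenizer, so the pinyin filter receives the coarse terms. Using ik_max_word instead would feed it many more overlapping terms.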

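After the application starts, the analyzer has still not been applied to the index.

Once the entity has been created, the following call must be added; only then does the created index use the analyzer: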

elasticsearchTemplate.putMapping(User.class);