Elasticsearch custom analyzers
1. Creating a custom analyzer
Official documentation example:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
The above is the custom-analyzer format from the official documentation; a sketch of how to verify it with the _analyze API follows the parameter list below.
- tokenizer: common options include whitespace, standard, ik_smart, and ik_max_word (the latter two are provided by the IK analysis plugin; english is a built-in analyzer rather than a tokenizer)
- char_filter: common options include html_strip, mapping, and pattern_replace
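A simple way to check what the analyzer defined above produces is the _analyze API. This is a minimal sketch assuming the my_index index from the previous request has been created; the sample text is only an illustration:

GET my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>Is this <b>déjà</b> vu?</p>"
}

With this configuration, html_strip removes the HTML tags, the standard tokenizer splits the remaining text into words, lowercase lowercases them, and asciifolding folds "déjà" to "deja", so the resulting tokens are roughly [is, this, deja, vu].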
2. Defining your own ES analyzer
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "your_custom_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "char_filter": [
            "html_strip"
          ]
        }
      }
    }
  }
}
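To actually use the analyzer it has to be referenced from a field mapping. The following is a sketch rather than part of the original post: the field name content is a made-up example, your_custom_analyzer is the placeholder name defined above, the ik_smart tokenizer only works if the IK analysis plugin is installed, and a 7.x-style typeless mapping is assumed (the // comments are Kibana Dev Tools syntax):

PUT my_index/_mapping
{
  "properties": {
    "content": {              // hypothetical field name used for illustration
      "type": "text",
      "analyzer": "your_custom_analyzer"
    }
  }
}

GET my_index/_analyze
{
  "analyzer": "your_custom_analyzer",
  "text": "<p>自定义分词器测试</p>"
}

Here html_strip removes the <p> tags before ik_smart performs coarse-grained Chinese word segmentation on the remaining text.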