Installing the IK and pinyin analyzers
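Run the following from the Elasticsearch installation directory. The plugin release must match the Elasticsearch version exactly (7.2.0 in this command), so adjust the version in the URL to your node.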
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.2.0/elasticsearch-analysis-ik-7.2.0.zip
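For an offline installation, download the plugin zip in advance (here into a plugin-zips directory next to plugins/) and unpack it by hand. Because the commands run inside plugins/, the unzip target is ik, not plugins/ik:
cd plugins/
mkdir ik
mkdir pinyin
unzip ../plugin-zips/elasticsearch-analysis-ik-7.5.1.zip -d ik
The pinyin plugin zip is unpacked into the pinyin directory the same way. Restart Elasticsearch afterwards so the plugins are loaded.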
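The two commands above use different plugin versions (7.2.0 and 7.5.1), and a plugin built for one Elasticsearch version is normally rejected by another. As a minimal sketch, the helper below reads the version out of a zip file name and compares it with the node version before anything is unpacked. The function names are hypothetical, and the file-name pattern is an assumption based on the release names shown here.

```python
import re
from typing import Optional


def plugin_version(zip_name: str) -> Optional[str]:
    """Extract 'X.Y.Z' from a name such as elasticsearch-analysis-ik-7.5.1.zip."""
    match = re.search(r"-(\d+\.\d+\.\d+)\.zip$", zip_name)
    return match.group(1) if match else None


def is_compatible(zip_name: str, es_version: str) -> bool:
    """True only when the plugin version equals the Elasticsearch version."""
    return plugin_version(zip_name) == es_version


if __name__ == "__main__":
    zip_path = "../plugin-zips/elasticsearch-analysis-ik-7.5.1.zip"
    for es_version in ("7.5.1", "7.2.0"):
        print(es_version, is_compatible(zip_path, es_version))
```

The node version itself can be read from the JSON returned by the root endpoint of the cluster (the version.number field).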
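About the IK analyzer
What is the difference between ik_max_word and ik_smart?
ik_max_word splits text at the finest granularity. For example, it splits “中华人民共和国国歌” into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination.
ik_smart splits text at the coarsest granularity. For example, it splits “中华人民共和国国歌” into “中华人民共和国,国歌”.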
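One way to see the difference on a live node is the _analyze API, which returns the tokens an analyzer produces for a given text. The sketch below builds the request body for each analyzer and includes an optional helper that sends it. The URL http://localhost:9200 and the function names are assumptions, and the helper expects an unsecured node with the IK plugin installed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_analyze_body(analyzer: str, text: str) -> dict:
    """Return the JSON body for a POST /_analyze request."""
    return {"analyzer": analyzer, "text": text}


def analyze(analyzer: str, text: str, es_url: str = ES_URL) -> list:
    """Send the request and return only the token strings from the response."""
    data = json.dumps(build_analyze_body(analyzer, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]


if __name__ == "__main__":
    sample = "中华人民共和国国歌"
    for name in ("ik_max_word", "ik_smart"):
        # Print the body only; call analyze(name, sample) once a node is reachable.
        print(json.dumps(build_analyze_body(name, sample), ensure_ascii=False))
```

The same bodies can be pasted into Kibana Console after a POST _analyze line.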
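The example below analyzes the message field with ik_max_word and enables fielddata on it, which a terms aggregation on a text field requires.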
PUT message_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "ik_max_word",
        "term_vector": "with_positions_offsets",
        "boost": 8,
        "fielddata": true
      }
    }
  }
}
POST message_index/_doc/1
{
"message":"《原神》霄宫角色PV——「鸣神岛夏天的象征」"
}
POST message_index/_doc/2
{
"message":"原神神里和霄宫该如何选择?全网最强评测"
}
POST message_index/_doc/3
{
"message":"原神:雷神心口拔刀,一刀斩败主角,最后还嫌我太慢抽完万叶抽神里,没有人比我更懂原神保底"
}
POST message_index/_doc/4
{
"message":"原神:神里怎么会加血?雷神稳稳的了,常驻池五虎上将齐了"
}
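Note: the next request reuses ID 4, so it replaces the document indexed just above. The index therefore ends up with 9 documents.
POST message_index/_doc/4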
{
"message":"将会出现雷神和心海,还会有个神秘的5星角色原神"
}
POST message_index/_doc/5
{
"message":"氪金原神2.0,脸黑无下限!亏到自闭!"
}
POST message_index/_doc/6
{
"message":"我宣布原神氪金不再适合我,歪到大气层外面的万叶不抽也罢"
}
POST message_index/_doc/7
{
"message":"联合参展视频烟绯生日快乐哦"
}
POST message_index/_doc/8
{
"message":"可莉的生日礼物《原神》拾枝杂谈"
}
POST message_index/_doc/9
{
"message":"神里怎么会加血?雷神稳稳的了,常驻池五虎上将齐了"
}
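The documents above are indexed one request at a time. As an alternative, the sketch below builds a newline-delimited body for the _bulk API, which accepts many index actions in one call. The helper name build_bulk_body is hypothetical; the body would be sent with POST /_bulk and the Content-Type application/x-ndjson.

```python
import json


def build_bulk_body(index: str, docs) -> str:
    """Build an NDJSON _bulk body from (doc_id, message) pairs."""
    lines = []
    for doc_id, message in docs:
        # Action line: which index and ID the following source line belongs to.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, keeping Chinese text readable.
        lines.append(json.dumps({"message": message}, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_docs = [
        (7, "联合参展视频烟绯生日快乐哦"),
        (8, "可莉的生日礼物《原神》拾枝杂谈"),
    ]
    print(build_bulk_body("message_index", sample_docs), end="")
```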
Run a terms aggregation on the message field and check the result:
POST message_index/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "terms": {
        "size": 15,
        "field": "message"
      }
    }
  }
}
Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 9,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "messages" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 91,
      "buckets" : [
        { "key" : "神", "doc_count" : 8 },
        { "key" : "原", "doc_count" : 7 },
        { "key" : "的", "doc_count" : 4 },
        { "key" : "里", "doc_count" : 3 },
        { "key" : "雷", "doc_count" : 3 },
        { "key" : "万", "doc_count" : 2 },
        { "key" : "叶", "doc_count" : 2 },
        { "key" : "和", "doc_count" : 2 },
        { "key" : "宫", "doc_count" : 2 },
        { "key" : "氪", "doc_count" : 2 },
        { "key" : "生日", "doc_count" : 2 },
        { "key" : "角色", "doc_count" : 2 },
        { "key" : "金", "doc_count" : 2 },
        { "key" : "霄", "doc_count" : 2 },
        { "key" : "2.0", "doc_count" : 1 }
      ]
    }
  }
}
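Each bucket's doc_count is the number of documents that contain the term at least once, not the number of times the term occurs. Many of the buckets are single characters such as 神 and 原, which suggests that 原神 is not an entry in the IK dictionary used here. The sketch below reproduces the counting rule on hand-written token lists. The tokens are made up for illustration and are not real IK output, and the tie-break by key is an assumption based on the bucket order in the response.

```python
from collections import Counter


def terms_doc_counts(tokenized_docs, size):
    """Return the top `size` (term, doc_count) pairs for pre-tokenized documents."""
    counts = Counter()
    for tokens in tokenized_docs:
        # set() makes a term count once per document, however often it repeats.
        counts.update(set(tokens))
    # Highest doc_count first; ties are ordered by the term itself.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:size]


if __name__ == "__main__":
    toy_docs = [
        ["原", "神", "神", "角色"],  # 神 appears twice but counts once here
        ["原", "神", "生日"],
        ["生日"],
    ]
    for term, doc_count in terms_doc_counts(toy_docs, size=3):
        print(term, doc_count)
```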