Elasticsearch

ES Basic Concepts

index (database) > type (table) > document (row)

- Index: as a verb, indexing is comparable to MySQL's INSERT; as a noun, an index is comparable to a MySQL database.
- Type: one or more types can be defined inside an index. A type is similar to a MySQL table; data of the same kind is stored together.
- Document: a piece of data of some type, stored under an index. Documents are JSON; a document is like a row in a MySQL table, and each key/value pair corresponds to a column. Note one difference from relational databases: in MySQL two tables are independent, and columns with the same name in different tables do not affect each other, but that is not true in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one index are ultimately handled the same way in Lucene.
ES Installation Steps

- Step 1: Pull Elasticsearch (storage and retrieval) and Kibana (retrieval visualization)
docker pull elasticsearch:7.4.2
docker pull kibana:7.4.2
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /mydata/elasticsearch/
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
docker update elasticsearch --restart=always
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.168.101:9200 -p 5601:5601 -d kibana:7.4.2
docker update kibana --restart=always
Check the Elasticsearch version info at: http://192.168.168.101:9200
Open the Kibana console at: http://192.168.168.101:5601/app/kibana
Simple Retrieval

Queries

Check Elasticsearch node info: http://192.168.168.101:9200/_cat/nodes
Check Elasticsearch cluster health: http://192.168.168.101:9200/_cat/health
Check the Elasticsearch master node: http://192.168.168.101:9200/_cat/master
List all indices, equivalent to MySQL's SHOW DATABASES:
http://192.168.168.101:9200/_cat/indices
Creating a Document
PUT customer/external/1
http://192.168.168.101:9200/customer/external/1
{
"name":"John Doe"
}
The response is as follows:
Fields whose names start with an underscore are metadata reflecting the document's basic information.
{
"_index": "customer",   which index (database) the document lives in
"_type": "external",    which type the document belongs to
"_id": "1",             the id of the saved document
"_version": 1,          the version of the saved document
"result": "created",    a new document was created; PUT-ing the same id again returns "updated" and the version changes
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
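The created/updated behavior above can be sketched with a tiny in-memory model (plain Python; the class name and structure are hypothetical, this is not the Elasticsearch client): the first PUT to an id returns "created" with _version 1, and a repeated PUT to the same id returns "updated" with a bumped version.

```python
# Hypothetical in-memory sketch of PUT /index/type/id semantics:
# repeated PUTs to the same id re-save the document and bump _version.
class TinyIndex:
    def __init__(self):
        self.docs = {}

    def put(self, doc_id, source):
        if doc_id in self.docs:
            meta = self.docs[doc_id]
            meta["_version"] += 1
            meta["_source"] = source
            result = "updated"
        else:
            meta = {"_version": 1, "_source": source}
            self.docs[doc_id] = meta
            result = "created"
        return {"_id": doc_id, "_version": meta["_version"], "result": result}

idx = TinyIndex()
first = idx.put("1", {"name": "John Doe"})   # first save: created, _version 1
second = idx.put("1", {"name": "John Doe"})  # same id again: updated, _version 2
```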
Retrieving a Document
GET /customer/external/1
http://192.168.168.101:9200/customer/external/1
Optimistic locking: append "if_seq_no=...&if_primary_term=...". The modification is applied only when the given sequence number (and primary term) match the document's current values; otherwise it is rejected.
http://192.168.168.101:9200/customer/external/1?if_seq_no=18&if_primary_term=6
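The compare-and-set behavior of if_seq_no/if_primary_term can be modeled locally (a hypothetical sketch, not the real engine): a write succeeds only when the caller's values match the ones currently stored on the document; otherwise ES rejects it with a version conflict (HTTP 409).

```python
# Hypothetical local model of optimistic concurrency control:
# the write applies only when seq_no and primary_term match.
def conditional_update(doc, new_source, if_seq_no, if_primary_term):
    if doc["_seq_no"] != if_seq_no or doc["_primary_term"] != if_primary_term:
        # stale caller: someone else wrote the doc first
        return {"status": 409, "error": "version_conflict_engine_exception"}
    doc["_source"] = new_source
    doc["_seq_no"] += 1          # every successful write advances _seq_no
    return {"status": 200, "_seq_no": doc["_seq_no"]}

doc = {"_source": {"name": "John"}, "_seq_no": 18, "_primary_term": 6}
ok = conditional_update(doc, {"name": "John2"}, if_seq_no=18, if_primary_term=6)
stale = conditional_update(doc, {"name": "John3"}, if_seq_no=18, if_primary_term=6)
```

The second call fails because the first write already advanced _seq_no to 19.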
Updating a Document

POST customer/external/1/_update
POST customer/external/1
PUT customer/external/1
{
"doc":{
"name":"222"
}
}
Difference: POST with _update compares the partial document against the stored source; if nothing would change, the operation is a no-op and the document's version does not increase. PUT always re-saves the document and increases the version.
Usage: for write-heavy concurrency, index directly without _update; for read-heavy workloads with occasional updates, use _update, since the compare-before-update skips rewriting unchanged documents (at the cost of the extra comparison).
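The noop-vs-rewrite distinction can be sketched as follows (hypothetical local model, not the real update machinery): _update only bumps the version when the partial doc actually changes something, while a PUT bumps it unconditionally.

```python
# Hypothetical sketch: POST .../_update detects "noop", PUT always rewrites.
def update(doc, partial):
    # if every field in the partial doc already matches, nothing changes
    if all(doc["_source"].get(k) == v for k, v in partial.items()):
        return "noop"
    doc["_source"].update(partial)
    doc["_version"] += 1
    return "updated"

def put(doc, source):
    doc["_source"] = source     # full replace, version always bumped
    doc["_version"] += 1
    return "updated"

doc = {"_source": {"name": "222"}, "_version": 1}
r1 = update(doc, {"name": "222"})   # identical content: noop, version stays 1
r2 = put(doc, {"name": "222"})      # PUT always bumps the version
```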
Deleting a Document
DELETE customer/external/1
DELETE customer
http://192.168.168.101:9200/customer/external/1
http://192.168.168.101:9200/customer
Bulk Operations
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
http://192.168.168.101:9200/_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
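The _bulk body is newline-delimited JSON: each action line is optionally followed by a source line (delete has none), and the body ends with a newline. A small helper that assembles such a body might look like this (the function name is hypothetical, used only for illustration):

```python
import json

# Hypothetical helper that builds the NDJSON body the _bulk endpoint expects:
# one action line, then an optional source line, terminated by a newline.
def build_bulk_body(actions):
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:        # "delete" actions carry no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"index": {"_id": "1"}}, {"name": "John Doe"}),
    ({"delete": {"_index": "website", "_id": "123"}}, None),
])
```

The resulting string can be sent as the request body of POST /customer/external/_bulk with Content-Type application/x-ndjson.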
Advanced Retrieval

https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json (the original sample data accounts.json can no longer be found there)
https://hub.fastgit.org/liuwen766/accounts.json/tree/main (new sample data)
POST bank/account/_bulk
http://192.168.168.101:9200/bank/account/_bulk
Searching

GET bank/_search?q=*&sort=account_number:asc
Explanation:
q=*: match all documents
sort: sort the results by the given field
asc: ascending order
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" },
{ "balance":"desc"}
]
}
GET bank/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 5,
"_source":["balance"],
"sort": [
{
"account_number": {
"order": "desc"
}
}
]
}
GET bank/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 5,
"sort": [
{
"account_number": {
"order": "desc"
}
}
],
"_source": ["balance","firstname"]
}
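What from, size, sort, and _source do can be reproduced locally on a plain list of dicts (a hypothetical sketch with made-up data, not a real search): sort the hits, skip "from" of them, take "size", and project only the requested source fields.

```python
# Hypothetical local model of from/size/sort/_source over sample docs.
def search(docs, sort_field, order, from_, size, source_fields):
    reverse = (order == "desc")
    hits = sorted(docs, key=lambda d: d[sort_field], reverse=reverse)
    page = hits[from_:from_ + size]               # pagination window
    return [{k: d[k] for k in source_fields} for d in page]  # field projection

docs = [{"account_number": n, "balance": 100 * n} for n in range(1, 11)]
page = search(docs, "account_number", "desc", from_=0, size=5,
              source_fields=["balance"])
# top hit is account_number 10, projected down to just its balance (1000)
```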
GET bank/_search
{
"query": {
"match": {
"account_number": "20"
}
}
}
GET bank/_search
{
"query": {
"match": {
"address": "kings"
}
}
}
GET bank/_search
{
"query": {
"match": {
"address.keyword": "990 Mill"
}
}
}
GET bank/_search
{
"query": {
"match_phrase": {
"address": "mill road"
}
}
}
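The difference between match and match_phrase can be sketched with a deliberately naive tokenizer (lowercase plus whitespace split; real Elasticsearch analysis is richer, so this is only an illustration): match succeeds if any query token occurs in the field, while match_phrase requires the tokens to appear adjacent and in order.

```python
# Naive sketch of match vs match_phrase on an analyzed text field.
# Hypothetical tokenizer: lowercase + whitespace split.
def tokens(text):
    return text.lower().split()

def match(field, query):
    # OR semantics: any query token found anywhere in the field
    return any(t in tokens(field) for t in tokens(query))

def match_phrase(field, query):
    # phrase semantics: query tokens must be consecutive and in order
    f, q = tokens(field), tokens(query)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))

addr = "990 Mill Road"
m = match(addr, "road mill")          # each token occurs somewhere
p = match_phrase(addr, "road mill")   # not adjacent in this order
p2 = match_phrase(addr, "mill road")  # exact phrase
```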
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill",
"fields": [
"state",
"address"
]
}
}
}
GET bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"address":"mill"}},
{"match":{"gender":"M"}}
]
}
}
}
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "gender": "M" }},
{ "match": {"address": "mill"}}
],
"must_not": [
{ "match": { "age": "38" }}
]
}
}
}
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Wallace"
}
}
]
}
}
}
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": {"address": "mill" } }
],
"filter": {
"range": {
"balance": {
"gte": "10000",
"lte": "20000"
}
}
}
}
}
}
GET bank/_search
{
"query": {
"term": {
"address": "mill Road"
}
}
}
GET bank/_search
{
"query": {
"match": {
"address": "Mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
},
"size": 0
}
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
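What the terms aggregation with an avg sub-aggregation computes can be reproduced locally (hypothetical sample docs, not the bank dataset): group documents by the terms field, then average the sub-aggregation field inside each bucket.

```python
from collections import defaultdict

# Hypothetical local model of a terms agg on "age" with an avg sub-agg
# on "balance": bucket by age, then average balances per bucket.
def terms_with_avg(docs, terms_field, avg_field):
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[terms_field]].append(d[avg_field])
    return {
        key: {"doc_count": len(vals), "balanceAvg": sum(vals) / len(vals)}
        for key, vals in buckets.items()
    }

docs = [
    {"age": 30, "balance": 1000},
    {"age": 30, "balance": 3000},
    {"age": 40, "balance": 5000},
]
agg = terms_with_avg(docs, "age", "balance")
# bucket 30: 2 docs, avg balance 2000; bucket 40: 1 doc, avg balance 5000
```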
Mapping (Field Mapping)
A mapping defines how a document and the fields it contains are stored and indexed. For example, a mapping specifies: which string fields should be treated as full-text fields; which fields contain numbers, dates, or geolocations; whether every field in a document can be indexed (the _all setting); the date format; and custom rules for dynamically added fields. View mapping info with: GET bank/_mapping
Field Types
- String: "text" is analyzed for full-text indexing (an analyzer tokenizes the value before matching at search time); "keyword" is not analyzed, so a search must match the complete value.
- Numeric: integer types byte, short, integer, long; floating-point types float, half_float, scaled_float, double
- Date: date
- Range: integer_range, long_range, float_range, double_range, date_range
- Boolean: boolean
- Binary: binary (the value is treated as a Base64-encoded string; by default it is not stored and is not searchable)
- Complex types: arrays (Array); objects (object, where an object can nest further objects); nested (for arrays of JSON objects)
PUT /my_index
{
"mappings": {
"properties": {
"age": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"name": {
"type": "text"
}
}
}
}
GET /my_index
PUT /my_index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": false
}
}
}
POST _reindex
{
"source":{
"index":"twitter"
},
"dest":{
"index":"new_twitters"
}
}
POST _reindex
{
"source":{
"index":"twitter",
"type":"tweet"
},
"dest":{
"index":"new_twitters"
}
}
GET /bank/_search
In the existing bank index, age was dynamically mapped as "age":{"type":"long"}. To change its type, create a new index with the desired mapping and migrate the data into it:
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text"
},
"gender": {
"type": "keyword"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
GET /newbank
In the new index, age is mapped as "age":{"type":"integer"}. Migrate the data from bank into newbank:
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}
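Conceptually, _reindex reads every document from the source index and indexes it into the destination, where the destination's mapping takes effect. A local sketch (hypothetical helper and data, with an explicit type coercion standing in for the new mapping):

```python
# Hypothetical sketch of what _reindex does: copy docs from source to dest,
# letting the destination mapping (here simulated by `coerce`) take effect.
def reindex(source_docs, coerce=None):
    dest = {}
    for doc_id, src in source_docs.items():
        body = dict(src)
        if coerce:
            # apply the destination's field types, e.g. age: long -> integer
            body = {k: (coerce[k](v) if k in coerce else v)
                    for k, v in body.items()}
        dest[doc_id] = body
    return dest

bank = {"1": {"age": "32", "balance": 39225}}
newbank = reindex(bank, coerce={"age": int})
```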
Analysis (Tokenization)

A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and outputs a stream of tokens. The tokenizer also records the order (position) of each term (used for phrase and word-proximity queries) and the start and end character offsets of the original word each term represents (used for highlighting matched content).
Elasticsearch provides many built-in analyzers (such as the standard analyzer), which can also be used to build custom analyzers.
By default all text is analyzed with the "Standard Analyzer", which does not handle Chinese well, so a Chinese analyzer needs to be installed. Download: https://github.com/medcl/elasticsearch-analysis-ik/releases
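What a tokenizer emits can be illustrated with a minimal whitespace tokenizer (hypothetical, far simpler than Lucene's StandardAnalyzer): each token carries its position plus start/end character offsets, which ES uses for phrase queries and highlighting.

```python
import re

# Hypothetical whitespace tokenizer producing position and character offsets,
# the same metadata a real analyzer records per token.
def tokenize(text):
    return [
        {"token": m.group().lower(), "position": i,
         "start_offset": m.start(), "end_offset": m.end()}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]

toks = tokenize("I am the one")
# "am" spans characters 2..4 and sits at position 1 in the token stream
```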
First check the running Elasticsearch version (output of querying the root endpoint on port 9200; the original shell command was lost from these notes):
{
"name" : "4c163b58400e",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "M4JiMlNeSmeZkMngqp4c_A",
"version" : {
"number" : "7.4.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
"build_date" : "2019-10-28T20:40:44.881551Z",
"build_snapshot" : false,
"lucene_version" : "8.2.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
(The shell session that followed was lost from these notes. In outline: enter the container, e.g. with docker exec, and from the working directory /usr/share/elasticsearch unpack the downloaded IK release into the plugins directory; then restart the container.)
docker restart elasticsearch
POST _analyze
{
"analyzer": "standard",
"text": "I am the one."
}
GET _analyze
{
"text":"我是中国人"
}
GET _analyze
{
"analyzer": "ik_smart",
"text":"我是中国人"
}
GET _analyze
{
"analyzer": "ik_max_word",
"text":"我是中国人"
}
- Custom tokenization
Edit IKAnalyzer.cfg.xml under /usr/share/elasticsearch/plugins/ik/config:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- configure your own extension dictionary here -->
<entry key="ext_dict"></entry>
<!-- configure your own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- configure a remote extension dictionary here -->
<entry key="remote_ext_dict">http://192.168.168.101/es/fenci.txt</entry>
<!-- configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
After editing, restart the elasticsearch container (docker restart elasticsearch), otherwise the change does not take effect.
After the dictionary is updated, ES applies the new tokenization only to newly indexed data; historical data is not re-tokenized. To re-tokenize existing documents, run:
POST my_index/_update_by_query?conflicts=proceed