①. Mapping字段映射

①. 映射(Mapping)相当于数据表的表结构。ElasticSearch中的映射(Mapping)用来定义一个文档,可以定义所包含的字段以及字段的类型、分词器及属性等等。
②. 映射可以分为动态映射和静态映射

1.动态映射(dynamic mapping):
  在关系数据库中,需要事先创建数据库,然后在 该数据库实例下创建数据表,然后才能在该数据
表中插入数据。而ElasticSearch中不需 要事先定义映射(Mapping),文档写入ElasticSearch时,会
根据文档字段自动识别类型。这种机制称之为动态映射
2.静态映射 :
  在ElasticSearch中也可以事先定义好映射,包含文档的各个字段及其类 型等,这种方式称之为
静态映射

②. 常用类型如下

①.字符串类型

类型	描述
text	当一个字段要是被全文搜索的,比如Email内容、产品描述,应该使用text类型。设置text类型以后,字段内容会被分析,在生成倒排索引以前,字符串会被分析器分成一个个的词项。text类型的字段不用排序,很少用于聚合
keyword	keyword类型适用于索引结构化的字段,比如email地址、主机名、状态码和标签。如果字段需要进行过滤(比如查找已发布博客中status属性为published的文章)、排序、聚合。keyword类型的字段只能通过精确值搜索到

②. 整数类型

类型	取值范围
byte	-128 - 127
short	-32768 - 32767
integer	-2的31次方 – 2的31-1
long	-2的63次方 - 2的63次方-1

③. 浮点类型

类型	取值范围
doule	64位双精度浮点类型
float	32位单精度浮点类型
half_float	16位半精度浮点类型
scaled_float	缩放类型的浮点数

④. date类型,日期类型表示格式可以是以下几种:

日期格式的字符串,比如"2018-01-13"或"2018-01-13 12:10:30"
long类型的毫秒数(从1970年开始)
integer的秒数

⑤. boolean类型:逻辑类型(布尔类型)可以接受true/false
⑥. binary类型
二进制字段是指base64来表示索引中储存的二进制数据,可用来储存二进制形式的数据,例如图像。默认情况下,该类型的字段只储存不索引。二进制只支持index_name属性
⑦. array类型
⑧. object类型:JSON天生具有层级关系,文档会包含嵌套的对象

③. 映射的查看、创建

①. 查看mapping信息:GET bank/_mapping

 {
    "bank" : {
      "mappings" : {
        "properties" : {
          "account_number" : {
            "type" : "long" # long类型
          },
          "address" : {
            "type" : "text", # 文本类型,会进行全文检索,进行分词
            "fields" : {
              "keyword" : { # addrss.keyword
                "type" : "keyword",  # 该字段必须全部匹配到
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "long"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "email" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "employer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }

②. 新版本改变:ElasticSearch7-去掉type概念

关系型数据库中两个数据表示是独立的,即使他们里面有相同名称的列也不影响使用,但ES中不是这样的。elasticsearch是基于Lucene开发的搜索引擎,而ES中不同type下名称相同的filed最终在Lucene中的处理方式是一样的
(1). 两个不同type下的两个user_name,在ES同一个索引下其实被认为是同一个filed,你必须在两个不同的type中定义相同的filed映射。否则,不同type中的相同字段名称就会在处理中出现冲突的情况,导致Lucene处理效率下降。
(2). 去掉type就是为了提高ES处理数据的效率。
Elasticsearch 7.x URL中的type参数为可选。比如,索引一个文档不再要求提供文档类型
Elasticsearch 8.x 不再支持URL中的type参数
解决:
将索引从多类型迁移到单类型,每种类型文档一个独立索引
将已存在的索引下的类型数据,全部迁移到指定位置即可。详见数据迁移

③. 创建映射PUT /my_index

PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword" # 指定为keyword
      },
      "name": {
        "type": "text" # 全文检索。保存时候分词,检索时候进行分词匹配
      }
    }
  }
}
输出:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}
查看映射GET /my_index
输出结果:
{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "employee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1588410780774",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "ua0lXhtkQCOmn7Kh3iUu0w",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}
添加新的字段映射PUT /my_index/_mapping
PUT /my_index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false # 字段不能被检索。检索
    }
  }
}
这里的 “index”: false,表明新增的字段不能被检索,只是一个冗余字段。

④. 不能更新映射:对于已经存在的字段映射,我们不能更新。更新必须创建新的索引,进行数据迁移。

④. 数据迁移

①. 先创建new_twitter的正确映射,然后使用如下方式进行数据迁移。

6.0以后写法
POST reindex
{
  "source":{
      "index":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}


老版本写法
POST reindex
{
  "source":{
      "index":"twitter",
      "twitter":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}

②. 案例:原来类型为account,新版本没有类型了,所以我们把他去掉

GET /bank/_search
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",//原来类型为account,新版本没有类型了,所以我们把他去掉
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      },
      ...
GET /bank/_search
查出
"age":{"type":"long"}

③. 想要将年龄修改为integer,先创建新的索引

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}
查看“newbank”的映射:
GET /newbank/_mapping
能够看到age的映射类型被修改为了integer.
"age":{"type":"integer"}

④. 将bank中的数据迁移到newbank中

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}
运行输出:
#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
  "took" : 768,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 0,
  "created" : 1000,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

⑤. 查看newbank中的数据

GET /newbank/_search

输出
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbank",
        "_type" : "_doc", # 没有了类型

⑤. ik_max_word、ik_smart分词器

①. 一个tokenizer(分词器)接收一个字符流,将之分割为独立的tokens(词元,通常是独立的单词),然后输出tokens流。
例如:whitespace tokenizer遇到空白字符时分割文本。它会将文本"Quick brown fox!"分割为(Quick,brown,fox!)
该tokenizer(分词器)还负责记录各个terms(词条)的顺序或position位置(用于phrase短语和word proximity词近邻查询),以及term(词条)所代表的原始word(单词)的start(起始)和end(结束)的character offsets(字符串偏移量)(用于高亮显示搜索的内容)。
elasticsearch提供了很多内置的分词器(标准分词器),可以用来构建custom analyzers(自定义分词器)。
关于分词器: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis.html
注意:对于中文,我们需要安装额外的分词器(调整vagrant内存为4G)

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 Brown-Foxes bone."
}
执行结果:
{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "foxes",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "bone",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

②. 安装ik分词器
在前面安装的elasticsearch时,我们已经将elasticsearch容器的“/usr/share/elasticsearch/plugins”目录,映射到宿主机的“ /mydata/elasticsearch/plugins”目录下,所以比较方便的做法就是下载“/elasticsearch-analysis-ik-7.4.2.zip”文件,然后解压到该文件夹下即可。安装完毕后,需要重启elasticsearch容器。
注意需要将权限进行修改chmod -R 777 plugins/ik

在这里插入图片描述

③. ik_max_word:会将文本做最细粒度的拆分,比如会将“中华人民共和国人民大会堂”拆分为“中华人民共和国、中华人民、中华、华人、人民共和国、人民、共和国、大会堂、大会、会堂等词语(索引的时候用ik_max_word)

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国人",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "人民大会堂",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "人民大会",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 10
    },
    {
      "token" : "人民",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 11
    },
    {
      "token" : "大会堂",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "大会",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 13
    },
    {
      "token" : "会堂",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 14
    }
  ]
}

④. ik_smart:会做最粗粒度的拆分,比如会将“中华人民共和国人民大会堂”拆分为中华人民共和国、人民大会堂。(前台搜索的时候用 ik_smart)

GET _analyze
{
   "analyzer": "ik_smart", 
   "text":"中华人民共和国人民大会堂"
}
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "人民大会堂",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

⑥. 自定义分词器

①. 修改/usr/share/elasticsearch/plugins/ik/config中的IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer 扩展配置</comment>
	<!--用户可以在这里配置自己的扩展字典 -->
	<entry key="ext_dict"></entry>
	 <!--用户可以在这里配置自己的扩展停止词字典-->
	<entry key="ext_stopwords"></entry>
	<!--用户可以在这里配置远程扩展字典 -->
	<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry> 
	<!--用户可以在这里配置远程扩展停止词字典-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

②. 修改完成后,需要重启elasticsearch容器,否则修改不生效。docker restart elasticsearch

GET _analyze
{
   "analyzer": "ik_smart", 
   "text":"唐智谷粒商城"
}
{
  "tokens" : [
    {
      "token" : "唐智谷粒商城",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

③. 具体的操作步骤

[root@localhost ~]# docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED         STATUS              PORTS                                                                                  NAMES
95de12634192   elasticsearch:7.4.2   "/usr/local/bin/dock…"   4 seconds ago   Up 3 seconds        0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   elasticsearch
a197c1d2cf05   kibana:7.4.2          "/usr/local/bin/dumb…"   30 hours ago    Up About a minute   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                              kibana
a18680bef63e   redis                 "docker-entrypoint.s…"   5 weeks ago     Up 2 minutes        0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                                              redis
91e02812975d   mysql:5.7             "docker-entrypoint.s…"   5 weeks ago     Up 2 minutes        0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp                                   mysql
[root@localhost ~]# cd /mydata/
[root@localhost mydata]# ls
elasticsearch  mysql  redis
[root@localhost mydata]# mkdir nginx
[root@localhost mydata]# docker images
REPOSITORY      TAG       IMAGE ID       CREATED         SIZE
redis           latest    08502081bff6   8 weeks ago     105MB
mysql           5.7       09361feeb475   2 months ago    447MB
kibana          7.4.2     230d3ded1abc   22 months ago   1.1GB
elasticsearch   7.4.2     b1179d41a7b4   22 months ago   855MB
[root@localhost mydata]# docker run -p80:80 --name nginx -d nginx:1.10
Unable to find image 'nginx:1.10' locally
1.10: Pulling from library/nginx
6d827a3ef358: Pull complete 
1e3e18a64ea9: Pull complete 
556c62bb43ac: Pull complete 
Digest: sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575
Status: Downloaded newer image for nginx:1.10
24c1454acf9f8419f762f3369b59557df57cd6209864ef64000f2f26d9f0d05b
[root@localhost mydata]# mkdir -p /mydata/nginx/html
[root@localhost mydata]# mkdir -p /mydata/nginx/logs
[root@localhost mydata]# mkdir -p /mydata/nginx/conf
[root@localhost mydata]# ls
elasticsearch  mysql  nginx  redis
[root@localhost mydata]# cd nginx/
[root@localhost nginx]# ls
conf  html  logs
[root@localhost nginx]# cd ..
[root@localhost mydata]# rm -rf nginx/
[root@localhost mydata]# docker container cp nginx:/etc/nginx .
[root@localhost mydata]# ls
elasticsearch  mysql  nginx  redis
[root@localhost mydata]# docker stop nginx
nginx
[root@localhost mydata]# docker rm nginx 
nginx
[root@localhost mydata]# ls
elasticsearch  mysql  nginx  redis
[root@localhost mydata]# cd nginx
[root@localhost nginx]# ls
conf.d  fastcgi_params  koi-utf  koi-win  mime.types  modules  nginx.conf  scgi_params  uwsgi_params  win-utf
[root@localhost nginx]# cd ..
[root@localhost mydata]# mv nginx conf
[root@localhost mydata]# ls
conf  elasticsearch  mysql  redis
[root@localhost mydata]# mkdir nginx
[root@localhost mydata]# mv conf nginx/
[root@localhost mydata]# ls
elasticsearch  mysql  nginx  redis
[root@localhost mydata]# cd nginx/
[root@localhost nginx]# ls
conf
[root@localhost nginx]# docker run -p 80:80 --name nginx \
>  -v /mydata/nginx/html:/usr/share/nginx/html \
>  -v /mydata/nginx/logs:/var/log/nginx \
>  -v /mydata/nginx/conf/:/etc/nginx \
>  -d nginx:1.10
01bfbb6a8cd0e3f6af476793ad33fdc696740eadb125f8adad573303524adb55
[root@localhost nginx]# ls
conf  html  logs
[root@localhost nginx]# docker update nginx --restart=always
nginx
[root@localhost nginx]# echo '<h2>hello nginx!</h2>' >index.html
[root@localhost nginx]# ls
conf  html  index.html  logs
[root@localhost nginx]# rm -rf index.html 
[root@localhost nginx]# cd html
[root@localhost html]# echo '<h2>hello nginx!</h2>' >index.html
[root@localhost html]# 
[root@localhost html]# mkdir es
[root@localhost html]# cd es
[root@localhost es]# vi fenci.text
[root@localhost es]# ls
fenci.text
[root@localhost es]# mv fenci.text fenci.txt
[root@localhost es]# cd /mydata/
[root@localhost mydata]# cd elasticsearch/
[root@localhost elasticsearch]# ls
config  data  plugins
[root@localhost elasticsearch]# cd plugins/
[root@localhost plugins]# ls
ik
[root@localhost plugins]# cd ik/
[root@localhost ik]# ls
commons-codec-1.9.jar    config                               httpclient-4.5.2.jar  plugin-descriptor.properties
commons-logging-1.2.jar  elasticsearch-analysis-ik-7.4.2.jar  httpcore-4.4.4.jar    plugin-security.policy
[root@localhost ik]# cd config/
[root@localhost config]# ls
extra_main.dic         extra_single_word_full.dic      extra_stopword.dic  main.dic         quantifier.dic  suffix.dic
extra_single_word.dic  extra_single_word_low_freq.dic  IKAnalyzer.cfg.xml  preposition.dic  stopword.dic    surname.dic
[root@localhost config]# vi IKAnalyzer.cfg.xml 
[root@localhost config]# docker restart elasticsearch 
elasticsearch
[root@localhost config]# cd /mydata/nginx/
[root@localhost nginx]# ls
conf  html  logs
[root@localhost nginx]# cd html/es/
[root@localhost es]# ls
fenci.txt
[root@localhost es]# cat fenci.txt 
唐智谷粒商城