[Big Data] Getting Started with Elasticsearch: Installation and Usage

1. The Elasticsearch framework

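Elasticsearch is an open-source system that combines the capabilities of a search engine and a NoSQL database. Built on Java and Apache Lucene, it can be used for full-text search, structured search, and near-real-time analytics.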

2. Elasticsearch terminology

2.1 Key concepts

ES plays a role similar to a database, so the terms below are compared with their relational-database counterparts to make them easier to learn.

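ES concept	Database concept
index	table
type (deprecated in ES 7)	logical type of a table
document	one row (record) in a table
field	a column of a record (column name, type, length, etc.)
mapping	table schema definition
NRT (near real time)	new data becomes searchable after a delay of about one second or less
node	each server instance in a clustered deployment
shard / replica	data partitions (shards) and their backup copies (replicas)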

2.2 Inverted index

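The two core concepts in ES are indexing and searching, and the index built here is an inverted index. Before discussing inverted indexes, it helps to understand what a forward index is.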

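Forward index: every record has a unique identifier, and we look a record up by that identifier, as in a primary-key query in a database (key => value). For keyword queries over documents, however, a forward index may have to scan every document to find the matches, much like a LIKE fuzzy query in a database, which performs poorly.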

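Inverted index: unlike a forward index, each document is tokenized into terms, and the index records which document IDs contain each term (tokenized value => key). This structure can also record how often a term occurs in a document and at which positions, which makes searches faster and more precise.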
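
To make the contrast concrete, here is a minimal Python sketch of both ideas, assuming a toy tokenizer that just lowercases and splits on whitespace (real analyzers are covered in section 6). The forward lookup has to scan every document, while the inverted index maps each term straight to the IDs of the documents that contain it, together with the term's positions (and therefore its frequency) in each one. The function names are illustrative, not Elasticsearch APIs.

```python
from collections import defaultdict


def tokenize(text):
    # Toy tokenizer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def forward_search(docs, term):
    # Forward index: scan every document, like a SQL LIKE query.
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


def build_inverted_index(docs):
    # Inverted index: term -> {doc_id: [positions of the term in that doc]}.
    # The term frequency in a document is len(positions).
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


docs = {1: "I love apple", 2: "apple pie and apple juice", 3: "I love music"}
index = build_inverted_index(docs)
print(forward_search(docs, "apple"))  # [1, 2]
print(index["apple"])                 # {1: [2], 2: [0, 3]}
```

The inverted lookup costs one dictionary access per query term, no matter how many documents there are, which is why search engines build it at write time.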

3. Installing and deploying Elasticsearch

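#Download ES
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.1-linux-x86_64.tar.gz
#Extract it (the archive unpacks into an elasticsearch-7.4.1 directory)
mkdir -p /usr/local/elasticsearch
tar -zxvf elasticsearch-7.4.1-linux-x86_64.tar.gz -C /usr/local/elasticsearch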

#Edit the configuration files config/elasticsearch.yml and config/jvm.options
scp username@servername:/path/remote-dir /path/local-dir


#ES refuses to start as the root user,
#so create a user named es and give it ownership of the ES directory
adduser es
chown -R es /usr/local/elasticsearch



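#Switch to the es user and start ES in the background (./bin/elasticsearch -d runs it as a daemon instead)
su es
cd /usr/local/elasticsearch/elasticsearch-7.4.1
./bin/elasticsearch &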

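Port 9200 serves the HTTP (REST) API; port 9300 is used for internal communication between cluster nodes. The relevant settings in config/elasticsearch.yml look like this: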

#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name; if unset, the default name elasticsearch is used
cluster.name: xiu-es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Name of this node
node.name: xiu-node-1
#
# ----------------------------------- Paths ------------------------------------
#
# Directory where ES stores its data
path.data: /usr/local/elasticsearch/elasticsearch-7.4.1/data
#
# Path to log files:
# Directory for ES log files
path.logs: /usr/local/elasticsearch/elasticsearch-7.4.1/logs
#

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Address to bind to; 0.0.0.0 makes ES reachable from any host
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
# HTTP port for remote access (9300 is the internal cluster communication port)
http.port: 9200

#
# --------------------------------- Discovery ---------------
# Addresses of the nodes in the cluster
#discovery.seed_hosts: ["127.0.0.1"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Initial set of master-eligible nodes, listed by node.name
cluster.initial_master_nodes: ["xiu-node-1"]
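
A common stumbling block in this file is the bootstrap setting: the entries in cluster.initial_master_nodes are typically the node.name values of the master-eligible nodes, and a stray typographic quote such as ” makes the list unparsable. Below is a hypothetical pre-flight check in Python. Its parse_flat_yaml helper only understands the flat key: value lines used above, not general YAML. check_single_node encodes the single-node assumption that the node lists itself.

```python
import json


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines, skipping blanks and comments.

    Hypothetical helper: it only handles settings like those shown above,
    not general YAML.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Flow lists such as ["xiu-node-1"] are valid JSON; a curly quote is not,
        # so json.loads raises ValueError on it.
        settings[key.strip()] = json.loads(value) if value.startswith("[") else value
    return settings


def check_single_node(settings):
    """Return a list of problems for a single-node bootstrap config."""
    problems = []
    node = settings.get("node.name")
    masters = settings.get("cluster.initial_master_nodes", [])
    if node is None:
        problems.append("node.name is not set")
    elif node not in masters:
        problems.append(f"node.name {node!r} is missing from cluster.initial_master_nodes")
    return problems


config = """
cluster.name: xiu-es
node.name: xiu-node-1
network.host: 0.0.0.0
http.port: 9200
#discovery.seed_hosts: ["127.0.0.1"]
cluster.initial_master_nodes: ["xiu-node-1"]
"""
print(check_single_node(parse_flat_yaml(config)))  # []
```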

4. Installing the head plugin

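elasticsearch-head is a web front end for browsing Elasticsearch. Because head is essentially a Node.js project, you need Node.js installed and use npm to install its dependencies (refer to a Node.js installation guide to set up the environment).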

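# Download the head project
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
# Install dependencies (cnpm, an npm mirror client popular in China, works too)
npm install
# Run it; the UI is served on http://localhost:9100
npm run start
# head talks to ES from the browser, so elasticsearch.yml must allow CORS:
#   http.cors.enabled: true
#   http.cors.allow-origin: "*"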

5. Using the Elasticsearch REST API

5.1 Cluster APIs

# Cluster health
http://xxx.xx.x.xxx:9200/_cluster/health

# Information about all nodes
http://xxx.xx.x.xxx:9200/_nodes

# Statistics for each node
http://xxx.xx.x.xxx:9200/_nodes/stats
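
These endpoints can be called from any HTTP client. Below is a minimal Python sketch that uses only the standard library. ES_URL is a placeholder assumption for your node's address. describe_health turns the status field of a _cluster/health response into a readable line, using these meanings:

- green: all primary and replica shards are allocated.
- yellow: all primaries are allocated but some replicas are not. This is typical of a single node whose indices have replicas > 0.
- red: at least one primary shard is unallocated.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder: use your node's address

HEALTH_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is unallocated",
}


def get_json(path, base_url=ES_URL, timeout=5):
    """GET a REST endpoint such as /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe_health(health):
    """Summarize a _cluster/health response in one line."""
    status = health.get("status", "unknown")
    meaning = HEALTH_MEANING.get(status, "unknown status")
    return f"{health.get('cluster_name', '?')}: {status} ({meaning})"
```

With a node running, `print(describe_health(get_json("/_cluster/health")))` prints a one-line summary. get_json raises urllib.error.URLError if the node cannot be reached.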

5.2 Index operations

5.2.1 Create an index

PUT request: http://xxx.xx.x.xxx:9200/{index_name}

PUT: http://xxx.xx.x.xxx:9200/singer

Request body

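{
    "settings": {
        "index": {
            // number of primary shards (each shard stores part of the data; together they hold the whole index)
            "number_of_shards": "5",
            // number of replicas for each primary shard
            "number_of_replicas": "0"
        }
    }
}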

Response

{
    "acknowledged": true,
    "shards_acknowledged": true,
    // index name
    "index": "singer" 
}
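
From code, creating an index is mostly a matter of sending the JSON body with the right method and header. Here is a sketch with Python's standard library. create_index_request and send are hypothetical helper names, and the base URL is a placeholder. Elasticsearch 6.0 and later reject request bodies without a Content-Type header, which is why the header is set explicitly.

```python
import json
import urllib.request


def create_index_request(base_url, index, shards=5, replicas=0):
    """Build (but do not send) the PUT request that creates an index."""
    body = {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}}
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request, timeout=5):
    """Send a request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx answers, for example
    when the index already exists.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a node running, `send(create_index_request("http://localhost:9200", "singer"))` should return an acknowledgement like the response above.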

5.2.2 View an index

GET request: http://xxx.xx.x.xxx:9200/{index_name}

GET: http://xxx.xx.x.xxx:9200/singer

Response

{
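    // index name (comparable to a table name in a database)
    "singer": {
        // index aliases
        "aliases": {},
        // mappings (field types, like a table schema); empty for now,
        // we create them later
        "mappings": {},
        // index settings: shards, replicas, version, and so on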
        "settings": {
            "index": {
                "creation_date": "1641371223528",
                "number_of_shards": "5",
                "number_of_replicas": "0",
                "uuid": "NaMpmAlzQJO3r0Bmr_QJmQ",
                "version": {
                    "created": "6000199"
                },
                "provided_name": "singer"
            }
        }
    }
}

5.2.3 Create a mapping

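With the index in place, the next step is adding data, but it is best to define a mapping first (ES can also create one automatically, as section 5.4 shows).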

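**Mapping** is the process of defining what a document looks like: which fields it contains, whether each field is stored, indexed, or analyzed, its data type, and so on.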

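Attribute	Description
field name	Like a column name in a database table: any name you choose, with a number of attributes
type	Data type. ES supports many types, such as text, keyword, long, integer, short, float, boolean, date, and object. For strings, text is analyzed (split into terms) and cannot be used in aggregations by default; keyword is not analyzed, is matched as one whole value, and can be used in aggregations
index	Whether the field is indexed (default true). true: the field is indexed and can be searched. false: the field is not indexed and cannot be searched
store	Whether the field value is stored separately (default false)
analyzer	The analyzer to use; for example, `ik_max_word` selects the IK analyzer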
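
The difference between text and keyword is easiest to see side by side. This Python sketch is a simplification, not Lucene. text_terms stands in for an analyzer by lowercasing and splitting into words. keyword_terms keeps the whole value as one term. Checking membership then behaves like an exact term lookup against each field.

```python
import re


def text_terms(value):
    # text field (simplified analyzer): lowercase and split into words.
    return set(re.findall(r"\w+", value.lower()))


def keyword_terms(value):
    # keyword field: the whole value is indexed as a single exact term.
    return {value}


name = "Jay Chou"
print(sorted(text_terms(name)))           # ['chou', 'jay']
print("jay" in text_terms(name))          # True: the text field matches one word
print("jay" in keyword_terms(name))       # False: keyword needs the exact value
print("Jay Chou" in keyword_terms(name))  # True
```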

PUT request: http://ip:port/{index_name}/_mappings/{type} (ES 6.x form with a type; ES 7 uses the typeless PUT http://ip:port/{index_name}/_mapping)

PUT http://xxx.xx.x.xxx:9200/singer/_mappings/singers

Request body

{
  "properties": {
    "id": {
      // integer
      "type": "integer",
      "index": true
    },
    "name": {
      // full-text string, analyzed
      "type": "text",
      "index": true
    },
    "singer_id": {
      // string, not analyzed, searchable as an exact value
      "type": "keyword",
      "index": true
    },
    // image URLs need neither analysis nor search
    "images": {
      // keyword: not analyzed
      "type": "keyword",
      "index": false
    },
    "height": {
      // floating point
      "type": "float"
    },
    "status": {
      // boolean
      "type": "boolean"
    },
    "birthday": {
      // date
      "type": "date"
    },
    "desc": {
      // full-text string
      "type": "text",
      "index": true,
      // analyzer
      "analyzer": "standard"
    },
    "relation": {
      "type": "object"
    }
  }
}

Response

{
  "acknowledged": true
}

5.2.4 Delete an index

DELETE request: http://ip:port/{index_name}
DELETE http://xxx.xx.x.xxx:9200/my_index

// Response
{
  "acknowledged": true
}

5.3 Document operations

5.3.1 Add a document

POST  http://ip:port/{index_name}/{type}/[{id}]

#Specify the _id (primary key) explicitly
POST  http://xxx.xx.x.xxx:9200/singer/singers/1

#Omit the id and let ES generate the _id
POST  http://xxx.xx.x.xxx:9200/singer/singers/

Request body

{
    "id":"42",
    "name":"邓紫棋",
    "singer_id":"13948",
    "images":"http://y.gtimg.cn/music/photo_new/T001R150x150M000001fNHEf1SFEFN.webp",
    "height":1.69,
    "status":true,
    "birthday":"1991-08-16",
    "relation":{"name":"张杰"}
}
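
Types are the part of these URLs that changed across versions. The examples here use the ES 6 form /{index_name}/{type}/{id}, while ES 7 (the version installed in section 3) uses the typeless /{index_name}/_doc/{id}. Below is a Python sketch that builds either form. index_doc_request is a hypothetical helper and the base URL is a placeholder. It uses PUT when an id is given and POST when Elasticsearch should generate the _id.

```python
import json
import urllib.request


def index_doc_request(base_url, index, doc, doc_id=None, doc_type=None):
    """Build (but do not send) a request that indexes one document.

    doc_type=None gives the typeless ES 7 path /{index}/_doc[/{id}];
    passing doc_type gives the ES 6 path /{index}/{type}[/{id}].
    """
    path = f"/{index}/{doc_type}" if doc_type else f"/{index}/_doc"
    if doc_id is not None:
        path += f"/{doc_id}"
    # With an id the document is created or overwritten at that _id;
    # without one, POST lets Elasticsearch generate the _id.
    method = "PUT" if doc_id is not None else "POST"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(doc, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


singer = {"id": "42", "name": "邓紫棋", "height": 1.69, "birthday": "1991-08-16"}
typed = index_doc_request("http://localhost:9200", "singer", singer, doc_id=1, doc_type="singers")
typeless = index_doc_request("http://localhost:9200", "singer", singer)
print(typed.get_method(), typed.full_url)        # PUT http://localhost:9200/singer/singers/1
print(typeless.get_method(), typeless.full_url)  # POST http://localhost:9200/singer/_doc
```

Passing either request to urllib.request.urlopen sends it to a running node.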

5.3.2 Update a document

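Re-sending the add request above with an existing _id overwrites the whole document. For a partial update, use the update API, which changes only the fields given in doc:

POST request: http://ip:port/{index_name}/{type}/{id}/_update (ES 6.x; ES 7 uses http://ip:port/{index_name}/_update/{id})

POST http://xxx.xx.x.xxx:9200/singer/singers/1/_update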
{
    "doc":{
        "name":"JayChou"
    }
}
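
The difference between overwriting and a partial update can be simulated locally. In this Python sketch (a model of the behavior, not Elasticsearch code), apply_partial_update merges the doc object into the stored _source the way the update API does: nested objects are merged key by key, and any other value replaces the old one. Re-indexing the same _id with only the new fields would instead leave just those fields.

```python
import copy


def apply_partial_update(source, partial):
    """Merge a partial 'doc' into an existing _source (local model, not ES code).

    Nested objects are merged key by key; any other value replaces the old one.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "邓紫棋", "height": 1.69, "relation": {"name": "张杰"}}
partial = {"name": "JayChou"}
updated = apply_partial_update(stored, partial)  # update API: merge
reindexed = dict(partial)                         # index API: full replacement
print(updated)    # {'name': 'JayChou', 'height': 1.69, 'relation': {'name': '张杰'}}
print(reindexed)  # {'name': 'JayChou'}
```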

5.3.3 Delete a document

DELETE request: http://ip:port/{index_name}/{type}/{id}
DELETE http://xxx.xx.x.xxx:9200/singer/singers/1

5.4 Dynamic (automatic) mapping

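In the steps above we created a mapping for the index by hand, but ES can also create one automatically: we simply add data, and ES infers the mapping from the types of the values we add.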

POST request: http://ip:port/{index_name}/{type}/{id}

POST  http://xxx.xx.x.xxx:9200/song/songs/1  
{
    "song_name":"当我去过她",
    "song_id":"http://image.leyou.com/12479122.jpg",
    "song_price":12.00,
    "song_type": 200,
    "status":true
}

Creating it as above returns an error, because since ES 6.0 an index can no longer contain multiple types.

So create a new index and add the data there; the request then succeeds.

Before any data is added, the index has no mapping.

After data is added, the mapping is generated automatically.
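
The type inference behind dynamic mapping follows simple rules for JSON values. Below is a rough Python model with stated simplifications. Booleans map to boolean, whole numbers to long, and decimals to float. Objects become object fields. Strings become text with a keyword sub-field (ignore_above: 256) unless they look like a date. The date check here is a simplified regular expression standing in for ES's date detection, and infer_mapping is a hypothetical name.

```python
import re

# Simplified stand-in for ES date detection (assumption: ISO-like dates only).
DATE_LIKE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def infer_mapping(value):
    """Guess the dynamic mapping ES would create for one JSON value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {k: infer_mapping(v) for k, v in value.items()}}
    if isinstance(value, str):
        if DATE_LIKE.match(value):
            return {"type": "date"}
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not modelled in this sketch: {value!r}")


song = {
    "song_name": "当我去过她",
    "song_id": "http://image.leyou.com/12479122.jpg",
    "song_price": 12.00,
    "song_type": 200,
    "status": True,
}
mapping = {"properties": {field: infer_mapping(v) for field, v in song.items()}}
print(mapping["properties"]["song_type"])  # {'type': 'long'}
```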

6. Built-in analyzers and the IK analyzer

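ES first splits the fields of the data we add into terms; when a search hits a term, the matching documents are returned. Splitting a field into terms is the job of the analyzer, and ES provides several types of analyzer.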

The following request shows how a given analyzer tokenizes a piece of text:

POST request: http://ip:port/{index_name}/_analyze

POST: http://xxx.xx.x.xxx:9200/song/_analyze

# Request body
{
  // analyzer to use: standard, simple, whitespace, etc.
  "analyzer":"standard",
  "text":"老天,我要回家过年!!!"
}

6.1 内置分词器

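Standard analyzer

"text":"老天,我要回家过年!!!"

Splits text on word boundaries (each Chinese character becomes its own token), lowercases the terms, and drops punctuation. Stop-word removal is supported but disabled by default.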

{
    "tokens": [
        {
            "token": "老",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "天",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "我",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "要",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "回",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "家",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "过",
            "start_offset": 7,
            "end_offset": 8,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "年",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        }
    ]
}

Simple analyzer

"text":"老天 12,我要回家过年!!!"

Splits text on any non-letter character, which drops digits, and lowercases the resulting terms.

{
    "tokens": [
        {
            "token": "老天",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "我要回家过年",
            "start_offset": 6,
            "end_offset": 12,
            "type": "word",
            "position": 1
        }
    ]
}

Whitespace analyzer

"text":"老天 12 3,我要回家过年!!!"

As the name suggests, this analyzer splits text on whitespace only and does not lowercase.

{
    "tokens": [
        {
            "token": "老天",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "12",
            "start_offset": 3,
            "end_offset": 5,
            "type": "word",
            "position": 1
        },
        {
            "token": "3,我要回家过年!!!",
            "start_offset": 6,
            "end_offset": 17,
            "type": "word",
            "position": 2
        }
    ]
}

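Stop analyzer

"text":"I want the a apple"

Builds on the simple analyzer and additionally removes common English stop words (such as the and a); the stop-word list can be customized. It does not support Chinese.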

{
    "tokens": [
        {
            "token": "i",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 0
        },
        {
            "token": "want",
            "start_offset": 2,
            "end_offset": 6,
            "type": "word",
            "position": 1
        },
        {
            "token": "apple",
            "start_offset": 13,
            "end_offset": 18,
            "type": "word",
            "position": 4
        }
    ]
}

Keyword analyzer

"text":"I want the a apple"

Does not split the text: the whole input is emitted as a single token.

{
    "tokens": [
        {
            "token": "I want the a apple",
            "start_offset": 0,
            "end_offset": 18,
            "type": "word",
            "position": 0
        }
    ]
}

pattern analyzer: splits text using a regular expression.
language analyzers: a collection of analyzers for text in specific languages.
snowball analyzer: built from the standard tokenizer plus four filters: standard, lowercase, stop, and snowball.

6.2 The IK Chinese analyzer

Download and install

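#Install the IK plugin as a non-root user
#The plugin version must match the ES version exactly (v6.0.1 below targets ES 6.0.1)
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.1/elasticsearch-analysis-ik-6.0.1.zip

#Then restart the ES service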

POST request: http://ip:port/{index_name}/_analyze
POST http://xxx.xx.x.xxx:9200/singers/_analyze
{
      "text":"李荣浩，1985年7月11日出生于安徽省蚌埠市，中国流行乐男歌手、音乐制作人、演员、吉他手。",
      "analyzer":"ik_max_word"
}

Tokenization result (excerpt: the offsets shown correspond to analyzing only the phrase 中国流行乐男歌手)

{
    "tokens": [
        {
            "token": "中国",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "流行",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "行乐",
            "start_offset": 3,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "男歌手",
            "start_offset": 5,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "歌手",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 4
        }
    ]
}

Usage
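
Once installed, the IK analyzers are referenced by name in a field mapping. The IK plugin provides two analyzers:

- ik_max_word produces the finest-grained, overlapping terms.
- ik_smart produces coarser terms.

A common pattern is to index with ik_max_word and to analyze query text with ik_smart through the search_analyzer mapping parameter. Below is a Python sketch of such a mapping body. It assumes a hypothetical desc field, and the body would be sent to the mapping endpoint from section 5.2.3. The analyzer of an existing field cannot be changed, so this belongs in a new index or on a new field.

```python
import json


def ik_text_field(index_analyzer="ik_max_word", search_analyzer="ik_smart"):
    """Mapping for a text field analyzed by the IK plugin."""
    # analyzer is applied when documents are indexed; search_analyzer when
    # the query string of a full-text query is analyzed.
    return {"type": "text", "analyzer": index_analyzer, "search_analyzer": search_analyzer}


mapping = {
    "properties": {
        "name": {"type": "keyword"},
        "desc": ik_text_field(),  # hypothetical field holding Chinese biographies
    }
}
body = json.dumps(mapping, ensure_ascii=False, indent=2)
print(body)
```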

7. Queries

Most ES use cases involve search, and ES provides rich query mechanisms. There are two main kinds of query:

7.1 URI (query-string) search

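This approach appends the query conditions to the URL according to specific rules. It cannot express complex queries, the URLs quickly become unwieldy, and it is less secure because the query is exposed in the URL.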

GET request: http://ip:port/{index_name}/{type}/_search?q=key:value&sort=key:asc|desc (combine several conditions inside q, e.g. q=key1:value1 AND key2:value2)

GET  http://xxx.xx.x.xxx:9200/singer/singers/_search?q=name:周杰伦
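
Non-ASCII values such as 周杰伦 and any spaces in the q expression must be percent-encoded in the URL. Browsers do this for you, but code should do it explicitly. Below is a Python sketch using urllib.parse. uri_search_url is a hypothetical helper, and its doc_type argument reproduces the ES 6 path with a type.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def uri_search_url(base_url, index, q, sort=None, size=None, doc_type=None):
    """Build a URI-search URL; q uses Lucene query-string syntax."""
    path = f"/{index}/{doc_type}/_search" if doc_type else f"/{index}/_search"
    params = {"q": q}
    if sort:
        params["sort"] = sort  # e.g. "height:desc"
    if size is not None:
        params["size"] = size
    # quote_via=quote percent-encodes spaces as %20 instead of "+".
    return f"{base_url}{path}?{urlencode(params, quote_via=quote)}"


url = uri_search_url("http://localhost:9200", "singer", "name:周杰伦 AND status:true",
                     sort="height:desc", doc_type="singers")
print(url)
# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```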

7.2 DSL queries

~~~

POST request: http://ip:port/{index_name}/{type}/_search
POST http://xxx.xx.x.xxx:9200/singer/singers/_search
~~~

DSL (Domain Specific Language) queries take the form of a JSON request body.

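DSL queries are the general-purpose query method in ES. Their main strength is language independence: any client that can send HTTP requests can run complex searches by sending the query as JSON.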

{
  "query": {
    "bool": {
      // must: documents have to match; match is a full-text (analyzed) query, not an exact match
      "must": [
        {
          "match": {
            "name": "周杰伦"
          }
        }
      ],
      // must_not: exclude documents that match this clause
      "must_not": [
        {
          "term": {
            "status": "true"
          }
        }
      ],
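      // should: optional clauses; because must is present, matching them only raises the score
      // here: height greater than 1.60 and less than 1.90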
      "should": [
        {
          "range": {
            "height": {
              "gt": "1.60",
              "lt": "1.90"
            }
          }
        }
      ]
    }
  },
  // pagination: offset of the first hit
  "from": 0,
  // number of hits per page
  "size": 10,
  // sort fields
  "sort": [],
  "aggs": {}
}
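
Building the body in code avoids hand-editing JSON and makes the pagination arithmetic explicit. from is an offset, so page n (counting from 0) with size hits per page starts at n × size. Below is a Python sketch in which bool_search_body is a hypothetical helper. The clause values mirror the example above, but use a real boolean and numbers instead of strings; ES also accepts either form.

```python
import json


def bool_search_body(must=(), must_not=(), should=(), page=0, size=10, sort=None):
    """Assemble a bool-query search body (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": list(must),          # must match; contributes to the score
                "must_not": list(must_not),  # excluded; no effect on the score
                "should": list(should),      # optional when must is non-empty; boosts the score
            }
        },
        "from": page * size,  # offset of the first hit
        "size": size,
        "sort": sort or [],
    }


body = bool_search_body(
    must=[{"match": {"name": "周杰伦"}}],
    must_not=[{"term": {"status": True}}],
    should=[{"range": {"height": {"gt": 1.60, "lt": 1.90}}}],
    page=2,
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting JSON is POSTed to the _search endpoint above with a Content-Type: application/json header.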