[Big Data] Elasticsearch: Introduction and Installation (Part 1)

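Introduction to Elasticsearch

Elasticsearch is written in Java and is one of the most popular open-source enterprise search engines. It offers near-real-time search and is stable, reliable, fast, and easy to install and use.
Client libraries are available for Java, .NET (C#), PHP, Python, Ruby, and other languages.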

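Use cases

Keyword search of the kind you see on Baidu, or product search in an online store, is the typical workload that a search engine such as ES handles.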

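Elasticsearch and Lucene

Lucene can be regarded as the most advanced, best-performing, and most fully featured search engine library (framework) to date.

Drawbacks of Lucene:

1) It can only be used in Java projects, and it has to be integrated directly into the project as a jar.
2) It is complex to use: the code for creating and searching indexes is verbose.
3) It has no cluster support: index data is not synchronized between instances, so it does not suit large projects.
4) It copes poorly with large indexes: the index store and the application live on the same server and share the same disk, so space is limited.

ES can be understood as a distributed search engine built on top of Lucene. Compared with Lucene, ES is more powerful and its API is simpler to work with.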

ES vs. Solr

[Image: ES vs. Solr comparison (not available)]

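ES full-text search and tokenization

Full-text search:
Both when building the index and when querying it, the program analyzes the text and extracts each keyword together with its positions and number of occurrences. At query time the query text is analyzed in the same way, the resulting keywords are looked up in the index (an inverted index), and the full content of the matching entries is then fetched.

Tokenization produces the terms that the inverted index is built from.

Index creation flow:
process the data -> tokenize it (recording term positions and occurrence counts) -> deduplicate the terms -> inverted index

Query flow:
query condition -> tokenize it -> look up the document ids for each term -> fetch the content

Inverted index: look up document ids by term.
Ordinarily we go the other way and look up data by its id.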
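
The two flows above can be illustrated with a minimal Python sketch. It is not how ES or Lucene implement indexing internally; the tokenizer is a simple lowercase word splitter standing in for a real analyzer, and the names tokenize, build_inverted_index, and search are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}; len(positions) is the term's count in that doc."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, docs, query):
    """Tokenize the query, look up every term, and return the docs that contain all of them."""
    terms = tokenize(query)
    if not terms:
        return {}
    matching = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matching &= set(index.get(term, {}))
    return {doc_id: docs[doc_id] for doc_id in sorted(matching)}


docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Lucene is a search library",
    3: "Kibana is a visual client for Elasticsearch",
}
index = build_inverted_index(docs)
print(index["search"])  # {1: [4], 2: [3]}
print(search(index, docs, "elasticsearch search"))
```

Each term is stored once as a key, which corresponds to the deduplication step, and the query never scans the documents themselves: it goes from term to document ids and only then fetches the content.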

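How to think about Elasticsearch

Elasticsearch can be viewed as a database, specifically a non-relational one whose native data format is JSON. It has its own visual client, and it can also be accessed through its API from code.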
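
As a rough illustration of "JSON documents accessed through an API", the sketch below builds (but does not send) an HTTP request that would store one JSON document through the REST endpoint PUT /<index>/_doc/<id>. The host, the index name products, and the document fields are assumptions made up for this example.

```python
import json
import urllib.request


def build_index_request(base_url, index_name, doc_id, document):
    """Build a PUT request that stores `document` as JSON under /<index>/_doc/<id>."""
    body = json.dumps(document).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{index_name}/_doc/{doc_id}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name, and document, for illustration only.
request = build_index_request(
    "http://localhost:9200", "products", 1, {"name": "laptop", "price": 4999}
)
print(request.get_method(), request.full_url)  # PUT http://localhost:9200/products/_doc/1

# Against a running node the request could be sent with:
# urllib.request.urlopen(request)
```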

ES and relational databases

[Image: comparison of ES and relational database concepts (not available)]

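Installing and configuring the ES service

Installation

Note: ES is memory hungry. The default JVM heap is 1 GB, and large amounts of data are loaded into memory.

ES cannot be started as root; for security reasons it refuses to, so install and start it as a regular user.

To start it in the background: ./<executable> -d

I downloaded elasticsearch-7.6.1-linux-x86_64.tar.gz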

# Create the install directory; I used: mkdir /usr/local/es/

# Extract the archive
[root@10-9-44-97 es]# tar -zvxf elasticsearch-7.6.1-linux-x86_64.tar.gz -C /usr/local/es/

# Create the es group
[root@10-9-44-97 es]# groupadd es

# Add the user
[root@10-9-44-97 es]# useradd esuser

# Set the user's password
[root@10-9-44-97 es]# passwd esuser

# Add esuser to the es group (-a appends, keeping any existing supplementary groups)
[root@10-9-44-97 es]# usermod -aG es esuser
# Give esuser ownership of the installation
[root@10-9-44-97 es]# chown -R esuser /usr/local/es/*


# Grant sudo rights
[root@10-9-44-97 es]# visudo
# Below the line "root ALL=(ALL) ALL",
# add an entry for esuser:
esuser ALL=(ALL) ALL

# Switch to esuser
[root@10-9-44-97 es]# su esuser
[esuser@10-9-44-97 es]$ 

# Create the log directory
[esuser@10-9-44-97 es]$ mkdir -p /usr/local/es/elasticsearch-7.6.1/log
# Create the data directory
[esuser@10-9-44-97 es]$ mkdir -p /usr/local/es/elasticsearch-7.6.1/data

[esuser@10-9-44-97 es]$ cd /usr/local/es/elasticsearch-7.6.1/config

[esuser@10-9-44-97 es]$ vim elasticsearch.yml

The elasticsearch.yml configuration file

I recommend typing the settings by hand; pasting them directly can introduce formatting problems.

# ======================== Elasticsearch Configuration =========================
# ES configuration template
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
cluster.name: es-tx
#
# ------------------------------------ Node ------------------------------------
# Name of this node
node.name: node-tx-1
#
# Add custom attributes to the node:
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /usr/local/es/elasticsearch-7.6.1/data
#
# Path to log files:
path.logs: /usr/local/es/elasticsearch-7.6.1/log
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
# ES uses a lot of memory, and swapping it to disk is very costly; locking memory prevents that.
#bootstrap.memory_lock: true
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
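# Hosts used for cluster discovery (replace with your server's IP)
discovery.seed_hosts: ["<server-ip>"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Initial master-eligible nodes, listed by node name (e.g. the node.name set above)
cluster.initial_master_nodes: ["<node-name>"]
# Enable cross-origin requests (CORS); defaults to false
http.cors.enabled: true
# Origins allowed once CORS is enabled. "*" allows every origin. To restrict access, use a regular expression wrapped in slashes, e.g. local addresses only: /https?:\/\/localhost(:[0-9]+)?/
http.cors.allow-origin: "*"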

Edit jvm.options

Size the JVM heap according to your server's memory. The ES default heap is 1g; if resources allow, you can raise it to 2g.

vim /usr/local/es/elasticsearch-7.6.1/config/jvm.options

-Xms2g
-Xmx2g
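
For choosing a heap value on other machines, here is a small sketch of a commonly cited rule of thumb: give the heap at most half of physical RAM and stay below roughly 32 GB. The function names and the exact 31 GB cap are assumptions for illustration, not settings taken from this installation.

```python
def suggest_heap_mb(total_ram_mb, max_heap_mb=31 * 1024):
    """Half of physical RAM, capped at max_heap_mb (assumed cap of about 31 GB)."""
    return min(total_ram_mb // 2, max_heap_mb)


def jvm_heap_options(heap_mb):
    """Return the two jvm.options lines; Xms and Xmx are set to the same value."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


# A server with 4 GB of RAM ends up with the 2 GB heap used above.
print(jvm_heap_options(suggest_heap_mb(4096)))  # ['-Xms2048m', '-Xmx2048m']
```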

Start the ES service

# Start directly in the background using bin/elasticsearch
[esuser@10-9-44-97 es]$ /usr/local/es/elasticsearch-7.6.1/bin/elasticsearch -d


# After startup, check that the process is running
[esuser@10-9-44-97 elasticsearch-7.6.1]$ ps -ef |grep elasticsearch
esuser     44951       1 19 15:00 pts/0    00:00:17 /usr/local/es/elasticsearch-7.6.1/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-6367305308402663350 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -XX:MaxDirectMemorySize=268435456 -Des.path.home=/usr/local/es/elasticsearch-7.6.1 -Des.path.conf=/usr/local/es/elasticsearch-7.6.1/config -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp /usr/local/es/elasticsearch-7.6.1/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
esuser     45014   44951  0 15:00 pts/0    00:00:00 /usr/local/es/elasticsearch-7.6.1/modules/x-pack-ml/platform/linux-x86_64/bin/controller
esuser     46836   44424  0 15:01 pts/0    00:00:00 grep --color=auto elasti

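Make sure the corresponding port is open on your server. We set http.port: 9200, so that port has to be reachable. On Alibaba Cloud and Tencent Cloud this is usually configured in the security group; it can also be opened through the Linux firewall.

Once the port is open, visit http://<server-ip>:<port>/
to see some information about your ES instance: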

{
  "name" : "xt-node1",
  "cluster_name" : "xt-es",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.6.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "aa751e09be0a5072e8570670309b1f12348f023b",
    "build_date" : "2020-02-29T00:15:25.529771Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
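
A response like the one above can also be checked from code. The sketch below only parses a response body; the function name summarize_root_response is made up for this example, and treating a cluster_uuid of "_na_" as "no cluster formed yet" is my reading of that placeholder value rather than something stated in this post.

```python
import json


def summarize_root_response(body):
    """Pick the most useful fields out of the JSON returned by GET / on an ES node."""
    info = json.loads(body)
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene": info["version"]["lucene_version"],
        # Assumption: "_na_" is the placeholder shown before a cluster has formed.
        "cluster_formed": info.get("cluster_uuid") != "_na_",
    }


# Trimmed copy of the response shown above.
sample = """
{
  "name": "xt-node1",
  "cluster_name": "xt-es",
  "cluster_uuid": "_na_",
  "version": {"number": "7.6.1", "lucene_version": "8.4.0"},
  "tagline": "You Know, for Search"
}
"""
print(summarize_root_response(sample))
```

Against a live node, the body could be obtained with urllib.request.urlopen("http://<server-ip>:9200/").read().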

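Problems encountered

Error message:

max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]

Cause: the kernel limit on memory map areas (vm.max_map_count) is too low.

A value changed only at runtime is lost on reboot and has to be re-applied manually after every boot; writing it to /etc/sysctl.conf makes it persistent.

Steps:
Edit /etc/sysctl.conf,
add the line: vm.max_map_count=262144
After saving, run: sysctl -p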

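Applying the change may fail:

The error is: sysctl: cannot stat /proc/sys/–p: No such file or directory
# The "–p" in this message shows that sysctl read the flag as a key name, which happens when the dash was pasted as a non-ASCII character; retype the command with a plain hyphen.
# What I did to resolve it
1. modprobe br_netfilter

2. ls /proc/sys/net/bridge
# Finally, run it again
3. sysctl -p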
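
To confirm that the new limit is actually in effect, the current value can be read back from /proc/sys/vm/max_map_count. This is a small sketch; the function names are made up for this example, and on systems without that file it simply reports that the value is unavailable.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum named in the ES error message above


def max_map_count_ok(raw_value, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the value read from the kernel meets the required minimum."""
    return int(raw_value.strip()) >= required


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the raw file content, or None if the file does not exist (non-Linux systems)."""
    p = Path(path)
    return p.read_text() if p.exists() else None


current = read_max_map_count()
if current is None:
    print("vm.max_map_count is not available on this system")
elif max_map_count_ok(current):
    print("vm.max_map_count is high enough:", current.strip())
else:
    print("vm.max_map_count is too low:", current.strip())
```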

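Note: "Permission denied" when starting ES

Cause: the user starting ES does not have permission on the installation files.

# Give the user ownership
chown -R <user> <path>
chown -R esuser /usr/local/es/elasticsearch-7.6.1

Note: after fixing the problem, reconnect your Xshell session for the change to take effect.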

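Kibana client - a visual interface for Elasticsearch

To make working with Elasticsearch easier, we install the Kibana client.

Installation

Choose a path: /usr/local/es
Extract: tar -zxvf kibana-X.X.X-linux-x86_64.tar.gz
Then go to the config directory under the installation directory.

Configure kibana.yml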


vi kibana.yml

# Port of the Kibana service
server.port: 5601

# Server IP; 0.0.0.0 can also be used
server.host: "<server-ip>"

# Elasticsearch instances to connect to
elasticsearch.hosts: ["http://<es-server-ip>:<port>"]