1. Basic ES Concepts
1.1 Introduction
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is written in Java, released as open source under the Apache License, and is a popular enterprise search engine.
ES has three main concepts: 1. index (roughly a MySQL database), 2. type (roughly a MySQL table), 3. document (roughly the rows of a MySQL table).
(Comparison diagram of index/type/document vs. database/table/row; image omitted.)
Why can ES locate data so quickly in a massive data set? The answer is the inverted index. Instead of a MySQL-style full table scan, ES maintains an inverted index table while storing data (sketched below), recording which records contain each term. For example, searching for 红海特工行动 is tokenized into 红海, 特工 and 行动, so there are three terms to match; record 3 is the better match because it hits two of them.
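A rough sketch of such an inverted index table, assuming three stored records whose contents are invented purely for illustration: record 1 = "红海行动", record 2 = "特工", record 3 = "红海特工".

| Term | Records containing it |
| --- | --- |
| 红海 | 1, 3 |
| 特工 | 2, 3 |
| 行动 | 1 |

Searching for 红海特工行动 tokenizes into 红海/特工/行动; record 3 hits two of the three terms, so it ranks higher.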
2. Installation
2.1 Install ES
docker pull elasticsearch:7.4.2
docker pull kibana:7.4.2
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
vi /mydata/elasticsearch/config/elasticsearch.yml
... "http.host: 0.0.0.0" ...
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
chmod -R 777 /mydata/elasticsearch/
docker update elasticsearch --restart=always
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2
docker update kibana --restart=always
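A quick sanity check that both containers are up (a sketch; the IP is the VM address used for ELASTICSEARCH_HOSTS above, adjust it to your own host):
curl http://192.168.56.10:9200          # should return the ES name/cluster/version JSON
curl -I http://192.168.56.10:5601       # Kibana answers once it has finished starting (can take a minute)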
2.2 Install the ik Analyzer
- When installing Elasticsearch above we already mapped the container's "/usr/share/elasticsearch/plugins" directory to "/mydata/elasticsearch/plugins" on the host, so installing the analyzer is simple: download the "elasticsearch-analysis-ik-7.4.2.zip" release, unzip it into that directory, and restart ES.
- Alternatively, you can install it from inside the ES container; a sketch of the typical commands is shown below.
[vagrant@localhost ~]$ sudo docker exec -it elasticsearch /bin/bash
[root@66718a266132 elasticsearch]# pwd
/usr/share/elasticsearch
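The commands typically run here are sketched below (assumptions: the download URL points at the ik plugin's GitHub release for 7.4.2, and wget/unzip are available in the container; if they are not, download and unzip on the host into /mydata/elasticsearch/plugins/ik instead, since that directory is mounted):
cd /usr/share/elasticsearch/plugins
mkdir ik && cd ik
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip elasticsearch-analysis-ik-7.4.2.zip
rm elasticsearch-analysis-ik-7.4.2.zip
cd /usr/share/elasticsearch
Then, still inside the container, fix the permissions; exit the container before running the docker restart below.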
chmod -R 777 plugins/ik
docker restart elasticsearch
2.3 Install nginx
docker pull nginx:1.10
docker run -p 80:80 --name nginx -d nginx:1.10
mkdir -p /mydata/nginx
cd /mydata/nginx
docker container cp nginx:/etc/nginx .
The host directory /mydata/nginx/nginx now contains nginx's configuration files.
mv /mydata/nginx/nginx /mydata/nginx/conf
docker stop nginx
docker rm nginx
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
docker update nginx --restart=always
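A quick check that the volume mapping works (the test page and host IP are assumptions; use your own VM address):
echo '<h1>nginx ok</h1>' > /mydata/nginx/html/index.html
curl http://192.168.56.10        # should print the test page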
3. Native API Usage
3.1 Getting Started (simple CRUD)
3.1.1 _cat
- GET /_cat/nodes: list all nodes
- GET /_cat/health: check ES health
- GET /_cat/master: show the master node
- GET /_cat/indices: list all indices
- Postman (or any HTTP client) works for these requests, for example:
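The same calls from the command line, if you prefer curl over Postman (the ?v parameter adds column headers; the IP matches the examples below):
curl http://192.168.64.132:9200/_cat/nodes?v
curl http://192.168.64.132:9200/_cat/indices?v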
3.1.2 Saving Documents
- To save a document, specify the index, the type, and the unique id to store it under.
put http://192.168.64.132:9200/customer/external/1
{
"name":"lujun"
}
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 4
}
ps: Both PUT and POST work. POST creates: if no id is given, an id is generated automatically; if an id is given and the document already exists, it is updated and the version number is incremented. PUT can create or update as well, but it must always carry an id, so it is normally used for updates; calling PUT without an id is an error.
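For example, POSTing to the type without an id creates a new document with an auto-generated id on every call (a sketch following the same customer/external example):
POST http://192.168.64.132:9200/customer/external
{
    "name": "lujun"
}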
3.1.3 Query, Update, Delete
1. Query
GET http://192.168.64.132:9200/customer/external/1
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 2,
"_seq_no": 1,
"_primary_term": 4,
"found": true,
"_source": {
"name": "lujun"
}
}
2. Update
POST http://192.168.64.132:9200/customer/external/1
{
"name":"lujun2"
}
POST http://192.168.64.132:9200/customer/external/1
{
"name":"lujun2",
"xx":"xx"
}
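ES also provides a dedicated _update endpoint, which takes the new fields wrapped in a "doc" object; unlike the plain POST above, it compares with the current source and skips the write (no version bump) when nothing changed:
POST http://192.168.64.132:9200/customer/external/1/_update
{
    "doc": {
        "name": "lujun2"
    }
}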
3. Delete a document
DELETE http://192.168.64.132:9200/customer/external/1
Delete an index
DELETE http://192.168.64.132:9200/customer
There is no API for deleting a type.
4. Bulk operations (note: run these in the Kibana Dev Tools console)
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"luijune"}
{"index":{"_id":"2"}}
{"name":"lujun2"}
A more complex example:
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
3.2 Search
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/getting-started-search.html. The two main ways to search via the REST interface are (a) passing parameters directly in the URL and (b) sending a Query DSL request body.
3.2.1 Basic Format
GET bank/_search?q=*&sort=account_number:asc
GET bank/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 5,
"_source":["balance"],
"sort": [
{
"account_number": {
"order": "desc"
}
}
]
}
3.2.2 Match Queries: Single Field (match and match_phrase)
match runs a full-text match on analyzed fields (documents containing any of the terms are hits), while match_phrase requires the whole phrase to appear in the field:
GET bank/_search
{
"query": {
"match": {
"address": "kings"
}
}
}
GET bank/_search
{
"query": {
"match_phrase": {
"address": "mill road"
}
}
}
GET bank/_search
{
"query": {
"match_phrase": {
"address": "990 Mill"
}
}
}
{
.........
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 10.806405,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 10.806405,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
}
]
}
}
In contrast, matching on the keyword sub-field requires the stored value to equal the search text exactly, so "990 Mill" no longer matches "990 Mill Road":
GET bank/_search
{
"query": {
"match": {
"address.keyword": "990 Mill"
}
}
}
3.2.3 Match Queries: Multiple Fields (multi_match)
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill",
"fields": [
"state",
"address"
]
}
}
}
{
"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 5.4032025,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 5.4032025,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "parkerhines@baluba.com",
"city" : "Blackgum",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 5.4032025,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "leelong@comverges.com",
"city" : "Movico",
"state" : "MT"
}
}
]
}
}
3.2.4 Compound Queries (bool/must)
First, the key clauses:
- must: the document must match all the conditions listed under must.
- must_not: the document must not match any of the conditions listed under must_not.
- should: the document should match the conditions listed under should (matching raises the score; not matching is fine).
GET bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"address":"mill"}},
{"match":{"gender":"M"}}
]
}
}
}
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Wallace"
}
}
]
}
}
}
3.2.5 Filtering Results (filter)
Conditions placed in filter do not contribute to the relevance score:
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": {"address": "mill" } }
],
"filter": {
"range": {
"balance": {
"gte": "10000",
"lte": "20000"
}
}
}
}
}
}
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
}
]
}
}
3.2.6 The term Query
Like match, term matches against a field, but:
- use match for full-text (text) fields;
- use term for non-text fields (it behaves like an exact-value lookup).
An example: running term against an analyzed text field such as address normally finds nothing, because the stored tokens are individual lowercased terms rather than the original string:
GET bank/_search
{
"query": {
"term": {
"address": "mill Road"
}
}
}
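For exact matching, point term at a non-text field (such as age or balance) or at the keyword sub-field instead, for example:
GET bank/_search
{
  "query": {
    "term": {
      "address.keyword": "990 Mill Road"
    }
  }
}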
3.2.7 Aggregations (i.e. grouping)
A convenient feature of ES: the documents matched by the query are kept, the aggregations we ask for are computed over them, and everything comes back in a single response. The general syntax is:
"aggs":{
"aggs_name":{
"AGG_TYPE":{}
}
}
For example: search addresses containing Mill, group the matches by age, and compute the average age and average balance ("size": 0 suppresses the hit list itself):
GET bank/_search
{
"query": {
"match": {
"address": "Mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
},
"size": 0
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : { // 第一个聚合
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2
},
{
"key" : 28,
"doc_count" : 1
},
{
"key" : 32,
"doc_count" : 1
}
]
},
"ageAvg" : { // 第二个聚合
"value" : 34.0
},
"balanceAvg" : {
"value" : 25208.0
}
}
}
Sub-aggregations nest inside a bucket aggregation. For example, group by age and compute the average balance within each age bucket:
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}
Sibling sub-aggregations work too: group by age, then within each age bucket group by gender and compute the average balance per gender, plus the overall average balance of the age bucket:
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
3.2.8 Aggregating nested Objects
By default an array of objects is flattened: each attribute of the objects is stored together across the whole array, so the pairing between name and nicheng within a single object is lost. For example, this document:
{
"group": "fans",
"user": [
{ "name": "John", "nicheng": "Smith" },
{ "name": "Alice", "nicheng": "White" }
]
}
is stored internally as:
{
"group": "fans",
"user.name": [ "alice", "john" ],
"user.nicheng": [ "smith", "white" ]
}
Reference: https://blog.csdn.net/weixin_40341116/article/details/80778599
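To keep the objects intact, declare the field as type nested in the mapping and aggregate it inside a nested block. A minimal sketch (the index name and field types follow the example above and are assumptions):
PUT my_user_index
{
  "mappings": {
    "properties": {
      "group": { "type": "keyword" },
      "user": {
        "type": "nested",
        "properties": {
          "name":    { "type": "keyword" },
          "nicheng": { "type": "keyword" }
        }
      }
    }
  }
}

GET my_user_index/_search
{
  "size": 0,
  "aggs": {
    "userAgg": {
      "nested": { "path": "user" },
      "aggs": {
        "nameAgg": {
          "terms": { "field": "user.name" }
        }
      }
    }
  }
}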
3.3 Mappings (omitted)
3.4 Analysis
POST _analyze
{
"analyzer": "standard",
"text": "The 2 Brown-Foxes bone."
}
GET _analyze
{
"analyzer": "ik_smart",
"text":"我是中国人"
}
- The above handles splitting Chinese text, but the ik analyzer does not know newer internet slang, so we need to define some custom words.
- 1. Modify IKAnalyzer.cfg.xml under /usr/share/elasticsearch/plugins/ik/config (on the host: /mydata/elasticsearch/plugins/ik/config):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- users can configure their own extension dictionary here -->
<entry key="ext_dict"></entry>
<!-- users can configure their own extension stop-word dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- users can configure a remote extension dictionary here -->
<entry key="remote_ext_dict">http://192.168.64.132/es/fenci.txt</entry>
<!-- users can configure a remote extension stop-word dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
docker restart elasticsearch
After the dictionary changes, existing documents can be re-indexed so that the new terms also take effect on data indexed earlier:
POST my_index/_update_by_query?conflicts=proceed
2. Serve the dictionary file with nginx:
mkdir /mydata/nginx/html/es
cd /mydata/nginx/html/es
touch fenci.txt
echo "乔碧萝殿下,yyds" > /mydata/nginx/html/fenci.txt
GET _analyze
{
"analyzer": "ik_max_word",
"text":"乔碧萝殿下,yyds"
}
4. Operating ES from a Java Client
- ps: Spring Data Elasticsearch wraps these operations and is simpler to use, but here we use the official Elasticsearch Rest Client (elasticsearch-rest-high-level-client).
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.4.2</version>
</dependency>
// Assumed imports for this configuration class
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GuliESConfig {

    // ES host; 192.168.64.132 is the address used in the REST examples above (adjust to your setup)
    private static final String host = "192.168.64.132";

    // shared request options (headers, buffer limits, ...) passed to every client call
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, 9200, "http"));
        return new RestHighLevelClient(builder);
    }
}
// "client" is the RestHighLevelClient bean injected from GuliESConfig
public void indexData() throws IOException {
    IndexRequest indexRequest = new IndexRequest("users"); // target index
    indexRequest.id("1"); // document id
    User user = new User();
    user.setUserName("lujun");
    user.setAge(24);
    user.setGender("男");
    String jsonString = JSON.toJSONString(user); // fastjson serialization
    indexRequest.source(jsonString, XContentType.JSON);
    IndexResponse index = client.index(indexRequest, GuliESConfig.COMMON_OPTIONS);
    System.out.println(index);
}
public void find() throws IOException {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("bank");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
System.out.println(sourceBuilder.toString());
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
System.out.println(response.toString());
}
// same search as above, plus two aggregations
public void find() throws IOException {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("bank");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
TermsAggregationBuilder agg1 = AggregationBuilders.terms("agg1").field("age").size(10);
sourceBuilder.aggregation(agg1);
AvgAggregationBuilder agg2 = AggregationBuilders.avg("agg2").field("balance");
sourceBuilder.aggregation(agg2);
System.out.println("检索条件"+sourceBuilder.toString());
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
System.out.println(response.toString());
}
// Processing the response (this code belongs inside the search method, after client.search(...)):
// 1. iterate the hits and deserialize each _source with fastjson
SearchHits hits = response.getHits();
SearchHit[] hits1 = hits.getHits();
for (SearchHit hit : hits1) {
    hit.getId();
    hit.getIndex();
    String sourceAsString = hit.getSourceAsString();
    Account account = JSON.parseObject(sourceAsString, Account.class);
    System.out.println(account);
}
// 2. read the aggregations back by name: "agg1" is a terms aggregation, "agg2" an avg
Aggregations aggregations = response.getAggregations();
Terms agg1 = aggregations.get("agg1");
for (Terms.Bucket bucket : agg1.getBuckets()) {
    System.out.println("age: " + bucket.getKeyAsString() + ", count: " + bucket.getDocCount());
}
Avg agg2 = aggregations.get("agg2");
System.out.println("average balance: " + agg2.getValue());
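As a further sketch (assuming the same injected client, the bank index, and GuliESConfig as above), the bool + filter query from section 3.2.5 translates to QueryBuilders like this:
public void findWithFilter() throws IOException {
    SearchRequest searchRequest = new SearchRequest("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // must: full-text match on address; filter: balance range, which adds no score
    sourceBuilder.query(QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery("address", "mill"))
            .filter(QueryBuilders.rangeQuery("balance").gte(10000).lte(20000)));
    searchRequest.source(sourceBuilder);
    SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
    System.out.println(response.toString());
}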
5. Using ES in the Project
- Only products that have been put on shelf are shown on the site.
- On-shelf products must be searchable.
The idea: the admin console's product list has a "put on shelf" action; clicking it syncs the product data to ES (code omitted). The main design question is how to store the data. Option 1: store the spec attributes inside every sku document:
{
    skuId: 1,
    spuId: 11,
    skuTitle: "华为xx",
    price: 999,
    saleCount: 99,
    attr: [
        { 尺寸: 5 },
        { CPU: 高通945 },
        { 分辨率: 全高清 }
    ]
}
Drawback: if every sku stores its spec attributes, storage is redundant, because all skus under the same spu share the same spec attributes (a waste of space). Option 2 is to split them into two indices:
sku index
{
spuId:1
skuId:11
}
attr index
{
skuId:11
attr:[
{尺寸:5},
{CPU:高通945},
{分辨率:全高清}
]
}
With option 2, a search might first find 4000 matching spus and then have to look up their attributes, sending 4000 ids: 8 B * 4000 = 32000 B ≈ 32 KB per request.
With 1000 concurrent searches that is already about 32 MB of extra traffic (a waste of time).
Conclusion: putting the spec attributes in a separate index causes large data transfers at query time and puts heavy pressure on the network, so option 1 (trading space for time) is preferred.
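A sketch of what the resulting sku index mapping could then look like, with the spec attributes stored per sku as a nested array (index and field names mirror the example above and are assumptions, not the project's actual mapping):
PUT product
{
  "mappings": {
    "properties": {
      "skuId":     { "type": "long" },
      "spuId":     { "type": "keyword" },
      "skuTitle":  { "type": "text", "analyzer": "ik_smart" },
      "price":     { "type": "keyword" },
      "saleCount": { "type": "long" },
      "attrs": {
        "type": "nested",
        "properties": {
          "attrName":  { "type": "keyword" },
          "attrValue": { "type": "keyword" }
        }
      }
    }
  }
}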