[Big Data] [ES] Elasticsearch Basic Operations and Related Concepts, Part 02

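Preface

All operations in this article are run from Kibana's Dev Tools console. For installation, see:

[ES Study] Installing and Using ElasticSearch on macOS (Illustrated Tutorial)

[ES Study] Using ElasticSearch in Kibana: Installing Kibana (macOS)

This article follows on from [ES] Elasticsearch Basic Operations and Related Concepts, Part 01.

First, add one more document: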

PUT /megacorp/employee/4
{
    "first_name" :  "Gang",
    "last_name" :   "Xiao",
    "age" :         23,
    "about" :       "I love everything about music",
    "interests":  [ "forestry", "music"]
}

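Query: intervals (match rule)

The intervals query finds documents in which the search terms appear in a particular arrangement. You can also cap the number of words allowed between the search terms.

We enter:

============ Syntax =============
GET /megacorp/employee/_search
{
  "query": {
    "intervals": {
      "<target field>": {
        "match": {
          "query": "<search term(s)>",
          "max_gaps": <maximum gap, a number>,
          "ordered": <whether order matters (true/false)>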
        }
      }
    }
  }
}
================================
GET /megacorp/employee/_search
{
  "query": {
    "intervals": {
      "about": {
        "match": {
          "query": "I rock",
          "max_gaps": 4,
          "ordered": true
        }
      }
    }
  }
}

We get two results:

........
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.19999999,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.19999999,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      }
      ......

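If we change "max_gaps" to 2, nothing is returned. In both matching documents three words sit between I and rock, so no about field has the two words within two gaps of each other.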
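To make the gap arithmetic concrete, here is a minimal Python sketch of ordered interval matching. It is not how Elasticsearch implements the query: the `tokenize` and `ordered_match` helpers are hypothetical names, and lowercasing plus whitespace splitting is only a rough stand-in for the analyzer.

```python
from itertools import product


def tokenize(text):
    # Rough stand-in for the analyzer: lowercase and split on whitespace.
    return text.lower().split()


def ordered_match(text, query, max_gaps):
    """Return True if the query terms occur in order with at most max_gaps
    unmatched positions inside the span they cover."""
    tokens = tokenize(text)
    terms = tokenize(query)
    # For each query term, collect every position where it occurs.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    for combo in product(*positions):
        if all(a < b for a, b in zip(combo, combo[1:])):
            # Positions covered by the span minus the positions the terms occupy.
            gaps = (combo[-1] - combo[0] + 1) - len(combo)
            if gaps <= max_gaps:
                return True
    return False


doc1 = "I love to go rock climbing"
print(ordered_match(doc1, "I rock", 4))  # True: three words lie between the terms
print(ordered_match(doc1, "I rock", 2))  # False: three gaps exceed the limit
```

In this model any "max_gaps" of 3 or more matches the two sample documents that contain rock, which is consistent with the results shown above.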

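Query: term

A term query checks whether a document contains one specific value. The query text is not analyzed (tokenized); it is looked up exactly as entered:

We enter:

============ Syntax ============
GET /megacorp/employee/_search
{
  "query" : {
    "term": {
      "<target field>": {
        "value": "<term>"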
      }
    }
  }
}

================================
GET /megacorp/employee/_search
{
  "query" : {
    "term": {
      "about": {
        "value": "love"
      }
    }
  }
}

This returns every document whose about field contains the word love:

    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "4",
        "_score" : 0.55584013,
        "_source" : {
          "first_name" : "Gang",
          "last_name" : "Xiao",
          "age" : 23,
          "about" : "I love everything about music",
          "interests" : [
            "forestry",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.51556194,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }

But a term query does not tokenize its input. If we enter:

GET /megacorp/employee/_search
{
  "query" : {
    "term": {
      "about": {
        "value": "love to"
      }
    }
  }
}

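Nothing is found, because the whole string "love to" is looked up as a single term and the about field was indexed word by word.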
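The behaviour can be modelled in a few lines of Python. This is only a sketch under the assumption that the about field is a text field analyzed by lowercasing and splitting into words; `analyze` and `term_query` are hypothetical helpers, not Elasticsearch APIs.

```python
def analyze(text):
    # Rough stand-in for index-time analysis: lowercase, split on whitespace.
    return text.lower().split()


def term_query(field_text, value):
    # The field text is analyzed at index time; the query value is used as-is.
    return value in analyze(field_text)


about = "I love to go rock climbing"
print(term_query(about, "love"))     # True: "love" is one of the indexed terms
print(term_query(about, "love to"))  # False: no single indexed term equals "love to"
print(term_query(about, "Love"))     # False: the indexed terms are lowercase
```

The last call shows a common surprise under the same assumption: because the query value is not lowercased, a capitalized value misses a field whose terms were lowercased at index time.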

Query: terms

The terms query uses much the same syntax as term, but it accepts several values and matches documents that contain any of them:

GET /megacorp/employee/_search
{
  "query" : {
    "terms": {
      "about": ["build", "climbing"]
    }
  }
}

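This cleanly returns two documents: one contains the word build, the other climbing.

The same query can also be sent to all indices by leaving the index and type out of the path: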

GET /_search
{
  "query" : {
    "terms": {
      "about": ["build", "climbing"]
    }
  }
}
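As a rough illustration of the "any of these values" semantics, the following Python sketch runs the same lookup over the four sample about fields. `DOCS` and `terms_query` are hypothetical, and the tokenization is a simplification of real analysis.

```python
# The about fields of the four sample documents, keyed by _id.
DOCS = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
    "4": "I love everything about music",
}


def terms_query(docs, values):
    # A document matches if its tokens share at least one value with the list.
    wanted = set(values)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if wanted & set(text.lower().split())
    )


print(terms_query(DOCS, ["build", "climbing"]))  # ['1', '3']
```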

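Query: range

A range query constrains the value of the searched field.

gte means greater than or equal to, lte means less than or equal to, gt means greater than, and lt means less than. boost sets the relevance-score weight of this query; the default is 1.0.

========== Syntax =========
GET /megacorp/employee/_search
{
  "query" : {
    "range" : {
      "<target field>": {
        "gte": <number>,
        "lte": <number>,
        "boost": <number>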
      }
    }
  }
}
=========================
GET /megacorp/employee/_search
{
  "query" : {
    "range" : {
      "age": {
        "gte": 20,
        "lte": 30,
        "boost": 1
      }
    }
  }
}
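The four bound operators map directly onto ordinary comparisons. The sketch below applies the same bounds to the ages of the four sample documents; `in_range` and `AGES` are hypothetical names, and boost is deliberately left out because it only affects scoring.

```python
import operator

# Map each range keyword to the comparison it stands for.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    # Every bound that is given must hold.
    return all(OPS[name](value, bound) for name, bound in bounds.items())


# Ages of the sample documents, keyed by _id.
AGES = {"1": 25, "2": 32, "3": 35, "4": 23}

matching = sorted(doc_id for doc_id, age in AGES.items() if in_range(age, gte=20, lte=30))
print(matching)  # ['1', '4']
```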

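Query: exists

Returns all documents that contain an indexed value for the given field. A document does not match if the field's value is null or [], or if the field has not been indexed.

============ Syntax =============
GET /megacorp/employee/_search
{
  "query" : {
    "exists": {
      "field" : "<target field>"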
    }
  }
}
=============================
GET /megacorp/employee/_search
{
  "query" : {
    "exists": {
      "field" : "about"
    }
  }
}
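The rules for what counts as "no value" can be sketched in Python as follows. `field_exists` is a hypothetical helper that only looks at the document source; mapping-level cases such as a field that is not indexed are not modelled.

```python
def field_exists(doc, field):
    value = doc.get(field)
    if value is None:
        # Field is absent or explicitly null.
        return False
    if isinstance(value, list):
        # An empty array, or an array holding only nulls, has no value.
        return any(item is not None for item in value)
    # Anything else, including an empty string, counts as a value.
    return True


print(field_exists({"about": "I like to build cabinets"}, "about"))  # True
print(field_exists({"about": None}, "about"))                        # False
print(field_exists({"about": []}, "about"))                          # False
print(field_exists({"first_name": "Gang"}, "about"))                 # False
```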

Query: ids

Finds documents by their _id:

We enter:

=========== Syntax ============
GET /megacorp/employee/_search
{
  "query": {
    "ids": {
      "values": ["<document _id>","<document _id>","<document _id>"]
    }
  }
}
============================
GET /megacorp/employee/_search
{
  "query": {
    "ids": {
      "values": ["1","4","999"]
    }
  }
}

We get the following; the hits shown are the documents with _id 1 and 4, and 999 is not among them:

....
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Gang",
          "last_name" : "Xiao",
          "age" : 23,
          "about" : "I love everything about music",
          "interests" : [
            "forestry",
            "music"
          ]
        }
      }
      ......
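Conceptually this is a keyed lookup that skips ids with no document behind them. A minimal sketch, with `INDEX` and `ids_query` as hypothetical stand-ins and the order of results not meant to mirror Elasticsearch:

```python
# A toy index holding two of the sample documents, keyed by _id.
INDEX = {
    "1": {"first_name": "John", "last_name": "Smith"},
    "4": {"first_name": "Gang", "last_name": "Xiao"},
}


def ids_query(index, values):
    # Unknown ids are skipped rather than raising an error.
    return [index[doc_id] for doc_id in values if doc_id in index]


hits = ids_query(INDEX, ["1", "4", "999"])
print(len(hits))  # 2
```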

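Query: fuzzy

A really fun API.

A fuzzy query is an approximate search: it decides whether a term matches by its edit distance (Levenshtein distance) from the search term. One unit of edit distance is a single-character change to a word, which can be any of the following:

Changing a character, e.g. box to fox
Removing a character, e.g. black to lack
Inserting a character, e.g. sic to sick
Swapping two adjacent characters, e.g. act to cat

We enter:

============ Syntax =============
GET /megacorp/employee/_search
{
  "query": {
    "fuzzy": {
      "<target field>": "<search term>"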
    }
  }
}
==============================
GET /megacorp/employee/_search
{
  "query": {
    "fuzzy": {
      "about": "buil"
    }
  }
}

We get:

    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.9378587,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]

So the word our fuzzy search matched is build.
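The four kinds of edit can be checked with a small edit-distance function. The sketch below uses the optimal string alignment variant, which counts an adjacent swap as one edit; it illustrates the idea and is not Lucene's actual implementation. The `auto_fuzziness` helper assumes the query runs with the default fuzziness of AUTO, where terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two.

```python
def edit_distance(a, b):
    """Edit distance where a change, removal, insertion, or adjacent swap costs 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # remove all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # removal
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # change (or keep)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def auto_fuzziness(term):
    # Assumed AUTO thresholds: 0-2 chars exact, 3-5 one edit, longer two edits.
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(term, candidate):
    return edit_distance(term, candidate) <= auto_fuzziness(term)


print(edit_distance("act", "cat"))   # 1
print(fuzzy_match("buil", "build"))  # True: one insertion, one edit allowed
```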

Query: prefix

Returns all documents in which the target field contains a word that starts with the search text.

We enter:

========= Syntax =========
GET /megacorp/employee/_search
{
  "query": {
    "prefix": {
      "<target field>": "<search text>"
    }
  }
}
========================
GET /megacorp/employee/_search
{
  "query": {
    "prefix": {
      "about": "ro"
    }
  }
}

We get:

.....
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      }

.....
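In terms of individual words, the check is a plain starts-with test on each token. A minimal sketch, assuming whitespace tokenization and lowercasing; `prefix_query` is a hypothetical helper:

```python
def prefix_query(field_text, prefix):
    # The field matches if any of its tokens starts with the prefix.
    return any(token.startswith(prefix) for token in field_text.lower().split())


print(prefix_query("I love to go rock climbing", "ro"))  # True: "rock"
print(prefix_query("I like to build cabinets", "ro"))    # False
```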

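Query: wildcard

Finds documents containing words that match a search pattern with wildcards.

Two wildcards are supported:

?: matches any single character
*: matches zero or more characters

We enter:

=========== Syntax ============
GET /megacorp/employee/_search
{
  "query": {
    "wildcard": {
      "<target field>": {
        "value": "<search text with wildcards>"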
      }
    }
  }
}
================================
GET /megacorp/employee/_search
{
  "query": {
    "wildcard": {
      "about": {
        "value": "ro*"
      }
    }
  }
}

This finds every document whose about field contains a word starting with ro, which in this data set is the word rock.
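Python's standard `fnmatch` module happens to use the same two wildcards, so it makes a convenient way to experiment with patterns. This is only an approximation: `wildcard_query` is a hypothetical helper, tokenization is simplified, and `fnmatch` also treats square brackets as character sets, which the wildcard syntax described above does not include.

```python
from fnmatch import fnmatchcase


def wildcard_query(field_text, pattern):
    # The field matches if any single token matches the whole pattern.
    return any(fnmatchcase(token, pattern) for token in field_text.lower().split())


about = "I love to go rock climbing"
print(wildcard_query(about, "ro*"))   # True: "rock"
print(wildcard_query(about, "r?ck"))  # True: "?" stands for exactly one character
print(wildcard_query(about, "r?k"))   # False: "rock" has two characters between r and k
```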

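Compound query: bool

A compound query can filter on several conditions at once, much like a multi-condition WHERE clause in MySQL.

must: documents must match every clause in must; these clauses contribute to the relevance score
filter: documents must match every clause in filter; unlike must, filter clauses do not affect the relevance score
should: documents should match the clauses in should; matching clauses raise the relevance score
must_not: documents must not match any clause in must_not; these clauses run in filter context and do not affect the relevance score

We can enter:

============ Syntax ===========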
GET /megacorp/employee/_search
{
  "query" : {
    "bool" : {
      "must": [
        {
          "<query clause>": {
            ......
          }
        }
      ],
      "filter": [
        {
          "<query clause>": {
            ......
          }
        }
      ],
      "should": [
        {
          "<query clause>": {
            ......
          }
        }
      ],
      "must_not": [
        {
          "<query clause>": {
            ......
          }
        }
      ]
    }
  }
}
==============================
GET /megacorp/employee/_search
{
  "query" : {
    "bool" : {
      "must": [
        {
          "match" : {
            "about" : "like"
          }
        }
      ],
      "filter": [
        {
          "range" : {
            "age": {
              "gte": 25,
              "lte": 40
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "about": {
              "value": "to"
            }
          }
        }
      ]
    }
  }
}

This returns:

.....
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.4586673,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.3529669,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      }
      ......
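How the four clause types interact can be sketched with a toy evaluator. This is a simplified model, not Elasticsearch's scoring: each matching must or should clause adds a flat 1.0, whereas real scores depend on term statistics. `bool_query`, `has_term`, `age_between`, and `DOCS` are hypothetical names.

```python
DOCS = [
    {"_id": "1", "age": 25, "about": "I love to go rock climbing"},
    {"_id": "2", "age": 32, "about": "I like to collect rock albums"},
    {"_id": "3", "age": 35, "about": "I like to build cabinets"},
    {"_id": "4", "age": 23, "about": "I love everything about music"},
]


def has_term(field, term):
    return lambda doc: term in doc[field].lower().split()


def age_between(low, high):
    return lambda doc: low <= doc["age"] <= high


def bool_query(doc, must=(), filter_=(), should=(), must_not=()):
    """Return a toy score, or None if the document is excluded."""
    if not all(clause(doc) for clause in must):
        return None
    if not all(clause(doc) for clause in filter_):
        return None  # filter gates the document but adds nothing to the score
    if any(clause(doc) for clause in must_not):
        return None  # must_not gates the document but adds nothing to the score
    should_hits = sum(1 for clause in should if clause(doc))
    # With no must or filter clauses, at least one should clause has to match.
    if should and not must and not filter_ and should_hits == 0:
        return None
    return float(len(must) + should_hits)


scores = {}
for doc in DOCS:
    score = bool_query(
        doc,
        must=[has_term("about", "like")],
        filter_=[age_between(25, 40)],
        should=[has_term("about", "to")],
    )
    if score is not None:
        scores[doc["_id"]] = score

print(sorted(scores))  # ['2', '3']
```

The surviving documents are the same two that appear in the result above. Document 1 passes the age filter and contains to, but it is dropped because the must clause on like fails.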