[Big Data] Translated docs: elasticsearch-7.13.*-search-aggregations

https://www.elastic.co/guide/en/elasticsearch/reference/7.13/search-aggregations.html
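An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:
* What's the average load time for my website?
* Who are my most valuable customers based on transaction volume?
* What would be considered a large file on my network?
* How many products are in each product category?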

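Elasticsearch organizes aggregations into three categories:
* Metric aggregations: calculate metrics, such as a sum or average, from field values.
* Bucket aggregations: also called bins, group documents into buckets based on field values, ranges, or other criteria.
* Pipeline aggregations: take input from other aggregations instead of documents or fields.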

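Run an aggregation:
You can run aggregations as part of a search by specifying the search API's aggs parameter. The following search runs a terms aggregation on my-field: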

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'
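For readers calling the API from code instead of curl, here is a minimal Python sketch (standard library only) that prepares the same request. The host, index, and field names are the placeholders from the curl example; the helper name `build_search_request` is hypothetical. It only builds the request object; sending it requires a running cluster.

```python
import json
import urllib.request


def build_search_request(index, body, host="http://localhost:9200"):
    """Build (but do not send) a GET _search request with a JSON body.

    Hypothetical helper for illustration; mirrors the curl call above.
    """
    url = f"{host}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Same body as the curl example: a terms aggregation on my-field.
body = {"aggs": {"my-agg-name": {"terms": {"field": "my-field"}}}}
req = build_search_request("my-index-000001", body)

# To actually send it (requires a running cluster):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```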

Response:

{
  "took": 78,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [...]
  },
  "aggregations": {                                (1)
    "my-agg-name": {                           
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

(1): Results for the my-agg-name aggregation.
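Each aggregation's result sits under the top-level aggregations object, keyed by the name chosen in the request. A small Python sketch of reading it from the parsed response; `get_buckets` is a hypothetical helper, and the dict below is a trimmed copy of the response shown above with hits omitted.

```python
def get_buckets(response, agg_name):
    """Return the bucket list of a bucket aggregation from a search response.

    Hypothetical helper; `response` is the parsed JSON body as a dict.
    Raises KeyError if the aggregation is missing from the response.
    """
    return response["aggregations"][agg_name]["buckets"]


# Trimmed copy of the response above (hits omitted).
response = {
    "took": 78,
    "timed_out": False,
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [],
        }
    },
}

print(get_buckets(response, "my-agg-name"))  # → []
```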

Change an aggregation's scope:
Use the query parameter to limit the documents on which an aggregation runs:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'
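The date math in this query ("now-1d/d" to "now/d") rounds to day boundaries, so in effect the aggregation sees roughly the whole of the previous calendar day. A rough local Python approximation of those bounds, assuming UTC (Elasticsearch's default when no time_zone is given); the function names are hypothetical, and millisecond-level edge rounding is not modeled.

```python
from datetime import datetime, timedelta, timezone


def yesterday_bounds(now):
    """Approximate "gte": "now-1d/d", "lt": "now/d" locally.

    Returns (start of yesterday, start of today) for a UTC datetime,
    i.e. the half-open range [gte, lt) covering all of yesterday.
    """
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start_of_yesterday = start_of_today - timedelta(days=1)
    return start_of_yesterday, start_of_today


def in_range(ts, bounds):
    """True if ts falls inside [gte, lt)."""
    gte, lt = bounds
    return gte <= ts < lt


now = datetime(2021, 6, 15, 13, 45, tzinfo=timezone.utc)
bounds = yesterday_bounds(now)
print(bounds[0].isoformat(), bounds[1].isoformat())
# → 2021-06-14T00:00:00+00:00 2021-06-15T00:00:00+00:00
```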

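Return only aggregation results:
By default, searches containing an aggregation return both search hits and aggregation results. To return only aggregation results, set size to 0: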

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'

Run multiple aggregations:
You can specify multiple aggregations in the same request:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-first-agg-name": {
      "terms": {
        "field": "my-field"
      }
    },
    "my-second-agg-name": {
      "avg": {
        "field": "my-other-field"
      }
    }
  }
}
'
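Because aggs is a JSON object keyed by aggregation name, several aggregations, the size setting, and an optional scoping query can be assembled programmatically. A minimal Python sketch; `search_body` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def search_body(aggs, size=None, query=None):
    """Assemble a _search request body (hypothetical helper).

    aggs:  dict mapping aggregation names to their definitions
    size:  set to 0 to return only aggregation results, no hits
    query: optional query that limits the documents being aggregated
    """
    body = {"aggs": aggs}
    if size is not None:
        body["size"] = size
    if query is not None:
        body["query"] = query
    return body


aggs = {
    "my-first-agg-name": {"terms": {"field": "my-field"}},
    "my-second-agg-name": {"avg": {"field": "my-other-field"}},
}
print(json.dumps(search_body(aggs, size=0), indent=2))
```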

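Run sub-aggregations:
Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an avg sub-aggregation calculates an average value for each bucket of documents. There is no level or depth limit for nesting sub-aggregations.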

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "aggs": {
        "my-sub-agg-name": {
          "avg": {
            "field": "my-other-field"
          }
        }
      }
    }
  }
}
'

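In other words, a bucket aggregation can split each of its buckets further or compute metrics on each bucket. Metric aggregations, by contrast, cannot have sub-aggregations of their own.
The response nests sub-aggregation results under their parent aggregation: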

{
  ...
  "aggregations": {
    "my-agg-name": {                          (1)                   
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "foo",
          "doc_count": 5,
          "my-sub-agg-name": {                (2)                 
            "value": 75.0
          }
        }
      ]
    }
  }
}

(1): Results for the parent aggregation, my-agg-name.
(2): Results for my-agg-name's sub-aggregation, my-sub-agg-name.
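Since each bucket carries its sub-aggregation result under the sub-aggregation's name, a per-bucket lookup is a short loop. A Python sketch using the response above; `sub_agg_values` is hypothetical and assumes a single-value metric sub-aggregation (such as avg) that reports under "value".

```python
def sub_agg_values(response, agg_name, sub_agg_name):
    """Map each bucket key to the value of a metric sub-aggregation.

    Hypothetical helper; assumes the sub-aggregation is a single-value
    metric (such as avg) that reports its result under "value".
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b[sub_agg_name]["value"] for b in buckets}


response = {
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "foo", "doc_count": 5,
                 "my-sub-agg-name": {"value": 75.0}},
            ],
        }
    }
}

print(sub_agg_values(response, "my-agg-name", "my-sub-agg-name"))
# → {'foo': 75.0}
```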
Add custom metadata:
Use the meta object to associate custom metadata with an aggregation:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "meta": {
        "my-metadata-field": "foo"
      }
    }
  }
}
'

The response returns the meta object in place:

{
  ...
  "aggregations": {
    "my-agg-name": {
      "meta": {
        "my-metadata-field": "foo"
      },
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

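Return the aggregation type:
By default, aggregation results include the aggregation's name but not its type. To return the aggregation type, use the typed_keys query parameter.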

curl -X GET "localhost:9200/my-index-000001/_search?typed_keys&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "histogram": {
        "field": "my-field",
        "interval": 1000
      }
    }
  }
}
'

With typed_keys in the request URL, the response returns the aggregation type as a prefix to the aggregation's name, separated by a # character.

{
  ...
  "aggregations": {
    "histogram#my-agg-name": {                 
      "buckets": []
    }
  }
}

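Note: Some aggregations return a different aggregation type from the type in the request. For example, the terms, significant terms, and percentiles aggregations return different aggregation types depending on the data type of the aggregated field.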
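Because the returned type can differ from the requested one, client code that relies on typed_keys typically reads the prefix back out of each key. A minimal Python sketch; `split_typed_key` is hypothetical and assumes the type prefix never contains "#", so it splits on the first "#" only.

```python
def split_typed_key(key):
    """Split a typed_keys aggregation key such as "histogram#my-agg-name".

    Returns (type, name). Splits on the first "#", assuming the type
    prefix itself never contains "#". Keys without "#" (typed_keys not
    set) come back with an empty type.
    """
    agg_type, sep, name = key.partition("#")
    if not sep:
        return "", key
    return agg_type, name


response = {"aggregations": {"histogram#my-agg-name": {"buckets": []}}}
for key in response["aggregations"]:
    print(split_typed_key(key))  # → ('histogram', 'my-agg-name')
```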

Use scripts in an aggregation:
When a field doesn't exactly match the aggregation you need, aggregate on a runtime field instead:

curl -X GET "localhost:9200/my-index-000001/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "runtime_mappings": {
    "message.length": {
      "type": "long",
      "script": "emit(doc[\u0027message.keyword\u0027].value.length())"
    }
  },
  "aggs": {
    "message_length": {
      "histogram": {
        "interval": 10,
        "field": "message.length"
      }
    }
  }
}
'

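Scripts calculate field values dynamically, which adds a little overhead to the aggregation. In addition to the time spent calculating, some aggregations, such as terms and filters, can't use some of their optimizations with runtime fields. In total, the performance cost of using a runtime field varies from aggregation to aggregation.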
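To make the runtime-field request concrete, here is a local Python illustration of what it computes, not how Elasticsearch executes it: the script emits each message's length, and a histogram with interval 10 (offset 0) places each value in the bucket whose key is floor(value / interval) * interval. The function name and sample messages are hypothetical.

```python
import math
from collections import Counter


def message_length_histogram(messages, interval=10):
    """Local illustration of the runtime-field histogram request.

    Mimics the script (the length of each message) and then buckets the
    lengths like a histogram aggregation with offset 0:
    key = floor(value / interval) * interval.
    Unlike Elasticsearch, which by default also returns empty buckets
    between the lowest and highest key, this lists non-empty buckets only.
    Python's len() counts code points, whereas the Painless length() call
    counts UTF-16 units, so results can differ for characters outside
    the Basic Multilingual Plane.
    """
    counts = Counter(
        math.floor(len(m) / interval) * interval for m in messages
    )
    return dict(sorted(counts.items()))


msgs = ["short", "a bit longer msg", "exactly twenty chars"]
print(message_length_histogram(msgs))  # → {0: 1, 10: 1, 20: 1}
```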

Aggregation caches:

For faster responses, Elasticsearch caches the results of frequently run aggregations in the shard request cache.
To get cached results, use the same preference string for each search.
If you don't need search hits, set size to 0 to avoid filling the cache.

Elasticsearch routes searches with the same preference string to the same shards. If the shards’ data doesn’t change between searches, the shards return cached aggregation results.
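As a sketch of the caching advice, the Python snippet below builds a search URL that always carries the same preference string plus size=0. The helper name and the value "my-dashboard" are hypothetical; a custom preference value is any string that does not start with an underscore.

```python
from urllib.parse import urlencode


def cached_search_url(index, preference, host="http://localhost:9200"):
    """Build a _search URL that reuses one preference string.

    Repeating the same aggregation-only search (size=0) with the same
    preference routes it to the same shards, which can then answer from
    the shard request cache while their data is unchanged.
    """
    params = urlencode({"preference": preference, "size": 0})
    return f"{host}/{index}/_search?{params}"


print(cached_search_url("my-index-000001", "my-dashboard"))
# → http://localhost:9200/my-index-000001/_search?preference=my-dashboard&size=0
```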

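Long values:
When running aggregations, Elasticsearch uses double values to hold and represent numeric data. As a result, aggregations on long numbers greater than 2^53 are approximate.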
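The 2^53 threshold comes from IEEE 754 doubles, which have a 53-bit significand, so above 2^53 not every integer has an exact double representation. A short Python check (Python floats are IEEE 754 doubles) illustrating the general property, not Elasticsearch internals:

```python
# Doubles carry a 53-bit significand, so integers are exact only up to 2**53.
limit = 2 ** 53

print(float(limit - 1) == float(limit))  # → False: still exactly representable
print(float(limit) == float(limit + 1))  # → True: 2**53 + 1 rounds to 2**53
print(int(float(limit + 1)))             # → 9007199254740992
```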