The commonly used Terms Aggregation was covered in an earlier post on bucket aggregations. This post covers the other bucket aggregations you are likely to need.
1.Filter Aggregation
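Filter aggregation: a single-bucket aggregation. It first narrows the documents down to those matching a filter, then runs sub-aggregations on that subset. (Its use cases are limited; it is usually better to put the condition in the query.)
"aggs":{
"filter_one":{
"filter":{
"term":{
"addr":"guangdong"
}
},
"aggs":{
"age_avg":{
"avg" : { "field" : "age" }
}
}
}
}
The example above computes the average age of users whose addr is guangdong. Note that a sub-aggregation is needed: without one, the aggregation only returns the number of matching documents, which is why putting the condition in the query is usually the better choice. The response:
"aggregations": {
"filter_one": {
"doc_count": 2,
"age_avg": {
"value": 41
}
}
}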
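To make the semantics concrete, here is a minimal Python sketch of what a filter aggregation with an avg sub-aggregation computes. The sample documents and the helper name filter_avg are hypothetical, chosen only so that the result lines up with the response shown above; this is an illustration of the idea, not how Elasticsearch implements it.

```python
# Hypothetical sample documents; the real contents of the user index are not shown in this post.
docs = [
    {"addr": "guangdong", "age": 40},
    {"addr": "guangdong", "age": 42},
    {"addr": "zhejiang", "age": 35},
]


def filter_avg(docs, field, value, metric_field):
    """Keep documents whose `field` equals `value`, then average `metric_field` over them."""
    matched = [d for d in docs if d.get(field) == value]
    values = [d[metric_field] for d in matched if metric_field in d]
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matched), "avg": avg}


result = filter_avg(docs, "addr", "guangdong", "age")
print(result)
```

Without the averaging step, only doc_count would remain, which is the same number a plain query with this condition would report as its hit count.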
2.Filters Aggregation
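Filters aggregation: a multi-bucket aggregation in which each bucket is defined by its own filter. The example below counts the users in guangdong and in zhejiang.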
GET user/_search
{
"size": 0,
"aggs" : {
"adds" : {
"filters" : {
"filters" : {
"sum_guangdong" : { "match" : { "addr" : "guangdong"}},
"sum_zhejiang" : { "match" : { "addr" : "zhejiang" }}
}
}
}
}
}
Response:
"aggregations": {
"adds": {
"buckets": {
"sum_guangdong": {
"doc_count": 2
},
"sum_zhejiang": {
"doc_count": 1
}
}
}
}
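- Anonymous filters: the filters are passed as an array instead of a named object. The buckets carry no names and are returned in the same order as the filters in the request (note the change in request syntax).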
GET user/_search
{
"size": 0,
"aggs" : {
"adds" : {
"filters" : {
"filters" : [
{ "term" : { "addr" : "guangdong" }},
{ "term" : { "addr" : "zhejiang" }}
]
}
}
}
}
Response:
"aggregations": {
"adds": {
"buckets": [
{
"doc_count": 2
},
{
"doc_count": 1
}
]
}
}
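- Other bucket: setting other_bucket to true collects the documents that match none of the filters into one extra bucket, named _other_ by default. This bucket is not returned unless you ask for it.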
GET user/_search
{
"size": 0,
"aggs" : {
"adds" : {
"filters" : {
"other_bucket" : true,
"filters" : {
"sum_guangdong" : { "match" : { "addr" : "guangdong"}},
"sum_zhejiang" : { "match" : { "addr" : "zhejiang" }}
}
}
}
}
}
Response:
"aggregations": {
"adds": {
"buckets": {
"sum_guangdong": {
"doc_count": 2
},
"sum_zhejiang": {
"doc_count": 1
},
"_other_": {
"doc_count": 1
}
}
}
}
To rename the other bucket, set the other_bucket_key parameter. Setting it also implicitly sets other_bucket to true.
GET user/_search
{
"size": 0,
"aggs" : {
"adds" : {
"filters" : {
"other_bucket_key" : "other_addr",
"filters" : {
"sum_guangdong" : { "match" : { "addr" : "guangdong" }},
"sum_zhejiang" : { "match" : { "addr" : "zhejiang" }}
}
}
}
}
}
Response:
"aggregations": {
"adds": {
"buckets": {
"sum_guangdong": {
"doc_count": 2
},
"sum_zhejiang": {
"doc_count": 1
},
"other_addr": {
"doc_count": 1
}
}
}
}
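The bucketing rules of the filters aggregation can be sketched in a few lines of Python. Everything here is hypothetical illustration: the sample documents, the helper name filters_agg, and the use of plain equality in place of a real match or term query. It only mirrors the counting logic described above.

```python
def filters_agg(docs, filters, other_bucket_key=None):
    """Count documents per named filter; optionally count documents matching no filter."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched_any = False
        for name, predicate in filters.items():
            if predicate(doc):
                # A document is counted in every filter it matches.
                buckets[name] += 1
                matched_any = True
        if not matched_any:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets


# Hypothetical documents chosen to line up with the responses above.
docs = [
    {"addr": "guangdong"},
    {"addr": "guangdong"},
    {"addr": "zhejiang"},
    {"addr": "beijing"},
]
filters = {
    "sum_guangdong": lambda d: d.get("addr") == "guangdong",
    "sum_zhejiang": lambda d: d.get("addr") == "zhejiang",
}

named = filters_agg(docs, filters)
with_other = filters_agg(docs, filters, other_bucket_key="other_addr")
print(named)
print(with_other)
```

Passing other_bucket_key plays the role of both other_bucket and other_bucket_key in the request: it turns the extra bucket on and names it.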
3.Histogram Aggregation
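Histogram aggregation: builds buckets at a fixed interval. The offset defaults to 0, and each bucket is closed on the left and open on the right. The bucket a document falls into is given by: bucket_key = Math.floor((value - offset) / interval) * interval + offset. Because offset defaults to 0, this usually simplifies to: bucket_key = Math.floor(value / interval) * interval. For example, with a price of 32 and an interval of 5, bucket_key is 30 and the corresponding bucket is [30, 35).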
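The bucket formula is easy to check by hand. Below is a small Python sketch of it; the function name bucket_key is just a label chosen for this example and is not part of any Elasticsearch client.

```python
import math


def bucket_key(value, interval, offset=0):
    """Return the key of the histogram bucket that `value` falls into."""
    return math.floor((value - offset) / interval) * interval + offset


key = bucket_key(32, 5)
print(key, "->", f"[{key}, {key + 5})")

# With a non-zero offset the bucket boundaries shift by that amount.
shifted = bucket_key(32, 10, offset=3)
print(shifted, "->", f"[{shifted}, {shifted + 10})")
```

Because the formula uses floor rather than truncation, negative values also land in the bucket to their left, for example bucket_key(-1, 10) gives -10.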
GET user/_search
{
"size": 0,
"aggs" : {
"ages" : {
"histogram" : {
"field" : "age",
"interval" : 10
}
}
}
}
Response:
"aggregations": {
"ages": {
"buckets": [
{
"key": 30,
"doc_count": 1
},
{
"key": 40,
"doc_count": 3
},
{
"key": 50,
"doc_count": 0
},
{
"key": 60,
"doc_count": 0
},
{
"key": 70,
"doc_count": 1
}
]
}
}
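As the response shows, no buckets are returned below the bucket containing the smallest value or above the bucket containing the largest value, while every bucket in between is returned by default, including empty ones.
- In other words, the range of buckets adapts to the values in the documents. Sometimes, though, we do not want buckets with a document count of 0 to be returned. That is what min_doc_count is for: it sets the minimum number of documents a bucket must contain to be returned.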
GET user/_search
{
"size": 0,
"aggs" : {
"ages" : {
"histogram" : {
"field" : "age",
"interval" : 10,
"min_doc_count" : 1
}
}
}
}
- Extended bounds: to make the histogram return buckets beyond the range covered by the documents, set extended_bounds with a min and a max.
GET user/_search
{
"size": 0,
"aggs" : {
"ages" : {
"histogram" : {
"field" : "age",
"interval" : 10,
"extended_bounds":{
"min":60,
"max":90
}
}
}
}
}
Response:
"aggregations": {
"ages": {
"buckets": [
{
"key": 30,
"doc_count": 1
},
{
"key": 40,
"doc_count": 3
},
{
"key": 50,
"doc_count": 0
},
{
"key": 60,
"doc_count": 0
},
{
"key": 70,
"doc_count": 1
},
{
"key": 80,
"doc_count": 0
},
{
"key": 90,
"doc_count": 0
}
]
}
}
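1. extended_bounds only makes sense when min_doc_count is 0, because the extra buckets it adds contain no documents.
2. extended_bounds does not filter buckets. In the example above, extended_bounds.min (60) is larger than the lowest keys that contain documents, yet the buckets for those keys are still returned.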
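How min_doc_count and extended_bounds interact can be sketched with a small simulation. This is a simplified model under stated assumptions: the ages are hypothetical values picked to reproduce the counts in the responses above, the interval is assumed to be a positive integer, offset is left at 0, and the function name histogram is only a label for this example.

```python
import math


def histogram(values, interval, min_doc_count=0, extended_bounds=None):
    """Simulate histogram buckets as a list of (key, doc_count) pairs."""
    def key_of(v):
        return math.floor(v / interval) * interval

    counts = {}
    for v in values:
        counts[key_of(v)] = counts.get(key_of(v), 0) + 1

    # The returned range spans the data, widened by extended_bounds if given.
    edge_keys = list(counts)
    if extended_bounds is not None:
        edge_keys += [key_of(extended_bounds[0]), key_of(extended_bounds[1])]
    if not edge_keys:
        return []

    buckets = []
    for key in range(min(edge_keys), max(edge_keys) + interval, interval):
        doc_count = counts.get(key, 0)
        if doc_count >= min_doc_count:
            buckets.append((key, doc_count))
    return buckets


ages = [35, 41, 44, 48, 72]  # hypothetical sample data

default_buckets = histogram(ages, 10)
non_empty = histogram(ages, 10, min_doc_count=1)
extended = histogram(ages, 10, extended_bounds=(60, 90))
print(default_buckets)
print(non_empty)
print(extended)
```

In this model, combining min_doc_count=1 with extended_bounds returns the same buckets as min_doc_count=1 alone, since the buckets added by the bounds are empty and get dropped again.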
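- Offset: the offset defaults to 0, but a custom value can be set to shift the bucket boundaries. With an offset of 3 and an interval of 10, the buckets become [3, 13), [13, 23), and so on.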
GET user/_search
{
"size": 0,
"aggs" : {
"ages" : {
"histogram" : {
"offset" : 3,
"field" : "age",
"interval" : 10
}
}
}
}
Order by key:
"order" : { "_key" : "desc" }
Order by document count:
"order" : { "_count" : "desc" }
Order by the result of a (numeric) sub-aggregation:
POST /sales/_search?size=0
{
"aggs" : {
"prices" : {
"histogram" : {
"field" : "price",
"interval" : 50,
"order" : { "price_stats.min" : "asc" }
},
"aggs" : {
"price_stats" : { "stats" : {"field" : "price"} }
}
}
}
}
4.Range Aggregation
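Range aggregation: lets you define the ranges yourself, so it is less rigid than the Histogram Aggregation. Each range is bounded by from and to and is half-open, [from, to): from is included and to is excluded.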
{
"aggs" : {
"price_ranges" : {
"range" : {
"field" : "price",
"ranges" : [
{ "to" : 50 },
{ "from" : 50, "to" : 100 },
{ "from" : 100 }
]
}
}
}
}
Response:
"aggregations": {
"price_ranges" : {
"buckets": [
{
"to": 50,
"doc_count": 2
},
{
"from": 50,
"to": 100,
"doc_count": 4
},
{
"from": 100,
"doc_count": 4
}
]
}
}
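The half-open [from, to) rule matters most for values that sit exactly on a boundary. The Python sketch below simulates the counting; the prices are hypothetical values picked so the counts match the response above, and range_agg is a name made up for this example.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive, either may be omitted."""
    buckets = []
    for r in ranges:
        lower = r.get("from")
        upper = r.get("to")
        doc_count = sum(
            1
            for v in values
            if (lower is None or v >= lower) and (upper is None or v < upper)
        )
        buckets.append({**r, "doc_count": doc_count})
    return buckets


prices = [10, 49, 50, 75, 80, 99, 100, 150, 200, 250]  # hypothetical sample data
ranges = [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]

buckets = range_agg(prices, ranges)
for bucket in buckets:
    print(bucket)
```

Note that the price 50 is counted in the second range and 100 in the third, never in the range whose to equals that value.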
5.Missing Aggregation
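Missing aggregation: a single-bucket aggregation that collects the documents that have no value for the field, either because the field is absent or because it is null.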
POST /sales/_search?size=0
{
"aggs" : {
"products_without_a_price" : {
"missing" : { "field" : "price" }
}
}
}
Response:
{
...
"aggregations" : {
"products_without_a_price" : {
"doc_count" : 100
}
}
}
6.Nested Aggregation
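Nested aggregation: aggregates over nested documents. All it needs is the path of the nested field. For example, here is a mapping with a nested field: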
{
...
"product" : {
"properties" : {
"resellers" : {
"type" : "nested",
"properties" : {
"name" : { "type" : "text" },
"price" : { "type" : "double" }
}
}
}
}
}
The following aggregation finds the lowest price at which a led tv is sold.
{
"query" : {
"match" : { "name" : "led tv" }
},
"aggs" : {
"resellers" : {
"nested" : {
"path" : "resellers"
},
"aggs" : {
"min_price" : { "min" : { "field" : "resellers.price" } }
}
}
}
}
Response:
{
"aggregations": {
"resellers": {
"min_price": {
"value" : 350
}
}
}
}
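Conceptually, the nested aggregation switches the aggregation context from the parent documents to their nested child documents, and the sub-aggregation then runs over those children. A minimal Python sketch of that idea follows; the product data, reseller names, and the helper nested_min are hypothetical and only chosen so that the minimum matches the response above.

```python
# Hypothetical products that already matched the query on name.
products = [
    {
        "name": "led tv",
        "resellers": [
            {"name": "companyA", "price": 350},
            {"name": "companyB", "price": 500},
        ],
    },
    {
        "name": "led tv",
        "resellers": [{"name": "companyC", "price": 420}],
    },
]


def nested_min(docs, path, field):
    """Flatten the nested documents under `path`, then take the minimum of `field`."""
    nested_docs = [child for doc in docs for child in doc.get(path, [])]
    values = [child[field] for child in nested_docs if field in child]
    return {
        "doc_count": len(nested_docs),
        "min": min(values) if values else None,
    }


result = nested_min(products, "resellers", "price")
print(result)
```

The doc_count here counts nested reseller documents, not products, which is why it can be larger than the number of matching products.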