概念
- 桶(Buckets)
- 满足特定条件的文档的集合;(类似 SQL 中的 group by)
- 指标(Metrics)
- 对桶内的文档进行统计计算;(类似 SQL 中的统计函数 COUNT() 、 SUM() 、 MAX() 等等)
分类
聚合分析的功能主要有:
- 指标聚合
- 桶聚合
- 管道聚合
- 矩阵聚合
指标聚合
对一组数据进行统计,例如:求最大值、最小值、计算总数、求平均值、求和等等; 类似 SQL 中的 max、min、count、avg、sum 等统计函数;
数据准备
PUT /books
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "float"
},
"type": {
"type": "text",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
POST /books/_doc/_bulk
{"index":{"_id":1}}
{"name":"C语言编程","price":23.5,"type":"c"}
{"index":{"_id":2}}
{"name":"数据结构与算法","price":34.5,"type":"ideas"}
{"index":{"_id":3}}
{"name":"计算机组成原理","price":34.5,"type":"Computer"}
{"index":{"_id":4}}
{"name":"计算机网络","price":32.5,"type":"Computer"}
{"index":{"_id":5}}
{"name":"计算机操作系统","price":44.5,"type":"Computer"}
{"index":{"_id":6}}
{"name":"Java 编程","price":13.5,"type":"java"}
{"index":{"_id":7}}
{"name":"数据库原理","price":36.0,"type":"Database"}
{"index":{"_id":8}}
{"name":"ElasticSearch搜索引擎","price":34.8,"type":"search_engine"}
{"index":{"_id":9}}
{"name":"Lucene 原理","price":29.8,"type":"search_engine"}
{"index":{"_id":10}}
{"name":"JVM 技术","price":34.8,"type":"java"}
{"index":{"_id":11}}
{"name":"设计模式","price":27.8,"type":"ideas"}
max 统计最大值
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"max": {
"field": "price"
}
}
}
}
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 44.5
}
}
}
min 统计最小值
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"min": {
"field": "price"
}
}
}
}
value_count 统计文档数量
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"value_count": {
"field": "price"
}
}
}
}
cardinality 基数统计(统计去重后的文档数量)
类似 SQL 中的 select count(distinct price) from books
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"cardinality": {
"field": "price"
}
}
}
}
avg 计算平均值
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"avg": {
"field": "price"
}
}
}
}
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 31.472726995294746
}
}
}
sum 计算总和
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"sum": {
"field": "price"
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 346.1999969482422
}
}
}
这里发现一个小问题,手动计算总和应为 346.2 ;这里为 346.1999969482422 ;猜测应该是 Java 中关于小数二进制保存不准确导致的;
stats 基本统计
一次性返回总数,最大值,最小值,平均值,总和的结果
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"stats": {
"field": "price"
}
}
}
}
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"count": 11,
"min": 13.5,
"max": 44.5,
"avg": 31.472726995294746,
"sum": 346.1999969482422
}
}
}
extended_stats 高级统计
包含基本统计的结果,另外还会统计:平方和,方差,标准差,平均值加减两个标准差的区间
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"extended_stats": {
"field": "price"
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"count": 11,
"min": 13.5,
"max": 44.5,
"avg": 31.472726995294746,
"sum": 346.1999969482422,
"sum_of_squares": 11530.459805908205,
"variance": 57.691074198573254,
"std_deviation": 7.595464054195323,
"std_deviation_bounds": {
"upper": 46.66365510368539,
"lower": 16.2817988869041
}
}
}
}
percentiles 百分位统计
百分位数
是一个统计术语,如果将一组数据从小到大排序,并计算相应的累计百分数
,某一百分位所对应数据的值
就称为这一百分位的百分位数
。
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"percentiles": {
"field": "price"
}
}
}
}
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"values": {
"1.0": 13.500000000000002,
"5.0": 14,
"25.0": 28.299999237060547,
"50.0": 34.5,
"75.0": 34.79999923706055,
"95.0": 44.074999999999996,
"99.0": 44.5
}
}
}
}
桶聚合
当聚合开始被执行,每个文档里面的值通过计算来决定符合哪个桶的条件。如果匹配到,文档将放入相应的桶并接着进行聚合操作。
terms 分组聚合
类似 select count(*) from books group by price
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"terms": {
"field": "type"
}
}
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "computer",
"doc_count": 3
},
{
"key": "ideas",
"doc_count": 2
},
{
"key": "java",
"doc_count": 2
},
{
"key": "search_engine",
"doc_count": 2
},
{
"key": "c",
"doc_count": 1
},
{
"key": "database",
"doc_count": 1
}
]
}
}
}
精彩的来了,桶聚合与指标聚合可以结合使用,更加丰富了聚合分析的功能
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"terms": {
"field": "type"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
},
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "computer",
"doc_count": 3,
"avg_price": {
"value": 37.166666666666664
},
"sum_price": {
"value": 111.5
}
},
{
"key": "ideas",
"doc_count": 2,
"avg_price": {
"value": 31.149999618530273
},
"sum_price": {
"value": 62.29999923706055
}
},
{
"key": "java",
"doc_count": 2,
"avg_price": {
"value": 24.149999618530273
},
"sum_price": {
"value": 48.29999923706055
}
},
{
"key": "search_engine",
"doc_count": 2,
"avg_price": {
"value": 32.29999923706055
},
"sum_price": {
"value": 64.5999984741211
}
},
{
"key": "c",
"doc_count": 1,
"avg_price": {
"value": 23.5
},
"sum_price": {
"value": 23.5
}
},
{
"key": "database",
"doc_count": 1,
"avg_price": {
"value": 36
},
"sum_price": {
"value": 36
}
}
]
}
}
}
filter 过滤器聚合
把符合条件的文档放到一个桶
里进行统计相关指标;
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"filter": {
"match": {
"name": "java"
}
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count": 1
}
}
}
filters 多过滤器聚合
把符合多个过滤器的文档分到不同的桶里进行统计
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"filters": {
"filters": [
{
"match": {
"name": "java"
}
},
{
"match": {
"name": "c"
}
}
]
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 1
}
]
}
}
}
missing 空值聚合
把索引中的缺失字段的文档分到一个桶里,类似 select count(*) from books where filedA is null
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"missing": {
"field": "price"
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count": 0
}
}
}
组合使用案例1
GET books/_search
{
"size": 0,
"aggs": {
"missing_result": {
"missing": {
"field": "price"
}
},
"sum_result": {
"sum": {
"field": "price"
}
}
}
}