Introduction to Aggregation Queries

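    Elasticsearch 1.0 brought improvements and new features, including the long-awaited aggregations framework, which gave Elasticsearch a new role: a full-featured analytics engine. You can now use Elasticsearch as a key part of a system that processes large volumes of data, draws conclusions from it, and presents the results in a readable, visual form.

    Aggregations fall into two groups: metric aggregations and bucket aggregations.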

date_histogram

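    A histogram over time intervals. date_histogram is used in much the same way as histogram, except that its interval accepts date expressions.

date_histogram properties

{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}

The interval field accepts several keywords: year, quarter, month, week, day, hour, minute, second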

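These units can also be combined with a number. For example, an interval of one and a half hours can be written as follows (some later Elasticsearch releases reject fractional values, in which case 90m expresses the same interval):

{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "1.5h"
      }
    }
  }
}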

format

The bucket keys in the response can be formatted by setting format:

{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "1M",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

The result looks like this:

{
  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2012-02-02",
          "key": 1328140800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2012-03-02",
          "key": 1330646400000,
          "doc_count": 2
        },
        ...
      ]
    }
  }
}

Here key_as_string is the formatted date, and key is the same date as a timestamp in milliseconds since the epoch.
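
The relationship between key and key_as_string can be checked outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the helper name key_as_string is hypothetical, and the strftime pattern %Y-%m-%d is assumed here as the counterpart of the yyyy-MM-dd format. The sample key is the January 2014 bucket key from the car example later in this article.

```python
from datetime import datetime, timezone


def key_as_string(key_ms: int, fmt: str = "%Y-%m-%d") -> str:
    """Render a bucket key (milliseconds since the epoch, UTC) as a date string."""
    moment = datetime.fromtimestamp(key_ms // 1000, tz=timezone.utc)
    return moment.strftime(fmt)


# Bucket key taken from the monthly car-sales example in this article.
print(key_as_string(1388534400000))  # prints 2014-01-01
```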

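Using time_zone

Elasticsearch can take a time zone into account when bucketing dates. With the setting below, buckets are computed in the UTC+8 time zone.

{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "time_zone": "+08:00"
      }
    }
  }
}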
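
To see why the time zone matters, consider a document stamped late in the evening UTC. The sketch below is plain Python under stated assumptions, not Elasticsearch code: day_bucket_start is a hypothetical helper, and the sample timestamp is invented for illustration. It rounds a timestamp down to local midnight in a given zone, which is the idea behind daily bucketing with time_zone.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_start(ts: datetime, tz: timezone) -> datetime:
    """Round a timestamp down to midnight in the given time zone."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


utc8 = timezone(timedelta(hours=8))
# Hypothetical document timestamp: 20:00 UTC on 30 September 2015.
ts = datetime(2015, 9, 30, 20, 0, tzinfo=timezone.utc)

in_utc = day_bucket_start(ts, timezone.utc)  # falls on 30 September in UTC
in_utc8 = day_bucket_start(ts, utc8)         # already 1 October in UTC+8

print(in_utc.isoformat())
print(in_utc8.isoformat())
```

The same document lands in a different daily bucket depending on the zone, which is why daily reports for users in UTC+8 usually need time_zone set.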

offset

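offset shifts the boundaries of each bucket.
By default a daily bucket runs from midnight (00:00) to midnight (24:00). To change that, set an offset as follows:

{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "offset": "+6h"
      }
    }
  }
}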

The bucket boundaries then become:

"aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-09-30T06:00:00.000Z",
        "key": 1443592800000,
        "doc_count": 1
      },
      {
        "key_as_string": "2015-10-01T06:00:00.000Z",
        "key": 1443679200000,
        "doc_count": 1
      }
    ]
  }
}
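
The arithmetic behind these keys can be reproduced by hand. The following Python sketch assumes fixed-length intervals in UTC and a hypothetical helper named bucket_key; it illustrates the rounding idea and is not Elasticsearch's implementation. The two sample timestamps are invented so that they land in the two buckets shown above.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def bucket_key(timestamp_ms: int, interval_ms: int = DAY_MS, offset_ms: int = 0) -> int:
    """Round a timestamp down to the start of its bucket, honoring an offset."""
    shifted = timestamp_ms - offset_ms
    return (shifted // interval_ms) * interval_ms + offset_ms


early = 1443677400000  # 2015-10-01T05:30:00Z (hypothetical document)
late = 1443681000000   # 2015-10-01T06:30:00Z (hypothetical document)

# Without an offset both documents share the bucket starting at midnight.
print(bucket_key(early), bucket_key(late))
# With a +6h offset, 05:30 still belongs to the bucket that began the previous day at 06:00.
print(bucket_key(early, offset_ms=6 * HOUR_MS), bucket_key(late, offset_ms=6 * HOUR_MS))
```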

missing value

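missing: a default for documents without a value
When a document has no value for the field, it is bucketed as if it had the value given by missing:

{
  "aggs": {
    "publish_date": {
      "date_histogram": {
        "field": "publish_date",
        "interval": "year",
        "missing": "2000-01-01"
      }
    }
  }
}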
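
The effect of missing can be pictured with a small Python sketch. Everything here is assumed for illustration: the helper year_histogram is hypothetical, as are the three sample documents, and dates are treated as plain ISO strings.

```python
from collections import Counter


def year_histogram(docs, field, missing=None):
    """Count documents per year; documents lacking the field use `missing` if given."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, missing)
        if value is None:
            continue  # no value and no default: the document is left out
        counts[value[:4]] += 1  # the first four characters of an ISO date are the year
    return dict(counts)


# Hypothetical documents; the second one has no publish_date.
docs = [
    {"publish_date": "2014-05-18"},
    {"title": "undated"},
    {"publish_date": "2014-11-05"},
]

print(year_histogram(docs, "publish_date"))
print(year_histogram(docs, "publish_date", missing="2000-01-01"))
```

Without a default the undated document is ignored; with missing set it is counted in the year 2000 bucket.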

Applying aggregations to product statistics

Preparing the data

# Bulk-index the sample documents. The body is newline-delimited JSON and must end with a newline.
curl -XPOST 'http://study0:9200/cars/transactions/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary '{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
'

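Note: we will build aggregations that might be useful to a car dealer. The data describes car transactions: model, manufacturer, sale price, sale date, and some other related fields.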

Counting documents per month

Group the documents by time and count how many fall in each month:

GET /cars/transactions/_search
{
  "aggs": {
    "doc_count": {
      "date_histogram": {
        "field": "sold",
        "interval": "month"
      }
    }
  }
}

Response (truncated). The aggregation appears under the name used in the request, doc_count:

"aggregations" : {
  "doc_count" : {
    "buckets" : [
      {
        "key_as_string" : "2014-01-01T00:00:00.000Z",
        "key" : 1388534400000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-02-01T00:00:00.000Z",
        "key" : 1391212800000,
        "doc_count" : 1
      },
      ...
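
As a cross-check on the counts, the monthly grouping can be reproduced locally. This is a minimal Python sketch over the sold dates from the sample data; the helper docs_per_month is hypothetical. Unlike Elasticsearch, which may also emit empty buckets for months without sales, this sketch lists only months that contain at least one document.

```python
from collections import Counter

# The "sold" values from the bulk request in this article.
sold_dates = [
    "2014-10-28", "2014-11-05", "2014-05-18", "2014-07-02",
    "2014-08-19", "2014-11-05", "2014-01-01", "2014-02-12",
]


def docs_per_month(dates):
    """Count ISO dates per calendar month (the yyyy-MM prefix), sorted by month."""
    return dict(sorted(Counter(d[:7] for d in dates).items()))


for month, count in docs_per_month(sold_dates).items():
    print(month, count)
```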

histogram

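Price-range histogram (Histogram Aggregation)

A histogram divides the value range into equal-width intervals and counts how many price values fall into each one. Here the interval is 5000.

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}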

Response (truncated):

"aggregations" : {
  "prices" : {
    "buckets" : [
      {
        "key" : 10000.0,
        "doc_count" : 2
      },
      {
        "key" : 15000.0,
        "doc_count" : 1
      },
      ...
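
The bucket a price falls into is obtained by rounding it down to a multiple of the interval. The Python sketch below applies that rule to the prices from the sample data; price_histogram is a hypothetical helper, and the code illustrates the idea without claiming to mirror Elasticsearch internals.

```python
from collections import Counter

# The "price" values from the bulk request in this article.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def price_histogram(values, interval):
    """Map each value to its bucket key (value rounded down to a multiple of interval)."""
    counts = Counter((value // interval) * interval for value in values)
    return dict(sorted(counts.items()))


for key, count in price_histogram(prices, 5000).items():
    print(key, count)
```

Both 10000 and 12000 round down to 10000, which accounts for the doc_count of 2 in the first bucket.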

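Counting documents by a non-numeric field

Counting by a non-numeric field is similar to select count(*) from table group by field in SQL.

For example, to see how many cars of each color were sold:

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": { "field": "color" }
    }
  }
}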

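Note: genres is the name given to the aggregation result; it can be any valid string.

Caution: when color is mapped as a text field, the request fails with the following error:

"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [color] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."

The error tells us that a terms aggregation cannot run on a text field unless fielddata is enabled, so we update the mapping: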

PUT /cars/_mapping/transactions
{
  "properties": {
    "color": {
      "type": "text",
      "fielddata": true
    }
  }
}

Response:

{"acknowledged":true}

Aggregation result:

"buckets" : [
  {
    "key" : "red",
    "doc_count" : 4
  },
  {
    "key" : "blue",
    "doc_count" : 2
  },
  {
    "key" : "green",
    "doc_count" : 2
  }
]
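
The group-by-and-count behavior can be mimicked in a few lines of Python. This sketch uses the color values from the sample data; the helper count_by_term is hypothetical, and the tie-breaking rule (alphabetical by key) is chosen here only so that the order matches the result shown above.

```python
from collections import Counter

# The "color" values from the bulk request in this article.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]


def count_by_term(values):
    """Count occurrences per term, largest count first, ties ordered by term."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for term, doc_count in count_by_term(colors):
    print(term, doc_count)
```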

Metric

Adding a metric is similar to select count(*), avg(vi), max(vi) ... from table group by field in SQL.

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": { "field": "color" },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Note: avg can be replaced with max, min, sum, and so on. stats returns all of them at once.

Using stats to get every metric value:

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": { "field": "color" },
      "aggs": {
        "avg_price": {
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

Response (truncated):

"buckets" : [
  {
    "key" : "red",
    "doc_count" : 4,
    "avg_price" : {
      "count" : 4,
      "min" : 10000.0,
      "max" : 80000.0,
      "avg" : 32500.0,
      "sum" : 130000.0
    }
  }
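
The figures for the red bucket can be verified by hand. The Python sketch below groups the sample transactions by color and computes the five values that stats reports; stats_by_color is a hypothetical helper written for this check, not an Elasticsearch API.

```python
from collections import defaultdict

# (color, price) pairs from the bulk request in this article.
cars = [
    ("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000),
    ("green", 12000), ("red", 20000), ("red", 80000), ("blue", 25000),
]


def stats_by_color(rows):
    """Group prices by color and compute count, min, max, avg, and sum per group."""
    groups = defaultdict(list)
    for color, price in rows:
        groups[color].append(price)
    return {
        color: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sum": sum(values),
        }
        for color, values in groups.items()
    }


print(stats_by_color(cars)["red"])
```

The four red cars sum to 130000, giving the average of 32500 shown in the response.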