Great Architect & Artist :: 20. Exploring Your Data

[Executing Aggregations]

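Aggregations let you group your data and extract statistics from it. The easiest way to think about aggregations is to equate them roughly with SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a single response can carry both the search hits and, separately from the hits, the aggregated results. This is powerful and efficient: you can run a query together with multiple aggregations through one concise API and get back the results of both (or either) in one shot, avoiding extra network round trips.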
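As a minimal sketch of "a query plus multiple aggregations in one request", the snippet below only builds the JSON body that would be posted to the _search endpoint; it does not contact a cluster. The helper name build_search_body and the second aggregation are hypothetical, chosen for illustration, while the field names state and balance come from the examples in this post.

```python
import json


def build_search_body(query, aggs, size=10):
    """Combine one query and any number of named aggregations into a single body."""
    return {"query": query, "size": size, "aggs": aggs}


# Two independent aggregations travel in the same request as the query.
body = build_search_body(
    query={"match_all": {}},
    aggs={
        "group_by_state": {"terms": {"field": "state"}},
        "average_balance": {"avg": {"field": "balance"}},
    },
)

# This string is what would follow -d in the curl commands of this post.
print(json.dumps(body, indent=2))
```

Each aggregation result comes back under its own name in the "aggregations" part of the response, next to the "hits" part.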

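The following example groups all accounts by state and returns the top 10 states, sorted by count in descending order. The request specifies neither the limit nor the ordering; both are defaults of the terms aggregation.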

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      }
    }
  }
}'

In SQL, the aggregation above is conceptually similar to:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10

Part of the response looks like this:

  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "buckets" : [ {
        "key" : "al",
        "doc_count" : 21
      }, {
        "key" : "tx",
        "doc_count" : 17
      }, {
        "key" : "id",
        "doc_count" : 15
      }, {
        "key" : "ma",
        "doc_count" : 15
      }, {
        "key" : "md",
        "doc_count" : 15
      }, {
        "key" : "pa",
        "doc_count" : 15
      }, {
        "key" : "dc",
        "doc_count" : 14
      }, {
        "key" : "me",
        "doc_count" : 14
      }, {
        "key" : "mo",
        "doc_count" : 14
      }, {
        "key" : "nd",
        "doc_count" : 14
      } ]
    }
  }
}

We can see that there are 21 accounts in AL (Alabama), followed by 17 accounts in TX, 15 in ID, and so on. Note that we set size=0 because we only want to see the aggregation results in the response, so no search hits are returned.
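To make the bucket logic concrete, here is a rough in-memory imitation of what the terms aggregation computes. It is a sketch, not how Elasticsearch implements it: the function terms_aggregation and the sample documents are made up for illustration, and the tie-breaking rule (equal counts ordered by key) is an assumption that happens to agree with the response shown above.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct value of `field` and keep the top `size`."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest doc_count first; equal counts are ordered by key (assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Made-up sample documents, not taken from the bank index.
accounts = [
    {"state": "tx"}, {"state": "al"}, {"state": "al"},
    {"state": "tx"}, {"state": "al"}, {"state": "id"},
]

buckets = terms_aggregation(accounts, "state")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

The returned list has the same shape as the "buckets" array in the response: one entry per state with its key and doc_count.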

The following example computes the average account balance per state (again only for the top 10 states, sorted by count in descending order).

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'

Notice how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common pattern for all aggregations: you can nest aggregations inside other aggregations to extract whatever pivoted summaries you need from your data. Building on the previous aggregation, let's now sort the states by average balance in descending order.
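The nesting can be pictured as "first bucket the documents, then compute a metric inside each bucket". The sketch below imitates that in plain Python under stated assumptions: terms_with_avg and the sample accounts are hypothetical, and the {"value": ...} wrapper mirrors the way an avg metric is typically reported inside a bucket.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, value_field, size=10):
    """Group docs by `group_field`, then average `value_field` within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {
            "key": key,
            "doc_count": len(values),
            # The sub-aggregation result lives inside its parent bucket.
            "average_balance": {"value": sum(values) / len(values)},
        }
        for key, values in groups.items()
    ]
    # Default ordering of the parent: doc_count descending, then key.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]


# Made-up sample documents, not taken from the bank index.
accounts = [
    {"state": "al", "balance": 1000},
    {"state": "tx", "balance": 500},
    {"state": "al", "balance": 3000},
]

buckets = terms_with_avg(accounts, "state", "balance")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["average_balance"]["value"])
```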

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'
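The "order" entry in the request above names a sub-aggregation and a direction. A small sketch of the same idea on already-computed buckets follows; order_buckets and the bucket values are invented for illustration and are not real results from the bank index.

```python
def order_buckets(buckets, agg_name, direction="desc"):
    """Sort buckets by the value of one of their metric sub-aggregations."""
    return sorted(
        buckets,
        key=lambda bucket: bucket[agg_name]["value"],
        reverse=(direction == "desc"),
    )


# Made-up buckets in doc_count order, as the terms aggregation would
# return them without an explicit "order".
buckets = [
    {"key": "al", "doc_count": 21, "average_balance": {"value": 1200.0}},
    {"key": "tx", "doc_count": 17, "average_balance": {"value": 4500.0}},
    {"key": "id", "doc_count": 15, "average_balance": {"value": 3000.0}},
]

by_balance = order_buckets(buckets, "average_balance", "desc")
print([bucket["key"] for bucket in by_balance])
```

With the ordering applied, the state with the most accounts is no longer guaranteed to come first; the state with the highest average balance is.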

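The following example shows how to group by age bracket (ages 20-29, 30-39, and 40-49), then by gender, and finally compute the average account balance per age bracket, per gender. In each range, "from" is inclusive and "to" is exclusive.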

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}'
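The three levels of the request above (age range, then gender, then average balance) can be imitated in memory as follows. This is only a sketch under assumptions: range_gender_avg, the "20-30" style keys, and the sample accounts are hypothetical, and the result is a simplified dictionary rather than the exact response format. The half-open check lo <= age < hi reflects that "from" is inclusive and "to" is exclusive.

```python
from collections import defaultdict


def range_gender_avg(docs, ranges):
    """Bucket docs by age range, then by gender, then average the balance."""
    result = {}
    for lo, hi in ranges:
        # "from" is inclusive and "to" is exclusive.
        in_range = [doc for doc in docs if lo <= doc["age"] < hi]
        by_gender = defaultdict(list)
        for doc in in_range:
            by_gender[doc["gender"]].append(doc["balance"])
        result[f"{lo}-{hi}"] = {
            "doc_count": len(in_range),
            "group_by_gender": {
                gender: {
                    "doc_count": len(balances),
                    "average_balance": sum(balances) / len(balances),
                }
                for gender, balances in sorted(by_gender.items())
            },
        }
    return result


# Made-up sample documents, not taken from the bank index.
accounts = [
    {"age": 25, "gender": "F", "balance": 1000},
    {"age": 29, "gender": "F", "balance": 3000},
    {"age": 30, "gender": "M", "balance": 500},
    {"age": 45, "gender": "M", "balance": 700},
    {"age": 55, "gender": "F", "balance": 900},  # falls outside every range
]

summary = range_gender_avg(accounts, [(20, 30), (30, 40), (40, 50)])
for bracket, bucket in summary.items():
    print(bracket, bucket)
```

Note that the account with age 30 lands in the 30-40 bracket, not in 20-30, and the account with age 55 is not counted anywhere because no range covers it.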

There are many other aggregation capabilities that we do not cover in detail here. The aggregations reference guide (http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) is a good place to learn more.
