Mongodb聚合查询介绍

一，mongodb聚合介绍

db.collection.aggregate()是基于数据处理的聚合管道，每个文档通过一个由多个阶段（stage）组成的管道，可以对每个阶段的管道进行分组、过滤等功能，然后经过一系列的处理，输出相应的结果。

通过这张图，可以了解Aggregate处理的过程。

Mongodb聚合查询介绍

1、db.collection.aggregate() 可以用多个构件创建一个管道，对于一连串的文档进行处理。这些构件包括：筛选操作的match、映射操作的project、分组操作的group、排序操作的sort、限制操作的limit、和跳过操作的skip。

2、db.collection.aggregate()使用了MongoDB内置的原生操作，聚合效率非常高,支持类似于SQL Group By操作的功能，而不再需要用户编写自定义的JavaScript例程。

3、每个阶段管道限制为100MB的内存。如果一个节点管道超过这个极限,MongoDB将产生一个错误。为了能够在处理大型数据集,可以设置allowDiskUse为true来在聚合管道节点把数据写入临时文件。这样就可以解决100MB的内存的限制。

4、db.collection.aggregate()可以作用在分片集合，但结果不能输在分片集合，MapReduce可以作用在分片集合，结果也可以输在分片集合。

5、db.collection.aggregate()方法可以返回一个指针（cursor），数据放在内存中，直接操作。跟Mongo shell 一样指针操作。

6、db.collection.aggregate()输出的结果只能保存在一个文档中，BSON Document大小限制为16M。可以通过返回指针解决，版本2.6中后面：DB.collect.aggregate()方法返回一个指针，可以返回任何结果集的大小。

二，mongodb和mysql的聚合的参照

为了便于理解，先将常见的mongo的聚合操作和mysql的查询做下类比：

Mongodb聚合查询介绍

下面举了一些常用的mongo聚合例子和mysql对比，假设有一条如下的数据库记录（表名：orders）作为例子：

{
  cust_id: "abc123",
  ord_date: ISODate("2012-11-02T17:04:11.102Z"),
  status: 'A',
  price: 50,
  items: [ { sku: "xxx", qty: 25, price: 1 },
           { sku: "yyy", qty: 25, price: 1 } ]
}

1. 统计orders表所有记录

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

类似mysql:
SELECT COUNT(*) AS count   FROM orders

2.对orders表计算所有price求和

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        total: { $sum: "$price" }
     }
   }
] )

类似mysql;
SELECT SUM(price) AS total  FROM orders

3.对每一个唯一的cust_id, 计算price总和

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

类似mysql:
SELECT cust_id,
       SUM(price) AS total
FROM orders
GROUP BY cust_id

4.对每一个唯一对cust_id和ord_date分组，计算price总和，不包括日期的时间部分

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   }
] )

类似mysql：
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date

5.对于有多个记录的cust_id，返回cust_id和对应的数量

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

类似mysql:
SELECT cust_id,
       count(*)
FROM orders
GROUP BY cust_id
HAVING count(*) > 1

6.对每个唯一的cust_id和ord_date分组，计算价格总和，并只返回price总和大于250的记录，且排除日期的时间部分

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

类似mysql:
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date
HAVING total > 250

7.对每个唯一的cust_id且status=A，计算price总和

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

类似mysql:
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id

8.对每个唯一的cust_id且status=A，计算price总和并且只返回price总和大于250的记录

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

类似mysql:

SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250

9.对于每个唯一的cust_id，将与orders相关联的相应订单项order_lineitem的qty字段进行总计

db.orders.aggregate( [
   { $unwind: "$items" },
   {
     $group: {
        _id: "$cust_id",
        qty: { $sum: "$items.qty" }
     }
   }
] )

类似mysql:
SELECT cust_id,
       SUM(li.qty) as qty
FROM orders o,
     order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id

10.统计不同cust_id和ord_date分组的数量，排除日期的时间部分

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        }
     }
   },
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

类似mysql:
SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
      FROM orders
      GROUP BY cust_id, ord_date)
      as DerivedTable

Mongodb聚合查询介绍

继续阅读

windows 解压安装Mogdb

mongoDB 数据库中导入导出数据的方法

django 使用mongodb数据库 celery并发自增字段

mongodb导出导入数据库

MYSQL数据库数据导入mongoDB

windows导出导入mongodb数据库

解决MongoDB官网下载速度过慢Mongo DB官网：解决：

探花交友_第2章-完善个人信息与MongoDB入门

启动mongo warning

MongoDB学习笔记一ID自增长

303、副本集如何扩容

Mongodb4.0操作命令

Error: couldn‘t connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error

Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Erro

couldn‘t connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error conne

Ubuntu14.04 LTS下安装mongodb