
Using MongoDB mapReduce

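MongoDB's MapReduce plays roughly the same role as GROUP BY in MySQL.

To use MapReduce you implement two functions: a Map function and a Reduce function.

Both functions are passed in when you call mapReduce: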

> db.things.mapReduce(Map Function, Reduce Function, [output | option])

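The Map function is run against every document in the collection and calls emit(key, value); the emitted keys and values are handed to the Reduce function for processing.

The things collection contains the following documents: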

> db.things.find()
{ "_id" : 1, "tags" : [ "dog", "cat" ] }
{ "_id" : 2, "tags" : [ "cat" ] }
{ "_id" : 3, "tags" : [ "mouse", "cat", "dog" ] }
{ "_id" : 4, "tags" : [ ] }

Map Function

> m = function() {
...    this.tags.forEach(function(z) {
...        emit(z, {count:1});
...    });
...  }

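The m function scans the tags of each document and emits every element of tags (for example "dog" or "cat") as the key, with {count : 1} as the value. This yields pairs such as ("dog", {count : 1}) and ("cat", {count : 1}). The pairs are grouped by key and then passed to the Reduce function.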
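To make the grouping step concrete, here is a minimal sketch in plain JavaScript that imitates the map phase outside of MongoDB. The helper runMapPhase and the stand-in emit are assumptions made for illustration; the real emit is supplied by the server, and this is not how MongoDB implements it internally.

```javascript
// Sample documents copied from the things collection above.
const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] },
  { _id: 4, tags: [] },
];

// Hypothetical stand-in for the map phase: calls `map` with `this` bound to
// each document and collects every emitted value under its key.
function runMapPhase(docs, map) {
  const grouped = new Map();
  // Temporary global emit so the map function can call it as in the shell.
  globalThis.emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  return grouped;
}

// Same map function as m above.
const m = function () {
  this.tags.forEach(function (z) {
    emit(z, { count: 1 });
  });
};

const grouped = runMapPhase(things, m);
for (const [key, values] of grouped) {
  console.log(key, values.length); // prints: dog 2, cat 3, mouse 1 (one per line)
}
```

The document with an empty tags array emits nothing, which is why four input documents produce only six emitted values.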

> r = function(key, values) {
...     var total = 0;
...     for (var i = 0; i < values.length; i++)
...         total += values[i].count;
...     return {count : total};
...   };

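The r function counts the occurrences of each tag. The value returned by r must have the same format as the value passed to emit (the official documentation warns that bugs are hard to track down otherwise). r is called like this: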

r("cat", [ { count : 1 }, { count : 1 }, { count : 1} ] );
r("dog", [ { count : 1 }, { count : 1 } ] );
r("mouse", [ { count : 1 } ]);
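The format rule matters because MongoDB may call the reduce function more than once for the same key, feeding the output of an earlier call back in as one of the values. The sketch below checks this in plain JavaScript; the broken variant is a hypothetical counter-example, not something from the original post.

```javascript
// Same reduce function as r above.
const r = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// Reducing all three values for "cat" in a single call.
const direct = r("cat", [{ count: 1 }, { count: 1 }, { count: 1 }]);

// Reducing two values first, then feeding that partial result back in
// together with the remaining value (a "re-reduce").
const partial = r("cat", [{ count: 1 }, { count: 1 }]);
const rereduced = r("cat", [partial, { count: 1 }]);

console.log(direct.count, rereduced.count); // prints: 3 3

// Hypothetical broken variant that returns a bare number. Its output has no
// .count field, so a re-reduce quietly turns the total into NaN.
const broken = function (key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return total;
};
const brokenPartial = broken("cat", [{ count: 1 }, { count: 1 }]);
const brokenResult = broken("cat", [brokenPartial, { count: 1 }]);
console.log(brokenResult); // prints: NaN
```

No error is raised in the broken case, which is what makes this kind of mistake hard to debug.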

Running mapReduce()

> res = db.things.mapReduce(m, r, {out : {replace : 'things_reduce'}});
{
    "result" : "things_reduce",
    "timeMillis" : 4,
    "counts" : {
        "input" : 4,
        "emit" : 6,
        "output" : 3
    },
    "ok" : 1,
}
> db[res.result].find()
{ "_id" : "cat", "value" : { "count" : 3 } }
{ "_id" : "dog", "value" : { "count" : 2 } }
{ "_id" : "mouse", "value" : { "count" : 1 } }

The documentation describes the output option as optional, but in practice leaving out {out : {replace : 'things_reduce'}} produced an error.

db.collection.mapReduce(mapfunction,reducefunction[,options]);

The output option has the following structure:

{ out : option }

option can be any of the following:

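  • "collection name" – the mapReduce output replaces the existing contents of that collection; the collection is created if it does not exist
  • { replace : "collection name" } – same as above
  • { merge : "collection name" } – merges new and old data: new results overwrite existing entries with the same key, and keys that are not yet present are added
  • { reduce : "collection name" } – when old data exists, the new results are combined with it by applying the reduce function to the old and new values for each key (i.e. new value = reduce(old value, mapReduce value))
  • { inline : 1 } – no collection is created; the result is kept in memory and returned directly, which only works when the result is smaller than 16MB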
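The difference between replace, merge and reduce can be sketched in plain JavaScript, with each collection modelled as a Map from _id to value. The helper applyOut and the "bird" entry are hypothetical and exist only for this illustration; this mimics the behaviour described in the list, not MongoDB's actual implementation.

```javascript
// Hypothetical helper: applies one out mode to an existing output
// collection, given the fresh mapReduce results and the reduce function.
function applyOut(mode, existing, fresh, reduce) {
  if (!["replace", "merge", "reduce"].includes(mode)) {
    throw new Error("unknown out mode: " + mode);
  }
  // replace: the old contents are discarded entirely.
  if (mode === "replace") return new Map(fresh);
  const result = new Map(existing);
  for (const [key, value] of fresh) {
    // reduce: combine old and new values for keys present in both.
    // merge (or a key that is new): the fresh value is stored as-is.
    const combine = mode === "reduce" && result.has(key);
    result.set(key, combine ? reduce(key, [result.get(key), value]) : value);
  }
  return result;
}

// Same reduce function as r above.
const r = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// Made-up previous contents of the output collection.
const existing = new Map([["cat", { count: 3 }], ["bird", { count: 7 }]]);
// Made-up results of a new mapReduce run.
const fresh = new Map([["cat", { count: 3 }], ["dog", { count: 2 }]]);

const replaced = applyOut("replace", existing, fresh, r);
const merged = applyOut("merge", existing, fresh, r);
const reduced = applyOut("reduce", existing, fresh, r);

console.log([...replaced.keys()]); // prints: [ 'cat', 'dog' ]
console.log(merged.get("cat").count, merged.get("bird").count); // prints: 3 7
console.log(reduced.get("cat").count); // prints: 6
```

Only replace drops the "bird" entry, and only reduce changes the count for "cat".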

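A bare collection name cannot be combined with other options, but the other forms can, for example:

{ out : { replace : "collection name", db : "db name" } }

PS: I am not sure which other options exist. I did not find them in the documentation and will add them later.

In every case except { inline : 1 }, a collection with the given collection name is created: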

> show collections
system.indexes
things
things_reduce
> db.things_reduce.find()
{ "_id" : "cat", "value" : { "count" : 3 } }
{ "_id" : "dog", "value" : { "count" : 2 } }
{ "_id" : "mouse", "value" : { "count" : 1 } }

Another example:

> db.foo.find()
{ "_id" : ObjectId("4da54867beb0fbf627f15179"), "username" : "jones", "likes" : 20, "text" : "Hello world!" }
{ "_id" : ObjectId("4da560e0beb0fbf627f1517a"), "username" : "jones", "likes" : 5, "text" : "Hello world aaaaaaaaaa!" }
{ "_id" : ObjectId("4da560fdbeb0fbf627f1517b"), "username" : "chy", "likes" : 15, "text" : "Hello world bbbbbbbbbb!" }
> m
function () {
    emit(this.username, {count:1, likes:this.likes});
}
> f
function (key, values) {
    var result = {count:0, likes:0};
    values.forEach(function (value) {result.count += value.count;result.likes += value.likes;});
    return result;
}
> res = db.foo.mapReduce(m, f, {out: {replace: "test_result"}});
{
    "result" : "test_result",
    "timeMillis" : 4,
    "counts" : {
        "input" : 3,
        "emit" : 3,
        "output" : 2
    },
    "ok" : 1,
}
> db.test_result.find()
{ "_id" : "chy", "value" : { "count" : 1, "likes" : 15 } }
{ "_id" : "jones", "value" : { "count" : 2, "likes" : 25 } }
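The whole run above can be reproduced without a database as a rough cross-check. The sketch below is an in-memory imitation in plain JavaScript; simulateMapReduce is a hypothetical helper, the ObjectId values are omitted, and sorting by _id is only done to mirror the order of the shell output.

```javascript
// The three documents from db.foo, without their ObjectId values.
const foo = [
  { username: "jones", likes: 20, text: "Hello world!" },
  { username: "jones", likes: 5, text: "Hello world aaaaaaaaaa!" },
  { username: "chy", likes: 15, text: "Hello world bbbbbbbbbb!" },
];

// Same map and reduce functions as m and f above.
const m = function () {
  emit(this.username, { count: 1, likes: this.likes });
};
const f = function (key, values) {
  const result = { count: 0, likes: 0 };
  values.forEach(function (value) {
    result.count += value.count;
    result.likes += value.likes;
  });
  return result;
};

// Hypothetical in-memory imitation of mapReduce: map, group by key, reduce.
function simulateMapReduce(docs, map, reduce) {
  const grouped = new Map();
  globalThis.emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of grouped) {
    // MongoDB does not call reduce for a key that has only one value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const results = simulateMapReduce(foo, m, f);
for (const row of results) {
  console.log(row._id, row.value.count, row.value.likes);
}
// prints: chy 1 15
//         jones 2 25
```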

If you change {out: {replace: "test_result"}} to {out: {reduce: "test_result"}}, the totals grow every time res = db.foo.mapReduce(m, f, {out: {reduce: "test_result"}}); is run, for example:

> db.test_result.find()
{ "_id" : "jones", "value" : { "count" : 5, "likes" : 70 } }
{ "_id" : "chy", "value" : { "count" : 2, "likes" : 30 } }
