
IX MongoDB

NoSQL: Not Only SQL;

Big data storage problems:

parallel databases (horizontal partitioning; partitioned queries);

NoSQL database management systems:

non-relational model;

distributed;

do not follow the ACID database design paradigm;

simple data model;

metadata separated from data;

weak consistency;

high throughput;

high horizontal scalability on clusters of low-end hardware;

representatives (Clustrix; GenieDB; ScaleBase; NimbusDB; Drizzle);

cloud data management (DBaaS);

analytics over big data (MapReduce);

CAP:


ACID vs BASE:

ACID          BASE
atomicity     basically available
consistency   soft state
isolation     eventually consistent
durability

Note:

Subtypes of eventual consistency:

causal consistency;

read-your-writes consistency;

session consistency;

monotonic read consistency;

timeline consistency;


ACID (strong consistency; isolation; a pessimistic, conservative approach; hard to evolve);

BASE (weak consistency; availability first; an optimistic approach; adapts to change; simpler and faster);

Techniques for implementing data consistency:

NRW strategy (

N: total number of replicas;

R: number of replicas that must acknowledge a read;

W: number of replicas that must acknowledge a write;

R+W > N (strong consistency);

R+W < N (weak consistency);

);
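Worked example: with N=3 replicas, choosing R=2 and W=2 gives R+W=4 > 3, so every read quorum overlaps every write quorum in at least one replica. A minimal sketch of the rule in JavaScript (illustrative only, not a MongoDB API):

function isStronglyConsistent(n, r, w) {
    // read and write quorums overlap in at least one replica iff R + W > N
    return r + w > n;
}
isStronglyConsistent(3, 2, 2);   // true:  every read sees the latest write
isStronglyConsistent(3, 1, 1);   // false: a read may miss the latest write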

2PC;

Paxos;

Vector clock;

http://www.nosql-database.org/

Data storage models (the NoSQL families are divided along these models):

Column-family model (

use case: distributed data stores on top of a distributed FS that support random reads/writes;

typical products: HBase, Hypertable, Cassandra;

data model: column-centric storage; values of the same column are stored together;

advantages: fast queries, high scalability, easy to scale out in a distributed fashion);

Document model (

use case: web applications without strong transactional requirements;

typical products: MongoDB, ElasticSearch, CouchDB, CouchDB Server;

data model: key-value, stored as documents;

advantages: the data model need not be defined in advance);

Key-value model (

use case: content caching; high-load scenarios with massive parallel data access;

typical products: Redis, DynamoDB (Amazon), Azure Table Storage, Riak;

data model: key-value built on hash tables;

advantages: very fast lookups);

Graph model (

use case: social networks, recommendation systems, relationship graphs;

typical products: Neo4j, InfiniteGraph;

data model: graph structures;

advantages: well suited to graph computation);

Introduction to MongoDB:

MongoDB (from "humongous") is a scalable, high-performance, open source, schema-free, document-oriented NoSQL database;


what is MongoDB?

humongous (huge + monstrous);

document database (document-oriented database: uses JSON, actually BSON);

schema-free;

C++;

open source;

GNU AGPL V3.0 License;

OS X, Linux, Windows, Solaris | 32-bit, 64-bit;

developed and supported by 10gen; first released in February 2009;

NoSQL;

performance (written in C++; full index support; no transactions, but atomic operations are supported; memory-mapped files (delayed writes));

scalable (replication, auto-sharding);

commercially supported (10gen); lots of documentation;

document-based queries:

flexible document queries expressed in JSON/JavaScript;

Map/Reduce (flexible aggregation and data processing; queries run in parallel on all shards);

GridFS (store files of any size easily);

geospatial indexing (find objects based on location, i.e. find the closest n items to x);

many production deployments;

features:

collection-oriented storage: easy storage of object/JSON-style data;

dynamic queries;

full index support, including on inner objects and embedded arrays;

query profiling;

replication and fail-over support;

efficient storage of binary data including large objects (e.g. photos and videos);

auto-sharding for cloud-level scalability (currently in alpha);

great for (websites; caching; high-volume, low-value data; high scalability; storage of program objects and JSON);

not as great for (highly transactional workloads; ad-hoc business intelligence; problems requiring SQL);

JSON (JavaScript Object Notation): a collection of name/value pairs; an ordered list of values;

DDL;

DML;

https://www.mongodb.com/

http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/RPMS/

mongodb-org-mongos-2.6.4-1.x86_64.rpm

mongodb-org-server-2.6.4-1.x86_64.rpm

mongodb-org-shell-2.6.4-1.x86_64.rpm

mongodb-org-tools-2.6.4-1.x86_64.rpm

]# yum -y install mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm

]# rpm -qi mongodb-org-server   #(This package contains the MongoDB server software, default configuration files, and init.d scripts.)

<!--Description :

MongoDB is built for scalability, performance and high availability, scaling from single server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB's native replication and automated failover enable enterprise-grade reliability and operational flexibility.

MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.

MongoDB has a rich client ecosystem including hadoop integration, officially supported drivers for 10 programming languages and environments, as well as 40 drivers supported by the user community.

MongoDB features:

* JSON Data Model with Dynamic Schemas

* Auto-Sharding for Horizontal Scalability

* Built-In Replication for High Availability

* Rich Secondary Indexes, including geospatial

* TTL indexes

* Text Search

* Aggregation Framework & Native MapReduce

-->

]# rpm -qi mongodb-org-shell   #(This package contains the mongo shell.)

]# rpm -qi mongodb-org-tools   #(This package contains standard utilities for interacting with MongoDB.)

]# mkdir -pv /mongodb/data

mkdir: created directory `/mongodb'

mkdir: created directory `/mongodb/data'

]# id mongod

uid=497(mongod) gid=497(mongod) groups=497(mongod)

]# chown -R mongod.mongod /mongodb/

]# vim /etc/mongod.conf

#dbpath=/var/lib/mongo

dbpath=/mongodb/data

bind_ip=172.168.101.239

#httpinterface=true

httpinterface=true   #28017

]# /etc/init.d/mongod start

Starting mongod: /usr/bin/dirname: extra operand `2>&1.pid'

Try `/usr/bin/dirname --help' for more information.

                                                          [  OK  ]

]# vim /etc/init.d/mongod   #(fix the 2>&1.pid problem)

 #daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null 2>&1"

 daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null"

]# /etc/init.d/mongod restart

Stopping mongod:                                          [  OK  ]

Starting mongod:                                          [  OK  ]

]# ss -tnl | egrep "27017|28017"

LISTEN    0      128               127.0.0.1:27017                    *:*    

LISTEN    0      128               127.0.0.1:28017                    *:*  

]# du -sh /mongodb/data

3.1G          /mongodb/data

]# ll -h !$

ll -h /mongodb/data

total 81M

drwxr-xr-x 2 mongod mongod 4.0K Jun 29 16:23 journal

-rw------- 1 mongod mongod  64M Jun 29 16:23 local.0

-rw------- 1 mongod mongod  16M Jun 29 16:23 local.ns

-rwxr-xr-x 1 mongod mongod    6 Jun 29 16:23 mongod.lock


]# mongo --host 172.168.101.239   #(defaults to the test database)

MongoDB shell version: 2.6.4

connecting to: 172.168.101.239:27017/test

Welcome to the MongoDB shell.

……

> show dbs

admin (empty)

local 0.078GB

> show collections

> show users

> show logs

global

> help

         db.help()                    help on db methods

         db.mycoll.help()             help on collection methods

         sh.help()                    sharding helpers

         rs.help()                    replica set helpers

         ……

> db.help()

> db.stats()   #(database status)

{

         "db": "test",

         "collections": 0,

         "objects": 0,

         "avgObjSize": 0,

         "dataSize": 0,

         "storageSize": 0,

         "numExtents": 0,

         "indexes": 0,

         "indexSize": 0,

         "fileSize": 0,

         "dataFileVersion": {

         },

         "ok": 1

}

> db.version()

2.6.4

> db.serverStatus()   #(MongoDB server status)

> db.getCollectionNames()

[ ]

> db.mycoll.help()

> use testdb

switched to db testdb

test  (empty)

> db.stats()

         "db": "testdb",

> db.students.insert({name: "tom",age: 22})

WriteResult({ "nInserted" : 1 })

students

system.indexes

admin  (empty)

local  0.078GB

test   (empty)

testdb 0.078GB

> db.students.stats()

         "ns": "testdb.students",

         "count": 1,

         "size": 112,

         "avgObjSize": 112,

         "storageSize": 8192,

         "numExtents": 1,

         "nindexes": 1,

         "lastExtentSize": 8192,

         "paddingFactor": 1,

         "systemFlags": 1,

         "userFlags": 1,

         "totalIndexSize": 8176,

         "indexSizes": {

                   "_id_": 8176

[ "students","system.indexes" ]

> db.students.insert({name: "jerry",age: 40,gender: "M"})

> db.students.insert({name: "ouyang",age: 80,course: "hamagong"})

> db.students.insert({name: "yangguo",age: 20,course: "meivnquan"})

> db.students.insert({name: "guojing",age: 30,course: "xianglong"})

> db.students.find()

{ "_id" : ObjectId("5955b505853f9091c4bf4056"), "name" : "tom", "age" : 22 }

{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }

{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }

{ "_id" : ObjectId("595b384f853f9091c4bf4059"), "name" : "yangguo", "age" : 20, "course" : "meivnquan" }

{ "_id" : ObjectId("595b3873853f9091c4bf405a"), "name" : "guojing", "age" : 30, "course" : "xianglong" }

> db.students.count()

5

mongodb documents:

mongodb stores data in the form of documents, which are JSON-like field and value pairs;

documents are analogous to structures in programming languages that associate keys with values, where keys may hold other pairs of keys and values (e.g. dictionaries, hashes, maps, and associative arrays);

formally, mongodb documents are BSON documents, which is a binary representation of JSON with additional type information;

field: value;

document:

stored in collections; think record or row;

can have an _id key that works like the primary key in MySQL;

two options for relationships: subdocument or db reference;
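For example, a plausible document with an embedded subdocument (values invented for illustration):

{
    "_id" : ObjectId("..."),
    "name" : "tom",
    "age" : 22,
    "address" : { "city" : "ShangHai", "street" : "#85 LeAi Road" }
}

Here address is an embedded subdocument; the alternative is a db reference pointing at a document in another collection.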

mongodb collections:

mongodb stores all documents in collections;

a collection is a group of related documents that share a set of common indexes;

collections are analogous to tables in relational databases;

collection:

think table, but with no schema;

for grouping into smaller query sets (speed);

each top entity in your app would have its own collection (users, articles, etc.);

database operations: query

in mongodb a query targets a specific collection of documents;

queries specify criteria, or conditions, that identify the documents that mongodb returns to the clients;

a query may include a projection that specifies the fields from the matching documents to return;

you can optionally modify queries to impose limits, skips, and sort orders;

Example:

db.users.find({age: {$gt: 18}}).sort({age: 1})

db.COLLECTION.find({FIELD: {QUERY CRITERIA}}).MODIFIER

query interface:

for query operations, mongodb provides the db.collection.find() method;

the method accepts both the query criteria and projections and returns a cursor to the matching documents;

query behavior:

all queries in mongodb address a single collection;

you can modify the query to impose limits, skips, and sort orders;

the order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort();

operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update;

in the aggregation pipeline, the $match pipeline stage provides access to mongodb queries;

data modification:

data modification refers to operations that create, update, or delete data;

in mongodb, these operations modify the data of a single collection;

all write operations in mongodb are atomic on the level of a single document;

for the update and delete operations, you can specify the criteria to select the documents to update or remove;

RDBMS            MongoDB
table, view      collection
row              JSON document
index            index
join             embedded document
partition        shard
partition key    shard key

db.collection.find(<query>, <projection>)

Similar to a SELECT statement in SQL: <query> corresponds to the WHERE clause, and <projection> to the fields to select;

If find() is called without <query>, all documents of the collection are returned;

db.collection.count() returns the number of documents in the given collection;
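For instance, a quick sketch against the students collection used above (a projection keeping only name and age, then a count):

> db.students.find({age: {$gt: 25}}, {name: 1, age: 1, _id: 0})

> db.students.find().count()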

Example:

> use testdb

> for (i=1;i<=100;i++)

{ db.testColl.insert({hostname: "www"+i+".magedu.com"})}

> db.testColl.find({hostname: {$gt: "www95.magedu.com"}})

Advanced uses of find:

MongoDB queries support several classes of selection mechanisms: comparison, logical, element, JavaScript, etc.;

Comparison operators:

$gt; $gte; $lt; $lte; $ne. Syntax: {field: {$gt: VALUE}}

> db.students.find({age: {$gt: 30}})

$in; $nin. Syntax: {field: {$in: [VALUE1,VALUE2]}}

> db.students.find({age: {$in: [20,80]}})

> db.students.find({age: {$in: [20,80]}}).limit(1)

> db.students.find({age: {$in: [20,80]}}).count()

2

> db.students.find({age: {$in: [20,80]}}).skip(1)

Compound conditions:

Logical operators: $or (OR); $and (AND); $not (NOT); $nor (NOR). Syntax: {$or: [{<expression1>},{<expression2>}]}

> db.students.find({$or: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})

{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }

> db.students.find({$and: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})

Element queries:

Queries based on whether a given field exists in the document: $exists; $mod; $type;

$exists selects documents by the presence of the given field. Syntax: {field: {$exists: <boolean>}}; true returns documents that contain the field, false returns documents that do not

$mod performs a modulo operation on the field's value and returns documents whose remainder equals the given value. Syntax: {field: {$mod: [divisor, remainder]}}

$type returns documents whose field value is of the given BSON type. Syntax: {field: {$type: <BSON type>}}

Note: double, string, object, array, binary data, undefined, boolean, date, null, regular expression, javascript, timestamp
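A few illustrative queries against the students collection (all three operators are standard MongoDB syntax):

> db.students.find({gender: {$exists: true}})   #(documents that have a gender field)

> db.students.find({age: {$mod: [10, 0]}})   #(age divisible by 10)

> db.students.find({name: {$type: 2}})   #(name is a BSON string; type 2 = string)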

Update operations:

The update() method modifies data in a collection. By default update() changes only a single document; to update all documents matching the criteria at once, use the multi option;

Syntax: db.collection.update(<query>, <update>, <options>), where <query> plays the role of WHERE in a SQL statement and <update> acts like SET with an implicit LIMIT 1; if the multi option is passed in <options>, the statement behaves like an UPDATE without a LIMIT clause;

The <update> argument has a special format: it may only contain expressions built from update-specific operators, which fall roughly into field, array, and bitwise classes. Commonly used field operators:

$set: set a field to the newly specified value. Syntax: ({field: value}, {$set: {field: new_value}})

$unset: remove the given fields. Syntax: ({field: value}, {$unset: {field1: "", field2: "", ...}})

$rename: rename fields. Syntax: {$rename: {oldname: newname, ...}}

$inc: increase the value of a field. Syntax: ({field: value}, {$inc: {field1: amount}}), where {field: value} is the selection criteria and {$inc: {field1: amount}} names the field to increment and by how much
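For example, illustrative updates on the students collection (standard 2.6 shell syntax):

> db.students.update({name: "tom"}, {$set: {age: 23}})   #($set one field)

> db.students.update({name: "tom"}, {$unset: {gender: ""}})   #($unset drops the field)

> db.students.update({name: "tom"}, {$inc: {age: 1}})   #($inc increments)

> db.students.update({}, {$inc: {age: 1}}, {multi: true})   #(multi updates all matches)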

Delete operations:

> db.students.remove({name: "tom"})   #(db.mycoll.remove(query))

WriteResult({ "nRemoved" : 1 })

> db.collections

testdb.collections

> db.students.drop()   #(db.mycoll.drop(), drops the collection)

true

> db.dropDatabase()   #(drops the database)

{ "dropped" : "testdb", "ok" : 1 }

> quit()

Indexes:

Index types:

B+Tree;

hash;

spatial indexes;

full-text indexes;

indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form: the index stores the value of a specific field or set of fields, ordered by the value of the field;

mongodb defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a mongodb collection;

MongoDB index types (example calls follow this list):

single field indexes:

a single field index only includes data from a single field of the documents in a collection;

mongodb supports single field indexes on fields at the top level of a document and on fields in sub-documents;

compound indexes:

a compound index includes more than one field of the documents in a collection;

multikey indexes:

a multikey index references an array and records a match if a query includes any value in the array;

geospatial indexes and queries:

geospatial indexes support location-based searches on data that is stored as either GeoJSON objects or legacy coordinate pairs;

text indexes:

text indexes support search of string content in documents;

hashed indexes:

hashed indexes maintain entries with hashes of the values of the indexed field;
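Illustrative ensureIndex() calls for each type, in 2.6 shell syntax (the places, articles and users collections are made up for the example):

> db.students.ensureIndex({name: 1})   #(single field)

> db.students.ensureIndex({name: 1, age: -1})   #(compound)

> db.places.ensureIndex({loc: "2dsphere"})   #(geospatial, GeoJSON data)

> db.articles.ensureIndex({content: "text"})   #(text)

> db.users.ensureIndex({username: "hashed"})   #(hashed)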

index creation:

mongodb provides several options that only affect the creation of the index;

specify these options in a document as the second argument to the db.collection.ensureIndex() method:

unique index: db.addresses.ensureIndex({"user_id": 1}, {unique: true})

sparse index: db.addresses.ensureIndex({"xmpp_id": 1}, {sparse: true})

MongoDB index-related methods:

db.mycoll.ensureIndex(field[,options])   #create an index; options include name, unique, dropDups, sparse (sparse index)

db.mycoll.dropIndex(INDEX_NAME)   #drop an index

db.mycoll.dropIndexes()

db.mycoll.getIndexes()

db.mycoll.reIndex()   #rebuild indexes

> show dbs;

> use testdb;

> for(i=1;i<=1000;i++){db.students.insert({name: "student"+i,age: (i%120),address: "#85 LeAi Road,ShangHai,China"})}

1000

> db.students.find()   #(shows 20 rows at a time; type "it" to show the next 20)

Type "it" for more

> db.students.ensureIndex({name: 1})

         "createdCollectionAutomatically": false,

         "numIndexesBefore": 1,

         "numIndexesAfter": 2,

> db.students.getIndexes()   #(indexes of the current collection; the new index is named name_1)

[
         {
                   "v" : 1,
                   "key" : {
                            "_id" : 1
                   },
                   "name" : "_id_",
                   "ns" : "test.students"
         },
         {
                   "v" : 1,
                   "key" : {
                            "name" : 1
                   },
                   "name" : "name_1",
                   "ns" : "test.students"
         }
]

> db.students.dropIndex("name_1")   #(drop an index; the index name must be quoted)

{ "nIndexesWas" : 2, "ok" : 1 }

> db.students.ensureIndex({name: 1},{unique: true})   #(unique index)

> db.students.getIndexes()

                   "unique" : true,

> db.students.insert({name: "student20",age: 20})   #(with a unique index, inserting a duplicate value for the indexed field raises an error)

WriteResult({

         "nInserted": 0,

         "writeError": {

                   "code": 11000,

                   "errmsg": "insertDocument :: caused by :: 11000 E11000 duplicate key error index:test.students.$name_1  dup key: { :\"student20\" }"

})

> db.students.find({name: "student500"})

{ "_id" : ObjectId("595c3a9bcf42a9afdbede7c0"), "name" : "student500", "age" : 20, "address" : "#85 LeAi Road,ShangHai,China" }

> db.students.find({name: "student500"}).explain()   #(show the execution plan)

         "cursor": "BtreeCursor name_1",

         "isMultiKey": false,

         "n": 1,

         "nscannedObjects": 1,

         "nscanned": 1,

         "nscannedObjectsAllPlans": 1,

         "nscannedAllPlans": 1,

         "scanAndOrder": false,

         "indexOnly": false,

         "nYields": 0,

         "nChunkSkips": 0,

         "millis": 0,

         "indexBounds": {

                   "name": [

                            [

                                     "student500",

                                     "student500"

                            ]

                   ]

         "server": "ebill-redis.novalocal:27017",

         "filterSet": false

> db.students.find({name: {$gt: "student500"}}).explain()

         "n": 552,

         "nscannedObjects": 552,

         "nscanned": 552,

         "nscannedObjectsAllPlans": 552,

         "nscannedAllPlans": 552,

         "nYields": 4,

         "millis": 1,

                                     {

                                     }

]# mongod -h   #(sections: General options; Replication options; Master/slave options; Replica set options; Sharding options)

 --bind_ip arg               comma separated list of ip addresses to listen on

                              - all local ips by default

 --port arg                  specify port number - 27017 by default

 --maxConns arg              max number of simultaneous connections - 1000000

                              by default; maximum number of concurrent connections

 --logpath arg               log file to send writes to instead of stdout - has

                              to be a file, not directory

 --httpinterface             enable http interface; a web page for monitoring mongodb statistics

 --fork                      fork server process; run mongod in the background

 --auth                      run with security; access then requires authentication: >db.createUser(userDocument), >db.auth(user,password)

 --repair                    run repair on all dbs; used on restart after an unclean shutdown

 --journal                   enable journaling: writes go sequentially to the journal first and are synced to the data files periodically, turning random IO into sequential IO for better performance; always enable this on a single-instance mongodb, as this is what makes data durable

 --journalOptions arg        journal diagnostic options

 --journalCommitInterval arg how often to group/batch commit (ms)

Debug options:

 --cpu                       periodically show cpu and iowait utilization

 --sysinfo                   print some diagnostic system information

 --slowms arg (=100)         value of slow for profile and console log; queries above this are logged as slow queries

 --profile arg               0=off 1=slow, 2=all; performance profiling

Replication options:

 --oplogSize arg       size to use (in MB) for replication op log. default is

                        5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):

 --master              master mode

 --slave               slave mode

 --source arg          when slave: specify master as <server:port>

 --only arg            when slave: specify a single database to replicate

 --slavedelay arg      specify delay (in seconds) to be used when applying

                        master ops to slave

 --autoresync          automatically resync if slave data is stale

Replica set options:

 --replSet arg           arg is <setname>[/<optionalseedhostlist>]

 --replIndexPrefetch arg specify index prefetching behavior (if secondary)

                          [none|_id_only|all]

Sharding options:

 --configsvr           declare this is a config db of a cluster; default port

                        27019; default dir /data/configdb

 --shardsvr            declare this is a shard db of a cluster; default port

                        27018

MongoDB replication:

Two types (master/slave; replica sets):

master/slave: similar to MySQL; rarely used any more;

replica sets:

usually an odd number of nodes (at least 3); an arbiter can take part in elections;

heartbeat every 2s; automatic failover through elections;

A replica set provides automatic failover and makes more advanced deployments easy. It is a group of mongodb instances serving the same data set; a replica set has exactly one primary (read-write) while all the others are secondaries (read-only); the primary records data modifications in its oplog;

Arbiter: a set usually has at least 3 nodes, one primary and two secondaries; the secondaries exchange a heartbeat with the primary every 2s, and if no primary heartbeat is detected within 10s, a new primary is automatically elected from the two secondaries;

Node types in a replica set:

priority-0 nodes: never elected primary, but can take part in the election process; cold standby, readable by clients;

hidden secondaries: priority-0 secondaries that are additionally invisible to clients;

delayed secondaries: priority-0 secondaries whose replication lags the primary by a fixed interval;

arbiter nodes;

MongoDB replication architecture:

oplog: a fixed-size file stored in the local database. Within a replica set only the primary writes to its oplog; a secondary's oplog is not used unless that node is elected primary, at which point it starts writing its own. Unlike MySQL's binary log, which grows without bound, the oplog has a fixed size, carved out at the first mongodb startup (oplog.rs), typically 5% of disk space; if 5% comes to less than 1G, at least 1G is allocated;

A secondary applies the primary's operations in the following modes:

initial sync;

post-rollback catch-up;

sharding chunk migrations;

The local database holds all of the replica set's metadata and the oplog; local itself never takes part in replication, while every other database is replicated to the secondaries. The oplog lives in a collection named oplog.rs, created automatically the first time an instance starts as a member of a replica set; its size depends on the OS and FS and can be customized with the --oplogSize option;

test   0.078GB

testdb (empty)

> use local

switched to db local

startup_log
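Continuing in the local database, the oplog itself can be inspected (illustrative; oplog.rs is the real collection name, output omitted):

> db.oplog.rs.find().sort({$natural: -1}).limit(1)   #(most recent oplog entry)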

MongoDB data synchronization types:

initial sync: for a node with no data at all, or one that has lost its replication history. Steps: clone all databases; apply all changes to the data set (copy the oplog and apply it locally); build indexes on all collections;

replication: whatever is written on the primary is synced to the secondaries;

consistency (only the primary is writable);

--journal guarantees durability; always enable it on a single-instance mongodb;

replication is multi-threaded, but a single collection is replicated by a single thread (still single-threaded within one namespace);

index prefetching (data prefetching) can be used to improve performance;


purpose of replication:

replication provides redundancy and increases data availability;

with multiple copies of data on different database servers, replication protects a database from the loss of a single server;

replication also allows you to recover from hardware failure and service interruptions;

with additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup;

replication in mongodb:

a replica set is a group of mongod instances that host the same data set:

one mongod, the primary, receives all write operations;

all other instances, secondaries, apply operations from the primary so that they have the same data set;

the primary accepts all write operations from clients; a replica set can have only one primary:

because only one member can accept write operations, replica sets provide strict consistency;

to support replication, the primary logs all changes to its data sets in its oplog;

the secondaries replicate the primary's oplog and apply the operations to their data sets;

secondaries' data sets reflect the primary's data set;

if the primary is unavailable, the replica set will elect a secondary to be primary;

by default, clients read from the primary; however, clients can specify a read preference to send read operations to secondaries;

arbiter:

you may add an extra mongod instance to a replica set as an arbiter;

arbiters do not maintain a data set; arbiters only exist to vote in elections;

if your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary;

arbiters do not require dedicated hardware;
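For example, adding an arbiter from the shell (rs.addArb is the real helper; the host here is made up):

> rs.addArb("172.168.101.230:27017")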

automatic failover:

when a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary;

the first secondary that receives a majority of the votes becomes primary;

priority 0 replica set members:

a priority 0 member is a secondary that cannot become primary;

priority 0 members cannot trigger elections:

otherwise these members function as normal secondaries;

a priority 0 member maintains a copy of the data set, accepts read operations, and votes in elections;

configure a priority 0 member to prevent secondaries from becoming primary, which is particularly useful in multi-datacenter deployments;

in a three-member replica set, one data center hosts the primary and a secondary;

a second data center hosts one priority 0 member that cannot become primary;
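A sketch of demoting a member to priority 0 (the member index is assumed; the pattern mirrors the reconfiguration used later in this walkthrough):

> cfg = rs.conf()

> cfg.members[2].priority = 0

> rs.reconfig(cfg)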

Factors that influence replica set re-elections:

heartbeat;

priority (default 1; priority-0 nodes cannot trigger an election);

optime;

network connectivity;

network partitions;

Election mechanism:

Events that trigger an election:

a new replica set is initialized;

a secondary cannot reach the primary;

the primary steps down (becomes unavailable): >rs.stepDown(); a secondary has a higher priority and meets all other conditions for becoming primary; the primary cannot reach the majority of the set;

Walkthrough:

172.168.101.239   #master

172.168.101.233   #secondary

172.168.101.221   #secondary

]# vim /etc/mongod.conf   #(use this config on all three nodes)

fork=true

# in replicated mongo databases, specifythe replica set name here

#replSet=setname

replSet=testSet   #(replica set name, i.e. the cluster name)

replIndexPrefetch=_id_only   #(specify index prefetching behavior (if secondary) [none|_id_only|all]; index prefetching makes replication more efficient; only effective on a secondary)

]# mongo --host 172.168.101.239   #(on the primary)

test  0.078GB

> rs.status()

         "startupStatus": 3,

         "info": "run rs.initiate(...) if not yet done for the set",

         "ok": 0,

         "errmsg": "can't get local.system.replset config from self or any seed(EMPTYCONFIG)"

> rs.initiate()

         "info2" : "no configuration explicitly specified -- making one",

         "me" : "172.168.101.239:27017",

         "info" : "Config now saved locally.  Should come online in about a minute.",

testSet:PRIMARY> rs.add("172.168.101.233:27017")

{ "ok" : 1 }

testSet:PRIMARY> rs.add("172.168.101.221:27017")   #(alternatively, rs.addArb(hostportstr) adds an arbiter node)

testSet:PRIMARY> rs.status()

         "set": "testSet",

         "date": ISODate("2017-07-05T07:22:59Z"),

         "myState": 1,

         "members": [

                   {

                            "_id": 0,

                            "name": "172.168.101.239:27017",

                            "health": 1,

                            "state": 1,

                            "stateStr": "PRIMARY",

                            "uptime": 970,

                            "optime": Timestamp(1499238843, 1),

                            "optimeDate": ISODate("2017-07-05T07:14:03Z"),

                            "electionTime": Timestamp(1499238673, 2),

                            "electionDate": ISODate("2017-07-05T07:11:13Z"),

                            "self": true

                            "_id": 1,

                            "name": "172.168.101.233:27017",

                            "state": 2,

                            "stateStr": "SECONDARY",

                            "uptime": 557,

                            "lastHeartbeat": ISODate("2017-07-05T07:22:57Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:22:57Z"),

                            "pingMs": 1,

                            "syncingTo": "172.168.101.239:27017"

                            "_id": 2,

                            "name": "172.168.101.221:27017",

                            "uptime": 536,

                            "pingMs": 0,

                   }

         ],

testSet:PRIMARY> use testdb

testSet:PRIMARY> db.students.insert({name: "tom",age: 22,course: "hehe"})

]# mongo --host 172.168.101.233   #(on a secondary)

testSet:SECONDARY> rs.slaveOk()   #(queries are allowed only after running this)

testSet:SECONDARY> rs.status()

         "date": ISODate("2017-07-05T07:24:54Z"),

         "myState": 2,

         "syncingTo": "172.168.101.239:27017",

                            "uptime": 669,

                            "lastHeartbeat": ISODate("2017-07-05T07:24:53Z"),

                            "lastHeartbeatRecv" :ISODate("2017-07-05T07:24:53Z"),

                            "electionDate": ISODate("2017-07-05T07:11:13Z")

                            "uptime": 1292,

                            "uptime": 649,

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:24:53Z"),

                            "pingMs": 5,

testSet:SECONDARY> use testdb

testSet:SECONDARY> db.students.find()

{ "_id" : ObjectId("595c95095800d50614819643"), "name" : "tom", "age" : 22, "course" : "hehe" }

]# mongo --host 172.168.101.221   #(on the other secondary)

testSet:SECONDARY> db.students.find()   #(without running rs.slaveOk() first)

error: { "$err" : "not master and slaveOk=false", "code" : 13435 }

testSet:SECONDARY> rs.slaveOk()

testSet:SECONDARY> db.students.insert({name: jowin,age: 30,course: "mage"})   #(writes are only possible on the primary; note the unquoted value, which triggers the shell error below)

2017-07-05T15:37:05.069+0800 ReferenceError: jowin is not defined

testSet:PRIMARY> rs.stepDown()   #(on the primary; steps down as primary (momentarily) (disconnects); simulates taking the primary offline)

2017-07-05T15:38:46.819+0800 DBClientCursor::init call() failed

2017-07-05T15:38:46.820+0800 Error: error doing query: failed at src/mongo/shell/query.js:81

2017-07-05T15:38:46.821+0800 trying reconnect to 172.168.101.239:27017 (172.168.101.239) failed

2017-07-05T15:38:46.822+0800 reconnect 172.168.101.239:27017 (172.168.101.239) ok

         "date": ISODate("2017-07-05T07:39:46Z"),

         "syncingTo": "172.168.101.221:27017",

                            "stateStr" :"SECONDARY",

                            "uptime": 1977,

                            "optime": Timestamp(1499239689, 1),

                            "optimeDate": ISODate("2017-07-05T07:28:09Z"),

                            "infoMessage": "syncing to: 172.168.101.221:27017",

                            "uptime": 1564,

                            "lastHeartbeat": ISODate("2017-07-05T07:39:45Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:39:45Z"),

                            "lastHeartbeatMessage": "syncing to: 172.168.101.239:27017",

                            "stateStr" :"PRIMARY",

                            "uptime": 1543,

                            "lastHeartbeat": ISODate("2017-07-05T07:39:46Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:39:46Z"),

                            "electionTime": Timestamp(1499240328, 1),

                            "electionDate": ISODate("2017-07-05T07:38:48Z")

testSet:SECONDARY>    #(on the newly elected primary)

testSet:PRIMARY>

testSet:PRIMARY> db.printReplicationInfo()   #(check oplog size and time range)

configured oplog size:   1180.919921875MB

log length start to end: 846secs (0.24hrs)

oplog first event time:  Wed Jul 05 2017 15:14:03 GMT+0800 (CST)

oplog last event time:   Wed Jul 05 2017 15:28:09 GMT+0800 (CST)

now:                     Wed Jul 05 2017 15:42:57 GMT+0800 (CST)

testSet:SECONDARY> quit()   #(stop the service on 172.168.101.239; when the current primary cannot reach it within 10s, it marks that member's health as 0)

[root@ebill-redis ~]# /etc/init.d/mongod stop

         "date": ISODate("2017-07-05T07:51:58Z"),

                            "health" : 0,

                            "state": 8,

                            "stateStr": "(not reachable/healthy)",

                            "uptime": 0,

                            "lastHeartbeat": ISODate("2017-07-05T07:51:57Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:44:24Z"),

                            "syncingTo": "172.168.101.221:27017"

testSet:PRIMARY> cfg=rs.conf()   #(raise the priority of secondary 172.168.101.233 to 2 so that it becomes primary when an election is triggered; takes effect immediately)

         "_id": "testSet",

         "version": 3,

                            "host": "172.168.101.239:27017"

                            "host": "172.168.101.233:27017"

                            "host": "172.168.101.221:27017"

         ]

testSet:PRIMARY> cfg.members[1].priority=2

testSet:PRIMARY> rs.reconfig(cfg)   #(in this test the step turned out to be unnecessary: after running >cfg.members[1].priority=2, the node with the highest priority became primary directly)

2017-07-05T16:04:10.101+0800 DBClientCursor::init call() failed

2017-07-05T16:04:10.103+0800 trying reconnect to 172.168.101.221:27017 (172.168.101.221) failed

2017-07-05T16:04:10.104+0800 reconnect 172.168.101.221:27017 (172.168.101.221) ok

reconnected to server after rs command (which is normal)

testSet:PRIMARY> cfg=rs.conf()   #(adding an arbiter: if the secondary has already finished replicating data, this attribute cannot be changed; first >rs.remove(hostportstr), then >rs.addArb(hostportstr))

         "version": 6,

                            "host": "172.168.101.239:27017",

                            "priority": 3

                            "host": "172.168.101.233:27017",

                            "priority": 2

testSet:PRIMARY> cfg.members[2].arbiterOnly=true

testSet:PRIMARY> rs.reconfig(cfg)

         "errmsg": "exception: arbiterOnly may not change for members",

         "code": 13510,

         "ok": 0

testSet:PRIMARY> rs.printSlaveReplicationInfo()

source: 172.168.101.233:27017

         syncedTo: Wed Jul 05 2017 16:08:07 GMT+0800 (CST)

         0 secs (0 hrs) behind the primary

source: 172.168.101.221:27017

mongodb sharding:

Large data volumes exceed the performance of a single machine;

MySQL sharding relies on external tools: Gizzard (Twitter), HiveDB, MySQL Proxy + HSCALE, Hibernate Shards (Google), Pyshard;

Writes should be scattered, reads concentrated;

In any cluster, time must be synchronized across the nodes;

Roles in a sharding architecture:

mongos  #router or proxy; in production mongos needs HA;

config server   #metadata server; the metadata is essentially the index of the collections; use 3 config servers in production, they support internal elections;

shard  #data node, i.e. a mongod instance; at least 3 in production;


range  #range-based sharding

list  #list-based sharding

hash  #hash-based sharding

sharding is the process of storing data records across multiple machines and is mongodb's approach to meeting the demands of data growth;

as the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput;

sharding solves the problem with horizontal scaling;

with sharding, you add more machines to support data growth and the demands of read and write operations;

purpose of sharding:

database systems with large data sets and high throughput applications can challenge the capacity of a single server;

high query rates can exhaust the cpu capacity of the server;

larger data sets exceed the storage capacity of a single machine;

finally, working set sizes larger than the system's RAM stress the IO capacity of disk drives;


sharding in mongodb:

a sharded cluster has the following components: shards, query routers and config servers:

shards store the data. to provide high availability and data consistency, in a production sharded cluster, each shard is a replica set;

query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards;

the query router processes and targets operations to shards and then returns results to the clients;

a sharded cluster can contain more than one query router to divide the client request load;

a client sends requests to one query router;

most sharded clusters have many query routers;

config servers store the cluster's metadata:

this data contains a mapping of the cluster's data set to the shards;

the query router uses this metadata to target operations to specific shards;

production sharded clusters have exactly 3 config servers;

data partitioning:

shard keys:

to shard a collection, you need to select a shard key;

a shard key is either an indexed field or an indexed compound field that exists in every document in the collection;

mongodb divides the shard key values into chunks and distributes the chunks evenly across the shards;

to divide the shard key values into chunks, mongodb uses either range based partitioning or hash based partitioning;

range based sharding:

for range-based sharding, mongodb divides the data set into ranges determined by the shard key values to provide range based partitioning;

hash based sharding:

for hash based partitioning, mongodb computes a hash of a field's value, and then uses these hashes to create chunks;

performance distinctions between range and hash based partitioning:

range based partitioning supports more efficient range queries;

however, range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding;

hash based partitioning, by contrast, ensures an even distribution of data at the expense of efficient range queries;

but random distribution makes it more likely that a range query on the shard key will not be able to target a few shards but would more likely query every shard in order to return a result;
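For illustration, the two choices might look like this in the mongos shell (the testdb.logs namespace is made up; a hashed shard key needs a hashed index, which shardCollection can build for an empty collection):

mongos> sh.shardCollection("testdb.students", {age: 1})   #(range based)

mongos> sh.shardCollection("testdb.logs", {userid: "hashed"})   #(hash based)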

maintaining a balanced data distribution:

mongodb ensures a balanced cluster using two background processes: splitting and the balancer

splitting:

a background process that keeps chunks from growing too large;

when a chunk grows beyond a specified chunk size, mongodb splits the chunk in half;

inserts and updates trigger splits;

splits are an efficient meta-data change;

to create splits, mongodb does not migrate any data or affect the shards;

balancing:

the balancer is a background process that manages chunk migrations;

the balancer runs in all of the query routers in a cluster;

when the distribution of a sharded collection in a cluster is uneven, the balancer process migrates chunks from the shard that has the largest number of chunks to the shard with the least number of chunks, until the collection balances;

the shards manage chunk migrations as a background operation;

during migration, all requests for a chunk's data address the origin shard;

primary shard:

every database has a primary shard that holds all the un-sharded collections in that database;

broadcast operations:

mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data;

targeted operations:

all insert() operations target to one shard;

all single update(), including upsert operations, and remove() operations must target to one shard;

sharded cluster metadata:

config servers store the metadata for a sharded cluster:

the metadata reflects state and organization of the sharded data sets and system;

the metadata includes the list of chunks on every shard and the ranges that define the chunks;

the mongos instances cache this data and use it to route read and write operations to shards;

config servers store the metadata in the config database;

172.168.101.228   #mongos

172.168.101.239   #config server

172.168.101.233 & 172.168.101.221   #shard{1,2}

On the config server:

configsvr=true

httpinterface=true

]# ss -tnl | egrep "27019|28019"

LISTEN    0      128         172.168.101.239:27019                    *:*    

LISTEN    0      128         172.168.101.239:28019                    *:*

On shard{1,2}:

bind_ip=172.168.101.233

LISTEN    0      128         172.168.101.233:27017                    *:*    

LISTEN    0      128         172.168.101.233:28017                    *:*

On the mongos:

]# yum -y localinstall mongodb-org-mongos-2.6.4-1.x86_64.rpm mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm   #(only the mongos node needs mongodb-org-mongos-2.6.4-1.x86_64.rpm; the other nodes install only server, shell, tools)

]# mongos --configdb=172.168.101.239:27019 --logpath=/var/log/mongodb/mongos.log --httpinterface --fork

2017-07-06T15:00:32.926+0800 warning: running with 1 config server should be done only for testing purposes and is not recommended for production

about to fork child process, waiting until server is ready for connections.

forked process: 5349

child process started successfully, parentexiting

LISTEN    0      128                       *:27017                    *:*    

LISTEN    0      128                       *:28017                    *:*  

]# mongo --host 172.168.101.228

connecting to: 172.168.101.228:27017/test

For interactive help, type "help".

For more comprehensive documentation, see

         http://docs.mongodb.org/

Questions? Try the support group

         http://groups.google.com/group/mongodb-user

mongos> sh.help()

mongos> sh.status()

--- Sharding Status ---

 sharding version: {

         "_id": 1,

         "version": 4,

         "minCompatibleVersion": 4,

         "currentVersion": 5,

         "clusterId": ObjectId("595ddf9b52f218b1b6f9fcf2")

 shards:

 databases:

         {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

mongos> sh.addShard("172.168.101.233")

{ "shardAdded" : "shard0000", "ok" : 1 }

mongos> sh.addShard("172.168.101.221")

{ "shardAdded" : "shard0001", "ok" : 1 }

         {  "_id" : "shard0000",  "host" : "172.168.101.233:27017" }

         {  "_id" : "shard0001",  "host" : "172.168.101.221:27017" }

mongos> sh.enableSharding("testdb")   #(enable sharding on the testdb database)

         {  "_id" : "shard0001",  "host" : "172.168.101.221:27017" }

         {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0001" }

         {  "_id" : "testdb",  "partitioned" : true,  "primary" : "shard0001" }

mongos> for (i=1;i<=10000;i++){db.students.insert({name: "student"+i,age: (i%120),classes: "class"+(i%10),address: "www.magedu.com"})}

mongos> db.students.find().count()

10000

mongos> sh.shardCollection("testdb.students",{age: 1})

         "proposedKey" : {

                   "age" : 1

         "curIndexes" : [

                            "v" : 1,

                            "key" : {

                                     "_id" : 1

                            },

                            "name" : "_id_",

                            "ns" : "testdb.students"

         "errmsg" : "please create an index that starts with the shard key before sharding."

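As the errmsg says, the shard key must be indexed before the collection can be sharded; a plausible fix (not captured in the original session) is:

mongos> db.students.ensureIndex({age: 1})

mongos> sh.shardCollection("testdb.students",{age: 1})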
sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard); manual balancing, not recommended

mongos> use admin

switched to db admin

mongos> db.runCommand("listShards")

         "shards": [

                            "_id": "shard0000",

                  {

                            "_id": "shard0001",

mongos> sh.isBalancerRunning()   #(shows true only while the balancer is actively working)

false

mongos> sh.getBalancerState()

mongos> db.printShardingStatus()   #(same as >sh.status())

         sh.help()                    sharding helpers

         help admin                   administrative help

         help connect                 connecting to a db help

         help keys                    key shortcuts

         help misc                    misc things to know

         help mr                      mapreduce

         show dbs                     show database names

         show collections             show collections in current database

         show users                   show users in current database

         show profile                 show most recent system.profile entries with time >= 1ms

         show logs                    show the accessible logger names

         show log [name]              prints out the last segment of log in memory, 'global' is default

         use <db_name>                set current database

         db.foo.find()                list objects in collection foo

         db.foo.find( { a : 1 } )     list objects in foo where a == 1

         it                           result of the last line evaluated; use to further iterate

         DBQuery.shellBatchSize = x   set default number of items to display on shell

         exit                         quit the mongo shell

DB methods:

         db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [ just calls db.runCommand(...) ]

         db.auth(username, password)

         db.cloneDatabase(fromhost)

         db.commandHelp(name) returns the help for the command

         db.copyDatabase(fromdb, todb, fromhost)

         db.createCollection(name, { size : ..., capped : ..., max : ... } )

         db.createUser(userDocument)

         db.currentOp() displays currently executing operations in the db

         db.dropDatabase()

         db.eval(func, args) run code server-side

         db.fsyncLock() flush data to disk and lock server for backups

         db.fsyncUnlock() unlocks server following a db.fsyncLock()

         db.getCollection(cname) same as db['cname'] or db.cname

         db.getCollectionNames()

         db.getLastError() - just returns the err msg string

         db.getLastErrorObj() - return full status object

         db.getMongo() get the server connection object

         db.getMongo().setSlaveOk() allow queries on a replication slave server

         db.getName()

         db.getPrevError()

         db.getProfilingLevel() - deprecated

         db.getProfilingStatus() - returns if profiling is on and slow threshold

         db.getReplicationInfo()

         db.getSiblingDB(name) get the db at the same server as this one

         db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set

         db.hostInfo() get details about the server's host

         db.isMaster() check replica primary status

         db.killOp(opid) kills the current operation in the db

         db.listCommands() lists all the db commands

         db.loadServerScripts() loads all the scripts in db.system.js

         db.logout()

         db.printCollectionStats()

         db.printReplicationInfo()

         db.printShardingStatus()

         db.printSlaveReplicationInfo()

         db.dropUser(username)

         db.repairDatabase()

         db.resetError()

         db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }

         db.serverStatus()

         db.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all

         db.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the db

         db.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the db

         db.setVerboseShell(flag) display extra information in shell output

         db.shutdownServer()

         db.stats()

         db.version() current version of the server

DBCollection help

         db.mycoll.find().help() - show DBCursor help

         db.mycoll.count()

         db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.

         db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command

         db.mycoll.dataSize()

         db.mycoll.distinct( key ) - e.g. db.mycoll.distinct( 'x' )

         db.mycoll.drop() drop the collection

         db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )

         db.mycoll.dropIndexes()

         db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups

         db.mycoll.reIndex()

         db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.

                                                      e.g. db.mycoll.find( {x:77} , {name:1, x:1} )

         db.mycoll.find(...).count()

         db.mycoll.find(...).limit(n)

         db.mycoll.find(...).skip(n)

         db.mycoll.find(...).sort(...)

         db.mycoll.findOne([query])

         db.mycoll.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )

         db.mycoll.getDB() get DB object associated with collection

         db.mycoll.getPlanCache() get query plan cache associated with collection

         db.mycoll.getIndexes()

         db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )

         db.mycoll.insert(obj)

         db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )

         db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor

         db.mycoll.remove(query)

         db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.

         db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name

         db.mycoll.save(obj)

         db.mycoll.stats()

         db.mycoll.storageSize() - includes free space allocated to this collection

         db.mycoll.totalIndexSize() - size in bytes of all the indexes

         db.mycoll.totalSize() - storage allocated for all data and indexes

         db.mycoll.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi

         db.mycoll.validate( <full> ) - SLOW

         db.mycoll.getShardVersion() - only for use with sharding

         db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster

         db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function

         db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set

         db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection

         db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection

> db.mycoll.find().help()

find() modifiers

         .sort( {...} )

         .limit( n )

         .skip( n )

         .count(applySkipLimit) - total # of objects matching query. by default ignores skip,limit

         .size() - total # of objects cursor would return, honors skip,limit

         .explain([verbose])

         .hint(...)

         .addOption(n) - adds op_query options -- see wire protocol

         ._addSpecial(name, value) - http://dochub.mongodb.org/core/advancedqueries#AdvancedQueries-Metaqueryoperators

         .batchSize(n) - sets the number of docs to return per getMore

         .showDiskLoc() - adds a $diskLoc field to each returned object

         .min(idxDoc)

         .max(idxDoc)

         .comment(comment)

         .snapshot()

         .readPref(mode, tagset)

Cursor methods

         .toArray() - iterates through docs and returns an array of the results

         .forEach( func )

         .map( func )

         .hasNext()

         .next()

         .objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)

         .itcount() - iterates through documents and counts them

         .pretty() - pretty print each document, possibly over multiple lines

> rs.help()

         rs.status()                     { replSetGetStatus : 1 } checks repl set status

         rs.initiate()                   { replSetInitiate : null } initiates set with default settings

         rs.initiate(cfg)                { replSetInitiate : cfg } initiates set with configuration cfg

         rs.conf()                       get the current configuration object from local.system.replset

         rs.reconfig(cfg)                updates the configuration of a running replica set with cfg (disconnects)

         rs.add(hostportstr)             add a new member to the set with default attributes (disconnects)

         rs.add(membercfgobj)            add a new member to the set with extra attributes (disconnects)

         rs.addArb(hostportstr)          add a new member which is arbiterOnly:true (disconnects)

         rs.stepDown([secs])             step down as primary (momentarily) (disconnects)

         rs.syncFrom(hostportstr)        make a secondary sync from the given member

         rs.freeze(secs)                 make a node ineligible to become primary for the time specified

         rs.remove(hostportstr)          remove a host from the replica set (disconnects)

         rs.slaveOk()                    shorthand for db.getMongo().setSlaveOk()

         rs.printReplicationInfo()       check oplog size and time range

         rs.printSlaveReplicationInfo()  check replica set members and replication lag

         db.isMaster()                   check who is primary

         reconfiguration helpers disconnect from the database so the shell will display

         an error, even if the command succeeds.

         see also http://<mongod_host>:28017/_replSet for additional diagnostic info

         sh.addShard( host )                       server:port OR setname/server:port

         sh.enableSharding(dbname)                 enables sharding on the database dbname

         sh.shardCollection(fullName,key,unique)   shards the collection

         sh.splitFind(fullName,find)               splits the chunk that find is in at the median

         sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle

         sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)

         sh.setBalancerState( <bool on or not> )   turns the balancer on or off true=on, false=off

         sh.getBalancerState()                     return true if enabled

         sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos

         sh.addShardTag(shard,tag)                 adds the tag to the shard

         sh.removeShardTag(shard,tag)              removes the tag from the shard

         sh.addTagRange(fullName,min,max,tag)      tags the specified range of the given collection

         sh.status()                               prints a general overview of the cluster

《NoSQL資料庫入門》

《MongoDB權威指南》

《深入學習MongoDB》

Learning path:

basic concepts;

installation;

data types and data models;

integration with applications;

shell commands;

master/slave replication;

sharding;

administration and maintenance;

application case studies;

mongodb:

Document-oriented NoSQL;

https://www.mongodb.com/

The name comes from humongous;

Written in C++; supports Linux/Windows (32-bit/64-bit), Solaris 64-bit, OS X 64-bit;

A document-oriented database developed by 10gen (a cloud-platform company); development by the 10gen team began in October 2007 and the first release came in February 2009; the latest version at the time of writing is 3.4.2;

Open source, GPL-licensed, with commercial licenses issued by 10gen (commercial enterprise-grade support, training and consulting services);

https://repo.mongodb.com/yum/redhat/

MongoDB is the most successful of the NoSQL databases, the closest to relational DBs, and the most likely to displace them;

MongoDB needs little extra tuning: master indexes and its built-in auto-sharding cluster will absorb the load;

MongoDB's query language grew out of JavaScript;

LAMP --> LNMP (nginx, mongodb, python)

Hadoop and Redis were created by individuals; MongoDB is developed by a company;

Technical highlights:

document-oriented storage engine that readily supports unstructured data;

full index support; indexes can be built on any attribute;

replication and high availability built into the database itself;

automated sharding clusters supported by the database itself;

rich document-based query capabilities;

atomic operations;

map/reduce support;

GridFS;

Unstructured data (examples: survey forms, company equipment management):

Data whose column structure cannot be determined in advance (whereas SQL, Structured Query Language, fixes the table structure, i.e. the columns, first);

A common misconception is that multimedia data and big data are unstructured data;

The pains of unstructured data (no fixed schema/model; data structures that keep changing; unnecessarily amplified pressure on DBAs and developers);

Documents (rows):

Think of a survey form: one row is one form, composed of multiple "key: value" ("attribute name: value") pairs; across rows, indexes can be put on keys (attributes) that are not fixed, and values are distinguished by data type;

{"foo":3,"greeting":"Hello,world!"}

Number:

{"foo":3}

String:

{"foo":"3"}

Keys are case-sensitive:

{"Foo":3}

Keys must not repeat:

{"greeting":"Hello,world!","greeting":"Hello,MongoDB!"}

Documents can be embedded within documents;

Collections:

A collection is a group of documents;

A document is like a row in a relational database;

A collection is like a table in a relational database;

Collections are schema-free: the documents within a collection can vary wildly, with no fixed structure;

Collection naming;

Databases:

A database is made up of multiple collections;

For database naming and namespace rules, see 《MongoDB the definitive guide.pdf》;

Special databases;

Collection naming:

a collection is identified by its name. collection names can be any utf-8 string, with a few restrictions:

1. the empty string ("") is not a valid collection name;

2. collection names may not contain the character \0 (the null character) because this delineates the end of a collection name;

3. you should not create any collections that start with system, a prefix reserved for system collections; for example, the system.users collection contains the database's users, and the system.namespaces collection contains information about all of the database's collections.

4. user-created collections should not contain the reserved character $ in the name. the various drivers available for the database do support using $ in collection names because some system-generated collections contain it. you should not use $ in a name unless you are accessing one of these collections.

Database naming:

1. the empty string ("") is not a valid database name;

2. a database name cannot contain any of these characters: ' ' (single space), -, $, /, \, \0 (the null character);

3. database names should be all lowercase;

4. database names are limited to a maximum of 64 bytes;

Default databases:

admin (this is the "root" database, in terms of authentication. if a user is added to the admin database, the user automatically inherits permissions for all databases. there are also certain server-wide commands that can be run only from the admin database, such as listing all of the databases or shutting down the server);

local (this database will never be replicated and can be used to store any collections that should be local to a single server (see chapter 9 for more information about replication and the local database));

config (when mongo is being used in a sharded setup (see chapter 10), the config database is used internally to store information about the shards);


Condor Cluster, 2010-12-07: the US Air Force Research Laboratory built what it called the Department of Defense's fastest supercomputer, the "Condor Cluster", out of 1,760 Sony PS3 game consoles. It delivers 500 trillion floating-point operations per second and was the world's 33rd largest supercomputer. The project began four years earlier, when each PS3 cost about $400; the consoles at the core of the machine cost roughly $2 million in total, which Mark Barnell, director of high-performance computing at the lab, said was only 5% to 10% of the cost of an equivalent system assembled from off-the-shelf computer parts. Another advantage of the PS3-based machine is its energy efficiency: it consumes only about 10% of the power of comparable supercomputers. Besides the PS3s, the machine includes 168 separate graphics processing units and 84 coordinating servers;


The "Tianhe-1" supercomputer, developed by China's National University of Defense Technology and deployed at the National Supercomputing Center in Tianjin, reached a tested speed of 2,570 trillion operations per second. It debuted in Changsha, Hunan on October 29, 2009 as China's first petaflop supercomputer, with a peak performance of 1,206 trillion double-precision floating-point operations per second and a measured LINPACK speed of 563.1 trillion operations per second; it cost 1 billion RMB;

MongoDB queries:

Operation                 SQL                                               MongoDB
all records               SELECT * FROM users;                              >db.users.find()
records with age=33       SELECT * FROM users WHERE age=33;                 >db.users.find({age: 33})
select sub-keys (fields)  SELECT a,b FROM users WHERE age=33;               >db.users.find({age: 33},{a: 1,b: 1})
sorting                   SELECT * FROM users WHERE age=33 ORDER BY name;   >db.users.find({age: 33}).sort({name: 1})
comparison                SELECT * FROM users WHERE age>33;                 >db.users.find({age: {$gt: 33}})
regex (fuzzy match)       SELECT * FROM users WHERE name LIKE "Joe%";       >db.users.find({name: /^Joe/})
skip, limit               SELECT * FROM users LIMIT 10 SKIP 20;             >db.users.find().limit(10).skip(20)
OR                        SELECT * FROM users WHERE a=1 OR b=2;             >db.users.find({$or: [{a: 1},{b: 2}]})
return only 1 row         SELECT * FROM users LIMIT 1;                      >db.users.findOne()
distinct aggregation      SELECT DISTINCT last_name FROM users;             >db.users.distinct("last_name")
count aggregation         SELECT COUNT(age) FROM users;                     >db.users.find({age: {"$exists": true}}).count()
query plan                EXPLAIN SELECT * FROM users WHERE z=3;            >db.users.find({z: 3}).explain()

This article was reposted from the chaijowin 51CTO blog; original link: http://blog.51cto.com/jowin/1945066; contact the original author for reprint permission.