
IX MongoDB

NoSQL, not only SQL;

Big data storage problems:

Parallel databases (horizontal partitioning; partitioned queries);

NoSQL database management systems:

non-relational data models;

distributed;

no support for the ACID database design paradigm;

simple data models;

metadata separated from data;

weak consistency;

high throughput;

high horizontal scalability on clusters of low-end hardware;

Representatives (Clustrix; GenieDB; ScaleBase; NimbusDB; Drizzle);

Cloud data management (DBaaS);

Analytics for big data (MapReduce);

CAP:


ACID vs BASE:

atomicity      basically available

consistency    soft state

isolation      eventually consistent

durability

Note:

Eventual consistency can be subdivided into:

causal consistency;

read-your-writes consistency;

session consistency;

monotonic read consistency;

timeline consistency;


ACID (strong consistency; isolation; a pessimistic, conservative approach; hard to adapt to change);

BASE (weak consistency; availability first; an optimistic approach; adapts to change; simpler and faster);

Techniques for implementing data consistency:

NRW strategy (

N, number, the total number of replicas;

R, read, the number of replicas a read must complete on;

W, write, the number of replicas a write must complete on;

R+W>N (strong consistency);

R+W<N (weak consistency);

);
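For example, with N=3 replicas, W=2 and R=2 give R+W=4>3: every read quorum overlaps every write quorum, so a read always sees the latest acknowledged write (strong consistency); with W=1 and R=1, R+W=2<3, and a read may land on a replica the write never reached (weak consistency).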

2PC;

Paxos;

Vector clock;

http://www.nosql-database.org/

Data storage models (the NoSQL families are classified by this model):

Column store model (

typical scenario: distributed data stores supporting random reads/writes on top of a distributed FS;

typical products: HBase, Hypertable, Cassandra;

data model: column-centric storage, with values of the same column stored together;

strengths: fast queries; high scalability; easy to scale out in a distributed fashion;

);

Document model (

typical scenario: web applications without strong transactional requirements;

typical products: MongoDB, ElasticSearch, CouchDB, Couchbase Server;

data model: a key-value model whose values are stored as documents;

strengths: the data model need not be defined in advance;

);

Key-value model (

typical scenario: content caching; high-load scenarios with massive parallel data access;

typical products: Redis, DynamoDB (Amazon), Azure Table Storage, Riak;

data model: key-value pairs implemented on hash tables;

strengths: very fast lookups;

);

Graph model (

typical scenario: social networks, recommendation systems, relationship graphs;

typical products: Neo4j, Infinite Graph;

data model: graph structures;

strengths: well suited to graph computation scenarios;

);

Introduction to MongoDB:

MongoDB (from humongous) is a scalable, high-performance, open source, schema-free, document-oriented NoSQL database;


what is MongoDB?

humongous (huge + monstrous);

document database (document-oriented database: uses JSON, actually BSON);

schema free;

C++;

open source;

GNU AGPL v3.0 license;

OS X, Linux, Windows, Solaris | 32-bit, 64-bit;

developed and supported by 10gen; first released in February 2009;

NoSQL;

performance (written in C++; full index support; no transactions, but atomic operations are supported; memory-mapped files (delayed writes));

scalable (replication, auto-sharding);

commercially supported (10gen); lots of documentation;

document-based queries:

flexible document queries expressed in JSON/JavaScript;

Map/Reduce (flexible aggregation and data processing; queries run in parallel on all shards);

GridFS (store files of any size easily);

geospatial indexing (find objects based on location, i.e. find the closest n items to x);

many production deployments;

features:

collection-oriented storage: easy storage of object/JSON-style data;

dynamic queries;

full index support, including on inner objects and embedded arrays;

query profiling;

replication and fail-over support;

efficient storage of binary data, including large objects (e.g. photos and videos);

auto-sharding for cloud-level scalability (currently in alpha);

great for (websites; caching; high-volume, low-value data; high scalability; storage of program objects and JSON);

not as great for (highly transactional workloads; ad-hoc business intelligence; problems requiring SQL);

JSON, JavaScript Object Notation: collections of name/value pairs and ordered lists of values;
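For example, a minimal illustrative document as it would be typed into the mongo shell (stored internally as BSON):

{"name": "tom", "age": 22, "courses": ["db", "web"]}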

DDL;

DML;

https://www.mongodb.com/

http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/RPMS/

mongodb-org-mongos-2.6.4-1.x86_64.rpm

mongodb-org-server-2.6.4-1.x86_64.rpm

mongodb-org-shell-2.6.4-1.x86_64.rpm

mongodb-org-tools-2.6.4-1.x86_64.rpm

]# yum -y install mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm

]# rpm -qi mongodb-org-server   #(This package contains the MongoDB server software, default configuration files, and init.d scripts.)

<!--Description :

MongoDB is built for scalability, performance and high availability, scaling from single server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB's native replication and automated failover enable enterprise-grade reliability and operational flexibility.

MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.

MongoDB has a rich client ecosystem including hadoop integration, officially supported drivers for 10 programming languages and environments, as well as 40 drivers supported by the user community.

MongoDB features:

* JSON Data Model with Dynamic Schemas

* Auto-Sharding for Horizontal Scalability

* Built-In Replication for High Availability

* Rich Secondary Indexes, including geospatial

* TTL indexes

* Text Search

* Aggregation Framework & Native MapReduce

-->

]# rpm -qi mongodb-org-shell   #(This package contains the mongo shell.)

]# rpm -qi mongodb-org-tools   #(This package contains standard utilities for interacting with MongoDB.)

]# mkdir -pv /mongodb/data

mkdir: created directory `/mongodb'

mkdir: created directory `/mongodb/data'

]# id mongod

uid=497(mongod) gid=497(mongod) groups=497(mongod)

]# chown -R mongod.mongod /mongodb/

]# vim /etc/mongod.conf

#dbpath=/var/lib/mongo

dbpath=/mongodb/data

bind_ip=172.168.101.239

#httpinterface=true

httpinterface=true   #28017

]# /etc/init.d/mongod start

Starting mongod: /usr/bin/dirname: extra operand `2>&1.pid'

Try `/usr/bin/dirname --help' for more information.

                                                          [  OK  ]

]# vim /etc/init.d/mongod   #(fix the 2>&1.pid problem)

 #daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null 2>&1"

 daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null"

]# /etc/init.d/mongod restart

Stopping mongod:                                          [  OK  ]

Starting mongod:                                          [  OK  ]

]# ss -tnl | egrep "27017|28017"

LISTEN    0      128               127.0.0.1:27017                    *:*    

LISTEN    0      128               127.0.0.1:28017                    *:*  

]# du -sh /mongodb/data

3.1G          /mongodb/data

]# ll -h !$

ll -h /mongodb/data

total 81M

drwxr-xr-x 2 mongod mongod 4.0K Jun 29 16:23 journal

-rw------- 1 mongod mongod  64M Jun 29 16:23 local.0

-rw------- 1 mongod mongod  16M Jun 29 16:23 local.ns

-rwxr-xr-x 1 mongod mongod    6 Jun 29 16:23 mongod.lock


]# mongo --host 172.168.101.239   #(lands in the test database by default)

MongoDB shell version: 2.6.4

connecting to: 172.168.101.239:27017/test

Welcome to the MongoDB shell.

……

> show dbs

admin (empty)

local 0.078GB

> show collections

> show users

> show logs

global

> help

         db.help()                    help on db methods

         db.mycoll.help()             help on collection methods

         sh.help()                    sharding helpers

         rs.help()                    replica set helpers

         ……

> db.help()

> db.stats()   #(database status)

{

         "db": "test",

         "collections": 0,

         "objects": 0,

         "avgObjSize": 0,

         "dataSize": 0,

         "storageSize": 0,

         "numExtents": 0,

         "indexes": 0,

         "indexSize": 0,

         "fileSize": 0,

         "dataFileVersion": {

         },

         "ok": 1

}

> db.version()

2.6.4

> db.serverStatus()   #(mongodb server status)

> db.getCollectionNames()

[ ]

> db.mycoll.help()

> use testdb

switched to db testdb

test  (empty)

> db.stats()

         "db": "testdb",

> db.students.insert({name: "tom", age: 22})

WriteResult({ "nInserted" : 1 })

students

system.indexes

admin  (empty)

local  0.078GB

test   (empty)

testdb 0.078GB

> db.students.stats()

         "ns": "testdb.students",

         "count": 1,

         "size": 112,

         "avgObjSize": 112,

         "storageSize": 8192,

         "numExtents": 1,

         "nindexes": 1,

         "lastExtentSize": 8192,

         "paddingFactor": 1,

         "systemFlags": 1,

         "userFlags": 1,

         "totalIndexSize": 8176,

         "indexSizes": {

                   "_id_": 8176

[ "students","system.indexes" ]

&gt; db.students.insert({name:"jerry",age: 40,gender: "M"})

&gt; db.students.insert({name:"ouyang",age:80,course: "hamagong"})

&gt; db.students.insert({name: "yangguo",age:20,course: "meivnquan"})

&gt; db.students.insert({name:"guojing",age: 30,course: "xianglong"})

&gt; db.students.find()

{ "_id" :ObjectId("5955b505853f9091c4bf4056"), "name" :"tom", "age" : 22 }

{ "_id" :ObjectId("595b3816853f9091c4bf4057"), "name" :"jerry", "age" : 40, "gender" : "M" }

{ "_id" :ObjectId("595b3835853f9091c4bf4058"), "name" :"ouyang", "age" : 80, "course" :"hamagong" }

{ "_id" :ObjectId("595b384f853f9091c4bf4059"), "name" :"yangguo", "age" : 20, "course" :"meivnquan" }

{ "_id" :ObjectId("595b3873853f9091c4bf405a"), "name" :"guojing", "age" : 30, "course" :"xianglong" }

&gt; db.students.count()

5

mongodb documents:

mongodb stores data in the form of documents, which are JSON-like field and value pairs;

documents are analogous to structures in programming languages that associate keys with values, where keys may hold other pairs of keys and values (e.g. dictionaries, hashes, maps, and associative arrays);

formally, mongodb documents are BSON documents; BSON is a binary representation of JSON with additional type information;

field: value;

document:

stored in a collection; think record or row;

can have an _id key that works like a primary key in MySQL;

two options for relationships: subdocument or db reference (a sketch follows below);
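A minimal sketch of the two relationship styles in the mongo shell (the posts collection and its fields are illustrative, not part of the notes above):

> db.users.insert({name: "tom", age: 22})

> var u = db.users.findOne({name: "tom"})

> db.posts.insert({title: "embedded", author: {name: "tom", age: 22}})   #(subdocument: the related data lives inside the document)

> db.posts.insert({title: "referenced", author_id: u._id})   #(reference: store the _id of a document from another collection)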

mongodb collections:

mongodb stores all documents in collections;

a collection is a group of related documents that share a set of common indexes;

collections are analogous to tables in relational databases;

collection:

think table, but with no schema;

for grouping into smaller query sets (speed);

each top-level entity in your app would have its own collection (users, articles, etc.);

full index support;

database operations: query

in mongodb a query targets a specific collection of documents;

queries specify criteria, or conditions, that identify the documents that mongodb returns to the clients;

a query may include a projection that specifies the fields from the matching documents to return;

you can optionally modify queries to impose limits, skips, and sort orders;

For example:

db.users.find({age: {$gt: 18}}).sort({age: 1})

db.COLLECTION.find({FIELD: {QUERY CRITERIA}}).MODIFIER

query interface:

for query operations, mongodb provides the db.collection.find() method;

the method accepts both the query criteria and projections, and returns a cursor to the matching documents;

query behavior:

all queries in mongodb address a single collection;

you can modify the query to impose limits, skips, and sort orders;

the order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort();

operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update;

in the aggregation pipeline, the $match pipeline stage provides access to mongodb queries;
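A minimal sketch against the students collection from the examples above (the grouping key is illustrative): $match filters documents exactly like a find() query before the rest of the pipeline runs:

> db.students.aggregate([{$match: {age: {$gt: 25}}}, {$group: {_id: "$course", total: {$sum: 1}}}])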

data modification:

data modification refers to operations that create, update, or delete data;

in mongodb, these operations modify the data of a single collection;

all write operations in mongodb are atomic on the level of a single document;

for update and delete operations, you can specify the criteria to select the documents to update or remove;

RDBMS vs MongoDB:

table, view      collection

row              JSON document

index            index

join             embedded document

partition        shard

partition key    shard key

db.collection.find(<query>,<projection>)

Analogous to a SELECT statement in SQL: <query> corresponds to the WHERE clause, and <projection> to the fields to select;

If find() is called without <query>, all documents of the collection are returned;

The db.collection.count() method returns the number of documents in the given collection;

Example:

> use testdb

> for (i=1;i<=100;i++)

{ db.testColl.insert({hostname: "www"+i+".magedu.com"})}

> db.testColl.find({hostname: {$gt: "www95.magedu.com"}})

Advanced uses of find():

mongodb queries support several classes of selection mechanisms: comparison, logical, element, javascript, etc.;

Comparison operators:

$gt; $gte; $lt; $lte; $ne; syntax: {field: {$gt: VALUE}}

> db.students.find({age: {$gt: 30}})

$in; $nin; syntax: {field: {$in: [VALUE1,VALUE2]}}

> db.students.find({age: {$in: [20,80]}})

> db.students.find({age: {$in: [20,80]}}).limit(1)

> db.students.find({age: {$in: [20,80]}}).count()

2

> db.students.find({age: {$in: [20,80]}}).skip(1)

Compound conditions:

Logical operators: $or (or); $and (and); $not (not); $nor (nor); syntax: {$or: [{<expression1>},{<expression2>}]}

> db.students.find({$or: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})

{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }

> db.students.find({$and: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})

Element queries:

Queries based on whether a given field exists in a document: $exists; $mod; $type;

$exists selects documents by the presence of the given field; syntax: {field: {$exists: <boolean>}}; true returns documents that have the field, false returns documents that lack it

$mod performs a modulo operation on the field's value and returns documents whose remainder equals the given value; syntax: {field: {$mod: [divisor, remainder]}}

$type returns documents whose field value has the given type; syntax: {field: {$type: <BSON type>}}

Note: double, string, object, array, binary data, undefined, boolean, date, null, regular expression, javascript, timestamp (a sketch follows below)
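A minimal sketch against the students collection above (in 2.6, BSON type 2 is string):

> db.students.find({gender: {$exists: true}})   #(documents that have a gender field)

> db.students.find({age: {$mod: [10, 0]}})   #(documents whose age % 10 == 0)

> db.students.find({name: {$type: 2}})   #(documents whose name is a string)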

Update operations:

The update() method modifies data in a collection; by default update() changes only a single document; to update all documents matching the criteria at once, use the multi option;

Syntax: db.collection.update(<query>,<update>,<options>); <query> works like WHERE in SQL, and <update> like a SET clause carrying an implicit LIMIT 1; if the multi option is given in <options>, the statement behaves like an UPDATE without LIMIT;

The <update> parameter has a peculiar format: it may only contain expressions built from update-specific operators, which fall roughly into field, array, and bitwise classes; the commonly used field operators are:

$set, sets the field to a new value; syntax: ({field: value},{$set: {field: new_value}})

$unset, removes the given fields; syntax: ({field: value},{$unset: {field1: "", field2: ""}})

$rename, renames fields; syntax: {$rename: {oldname: newname, oldname: newname}}

$inc, increments the field's value; syntax: ({field: value},{$inc: {field1: amount}}), where {field: value} gives the selection criteria and {$inc: {field1: amount}} names the field to increment and the amount (a sketch follows below)
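A minimal sketch of these operators on the students collection (the grade field is illustrative):

> db.students.update({name: "tom"}, {$set: {age: 23}})

> db.students.update({name: "tom"}, {$inc: {age: 1}})

> db.students.update({name: "ouyang"}, {$unset: {course: ""}})

> db.students.update({name: "guojing"}, {$rename: {course: "courses"}})

> db.students.update({age: {$lt: 30}}, {$set: {grade: "junior"}}, {multi: true})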

Delete operations:

> db.students.remove({name: "tom"})   #(db.mycoll.remove(query))

WriteResult({ "nRemoved" : 1 })

> db.collections

testdb.collections

> db.students.drop()   #(db.mycoll.drop(), drops the collection)

true

> db.dropDatabase()   #(drops the database)

{ "dropped" : "testdb", "ok" : 1 }

> quit()

Indexes:

Index types:

B+Tree;

hash;

spatial indexes;

full-text indexes;

indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form: the index stores the value of a specific field or set of fields, ordered by the value of the field;

mongodb defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a mongodb collection;

mongodb index types:

single field indexes:

a single field index only includes data from a single field of the documents in a collection;

mongodb supports single field indexes on fields at the top level of a document and on fields in sub-documents;

compound indexes:

a compound index includes more than one field of the documents in a collection;

multikey indexes:

a multikey index references an array and records a match if a query includes any value in the array;

geospatial indexes and queries:

geospatial indexes support location-based searches on data that is stored as either GeoJSON objects or legacy coordinate pairs;

text indexes:

text indexes support search of string content in documents;

hashed indexes:

hashed indexes maintain entries with hashes of the values of the indexed field;

index creation:

mongodb provides several options that only affect the creation of the index;

specify these options in a document as the second argument to the db.collection.ensureIndex() method:

unique index: db.addresses.ensureIndex({"user_id": 1},{unique: true})

sparse index: db.addresses.ensureIndex({"xmpp_id": 1},{sparse: true})

mongodb index-related methods:

db.mycoll.ensureIndex(field[,options])   #create an index; options include name, unique, dropDups, sparse (sparse index)

db.mycoll.dropIndex(INDEX_NAME)   #drop an index

db.mycoll.dropIndexes()

db.mycoll.getIndexes()

db.mycoll.reIndex()   #rebuild indexes

> show dbs;

> use testdb;

> for(i=1;i<=1000;i++){db.students.insert({name: "student"+i, age: (i%120), address: "#85 LeAi Road,ShangHai,China"})}

1000

> db.students.find()   #(20 rows are shown at a time; type "it" to see the next 20)

Type "it" for more

> db.students.ensureIndex({name: 1})

         "createdCollectionAutomatically": false,

         "numIndexesBefore": 1,

         "numIndexesAfter": 2,

> db.students.getIndexes()   #number of indexes on the current collection; the new index is named name_1

[

         {

                   "v": 1,

                   "key": {

                            "_id": 1

                   },

                   "name": "_id_",

                   "ns": "test.students"

                            "name": 1

                   "name": "name_1",

         }

]

> db.students.dropIndex("name_1")   #drop an index; the index name must be quoted

{ "nIndexesWas" : 2, "ok" : 1 }

> db.students.ensureIndex({name: 1},{unique: true})   #unique index

> db.students.getIndexes()

                   "unique": true,

> db.students.insert({name: "student20", age: 20})   #with a unique index in place, inserting a duplicate value for the indexed field fails

WriteResult({

         "nInserted": 0,

         "writeError": {

                   "code": 11000,

                   "errmsg": "insertDocument :: caused by :: 11000 E11000 duplicate key error index:test.students.$name_1  dup key: { :\"student20\" }"

})

&gt; db.students.find({name:"student500"})

{ "_id" : ObjectId("595c3a9bcf42a9afdbede7c0"),"name" : "student500", "age" : 20,"address" : "#85 LeAi Road,ShangHai,China" }

&gt; db.students.find({name:"student500"}).explain()   #显示执行过程

         "cursor": "BtreeCursor name_1",

         "isMultiKey": false,

         "n": 1,

         "nscannedObjects": 1,

         "nscanned": 1,

         "nscannedObjectsAllPlans": 1,

         "nscannedAllPlans": 1,

         "scanAndOrder": false,

         "indexOnly": false,

         "nYields": 0,

         "nChunkSkips": 0,

         "millis": 0,

         "indexBounds": {

                   "name": [

                            [

                                     "student500",

                                     "student500"

                            ]

                   ]

         "server": "ebill-redis.novalocal:27017",

         "filterSet": false

> db.students.find({name: {$gt: "student500"}}).explain()

         "n": 552,

         "nscannedObjects": 552,

         "nscanned": 552,

         "nscannedObjectsAllPlans": 552,

         "nscannedAllPlans": 552,

         "nYields": 4,

         "millis": 1,

                                     {

                                     }

]# mongod -h   #options are grouped into General options; Replication options; Master/slave options; Replica set options; Sharding options;

 --bind_ip arg               comma separated list of ip addresses to listen on

                              - all local ips by default

 --port arg                  specify port number - 27017 by default

 --maxConns arg              max number of simultaneous connections - 1000000

                              by default (maximum concurrent connections)

 --logpath arg               log file to send writes to instead of stdout - has

                              to be a file, not directory

 --httpinterface             enable http interface (a web page for monitoring mongodb statistics)

 --fork                      fork server process (run mongod in the background)

 --auth                      run with security (require authentication: >db.createUser(userDocument), >db.auth(user,password))

 --repair                    run repair on all dbs (used at restart to repair after an unclean shutdown)

 --journal                   enable journaling: writes go sequentially to the journal first and are synced to the data files periodically, turning random IO into sequential IO for better performance; a single-instance mongodb must enable this option, as it is what guarantees durability

 --journalOptions arg        journal diagnostic options

 --journalCommitInterval arg how often to group/batch commit (ms)

Debugging options:

 --cpu                       periodically show cpu and iowait utilization

 --sysinfo                   print some diagnostic system information

 --slowms arg (=100)         value of slow for profile and console log (queries above this threshold are logged as slow queries)

 --profile arg               0=off 1=slow, 2=all (profiling)

Replication options:

 --oplogSize arg       size to use (in MB) for replication op log. default is

                        5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):

 --master              master mode

 --slave               slave mode

 --source arg          when slave: specify master as <server:port>

 --only arg            when slave: specify a single database to replicate

 --slavedelay arg      specify delay (in seconds) to be used when applying

                        master ops to slave

 --autoresync          automatically resync if slave data is stale

Replica set options:

 --replSet arg           arg is <setname>[/<optionalseedhostlist>]

 --replIndexPrefetch arg specify index prefetching behavior (if secondary)

                          [none|_id_only|all]

Sharding options:

 --configsvr           declare this is a config db of a cluster; default port

                        27019; default dir /data/configdb

 --shardsvr            declare this is a shard db of a cluster; default port

                        27018

MongoDB replication:

Two types (master/slave; replica sets):

master/slave: similar to mysql's; rarely used any more;

replica sets:

usually an odd number of nodes (at least 3); an arbiter can take part in elections;

heartbeat every 2s; automatic failover through elections;

A replica set provides automatic failover and makes advanced deployments easier: it is a group of mongodb instances serving the same data set; a replica set has exactly one primary (rw), and all other members are secondaries (r); the primary records its data modifications in the oplog;

arbiter: a set generally has at least 3 nodes, one primary and two secondaries; the secondaries exchange heartbeat information with the primary every 2s, and if a secondary sees no heartbeat from the primary for 10s, a new primary is automatically elected from the two secondaries;

Node classes in a replica set:

priority-0 nodes: never elected primary, but may take part in the election process; warm standby nodes, readable by clients;

hidden secondaries: priority-0 secondaries that are additionally invisible to clients;

delayed secondaries: priority-0 secondaries that replicate with a fixed delay behind the primary;

arbiter nodes;

MongoDB replication architecture:

oplog: a fixed-size file stored in the local database; within a replica set, only the primary writes to its local oplog; even if a secondary has an oplog it is unused until that node is elected primary, at which point it starts writing its own; unlike mysql binary logs, which keep growing with use, the oplog size is fixed and is allocated at first startup (oplog.rs), typically 5% of disk space, with a minimum of 1G if 5% comes to less than that;

A secondary applies the primary's operations in these modes:

initial sync;

post-rollback catch-up;

sharding chunk migrations;

The local database holds all of the replica set's metadata and the oplog; local itself is not replicated, while every other database is replicated to the secondaries; the oplog is stored in a collection named oplog.rs, created automatically the first time an instance starts after joining a replica set; its size depends on the OS and FS and can be customized with the --oplogSize option;

test   0.078GB

testdb (empty)

> use local

switched to db local

startup_log

MongoDB data synchronization types:

initial sync: for a node with no data, or a node that has lost its replication history; its steps: clone all databases; apply all changes to the data set (copy the oplog and apply it locally); build indexes for all collections;

replication: writes on the primary are continuously synced to the secondaries;

consistency (only the primary is writable);

--journal guarantees durability; it must be enabled on a single-instance mongodb;

replication is multi-threaded, but a single collection is replicated by only one thread (within one namespace it remains single-threaded);

index (data) prefetching can be used to improve performance;


purpose of replication:

replication provides redundancy and increases data availability;

with multiple copies of data on different database servers, replication protects a database from the loss of a single server;

replication also allows you to recover from hardware failure and service interruptions;

with additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup;

replication in mongodb:

a replica set is a group of mongod instances that host the same data set:

one mongod, the primary, receives all write operations;

all other instances, secondaries, apply operations from the primary so that they have the same data set;

the primary accepts all write operations from clients; a replica set can have only one primary:

because only one member can accept write operations, replica sets provide strict consistency;

to support replication, the primary logs all changes to its data sets in its oplog;

the secondaries replicate the primary's oplog and apply the operations to their data sets;

secondaries' data sets reflect the primary's data set;

if the primary is unavailable, the replica set will elect a secondary to be primary;

by default, clients read from the primary; however, clients can specify a read preference to send read operations to secondaries (a sketch follows below);
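A minimal sketch of setting a read preference from the 2.6 mongo shell (setReadPref is a shell helper; other modes include "primary", "primaryPreferred", "secondary", and "nearest"):

testSet:PRIMARY> db.getMongo().setReadPref("secondaryPreferred")

testSet:PRIMARY> db.students.find()   #(reads may now be routed to a secondary)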

arbiter:

you may add an extra mongod instance to a replica set as an arbiter;

arbiters do not maintain a data set; arbiters only exist to vote in elections;

if your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary;

arbiters do not require dedicated hardware;

automatic failover:

when a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary;

the first secondary that receives a majority of the votes becomes primary;

priority 0 replica set members:

a priority 0 member is a secondary that cannot become primary;

priority 0 members cannot trigger elections:

otherwise these members function as normal secondaries;

a priority 0 member maintains a copy of the data set, accepts read operations, and votes in elections;

configure a priority 0 member to prevent secondaries from becoming primary, which is particularly useful in multi-datacenter deployments;

in a three-member replica set, one data center hosts the primary and a secondary;

a second data center hosts one priority 0 member that cannot become primary;

Conditions that influence replica set re-election:

heartbeat;

priority (default 1; priority-0 nodes cannot trigger elections);

optime;

network connectivity;

network partitions;

Election mechanism:

Events that trigger an election:

a new replica set is initialized;

a secondary loses contact with the primary;

the primary steps down (becomes unavailable): >rs.stepDown(); a secondary with a higher priority meets all the other conditions for becoming primary; the primary cannot reach the majority of the replica set;

Procedure:

172.168.101.239   #master

172.168.101.233   #secondary

172.168.101.221   #secondary

]# vim /etc/mongod.conf   #(the same configuration on all three nodes)

fork=true

# in replicated mongo databases, specify the replica set name here

#replSet=setname

replSet=testSet   #(replica set name, i.e. the cluster name)

replIndexPrefetch=_id_only   #(specify index prefetching behavior (if secondary) [none|_id_only|all]; index prefetching makes replication more efficient and only takes effect on a secondary)

]# mongo --host 172.168.101.239   #(on the primary)

test  0.078GB

> rs.status()

         "startupStatus": 3,

         "info": "run rs.initiate(...) if not yet done for the set",

         "ok": 0,

         "errmsg": "can't get local.system.replset config from self or any seed (EMPTYCONFIG)"

> rs.initiate()

         "info2": "no configuration explicitly specified -- making one",

         "me": "172.168.101.239:27017",

         "info": "Config now saved locally.  Should come online in about a minute.",

testSet:PRIMARY> rs.add("172.168.101.233:27017")

{ "ok" : 1 }

testSet:PRIMARY> rs.add("172.168.101.221:27017")   #(alternatively, rs.addArb(hostportstr) adds an arbiter node)

testSet:PRIMARY> rs.status()

         "set": "testSet",

         "date": ISODate("2017-07-05T07:22:59Z"),

         "myState": 1,

         "members": [

                   {

                            "_id": 0,

                            "name": "172.168.101.239:27017",

                            "health": 1,

                            "state": 1,

                            "stateStr": "PRIMARY",

                            "uptime": 970,

                            "optime": Timestamp(1499238843, 1),

                            "optimeDate": ISODate("2017-07-05T07:14:03Z"),

                            "electionTime": Timestamp(1499238673, 2),

                            "electionDate": ISODate("2017-07-05T07:11:13Z"),

                            "self": true

                            "_id": 1,

                            "name": "172.168.101.233:27017",

                            "state": 2,

                            "stateStr": "SECONDARY",

                            "uptime": 557,

                            "lastHeartbeat": ISODate("2017-07-05T07:22:57Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:22:57Z"),

                            "pingMs": 1,

                            "syncingTo": "172.168.101.239:27017"

                            "_id": 2,

                            "name": "172.168.101.221:27017",

                            "uptime": 536,

                            "pingMs": 0,

                   }

         ],

testSet:PRIMARY> use testdb

testSet:PRIMARY> db.students.insert({name: "tom", age: 22, course: "hehe"})

]# mongo --host 172.168.101.233   #(on a secondary)

testSet:SECONDARY> rs.slaveOk()   #(must be run before a secondary accepts queries)

testSet:SECONDARY> rs.status()

         "date": ISODate("2017-07-05T07:24:54Z"),

         "myState": 2,

         "syncingTo": "172.168.101.239:27017",

                            "uptime": 669,

                            "lastHeartbeat": ISODate("2017-07-05T07:24:53Z"),

                            "lastHeartbeatRecv" :ISODate("2017-07-05T07:24:53Z"),

                            "electionDate": ISODate("2017-07-05T07:11:13Z")

                            "uptime": 1292,

                            "uptime": 649,

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:24:53Z"),

                            "pingMs": 5,

testSet:SECONDARY> use testdb

testSet:SECONDARY> db.students.find()

{ "_id" : ObjectId("595c95095800d50614819643"), "name" : "tom", "age" : 22, "course" : "hehe" }

]# mongo --host 172.168.101.221   #(on the other secondary)

testSet:SECONDARY> db.students.find()   #(rs.slaveOk() has not been run yet)

error: { "$err" : "not master and slaveOk=false", "code" : 13435 }

testSet:SECONDARY> rs.slaveOk()

testSet:SECONDARY> db.students.insert({name: jowin, age: 30, course: "mage"})   #(writes are only accepted on the primary; the error below is the shell complaining about the unquoted value)

2017-07-05T15:37:05.069+0800ReferenceError: jowin is not defined

testSet:PRIMARY> rs.stepDown()   #(on the primary; step down as primary (momentarily) (disconnects); simulates taking the primary offline)

2017-07-05T15:38:46.819+0800 DBClientCursor::init call() failed

2017-07-05T15:38:46.820+0800 Error: error doing query: failed at src/mongo/shell/query.js:81

2017-07-05T15:38:46.821+0800 trying reconnect to 172.168.101.239:27017 (172.168.101.239) failed

2017-07-05T15:38:46.822+0800 reconnect 172.168.101.239:27017 (172.168.101.239) ok

         "date": ISODate("2017-07-05T07:39:46Z"),

         "syncingTo": "172.168.101.221:27017",

                            "stateStr" :"SECONDARY",

                            "uptime": 1977,

                            "optime": Timestamp(1499239689, 1),

                            "optimeDate": ISODate("2017-07-05T07:28:09Z"),

                            "infoMessage": "syncing to: 172.168.101.221:27017",

                            "uptime": 1564,

                            "lastHeartbeat": ISODate("2017-07-05T07:39:45Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:39:45Z"),

                            "lastHeartbeatMessage": "syncing to: 172.168.101.239:27017",

                            "stateStr" :"PRIMARY",

                            "uptime": 1543,

                            "lastHeartbeat": ISODate("2017-07-05T07:39:46Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:39:46Z"),

                            "electionTime": Timestamp(1499240328, 1),

                            "electionDate": ISODate("2017-07-05T07:38:48Z")

testSet:SECONDARY>    #(the prompt on the newly elected primary changes)

testSet:PRIMARY>

testSet:PRIMARY> db.printReplicationInfo()   #(show oplog size and time range)

configured oplog size:   1180.919921875MB

log length start to end: 846secs (0.24hrs)

oplog first event time:  Wed Jul 05 2017 15:14:03 GMT+0800 (CST)

oplog last event time:   Wed Jul 05 2017 15:28:09 GMT+0800 (CST)

now:                     Wed Jul 05 2017 15:42:57 GMT+0800 (CST)

testSet:SECONDARY> quit()   #(stop the mongod service on 172.168.101.239; when the current primary gets no heartbeat from it within 10s, it marks that member's health as 0)

[root@ebill-redis ~]# /etc/init.d/mongod stop

         "date": ISODate("2017-07-05T07:51:58Z"),

                            "health" : 0,

                            "state": 8,

                            "stateStr": "(not reachable/healthy)",

                            "uptime": 0,

                            "lastHeartbeat": ISODate("2017-07-05T07:51:57Z"),

                            "lastHeartbeatRecv": ISODate("2017-07-05T07:44:24Z"),

                            "syncingTo": "172.168.101.221:27017"

testSet:PRIMARY> cfg=rs.conf()   #(raise the priority of the secondary 172.168.101.233 to 2 so that it becomes primary at the next election; takes effect immediately)

         "_id": "testSet",

         "version": 3,

                            "host": "172.168.101.239:27017"

                            "host": "172.168.101.233:27017"

                            "host": "172.168.101.221:27017"

         ]

testSet:PRIMARY> cfg.members[1].priority=2

testSet:PRIMARY> rs.reconfig(cfg)   #(in this test the step appeared optional: after >cfg.members[1].priority=2 the node with the highest priority became primary directly)

2017-07-05T16:04:10.101+0800 DBClientCursor::init call() failed

2017-07-05T16:04:10.103+0800 trying reconnect to 172.168.101.221:27017 (172.168.101.221) failed

2017-07-05T16:04:10.104+0800 reconnect 172.168.101.221:27017 (172.168.101.221) ok

reconnected to server after rs command (which is normal)

testSet:PRIMARY> cfg=rs.conf()   #(add an arbiter node; once a secondary has already replicated data this attribute cannot be changed: first >rs.remove(hostportstr), then >rs.addArb(hostportstr))

         "version": 6,

                            "host": "172.168.101.239:27017",

                            "priority": 3

                            "host": "172.168.101.233:27017",

                            "priority": 2

testSet:PRIMARY> cfg.members[2].arbiterOnly=true

testSet:PRIMARY> rs.reconfig(cfg)

         "errmsg": "exception: arbiterOnly may not change for members",

         "code": 13510,

         "ok": 0

testSet:PRIMARY> rs.printSlaveReplicationInfo()

source: 172.168.101.233:27017

         syncedTo: Wed Jul 05 2017 16:08:07 GMT+0800 (CST)

         0 secs (0 hrs) behind the primary

source: 172.168.101.221:27017

mongodb sharding:

When the data volume grows too large, a single machine's performance no longer suffices;

mysql sharding relies on Gizzard (twitter), HiveDB, MySQL Proxy + HSCALE, Hibernate Shards (google), Pyshard;

spread writes out, keep reads concentrated;

as with any cluster, the nodes' clocks must be kept in sync;

Roles in a sharding architecture:

mongos   #router or proxy; in production, mongos itself must be made highly available;

config server   #metadata server; the metadata is effectively the index of the collections; production uses 3 config servers, which can coordinate elections internally;

shard   #data nodes, i.e. mongod instance nodes; at least 3 in production;


range   #range-based sharding

list   #list-based sharding

hash   #hash-based sharding

sharding is the process of storing data records across multiple machines and is mongodb's approach to meeting the demands of data growth;

as the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput;

sharding solves the problem with horizontal scaling;

with sharding, you add more machines to support data growth and the demands of read and write operations;

purpose of sharding:

database systems with large data sets and high throughput applications can challenge the capacity of a single server;

high query rates can exhaust the cpu capacity of the server;

larger data sets exceed the storage capacity of a single machine;

finally, working set sizes larger than the system's ram stress the IO capacity of disk drives;


sharding in mongodb:

a sharded cluster has the following components: shards, query routers, and config servers:

shards store the data. to provide high availability and data consistency, in a production sharded cluster each shard is a replica set;

query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards;

the query router processes and targets operations to shards and then returns results to the clients;

a sharded cluster can contain more than one query router to divide the client request load;

a client sends requests to one query router;

most sharded clusters have many query routers;

config servers store the cluster's metadata:

this data contains a mapping of the cluster's data set to the shards;

the query router uses this metadata to target operations to specific shards;

production sharded clusters have exactly 3 config servers;

data partitioning:

shard keys:

to shard a collection, you need to select a shard key;

a shard key is either an indexed field or an indexed compound field that exists in every document in the collection;

mongodb divides the shard key values into chunks and distributes the chunks evenly across the shards;

to divide the shard key values into chunks, mongodb uses either range based partitioning or hash based partitioning;

range based sharding:

for range-based sharding, mongodb divides the data set into ranges determined by the shard key values to provide range based partitioning;

hash based sharding:

for hash based partitioning, mongodb computes a hash of a field's value, and then uses these hashes to create chunks;

performance distinctions between range and hash based partitioning:

range based partitioning supports more efficient range queries;

however, range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding;

hash based partitioning, by contrast, ensures an even distribution of data at the expense of efficient range queries;

but random distribution makes it more likely that a range query on the shard key will not be able to target a few shards but would more likely query every shard in order to return a result (see the sketch below);
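A collection is sharded with one strategy or the other, not both; a minimal sketch of each choice from a mongos shell, reusing the testdb.students namespace from the walkthrough further down (a hashed key needs a hashed index first):

mongos> sh.shardCollection("testdb.students", {age: 1})   #(range based: chunks cover ranges of age)

mongos> db.students.ensureIndex({age: "hashed"})

mongos> sh.shardCollection("testdb.students", {age: "hashed"})   #(hash based: chunks cover ranges of the key's hash)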

maintaining a balanced data distribution:

mongodb keeps the cluster balanced using two background processes: splitting and the balancer;

splitting:

a background process that keeps chunks from growing too large;

when a chunk grows beyond a specified chunk size, mongodb splits the chunk in half;

inserts and updates trigger splits;

splits are an efficient meta-data change;

to create splits, mongodb does not migrate any data or affect the shards;

balancing:

the balancer is a background process that manages chunk migrations;

the balancer runs in all of the query routers in a cluster;

when the distribution of a sharded collection in a cluster is uneven, the balancer process migrates chunks from the shard that has the largest number of chunks to the shard with the least number of chunks until the collection balances;

the shards manage chunk migrations as a background operation;

during migration, all requests for a chunk's data address the origin shard;

primary shard:

every database has a primary shard that holds all the un-sharded collections in that database;

broadcast operations:

mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data;

targeted operations:

all insert() operations target to one shard;

all single update(), including upsert operations, and remove() operations must target to one shard;

sharded cluster metadata:

config servers store the metadata for a sharded cluster:

the metadata reflects state and organization of the sharded data sets and system;

the metadata includes the list of chunks on every shard and the ranges that define the chunks;

the mongos instances cache this data and use it to route read and write operations to shards;

config servers store the metadata in the config database;

172.168.101.228   #mongos

172.168.101.239   #config server

172.168.101.233 & 172.168.101.221   #shard{1,2}

On the config server:

configsvr=true

httpinterface=true

]# ss -tnl | egrep "27019|28019"

LISTEN    0      128         172.168.101.239:27019                    *:*    

LISTEN    0      128         172.168.101.239:28019                    *:*

On shard{1,2}:

bind_ip=172.168.101.233

LISTEN    0      128         172.168.101.233:27017                    *:*    

LISTEN    0      128         172.168.101.233:28017                    *:*

On the mongos node:

]# yum -y localinstall mongodb-org-mongos-2.6.4-1.x86_64.rpm mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm   #(only the mongos node needs mongodb-org-mongos-2.6.4-1.x86_64.rpm; the other nodes install only the server, shell and tools packages)

]# mongos --configdb=172.168.101.239:27019 --logpath=/var/log/mongodb/mongos.log --httpinterface --fork

2017-07-06T15:00:32.926+0800 warning: running with 1 config server should be done only for testing purposes and is not recommended for production

about to fork child process, waiting until server is ready for connections.

forked process: 5349

child process started successfully, parent exiting

LISTEN    0      128                       *:27017                    *:*    

LISTEN    0      128                       *:28017                    *:*  

]# mongo --host 172.168.101.228

connecting to: 172.168.101.228:27017/test

For interactive help, type "help".

For more comprehensive documentation, see

         http://docs.mongodb.org/

Questions? Try the support group

         http://groups.google.com/group/mongodb-user

mongos> sh.help()

mongos> sh.status()

--- Sharding Status ---

 sharding version: {

         "_id": 1,

         "version": 4,

         "minCompatibleVersion": 4,

         "currentVersion": 5,

         "clusterId": ObjectId("595ddf9b52f218b1b6f9fcf2")

 shards:

 databases:

         {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

mongos> sh.addShard("172.168.101.233")

{ "shardAdded" : "shard0000", "ok" : 1 }

mongos> sh.addShard("172.168.101.221")

{ "shardAdded" : "shard0001", "ok" : 1 }

         {  "_id" : "shard0000",  "host" : "172.168.101.233:27017" }

         {  "_id" : "shard0001",  "host" : "172.168.101.221:27017" }

mongos> sh.enableSharding("testdb")   #(enable sharding on the testdb database)

         {  "_id" : "shard0001",  "host" : "172.168.101.221:27017" }

         {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0001" }

         {  "_id" : "testdb",  "partitioned" : true,  "primary" : "shard0001" }

mongos> for (i=1;i<=10000;i++){db.students.insert({name: "student"+i, age: (i%120), classes: "class"+(i%10), address: "www.magedu.com"})}

mongos> db.students.find().count()

10000

mongos> sh.shardCollection("testdb.students",{age: 1})

         "proposedKey": {

                   "age": 1

         "curIndexes": [

                            "v": 1,

                            "key": {

                                     "_id": 1

                            },

                            "name": "_id_",

                            "ns": "testdb.students"

         "errmsg": "please create an index that starts with the shard key beforesharding."

sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard); manual balancing, not recommended

mongos> use admin

switched to db admin

mongos> db.runCommand("listShards")

         "shards": [

                            "_id": "shard0000",

                  {

                            "_id": "shard0001",

mongos> sh.isBalancerRunning()   #(returns true only while the balancer is actually doing work)

false

mongos> sh.getBalancerState()

mongos> db.printShardingStatus()   #(same as >sh.status())

         sh.help()                    sharding helpers

         help admin                   administrative help

         help connect                 connecting to a db help

         help keys                    key shortcuts

         help misc                    misc things to know

         help mr                      mapreduce

         show dbs                     show database names

         show collections             show collections in current database

         show users                   show users in current database

         show profile                 show most recent system.profile entries with time >= 1ms

         show logs                    show the accessible logger names

         show log [name]              prints out the last segment of log in memory, 'global' is default

         use <db_name>                set current database

         db.foo.find()                list objects in collection foo

         db.foo.find( { a : 1 } )     list objects in foo where a == 1

         it                           result of the last line evaluated; use to further iterate

         DBQuery.shellBatchSize = x   set default number of items to display on shell

         exit                         quit the mongo shell

DB methods:

         db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [ just calls db.runCommand(...) ]

         db.auth(username, password)

         db.cloneDatabase(fromhost)

         db.commandHelp(name) returns the help for the command

         db.copyDatabase(fromdb, todb, fromhost)

         db.createCollection(name, { size : ..., capped : ..., max : ... } )

         db.createUser(userDocument)

         db.currentOp() displays currently executing operations in the db

         db.dropDatabase()

         db.eval(func, args) run code server-side

         db.fsyncLock() flush data to disk and lock server for backups

         db.fsyncUnlock() unlocks server following a db.fsyncLock()

         db.getCollection(cname) same as db['cname'] or db.cname

         db.getCollectionNames()

         db.getLastError() - just returns the err msg string

         db.getLastErrorObj() - return full status object

         db.getMongo() get the server connection object

         db.getMongo().setSlaveOk() allow queries on a replication slave server

         db.getName()

         db.getPrevError()

         db.getProfilingLevel() - deprecated

         db.getProfilingStatus() - returns if profiling is on and slow threshold

         db.getReplicationInfo()

         db.getSiblingDB(name) get the db at the same server as this one

         db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set

         db.hostInfo() get details about the server's host

         db.isMaster() check replica primary status

         db.killOp(opid) kills the current operation in the db

         db.listCommands() lists all the db commands

         db.loadServerScripts() loads all the scripts in db.system.js

         db.logout()

         db.printCollectionStats()

         db.printReplicationInfo()

         db.printShardingStatus()

         db.printSlaveReplicationInfo()

         db.dropUser(username)

         db.repairDatabase()

         db.resetError()

         db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }

         db.serverStatus()

         db.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all

         db.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the db

         db.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the db

         db.setVerboseShell(flag) display extra information in shell output

         db.shutdownServer()

         db.stats()

         db.version() current version of the server

DBCollection help

         db.mycoll.find().help() - show DBCursor help

         db.mycoll.count()

         db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.

         db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes} command

         db.mycoll.dataSize()

         db.mycoll.distinct( key ) - e.g. db.mycoll.distinct( 'x' )

         db.mycoll.drop() drop the collection

         db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )

         db.mycoll.dropIndexes()

         db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups

         db.mycoll.reIndex()

         db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.

                                                      e.g. db.mycoll.find( {x:77} , {name:1, x:1} )

         db.mycoll.find(...).count()

         db.mycoll.find(...).limit(n)

         db.mycoll.find(...).skip(n)

         db.mycoll.find(...).sort(...)

         db.mycoll.findOne([query])

         db.mycoll.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )

         db.mycoll.getDB() get DB object associated with collection

         db.mycoll.getPlanCache() get query plan cache associated with collection

         db.mycoll.getIndexes()

         db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )

         db.mycoll.insert(obj)

         db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )

         db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor

         db.mycoll.remove(query)

         db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.

         db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name

         db.mycoll.save(obj)

         db.mycoll.stats()

         db.mycoll.storageSize() - includes free space allocated to this collection

         db.mycoll.totalIndexSize() - size in bytes of all the indexes

         db.mycoll.totalSize() - storage allocated for all data and indexes

         db.mycoll.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi

         db.mycoll.validate( <full> ) - SLOW

         db.mycoll.getShardVersion() - only for use with sharding

         db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster

         db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function

         db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set

         db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection

         db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection

> db.mycoll.find().help()

find() modifiers

         .sort( {...} )

         .limit( n )

         .skip( n )

         .count(applySkipLimit) - total # of objects matching query. by default ignores skip,limit

         .size() - total # of objects cursor would return, honors skip,limit

         .explain([verbose])

         .hint(...)

         .addOption(n) - adds op_query options -- see wire protocol

         ._addSpecial(name, value) - http://dochub.mongodb.org/core/advancedqueries#AdvancedQueries-Metaqueryoperators

         .batchSize(n) - sets the number of docs to return per getMore

         .showDiskLoc() - adds a $diskLoc field to each returned object

         .min(idxDoc)

         .max(idxDoc)

         .comment(comment)

         .snapshot()

         .readPref(mode, tagset)

Cursor methods

         .toArray() - iterates through docs and returns an array of the results

         .forEach( func )

         .map( func )

         .hasNext()

         .next()

         .objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)

         .itcount() - iterates through documents and counts them

         .pretty() - pretty print each document, possibly over multiple lines

> rs.help()

         rs.status()                     { replSetGetStatus : 1 } checks repl set status

         rs.initiate()                   { replSetInitiate : null } initiates set with default settings

         rs.initiate(cfg)                { replSetInitiate : cfg } initiates set with configuration cfg

         rs.conf()                       get the current configuration object from local.system.replset

         rs.reconfig(cfg)                updates the configuration of a running replica set with cfg (disconnects)

         rs.add(hostportstr)             add a new member to the set with default attributes (disconnects)

         rs.add(membercfgobj)            add a new member to the set with extra attributes (disconnects)

         rs.addArb(hostportstr)          add a new member which is arbiterOnly:true (disconnects)

         rs.stepDown([secs])             step down as primary (momentarily) (disconnects)

         rs.syncFrom(hostportstr)        make a secondary sync from the given member

         rs.freeze(secs)                 make a node ineligible to become primary for the time specified

         rs.remove(hostportstr)          remove a host from the replica set (disconnects)

         rs.slaveOk()                    shorthand for db.getMongo().setSlaveOk()

         rs.printReplicationInfo()       check oplog size and time range

         rs.printSlaveReplicationInfo()  check replica set members and replication lag

         db.isMaster()                   check who is primary

         reconfiguration helpers disconnect from the database so the shell will display

         an error, even if the command succeeds.

         see also http://<mongod_host>:28017/_replSet for additional diagnostic info

         sh.addShard( host )                       server:port OR setname/server:port

         sh.enableSharding(dbname)                 enables sharding on the database dbname

         sh.shardCollection(fullName,key,unique)   shards the collection

         sh.splitFind(fullName,find)               splits the chunk that find is in at the median

         sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle

         sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)

         sh.setBalancerState( <bool on or not> )   turns the balancer on or off true=on, false=off

         sh.getBalancerState()                     return true if enabled

         sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos

         sh.addShardTag(shard,tag)                 adds the tag to the shard

         sh.removeShardTag(shard,tag)              removes the tag from the shard

         sh.addTagRange(fullName,min,max,tag)      tags the specified range of the given collection

         sh.status()                               prints a general overview of the cluster

《NoSQL数据库入门》

《MongoDB权威指南》

《深入学习MongoDB》

Learning path:

basic concepts;

installation;

data types and data models;

integrating with applications;

shell commands;

master-slave replication;

sharding;

administration and maintenance;

case studies;

mongodb:

document-oriented NoSQL;

https://www.mongodb.com/

The name comes from humongous;

Written in C++; supports Linux/Windows (32-bit/64-bit), Solaris 64-bit, OS X 64-bit;

A document-oriented database developed by 10gen (a cloud-platform company): work by the 10gen team began in October 2007 and the first release came in February 2009; the latest version at the time of writing is 3.4.2;

Open source under the AGPL license, with commercial licenses available from 10gen (commercial enterprise-grade support, training and consulting services);

https://repo.mongodb.com/yum/redhat/

mongodb is the best-developed NoSQL database, the closest to relational DBs and the most likely to replace them;

mongodb needs little tuning beyond mastering indexes; its built-in auto-sharding clusters absorb the load;

mongodb's query language grew out of JavaScript;

LAMP --> LNMP (nginx, mongodb, python)

hadoop and redis were started by individuals; mongodb is developed by a company;

Technical highlights:

a document-oriented storage engine that easily accommodates unstructured data;

full index support: an index can be built on any attribute;

replication and high availability built into the database itself;

automatic sharding clusters supported by the database itself;

rich document-based query capabilities;

atomic operations;

map/reduce support;

GridFS;

Unstructured data (examples: survey forms, company equipment inventories):

data whose table columns cannot be fixed in advance (whereas SQL, structured query language, requires the table structure, i.e. the columns, to be determined first);

a common misconception is that multimedia data or big data is automatically unstructured data;

the pain of unstructured data (no fixed schema/model; data structures keep changing; pressure on DBAs and developers is amplified unnecessarily);

Documents (rows):

think of a survey form: each row is one form, made up of multiple "key: value" ("attribute: value") pairs; across rows, indexes can be added on keys (attributes) that are not fixed, and values have distinct data types;

{"foo":3,"greeting":"Hello,world!"}

Number:

{"foo":3}

String:

{"foo":"3"}

Keys are case sensitive:

{"Foo":3}

Keys must not repeat:

{"greeting":"Hello world!","greeting":"Hello,MongoDB!"}

Documents can be embedded in other documents, as in the example below
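A minimal illustrative example of a document embedded in another document:

{"name": "tom", "address": {"country": "China", "city": "ShangHai", "road": "#85 LeAi Road"}}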

Collections:

a collection is a group of documents;

documents are analogous to rows in a relational database;

collections are analogous to tables in a relational database;

collections are schema-free: the documents in one collection can vary widely, with no fixed structure required;

collection naming (see below);

Databases:

a database is made up of multiple collections;

for database naming and namespace rules, see MongoDB: The Definitive Guide (pdf);

special databases (see below);

Collection naming:

a collection is identified by its name. collection names can be any utf-8 string, with a few restrictions:

1. the empty string ("") is not a valid collection name;

2. collection names may not contain the character \0 (the null character) because this delineates the end of a collection name;

3. you should not create any collections that start with system., a prefix reserved for system collections; for example, the system.users collection contains the database's users, and the system.namespaces collection contains information about all of the database's collections.

4. user-created collections should not contain the reserved character $ in the name. the various drivers available for the database do support using $ in collection names because some system-generated collections contain it. you should not use $ in a name unless you are accessing one of these collections.

Database naming:

1. the empty string ("") is not a valid database name;

2. a database name cannot contain any of these characters: ' ' (single space), -, $, /, \, \0 (the null character);

3. database names should be all lowercase;

4. database names are limited to a maximum of 64 bytes;

Default databases:

admin (this is the "root" database, in terms of authentication. if a user is added to the admin database, the user automatically inherits permissions for all databases. there are also certain server-wide commands that can be run only from the admin database, such as listing all of the databases or shutting down the server);

local (this database is never replicated and can be used to store any collections that should be local to a single server (see chapter 9 for more information about replication and the local database));

config (when mongo is used in a sharded setup (see chapter 10), the config database is used internally to store information about the shards);


Condor Cluster, 2010-12-07: the US Air Force Research Laboratory unveiled what it called the fastest supercomputer in the US Department of Defense, the "Condor Cluster", built from 1,760 Sony PS3 game consoles and capable of about 500 trillion floating-point operations per second, which also made it the 33rd largest supercomputer in the world at the time. The project had started four years earlier, when each PS3 cost roughly $400; the consoles at the core of the machine cost about $2 million in total. Mark Barnell, director of high-performance computing at the lab, said this was only 5% to 10% of the cost of an equivalent system assembled from off-the-shelf computer parts. Another advantage is energy efficiency: it consumes only about 10% of the power of comparable supercomputers. Besides the PS3s, the machine includes 168 separate graphics processing units and 84 coordinating servers;


The "Tianhe-1" supercomputer, developed by China's National University of Defense Technology and deployed at the National Supercomputing Center in Tianjin, reached a tested speed of 2,570 trillion operations per second. It debuted in Changsha, Hunan on October 29, 2009 as China's first petascale supercomputer, with a peak of 1,206 trillion double-precision floating-point operations per second and a measured LINPACK speed of 563.1 trillion operations per second; it cost about 1 billion RMB;

MongoDB queries:

Operation             SQL                                               mongodb

all records           SELECT * FROM users;                              db.users.find()

records with age=33   SELECT * FROM users WHERE age=33;                 db.users.find({age: 33})

select fields a,b     SELECT a,b FROM users WHERE age=33;               db.users.find({age: 33},{a: 1,b: 1})

sort                  SELECT * FROM users WHERE age=33 ORDER BY name;   db.users.find({age: 33}).sort({name: 1})

comparison            SELECT * FROM users WHERE age>33;                 db.users.find({age: {$gt: 33}})

regex (fuzzy match)   SELECT * FROM users WHERE name LIKE "Joe%";       db.users.find({name: /^Joe/})

skip, limit           SELECT * FROM users LIMIT 10 SKIP 20;             db.users.find().limit(10).skip(20)

OR                    SELECT * FROM users WHERE a=1 OR b=2;             db.users.find({$or: [{a: 1},{b: 2}]})

return only 1 row     SELECT * FROM users LIMIT 1;                      db.users.findOne()

distinct aggregation  SELECT DISTINCT last_name FROM users;             db.users.distinct("last_name")

count aggregation     SELECT COUNT(AGE) FROM users;                     db.users.find({age: {"$exists": true}}).count()

query plan            EXPLAIN SELECT * FROM users WHERE z=3;            db.users.find({z: 3}).explain()

This post was originally published on chaijowin's 51CTO blog: http://blog.51cto.com/jowin/1945066; contact the original author for permission to repost.