MongoDB Hashed Sharding

Hashed sharding uses a hashed index to partition data across your sharded cluster. Hashed indexes compute the hash value of a single field as the index value; this value is used as your shard key. [1]

Hashed sharding provides more even data distribution across the sharded cluster at the cost of reducing Targeted Operations vs. Broadcast Operations. Post-hash, documents with “close” shard key values are unlikely to be on the same chunk or shard, so mongos is more likely to perform Broadcast Operations to fulfill a given ranged query. In contrast, mongos can target queries with equality matches to a single shard.
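
The routing trade-off can be pictured with a toy model in plain JavaScript. This is only a sketch under stated assumptions, not MongoDB's implementation: `toyHash` (32-bit FNV-1a) stands in for the real hash function, `NUM_SHARDS`, `shardFor`, and `shardsForRange` are hypothetical names, and placing a key by `hash % NUM_SHARDS` simplifies the real scheme, in which ranges of hash values (chunks) are assigned to shards.

```javascript
// Toy model only: toyHash (32-bit FNV-1a) is a stand-in for MongoDB's real hash,
// and "hash % NUM_SHARDS" is a simplification of chunk-based placement.
const NUM_SHARDS = 4; // hypothetical cluster size

function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// An equality match on the shard key maps to exactly one shard.
function shardFor(key) {
  return toyHash(key) % NUM_SHARDS;
}

// A range query has to account for every key in the range, and nearby
// keys land on different shards, so several shards end up involved.
function shardsForRange(lo, hi) {
  const shards = new Set();
  for (let k = lo; k <= hi; k++) shards.add(shardFor(k));
  return [...shards].sort((a, b) => a - b);
}

console.log("equality X = 42 -> shard", shardFor(42));
console.log("range 1 <= X <= 20 -> shards", shardsForRange(1, 20));
```

In this model the equality lookup names a single shard, while even a short range touches every shard, which is the situation in which a router would have to broadcast.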

TIP

MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.

WARNING

MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values larger than 2^53.
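
The collision described in the warning can be reproduced with ordinary truncation. The snippet below is a plain JavaScript illustration, not MongoDB code: `truncatedKey` is a hypothetical helper that imitates the truncate-before-hash step with `Math.trunc`, and the last lines show why values around 2^53 cannot make a reliable round trip through an integer.

```javascript
// Illustration only: imitates "truncate to an integer, then hash".
// truncatedKey is a hypothetical helper, not a MongoDB function.
function truncatedKey(value) {
  return Math.trunc(value);
}

const inputs = [2.3, 2.2, 2.9];
const keys = inputs.map(truncatedKey);
console.log(keys); // all three inputs truncate to the same integer

// Whatever hash is applied next, equal inputs give equal outputs,
// so these three field values would share one hashed value.
const distinct = new Set(keys).size;
console.log("distinct values after truncation:", distinct);

// From 2^53 upward a double can no longer represent every integer,
// so neighbouring integers become indistinguishable.
const big = 2 ** 53;
console.log("2^53 equals 2^53 + 1 as a double:", big === big + 1);
```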

To see what the hashed value would be for a key, see convertShardKeyToHashed().

[1] Starting in version 4.0, the mongo shell provides the method convertShardKeyToHashed(). This method uses the same hashing function as the hashed index and can be used to see what the hashed value would be for a key.

Hashed Sharding Shard Key

The field you choose as your hashed shard key should have good cardinality, that is, a large number of different values. Hashed keys are ideal for shard keys with fields that change monotonically, like ObjectId values or timestamps. A good example of this is the default _id field, assuming it only contains ObjectId values (rather than user-defined _id values).
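
Why cardinality matters can be seen in a small simulation: hashing cannot spread documents over more chunks than there are distinct key values. The sketch below is a toy model, not MongoDB's algorithm. `toyHash` (32-bit FNV-1a) stands in for the real hash, and `NUM_CHUNKS`, `chunksUsed`, and the sample data are hypothetical.

```javascript
// Toy model: toyHash (32-bit FNV-1a) stands in for MongoDB's real hash,
// and NUM_CHUNKS is a hypothetical number of chunks.
const NUM_CHUNKS = 8;

function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Counts how many different chunks a set of shard key values lands in.
function chunksUsed(values) {
  const used = new Set();
  for (const v of values) used.add(toyHash(v) % NUM_CHUNKS);
  return used.size;
}

// Low cardinality: 10,000 documents but only three distinct key values.
const statuses = Array.from({ length: 10000 }, (_, i) => ["new", "open", "done"][i % 3]);
// High cardinality: 10,000 distinct, monotonically increasing key values.
const counters = Array.from({ length: 10000 }, (_, i) => i);

console.log("low cardinality uses", chunksUsed(statuses), "chunk(s)");
console.log("high cardinality uses", chunksUsed(counters), "chunk(s)");
```

The three-valued field can occupy at most three chunks no matter how many documents exist, while the counter reaches every chunk.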

To shard a collection using a hashed shard key, see Shard a Collection.

Hashed vs Ranged Sharding

Given a collection using a monotonically increasing value X as the shard key, using ranged sharding results in a distribution of incoming inserts similar to the following:

[Figure: distribution of incoming inserts under ranged sharding on X]

Since the value of X is always increasing, the chunk with an upper bound of maxKey receives the majority of incoming writes. This restricts insert operations to the single shard containing this chunk, which reduces or removes the advantage of distributed writes in a sharded cluster.

By using a hashed index on X, the distribution of inserts is similar to the following:

[Figure: distribution of incoming inserts with a hashed index on X]

Since the data is now distributed more evenly, inserts are efficiently distributed throughout the cluster.
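
The contrast between the two layouts can be sketched as a small simulation. It is a toy model under stated assumptions rather than MongoDB's behavior: `toyHash` (32-bit FNV-1a) stands in for the real hash, and the chunk boundaries in `rangeBounds`, the helper names, and the insert values are all hypothetical.

```javascript
// Toy comparison of ranged vs hashed placement for a monotonically
// increasing shard key X. toyHash (32-bit FNV-1a) is a stand-in for
// MongoDB's real hash; the chunk layout below is hypothetical.
const NUM_CHUNKS = 4;
// Ranged chunks over X: [minKey,1000), [1000,2000), [2000,3000), [3000,maxKey)
const rangeBounds = [1000, 2000, 3000];

function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function rangedChunk(x) {
  const i = rangeBounds.findIndex((bound) => x < bound);
  return i === -1 ? rangeBounds.length : i; // the last chunk runs up to maxKey
}

function hashedChunk(x) {
  return toyHash(x) % NUM_CHUNKS;
}

// Counts how many of `count` consecutive X values land in each chunk.
function countInserts(chunkOf, start, count) {
  const counts = new Array(NUM_CHUNKS).fill(0);
  for (let x = start; x < start + count; x++) counts[chunkOf(x)]++;
  return counts;
}

// New inserts arrive with X values beyond everything already stored.
const ranged = countInserts(rangedChunk, 5000, 1000);
const hashed = countInserts(hashedChunk, 5000, 1000);
console.log("ranged:", ranged); // every insert lands in the last chunk
console.log("hashed:", hashed);
```

With ranged placement the chunk bounded by maxKey takes every insert, while hashed placement gives each chunk a share of the writes.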

Shard the Collection

Use the sh.shardCollection() method, specifying the full namespace of the collection and the target hashed index to use as the shard key.

sh.shardCollection( "database.collection", { <field> : "hashed" } )
           
IMPORTANT
  • Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.

  • Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document’s Shard Key Value. Before MongoDB 4.2, a document’s shard key field value is immutable.

Shard a Populated Collection

If you shard a populated collection using a hashed shard key:

  • The sharding operation creates the initial chunk(s) to cover the entire range of the shard key values. The number of chunks created depends on the configured chunk size.

  • After the initial chunk creation, the balancer migrates these initial chunks across the shards as appropriate and manages the chunk distribution going forward.

Shard an Empty Collection

If you shard an empty collection using a hashed shard key:

  • With no zones and zone ranges specified for the empty or non-existing collection:

    • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.

    • After the initial distribution, the balancer manages the chunk distribution going forward.

  • With zones and zone ranges specified for the empty or non-existing collection (available starting in MongoDB 4.0.3):

    • The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.

    • After the initial distribution, the balancer manages the chunk distribution going forward.
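
As a rough picture of what "empty chunks that cover the entire range" means, the sketch below splits a hash space into equal-width ranges, two per shard by default. It is an illustration under assumptions, not MongoDB's algorithm: the signed 64-bit hash space, the even split, the inclusive bounds, the round-robin placement, and the helper `initialChunks` with its parameters are all hypothetical choices made for the example.

```javascript
// Sketch only: evenly splits an assumed signed 64-bit hash space into
// initial chunks. initialChunks and its parameters are hypothetical names.
const MIN_HASH = -(2n ** 63n);
const MAX_HASH = 2n ** 63n - 1n;

// numChunks defaults to 2 chunks per shard, mirroring the default above.
function initialChunks(numShards, numChunks = 2 * numShards) {
  const total = BigInt(numChunks);
  const width = (MAX_HASH - MIN_HASH + 1n) / total;
  const chunks = [];
  for (let i = 0n; i < total; i++) {
    const min = MIN_HASH + i * width;
    // The last chunk absorbs any remainder so the whole space is covered.
    const max = i === total - 1n ? MAX_HASH : min + width - 1n;
    // Round-robin placement as a simple stand-in for the initial distribution.
    chunks.push({ min, max, shard: Number(i) % numShards });
  }
  return chunks;
}

for (const c of initialChunks(3)) {
  console.log(`shard${c.shard}: [${c.min}, ${c.max}]`);
}
```

Passing a second argument, for example `initialChunks(2, 8)`, plays the role of choosing a different number of initial chunks.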

SEE ALSO

To learn how to deploy a sharded cluster and implement hashed sharding, see Deploy a Sharded Cluster.