天天看點

LSM Tree 學習筆記——本質是将随機的寫放在記憶體裡形成有序的小memtable,然後定期合并成大的table flush到磁盤

The Sorted String Table (<code>SSTable</code>) is one of the most popular outputs for storing, processing, and exchanging datasets. 

An SSTable is a simple abstraction to efficiently store large numbers of key-value pairs while optimizing for high throughput, sequential read/write workloads.

Unfortunately, the SSTable name itself has also been overloaded by the industry to refer to services that go well beyond just the sorted table, which has only added unnecessary confusion to what is a very simple and a useful data structure on its own. Let's take a closer look under the hood of an SSTable and how LevelDB makes use of it.

SSTable本身是個簡單而有用的資料結構, 而往往由于工業界對于它的overload, 導緻大家的誤解 

它本身就像他的名字一樣, 就是a set of sorted key-value pairs 

如下圖左, 當檔案比較大的時候, 也可以建立key:offset的index, 用于快速分段定位, 但這個是可選的.

這個結構和普通的key-value pairs的差別, 可以support range query和random r/w

LSM Tree 學習筆記——本質是将随機的寫放在記憶體裡形成有序的小memtable,然後定期合并成大的table flush到磁盤

A "Sorted String Table" then is exactly what it sounds like, it is a file which contains a set of arbitrary, sorted key-value pairs inside. 

Duplicate keys are fine, there is no need for "padding" for keys or values, and keys and values are arbitrary blobs. Read in the entire file sequentially and you have a sorted index. Optionally, if the file is very large, we can also prepend, or create a standalone <code>key:offset</code> index for fast access.

That's all an SSTable is: very simple, but also a very useful way to exchange large, sorted data segments.

僅僅SSTable資料結構本身仍然無法support高效的range query和random r/w的場景 

名字很形象, 首先是基于log的, 不斷産生SSTable結構的log檔案, 并且是需要不斷merge以提高效率的

下圖很好的描繪了LSM Tree的結構和大部分操作

LSM Tree 學習筆記——本質是将随機的寫放在記憶體裡形成有序的小memtable,然後定期合并成大的table flush到磁盤

本文轉自張昺華-sky部落格園部落格,原文連結:http://www.cnblogs.com/bonelee/p/6408775.html,如需轉載請自行聯系原作者