
Some Key Points About Bigtable

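A distributed database system.

Designed for massive data sets: scalable, high throughput, low latency.

Does not support the relational model.

Data is indexed by row and column names, both of which can be arbitrary strings.

The stored values are also strings.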

Bigtable is a sorted map whose values are uninterpreted arrays of bytes, indexed by row key, column key, and timestamp.

(row:string, column:string, time:int64) --> string
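The mapping above can be sketched as a small in-memory structure. This is only a toy model for reading along, not how Bigtable is implemented; the class name `TinyTable` and its methods are made up for illustration, and the row and column names are sample data.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of (row, column, timestamp) -> bytes. Hypothetical, for illustration."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, end_row):
        """Yield cells with start_row <= row < end_row, rows in sorted order,
        versions of each cell newest first."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column in sorted(self._rows[row]):
                    cell = self._rows[row][column]
                    for ts in sorted(cell, reverse=True):
                        yield row, column, ts, cell[ts]


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
latest = table.get("com.cnn.www", "contents:")
```

Sorting rows in `scan` mirrors the fact that rows are kept in sorted order, which is what makes short row-range reads cheap.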

Every read or write of data under a single row key is atomic.

The row range for a table is dynamically partitioned.

Each row range is called a tablet, which is the unit of distribution and load balancing.

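A tablet contains multiple rows.

Column families are the basic unit of access control.

Data in the same column family is usually of the same type, and data in the same column family is compressed together.

Column families are part of the schema; their number is limited (in the hundreds at most) and they rarely change.

The number of columns is not limited (a table may have an unbounded number of columns).

A column key is named family:qualifier.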
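A short sketch of how a family:qualifier column key might be taken apart, and how an access check could be made per column family. The function names and the `acl` layout are assumptions made for this example; they are not part of any Bigtable API.

```python
def split_column_key(column_key):
    """Split 'family:qualifier' into (family, qualifier).

    Assumes the first ':' separates the two parts; the qualifier may be empty.
    """
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column_key!r}")
    return family, qualifier


def can_read(user, column_key, acl):
    """Check access at column-family granularity.

    `acl` is a hypothetical dict: family name -> set of users allowed to read.
    """
    family, _ = split_column_key(column_key)
    return user in acl.get(family, set())


acl = {"anchor": {"crawler", "indexer"}, "contents": {"indexer"}}
allowed = can_read("crawler", "anchor:cnnsi.com", acl)
```

Because permissions hang off the family, adding a new qualifier under an existing family needs no change to the access rules.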

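Timestamps

A timestamp can be assigned by Bigtable itself or specified explicitly by the client.

A column family can be configured to keep only the last n versions of a cell, or only versions written within the last few days.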
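The two retention rules can be sketched as plain functions over the versions of one cell. This is a minimal illustration; the function names and the use of integer timestamps are assumptions of the example.

```python
def keep_last_n(versions, n):
    """Keep only the n newest versions. `versions` maps timestamp -> value."""
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}


def keep_newer_than(versions, now, max_age):
    """Keep only versions whose age (now - timestamp) is at most max_age."""
    return {ts: value for ts, value in versions.items() if now - ts <= max_age}


cell = {100: b"v1", 200: b"v2", 300: b"v3", 400: b"v4"}
last_two = keep_last_n(cell, 2)
recent = keep_newer_than(cell, now=450, max_age=200)
```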

GFS is used to store log and data files.

Bigtable data is stored in the SSTable file format.

Internally, each SSTable contains a sequence of blocks (typically each block is 64KB in size, but this is configurable).

A block index (stored at the end of the SSTable) is used to locate blocks; the index is loaded into memory when the SSTable is opened.
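To make the block index concrete, here is a small sketch that keeps the last key of every block in memory and uses binary search to pick the single block that could hold a key. It is a simplification: blocks are cut by entry count instead of by byte size, everything lives in memory, and `ToySSTable` is a made-up name.

```python
import bisect


class ToySSTable:
    """Toy immutable sorted table with an in-memory block index (illustrative only)."""

    def __init__(self, items, block_size=2):
        items = sorted(items)  # list of (key, value), sorted by key
        # Cut the sorted data into fixed-size blocks (by entry count here).
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # Block index: the last key stored in each block.
        self.index = [block[-1][0] for block in self.blocks]
        self.block_reads = 0  # counts simulated disk reads

    def get(self, key):
        # First block whose last key is >= key is the only candidate.
        i = bisect.bisect_left(self.index, key)
        if i == len(self.index):
            return None  # key is larger than every key in the table
        self.block_reads += 1  # one block read per lookup
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None


sstable = ToySSTable([("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)])
value = sstable.get("c")
```

With the index held in memory, a lookup costs at most one block read, which is the point of loading the index when the SSTable is opened.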

Chubby provides a namespace that consists of directories and small files.

Each directory or file can be used as a lock, and reads and writes to a file are atomic.

Chubby's responsibilities:

1. to ensure that there is at most one active master at any time

2. to store the bootstrap location of Bigtable data

3. to discover tablet servers and finalize tablet server deaths

4. to store Bigtable schema information (the column family information for each table)

5. to store access control lists

If Chubby is unavailable for an extended period of time, Bigtable becomes unavailable.

The implementation has three major components:

a library that is linked into every client

one master server

many tablet servers (can be dynamically added or removed)

The master server's responsibilities:

1. assigning tablets to tablet servers

2. detecting the addition and expiration of tablet servers

3. balancing tablet-server load

4. garbage collection of files in GFS

5. handling schema changes such as table and column family creations

The tablet server's responsibilities:

1. manages a set of tablets (typically somewhere between ten and a thousand tablets per tablet server)

2. handles read and write requests to the tablets that it has loaded

3. splits tablets that have grown too large
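The split step in the last item can be pictured with a rough sketch. It assumes a tablet is described by its sorted row keys and its end row, and that "too large" is measured by row count; a real system would measure size in bytes. All names here are hypothetical.

```python
def split_tablet(tablet, max_rows):
    """Split a tablet in the middle of its row range if it holds too many rows.

    `tablet` is a dict with 'end_row' (last row key the tablet may hold) and
    'rows' (sorted list of row keys). Returns a list of one or two tablets.
    """
    rows = tablet["rows"]
    if len(rows) <= max_rows:
        return [tablet]
    mid = len(rows) // 2
    # The left half ends at its own last row; the right half keeps the old end row.
    left = {"end_row": rows[mid - 1], "rows": rows[:mid]}
    right = {"end_row": tablet["end_row"], "rows": rows[mid:]}
    return [left, right]


big = {"end_row": "z", "rows": ["a", "b", "c", "d", "e"]}
parts = split_tablet(big, max_rows=4)
```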

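Each table consists of a set of tablets, and each tablet covers multiple rows.

Tablet locations are indexed with a three-level hierarchy similar to a B+ tree.

The first level is a file stored in Chubby that records the location of the root tablet.

The second level is the root tablet, which holds the locations of all tablets of the METADATA table. To keep the hierarchy at three levels, there is only one root tablet and it is never split.

The third level consists of the other METADATA tablets, which hold the locations of user tablets. The key is the tablet's table identifier and its end row, and the value is the location of that tablet.

The root tablet is in fact the first tablet of the METADATA table.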
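The three-level lookup can be traced with a small sketch. Every location string, the `STORAGE` dict standing in for tablet servers, and the `MAX` sentinel for "end of table" are invented for the example; only the shape of the lookup (Chubby file, then root tablet, then METADATA tablet, keyed by table identifier and end row) follows the notes above.

```python
import bisect

MAX = "\uffff"  # sentinel end row meaning "up to the end of the table" (assumption)

# Level 1: a Chubby file holding the root tablet's location.
CHUBBY_ROOT_FILE = "server-0/root"

# Each tablet is a sorted list of ((table_id, end_row), location) entries.
STORAGE = {
    # Level 2: the root tablet lists the METADATA tablets.
    "server-0/root": [
        (("users", "m"), "server-1/meta-1"),
        ((MAX, MAX), "server-2/meta-2"),
    ],
    # Level 3: METADATA tablets list the user tablets.
    "server-1/meta-1": [
        (("orders", MAX), "server-3/orders-1"),
        (("users", "m"), "server-4/users-1"),
    ],
    "server-2/meta-2": [
        (("users", MAX), "server-5/users-2"),
    ],
}


def find_entry(entries, key):
    """Return the first entry whose end key is >= key."""
    end_keys = [end_key for end_key, _ in entries]
    i = bisect.bisect_left(end_keys, key)
    if i == len(entries):
        raise KeyError(key)
    return entries[i]


def locate(table, row):
    """Walk Chubby file -> root tablet -> METADATA tablet -> user tablet location."""
    key = (table, row)
    root_location = CHUBBY_ROOT_FILE
    _, meta_location = find_entry(STORAGE[root_location], key)
    (found_table, _), tablet_location = find_entry(STORAGE[meta_location], key)
    if found_table != table:
        raise KeyError(key)  # the row falls outside every tablet of this table
    return tablet_location


location = locate("users", "alice")
```

Keying each entry by the end row is what lets a single binary search find the tablet that covers a given row.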

Locality groups:

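Clients can group column families that are usually accessed together into a locality group; a separate SSTable is generated for each locality group in each tablet.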
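A sketch of the idea: the cells of a row are routed into one store per locality group, so a read that needs only one group never touches the others. The function name, the `default` fallback group, and the sample families are assumptions of this example.

```python
def split_into_locality_groups(row_cells, groups, default_group="default"):
    """Route cells into per-group stores.

    row_cells: dict column key ('family:qualifier') -> value
    groups:    dict group name -> set of column family names
    Families not listed in any group go to `default_group`.
    """
    family_to_group = {}
    for group_name, families in groups.items():
        for family in families:
            family_to_group[family] = group_name

    stores = {}
    for column_key, value in row_cells.items():
        family = column_key.partition(":")[0]
        group = family_to_group.get(family, default_group)
        stores.setdefault(group, {})[column_key] = value
    return stores


cells = {
    "contents:": b"<html>",
    "language:": b"EN",
    "checksum:": b"abc",
    "anchor:cnnsi.com": b"CNN",
}
groups = {"page_metadata": {"language", "checksum"}, "page_contents": {"contents"}}
stores = split_into_locality_groups(cells, groups)
```

Here a scan that only wants the small metadata columns reads the `page_metadata` store and skips the much larger page contents.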

Compression:

The SSTables of a locality group can be stored in compressed form to save space.

Caching for read performance:

Higher-level cache (the Scan Cache): the key-value pairs returned by the SSTable interface

Lower-level cache (the Block Cache): SSTable blocks read from GFS
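The interplay of the two cache levels can be sketched as follows. The LRU eviction policy, the class names, and the dict-based stand-in for on-disk blocks are assumptions chosen to keep the example small; the notes above only say what each level caches.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class CachedReader:
    """Reads through a key-value cache first, then a block cache, then 'disk'."""

    def __init__(self, blocks, key_to_block, kv_capacity=2, block_capacity=1):
        self.blocks = blocks              # block id -> {key: value}, stands in for disk
        self.key_to_block = key_to_block  # key -> block id, stands in for the block index
        self.kv_cache = LRUCache(kv_capacity)        # higher level: key-value pairs
        self.block_cache = LRUCache(block_capacity)  # lower level: whole blocks
        self.disk_reads = 0

    def get(self, key):
        value = self.kv_cache.get(key)
        if value is not None:
            return value
        block_id = self.key_to_block[key]
        block = self.block_cache.get(block_id)
        if block is None:
            self.disk_reads += 1  # simulated read of one block from disk
            block = self.blocks[block_id]
            self.block_cache.put(block_id, block)
        value = block[key]
        self.kv_cache.put(key, value)
        return value


reader = CachedReader(
    blocks={0: {"a": b"1", "b": b"2"}, 1: {"c": b"3"}},
    key_to_block={"a": 0, "b": 0, "c": 1},
)
```

Repeated reads of the same key hit the key-value level, while reads of neighbouring keys benefit from the block level.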

Bloom filters:

A read operation has to read from all SSTables that make up the state of a tablet.

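A Bloom filter can tell whether an SSTable might contain data for a given row/column pair, so SSTables that cannot contain it are skipped and the number of disk accesses drops.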
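A minimal Bloom filter sketch shows how the skip works. The sizing, the SHA-256 based hashing, and the `read` helper are choices made for this example and say nothing about the filter Bigtable actually uses.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read(sstables, key):
    """Look up `key` in a list of (bloom_filter, data) pairs, newest first.

    Returns (value, disk_reads); an SSTable is only 'read' if its filter
    says the key might be there.
    """
    disk_reads = 0
    for bloom, data in sstables:
        if not bloom.might_contain(key):
            continue  # definitely absent: skip without touching disk
        disk_reads += 1
        if key in data:
            return data[key], disk_reads
    return None, disk_reads


old_filter = BloomFilter()
old_filter.add("com.cnn.www/anchor:cnnsi.com")
old_sstable = (old_filter, {"com.cnn.www/anchor:cnnsi.com": b"CNN"})
new_sstable = (BloomFilter(), {})  # newer SSTable with no data for this key
result = read([new_sstable, old_sstable], "com.cnn.www/anchor:cnnsi.com")
```

The filter never reports a stored key as absent, so skipping an SSTable on a negative answer is always safe.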
