
RocksDB: GetApproximateSizes returns a size of 0 after writing data?

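During project development I needed to get the size of the data in a given key range from the engine, to report it as a metric. While testing, I found that after writing some data, calling GetApproximateSizes on the range of the keys just written returned a size of 0. Had I found a bug? (Delighted.)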

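The data I wrote was smaller than one SST data block (4 KB by default). Could it be that GetApproximateSizes reports 0 for anything smaller than one data block? For a rigorous engine, such an obvious problem would clearly be unacceptable.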

The problematic code:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/slice.h>

#define VALUE_SIZE 100

using namespace std;
using namespace rocksdb;

void check_status(Status s, std::string op) {
    if (!s.ok()) {
        cout << " Execute " << op << " failed "
             << s.ToString() << endl;
        exit(1);
    }
}

static std::string Key(int i) {
    char buf[100];
    snprintf(buf, sizeof(buf), "key%06d", i);
    return std::string(buf);
}

int main() {
    rocksdb::DB* db;
    rocksdb::Options options;
    rocksdb::Status s;
    
    options.create_if_missing = true;
    options.compression = kNoCompression;

    // Open the DB
    check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");

    // Write 10 key-value pairs; each value is 100 bytes
    for (int i = 0;i < 10; i++) {
        check_status(db->Put(WriteOptions(), 
                            Key(i), 
                            Slice(string(VALUE_SIZE, 'a' + (i % 26)))), 
                    "Put DB");
    }

    // Query the key range [Key(1), Key(3)) (the limit is exclusive) and get
    // the approximate size of the key-value pairs in that range
    uint64_t size;
    string start = Key(1);
    string end = Key(3);
    Range r(start, end);
    db->GetApproximateSizes(&r, 1, &size);
    cout << "Approximate size is " << size << endl; 

    delete db;
    return 0;
}      

The result of running it:

Approximate size is 0      

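I was happily planning to analyze the cause of this seemingly obvious problem and submit a PR to the community, but one look at the source code killed the mood. I had been too naive.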

The interface for getting the size of a key range takes an extra parameter, include_flags:

virtual void GetApproximateSizes(const Range* ranges, int n, uint64_t* sizes,
                                   uint8_t include_flags = INCLUDE_FILES) {
    GetApproximateSizes(DefaultColumnFamily(), ranges, n, sizes, include_flags);
  }      

This extra parameter specifies which RocksDB components the size of the key range is computed from: the memtable, the SST files, or both.

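I had written only a small amount of data with the default options, which is clearly not enough to trigger a flush, so all of it was still in the memtable. With the default flag, INCLUDE_FILES, the size is computed from the SST files only, so naturally nothing is found for this range.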
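The effect of the flag can be sketched with a toy model that needs no RocksDB at all. Everything here (ToyEngine, ToySizeFlags, RangeBytes) is a hypothetical stand-in invented for illustration, and the "size" is simply key bytes plus value bytes, which is not how RocksDB measures anything. It only shows why a files-only query sees nothing before a flush.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only: none of these types are RocksDB classes.
// One bit per component, so the flags can be OR-ed together.
enum ToySizeFlags : uint8_t {
  TOY_INCLUDE_MEMTABLES = 1 << 0,
  TOY_INCLUDE_FILES = 1 << 1,
};

struct ToyEngine {
  std::map<std::string, std::string> memtable;  // recent, unflushed writes
  std::map<std::string, std::string> files;     // data already flushed

  // New writes always land in the memtable first.
  void Put(const std::string& k, const std::string& v) { memtable[k] = v; }

  // Move everything from the memtable into the flushed set.
  void Flush() {
    for (const auto& kv : memtable) files[kv.first] = kv.second;
    memtable.clear();
  }

  // Sum key+value bytes for keys in [start, limit) over the selected
  // components only.
  uint64_t ApproximateSize(const std::string& start, const std::string& limit,
                           uint8_t flags) const {
    assert(flags != 0);  // at least one component must be selected
    uint64_t total = 0;
    if (flags & TOY_INCLUDE_FILES) total += RangeBytes(files, start, limit);
    if (flags & TOY_INCLUDE_MEMTABLES) total += RangeBytes(memtable, start, limit);
    return total;
  }

 private:
  static uint64_t RangeBytes(const std::map<std::string, std::string>& m,
                             const std::string& start,
                             const std::string& limit) {
    uint64_t bytes = 0;
    for (auto it = m.lower_bound(start); it != m.end() && it->first < limit;
         ++it) {
      bytes += it->first.size() + it->second.size();
    }
    return bytes;
  }
};
```

In this model, a query with only TOY_INCLUDE_FILES returns 0 until Flush() is called, while adding TOY_INCLUDE_MEMTABLES makes the unflushed writes visible.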

We can go on to look at the lower-level implementation:

Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
                                   ColumnFamilyHandle* column_family,
                                   const Range* range, int n, uint64_t* sizes) {
  ......
  Version* v;
  auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
  auto cfd = cfh->cfd();
  // Take a reference on the current SuperVersion of this column family
  SuperVersion* sv = GetAndRefSuperVersion(cfd);
  v = sv->current;

  // Multiple ranges may be passed in at once; iterate over them here
  for (int i = 0; i < n; i++) {
    Slice start = range[i].start;
    Slice limit = range[i].limit;

    // Add timestamp if needed
    std::string start_with_ts, limit_with_ts;
    if (ts_sz > 0) {
      // Maximum timestamp means including all key with any timestamp
      AppendKeyWithMaxTimestamp(&start_with_ts, start, ts_sz);
      // Append a maximum timestamp as the range limit is exclusive:
      // [start, limit)
      AppendKeyWithMaxTimestamp(&limit_with_ts, limit, ts_sz);
      start = start_with_ts;
      limit = limit_with_ts;
    }
    // Convert user_key into a corresponding internal key.
    InternalKey k1(start, kMaxSequenceNumber, kValueTypeForSeek);
    InternalKey k2(limit, kMaxSequenceNumber, kValueTypeForSeek);
    sizes[i] = 0;
    // Add the size of the key range as estimated from the SST files
    if (options.include_files) {
      sizes[i] += versions_->ApproximateSize(
          options, v, k1.Encode(), k2.Encode(), /*start_level=*/0,
          /*end_level=*/-1, TableReaderCaller::kUserApproximateSize);
    }
    // Add the size of the key range as estimated from the memtables,
    // both mem and imm
    if (options.include_memtabtles) {
      sizes[i] += sv->mem->ApproximateStats(k1.Encode(), k2.Encode()).size;
      sizes[i] += sv->imm->ApproximateStats(k1.Encode(), k2.Encode()).size;
    }
  }

  // Release the reference on the SuperVersion
  ReturnAndCleanupSuperVersion(cfd, sv);
  return Status::OK();
}      
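One detail in the comments above is easy to miss: the range limit is exclusive, so a range is [start, limit). A small standalone sketch of what that means for the test keys follows; CountInHalfOpenRange is a hypothetical helper written for this illustration, not a RocksDB function, and Key() is copied from the test program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Same key format as the test program in this post.
std::string Key(int i) {
  char buf[100];
  std::snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}

// Count keys k with start <= k < limit in a sorted vector.
// Hypothetical helper for illustration; not part of RocksDB.
std::size_t CountInHalfOpenRange(const std::vector<std::string>& sorted_keys,
                                 const std::string& start,
                                 const std::string& limit) {
  assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
  // First key >= start, and first key >= limit (which is excluded).
  auto first = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), start);
  auto last = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), limit);
  return first < last ? static_cast<std::size_t>(last - first) : 0;
}
```

With keys Key(0) through Key(9), the range from Key(1) to Key(3) contains two keys, key000001 and key000002; key000003 is not counted.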

On the SST side, for a block-based table this means creating an iterator over the table's index block to find the offsets of the data blocks that the start and end keys fall in.
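The index lookup can be pictured with a simplified model. This is only a sketch under assumptions: ToyIndexEntry, ToyApproximateOffsetOf and ToyApproximateSize are hypothetical names, the index is a plain vector rather than a real index block, and the actual RocksDB code may apply further adjustments that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy index entry: the last key stored in a data block and the offset of
// that block within the file. Entries are sorted by last_key.
struct ToyIndexEntry {
  std::string last_key;
  uint64_t block_offset;
};

// Offset of the first block that could contain `key`. If `key` sorts after
// every block, return the end of the data area instead.
uint64_t ToyApproximateOffsetOf(const std::vector<ToyIndexEntry>& index,
                                uint64_t data_end, const std::string& key) {
  for (const auto& e : index) {
    if (key <= e.last_key) return e.block_offset;
  }
  return data_end;
}

// Approximate bytes covered by [start, limit): difference of the two offsets.
uint64_t ToyApproximateSize(const std::vector<ToyIndexEntry>& index,
                            uint64_t data_end, const std::string& start,
                            const std::string& limit) {
  assert(!index.empty());
  const uint64_t a = ToyApproximateOffsetOf(index, data_end, start);
  const uint64_t b = ToyApproximateOffsetOf(index, data_end, limit);
  return b > a ? b - a : 0;
}
```

Because the lookup works at block granularity, the result in this model is a multiple of whole blocks, which is one reason the interface is called approximate.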

On the memtable side, this means walking the skiplist level by level to work out how many keys fall within the start-end range, and then computing the size from that count.

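After this round of analysis, we know that to get a reasonably accurate size for the key-value pairs in a range through GetApproximateSizes, both the memtable and the SST sizes must be counted.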

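PS: the same data takes a different amount of space in the memtable than in an SST file, because besides the key-value data in the data blocks, an SST file also contains an index block, a metaindex block, and a footer. So the sizes reported for the same data in the memtable and in SST files will differ somewhat.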

In the end, the correct way to use the GetApproximateSizes() interface is as follows:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/slice.h>


#define VALUE_SIZE 100

using namespace std;
using namespace rocksdb;

void check_status(Status s, std::string op) {
  if (!s.ok()) {
    cout << " Execute " << op << " failed "
       << s.ToString() << endl;
    exit(1);
  }
}

static std::string Key(int i) {
  char buf[100];
  snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  rocksdb::Status s;

  options.create_if_missing = true;
  options.compression = kNoCompression;

  check_status(rocksdb::DestroyDB("./db", options),
        "DestroyDB");

  check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");

  for (int i = 0;i < 3; i++) {
    check_status(db->Put(WriteOptions(), 
              Key(i), 
              Slice(string(VALUE_SIZE, 'a' + (i % 26)))), 
          "Put DB");
  }

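  // Default flags (INCLUDE_FILES): only SST files are counted, and nothing
  // has been flushed yet.
  uint64_t size;
  string start = Key(1);
  string end = Key(3);
  Range r(start, end);  // half-open range [start, end)
  db->GetApproximateSizes(&r, 1, &size);
  cout << "Approximate size is " << size << endl;

  // Count both the SST files and the memtables.
  uint8_t include_both = DB::SizeApproximationFlags::INCLUDE_FILES |
                         DB::SizeApproximationFlags::INCLUDE_MEMTABLES;

  db->GetApproximateSizes(&r, 1, &size, include_both);
  cout << "After set memtable flag, Approximate size is " << size << endl;

  // After a flush the data lives in an SST file, so the default flags see it.
  check_status(db->Flush(FlushOptions()), "Flush DB");
  db->GetApproximateSizes(&r, 1, &size);
  cout << "After flush, Approximate size is " << size << endl;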

  delete db;

  return 0;
}

The result of running it:

Approximate size is 0
After set memtable flag, Approximate size is 238
After flush, Approximate size is 1151
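
The 238 reported with the memtable flag can be reproduced by hand under two assumptions that this post does not state: that a memtable entry is stored as a varint-prefixed internal key (the user key plus an 8-byte sequence/type tag) followed by a varint-prefixed value, and that the memtable estimate is the number of entries in the range multiplied by the average entry size. The helper names below are hypothetical; this is a back-of-the-envelope sketch, not RocksDB code.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to encode v as a varint (7 payload bits per byte).
uint64_t VarintLength(uint64_t v) {
  uint64_t len = 1;
  while (v >= 128) {
    v >>= 7;
    ++len;
  }
  return len;
}

// Assumed entry layout:
//   varint(internal_key_len) | user key | 8-byte tag | varint(value_len) | value
uint64_t AssumedMemtableEntrySize(uint64_t user_key_len, uint64_t value_len) {
  const uint64_t internal_key_len = user_key_len + 8;
  return VarintLength(internal_key_len) + internal_key_len +
         VarintLength(value_len) + value_len;
}

// Assumed estimate: entries in the range times the average entry size.
uint64_t AssumedMemtableEstimate(uint64_t entries_in_range,
                                 uint64_t total_entries,
                                 uint64_t total_bytes) {
  assert(total_entries > 0);
  return entries_in_range * (total_bytes / total_entries);
}
```

With a 9-byte key such as key000001 and a 100-byte value, AssumedMemtableEntrySize(9, 100) gives 1 + 17 + 1 + 100 = 119. The range from Key(1) to Key(3) holds two of the three equally sized entries, so AssumedMemtableEstimate(2, 3, 3 * 119) gives 2 × 119 = 238, which matches the output above.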