leveldb 源码分析与实现 - 读取

leveldb 源码分析与实现: 读取

读文件的过程

首先查看 memtable,其次查看 immutable memtable,最后尝试从 versions_->current() 中获取。

// First look in the memtable, then in the immutable memtable (if any).
LookupKey lkey(key, snapshot);
if (mem->Get(lkey, value, &s)) {
  // Done
} else if (imm != nullptr && imm->Get(lkey, value, &s)) {
  // Done
} else {
  s = current->Get(options, lkey, value, &stats);
  have_stat_update = true;
}

MemTable

// memtable.cc
bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
  // 取出和这个 key 关联的 tag
  const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
  switch (static_cast<ValueType>(tag & 0xff)) {
    // 有值
    case kTypeValue: {
      Slice v = GetLengthPrefixedSlice(key_ptr + key_length);
      value->assign(v.data(), v.size());
      return true;
    }

    // key 已经被 delete 了
    case kTypeDeletion:
      *s = Status::NotFound(Slice());
      return true;
  }
}

Immutable MemTable

imm 代表的是 Immutable MemTable,意味着只能从这个 MemTable 中读取值,而不能写入值。在写入值之前,leveldb 会在适当时机,将 mem_ 转为 imm_

// ...
imm_ = mem_;
has_imm_.store(true, std::memory_order_release);
mem_ = new MemTable(internal_comparator_);
//

从当前 Version 查找

重要结构体

FileMetaData 是一个结构体:

// version_edit.h
struct FileMetaData {
  FileMetaData() : refs(0), allowed_seeks(1 << 30), file_size(0) {}

  int refs;
  int allowed_seeks;  // Seeks allowed until compaction
  uint64_t number;
  uint64_t file_size;    // File size in bytes
  InternalKey smallest;  // Smallest internal key served by table
  InternalKey largest;   // Largest internal key served by table
};

Rep 是定义在 table.cc 文件中的结构体:

struct Table::Rep {
  ~Rep() {
    delete filter;
    delete[] filter_data;
    delete index_block;
  }

  Options options;
  Status status;
  RandomAccessFile* file;
  uint64_t cache_id;
  FilterBlockReader* filter;
  const char* filter_data;

  BlockHandle metaindex_handle;  // Handle to metaindex_block: saved from footer
  Block* index_block;
};

查找的过程

// version_set.cc
Status Version::Get(const ReadOptions& options, const LookupKey& k,
                    std::string* value, GetStats* stats) {
  // ...
  ForEachOverlapping(state.saver.user_key, state.ikey, &state, &State::Match);
  // ...
}

ForEachOverlappinglevel-0 一直向更深的 level 进行搜索:

// version_set.cc
void Version::ForEachOverlapping(Slice user_key, Slice internal_key, void* arg,
                                 bool (*func)(void*, int, FileMetaData*)) {
  
  // Search level-0 in order from newest to oldest.
  for (uint32_t i = 0; i < files_[0].size(); i++) {
    // ...
  }

  // Search other levels.
  for (int level = 1; level < config::kNumLevels; level++) {
    // ...
  }
}

从此处可以得知,每一层的文件列表都存储在 files_ 数组中:

// version_set.h
// List of files per level
std::vector<FileMetaData*> files_[config::kNumLevels];

InternalKey 比较器

// dbformat.h
class InternalKeyComparator : public Comparator {
 private:
  const Comparator* user_comparator_;
}

// comparator.cc
class BytewiseComparatorImpl : public Comparator {
}

// options.cc
Options::Options() : comparator(BytewiseComparator()), env(Env::Default()) {}

InternalKey 比较器默认采用的是 leveldb 提供的 BytewiseComparatorImpl 比较器,其比较算法如下:

// comparator.cc
int Compare(const Slice& a, const Slice& b) const override {
  return a.compare(b);
}

// slice.h
inline int Slice::compare(const Slice& b) const {
  const size_t min_len = (size_ < b.size_) ? size_ : b.size_;
  int r = memcmp(data_, b.data_, min_len);
  if (r == 0) {
    if (size_ < b.size_)
      r = -1;
    else if (size_ > b.size_)
      r = +1;
  }
  return r;
}

int memcmp ( const void * ptr1, const void * ptr2, size_t num ): Compare two blocks of memory. Compares the first num bytes of the block of memory pointed by ptr1 to the first num bytes pointed by ptr2, returning zero if they all match or a value different from zero representing which is greater if they do not.

FileMetaData 的构建过程

我们在上述代码中已经看到,每一个文件是用 FileMetaData 来表示的,而这个数据结构中有几个最关键的字段 numbersmallestlargest,如下就是它们各自的赋值过程:

// db_impl.cc
Status DBImpl::WriteLevel0Table(MemTable* mem, VersionEdit* edit,
                                Version* base) {
  FileMetaData meta;
  // ...
  meta.number = versions_->NewFileNumber();
  Iterator* iter = mem->NewIterator();
  // ...
  s = BuildTable(dbname_, env_, options_, table_cache_, iter, &meta);
}

// builder.cc
Status BuildTable(const std::string& dbname, Env* env, const Options& options,
                  TableCache* table_cache, Iterator* iter, FileMetaData* meta) {
  iter->SeekToFirst();
  // ...
  TableBuilder* builder = new TableBuilder(options, file);
  meta->smallest.DecodeFrom(iter->key());
  Slice key;
  for (; iter->Valid(); iter->Next()) {
    key = iter->key();
    builder->Add(key, iter->value());
  }
  if (!key.empty()) {
    meta->largest.DecodeFrom(key);
  }
}

其中 InternalKeyDecodeFrom 方法的定义如下所示:

class InternalKey {
 private:
  std::string rep_;

 public:
  bool DecodeFrom(const Slice& s) {
    rep_.assign(s.data(), s.size());
    return !rep_.empty();
  }

在此我们知道了一个文件的 smallest 指的是添加这个文件时候,位于内存中的 mem 跳表的最小 key,而 largest 则是 mem 跳表上的最大 keynumber 则是版本集赋予的一个新的 FileNumber

跳表 mem 默认也是采用的 BytewiseComparator 作为元素的比较器的。