glib是如何实现hashtable的

hash表做为一个数据结构，具有快速查找的优势，理想情况下，每一次查找的时间复杂度都是O(1)。glib在文件ghash.c中实现了hashtable，今天学习了一下源代码，感觉收获还是蛮多的，下面对源代码做一个解析。

hash表之所以能做到快速查找，是因为他使用了数组做为容器。在glib中，hashtable这个数据结构有三个数组分别存放key，value和hash值。

struct _GHashTable
{
  gint             size;
  gint             mod;
  guint            mask;
  gint             nnodes;
  gint             noccupied;  /* nnodes + tombstones */

  gpointer        *keys;         //用于存放每一个key
  guint           *hashes;       //用于存放每一个key所对映的hash值
  gpointer        *values;       //用于存放每一个key所对映的value
  //如果有一个key = 'age'，经过计算得到这个key的hash值为431，并且这个key对映的值为22，则这三个数据分别存放在上面三个数组里
  GHashFunc        hash_func;
  GEqualFunc       key_equal_func;
  gint             ref_count;
#ifndef G_DISABLE_ASSERT
  /*
   * Tracks the structure of the hash table, not its contents: is only
   * incremented when a node is added or removed (is not incremented
   * when the key or data of a node is modified).
   */
  int              version;
#endif
  GDestroyNotify   key_destroy_func;
  GDestroyNotify   value_destroy_func;
};

可以看到还是hashtable这个数据结构中还有很多很多很多成员

size：hashtable的大小，即三个数组的大小，这个大小会随着数据的添加而不断变化。初始值为8

mod：hashtable的被除数，glib使用除法散列法来避免冲突。

mask：size - 1

nnodes：总有多少对(key,value)

在调用g_hash_table_new[_full]函数时，hashtable的各个成员被初始化，这个可以查看源代码：

GHashTable*
g_hash_table_new_full (GHashFunc       hash_func,
                       GEqualFunc      key_equal_func,
                       GDestroyNotify  key_destroy_func,
                       GDestroyNotify  value_destroy_func)
{
  GHashTable *hash_table;

  hash_table = g_slice_new (GHashTable);
  g_hash_table_set_shift (hash_table, HASH_TABLE_MIN_SHIFT);      //将size设为8
  hash_table->nnodes             = 0;
  hash_table->noccupied          = 0;
  hash_table->hash_func          = hash_func ? hash_func : g_direct_hash;
  hash_table->key_equal_func     = key_equal_func;
  hash_table->ref_count          = 1;
#ifndef G_DISABLE_ASSERT
  hash_table->version            = 0;
#endif
  hash_table->key_destroy_func   = key_destroy_func;
  hash_table->value_destroy_func = value_destroy_func;
  hash_table->keys               = g_new0 (gpointer, hash_table->size);
  hash_table->values             = hash_table->keys;                
  hash_table->hashes             = g_new0 (guint, hash_table->size);//初始化三个函数

  return hash_table;
}

在调用g_hash_table_insert时，我们可以看出来glib是如何实现hashtable的。我们先来看看glib是如何实现插入的。

static void
g_hash_table_insert_internal (GHashTable *hash_table,
                              gpointer    key,
                              gpointer    value,
                              gboolean    keep_new_key)
{
  guint key_hash;
  guint node_index;

  g_return_if_fail (hash_table != NULL);

  node_index = g_hash_table_lookup_node (hash_table, key, &key_hash);

  g_hash_table_insert_node (hash_table, node_index, key_hash, key, value, keep_new_key, FALSE);
}

上面的代码实现了插入操作，g_hash_table_lookup_node函数接受一个key变量，他会返回这个key相对映的下标，利用这个下标node_index，可以方便的得到这个key对映的value和hash——values[node_index]和hashs[node_index]。如果这是个新的key，那函数就会返回一个现在并没有被使用的下标。然后g_hash_table_insert_node就会将这个hash对(key, value)插入hash表中。

glib实现插入时会对原有的key进行检查，判断这个key是否已经存在了，如果已经存在就更改这个key对映的value，如果没有的话就添加key，hash，value。现在我们可以来看看glib是如何实现g_hash_table_lookup_node函数的。

hash表尽管使用了各种方法做hash函数，但是冲撞总是不可避免的。当我们使用除法散列时，每一个hash值对映的下标为hash%mod，如果一个key的hash值为10，另一个key的hash值为17，而被除数为7的话，这两个key就会发生冲撞。假设10已经插到下标为3（10%7）的数组里了，这时要插入17，我们只能向下查找，如果4的位置是空的，我们就可以插入到4当中了。如此循环，我们一定能找一个空的位置。但是，如果4这个位置已经被11占用后又释放了，我们需要一个标志，即使是释放了，也不能直接设为空，这个标志就是tombstone。

static inline guint

g_hash_table_lookup_node (GHashTable    *hash_table,
                          gconstpointer  key,
                          guint         *hash_return)
{
  guint node_index;
  guint node_hash;
  guint hash_value;
  guint first_tombstone = 0;
  gboolean have_tombstone = FALSE;
  guint step = 0;

  hash_value = hash_table->hash_func (key);
  if (G_UNLIKELY (!HASH_IS_REAL (hash_value)))
    hash_value = 2;

  *hash_return = hash_value;

  node_index = hash_value % hash_table->mod;     //得到这个hash值对映的下标，比如10%7 =3，则这个下标为3
  node_hash = hash_table->hashes[node_index];    //取出下标为3的hashs数组中的值，判断他是否被占用了

  while (!HASH_IS_UNUSED (node_hash))            //当node_hash不等于0时，进入循环
    {
      /* We first check if our full hash values
       * are equal so we can avoid calling the full-blown
       * key equality function in most cases.
       */
      if (node_hash == hash_value)               //当hash值相等时，判断key是否相等，如果相等则返回下标值。
        {
          gpointer node_key = hash_table->keys[node_index];

          if (hash_table->key_equal_func)
            {
              if (hash_table->key_equal_func (node_key, key))
                return node_index;
            }
          else if (node_key == key)
            {
              return node_index;
            }
        }
      else if (HASH_IS_TOMBSTONE (node_hash) && !have_tombstone)   //当hash值不相等是，判断是否为tombstone
        {
          first_tombstone = node_index;
          have_tombstone = TRUE;
        }

      step++;
      node_index += step;
      node_index &= hash_table->mask;
      node_hash = hash_table->hashes[node_index];
    }

  if (have_tombstone)
    return first_tombstone;

  return node_index;
}

当找到给定的key时，函数返回下标值，如果没有的话，就返回tombstone的位置或者第一个空的位置。

现在我们得到了一个下标值了，我们可以将值插入到hash表中了

static void
g_hash_table_insert_node (GHashTable *hash_table,
                          guint       node_index,
                          guint       key_hash,
                          gpointer    key,
                          gpointer    value,
                          gboolean    keep_new_key,
                          gboolean    reusing_key)
{
  guint old_hash;
  gpointer old_key;
  gpointer old_value;

  if (G_UNLIKELY (hash_table->keys == hash_table->values && key != value))
    hash_table->values = g_memdup (hash_table->keys, sizeof (gpointer) * hash_table->size);

  old_hash = hash_table->hashes[node_index];
  old_key = hash_table->keys[node_index];
  old_value = hash_table->values[node_index];

  if (HASH_IS_REAL (old_hash))     //原hash表中已经被占用了，说明给定的key已经存在在hash表中了
    {
      if (keep_new_key)
        hash_table->keys[node_index] = key;
      hash_table->values[node_index] = value;
    }
  else
    {
      hash_table->keys[node_index] = key;
      hash_table->values[node_index] = value;
      hash_table->hashes[node_index] = key_hash;

      hash_table->nnodes++;

      if (HASH_IS_UNUSED (old_hash))
        {
          /* We replaced an empty node, and not a tombstone */
          hash_table->noccupied++;
          g_hash_table_maybe_resize (hash_table);
        }

#ifndef G_DISABLE_ASSERT
      hash_table->version++;
#endif
    }

  if (HASH_IS_REAL (old_hash))
    {
      if (hash_table->key_destroy_func && !reusing_key)
        hash_table->key_destroy_func (keep_new_key ? old_key : key);
      if (hash_table->value_destroy_func)
        hash_table->value_destroy_func (old_value);
    }
}

上面的代码比较容易理解，就是先判断key是否已经存在了，如果存在就替换到他所对映的value。如果不存在就添加这对(key,value)。再判断是否需要调整hash表的大小。

这就是glib主要实现hash表的方法，当然这只是一小部分，还有很多代码值得我去学习。

glib是如何实现hashtable的

继续阅读

磁盘结构及在Linux中的命名

linux网络编程----发送与接收文件

处理PCX文件

不用iconv函数实现UTF-8编码转换GB2312的PHP函数

D. Ehab the Xorcist(构造+思维)

查找算法之二分查找查找算法之二分查找

JAVA高效编程指南

CQ V1.0分词bates(基于双数组tire树)—应该是目前最快的中文分词算法

【数据结构】医院选址

[转]ISUP信令的IAM消息详细内容

笔试面试题目：滑动窗口(二)

交通/城市相关的公开的数据集上学时整理的Xie et al., 2020收集的

数据结构与算法（27）——排序（二）

无组件上传图片到数据库中，最完整解决方案

GitHub连夜封杀！这份阿里 10W 字内部 Java 字面试手册到底有多强？

Linux设备模型（中）之上层容器