
Analyzing the malloc/free Implementation in glibc

This article was written and published with Zhihu On VSCode.

malloc and free rely internally on the GNU Allocator, also known as Doug Lea's Allocator: http://gee.cs.oswego.edu/dl/html/malloc.html.

This analysis focuses on how the allocator manages memory. It sits as a layer between the operating system and the user, playing the role of a memory pool. The source walkthrough is based on a glibc snapshot hosted on GitHub: https://github.com/sploitfun/lsploits.

In broad strokes, the design is:

The allocator first requests a large contiguous region on the heap and slices it into pieces. User requests are classified by size, and the search proceeds level by level: fastbins → unsorted bin → small bins → large bins. If nothing fits, the request falls through to the top chunk, which can either be extended with sbrk or backed directly by mmap.
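
A quick way to see this layering from user space (a hedged observation, not glibc code — the exact reuse behavior depends on the glibc version, e.g. tcache since 2.26):

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  void *a = malloc (32);
  printf ("first  malloc(32): %p\n", a);
  free (a);                  /* goes into a fast/thread-local bin, not back to the OS */

  void *b = malloc (32);     /* typically serviced from that bin: same address */
  printf ("second malloc(32): %p\n", b);
  free (b);
  return 0;
}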

Now let's dig into the details.

Three concepts come first — arena, bins, and chunk — since knowing them makes the source much easier to follow.

(figure: the relationship between an arena, its free-list bins, and chunks)

The figure lays out how the three relate. An arena is a heap region; the free list in the middle holds the bins, an array of pointers to chunks. The bins divide into three segments, each segment holding chunks of a different size range. The design resembles the memory managers I have described before: an array manages the pointers, each pointer heads a linked list, and freeing or allocating a block is just insertion, deletion, and lookup on those lists.

malloc estimates the requested size to decide whether sbrk or mmap should be used to set up the heap region (arena), which reduces how often later requests must go back to the operating system for memory.
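
A small demo of that size-based decision (hedged: M_MMAP_THRESHOLD is typically 128 KiB by default but tunable, so the output is illustrative only): a small request is carved out of the sbrk heap near the program break, while a sufficiently large one is usually served from a separate mmap region far away.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (void)
{
  void *small = malloc (64);       /* expected: from the sbrk-grown heap */
  void *large = malloc (1 << 20);  /* 1 MiB: expected to be mmap-backed */

  printf ("program break: %p\n", sbrk (0));
  printf ("small (64 B) : %p\n", small);
  printf ("large (1 MiB): %p\n", large);

  free (small);
  free (large);                    /* an mmap'ed chunk is returned straight to the OS */
  return 0;
}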

The three concepts in more detail:

arena: a heap region allocated for a thread via the sbrk or mmap system call. By owning thread there are two kinds:

  • main arena: the arena created for the main thread;
  • thread arena: an arena created by a child thread.

chunk: a small, logically delimited block of memory. By role there are four kinds:

  • Allocated chunk: a block handed to the user and not yet freed;
  • Free chunk: a block the user has already freed;
  • Top chunk
  • Last Remainder chunk

bin: an array of pointers holding the list-head information for lists of free chunks. By the kind of list hanging off each slot there are four types:

  • Fast bin
  • Unsorted bin
  • Small bin
  • Large bin

Now the core data structures:

malloc_state — the arena header. A single thread arena may own several heaps, but all of them share one arena header, which holds the bins, the top chunk, the last remainder chunk, and so on.

malloc_chunk — the chunk header. A heap is divided into many chunks according to user requests, and each chunk carries its own header.

malloc_state is a bit like a database's metadata, while malloc_chunk is what actually holds the memory blocks.

Below is the struct that manages an arena. The key members are the bins, top (the top chunk), and the linked list for free arenas; skim it first to get the overall picture.

/*
   have_fastchunks indicates that there are probably some fastbin chunks.
   It is set true on entering a chunk into any fastbin, and cleared early in
   malloc_consolidate.  The value is approximate since it may be set when there
   are no fastbin chunks, or it may be clear even if there are fastbin chunks
   available.  Given it's sole purpose is to reduce number of redundant calls to
   malloc_consolidate, it does not affect correctness.  As a result we can safely
   use relaxed atomic accesses.
 */
struct malloc_state
{
    /* Serialize access.  */
    __libc_lock_define (, mutex);

    /* Flags (formerly in max_fast).  */
    int flags;

    /* Set if the fastbin chunks contain recently inserted free blocks.  */
    /* Note this is a bool but not all targets support atomics on booleans.  */
    int have_fastchunks;

    /* Fastbins */
    mfastbinptr fastbinsY[NFASTBINS];

    /* Base of the topmost chunk -- not otherwise kept in a bin */
    mchunkptr top;

    /* The remainder from the most recent split of a small request */
    mchunkptr last_remainder;

    /* Normal bins packed as described above */
    mchunkptr bins[NBINS * 2 - 2];

    /* Bitmap of bins */
    unsigned int binmap[BINMAPSIZE];

    /* Linked list */
    struct malloc_state *next;

    /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
    struct malloc_state *next_free;

    /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
    INTERNAL_SIZE_T attached_threads;

    /* Memory allocated from the system in this arena.  */
    INTERNAL_SIZE_T system_mem;
    INTERNAL_SIZE_T max_system_mem;
};
           

Malloc

Start with a figure to get an overall feel for the allocation path:

(figure: malloc allocation flow)

Overall structure of the malloc function:

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

  arena_lookup (ar_ptr); // arena lookup logic

  arena_lock (ar_ptr, bytes);
  if (!ar_ptr)
    return 0;

  victim = _int_malloc (ar_ptr, bytes);
  if (!victim)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      if (__builtin_expect (ar_ptr != NULL, 1))
        {
          victim = _int_malloc (ar_ptr, bytes);
          (void) mutex_unlock (&ar_ptr->mutex);
        }
    }
  else
    (void) mutex_unlock (&ar_ptr->mutex);
  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}
           

Arena

Start where arenas come into play. The allocator first searches the global arena list; if a reusable arena is found it is reused, otherwise a new one is created.

arena_lookup performs the search, and arena_lock takes the mutex on the chosen region. arena_lookup walks the global arena linked list, but it may fail to find a reusable arena; in that case the pointer arena_lock receives is null, and it creates a new arena itself.

#define arena_lock(ptr, size) do {                                      \
      if (ptr)                                                          \
        (void) mutex_lock (&ptr->mutex);                                \
      else                                                              \
        ptr = arena_get2 (ptr, (size), NULL);                           \
  } while (0)
           

arena_get2(省略号省略部分邏輯)就包括了_int_new_arena的邏輯。

.......
      if (__glibc_unlikely (n <= narenas_limit - 1))
        {
          if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
            goto repeat;
          a = _int_new_arena (size);
          if (__glibc_unlikely (a == NULL))
            catomic_decrement (&narenas);
        }
      else
        a = reused_arena (avoid_arena);
........
           

_int_new_arena creates an arena, and the allocation is ultimately backed by mmap. Following it down into new_heap (it sits several layers of wrappers deep, so only the core code is shown), you can see the MMAP (mmap) call that maps the heap memory.

..............
  if (aligned_heap_area)
    {
      p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                          MAP_NORESERVE);
      aligned_heap_area = NULL;
      if (p2 != MAP_FAILED && ((unsigned long) p2 & (HEAP_MAX_SIZE - 1)))
        {
          __munmap (p2, HEAP_MAX_SIZE);
          p2 = MAP_FAILED;
        }
    }
  if (p2 == MAP_FAILED)
    {
      p1 = (char *) MMAP (0, HEAP_MAX_SIZE << 1, PROT_NONE, MAP_NORESERVE);
.......
           

To summarize:

For the sake of reuse, the allocator first looks for a reusable arena on a list; if none is found, it creates one with mmap, placed on the heap.

_int_malloc

Fast Bins

If the bins are a database, the fast bins are its cache. They allocate and free quickly because each one is a LIFO linked list and every insertion and removal happens at the list head.
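
A toy sketch of that structure (an illustration under my own names, not glibc code): a fastbin is a singly linked LIFO list threaded through the fd field of free chunks, so push and pop are both O(1) head operations.

#include <stddef.h>

struct toy_chunk {
  size_t size;
  struct toy_chunk *fd;            /* next free chunk in the same size class */
};

static struct toy_chunk *toy_fastbin[10];   /* one list head per size class */

static void toy_free (int idx, struct toy_chunk *c)
{
  c->fd = toy_fastbin[idx];        /* link in at the head: LIFO */
  toy_fastbin[idx] = c;
}

static struct toy_chunk *toy_alloc (int idx)
{
  struct toy_chunk *c = toy_fastbin[idx];
  if (c != NULL)
    toy_fastbin[idx] = c->fd;      /* pop the head */
  return c;                        /* NULL: fall through to the other bins */
}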

Next up is _int_malloc, the function that does the main allocation work, dispatching on the requested size between the fastbins and the bins.

/*
     If the size qualifies as a fastbin, first check corresponding bin.
     This code is safe to execute even if av is not yet initialized, so we
     can try it without checking, which saves some time on this fast path.
   */

  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp = *fb;
      do
        {
          victim = pp;
          if (victim == NULL)
            break;
        }
      while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
             != victim);
      if (victim != 0)
        {
          if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
            {
              errstr = "malloc(): memory corruption (fast)";
            errout:
              malloc_printerr (check_action, errstr, chunk2mem (victim));
              return NULL;
            }
          check_remalloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }
    }
           

The arena holds a fastbinsY array of 10 elements, each acting as a list head that records the pointer to the chunks of its size class. If the bin at the computed index contains a chunk, that chunk is taken out and the head is advanced to the next chunk (if there is one), located through the fd pointer. Consequently, every chunk on a given fastbin list has the same size, and the list operates first-in-last-out (LIFO). The code only descends into the fastbin path when the list at idx actually has an element.

Pay particular attention to mchunkptr pp: the list pointer fetched from the fastbin at the matching index is assigned to pp, and the structure pp points at is the central malloc_chunk:

/* Forward declarations.  */
struct malloc_chunk;
typedef struct malloc_chunk* mchunkptr;
           
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};
           
(figure: layout of a free chunk)

The struct shows we are dealing with a doubly linked list; the nextsize pointers are helpers the large bins use to speed up traversal across runs of same-sized chunks.

The chunk2mem call at the end also matters: it converts the malloc_chunk structure into the memory pointer the user can actually use, which is then returned.
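
In this source tree the conversion is plain pointer arithmetic — the user pointer starts right after the two size fields, so chunk2mem and its inverse mem2chunk are essentially:

#define chunk2mem(p)   ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr) ((char *) (mem) - 2 * SIZE_SZ))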

The defining trait of fast bins is that they skip coalescing: two adjacent free chunks are never merged. This can worsen fragmentation, but it makes freeing memory much faster.

(figure: fastbin lists)

Bins

First, an overall sketch of the bins:

Memory the user frees is not necessarily returned to the system right away. Instead, ptmalloc keeps the heap's free chunks under unified management; on the next allocation request it first tries to pick a suitable free chunk from the heap, avoiding repeated system calls and lowering the cost of allocation. ptmalloc chains free chunks of similar sizes together with doubly linked lists; such a list is called a bin. There are 128 bins in total, stored in an array.

(figure: the 128-entry bins array)

Small bins (2-63)

Now into the bins array. As the earlier figure shows, the bins divide into small bins, the unsorted bin, and large bins. Chunks under 512 bytes are called "small chunks", and the bins that hold them are the "small bins"; small bins allocate and free faster than large bins. The unsorted bin is a single list, while the small and large bins consist of many lists. Within one small bin every chunk has the same size, and adjacent bins differ by 8 bytes.
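
The index math reflects that spacing. A simplified sketch of the mapping (the real smallbin_index in the source also handles 16-byte granularity and an alignment correction):

/* With 8-byte spacing, sz in {16, 24, 32, ...} maps to its own bin,
   so every small-bin list holds chunks of exactly one size.  */
#define smallbin_index(sz)  (((unsigned) (sz)) >> 3)   /* sz / 8 */

/* e.g. smallbin_index(48) == 6: all 48-byte chunks hang off bin 6 */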

The source handles the small-bin case first:

if (in_smallbin_range (nb))
{
    idx = smallbin_index (nb);
    bin = bin_at (av, idx);

    if ((victim = last (bin)) != bin)
    {
        if (victim == 0) /* initialization check */
            malloc_consolidate (av);
        else
        {
            bck = victim->bk;
            if (__glibc_unlikely (bck->fd != victim))
            {
                errstr = "malloc(): smallbin double linked list corrupted";
                goto errout;
            }
            set_inuse_bit_at_offset (victim, nb);
            bin->bk = bck;
            bck->fd = bin;

            if (av != &main_arena)
                victim->size |= NON_MAIN_ARENA;
            check_malloced_chunk (av, victim, nb);
            void *p = chunk2mem (victim);
            alloc_perturb (p, bytes);
            return p;
        }
    }
}
           

malloc_consolidate takes the chunks accumulated in the fastbins and merges them back into the regular bins, reducing memory fragmentation.

The overall idea is to compute the small-bin index, then take the corresponding list out of the array.

The chunk we want is victim. Because the list is doubly linked, removing a chunk means fixing up both the bk and fd relationships; the two statements bin->bk = bck and bck->fd = bin are what splice victim out of the list.
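
The same splice is captured by the classic unlink macro (shown here without the extra hardening checks newer glibc adds); rewiring the two neighbors is all it takes to remove P from its doubly linked list:

#define unlink(P, BK, FD) {                                             \
    FD = P->fd;                                                         \
    BK = P->bk;                                                         \
    FD->bk = BK;   /* P's successor now points back past P */          \
    BK->fd = FD;   /* P's predecessor now points forward past P */     \
  }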

Unsorted bin (1)

When small and large chunks are freed, they are not put straight into their own bins; they are placed on the "unsorted bin" instead. This lets the allocator reuse recently freed chunks and skip the cost of locating the right bin, speeding up both allocation and deallocation. (In other words, freed chunks are reused as aggressively as possible before the small and large bins are searched again.)

The mechanism acts as malloc's buffer for freed blocks. A user-freed chunk larger than max_fast would normally go into the bins, but it is first parked in the unsorted bin. During malloc, if no suitable chunk turns up in the fastbins, ptmalloc searches the unsorted bin before the regular bins; if the unsorted bin cannot satisfy the request, malloc moves its chunks into the proper bins and continues the search there. The unsorted bin is thus purely an optimization: it is a single list, and anything freed that is not fastbin-sized gets chained onto it first, simply for speed. Since it does not add much to understanding the essentials, I will skip its source for now.

Its distinctive property is that it imposes no size constraint on chunks, whereas every other bin does. Why move chunks on to the large/small bins at all? Because the unsorted bin cannot coalesce; chunks must be moved to their designated bins before those fragments can be merged.
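
In the source, the unsorted bin really is just one list: it occupies slot 1 of the normal bins array, with no size restriction attached:

/* The unsorted bin is bin 1 of the regular bins; all other bins hold
   chunks of a constrained size range.  */
#define unsorted_chunks(M)  (bin_at (M, 1))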

Large bins (64-126)

Both small bins and large bins coalesce their free chunks. This mainly reduces fragmentation, but it slows down free — which is exactly the gap the fast bins exist to fill.

Having covered the small bins, let's look at the large bins. Large bins are ordered bins: within each bin the chunks are sorted by size, whereas a small bin's chunks are all the same size. If the unsorted bin yields nothing we want, the search moves on to the large or small bins. At that point some chunks must be taken off the unsorted bin and placed into the large or small bins, and because a large bin is an ordered chunk bin, the insertion has to preserve the list's sorted structure.

......................
          /* place chunk in bin */

          if (in_smallbin_range (size))
            {
              victim_index = smallbin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;
            }
          else
            {
              victim_index = largebin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;

              /* maintain large bins in sorted order */
              if (fwd != bck)
                {
                  /* Or with inuse bit to speed comparisons */
                  size |= PREV_INUSE;
                  /* if smaller than smallest, bypass loop below */
                  assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
                  if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
                    {
                      fwd = bck;
                      bck = bck->bk;

                      victim->fd_nextsize = fwd->fd;
                      victim->bk_nextsize = fwd->fd->bk_nextsize;
                      fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
                    }
                  else
                    {
                      assert ((fwd->size & NON_MAIN_ARENA) == 0);
                      while ((unsigned long) size < fwd->size)
                        {
                          fwd = fwd->fd_nextsize;
                          assert ((fwd->size & NON_MAIN_ARENA) == 0);
                        }

                      if ((unsigned long) size == (unsigned long) fwd->size)
                        /* Always insert in the second position.  */
                        fwd = fwd->fd;
                      else
                        {
                          victim->fd_nextsize = fwd;
                          victim->bk_nextsize = fwd->bk_nextsize;
                          fwd->bk_nextsize = victim;
                          victim->bk_nextsize->fd_nextsize = victim;
                        }
                      bck = fwd->bk;
                    }
                }
              else
                victim->fd_nextsize = victim->bk_nextsize = victim;
            }

          mark_bin (av, victim_index);
          victim->bk = bck;
          victim->fd = fwd;
          fwd->bk = victim;
          bck->fd = victim;
......................
           

The logic here kicks in when the fastbins and the unsorted bin cannot satisfy the request: all unsorted chunks are taken out and placed into their proper bins, and then the allocation proceeds again — a fresh round of searching, this time through the large bins.

Pinning down every detail here is hard. The main idea is to search the ordered lists of the large bins for a chunk; once one is found, the usual split operation follows.

/*
         Search for a chunk by scanning bins, starting with next largest
         bin. This search is strictly by best-fit; i.e., the smallest
         (with ties going to approximately the least recently used) chunk
         that fits is selected.

         The bitmap avoids needing to check that most blocks are nonempty.
         The particular case of skipping all bins during warm-up phases
         when no chunks have been returned yet is faster than it might look.
       */

      ++idx;
      bin = bin_at (av, idx);
      block = idx2block (idx);
      map = av->binmap[block];
      bit = idx2bit (idx);

      for (;; )
        {
          /* Skip rest of block if there are no more set bits in this block.  */
          if (bit > map || bit == 0)
            {
              do
                {
                  if (++block >= BINMAPSIZE) /* out of bins */
                    goto use_top;
                }
              while ((map = av->binmap[block]) == 0);

              bin = bin_at (av, (block << BINMAPSHIFT));
              bit = 1;
            }

          /* Advance to bin with set bit. There must be one. */
          while ((bit & map) == 0)
            {
              bin = next_bin (bin);
              bit <<= 1;
              assert (bit != 0);
            }

          /* Inspect the bin. It is likely to be non-empty */
          victim = last (bin);

          /*  If a false alarm (empty bin), clear the bit. */
          if (victim == bin)
            {
              av->binmap[block] = map &= ~bit; /* Write through */
              bin = next_bin (bin);
              bit <<= 1;
            }

          else
            {
              size = chunksize (victim);

              /*  We know the first chunk in this bin is big enough to use. */
              assert ((unsigned long) (size) >= (unsigned long) (nb));

              remainder_size = size - nb;

              /* unlink */
              unlink (victim, bck, fwd);

              /* Exhaust */
              if (remainder_size < MINSIZE)
                {
                  set_inuse_bit_at_offset (victim, size);
                  if (av != &main_arena)
                    victim->size |= NON_MAIN_ARENA;
                }

              /* Split */
              else
                {
                  remainder = chunk_at_offset (victim, nb);

                  /* We cannot assume the unsorted list is empty and therefore
                     have to perform a complete insert here.  */
                  bck = unsorted_chunks (av);
                  fwd = bck->fd;
                  if (__glibc_unlikely (fwd->bk != bck))
                    {
                      errstr = "malloc(): corrupted unsorted chunks 2";
                      goto errout;
                    }
                  remainder->bk = bck;
                  remainder->fd = fwd;
                  bck->fd = remainder;
                  fwd->bk = remainder;

                  /* advertise as last remainder */
                  if (in_smallbin_range (nb))
                    av->last_remainder = remainder;
                  if (!in_smallbin_range (remainder_size))
                    {
                      remainder->fd_nextsize = NULL;
                      remainder->bk_nextsize = NULL;
                    }
                  set_head (victim, nb | PREV_INUSE |
                            (av != &main_arena ? NON_MAIN_ARENA : 0));
                  set_head (remainder, remainder_size | PREV_INUSE);
                  set_foot (remainder, remainder_size);
                }
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }
        }

           

Note the remainder chunk in this code: ptmalloc looks for the chunk whose size is closest to the user's request, and once found, that chunk is split into two pieces:

  • User chunk (the requested size) — returned to the user;
  • Remainder chunk (the leftover) — added to the unsorted bin.

Another interesting detail: ptmalloc uses a bitmap to speed up the scan. If the largest chunk in the matching bin is smaller than the request, the allocator scans the binmap for the smallest non-empty bin that fits. If such a bin is found, a suitable chunk is selected from it and split for the user; otherwise the top chunk serves the request.
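
The binmap bookkeeping is an ordinary bitmask, one bit per bin grouped 32 to a word; these are the helper macros the scan above relies on:

#define BINMAPSHIFT      5
#define BITSPERMAP       (1U << BINMAPSHIFT)           /* 32 bins per map word */
#define BINMAPSIZE       (NBINS / BITSPERMAP)

#define idx2block(i)     ((i) >> BINMAPSHIFT)          /* which map word */
#define idx2bit(i)       ((1U << ((i) & ((1U << BINMAPSHIFT) - 1))))

#define mark_bin(m, i)   ((m)->binmap[idx2block (i)] |= idx2bit (i))
#define unmark_bin(m, i) ((m)->binmap[idx2block (i)] &= ~(idx2bit (i)))
#define get_binmap(m, i) ((m)->binmap[idx2block (i)] & idx2bit (i))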

Top Chunk

It can also happen that not even the large bins can satisfy the request, and the top chunk has to step in. As mentioned earlier, ptmalloc pre-allocates a fairly large region of free memory (the heap) and serves user requests by managing it. Since memory is handed out from low addresses to high, the highest part of the free region necessarily contains a free chunk, called the "top chunk". When neither the bins nor the fastbins can satisfy an allocation, ptmalloc carves a piece out of the top chunk; if the top chunk itself is too small, ptmalloc grows it as needed (growing the heap with it). In practice the top chunk is only ever considered after the fastbins and bins, so no matter how large it is, it is never placed into the fastbins or the bins.
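
The heap-growth behavior is easy to observe (hedged: exact numbers vary with version and tunables): the first small allocation makes the main arena extend the program break by far more than the 16 bytes requested, and later requests are carved out of that top chunk without touching the break again.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (void)
{
  printf ("break before malloc: %p\n", sbrk (0));
  void *p = malloc (16);     /* triggers the initial brk extension */
  printf ("break after  malloc: %p\n", sbrk (0));

  void *q = malloc (16);     /* split off the existing top chunk; break unchanged */
  printf ("break after  second: %p\n", sbrk (0));

  free (p);
  free (q);
  return 0;
}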

use_top below is where the top chunk — the memory at the very top of the allocated heap region — is used. The best-fit search rule still applies.

use_top:
      /*
         If large enough, split off the chunk bordering the end of memory
         (held in av->top). Note that this is in accord with the best-fit
         search rule.  In effect, av->top is treated as larger (and thus
         less well fitting) than any other available chunk since it can
         be extended to be as large as necessary (up to system
         limitations).

         We require that av->top always exists (i.e., has size >=
         MINSIZE) after initialization, so if it would otherwise be
         exhausted by current request, it is replenished. (The main
         reason for ensuring it exists is that we may need MINSIZE space
         to put in fenceposts in sysmalloc.)
       */

      victim = av->top;
      size = chunksize (victim);

      if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
        {
          remainder_size = size - nb;
          remainder = chunk_at_offset (victim, nb);
          av->top = remainder;
          set_head (victim, nb | PREV_INUSE |
                    (av != &main_arena ? NON_MAIN_ARENA : 0));
          set_head (remainder, remainder_size | PREV_INUSE);

          check_malloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }

      /* When we are using atomic ops to free fast chunks we can get
         here for all block sizes.  */
      else if (have_fastchunks (av))
        {
          malloc_consolidate (av);
          /* restore original bin index */
          if (in_smallbin_range (nb))
            idx = smallbin_index (nb);
          else
            idx = largebin_index (nb);
        }
.................
    }
           

However, when the requested chunk is large enough that the fastbins and bins all fail, and even the "top chunk" cannot satisfy the request, ptmalloc uses mmap to map pages directly into the process's address space. A chunk allocated this way is unmapped as soon as it is freed, returning the memory to the system rather than to the allocator; referencing such a region afterwards triggers a "segmentation fault". Chunks like this are never held in any bin.

So for larger requests, the allocator first tries the top chunk, and only when that is still not enough does it call sysmalloc (which decides internally whether to extend the top chunk with sbrk or to mmap a sub-heap).

/*
         Otherwise, relay to handle system-dependent cases
       */
      else
        {
          void *p = sysmalloc (nb, av);
          if (p != NULL)
            alloc_perturb (p, bytes);
          return p;
        }
           

Free

void
__libc_free (void *mem)
{
  mstate ar_ptr;
  mchunkptr p;                          /* chunk corresponding to mem */

  void (*hook) (void *, const void *)
    = atomic_forced_read (__free_hook);
  if (__builtin_expect (hook != NULL, 0))
    {
      (*hook)(mem, RETURN_ADDRESS (0));
      return;
    }

  if (mem == 0)                              /* free(0) has no effect */
    return;

  p = mem2chunk (mem);

  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      /* see if the dynamic brk/mmap threshold needs adjusting */
      if (!mp_.no_dyn_threshold
          && p->size > mp_.mmap_threshold
          && p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
        {
          mp_.mmap_threshold = chunksize (p);
          mp_.trim_threshold = 2 * mp_.mmap_threshold;
          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                      mp_.mmap_threshold, mp_.trim_threshold);
        }
      munmap_chunk (p);
      return;
    }

  ar_ptr = arena_for_chunk (p);
  _int_free (ar_ptr, p, 0);
}
libc_hidden_def (__libc_free)
           

free itself is fairly simple: it first converts the user's memory pointer back into the internal chunk representation; if the region was mmap'ed, it is simply unmapped, and otherwise free locates the owning arena and releases the pointer there.

This is the core chunk-freeing routine. For lack of time I won't analyze it in fine detail, just sketch the overall flow.

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr *fb;             /* associated fastbin */
  mchunkptr nextchunk;         /* next contiguous chunk */
  INTERNAL_SIZE_T nextsize;    /* its size */
  int nextinuse;               /* true if nextchunk is used */
  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
  mchunkptr bck;               /* misc temp for linking */
  mchunkptr fwd;               /* misc temp for linking */

  const char *errstr = NULL;
  int locked = 0;

  size = chunksize (p);

  /* Little security check which won't hurt performance: the
     allocator never wrapps around at the end of the address space.
     Therefore we can exclude some size values which might appear
     here by accident or by "design" from some intruder.  */
  if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
      || __builtin_expect (misaligned_chunk (p), 0))
    {
      errstr = "free(): invalid pointer";
    errout:
      if (!have_lock && locked)
        (void) mutex_unlock (&av->mutex);
      malloc_printerr (check_action, errstr, chunk2mem (p));
      return;
    }
  /* We know that each chunk is at least MINSIZE bytes in size or a
     multiple of MALLOC_ALIGNMENT.  */
  if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
    {
      errstr = "free(): invalid size";
      goto errout;
    }

  check_inuse_chunk(av, p);

  /*
    If eligible, place chunk on a fastbin so it can be found
    and used quickly in malloc.
  */

  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())

#if TRIM_FASTBINS
      /*
	If TRIM_FASTBINS set, don't place chunks
	bordering top into fastbins
      */
      && (chunk_at_offset(p, size) != av->top)
#endif
      ) {

    if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
	|| __builtin_expect (chunksize (chunk_at_offset (p, size))
			     >= av->system_mem, 0))
      {
	/* We might not have a lock at this point and concurrent modifications
	   of system_mem might have let to a false positive.  Redo the test
	   after getting the lock.  */
	if (have_lock
	    || ({ assert (locked == 0);
		  mutex_lock(&av->mutex);
		  locked = 1;
		  chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
		    || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
	      }))
	  {
	    errstr = "free(): invalid next size (fast)";
	    goto errout;
	  }
	if (! have_lock)
	  {
	    (void)mutex_unlock(&av->mutex);
	    locked = 0;
	  }
      }

    free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

    set_fastchunks(av);
    unsigned int idx = fastbin_index(size);
    fb = &fastbin (av, idx);

    /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
    mchunkptr old = *fb, old2;
    unsigned int old_idx = ~0u;
    do
      {
	/* Check that the top of the bin is not the record we are going to add
	   (i.e., double free).  */
	if (__builtin_expect (old == p, 0))
	  {
	    errstr = "double free or corruption (fasttop)";
	    goto errout;
	  }
	/* Check that size of fastbin chunk at the top is the same as
	   size of the chunk that we are adding.  We can dereference OLD
	   only if we have the lock, otherwise it might have already been
	   deallocated.  See use of OLD_IDX below for the actual check.  */
	if (have_lock && old != NULL)
	  old_idx = fastbin_index(chunksize(old));
	p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);

    if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
      {
	errstr = "invalid fastbin entry (free)";
	goto errout;
      }
  }

  /*
    Consolidate other non-mmapped chunks as they arrive.
  */

  else if (!chunk_is_mmapped(p)) {
    if (! have_lock) {
      (void)mutex_lock(&av->mutex);
      locked = 1;
    }

    nextchunk = chunk_at_offset(p, size);

    /* Lightweight tests: check whether the block is already the
       top block.  */
    if (__glibc_unlikely (p == av->top))
      {
	errstr = "double free or corruption (top)";
	goto errout;
      }
    /* Or whether the next chunk is beyond the boundaries of the arena.  */
    if (__builtin_expect (contiguous (av)
			  && (char *) nextchunk
			  >= ((char *) av->top + chunksize(av->top)), 0))
      {
	errstr = "double free or corruption (out)";
	goto errout;
      }
    /* Or whether the block is actually not marked used. */
    if (__glibc_unlikely (!prev_inuse(nextchunk)))
      {
	errstr = "double free or corruption (!prev)";
	goto errout;
      }

    nextsize = chunksize(nextchunk);
    if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
	|| __builtin_expect (nextsize >= av->system_mem, 0))
      {
	errstr = "free(): invalid next size (normal)";
	goto errout;
      }

    free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

    /* consolidate backward */
    if (!prev_inuse(p)) {
      prevsize = p->prev_size;
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      unlink(p, bck, fwd);
    }

    if (nextchunk != av->top) {
      /* get and clear inuse bit */
      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

      /* consolidate forward */
      if (!nextinuse) {
	unlink(nextchunk, bck, fwd);
	size += nextsize;
      } else
	clear_inuse_bit_at_offset(nextchunk, 0);

      /*
	Place the chunk in unsorted chunk list. Chunks are
	not placed into regular bins until after they have
	been given one chance to be used in malloc.
      */

      bck = unsorted_chunks(av);
      fwd = bck->fd;
      if (__glibc_unlikely (fwd->bk != bck))
	{
	  errstr = "free(): corrupted unsorted chunks";
	  goto errout;
	}
      p->fd = fwd;
      p->bk = bck;
      if (!in_smallbin_range(size))
	{
	  p->fd_nextsize = NULL;
	  p->bk_nextsize = NULL;
	}
      bck->fd = p;
      fwd->bk = p;

      set_head(p, size | PREV_INUSE);
      set_foot(p, size);

      check_free_chunk(av, p);
    }

    /*
      If the chunk borders the current high end of memory,
      consolidate into top
    */

    else {
      size += nextsize;
      set_head(p, size | PREV_INUSE);
      av->top = p;
      check_chunk(av, p);
    }

    /*
      If freeing a large space, consolidate possibly-surrounding
      chunks. Then, if the total unused topmost memory exceeds trim
      threshold, ask malloc_trim to reduce top.

      Unless max_fast is 0, we don't know if there are fastbins
      bordering top, so we cannot tell for sure whether threshold
      has been reached unless fastbins are consolidated.  But we
      don't want to consolidate on each free.  As a compromise,
      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
      is reached.
    */

    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
      if (have_fastchunks(av))
	malloc_consolidate(av);

      if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
	if ((unsigned long)(chunksize(av->top)) >=
	    (unsigned long)(mp_.trim_threshold))
	  systrim(mp_.top_pad, av);
#endif
      } else {
	/* Always try heap_trim(), even if the top chunk is not
	   large, because the corresponding heap might go away.  */
	heap_info *heap = heap_for_ptr(top(av));

	assert(heap->ar_ptr == av);
	heap_trim(heap, mp_.top_pad);
      }
    }

    if (! have_lock) {
      assert (locked);
      (void)mutex_unlock(&av->mutex);
    }
  }
  /*
    If the chunk was allocated via mmap, release via munmap().
  */

  else {
    munmap_chunk (p);
  }
}
           

It first computes the size of the chunk being freed, then looks for its slot in the fast bins; since each bin corresponds to a size, the right list can be found from the size alone. After that, the main work is examining the adjacent chunks for consolidation opportunities.

将合并後的chunk加入unsorted_bin的雙向循環連結清單中,同時調用malloc_consolidate()函數合并 fast bins 中的空閑 chunk 到 unsorted bin 中。

As for why free does not need to be told the size of the block, I wrote a short article earlier that covers the point: https://zhuanlan.zhihu.com/p/145953980.

That wraps up this study of glibc's malloc/free. The details could be pushed much deeper, and time permitting I will return to them, but the overall design of malloc should now be reasonably clear.

References: https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/

https://blog.csdn.net/maokelong95/article/details/52006379