
Open vSwitch Series: A Deep Dive into the ofpbuf Data Structure

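The previous post covered hmap, a cornerstone structure in Open vSwitch that many of its other data structures depend on. This post looks at ofpbuf. As the name suggests, it is a buffer for holding data, for example OpenFlow messages being sent or received.

Let's start with its definition. (Some of my notes are written directly in the code comments.)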

/* Buffer for holding arbitrary data.  An ofpbuf is automatically reallocated
 * as necessary if it grows too large for the available memory.
 *
 * 'frame' and offset conventions:
 *
 * Network frames (aka "packets"): 'frame' MUST be set to the start of the
 *    packet, layer offsets MAY be set as appropriate for the packet.
 *    Additionally, we assume in many places that the 'frame' and 'data' are
 *    the same for packets.
 *
 * OpenFlow messages: 'frame' points to the start of the OpenFlow
 *    header, while 'l3_ofs' is the length of the OpenFlow header.
 *    When parsing, the 'data' will move past these, as data is being
 *    pulled from the OpenFlow message.
 *
 * Actions: When encoding OVS action lists, the 'frame' is used
 *    as a pointer to the beginning of the current action (see ofpact_put()).
 *
 * rconn: Reuses 'frame' as a private pointer while queuing.
 */           


struct ofpbuf { // Contains a conditional-compilation block; for simplicity, assume the DPDK_NETDEV macro is not defined (plenty of DPDK material is available online).

#ifdef DPDK_NETDEV
    struct rte_mbuf mbuf;       /* DPDK mbuf */
#else
    void *base_;                 /* First byte of allocated space. Start of the allocated memory; this is the pointer passed to free() on release. */
    void *data_;                 /* First byte actually in use. Start of the data currently in use; initially base_ and data_ are equal. */
    uint32_t size_;              /* Number of bytes in use. With no headroom, size_ == allocated means the memory is used up. */
#endif
    uint32_t allocated;         /* Number of bytes allocated. Size of the memory block obtained from the system. */
    void *frame;                /* Packet frame start, or NULL. See the comment above for this field. */
    uint16_t l2_5_ofs;          /* MPLS label stack offset from 'frame', or
                                 * UINT16_MAX. Layer 2.5 offset. */
    uint16_t l3_ofs;            /* Network-level header offset from 'frame',
                                   or UINT16_MAX. Layer 3 (network) offset. */
    uint16_t l4_ofs;            /* Transport-level header offset from 'frame',
                                   or UINT16_MAX. Layer 4 (transport) offset. */
    enum ofpbuf_source source;  /* Source of memory allocated as 'base'. Tells whether the memory came from the heap or the stack; used mainly when freeing. Takes values of enum ofpbuf_source. */
    struct list list_node;      /* Private list element for use by owner. List node used to link several ofpbufs together. */
};
// Enumeration of memory sources
enum OVS_PACKED_ENUM ofpbuf_source {
    OFPBUF_MALLOC,              /* Obtained via malloc(). */
    OFPBUF_STACK,               /* Un-movable stack space or static buffer. */
    OFPBUF_STUB,                /* Starts on stack, may expand into heap. */
    OFPBUF_DPDK,                /* buffer data is from DPDK allocated memory.
                                   ref to build_ofpbuf() in netdev-dpdk. */
};           


One possible memory layout is shown below:

[Figure: example ofpbuf memory layout]

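The figure shows a 16-byte allocation: the gray part is reserved headroom (4 bytes), the blue part is data in use (5 bytes), and the white part is the remaining free space (7 bytes).

The structure itself is fairly simple, so let's look at the main functions. As the code comments say, an ofpbuf grows its memory automatically, so it can be thought of as a simple memory pool. To understand ofpbuf in depth, we analyze an example that allocates from the heap (test-sflow.c), because the other source types start out on memory that does not need to be freed. The example looks like this: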
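
The arithmetic behind that 16-byte layout can be checked with a small sketch. The `mini_buf` type and the `mini_*` helpers are hypothetical stand-ins written for this illustration, not part of Open vSwitch; they keep only the four fields needed to compute headroom and tailroom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct ofpbuf (not the OVS type). */
struct mini_buf {
    void *base;          /* first byte of the allocation */
    void *data;          /* first byte in use */
    uint32_t size;       /* bytes in use */
    uint32_t allocated;  /* bytes allocated */
};

/* Free bytes before the data: data - base. */
static size_t mini_headroom(const struct mini_buf *b)
{
    return (size_t) ((char *) b->data - (char *) b->base);
}

/* Free bytes after the data: (base + allocated) - (data + size). */
static size_t mini_tailroom(const struct mini_buf *b)
{
    char *end = (char *) b->base + b->allocated;
    char *tail = (char *) b->data + b->size;

    assert(tail <= end);
    return (size_t) (end - tail);
}

/* Builds the layout from the figure over caller-provided 16-byte storage:
 * 4 bytes of headroom, 5 bytes in use, the rest free. */
static struct mini_buf mini_figure_layout(char storage[16])
{
    struct mini_buf b;

    b.base = storage;
    b.data = storage + 4;
    b.size = 5;
    b.allocated = 16;
    return b;
}
```

Calling `mini_figure_layout()` on a 16-byte array and then `mini_headroom()` and `mini_tailroom()` gives 4 and 7, and headroom + size + tailroom always adds up to `allocated`.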

static void
test_sflow_main(int argc, char *argv[])
{
....
struct ofpbuf buf;
....
ofpbuf_init(&buf, MAX_RECV);
for (;;) {
    int retval;
    unixctl_server_run(server);
    ofpbuf_clear(&buf);
    do {
        retval = read(sock, ofpbuf_data(&buf), buf.allocated);
    } while (retval < 0 && errno == EINTR);
    if (retval > 0) {
       ofpbuf_put_uninit(&buf, retval);
       print_sflow(&buf);
       fflush(stdout);
    }
    if (exiting) {
        break;
    }
    poll_fd_wait(sock, POLLIN);
    unixctl_server_wait(server);
    poll_block();
  }//for exit
}           


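1. Initializing the ofpbuf structure

We first declare a local variable, then pass its address and the requested size to ofpbuf_init. (A nice property of the Open vSwitch code is that every function is small; with some patience it can all be understood.) The call chain looks like this: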

[Figure: ofpbuf_init call chain]

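Initialization passes through ofpbuf_init, ofpbuf_use, ofpbuf_use__, and ofpbuf_init__, in that order. (A trailing double underscore in a function name marks a static function.) Let's look at each implementation.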

static void
ofpbuf_init__(struct ofpbuf *b, size_t allocated, enum ofpbuf_source source)
{
    b->allocated = allocated; // size of the allocated memory block
    b->source = source; // memory source; in this example it is OFPBUF_MALLOC
    b->frame = NULL;
    b->l2_5_ofs = b->l3_ofs = b->l4_ofs = UINT16_MAX;
    list_poison(&b->list_node);
}
static void
ofpbuf_use__(struct ofpbuf *b, void *base, size_t allocated,
             enum ofpbuf_source source)
{
    ofpbuf_set_base(b, base); // set base_
    ofpbuf_set_data(b, base); // set data_; both start at the beginning of the block, but data_ may move later while base_ only changes on reallocation
    ofpbuf_set_size(b, 0); // bytes in use, initially 0

    ofpbuf_init__(b, allocated, source);
}
/* Initializes 'b' as an empty ofpbuf that contains the 'allocated' bytes of
* memory starting at 'base'.  'base' should be the first byte of a region
* obtained from malloc().  It will be freed (with free()) if 'b' is resized or
* freed. */
void
ofpbuf_use(struct ofpbuf *b, void *base, size_t allocated)
{
    ofpbuf_use__(b, base, allocated, OFPBUF_MALLOC); // memory source is OFPBUF_MALLOC
}
/* Initializes 'b' as an empty ofpbuf with an initial capacity of 'size'
* bytes. */
void
ofpbuf_init(struct ofpbuf *b, size_t size)
{
    ofpbuf_use(b, size ? xmalloc(size) : NULL, size);
}           

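
That is the whole initialization flow; the logic is straightforward. Next comes the put operation, which appends data at the tail and grows the memory when needed. Before that, here are four small utility functions: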

/* Returns the byte following the last byte of data in use in 'b'.
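* Returns the first address where new data can be stored. For the figure above, that is (0x832200C + 5).
*/
static inline void *ofpbuf_tail(const struct ofpbuf *b)
{
    return (char *) ofpbuf_data(b) + ofpbuf_size(b); /* data_ points to the start of the data, i.e. where the blue region begins */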
}
/* Returns the byte following the last byte allocated for use (but not
* necessarily in use) by 'b'.
* Returns the address one past the last allocated byte. For the figure above, that is (0x8322008 + 16).
*/
static inline void *ofpbuf_end(const struct ofpbuf *b)
{
    return (char *) ofpbuf_base(b) + b->allocated; /* base_ points to the start of the allocated region */
}
/* Returns the number of bytes of headroom in 'b', that is, the number of bytes
* of unused space in ofpbuf 'b' before the data that is in use.  (Most
* commonly, the data in a ofpbuf is at its beginning, and thus the ofpbuf's
* headroom is 0.)
* Size of the free space at the head: simply data_ - base_.
*/
static inline size_t ofpbuf_headroom(const struct ofpbuf *b)
{
    return (char*)ofpbuf_data(b) - (char*)ofpbuf_base(b);
}
/* Returns the number of bytes that may be appended to the tail end of ofpbuf
* 'b' before the ofpbuf must be reallocated.
* Size of the free space at the tail.
*/
static inline size_t ofpbuf_tailroom(const struct ofpbuf *b)
{
    return (char*)ofpbuf_end(b) - (char*)ofpbuf_tail(b);
}           


Here are the functions that actually append to the buffer:

/* Appends 'size' bytes of data to the tail end of 'b', reallocating and
* copying its data if necessary.  Returns a pointer to the first byte of the
* new data, which is left uninitialized.
* Extends the data in use by 'size' bytes without initializing them.
*/
void *
ofpbuf_put_uninit(struct ofpbuf *b, size_t size)
{
    void *p;
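    ofpbuf_prealloc_tailroom(b, size); /* make sure the tail has room, growing the memory if needed */
    p = ofpbuf_tail(b); /* after any growth, remember the first free address */
    ofpbuf_set_size(b, ofpbuf_size(b) + size); /* update the number of bytes in use */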
    return p;
}
/* Appends 'size' zeroed bytes to the tail end of 'b'.  Data in 'b' is
* reallocated and copied if necessary.  Returns a pointer to the first byte of
* the data's location in the ofpbuf.
* Extends the data in use by 'size' bytes, initialized to 0.
*/
void *
ofpbuf_put_zeros(struct ofpbuf *b, size_t size)
{
    void *dst = ofpbuf_put_uninit(b, size);
    memset(dst, 0, size);
    return dst;
}
/* Appends the 'size' bytes of data in 'p' to the tail end of 'b'.  Data in 'b'
* is reallocated and copied if necessary.  Returns a pointer to the first
* byte of the data's location in the ofpbuf.
* Extends the data in use by 'size' bytes, initialized from 'p'.
*/
void *
ofpbuf_put(struct ofpbuf *b, const void *p, size_t size)
{
    void *dst = ofpbuf_put_uninit(b, size);
    memcpy(dst, p, size);
    return dst;
}           

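
All three functions do similar work: they add 'size' bytes to the data held by the ofpbuf b. ofpbuf_put_uninit is called by the other two, so let's analyze it.

ofpbuf_prealloc_tailroom grows the memory at the tail. Its logic is also simple: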

/* Returns the number of bytes that may be appended to the tail end of ofpbuf
* 'b' before the ofpbuf must be reallocated.
* Returns the free space at the tail, i.e. the white area in the figure above.
*/
static inline size_t ofpbuf_tailroom(const struct ofpbuf *b)
{
    return (char*)ofpbuf_end(b) - (char*)ofpbuf_tail(b);
}
/* Reallocates 'b' so that it has exactly 'new_headroom' and 'new_tailroom'
* bytes of headroom and tailroom, respectively.
* The memory growth function. We only look at the OFPBUF_MALLOC case here.
*/
static void
ofpbuf_resize__(struct ofpbuf *b, size_t new_headroom, size_t new_tailroom)
{
    void *new_base, *new_data;
    size_t new_allocated;
    new_allocated = new_headroom + ofpbuf_size(b) + new_tailroom;
    switch (b->source) {
    case OFPBUF_DPDK:
        OVS_NOT_REACHED();
    case OFPBUF_MALLOC:
        if (new_headroom == ofpbuf_headroom(b)) { // headroom unchanged: call realloc
            new_base = xrealloc(ofpbuf_base(b), new_allocated);
        } else {
            new_base = xmalloc(new_allocated); // headroom changes: call malloc for a new block; the ofpbuf fields are updated below
            ofpbuf_copy__(b, new_base, new_headroom, new_tailroom); /* copy the data into the new block, taking the new headroom and the bytes in use into account */
            free(ofpbuf_base(b));
        }
        break;
    case OFPBUF_STACK:
        OVS_NOT_REACHED();
    case OFPBUF_STUB:
        b->source = OFPBUF_MALLOC;
        new_base = xmalloc(new_allocated);
        ofpbuf_copy__(b, new_base, new_headroom, new_tailroom);
        break;
    default:
        OVS_NOT_REACHED();
    }
    // update allocated and the base_ pointer
    b->allocated = new_allocated;
    ofpbuf_set_base(b, new_base);
    // update the data_ pointer
    new_data = (char *) new_base + new_headroom;
    if (ofpbuf_data(b) != new_data) {
        if (b->frame) {
            uintptr_t data_delta = (char *) new_data - (char *) ofpbuf_data(b);
            b->frame = (char *) b->frame + data_delta;
        }
        ofpbuf_set_data(b, new_data);
    }
}
/* Ensures that 'b' has room for at least 'size' bytes at its tail end,
* reallocating and copying its data if necessary.  Its headroom, if any, is
* preserved.
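* Grows the tail. It first checks whether the free space at the tail is enough; if 'size' exceeds it, the memory must be reallocated.
* To limit fragmentation and repeated allocations, each growth asks for at least 64 bytes of tailroom.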
*/
void
ofpbuf_prealloc_tailroom(struct ofpbuf *b, size_t size)
{
    if (size > ofpbuf_tailroom(b)) {
        ofpbuf_resize__(b, ofpbuf_headroom(b), MAX(size, 64));
    }
}           
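
To make the growth rule concrete, here is a minimal sketch of the same idea outside Open vSwitch. The `grow_buf` type and the `grow_*` functions are hypothetical names invented for this illustration. The sketch assumes a heap-backed buffer only and uses plain `realloc()`, aborting on failure in place of OVS's `xrealloc()`. It keeps the headroom unchanged and asks for `max(size, 64)` bytes of new tailroom, as `ofpbuf_prealloc_tailroom()` does above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-only buffer; not the OVS struct ofpbuf. */
struct grow_buf {
    char *base;        /* start of the allocation */
    char *data;        /* start of the bytes in use */
    size_t size;       /* bytes in use */
    size_t allocated;  /* bytes allocated */
};

static size_t grow_headroom(const struct grow_buf *b)
{
    return (size_t) (b->data - b->base);
}

static size_t grow_tailroom(const struct grow_buf *b)
{
    return b->allocated - grow_headroom(b) - b->size;
}

/* Allocates 'capacity' bytes and reserves 'headroom' of them at the front. */
static void grow_init(struct grow_buf *b, size_t capacity, size_t headroom)
{
    assert(capacity > 0 && headroom <= capacity);
    b->base = malloc(capacity);
    if (!b->base) {
        abort();
    }
    b->data = b->base + headroom;
    b->size = 0;
    b->allocated = capacity;
}

/* Ensures at least 'need' bytes of tailroom, preserving the headroom.
 * When it has to grow, the new tailroom is max(need, 64). */
static void grow_prealloc_tailroom(struct grow_buf *b, size_t need)
{
    if (need > grow_tailroom(b)) {
        size_t headroom = grow_headroom(b);
        size_t new_tailroom = need > 64 ? need : 64;
        size_t new_allocated = headroom + b->size + new_tailroom;
        char *new_base = realloc(b->base, new_allocated);

        if (!new_base) {
            abort();
        }
        b->base = new_base;
        b->data = new_base + headroom;   /* the block may have moved */
        b->allocated = new_allocated;
    }
}

/* Appends 'n' bytes from 'p' and returns where they were stored. */
static void *grow_put(struct grow_buf *b, const void *p, size_t n)
{
    char *dst;

    grow_prealloc_tailroom(b, n);
    dst = b->data + b->size;
    memcpy(dst, p, n);
    b->size += n;
    return dst;
}
```

Starting from the layout in the figure (16 bytes allocated, 4 bytes of headroom, 5 bytes in use), appending 10 bytes does not fit into the 7 free bytes, so the buffer is reallocated to 4 + 5 + 64 = 73 bytes. Because `realloc()` may move the block, any pointer returned by an earlier `grow_put()` should be treated as invalid after a later append.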

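
The counterpart of pull is push, which extends the data at the head; we won't discuss it here. ofpbuf_pull enlarges the gray region: the start of the blue region moves forward by 'size' bytes, and the blue region shrinks by the same amount.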

/* Removes 'size' bytes from the head end of 'b', which must contain at least
* 'size' bytes of data.  Returns the first byte of data removed. */
static inline void *ofpbuf_pull(struct ofpbuf *b, size_t size)
{
    void *data = ofpbuf_data(b);
    ovs_assert(ofpbuf_size(b) >= size);
    ofpbuf_set_data(b, (char*)ofpbuf_data(b) + size);
    ofpbuf_set_size(b, ofpbuf_size(b) - size);
    return data;
}           
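
A typical use of pull is peeling a fixed-size header off the front of a message. The sketch below illustrates that pattern with hypothetical names: `pull_buf`, `pull_bytes()`, and the 4-byte `demo_header` layout (type, flags, big-endian length) are assumptions made for this example and do not correspond to any Open vSwitch or OpenFlow definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only view over a message; not the OVS struct ofpbuf. */
struct pull_buf {
    const uint8_t *base;   /* start of the message storage */
    const uint8_t *data;   /* first unread byte */
    size_t size;           /* unread bytes remaining */
};

/* Same idea as ofpbuf_pull(): advance 'data' by 'n', shrink 'size' by 'n',
 * and return the old start of the data. */
static const void *pull_bytes(struct pull_buf *b, size_t n)
{
    const void *old = b->data;

    assert(b->size >= n);
    b->data += n;
    b->size -= n;
    return old;
}

/* Hypothetical header: type (1 byte), flags (1 byte), length (2 bytes,
 * big-endian). */
struct demo_header {
    uint8_t type;
    uint8_t flags;
    uint16_t length;
};

/* Consumes the 4 header bytes and decodes them field by field. */
static struct demo_header pull_demo_header(struct pull_buf *b)
{
    const uint8_t *raw = pull_bytes(b, 4);
    struct demo_header h;

    h.type = raw[0];
    h.flags = raw[1];
    h.length = (uint16_t) ((raw[2] << 8) | raw[3]);
    return h;
}
```

After `pull_demo_header()` returns, `data` points at the payload and `data - base` (the headroom) has grown by 4, which is the "gray region grows, blue region shrinks" effect described above.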


Finally, let's look at the release function, which is also very simple.

/* Frees memory that 'b' points to. */
void
ofpbuf_uninit(struct ofpbuf *b)
{
    if (b) {
        if (b->source == OFPBUF_MALLOC) {
            free(ofpbuf_base(b));
        }
        ovs_assert(b->source != OFPBUF_DPDK);
    }
}           
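
The reason the release function checks `source` becomes clearer with a buffer that starts on the stack. The following sketch is a simplified illustration, not the OVS implementation: `stub_buf`, `SRC_MALLOC`, `SRC_STUB`, and the `stub_*` functions are hypothetical names, and headroom is left out to keep it short. It mimics the OFPBUF_STUB branch of `ofpbuf_resize__()`: the first growth copies the data to the heap and flips the source to the malloc kind, after which the release function has something to free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counterparts of OFPBUF_MALLOC and OFPBUF_STUB. */
enum stub_source { SRC_MALLOC, SRC_STUB };

/* Hypothetical buffer without headroom; not the OVS struct ofpbuf. */
struct stub_buf {
    char *base;
    size_t size;
    size_t allocated;
    enum stub_source source;
};

/* Starts the buffer on caller-owned storage, e.g. a stack array. */
static void stub_use(struct stub_buf *b, char *stub, size_t n)
{
    b->base = stub;
    b->size = 0;
    b->allocated = n;
    b->source = SRC_STUB;
}

/* Appends 'n' bytes, moving from the stub to the heap on first growth. */
static void stub_put(struct stub_buf *b, const void *p, size_t n)
{
    if (n > b->allocated - b->size) {
        size_t new_allocated = b->size + (n > 64 ? n : 64);
        char *new_base;

        if (b->source == SRC_MALLOC) {
            new_base = realloc(b->base, new_allocated);
            if (!new_base) {
                abort();
            }
        } else {
            /* The stub is not heap memory: copy out and never free it. */
            new_base = malloc(new_allocated);
            if (!new_base) {
                abort();
            }
            memcpy(new_base, b->base, b->size);
            b->source = SRC_MALLOC;
        }
        b->base = new_base;
        b->allocated = new_allocated;
    }
    memcpy(b->base + b->size, p, n);
    b->size += n;
}

/* Frees only memory the buffer itself obtained from the heap. */
static void stub_uninit(struct stub_buf *b)
{
    if (b && b->source == SRC_MALLOC) {
        free(b->base);
    }
}
```

With an 8-byte stack array, a first 4-byte append stays on the stack, and a second 8-byte append moves the data to a 68-byte heap block (4 bytes in use plus 64 bytes of tailroom). Calling `stub_uninit()` is safe in both states, which is why callers can always release the buffer without tracking where its memory currently lives.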

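
That covers everything this post set out to introduce. ofpbuf is relatively simple. Next we will analyze the session-related data structures in Open vSwitch, such as struct connmgr, struct ofconn, and struct ofproto. These belong to the Open vSwitch management layer and are very important for learning Open vSwitch.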