
How epoll Monitors Multiple Descriptors and Gets Notified (2)

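Author: [email protected]

Blog: blog.focus-linux.net   linuxfocus.blog.chinaunix.net

Weibo: weibo.com/glinuxer

QQ tech group: 4367710

The copyleft of this article belongs to [email protected]. It is released under the GPL and may be freely copied and reposted. When reposting, please keep the document complete and credit the original author and the original link. Commercial use of any kind is strictly prohibited.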

========================================================================================================

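The previous article described how epoll adds itself to the wait queue of every monitored descriptor. That is only the first step. As also mentioned last time, epoll is in fact a blocking operation too; it simply monitors multiple file descriptors at once. Let's now look at the implementation of epoll_wait->ep_poll.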

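Since epoll blocks, it necessarily needs a wait queue. It cannot use the wait queues of the monitored file descriptors for this, though. An epoll instance is itself a file, provided by a virtual file system inside the kernel, so it has a wait queue of its own, and the return value of epoll_create is a file descriptor as well. On Unix, everything is a file, after all.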
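The claim that an epoll instance is just another file descriptor can be checked from user space. The following is a minimal sketch, assuming Linux; the helper name epoll_fd_is_pollable is made up for this illustration. It watches a pipe with epoll, writes one byte to the pipe, and then hands the epoll descriptor itself to poll(2).

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if poll(2) reports the epoll descriptor
 * itself as readable once a watched pipe has data, 0 otherwise. */
int epoll_fd_is_pollable(void)
{
    int pipefd[2];
    int epfd = epoll_create1(0);
    int rc = pipe(pipefd);
    assert(epfd >= 0 && rc == 0);

    /* Watch the read end of the pipe for incoming data. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
    assert(rc == 0);

    ssize_t written = write(pipefd[1], "x", 1);
    assert(written == 1);

    /* The epoll descriptor is an ordinary fd, so poll(2) accepts it. */
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    int n = poll(&pfd, 1, 1000);
    int readable = (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;

    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
    return readable;
}
```

Calling epoll_fd_is_pollable() from a small main is enough to try it. The same property is what allows one epoll instance to be registered inside another.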

The relevant part of ep_poll looks like this:

        init_waitqueue_entry(&wait, current);
        __add_wait_queue_exclusive(&ep->wq, &wait);

        for (;;) {
            /*
             * We don't want to sleep if the ep_poll_callback() sends us
             * a wakeup in between. That's why we set the task state
             * to TASK_INTERRUPTIBLE before doing the checks.
             */
            set_current_state(TASK_INTERRUPTIBLE);
            if (ep_events_available(ep) || timed_out)
                break;
            if (signal_pending(current)) {
                res = -EINTR;
                break;
            }

            spin_unlock_irqrestore(&ep->lock, flags);
            if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
                timed_out = 1;

            spin_lock_irqsave(&ep->lock, flags);
        }
        __remove_wait_queue(&ep->wq, &wait);
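The loop has three ways out: events are available, the timeout expires, or a signal is pending. These can be observed from user space as the three possible results of epoll_wait. The following is a rough sketch, assuming Linux; the enum scenario, run_epoll_wait, and on_alarm names are invented for this illustration, and a 50 ms interval timer stands in for "some signal arrives while we sleep".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* expose sigaction/setitimer under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signo) { (void)signo; }   /* only needs to interrupt */

enum scenario { DATA_READY, NOTHING_HAPPENS, SIGNAL_ARRIVES };

/* Hypothetical helper: runs one epoll_wait() call under the given scenario
 * and returns the number of ready events, 0 on timeout, or -errno. */
int run_epoll_wait(enum scenario s)
{
    int pipefd[2];
    int timeout_ms = 100;
    int epfd = epoll_create1(0);
    int rc = pipe(pipefd);
    assert(epfd >= 0 && rc == 0);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
    assert(rc == 0);

    if (s == DATA_READY) {
        /* Events are available: the loop exits on its first check. */
        ssize_t written = write(pipefd[1], "x", 1);
        assert(written == 1);
    } else if (s == SIGNAL_ARRIVES) {
        /* A signal after 50 ms makes the sleeping call fail with EINTR. */
        struct sigaction sa = { 0 };
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);
        struct itimerval it = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
        setitimer(ITIMER_REAL, &it, NULL);
        timeout_ms = 2000;
    }
    /* NOTHING_HAPPENS: no data and no signal, so the timeout expires. */

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, timeout_ms);
    int result = (n < 0) ? -errno : n;

    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
    return result;
}
```

On a typical Linux system the three scenarios should yield 1, 0, and -EINTR respectively, matching the `break` on available events, the `timed_out` flag, and the `res = -EINTR` branch.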

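In the ep_poll excerpt above, epoll_wait adds the current process to epoll's own wait queue. That raises a question: the previous article said epoll had already placed a wait queue entry on the wait queue of each monitored descriptor, and now epoll has a wait queue of its own as well. Why?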

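To answer this, we need to jump back to ep_ptable_queue_proc (if you don't remember this function, see the previous article). It calls init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);, which sets the callback of the wait queue entry that epoll places on the monitored descriptor to ep_poll_callback. By contrast, the init_waitqueue_entry function called in ep_poll sets the entry's callback to default_wake_function.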

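So when a monitored file descriptor performs a wakeup, for example when a socket receives data, the call chain sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll->.... eventually runs the callback of each wait queue entry. For epoll, that callback is ep_poll_callback.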

static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    int pwake = 0;
    unsigned long flags;
    struct epitem *epi = ep_item_from_wait(wait);
    struct eventpoll *ep = epi->ep;

    spin_lock_irqsave(&ep->lock, flags);

    /*
     * If the event mask does not contain any poll(2) event, we consider the
     * descriptor to be disabled. This condition is likely the effect of the
     * EPOLLONESHOT bit that disables the descriptor when an event is received,
     * until the next EPOLL_CTL_MOD will be issued.
     */
    if (!(epi->event.events & ~EP_PRIVATE_BITS))
        goto out_unlock;

    /*
     * Check the events coming with the callback. At this stage, not
     * every device reports the events in the "key" parameter of the
     * callback. We need to be able to handle both cases here, hence the
     * test for "key" != NULL before the event match test.
     */
    if (key && !((unsigned long) key & epi->event.events))
        goto out_unlock;

    /*
     * If we are transferring events to userspace, we can hold no locks
     * (because we're accessing user memory, and because of linux f_op->poll()
     * semantics). All the events that happen during that period of time are
     * chained in ep->ovflist and requeued later on.
     */
    if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
        if (epi->next == EP_UNACTIVE_PTR) {
            epi->next = ep->ovflist;
            ep->ovflist = epi;
        }
        goto out_unlock;
    }

    /* If this file is already in the ready list we exit soon */
    if (!ep_is_linked(&epi->rdllink))
        list_add_tail(&epi->rdllink, &ep->rdllist);

    /*
     * Wake up ( if active ) both the eventpoll wait list and the ->poll()
     * wait list.
     */
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    if (waitqueue_active(&ep->poll_wait))
        pwake++;

out_unlock:
    spin_unlock_irqrestore(&ep->lock, flags);

    /* We have to call this outside the lock */
    if (pwake)
        ep_poll_safewake(&ep->poll_wait);

    return 1;
}
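The first check in ep_poll_callback, the one that treats a descriptor with an empty event mask as disabled, has a visible effect in user space through EPOLLONESHOT. Here is a small sketch of that behavior, assuming Linux; the function name oneshot_demo and the 100 ms timeouts are choices made for this illustration only.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers a pipe with EPOLLONESHOT and records how
 * many events three successive epoll_wait() calls return: after a first
 * write, after a second write without re-arming, and after EPOLL_CTL_MOD. */
void oneshot_demo(int counts[3])
{
    int pipefd[2];
    int epfd = epoll_create1(0);
    int rc = pipe(pipefd);
    assert(epfd >= 0 && rc == 0);

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.fd = pipefd[0] };
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
    assert(rc == 0);

    struct epoll_event out;

    /* First write: the event is delivered once, then the item is disabled. */
    ssize_t written = write(pipefd[1], "a", 1);
    assert(written == 1);
    counts[0] = epoll_wait(epfd, &out, 1, 100);

    /* Second write: the callback still runs, but the mask is empty. */
    written = write(pipefd[1], "b", 1);
    assert(written == 1);
    counts[1] = epoll_wait(epfd, &out, 1, 100);

    /* EPOLL_CTL_MOD re-arms the descriptor; the unread data is reported. */
    rc = epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev);
    assert(rc == 0);
    counts[2] = epoll_wait(epfd, &out, 1, 100);

    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
}
```

The middle call is the interesting one: data did arrive on the pipe, yet nothing reaches the ready list until the descriptor is re-armed.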

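The comments in this function are quite clear and make the purpose of each part of the code easy to follow. In particular, the two lines

if (waitqueue_active(&ep->wq))
    wake_up_locked(&ep->wq);

check whether any entry is waiting on epoll's own wait queue and, if so, wake it up. From the point of view of an epoll user: if the user-space process is blocked in epoll_wait, ep->wq is guaranteed to be non-empty, so the process is woken up and moved to the run queue.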
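Putting the pieces together, the reason for the two levels of wait queues can be modeled in a few dozen lines of plain C. This is a deliberately simplified user-space toy, not kernel code: every type and function name below (wait_entry, toy_epoll, toy_default_wake, toy_epoll_callback, simulate_socket_wakeup) is invented for the sketch, there is no locking, and the ready list is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *entry);

/* Toy wait queue node: a callback plus whoever stands behind the entry. */
struct wait_entry {
    wake_fn func;
    void *owner;
    struct wait_entry *next;
};

struct wait_queue { struct wait_entry *head; };

struct toy_task { int runnable; };

struct toy_epoll {
    struct wait_queue wq;   /* tasks blocked in the toy epoll_wait */
    int ready_count;        /* stands in for the ready list */
};

static void queue_add(struct wait_queue *q, struct wait_entry *e,
                      wake_fn func, void *owner)
{
    assert(q != NULL && e != NULL && func != NULL);
    e->func = func;
    e->owner = owner;
    e->next = q->head;
    q->head = e;
}

/* Walk the queue and run every entry's callback, as a wakeup would. */
static void queue_wake(struct wait_queue *q)
{
    for (struct wait_entry *e = q->head; e != NULL; e = e->next)
        e->func(e);
}

/* Plays the role of default_wake_function: make the task runnable. */
static int toy_default_wake(struct wait_entry *e)
{
    ((struct toy_task *)e->owner)->runnable = 1;
    return 1;
}

/* Plays the role of ep_poll_callback: record readiness, then wake
 * whoever sleeps on the epoll instance's own queue. */
static int toy_epoll_callback(struct wait_entry *e)
{
    struct toy_epoll *ep = e->owner;
    ep->ready_count++;
    if (ep->wq.head != NULL)
        queue_wake(&ep->wq);
    return 1;
}

/* Wires one task to a toy epoll watching n_sockets sockets, fires a wakeup
 * on socket `fired`, and reports whether the task became runnable. */
int simulate_socket_wakeup(int n_sockets, int fired, int *ready_count)
{
    enum { MAX_SOCKETS = 8 };
    struct wait_queue socket_wq[MAX_SOCKETS] = { { NULL } };
    struct wait_entry socket_entry[MAX_SOCKETS];
    struct wait_entry task_entry;
    struct toy_task task = { 0 };
    struct toy_epoll ep = { { NULL }, 0 };

    assert(n_sockets > 0 && n_sockets <= MAX_SOCKETS);
    assert(fired >= 0 && fired < n_sockets);

    /* epoll_ctl step: one callback entry per watched socket. */
    for (int i = 0; i < n_sockets; i++)
        queue_add(&socket_wq[i], &socket_entry[i], toy_epoll_callback, &ep);

    /* epoll_wait step: the task sleeps on the epoll queue only. */
    queue_add(&ep.wq, &task_entry, toy_default_wake, &task);

    queue_wake(&socket_wq[fired]);   /* data arrives on one socket */

    *ready_count = ep.ready_count;
    return task.runnable;
}
```

In this model the task is linked into exactly one queue no matter how many sockets are watched, while each socket's queue holds only a small callback entry. That split is the design point of the two wait queues: the per-descriptor entries collect readiness, and the single epoll queue is where the process actually sleeps.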

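These two articles have broadly sorted out how epoll monitors multiple descriptors and how it gets notified. On the monitoring side, what is still missing is epoll's internal structure: how it stores the descriptors, how it maintains its bookkeeping, and so on. Plenty of articles on that already exist online, though. Perhaps I will write another article or two on the topic later.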
