How epoll monitors multiple descriptors and gets notified (2)

Author: [email protected]

Blog: blog.focus-linux.net linuxfocus.blog.chinaunix.net

Weibo: weibo.com/glinuxer

QQ tech group: 4367710

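The copyleft of this article belongs to [email protected]. It is released under the GPL and may be freely copied and reposted, provided that the document is kept complete, the original author and the original link are credited, and it is not used for any commercial purpose.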

========================================================================================================

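The previous article described how epoll hooks itself into the wait queue of every monitored descriptor. That is only the first step. As also mentioned last time, epoll_wait is in fact a blocking operation too; it simply monitors multiple file descriptors at the same time. Let's look at the implementation of epoll_wait->ep_poll.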

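Since epoll_wait blocks, it necessarily needs a wait queue for the sleeping process. It cannot use the wait queues of the monitored file descriptors for this, so it uses one of its own: epoll is itself a virtual file system, and the return value of epoll_create is also a file descriptor. Under Unix, everything is a file, after all.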
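The claim that the epoll descriptor is an ordinary file descriptor can be checked from user space. The following is a minimal sketch, assuming Linux with glibc; the helper names make_epoll_on_pipe and epoll_fd_readable are made up for this illustration and are not kernel or libc functions. It passes the epoll descriptor itself to poll(2), which reports it readable once a monitored descriptor has a pending event.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a pipe and an epoll instance that watches
 * the pipe's read end. Returns the epoll fd, or -1 on error.
 */
static int make_epoll_on_pipe(int pipefd[2])
{
    struct epoll_event ev = {0};
    int epfd;

    assert(pipefd != NULL);
    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
        close(epfd);
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    return epfd;
}

/*
 * Hypothetical helper: ask poll(2) whether the epoll fd itself is readable.
 * Returns 1 if POLLIN is reported, 0 if not, -1 on error.
 */
static int epoll_fd_readable(int epfd)
{
    struct pollfd pfd;
    int n;

    pfd.fd = epfd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, 0); /* timeout 0: only check, never block */
    if (n < 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Before anything is written to the pipe, epoll_fd_readable is expected to return 0; after a write to the pipe's write end it should return 1. Being pollable itself is also what allows one epoll descriptor to be registered inside another.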

So the relevant code in ep_poll looks like this:

    init_waitqueue_entry(&wait, current);
    __add_wait_queue_exclusive(&ep->wq, &wait);

    for (;;) {
        /*
         * We don't want to sleep if the ep_poll_callback() sends us
         * a wakeup in between. That's why we set the task state
         * to TASK_INTERRUPTIBLE before doing the checks.
         */
        set_current_state(TASK_INTERRUPTIBLE);
        if (ep_events_available(ep) || timed_out)
            break;
        if (signal_pending(current)) {
            res = -EINTR;
            break;
        }

        spin_unlock_irqrestore(&ep->lock, flags);
        if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
            timed_out = 1;

        spin_lock_irqsave(&ep->lock, flags);
    }
    __remove_wait_queue(&ep->wq, &wait);
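Seen from user space, the loop above has three exits: events are available, the timeout expires, or a signal is pending. The sketch below exercises the first two through the public epoll API. It assumes Linux with glibc, and the helper names watch_readable and wait_once are invented for this example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for EPOLLIN on epfd. Returns 0 on success. */
static int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = {0};

    assert(epfd >= 0 && fd >= 0);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical helper: one epoll_wait() call.
 * Returns the number of ready descriptors, 0 if the timeout expired,
 * or -1 on error (errno is EINTR when a signal interrupted the wait,
 * which corresponds to the signal_pending() branch in ep_poll).
 */
static int wait_once(int epfd, int timeout_ms)
{
    struct epoll_event out[8];

    return epoll_wait(epfd, out, 8, timeout_ms);
}
```

With an empty pipe registered, wait_once(epfd, 10) should sleep for about 10 ms and return 0 (the timed_out exit). After one byte is written to the pipe, the same call should return 1 immediately (the ep_events_available exit).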

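In ep_poll, epoll_wait adds the current process to epoll's own wait queue, ep->wq. That raises a question. The previous article said that epoll had already added a wait queue entry to the wait queue of every monitored descriptor, and now there is a second wait queue that belongs to epoll itself. Why?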

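To answer this, we need to jump back to ep_ptable_queue_proc (if you don't remember this function, please look back at the previous article). It calls init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);, which sets the callback of the wait queue entry that epoll places on each monitored descriptor to ep_poll_callback. By contrast, the init_waitqueue_entry function called in ep_poll sets the callback of its wait queue entry to default_wake_function. In other words, the entries on the descriptors' wait queues do not wake a process directly; they run ep_poll_callback. Only the entry on ep->wq wakes the process that is sleeping in epoll_wait.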
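The difference between the two kinds of wait queue entries can be modeled in plain user-space C. This is only a toy sketch, not kernel code: every toy_ name is invented for the illustration, and locking, exclusive waiters, and the real ready list are left out. One entry type marks a task runnable, in the spirit of default_wake_function. The other records readiness and then wakes whoever sleeps on the epoll's own queue, in the spirit of ep_poll_callback.

```c
#include <assert.h>
#include <stddef.h>

struct toy_wait_entry;
typedef int (*toy_wake_fn)(struct toy_wait_entry *entry, void *key);

/* One node on a wait queue: a callback plus private data for it. */
struct toy_wait_entry {
    toy_wake_fn func;
    void *private_data;
    struct toy_wait_entry *next;
};

struct toy_wait_queue {
    struct toy_wait_entry *head;
};

struct toy_task {
    int runnable;               /* set to 1 when the task is woken */
};

struct toy_epoll {
    int ready_count;            /* stands in for the ready list */
    struct toy_wait_queue wq;   /* stands in for ep->wq */
};

static void toy_init_entry(struct toy_wait_entry *e, toy_wake_fn fn, void *priv)
{
    assert(e != NULL && fn != NULL);
    e->func = fn;
    e->private_data = priv;
    e->next = NULL;
}

static void toy_add_wait(struct toy_wait_queue *q, struct toy_wait_entry *e)
{
    e->next = q->head;
    q->head = e;
}

/* Run the callback of every entry on the queue; return how many ran. */
static int toy_wake_up(struct toy_wait_queue *q, void *key)
{
    struct toy_wait_entry *e;
    int called = 0;

    for (e = q->head; e != NULL; e = e->next) {
        e->func(e, key);
        called++;
    }
    return called;
}

/* Modeled on default_wake_function: make the sleeping task runnable. */
static int toy_default_wake(struct toy_wait_entry *e, void *key)
{
    struct toy_task *task = e->private_data;

    (void)key;
    task->runnable = 1;
    return 1;
}

/* Modeled on ep_poll_callback: note the event, then wake epoll's own queue. */
static int toy_ep_callback(struct toy_wait_entry *e, void *key)
{
    struct toy_epoll *ep = e->private_data;

    ep->ready_count++;
    toy_wake_up(&ep->wq, key);
    return 1;
}
```

To use the model, put an entry initialized with toy_ep_callback on a queue that stands for a socket, and an entry initialized with toy_default_wake on the toy_epoll's wq. A single toy_wake_up on the socket queue then both increments ready_count and sets the task's runnable flag, which is the two-stage wakeup the two wait queues exist for.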

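So when a monitored file descriptor performs a wakeup, for example when a socket receives data, the call chain sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll->... eventually runs the callback of each wait queue entry. For epoll, that callback is ep_poll_callback.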

    static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
    {
        int pwake = 0;
        unsigned long flags;
        struct epitem *epi = ep_item_from_wait(wait);
        struct eventpoll *ep = epi->ep;

        spin_lock_irqsave(&ep->lock, flags);

        /*
         * If the event mask does not contain any poll(2) event, we consider the
         * descriptor to be disabled. This condition is likely the effect of the
         * EPOLLONESHOT bit that disables the descriptor when an event is received,
         * until the next EPOLL_CTL_MOD will be issued.
         */
        if (!(epi->event.events & ~EP_PRIVATE_BITS))
            goto out_unlock;

        /*
         * Check the events coming with the callback. At this stage, not
         * every device reports the events in the "key" parameter of the
         * callback. We need to be able to handle both cases here, hence the
         * test for "key" != NULL before the event match test.
         */
        if (key && !((unsigned long) key & epi->event.events))
            goto out_unlock;

        /*
         * If we are transferring events to userspace, we can hold no locks
         * (because we're accessing user memory, and because of linux f_op->poll()
         * semantics). All the events that happen during that period of time are
         * chained in ep->ovflist and requeued later on.
         */
        if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
            if (epi->next == EP_UNACTIVE_PTR) {
                epi->next = ep->ovflist;
                ep->ovflist = epi;
            }
            goto out_unlock;
        }

        /* If this file is already in the ready list we exit soon */
        if (!ep_is_linked(&epi->rdllink))
            list_add_tail(&epi->rdllink, &ep->rdllist);

        /*
         * Wake up ( if active ) both the eventpoll wait list and the ->poll()
         * wait list.
         */
        if (waitqueue_active(&ep->wq))
            wake_up_locked(&ep->wq);
        if (waitqueue_active(&ep->poll_wait))
            pwake++;

    out_unlock:
        spin_unlock_irqrestore(&ep->lock, flags);

        /* We have to call this outside the lock */
        if (pwake)
            ep_poll_safewake(&ep->poll_wait);

        return 1;
    }

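The comments in this function are quite clear and make the purpose of each line easy to see. In particular, the two lines

    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);

check whether any entry is waiting on epoll's own wait queue and, if so, wake it up. For a user of epoll, if the user-space process is blocked in epoll_wait, then ep->wq is certainly not empty, so the process is woken up at this point and moved onto the scheduler's run queue.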
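The first check in ep_poll_callback, the one about EPOLLONESHOT, can also be observed from user space. The following is a minimal sketch assuming Linux with glibc; the helper names arm_oneshot and poll_ready are invented for this example. Once a one-shot event has been delivered, further wakeups on the descriptor no longer reach the ready list until the descriptor is re-armed with EPOLL_CTL_MOD.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: add (EPOLL_CTL_ADD) or re-arm (EPOLL_CTL_MOD) fd
 * with EPOLLIN | EPOLLONESHOT. Returns 0 on success, -1 on error.
 */
static int arm_oneshot(int epfd, int op, int fd)
{
    struct epoll_event ev = {0};

    assert(op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/*
 * Hypothetical helper: non-blocking epoll_wait().
 * Returns the number of ready descriptors (0 if none), or -1 on error.
 */
static int poll_ready(int epfd)
{
    struct epoll_event out[8];

    return epoll_wait(epfd, out, 8, 0); /* timeout 0: do not sleep */
}
```

A typical sequence: register a pipe's read end with arm_oneshot(epfd, EPOLL_CTL_ADD, fd), write one byte, and poll_ready reports one event. Write again, and poll_ready reports nothing, because the callback now takes the early goto out_unlock. After arm_oneshot(epfd, EPOLL_CTL_MOD, fd), the still-unread data is reported once more.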

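These two articles have more or less sorted out how epoll monitors multiple descriptors and how it gets notified. On the monitoring side, what is still missing is epoll's internal structure: how it stores the descriptors and how it maintains their state. There are already plenty of articles about that online, though. Perhaps I will write another two articles on that topic later.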
