How Linux Loads a Network Card Driver

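On Linux, most network cards are attached to the system through the PCI bus, and the kernel manages every device on that bus through the PCI subsystem, which provides the functionality common to all PCI device drivers. Loading the driver for a PCI network card therefore involves two steps: registering with the PCI subsystem as a PCI device driver, and registering with the networking subsystem as a network device. The sections below walk through these two steps in turn.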

Registering the PCI device driver:

A PCI device driver registers itself with the kernel by calling pci_register_driver, which is defined in include/linux/pci.h:

/*
 * pci_register_driver must be a macro so that KBUILD_MODNAME can be expanded
 */
#define pci_register_driver(driver) \
    __pci_register_driver(driver, THIS_MODULE, KBUILD_MODNAME)

Its key argument, driver, is a pointer to a pci_driver structure:

struct pci_driver {
    struct list_head node;
    const char *name;
    /* PCI devices associated with this driver */
    const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
    int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
    void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
    int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
    int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
    int (*resume_early) (struct pci_dev *dev);
    int (*resume) (struct pci_dev *dev); /* Device woken up */
    void (*shutdown) (struct pci_dev *dev);
    struct pci_error_handlers *err_handler;
    struct device_driver driver;
    struct pci_dynids dynids;
};

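Two members of pci_driver matter most here: the id_table pointer (const struct pci_device_id *id_table) and the probe function. id_table is a vector of IDs that the kernel uses to associate devices with this driver. A PCI device is uniquely identified by a combination of parameters, including the vendor and the model, and these parameters are stored in the kernel's pci_device_id structure: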

struct pci_device_id {
    __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID */
    __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
    __u32 class, class_mask; /* (class,subclass,prog-if) triplet */
    kernel_ulong_t driver_data; /* Data private to the driver */
};

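When a PCI device driver registers, it hands the kernel a vector of pci_device_id entries that lists every device ID the driver can handle.

The probe function is called when the PCI subsystem, while looking for a driver for a device, finds that the device's ID matches an entry in the id_table described above. For a network device, probe enables the hardware, allocates a net_device structure, then initializes and registers the new device.

In general, at boot time the kernel builds a kind of database that associates each bus with the list of devices detected on it. For the PCI bus, this database holds the IDs of all devices attached to it, in the pci_device_id form described above. The figure below illustrates what happens when a device driver is loaded.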

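Here, panel (a) represents the database of all devices on the PCI bus. When device driver A is loaded, it calls pci_register_driver and passes its pci_driver instance to the PCI layer; that structure contains the table of device IDs the driver can handle. The PCI subsystem then checks this table against the stored device database and builds the list of devices bound to the driver, as shown in panel (b). For each matching device, the PCI layer also calls the probe function supplied in the driver's pci_driver structure, and probe does the required work. For a network device, that means creating and registering the associated network device.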
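The matching step described above can be sketched in user space. This is only a simplified model, not kernel code: every sim_* name and ANY_ID is hypothetical, the ID is reduced to vendor and device, and the real PCI core also compares the subsystem and class fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu  /* wildcard, plays the role of PCI_ANY_ID */

/* Simplified stand-ins for pci_device_id, pci_dev and pci_driver. */
struct sim_device_id {
    uint32_t vendor, device;
};

struct sim_dev {
    uint32_t vendor, device;
    const char *bound_driver;   /* NULL until a driver claims the device */
};

struct sim_driver {
    const char *name;
    const struct sim_device_id *id_table;  /* ends with an all-zero entry */
    int (*probe)(struct sim_dev *dev, const struct sim_device_id *id);
};

/* Return the first id_table entry that matches dev, or NULL. */
const struct sim_device_id *
sim_match_id(const struct sim_device_id *ids, const struct sim_dev *dev)
{
    for (; ids->vendor || ids->device; ids++) {
        if ((ids->vendor == ANY_ID || ids->vendor == dev->vendor) &&
            (ids->device == ANY_ID || ids->device == dev->device))
            return ids;
    }
    return NULL;
}

/*
 * Walk the device database, call probe() for every unbound device that
 * matches the driver's id_table, and return how many devices were bound.
 */
int sim_register_driver(struct sim_driver *drv,
                        struct sim_dev *devs, size_t ndevs)
{
    int bound = 0;

    assert(drv->id_table != NULL);  /* no table, no probe calls */
    for (size_t i = 0; i < ndevs; i++) {
        const struct sim_device_id *id;

        if (devs[i].bound_driver)
            continue;               /* already claimed by another driver */
        id = sim_match_id(drv->id_table, &devs[i]);
        if (id && drv->probe(&devs[i], id) == 0) {
            devs[i].bound_driver = drv->name;
            bound++;
        }
    }
    return bound;
}

/* A probe that accepts every matching device. */
int sim_probe_ok(struct sim_dev *dev, const struct sim_device_id *id)
{
    (void)dev;
    (void)id;
    return 0;
}
```

Calling sim_register_driver with a driver whose table lists one vendor/device pair binds only the devices carrying that pair and leaves the others for a later driver, which mirrors the step from panel (a) to panel (b).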

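Registering the network device:

Once a PCI network device has been registered on the PCI bus, the remaining work, registering it as a network device, is done from the probe function. The kernel represents every network device with a net_device structure (too large to cover in detail here), so registration naturally splits into two parts: allocating the net_device structure, and registering the allocated structure with the networking subsystem.

First, allocation. The kernel allocates a net_device structure with alloc_netdev: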

#define alloc_netdev(sizeof_priv, name, setup) \
    alloc_netdev_mqs(sizeof_priv, name, setup, 1, 1)

struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
                                    void (*setup)(struct net_device *),
                                    unsigned int txqs, unsigned int rxqs);

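alloc_netdev takes three parameters. sizeof_priv is the size of the driver's private data: every network device has a net_device structure, but devices are too varied for that structure to hold every field each of them needs, so a device normally keeps its own information in a private structure. name is the device name, which may be only a template that the kernel completes according to its naming rules: an Ethernet driver knows its device is an eth device but not which number it will get, so it passes eth%d and lets the kernel pick the number. setup is a function that initializes part of net_device, mainly the fields that share common values across a class of devices: all Ethernet devices start with the same MTU, for example, so ether_setup can initialize these shared fields.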
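The role of the three parameters can be illustrated with a small user-space model. It is a sketch under assumptions, not the kernel implementation: all sim_* names are hypothetical, the 32-byte alignment is an assumed value, and the name helper only handles a single %d in the template.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIM_IFNAMSIZ 16
#define SIM_ALIGN    32   /* alignment of the private area (assumed value) */

struct sim_net_device {
    char name[SIM_IFNAMSIZ];
    unsigned int mtu;
    size_t priv_offset;   /* where the driver-private area starts */
};

/* Common initialization for one device family, in the spirit of ether_setup. */
void sim_ether_setup(struct sim_net_device *dev)
{
    dev->mtu = 1500;
}

/*
 * Allocate the generic structure and the private area in one block.
 * The name may still be a template such as "eth%d" at this point.
 */
struct sim_net_device *sim_alloc_netdev(size_t sizeof_priv, const char *name,
                                        void (*setup)(struct sim_net_device *))
{
    size_t base = (sizeof(struct sim_net_device) + SIM_ALIGN - 1)
                  / SIM_ALIGN * SIM_ALIGN;
    struct sim_net_device *dev;

    assert(strlen(name) < SIM_IFNAMSIZ);
    dev = calloc(1, base + sizeof_priv);
    if (!dev)
        return NULL;
    dev->priv_offset = base;
    strcpy(dev->name, name);
    setup(dev);
    return dev;
}

/* The private area sits right after the (aligned) generic part. */
void *sim_netdev_priv(struct sim_net_device *dev)
{
    return (char *)dev + dev->priv_offset;
}

/*
 * Replace the "%d" in the name template with the lowest unit number whose
 * resulting name is not in `taken`. Returns the unit number, or -1 when the
 * name has no "%d" or no unit below 100 is free.
 */
int sim_alloc_name(struct sim_net_device *dev,
                   const char *const *taken, size_t ntaken)
{
    char tmpl[SIM_IFNAMSIZ], candidate[SIM_IFNAMSIZ];
    const char *p;

    strcpy(tmpl, dev->name);
    p = strstr(tmpl, "%d");
    if (!p)
        return -1;
    for (int unit = 0; unit < 100; unit++) {
        size_t i;

        snprintf(candidate, sizeof(candidate), "%.*s%d%s",
                 (int)(p - tmpl), tmpl, unit, p + 2);
        for (i = 0; i < ntaken; i++)
            if (strcmp(taken[i], candidate) == 0)
                break;
        if (i == ntaken) {          /* no existing device uses this name */
            strcpy(dev->name, candidate);
            return unit;
        }
    }
    return -1;
}
```

A caller would pass sizeof of its own private structure, the template eth%d and sim_ether_setup, then fetch the private area with sim_netdev_priv. Keeping both parts in one allocation means a single free releases the generic structure and the private data together.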

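Taking an Ethernet device as an example, registration proceeds as shown in the figure above. The driver first calls alloc_etherdev, a wrapper around alloc_netdev, to allocate the net_device structure and initialize part of its data. Next, netdev_boot_setup_check checks whether the user supplied any boot-time parameters when the kernel was loaded. Finally, register_netdevice inserts the new net_device instance into the network device database.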

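PS: every net_device structure in the system lives in three tables: the global list dev_base and two hash tables, dev_name_head and dev_index_head. The global dev_base list lets the kernel walk through all devices easily; dev_name_head is indexed by device name, so a device's structure can be found from its name; dev_index_head is indexed by the device ID, dev->ifindex, so a device can be found from its ID.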
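The three tables can be pictured with the following user-space sketch. The sim_* names, the bucket count and the string hash are assumptions made for illustration, and the sketch pushes new entries at the head of each chain for brevity rather than reproducing the kernel's list handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_HASH_SIZE 16

struct sim_netdev {
    const char *name;
    int ifindex;
    struct sim_netdev *next;        /* global list (role of dev_base) */
    struct sim_netdev *name_next;   /* chain in a name hash bucket */
    struct sim_netdev *index_next;  /* chain in an ifindex hash bucket */
};

struct sim_netdev_db {
    struct sim_netdev *dev_base;
    struct sim_netdev *name_head[SIM_HASH_SIZE];
    struct sim_netdev *index_head[SIM_HASH_SIZE];
};

/* Simple string hash for illustration only. */
unsigned int sim_name_hash(const char *name)
{
    unsigned int h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % SIM_HASH_SIZE;
}

/* Insert dev into the global list and into both hash tables. */
void sim_list_netdevice(struct sim_netdev_db *db, struct sim_netdev *dev)
{
    unsigned int nh, ih;

    assert(dev->name != NULL);
    nh = sim_name_hash(dev->name);
    ih = (unsigned int)dev->ifindex % SIM_HASH_SIZE;

    dev->next = db->dev_base;
    db->dev_base = dev;
    dev->name_next = db->name_head[nh];
    db->name_head[nh] = dev;
    dev->index_next = db->index_head[ih];
    db->index_head[ih] = dev;
}

/* Look a device up by name through the name hash. */
struct sim_netdev *sim_get_by_name(struct sim_netdev_db *db, const char *name)
{
    struct sim_netdev *d = db->name_head[sim_name_hash(name)];

    while (d && strcmp(d->name, name) != 0)
        d = d->name_next;
    return d;
}

/* Look a device up by ifindex through the index hash. */
struct sim_netdev *sim_get_by_index(struct sim_netdev_db *db, int ifindex)
{
    struct sim_netdev *d =
        db->index_head[(unsigned int)ifindex % SIM_HASH_SIZE];

    while (d && d->ifindex != ifindex)
        d = d->index_next;
    return d;
}
```

Both lookups only scan one bucket, which is the point of keeping the two hash tables next to the plain list: the list serves full walks, the hashes serve lookups by a single key.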

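register_netdevice does only part of the work of fully registering a net_device; the remainder is left to netdev_run_todo. Note that in the kernel version listed below, register_netdevice itself already calls netdev_register_kobject and sets the state to NETREG_REGISTERED, so on the registration path the deferred part is small.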

The register_netdev function does the following:

int register_netdev(struct net_device *dev)
{
    int err;

    rtnl_lock();

    /*
     * If the name is a format string the caller wants us to do a
     * name allocation.
     */
    if (strchr(dev->name, '%')) {
        err = dev_alloc_name(dev, dev->name);
        if (err < 0)
            goto out;
    }

    /*
     * Registration is done in two steps:
     * 1) register_netdevice
     * 2) netdev_run_todo, called from rtnl_unlock when the lock is released
     */
    err = register_netdevice(dev);
out:
    rtnl_unlock();
    return err;
}

void rtnl_unlock(void)
{
    /* This fellow will unlock it for us. */
    netdev_run_todo();
}
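The idea that releasing the lock also runs the pending work can be modelled in user space as follows. This is a generic sketch of the pattern, not the kernel's netdev_run_todo: the sim_* names, the fixed-size queue and the plain integer used as a lock are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_TODO_MAX 8

typedef void (*sim_todo_fn)(void *arg);

struct sim_rtnl {
    int locked;
    size_t ntodo;
    sim_todo_fn fn[SIM_TODO_MAX];
    void *arg[SIM_TODO_MAX];
};

void sim_rtnl_lock(struct sim_rtnl *r)
{
    assert(!r->locked);
    r->locked = 1;
}

/* Queue work that must run only after the lock has been released. */
int sim_todo_add(struct sim_rtnl *r, sim_todo_fn fn, void *arg)
{
    assert(r->locked);
    if (r->ntodo == SIM_TODO_MAX)
        return -1;
    r->fn[r->ntodo] = fn;
    r->arg[r->ntodo] = arg;
    r->ntodo++;
    return 0;
}

/* Snapshot the queue, release the lock, then run the work in FIFO order. */
void sim_rtnl_unlock(struct sim_rtnl *r)
{
    sim_todo_fn fn[SIM_TODO_MAX];
    void *arg[SIM_TODO_MAX];
    size_t n = r->ntodo;

    assert(r->locked);
    for (size_t i = 0; i < n; i++) {
        fn[i] = r->fn[i];
        arg[i] = r->arg[i];
    }
    r->ntodo = 0;
    r->locked = 0;                  /* the lock is dropped before any work runs */
    for (size_t i = 0; i < n; i++)
        fn[i](arg[i]);
}

/* Example work item: records that it ran and whether the lock was held. */
struct sim_todo_record {
    struct sim_rtnl *rtnl;
    int ran;
    int saw_lock;
};

void sim_todo_run_record(void *arg)
{
    struct sim_todo_record *rec = arg;

    rec->ran++;
    rec->saw_lock = rec->rtnl->locked;
}
```

Work queued between sim_rtnl_lock and sim_rtnl_unlock runs exactly once, after the lock is free, so a work item may sleep or take the lock again without deadlocking against its own caller.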

The body of register_netdevice is shown below:

int register_netdevice(struct net_device *dev)
{
    int ret;
    struct net *net = dev_net(dev);

    BUG_ON(dev_boot_phase);
    ASSERT_RTNL();

    might_sleep();

    /* When net_device's are persistent, this will be fatal. */
    BUG_ON(dev->reg_state != NETREG_UNINITIALIZED);
    BUG_ON(!net);

    /* Initialize the related locks */
    spin_lock_init(&dev->addr_list_lock);
    netdev_set_addr_lockdep_class(dev);

    dev->iflink = -1;

    /* Init, if this function is available */
    if (dev->netdev_ops->ndo_init) {
        ret = dev->netdev_ops->ndo_init(dev);
        if (ret) {
            if (ret > 0)
                ret = -EIO;
            goto out;
        }
    }

    /*
     * Process the device name: check that it is legal, allocate a
     * concrete name for templates such as eth%d, and check that an
     * explicitly given name is not already in use.
     */
    ret = dev_get_valid_name(dev, dev->name, 0);
    if (ret)
        goto err_uninit;

    /* Assign the device a global identifier */
    dev->ifindex = dev_new_index(net);
    if (dev->iflink == -1)
        dev->iflink = dev->ifindex;

    /* Fix illegal checksum combinations */
    if ((dev->features & NETIF_F_HW_CSUM) &&
        (dev->features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
        printk(KERN_NOTICE "%s: mixed HW and IP checksum settings.\n",
               dev->name);
        dev->features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
    }

    if ((dev->features & NETIF_F_NO_CSUM) &&
        (dev->features & (NETIF_F_HW_CSUM|NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
        printk(KERN_NOTICE "%s: mixed no checksumming and other settings.\n",
               dev->name);
        dev->features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM|NETIF_F_HW_CSUM);
    }

    dev->features = netdev_fix_features(dev->features, dev->name);

    /* Enable software GSO if SG is supported. */
    if (dev->features & NETIF_F_SG)
        dev->features |= NETIF_F_GSO;

    /* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
     * vlan_dev_init() will do the dev->features check, so these features
     * are enabled only if supported by underlying device.
     */
    dev->vlan_features |= (NETIF_F_GRO | NETIF_F_HIGHDMA);

    /* Notify the other subsystems through the netdev_chain notifier chain */
    ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
    ret = notifier_to_errno(ret);
    if (ret)
        goto err_uninit;

    /* Initialize the device's kobject and create its sysfs entries */
    ret = netdev_register_kobject(dev);
    if (ret)
        goto err_uninit;
    dev->reg_state = NETREG_REGISTERED;

    /*
     * Default initial state at registry is that the
     * device is present.
     */
    set_bit(__LINK_STATE_PRESENT, &dev->state);

    /* Initialize the traffic control queues */
    dev_init_scheduler(dev);
    dev_hold(dev);

    /* Add the device to the global list, the name hash and the index hash */
    list_netdevice(dev);

    /* Notify protocols, that a new device appeared. */
    ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
    ret = notifier_to_errno(ret);
    if (ret) {
        rollback_registered(dev);
        dev->reg_state = NETREG_UNREGISTERED;
    }

    /*
     * Prevent userspace races by waiting until the network
     * device is fully setup before sending notifications.
     */
    /* rtnetlink notification */
    if (!dev->rtnl_link_ops ||
        dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
        rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);

out:
    return ret;

err_uninit:
    if (dev->netdev_ops->ndo_uninit)
        dev->netdev_ops->ndo_uninit(dev);
    goto out;
}
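The checksum and GSO fix-ups in the listing above are plain bit manipulation and can be tried in isolation. The sketch below assumes made-up bit values (the SIM_F_* constants are hypothetical and differ from the real NETIF_F_* values) and leaves out the extra checks done by netdev_fix_features.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; the real NETIF_F_* constants differ. */
#define SIM_F_SG        (1u << 0)
#define SIM_F_IP_CSUM   (1u << 1)
#define SIM_F_NO_CSUM   (1u << 2)
#define SIM_F_HW_CSUM   (1u << 3)
#define SIM_F_IPV6_CSUM (1u << 4)
#define SIM_F_GSO       (1u << 5)

/* Return the feature mask with contradictory checksum flags removed. */
uint32_t sim_fix_features(uint32_t f)
{
    /* Generic hardware checksumming wins over the per-protocol flags. */
    if ((f & SIM_F_HW_CSUM) && (f & (SIM_F_IP_CSUM | SIM_F_IPV6_CSUM)))
        f &= ~(SIM_F_IP_CSUM | SIM_F_IPV6_CSUM);

    /* "No checksum needed" cannot be combined with any offload flag. */
    if ((f & SIM_F_NO_CSUM) &&
        (f & (SIM_F_HW_CSUM | SIM_F_IP_CSUM | SIM_F_IPV6_CSUM)))
        f &= ~(SIM_F_HW_CSUM | SIM_F_IP_CSUM | SIM_F_IPV6_CSUM);

    /* Software GSO is switched on whenever scatter-gather is supported. */
    if (f & SIM_F_SG)
        f |= SIM_F_GSO;

    /* After the fix-ups the two conflicting pairs can no longer coexist. */
    assert(!((f & SIM_F_HW_CSUM) && (f & SIM_F_IP_CSUM)));
    assert(!((f & SIM_F_NO_CSUM) && (f & SIM_F_HW_CSUM)));
    return f;
}
```

The order of the two checks matters in the same way as in the listing: the first check resolves HW versus per-protocol flags, and the second then clears whatever checksum offload remains if the no-checksum flag is also set.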

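That, in simplified form, is the whole flow of loading a NIC driver. In practice, the probe function of a real driver, such as the igb and ixgbe drivers I have been working with recently, performs far more initialization, including configuring many hardware registers and setting up queues. This article should therefore be read as an introduction only.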
