
Hardware Context

    To control the execution of processes, the kernel must be able to suspend the process running on the CPU and resume some other process that was suspended earlier. This activity is variously called a process switch, a task switch, or a context switch.

  • Hardware Context

    While each process can have its own address space, all processes have to share the CPU registers. So before resuming the execution of a process, the kernel must ensure that each register is loaded with the value it had when the process was suspended.

    The set of data that must be loaded into the registers before the process resumes its execution on the CPU is called the hardware context. The hardware context is a subset of the process execution context, which includes all the information needed for the process to execute. In Linux, part of the hardware context of a process is stored in the process descriptor, while the remaining part is saved on the Kernel Mode stack.

    In the description that follows, we assume that the prev local variable refers to the process descriptor of the process being switched out and that next refers to the descriptor of the process being switched in to replace it. We can thus define a process switch as the activity of saving the hardware context of prev and replacing it with the hardware context of next. Because process switches occur quite often, it is important to minimize the time spent saving and loading hardware contexts.

    Old versions of Linux took advantage of the hardware support offered by the 80x86 architecture and performed a process switch through a far jmp instruction to the selector of the Task State Segment Descriptor of the next process. While executing that instruction, the CPU performs a hardware context switch by automatically saving the old hardware context and loading a new one. Linux 2.6, however, performs the process switch in software, for the following reasons:

    Step-by-step switching performed through a sequence of mov instructions allows better control over the validity of the data being loaded. In particular, it is possible to check the values of the ds and es segmentation registers, which might have been forged by a malicious user. This type of checking is not possible when using a single far jmp instruction.

    The amount of time required by the old approach and by the new approach is about the same. However, a hardware context switch cannot be optimized, whereas the current switching code may still have room for improvement.

    Process switching occurs only in Kernel Mode. The contents of all registers used by a process in User Mode have already been saved on the Kernel Mode stack before the process switch is performed. This includes the contents of the ss and esp pair, which specifies the User Mode stack pointer address.

  • Task State Segment (each CPU has exactly one TSS)

    The 80x86 architecture includes a specific segment type called the Task State Segment (TSS), to store hardware contexts. Although Linux doesn't use hardware context switches, it is nonetheless forced to set up a TSS for each distinct CPU in the system. This is done for two main reasons:

    When an 80x86 CPU switches from User Mode to Kernel Mode, it fetches the address of the Kernel Mode stack from the TSS.

    When a User Mode process attempts to access an I/O port by means of an in or out instruction, the CPU may need to access an I/O Permission Bitmap stored in the TSS to verify whether the process is allowed to address the port.

    More precisely, when a process executes an in or out I/O instruction in User Mode, the control unit performs the following operations:

    1. It checks the 2-bit IOPL field in the eflags register. If it is set to 3, the control unit executes the I/O instruction. Otherwise, it performs the next check.

    2. It accesses the tr register to determine the current TSS, and thus the proper I/O Permission Bitmap.

    3. It checks the bit of the I/O Permission Bitmap corresponding to the I/O port specified in the I/O instruction. If it is cleared, the instruction is executed; otherwise, the control unit raises a "General protection" exception.
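
    The three checks above can be modeled in a few lines of C. This is only a sketch of the decision logic, not kernel or hardware code: the toy_io_tss structure and the toy_io_allowed function are hypothetical names invented for the illustration, and the model covers a single-byte port access only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IO_PORTS 65536u   /* the 80x86 I/O address space has 64K ports */

/* Hypothetical stand-in for the part of a TSS that matters here. */
struct toy_io_tss {
    uint8_t io_bitmap[TOY_IO_PORTS / 8];   /* one bit per port; 1 = denied */
};

/*
 * Returns true if the in/out instruction may execute, false if the
 * control unit would raise a "General protection" exception.
 *   iopl        - the 2-bit IOPL field of eflags (0..3)
 *   current_tss - the TSS that the tr register currently designates
 *   port        - the port named by the I/O instruction
 */
static bool toy_io_allowed(unsigned iopl, const struct toy_io_tss *current_tss,
                           uint16_t port)
{
    assert(iopl <= 3);

    /* Step 1: IOPL set to 3 lets the instruction through unconditionally. */
    if (iopl == 3)
        return true;

    /* Step 2: tr selects the TSS, and with it the bitmap (passed in here). */
    const uint8_t *bitmap = current_tss->io_bitmap;

    /* Step 3: a cleared bit grants access, a set bit denies it. */
    return (bitmap[port / 8] & (1u << (port % 8))) == 0;
}
```

    With an all-zero bitmap every port is reachable; setting the bit for a port makes toy_io_allowed return false for that port unless IOPL is 3.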

    The tss_struct structure describes the format of the TSS. The init_tss array stores one TSS for each CPU on the system. At each process switch, the kernel updates some fields of the TSS so that the corresponding CPU's control unit can safely retrieve the information it needs. Thus, the TSS reflects the privileges of the process currently running on the CPU, but there is no need to maintain a TSS for a process while it is not running.
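
    As a rough illustration of "one TSS per CPU, refreshed at every switch", the sketch below updates a single field. All names in it (toy_cpu_tss, toy_proc, toy_init_tss, toy_update_tss, TOY_NR_CPUS) are hypothetical, and esp0 is used only because it is the field the CPU reads to find the Kernel Mode stack; the real kernel code updates more than this.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4   /* assumed CPU count, for the illustration only */

/* Only the field this sketch touches; the real TSS has many more. */
struct toy_cpu_tss {
    uint32_t esp0;   /* Kernel Mode stack pointer fetched on a User -> Kernel switch */
};

/* Stand-in for the relevant part of a process descriptor. */
struct toy_proc {
    uint32_t kernel_stack_top;   /* top of this process's Kernel Mode stack */
};

/* One TSS per CPU, never one per process. */
static struct toy_cpu_tss toy_init_tss[TOY_NR_CPUS];

/* Called during a process switch on `cpu`, before `next` starts running. */
static void toy_update_tss(unsigned cpu, const struct toy_proc *next)
{
    assert(cpu < TOY_NR_CPUS);
    toy_init_tss[cpu].esp0 = next->kernel_stack_top;
}
```

    Nothing is stored per process here: whichever process ran last on a CPU simply leaves its values in that CPU's TSS until the next switch overwrites them.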

    Each TSS has its own 8-byte Task State Segment Descriptor (TSSD). This descriptor includes a 32-bit Base field that points to the TSS starting address and a 20-bit Limit field. The S flag of a TSSD is cleared to denote that the corresponding TSS is a System Segment.

    The Type field is set to either 9 or 11 to denote that the segment is actually a TSS. In Intel's original design, each process in the system should refer to its own TSS; the second least significant bit of the Type field is called the Busy bit, and it is set to 1 if the process is being executed by a CPU and to 0 otherwise. In the Linux design, there is just one TSS for each CPU, so the Busy bit is always set to 1.
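
    To make the field layout concrete, here is a small sketch that packs and unpacks a TSSD as a 64-bit value. The function names are hypothetical. The bit positions follow the standard 80x86 segment descriptor format rather than anything stated in the text above, and setting the Present flag (bit 47) is likewise an assumption added so that the descriptor is a usable one; the DPL and granularity bits are left at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an 8-byte TSSD from a 32-bit base, a 20-bit limit and the Busy state. */
static uint64_t toy_make_tssd(uint32_t base, uint32_t limit, bool busy)
{
    assert(limit <= 0xFFFFFu);               /* Limit is a 20-bit field */

    uint32_t type = busy ? 11u : 9u;         /* 1011b = busy TSS, 1001b = available TSS */
    uint32_t lo = (limit & 0xFFFFu)          /* limit[15:0]  -> bits 0..15  */
                | ((base & 0xFFFFu) << 16);  /* base[15:0]   -> bits 16..31 */
    uint32_t hi = ((base >> 16) & 0xFFu)     /* base[23:16]  -> bits 32..39 */
                | (type << 8)                /* Type         -> bits 40..43 */
                                             /* S flag, bit 44, stays 0: System Segment */
                | (1u << 15)                 /* Present flag -> bit 47 (assumption) */
                | (limit & 0xF0000u)         /* limit[19:16] -> bits 48..51 */
                | (base & 0xFF000000u);      /* base[31:24]  -> bits 56..63 */
    return ((uint64_t)hi << 32) | lo;
}

static uint32_t toy_tssd_base(uint64_t d)
{
    uint32_t lo = (uint32_t)d, hi = (uint32_t)(d >> 32);
    return (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);
}

static uint32_t toy_tssd_limit(uint64_t d)
{
    uint32_t lo = (uint32_t)d, hi = (uint32_t)(d >> 32);
    return (lo & 0xFFFFu) | (hi & 0xF0000u);
}

static unsigned toy_tssd_type(uint64_t d)  { return (unsigned)((d >> 40) & 0xFu); }
static bool toy_tssd_is_system(uint64_t d) { return ((d >> 44) & 1u) == 0; }

/* The Busy bit is the second least significant bit of the Type field. */
static bool toy_tssd_busy(uint64_t d)      { return (toy_tssd_type(d) & 2u) != 0; }
```

    The Base field is scattered over three places in the descriptor and the Limit over two, which is why the decoders have to reassemble them piece by piece.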

    The TSSDs created by Linux are stored in the Global Descriptor Table (GDT), whose base address is stored in the gdtr register of each CPU. The tr register of each CPU contains the TSSD Selector of the corresponding TSS. The register also includes two hidden, nonprogrammable fields: the Base and Limit fields of the TSSD. In this way, the processor can address the TSS directly without having to retrieve the TSS address from the GDT.

struct tss_struct {
    unsigned short  back_link,__blh;       /* selector of the previous task's TSS */
    unsigned long   esp0;                  /* Kernel Mode (privilege level 0) stack pointer */
    unsigned short  ss0,__ss0h;            /* Kernel Mode stack segment */
    unsigned long   esp1;
    unsigned short  ss1,__ss1h;
    unsigned long   esp2;
    unsigned short  ss2,__ss2h;
    unsigned long   __cr3;
    unsigned long   eip;
    unsigned long   eflags;
    unsigned long   eax,ecx,edx,ebx;
    unsigned long   esp;
    unsigned long   ebp;
    unsigned long   esi;
    unsigned long   edi;
    unsigned short  es, __esh;
    unsigned short  cs, __csh;
    unsigned short  ss, __ssh;
    unsigned short  ds, __dsh;
    unsigned short  fs, __fsh;
    unsigned short  gs, __gsh;
    unsigned short  ldt, __ldth;
    unsigned short  trace, io_bitmap_base; /* offset of the I/O Permission Bitmap in the TSS */
    unsigned long   io_bitmap[IO_BITMAP_LONGS + 1]; /* the I/O Permission Bitmap itself */
    unsigned long io_bitmap_max;
    struct thread_struct *io_bitmap_owner;
    unsigned long __cacheline_filler[35];  /* padding */
    unsigned long stack[64];
}  __attribute__((packed));

  • The thread field (every process has its own)

struct task_struct {
    ...
    struct thread_struct thread;
    ...
};

    At every process switch, the hardware context of the process being replaced must be saved somewhere. It cannot be saved in the TSS, as in the original Intel design, because Linux uses a single TSS for each processor instead of one for every process.

    Thus, each process descriptor includes a field called thread, of type thread_struct, in which the kernel saves the hardware context whenever the process is being switched out.

    As we'll see later, this data structure includes fields for most of the CPU registers, except the general-purpose registers such as eax, ebx, etc., which are stored in the Kernel Mode stack.

struct thread_struct {
    struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; /* cached TLS descriptors */
    unsigned long   esp0;
    unsigned long   sysenter_cs;
    unsigned long   eip;
    unsigned long   esp;
    unsigned long   fs;
    unsigned long   gs;
    unsigned long   debugreg[8];             /* hardware debug registers */
    unsigned long   cr2, trap_no, error_code; /* fault information */
    union i387_union    i387;                /* floating-point state */
    struct vm86_struct __user * vm86_info;   /* virtual-8086 mode information */
    unsigned long       screen_bitmap;
    unsigned long       v86flags, v86mask, saved_esp0;
    unsigned int        saved_fs, saved_gs;
    unsigned long   *io_bitmap_ptr;          /* I/O permissions */
    unsigned long   io_bitmap_max;
};
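
    The role of the thread field can be pictured with a toy model of a software process switch: save the registers of prev into its thread field, then load the registers from the thread field of next. This is a simplified sketch, not the kernel's switching code. The toy_cpu, toy_thread, and toy_task types and the toy_switch function are hypothetical, and only four of the fields listed above are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: just the registers this sketch saves explicitly. */
struct toy_cpu {
    uint32_t eip, esp, fs, gs;
};

/* Mirrors a few fields of thread_struct. */
struct toy_thread {
    uint32_t eip, esp, fs, gs;
};

/* Stand-in for a process descriptor with its thread field. */
struct toy_task {
    int pid;
    struct toy_thread thread;
};

/* Save the hardware context of prev, then replace it with that of next. */
static void toy_switch(struct toy_cpu *cpu, struct toy_task *prev,
                       const struct toy_task *next)
{
    assert(prev != next);

    /* Save: the registers go into prev's own descriptor, not into the TSS. */
    prev->thread.eip = cpu->eip;
    prev->thread.esp = cpu->esp;
    prev->thread.fs  = cpu->fs;
    prev->thread.gs  = cpu->gs;

    /* Load: the CPU continues from wherever next was suspended. */
    cpu->eip = next->thread.eip;
    cpu->esp = next->thread.esp;
    cpu->fs  = next->thread.fs;
    cpu->gs  = next->thread.gs;
}
```

    Switching from a to b and later from b back to a restores exactly the eip and esp that a had when it was suspended, which is the property the kernel must guarantee for every register.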
