
《CS:APP》 Chapter 3, Machine-Level Representation of Programs: Notes

Machine-Level Representation of Programs

 3.1 A Historical Perspective


Growth is close to an order of magnitude every five years or so...

3.2 Program Encodings

        Suppose we write a C program as two files p1.c and p2.c. We can then compile this code on an IA32 machine using a Unix command line:

        unix> gcc -O1 -o p p1.c p2.c

        The gcc command actually invokes a sequence of programs to turn the source code into executable code.

         First, the C preprocessor expands the source code to include any files specified with #include commands and to expand any macros, specified with #define declarations. 

         Second, the compiler generates assembly-code versions of the two source files having names p1.s and p2.s. Next, the assembler converts the assembly code into binary object-code files p1.o and p2.o. Object code is one form of machine code—it contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.

3.2.1 Machine-Level Code

        Computer systems rely on several forms of abstraction that hide implementation details behind simpler models. Two of these are especially important for machine-level programming.

        First, the format and behavior of a machine-level program is defined by the instruction set architecture, or “ISA,” defining the processor state, the format of the instructions, and the effect each of these instructions will have on the state.

        Second, the memory addresses used by a machine-level program are virtual addresses, providing a memory model that appears to be a very large byte array.

       The compiler does most of the work in the overall compilation sequence, transforming programs expressed in the relatively abstract execution model provided by C into the very elementary instructions that the processor executes. 

        IA32 machine code differs greatly from the original C code. Parts of the processor state are visible that normally are hidden from the C programmer:

        The program counter (commonly referred to as the “PC,” and called %eip in IA32) indicates the address in memory of the next instruction to be executed.

        The integer register file contains eight named locations storing 32-bit values. These registers can hold addresses (corresponding to C pointers) or integer data. Some registers are used to keep track of critical parts of the program state, while others are used to hold temporary data, such as the local variables of a procedure, and the value to be returned by a function.

        The condition code registers hold status information about the most recently executed arithmetic or logical instruction. These are used to implement conditional changes in the control or data flow, such as is required to implement if and while statements.

        A set of floating-point registers store floating-point data.

        The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.

3.2.2 Code Examples

C code:

int accum = 0;

int sum(int x, int y)
{
    int  t=x+y;
    accum += t;
    return t;
}
           

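The corresponding assembly:

sum:
    pushl %ebp               Save the caller's frame pointer
    movl  %esp, %ebp         Set %ebp as the frame pointer
    movl  12(%ebp), %eax     Get y
    addl  8(%ebp), %eax      Compute t = x + y
    addl  %eax, accum        accum += t
    popl  %ebp               Restore the caller's frame pointer
    ret                      Return t (in %eax)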
           

3.3 Data Formats


3.4 Accessing Information


3.4.1 Operand Specifiers


          The first type, immediate, is for constant values.

          The second type, register, denotes the contents of one of the registers, either one of the eight 32-bit registers (e.g., %eax) for a double-word operation, one of the eight 16-bit registers (e.g., %ax) for a word operation, or one of the eight single-byte register elements (e.g., %al) for a byte operation.

          The third type of operand is a memory reference, in which we access some memory location according to a computed address, often called the effective address.
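As a rough sketch of how an effective address is computed, the most general memory operand form, Imm(Eb, Ei, s), refers to the address Imm + R[Eb] + R[Ei]*s, where s is 1, 2, 4, or 8. The helper name `effective_address` and its parameters are hypothetical; register contents are passed in as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the address computed for an operand
   written Imm(Eb, Ei, s), given the register values as plain integers. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* IA32 only allows scale factors of 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return imm + base + index * scale;   /* wraps modulo 2^32 */
}
```

For instance, with a base of 0x100 and an index of 3, the operand 9(Eb, Ei, 4) would refer to address 9 + 0x100 + 12 = 0x115.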


3.4.2 Data Movement Instructions

         IA32 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another requires two instructions: the first to load the source value into a register, and the second to write that register value to the destination.


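Don't attempt memory-to-memory moves. Working through examples of invalid mov instructions, and why each one is rejected, gives a much more thorough understanding of how mov is used, o(∩_∩)o ~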
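To illustrate the two-instruction rule, here is a minimal C sketch of a memory-to-memory copy. The function name `copy_int` is hypothetical, and the registers named in the comments are only one plausible choice a compiler might make.

```c
#include <assert.h>

/* Hypothetical helper: copies one int from *src to *dst. */
void copy_int(const int *src, int *dst)
{
    assert(src != 0 && dst != 0);
    int t = *src;   /* load:  memory -> register, e.g. movl (%ecx), %eax */
    *dst = t;       /* store: register -> memory, e.g. movl %eax, (%edx) */
}
```

The temporary `t` plays the role of the register that sits between the load and the store.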


           A stack is a data structure where values can be added or deleted, but only according to a “last-in, first-out” discipline. We add data to a stack via a push operation and remove it via a pop operation, with the property that the value popped will always be the value that was most recently pushed and is still on the stack.
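The push and pop behavior can be sketched in C as a toy model. On IA32 the stack grows toward lower addresses: pushl decrements %esp by 4 and then stores the value, while popl reads the value and then increments %esp by 4. The array `stack_mem`, the size `STACK_WORDS`, and the variable `esp` are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16                    /* hypothetical size for the sketch */

static uint32_t stack_mem[STACK_WORDS];   /* stands in for stack memory */
static uint32_t esp = STACK_WORDS * 4;    /* byte offset; stack starts empty */

/* pushl: decrement the stack pointer by 4, then store the value. */
void push(uint32_t value)
{
    assert(esp >= 4);                     /* room must be left in the toy stack */
    esp -= 4;
    stack_mem[esp / 4] = value;
}

/* popl: read the value at the top, then increment the stack pointer by 4. */
uint32_t pop(void)
{
    assert(esp < STACK_WORDS * 4);        /* stack must not be empty */
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}
```

Pushing 1 and then 2 and popping twice yields 2 first and then 1, which is the last-in, first-out discipline described above.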


3.5 Arithmetic and Logical Operations

        The operations are divided into four groups: load effective address, unary, binary, and shifts.

3.5.1 Load Effective Address

       The load effective address instruction leal is actually a variant of the movl instruction. It has the form of an instruction that reads from memory to a register, but it does not reference memory at all. 
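Because leal only performs the address arithmetic, compilers often use it for ordinary integer expressions. As a small sketch (the function name `five_x_plus_7` is hypothetical): with x held in %edx, the expression 5*x + 7 can be computed by the single instruction leal 7(%edx,%edx,4), %eax.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical function: computes 5*x + 7 in the shape
   Imm + base + index*scale that leal 7(%edx,%edx,4), %eax evaluates. */
int five_x_plus_7(int x)
{
    /* Keep the result inside the int range so the C arithmetic is defined. */
    assert(x <= (INT_MAX - 7) / 5 && x >= INT_MIN / 5);
    return x + 4 * x + 7;   /* no memory is read or written */
}
```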


Note the difference between logical shifts and arithmetic shifts.

      The right shift instructions differ in that sar performs an arithmetic shift (fill with copies of the sign bit), whereas shr performs a logical shift (fill with zeros). The destination operand of a shift operation can be either a register or a memory location. We denote the two different right shift operations in Figure 3.7 as >>A (arithmetic) and >>L (logical).
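The difference between the two right shifts can be sketched in C on 32-bit bit patterns. The function names are hypothetical. The arithmetic version is built from unsigned shifts because applying >> to a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* shr: logical right shift, vacated high bits are filled with zeros. */
uint32_t shift_right_logical(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* sar: arithmetic right shift, vacated high bits copy the sign bit. */
uint32_t shift_right_arithmetic(uint32_t x, unsigned k)
{
    assert(k < 32);
    if (x & 0x80000000u)
        return ~(~x >> k);   /* zeros shifted into the complement become ones */
    return x >> k;
}
```

For the bit pattern 0xFFFFFFF8 (which is -8 in two's complement), a logical shift by 1 gives 0x7FFFFFFC, while an arithmetic shift by 1 gives 0xFFFFFFFC (which is -4).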

3.5.5 Special Arithmetic Operations


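        For the one-operand multiply instructions (mull for unsigned, imull for signed), one argument must be in register %eax and the other is given as the instruction's source operand. The product is then stored in registers %edx (high-order 32 bits) and %eax (low-order 32 bits).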
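A minimal sketch of the full 64-bit product, using the unsigned case for simplicity. The helper name `mul_full` and its output parameters are hypothetical; they only mimic where the two halves would end up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: full 64-bit product of two unsigned 32-bit values,
   split the way mull leaves it in %edx (high) and %eax (low). */
void mul_full(uint32_t x, uint32_t y, uint32_t *edx_high, uint32_t *eax_low)
{
    assert(edx_high != 0 && eax_low != 0);
    uint64_t product = (uint64_t)x * y;      /* widen before multiplying */
    *edx_high = (uint32_t)(product >> 32);   /* high-order 32 bits */
    *eax_low  = (uint32_t)product;           /* low-order 32 bits */
}
```

Multiplying 0xFFFFFFFF by 2 gives 0x1FFFFFFFE, so the high half is 1 and the low half is 0xFFFFFFFE.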

3.6 Control

3.6.1 Condition Codes

CF: Carry Flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
ZF: Zero Flag. The most recent operation yielded zero.
SF: Sign Flag. The most recent operation yielded a negative value.
OF: Overflow Flag. The most recent operation caused a two’s-complement overflow, either negative or positive.

When comparing values, watch out for overflow, which can make the comparison meaningless.
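Here is a small sketch of why overflow matters for comparisons. A signed "less than" cannot be decided from the sign of a - b alone, because the subtraction may overflow; the signed less-than condition is SF ^ OF. The function names are hypothetical, and the flags are modeled by hand on 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Naive test: decide a < b only from the sign of the 32-bit difference
   (the SF flag after a compare). Wrong when the subtraction overflows. */
int less_by_sign_only(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;   /* wraps modulo 2^32 */
    return (int)(diff >> 31);                    /* SF */
}

/* Signed less-than uses SF ^ OF. */
int less_signed(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    int sf = (int)(diff >> 31);
    /* OF: operands have different signs and the result's sign differs from a's. */
    int of = ((a < 0) != (b < 0)) && (sf != (a < 0));
    assert(sf == 0 || sf == 1);
    return sf ^ of;
}
```

With a = INT32_MIN and b = 1 the difference wraps around to a positive value, so the sign-only test reports "not less" while the SF ^ OF test correctly reports "less".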


3.6.2 Accessing the Condition Codes

         Rather than reading the condition codes directly, there are three common ways of using the condition codes: (1) we can set a single byte to 0 or 1 depending on some combination of the condition codes, (2) we can conditionally jump to some other part of the program, or (3) we can conditionally transfer data.

a is in %edx, b is in %eax
cmpl %eax, %edx          Compare a:b
setl %al                 Set low-order byte of %eax to 0 or 1
movzbl %al, %eax         Set remaining bytes of %eax to 0
           

3.6.3 Jump Instructions and Their Encodings

         A jump instruction can cause the execution to switch to a completely new position in the program.


          Indirect jumps are written using ‘*’ followed by an operand specifier using one of the formats described in Section 3.4.1. As examples, the instruction jmp *%eax uses the value in register %eax as the jump target, and the instruction jmp *(%eax) reads the jump target from memory, using the value in %eax as the read address.

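Constructs such as if statements and while loops are all implemented with goto-style jumps. As for the details... read the book, which covers them very thoroughly.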

Here is a difficulty I ran into while studying goto, along with its solution: http://blog.csdn.net/cinmyheart/article/details/24430599
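As an illustration of the goto idea mentioned above, here is a sketch of one loop written twice: once as an ordinary while loop and once with explicit tests and gotos, which is closer to the test-and-jump structure of the generated assembly. The function names and the choice of factorial are assumptions for this example only.

```c
#include <assert.h>

/* Factorial with an ordinary while loop. */
int fact_while(int n)
{
    assert(n >= 0 && n <= 12);   /* 12! is the largest factorial that fits in 32 bits */
    int result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* The same loop rewritten with goto: test once up front, then
   jump back to the loop body while the condition still holds. */
int fact_goto(int n)
{
    assert(n >= 0 && n <= 12);
    int result = 1;
    if (n <= 1)
        goto done;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)
        goto loop;
done:
    return result;
}
```

Each `if (...) goto` pair corresponds to a compare instruction followed by a conditional jump.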

3.6.6 Conditional Move Instructions


3.7 Procedures

3.7.1 Stack Frame Structure

      The portion of the stack allocated for a single procedure call is called a stack frame. Figure 3.21 diagrams the general structure of a stack frame. The topmost stack frame is delimited by two pointers, with register %ebp serving as the frame pointer, and register %esp serving as the stack pointer.


3.7.2 Transferring Control

Instruction       Description

call Label        Procedure call

call *Operand     Procedure call

leave             Prepare stack for return

ret               Return from call


3.7.3 Register Usage Conventions

           Although only one procedure can be active at a given time, we must make sure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later. 

3.7.4 Procedure Example

int rfact(int n)
{
    int result;
    if (n <= 1)
        result = 1;
    else
        result = n * rfact(n-1);
    return result;
}
           

The corresponding assembly code:

rfact:
    pushl %ebp              Save old %ebp
    movl %esp, %ebp         Set %ebp as frame pointer
    pushl %ebx              Save callee-save register %ebx
    subl $4, %esp           Allocate 4 bytes on stack
    movl 8(%ebp), %ebx      Get n
    movl $1, %eax           result = 1
    cmpl $1, %ebx           Compare n:1
    jle .L53                If <=, goto done
    leal -1(%ebx), %eax     Compute n-1
    movl %eax, (%esp)       Store at top of stack
    call rfact              Call rfact(n-1)
    imull %ebx, %eax        Compute result = return value * n
.L53:                       done:
    addl $4, %esp           Deallocate 4 bytes from stack
    popl %ebx               Restore %ebx
    popl %ebp               Restore %ebp
    ret                     Return result
           

3.8 Array Allocation and Access

As an example, consider the following structure declaration:

struct rec 
{
	int i;
	int j;
	int a[3];
	int *p;
};
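The layout of this structure can be inspected with offsetof. The following is a sketch only: the helper `offset_of_a` is hypothetical. On IA32, where int and pointers are both 4 bytes, the fields i, j, a, and p sit at byte offsets 0, 4, 8, and 20. On a 64-bit machine p will typically land at offset 24 instead, because 8-byte pointers need 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int i;
    int j;
    int a[3];
    int *p;
};

/* Hypothetical helper: byte offset of element a[idx] within struct rec. */
size_t offset_of_a(size_t idx)
{
    assert(idx < 3);
    return offsetof(struct rec, a) + idx * sizeof(int);
}
```

A field access such as r->a[1] therefore becomes a single memory reference at a fixed displacement from the pointer r.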
           

3.9.3 Data Alignment

          The IA32 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance. Linux follows an alignment policy where 2-byte data types (e.g., short) must have an address that is a multiple of 2, while any larger data types (e.g., int, int *, float, and double) must have an address that is a multiple of 4. Note that this requirement means that the least significant bit of the address of an object of type short must equal zero. Similarly, any object of type int, or any pointer, must be at an address having the low-order 2 bits equal to zero.

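The passage above is crucial to understanding alignment!!

It is also worth working through a small exercise on this topic; alignment only really makes sense once you have analyzed one carefully.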
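The low-order-bits rule quoted above can be checked with a small sketch. The helper name `is_aligned` is hypothetical; it assumes the alignment is a power of two, as it is for the 2-byte and 4-byte cases in the passage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is addr a multiple of align (a power of two)? */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;   /* low-order bits must all be zero */
}
```

For example, address 0x1006 is acceptable for a short (the lowest bit is zero) but not for an int (the low-order 2 bits are 10, not 00).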


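         Although this chapter involves little actual programming, I got a lot out of it. I started assembly from zero, and now when I reopen the assembly code in Linux I will at least not be intimidated! It gives me confidence that if I keep grinding through it, I can handle it, rather than panicking at the first sight of assembly. I think this chapter explains assembly really well. It does take a little background to get started, so beforehand I read part of the assembly section of Zhao Jiong's <linux核心剖析>, got a rough idea, and only then began this chapter. It took about three or four days... and it felt well worth it. This blog does not cover the material in as much detail as the book does; it serves only as my own review notes, so that I can quickly "restore my memory".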
