Terminology of Logic design
asserted signal: Boolean 1
deasserted signal: Boolean 0
Blocks without memory are called combinational,
in contrast to sequential logic, whose blocks contain memory (state).
Truth table
DeMorgan’s theorems
- The sum-of-products representation corresponds to a common structured-logic implementation called a programmable logic array (PLA)
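As a small illustration of the two items above, the following C sketch checks De Morgan's theorems exhaustively and writes one function in sum-of-products form. The 3-input majority function and all function names are assumptions chosen for the example; they do not come from these notes.

```c
#include <assert.h>
#include <stdbool.h>

/* De Morgan's theorems for two inputs:
 *   NOT(a AND b) == (NOT a) OR  (NOT b)
 *   NOT(a OR  b) == (NOT a) AND (NOT b)
 * Returns true if both hold for the given inputs. */
bool demorgan_holds(bool a, bool b)
{
    bool first  = (!(a && b)) == (!a || !b);
    bool second = (!(a || b)) == (!a && !b);
    return first && second;
}

/* Hypothetical example: 3-input majority in sum-of-products form,
 * one AND term per truth-table row whose output is 1. */
bool majority_sop(bool a, bool b, bool c)
{
    return (!a && b && c) || (a && !b && c) || (a && b && !c) || (a && b && c);
}

/* Exhaustively check both claims over all 8 input combinations. */
void check_all(void)
{
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        bool expected = (a + b + c) >= 2;   /* majority by counting ones */
        assert(demorgan_holds(a, b));
        assert(majority_sop(a, b, c) == expected);
    }
}
```

Each product term corresponds to one row of the truth table, which is the structure a PLA implements with an AND plane followed by an OR plane.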
don’t cares: indeterminate values, i.e. truth-table entries whose value does not matter
A bus is a collection of data lines that is treated together as a single logical signal. In a 64-bit RISC-V datapath, most buses are 64 bits wide.
- A behavioral specification describes how a digital system functionally operates. A structural specification describes the detailed organization of a digital system, usually using a hierarchical description.
- sensitivity list: The list of signals that specifies when an always block should be re-evaluated.
- blocking assignment: In Verilog, an assignment that completes before the execution of the next statement. A nonblocking assignment, by contrast, continues after evaluating the right-hand side; the left-hand side receives its value only after all right-hand sides have been evaluated.
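The difference can be emulated in C as a rough sketch. This is not Verilog: the struct `Regs` and both function names are hypothetical, and the Verilog statements appear only in comments to show what each function stands for.

```c
#include <assert.h>

/* Two "registers" a and b, modelled as plain ints (hypothetical names). */
typedef struct { int a, b; } Regs;

/* Blocking style (Verilog: a = b; b = a;): each assignment completes
 * before the next statement runs, so b receives the NEW value of a. */
Regs blocking_step(Regs r)
{
    r.a = r.b;
    r.b = r.a;   /* r.a was already overwritten: both now hold the old b */
    return r;
}

/* Nonblocking style (Verilog: a <= b; b <= a;): all right-hand sides are
 * evaluated first, then every left-hand side is updated. */
Regs nonblocking_step(Regs r)
{
    int next_a = r.b;   /* evaluate right-hand sides using the old values */
    int next_b = r.a;
    r.a = next_a;       /* then commit all updates together */
    r.b = next_b;
    return r;
}
```

Starting from a = 1, b = 2, the blocking version leaves both at 2, while the nonblocking version swaps them, which is the behavior expected of two registers clocked on the same edge.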
- edge-triggered clocking: A clocking scheme in which all state changes occur on a clock edge.
- A register file consists of a set of registers that can be read and written by supplying a register number to be accessed.
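- wired AND: an AND function formed by tying several outputs together on one shared wire
- tristate buffer: a buffer with three output states: asserted, deasserted, or high impedance
- finite-state machine (FSM)
- Digital circuits are built from truth tables (which express combinational logic) and finite-state machines (which express sequential logic).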
MSB(Most Significant Bit)
LSB(Least Significant Bit)
- RISC-V divides the 32 bits of an instruction into “fields”, because regular field sizes ⇒ simpler hardware.
immediates: constant operands carried in the instruction itself, so they are not fetched from memory
6 instruction formats
1. R-format: instructions with three register operands (two sources rs1 and rs2, one destination rd)
-add, xor, mul: arithmetic/logical ops
2. I-format: instructions with immediates, and loads
-addi, lw, jalr, slli
3. S-format: store instructions: sw, sb
4. SB-format: branch instructions: beq, bge
5. U-format: instructions with upper immediates: lui, auipc
The upper immediate is 20 bits wide
6. UJ-format: jump instructions: jal
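To make the idea of fixed fields concrete, here is a minimal C sketch that splits an R-format word into its fields. The bit ranges follow the RISC-V base ISA; the struct `RFields` and the helper names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an R-format instruction. */
typedef struct {
    uint32_t opcode;  /* bits  6:0  */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12 */
    uint32_t rs1;     /* bits 19:15 */
    uint32_t rs2;     /* bits 24:20 */
    uint32_t funct7;  /* bits 31:25 */
} RFields;

/* Extract `width` bits of `word`, starting at bit position `lo`. */
static uint32_t bits(uint32_t word, unsigned lo, unsigned width)
{
    assert(width > 0 && width < 32 && lo + width <= 32);
    return (word >> lo) & ((1u << width) - 1u);
}

/* Split a 32-bit R-format instruction word into its six fields. */
RFields decode_r(uint32_t insn)
{
    RFields f;
    f.opcode = bits(insn, 0, 7);
    f.rd     = bits(insn, 7, 5);
    f.funct3 = bits(insn, 12, 3);
    f.rs1    = bits(insn, 15, 5);
    f.rs2    = bits(insn, 20, 5);
    f.funct7 = bits(insn, 25, 7);
    return f;
}
```

For instance, `decode_r(0x007302B3)` (the encoding of `add x5, x6, x7`) yields opcode 0x33, rd 5, rs1 6, and rs2 7. Because every field sits at a fixed position, the hardware equivalent of `bits` is just wiring.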
Using multiple levels of control can reduce the size of the main control unit. Using several smaller control units may also potentially reduce the latency of the control unit.
· In RISC-V, a word comprises 32 bits, while a group of 64 bits is called a doubleword.
The RISC-V architecture generally limits the number of registers to 32, following the "smaller is faster" principle.
The desire to keep all instructions the same size conflicts with the desire to have as many registers as possible. Any increase in the number of registers uses up at least one more bit in every register field of the instruction format. Given these constraints and the design principle that smaller is faster, most instruction sets today have 16 or 32 general-purpose registers.
The sign and magnitude representation was soon abandoned because:
1. where to put the sign bit?
2. adders need an extra step to set the sign.
3. positive and negative zero issue.
- The data transfer instruction that copies data from memory to a register is traditionally called load.
- The addresses of sequential doublewords differ by 8.
- Two’s complement representation is used for negative numbers: to negate a value, invert every bit and add 1 to the result.
- Two’s complement representation has the advantage that all negative numbers have a 1 in the most significant bit. Thus, hardware needs to test only this bit to see if a number is positive or negative (with the number 0 considered positive). This bit is often called the sign bit.
- An alternative is to represent the most negative value by 00 … 000 and the most positive value by 11 … 11, with 0 typically having the value 10 … 00. This representation is called a biased notation.
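The three points above can be checked with a short C sketch. The function names are hypothetical, and the 8-bit width with bias 128 is an assumption picked only to keep the numbers small. The cast from `uint64_t` back to `int64_t` relies on the wrap-around behavior that mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Negate by the rule from the notes: invert every bit, then add 1. */
int64_t negate_twos(int64_t x)
{
    uint64_t pattern = (uint64_t)x;
    return (int64_t)(~pattern + 1u);
}

/* Sign test: only the most significant bit (the sign bit) is inspected. */
int is_negative(int64_t x)
{
    return (int)(((uint64_t)x >> 63) & 1u);
}

/* 8-bit biased notation with bias 128 (assumed for illustration):
 * pattern = value + 128, so 0x00 is -128, 0x80 is 0, and 0xFF is +127. */
uint8_t to_biased8(int value)
{
    assert(value >= -128 && value <= 127);
    return (uint8_t)(value + 128);
}

/* Recover the value from an 8-bit biased pattern. */
int from_biased8(uint8_t pattern)
{
    return (int)pattern - 128;
}
```

Note how the biased patterns sort in the same order as the values they represent, which is why this notation is convenient for fields that are compared as unsigned numbers.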
Representing Instructions in the Computer
- we have a conflict between:
- Desire to keep all instructions the same length.
- Desire to have a single instruction format.
Design Principle: Good design demands good compromises.
The compromise chosen by the RISC-V designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions.
S-type instructions
- The 12-bit immediate in the S-type format is split into two fields, which supply the lower 5 bits and upper 7 bits. The RISC-V architects chose this design because it keeps the rs1 and rs2 fields in the same place in all instruction formats.
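A minimal C sketch of the split immediate, assuming the standard S-type layout (imm[11:5] in bits 31:25, imm[4:0] in bits 11:7). The function names `encode_s` and `s_immediate` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Build an S-type instruction word from its fields.
 * imm must fit in 12 signed bits. */
uint32_t encode_s(uint32_t opcode, uint32_t funct3,
                  uint32_t rs1, uint32_t rs2, int32_t imm)
{
    assert(imm >= -2048 && imm <= 2047);
    uint32_t u = (uint32_t)imm & 0xFFFu;   /* 12-bit two's complement */
    return ((u >> 5) << 25)                /* imm[11:5] -> bits 31:25 */
         | ((rs2 & 31u) << 20)
         | ((rs1 & 31u) << 15)
         | ((funct3 & 7u) << 12)
         | ((u & 31u) << 7)                /* imm[4:0]  -> bits 11:7  */
         | (opcode & 0x7Fu);
}

/* Reassemble and sign-extend the split 12-bit immediate. */
int32_t s_immediate(uint32_t insn)
{
    uint32_t hi = (insn >> 25) & 0x7Fu;    /* imm[11:5] */
    uint32_t lo = (insn >> 7) & 0x1Fu;     /* imm[4:0]  */
    int32_t imm = (int32_t)((hi << 5) | lo);
    if (imm & 0x800)                       /* bit 11 is the sign bit */
        imm -= 4096;
    return imm;
}
```

As a check, `encode_s(0x23, 3, 2, 1, 8)` produces 0x00113423, the encoding of `sd x1, 8(sp)`, with rs1 and rs2 in the same bit positions an R-format instruction uses.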
Each format is assigned a distinct set of opcode values in the first field (opcode) so that the hardware knows how to treat the rest of the instruction.
The stored-program concept: programs are stored in memory to be read or written, just like data.
- A case/switch statement can be implemented as a chain of conditional tests, but it is often more efficiently encoded as a table of addresses of alternative instruction sequences, called a branch address table or branch table. The program needs only to index into the table and then jump unconditionally to the appropriate sequence.
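In C, the same idea can be sketched with an array of function pointers standing in for the table of addresses. The handler functions and their bodies are placeholders invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder alternatives, one per case label (hypothetical bodies). */
static int case_zero(int x)  { return x + 1; }
static int case_one(int x)   { return x * 2; }
static int case_two(int x)   { return x - 3; }
static int case_other(int x) { return x; }      /* default case */

typedef int (*Handler)(int);

/* The branch table: entry k holds the address of the code for case k. */
static const Handler branch_table[] = { case_zero, case_one, case_two };

/* Index into the table and jump; out-of-range values take the default. */
int dispatch(int k, int x)
{
    size_t n = sizeof branch_table / sizeof branch_table[0];
    if (k < 0 || (size_t)k >= n)    /* bounds check before indexing */
        return case_other(x);
    return branch_table[k](x);
}
```

The cost of `dispatch` is one bounds check and one indirect jump regardless of how many cases exist, whereas a chain of comparisons grows with the number of cases.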
jump-and-link instruction: An instruction that branches to an address and simultaneously saves the address of the following instruction in a register (usually x1 in RISC-V).
#for example:
jal x1, ProcedureAddress
#Here the address of the instruction following the jal (PC + 4), called the ==return address==, is stored in register x1; ProcedureAddress is the jump target.
caller: The program that instigates a procedure and provides the necessary parameter values.
callee: A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller.
Why use a stack:
- x10-x17 are the eight parameter registers used to pass parameters or return values under the RISC-V convention. Suppose a compiler needs more registers for a procedure than the eight argument registers. Since we must cover our tracks after our mission is complete, any registers needed by the caller must be restored to the values that they contained before the procedure was invoked. This situation is an example in which we need to spill registers to memory.
- RISC-V has a stack pointer: x2 (sp)
- placing data onto the stack is called a push, and removing data from the stack is called a pop.
- By historical precedent, stacks “grow” from higher addresses to lower addresses. This convention means that you push values onto the stack by subtracting from the stack pointer. Adding to the stack pointer shrinks the stack, thereby popping values off the stack.
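The push and pop conventions above can be sketched in C on a small simulated memory. The `Stack` type, the 64-byte size, and the function names are assumptions for illustration; real code simply adjusts sp with addi and uses sd/ld.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64   /* tiny simulated stack region (assumed size) */

typedef struct {
    uint64_t mem[STACK_BYTES / 8];  /* backing store, one doubleword per entry */
    uint64_t sp;                    /* byte offset of the current top of stack */
} Stack;

/* An empty stack starts with sp at the high end of the region. */
void stack_init(Stack *s)
{
    s->sp = STACK_BYTES;
}

/* Push: subtract from sp (the stack grows toward lower addresses), then store. */
void push(Stack *s, uint64_t value)
{
    assert(s->sp >= 8);                 /* overflow check */
    s->sp -= 8;
    s->mem[s->sp / 8] = value;
}

/* Pop: load from the top, then add to sp (the stack shrinks). */
uint64_t pop(Stack *s)
{
    assert(s->sp + 8 <= STACK_BYTES);   /* underflow check */
    uint64_t value = s->mem[s->sp / 8];
    s->sp += 8;
    return value;
}
```

After two pushes sp has dropped from 64 to 48, and two pops return the values in reverse order and bring sp back to 64.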
The key role of the stack is to avoid register conflicts across procedure calls:
The caller pushes any argument registers (x10-x17) or temporary registers (x5-x7 and x28-x31) that are needed after the call. The callee pushes the return address register x1 and any saved registers (x8-x9 and x18-x27) used by the callee. The stack pointer sp is adjusted to account for the number of registers placed on the stack.
To avoid saving and restoring a register whose value is never used, which might happen with a temporary register, RISC-V software separates 19 of the registers into two groups:
- x5-x7 and x28-x31: temporary registers that are not preserved by the callee on a procedure call
- x8-x9 and x18-x27: saved registers that must be preserved on a procedure call (if used, the callee saves and restores them)
Analysis of the factorial assembly program
C:
The parameter n is held in argument register x10
long long int fact(long long int n)
{
    if (n < 1) return (1);
    else return (n * fact(n - 1));
}
Assembly:
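fact:
addi sp, sp, -16 // adjust the stack for 2 items
sd x1, 8(sp) // save the return address held in x1
sd x10, 0(sp) // save the argument n held in x10
addi x5, x10, -1 // x5 = n - 1
bge x5, x0, L1 // if x5 >= 0 (n >= 1), the PC jumps to L1
addi x10, x0, 1 // base case: return value is 1
addi sp, sp, 16 // pop 2 items off the stack
jalr x0, 0(x1) // return to the caller
L1:
addi x10, x10, -1 // n >= 1: the argument becomes n - 1
jal x1, fact // call fact(n - 1); x1 = address of the next instruction
addi x6, x10, 0 // return point: move the result of fact(n - 1) to x6
ld x10, 0(sp) // restore the argument n
ld x1, 8(sp) // restore the return address
addi sp, sp, 16 // pop 2 items off the stack
mul x10, x10, x6 // return value is n * fact(n - 1)
jalr x0, 0(x1) // return to the caller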
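Example with n = 3. Slot 1 is the doubleword at 0(sp), slot 2 the one at 8(sp), and so on, so older entries move to higher slot numbers as new frames are pushed. f1 is the return address into the original caller; f2, f3, and f4 label the return addresses of the three recursive calls (all the same address: the instruction after jal x1, fact).
1. First call of fact: the address f1 in x1 is pushed to slot 2, and n = 3 is pushed to slot 1.
2. x5 = 3 - 1 = 2.
3. 2 >= 0, so branch to L1.
4. x10 is updated to x10 - 1 = 2, and the PC jumps to fact.
5. Second call of fact: the address f2 in x1 is pushed to slot 2, and n = 2 is pushed to slot 1.
6. f1 is now in slot 4 and n = 3 in slot 3.
7. x5 = 2 - 1 = 1.
8. 1 >= 0, so branch to L1.
9. x10 is updated to x10 - 1 = 1, and the PC jumps to fact.
10. Third call of fact: the address f3 in x1 is pushed to slot 2, and n = 1 is pushed to slot 1.
11. f1 is now in slot 6, n = 3 in slot 5, f2 in slot 4, and n = 2 in slot 3.
12. x5 = 1 - 1 = 0.
13. 0 >= 0, so branch to L1.
14. x10 is updated to x10 - 1 = 0, and the PC jumps to fact.
15. Fourth call of fact: the address f4 in x1 is pushed to slot 2, and n = 0 is pushed to slot 1.
16. f1 is now in slot 8, n = 3 in slot 7, f2 in slot 6, n = 2 in slot 5, f3 in slot 4, and n = 1 in slot 3.
17. x5 = 0 - 1 = -1, so the branch to L1 is not taken.
18. x10 is set to 1 and slots 1 and 2 are popped, leaving f1 in slot 6, n = 3 in slot 5, f2 in slot 4, n = 2 in slot 3, f3 in slot 2, and n = 1 in slot 1.
19. The program returns to the address f4 currently held in x1.
20. x10 is copied into x6, so x6 = 1.
21. Slot 1 is popped into x10, so x10 = 1; slot 2 is popped into x1, giving the address f3.
22. f1 is now in slot 4, n = 3 in slot 3, f2 in slot 2, and n = 2 in slot 1.
23. x10 is updated to 1 * 1 = 1.
24. The program returns to the address f3 currently held in x1.
25. x10 is copied into x6, so x6 = 1.
26. Slot 1 is popped into x10, so x10 = 2; slot 2 is popped into x1, giving the address f2.
27. f1 is now in slot 2 and n = 3 in slot 1.
28. x10 is updated to 2 * 1 = 2.
29. The program returns to the address f2 currently held in x1.
30. x10 is copied into x6, so x6 = 2.
31. Slot 1 is popped into x10, so x10 = 3; slot 2 is popped into x1, giving the address f1.
32. x10 is updated to 3 * 2 = 6.
33. The program returns to the address f1 currently held in x1, with the result 6 in x10.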
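The trace above can be reproduced by a C sketch that replaces recursion with an explicit array of saved arguments, loosely mirroring what the assembly does with the stack. This is an illustrative model, not a translation of the assembly: return addresses are not stored, and `fact_explicit`, `MAX_DEPTH`, and the `max_depth` output parameter are all assumptions.

```c
#include <assert.h>

#define MAX_DEPTH 32   /* assumed bound on call depth for this sketch */

/* Factorial with an explicit stack of saved arguments.
 * If max_depth is non-null, it receives the number of frames at the deepest point. */
long long fact_explicit(long long n, int *max_depth)
{
    long long saved_n[MAX_DEPTH];   /* plays the role of the 0(sp) slots */
    int depth = 0;                  /* number of live frames */
    int deepest = 0;

    /* Call phase: every invocation saves its n, then either reaches the
     * base case or calls again with n - 1. */
    for (;;) {
        assert(depth < MAX_DEPTH);
        saved_n[depth++] = n;       /* sd x10, 0(sp) */
        if (depth > deepest)
            deepest = depth;
        if (n - 1 < 0)              /* bge x5, x0, L1 is not taken */
            break;
        n = n - 1;                  /* addi x10, x10, -1 ; jal x1, fact */
    }

    /* Base case: the result is 1 and the frame is popped without reloading n. */
    long long result = 1;           /* addi x10, x0, 1 */
    depth--;                        /* addi sp, sp, 16 */

    /* Return phase: each pending invocation reloads its n and multiplies. */
    while (depth > 0) {
        long long x6 = result;              /* addi x6, x10, 0 */
        long long x10 = saved_n[--depth];   /* ld x10, 0(sp) ; addi sp, sp, 16 */
        result = x10 * x6;                  /* mul x10, x10, x6 */
    }

    if (max_depth)
        *max_depth = deepest;
    return result;
}
```

For n = 3 the sketch builds four frames (n = 3, 2, 1, 0) before unwinding, matching the four calls of fact in the walkthrough.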
A C variable generally denotes a location in memory, i.e. an address.
- C has two storage classes: automatic and static. Automatic variables are local to a procedure and are discarded when the procedure exits. Static variables exist across exits from and entries to procedures. C variables declared outside all procedures are considered static, as are any variables declared using the keyword static. The rest are automatic. To simplify access to static data, some RISC-V compilers reserve a register x3 for use as the global pointer, or gp.
The stack starts at the high end of the user address space and grows down.
The stack is also used to store variables that are local to the procedure but do not fit in registers, such as local arrays or structures.
procedure frame or activation record: The segment of the stack containing a procedure’s saved registers and local variables.
frame pointer: A value denoting the location of the saved registers and local variables for a given procedure.
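frame pointer vs. stack pointer:
The stack pointer (sp) tracks the current top of the stack and may change during a procedure. The frame pointer (fp, register x8 in RISC-V) stays fixed for the duration of the procedure, so it offers a stable base for addressing saved registers and local variables. Very often, the frame is located in the stack, but it ain’t necessarily so.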
Graphics and Computing GPUs
- The major driving force for improving graphics processing was the computer game industry. Many programmers of scientific and multimedia applications today are pondering whether to use GPUs or CPUs.
- heterogeneous system: A system combining different processor types.
Trends:
GPUs and their associated drivers implement the OpenGL and DirectX models of graphics processing.
- OpenGL is an open standard for 3D graphics programming available for most computers.
- DirectX is a series of Microsoft multimedia programming interfaces.
These APIs (application programming interfaces) have well-defined behavior.
- visual computing.
- The GPU evolves into a scalable parallel processor.
- In the GeForce 8-series generation of GPUs, the geometry, vertex, and pixel processing all run on the same type of processor. This unification allows for dramatic scalability.
⇓
This calls for a new programming model for the GPU:
Compute Unified Device Architecture(CUDA) is a scalable parallel programming model and software platform for the GPU and other parallel processors that allows the programmer to bypass the graphics API and graphics interfaces of the GPU and simply program in C or C++.
- The CUDA programming model has an SPMD (single-program multiple data) software style, in which a programmer writes a program for one thread that is instanced and executed by many threads in parallel on the multiple processors of the GPU.
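The SPMD style can be imitated in plain C as a rough sketch. This is not CUDA and uses no CUDA API: the "kernel" is an ordinary function written for one thread, and a sequential loop stands in for the parallel launch. The SAXPY computation and every name here are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* "Kernel": written from the point of view of ONE thread, identified by
 * thread_id. Every thread runs this same program on a different element. */
static void saxpy_thread(size_t thread_id, size_t n,
                         float a, const float *x, float *y)
{
    if (thread_id < n)      /* guard: there may be more threads than elements */
        y[thread_id] = a * x[thread_id] + y[thread_id];
}

/* Stand-in for a parallel launch: a sequential loop that instances the
 * kernel once per thread id. On a GPU these instances would run in parallel. */
void launch_saxpy(size_t num_threads, size_t n,
                  float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t tid = 0; tid < num_threads; tid++)
        saxpy_thread(tid, n, a, x, y);
}
```

The point of the style is that `saxpy_thread` contains no loop over the data: the amount of parallelism is decided at launch time, not in the program text, which is what lets the same program scale across GPUs with different numbers of processors.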
With CUDA and GPU computing, it is now possible to use the GPU as both a graphics processor and a computing processor at the same time.