Why does Python code run faster in a function?

Translated from: Why does Python code run faster in a function?

def main():
    for i in xrange(10**8):
        pass
main()
           

This piece of Python code runs in the following time (note: the timing is done with the time command in Bash on Linux):

real    0m1.841s
user    0m1.828s
sys     0m0.012s
           

However, if the for loop isn't placed within a function,

for i in xrange(10**8):
    pass
           

then it runs for a much longer time:

real    0m4.543s
user    0m4.524s
sys     0m0.012s
           

Why is this?
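A rough way to reproduce the comparison inside a single process, instead of timing two scripts with time, is sketched below. It assumes Python 3 (range in place of xrange) and a smaller loop count than the question uses. The helper name time_source is made up for this sketch, and the absolute numbers will vary by machine and interpreter version.

```python
import time

LOOP_COUNT = 10**6  # smaller than the 10**8 in the question, to keep the run short

# The same loop twice: once at module level, once wrapped in a function.
TOP_LEVEL_SOURCE = f"for i in range({LOOP_COUNT}):\n    pass\n"
FUNCTION_SOURCE = (
    "def main():\n"
    f"    for i in range({LOOP_COUNT}):\n"
    "        pass\n"
    "main()\n"
)


def time_source(source):
    """Compile source as module-level code, run it, and return elapsed seconds."""
    code = compile(source, "<bench>", "exec")
    namespace = {}  # fresh module-like namespace for each run
    start = time.perf_counter()
    exec(code, namespace)
    return time.perf_counter() - start


top_level_seconds = time_source(TOP_LEVEL_SOURCE)
function_seconds = time_source(FUNCTION_SOURCE)
print(f"top level: {top_level_seconds:.3f}s")
print(f"function:  {function_seconds:.3f}s")
```

Compilation happens before the timer starts, so only execution of the loop is measured.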

Reply #1

Reference: https://stackoom.com/question/lAQt/為什麼Python代碼在函數中運作得更快

Reply #2

Inside a function, the bytecode is:

  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        
           

At top level, the bytecode is:

  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE        
           

The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local, but at top level it is a global.

To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the top-level code I had to use the compile builtin.
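A minimal sketch of that disassembly workflow follows, assuming Python 3 (so range replaces xrange, and the exact opcodes and offsets will differ from the Python 2 listings above). The helper opnames and the "<toplevel>" filename are only illustrative.

```python
import dis


def main():
    for i in range(10**3):
        pass


# Top-level code has no function object to pass to dis,
# so compile the source text into a code object first.
top_level_source = "for i in range(10**3):\n    pass\n"
top_level_code = compile(top_level_source, "<toplevel>", "exec")


def opnames(code_or_func):
    """Return the set of opcode names used by a function or code object."""
    return {ins.opname for ins in dis.get_instructions(code_or_func)}


dis.dis(main)            # the loop variable is stored with a STORE_FAST opcode
dis.dis(top_level_code)  # the loop variable is stored with STORE_NAME
```

Comparing opnames(main) with opnames(top_level_code) gives the same information without reading the full listing.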

Reply #3

You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.

Remember that CPython compiles Python source to bytecode, which the interpreter then runs.

When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes. This is possible because you can't dynamically add local variables to a function. Retrieving a local variable is then literally a pointer lookup into the array and a refcount increase on the PyObject, which is trivial.
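The fixed set of local slots can be inspected from Python through the function's code object. The sketch below assumes CPython 3; the function demo is made up for illustration.

```python
def demo(a, b):
    total = a + b
    return total


code = demo.__code__

# Local names are fixed at compile time; a name's position in this
# tuple is the slot index that STORE_FAST and LOAD_FAST refer to.
print(code.co_varnames)  # ('a', 'b', 'total')
print(code.co_nlocals)   # 3
```

Because the tuple is built when the function is compiled, it already lists total before demo has ever been called.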

Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on.

Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to.

By the way, global lookups are still pretty optimised. Attribute lookups foo.bar are the really slow ones!

Here is a small illustration of local variable efficiency.
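One way to measure the three kinds of lookup is a timeit comparison like the sketch below. It assumes Python 3, and every name in it (Holder, foo, GLOBAL_VALUE, the read_* functions) is invented for the example. The relative gaps depend heavily on the interpreter version, so it prints measurements rather than promising a particular ordering.

```python
import timeit


class Holder:
    bar = 1


foo = Holder()
GLOBAL_VALUE = 1


def read_local(n):
    value = 1
    for _ in range(n):
        x = value  # local read


def read_global(n):
    for _ in range(n):
        x = GLOBAL_VALUE  # global read


def read_attribute(n):
    for _ in range(n):
        x = foo.bar  # global read followed by an attribute lookup


N = 100_000
timings = {
    func.__name__: min(timeit.repeat(lambda: func(N), number=1, repeat=5))
    for func in (read_local, read_global, read_attribute)
}

for name, seconds in timings.items():
    print(f"{name:15s} {seconds:.4f}s")
```

Taking the minimum of several repeats reduces noise from other activity on the machine.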

Reply #4

Aside from local/global variable store times, opcode prediction makes the function faster.

As the other answers explain, the function uses the STORE_FAST opcode in the loop. Here's the bytecode for the function's loop:

    >>   13 FOR_ITER                 6 (to 22)   # get next value from iterator
         16 STORE_FAST               0 (x)       # set local variable
         19 JUMP_ABSOLUTE           13           # back to FOR_ITER
           

Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.

In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.

On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation loop, which has obvious implications for the speed at which the loop is executed.
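The difference between the two cases can be imitated with a toy dispatch loop. The sketch below is not CPython's implementation: it is a simplified model written for this explanation, where run_loop, PROGRAM and the dispatches counter are all invented names. It only counts how often control returns to the top of the dispatch loop.

```python
PROGRAM = ["FOR_ITER", "STORE_FAST", "JUMP_ABSOLUTE"]


def run_loop(item_count, predict):
    """Run the three-opcode loop and count trips through the dispatch loop."""
    iterator = iter(range(item_count))
    stack = []
    local_i = None
    pc = 0
    dispatches = 0
    while True:
        dispatches += 1  # one trip through the top of the toy eval loop
        op = PROGRAM[pc]
        pc += 1
        if op == "FOR_ITER":
            try:
                stack.append(next(iterator))
            except StopIteration:
                break  # iterator exhausted: leave the loop
            # Toy PREDICT(STORE_FAST): peek at the next opcode and,
            # if it matches, run its body here without a new dispatch.
            if predict and PROGRAM[pc] == "STORE_FAST":
                pc += 1
                local_i = stack.pop()
        elif op == "STORE_FAST":
            local_i = stack.pop()
        elif op == "JUMP_ABSOLUTE":
            pc = 0  # back to FOR_ITER
    return dispatches


print(run_loop(5, predict=False))  # 16
print(run_loop(5, predict=True))   # 11
```

With prediction, each iteration costs two dispatches instead of three, which is the "two opcodes behave like one" effect in miniature.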

To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):

Some opcodes tend to come in pairs thus making it possible to predict the second code when the first is run. For example, GET_ITER is often followed by FOR_ITER. And FOR_ITER is often followed by STORE_FAST or UNPACK_SEQUENCE.

Verifying the prediction costs a single high-speed test of a register variable against a constant. If the pairing was good, then the processor's own internal branch predication has a high likelihood of success, resulting in a nearly zero-overhead transition to the next opcode. A successful prediction saves a trip through the eval-loop including its two unpredictable branches, the HAS_ARG test and the switch-case. Combined with the processor's internal branch prediction, a successful PREDICT has the effect of making the two opcodes run as if they were a single new opcode with the bodies combined.

We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:

case FOR_ITER:                         // the FOR_ITER opcode case
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v); // x is the next value from iterator
    if (x != NULL) {                     
        PUSH(x);                       // put x on top of the stack
        PREDICT(STORE_FAST);           // predict STORE_FAST will follow - success!
        PREDICT(UNPACK_SEQUENCE);      // this and everything below is skipped
        continue;
    }
    // error-checking and more code for when the iterator ends normally                                     
           

The PREDICT macro expands to if (*next_instr == op) goto PRED_##op, i.e. we just jump to the start of the predicted opcode.

In this case, we jump here:

PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();                     // pop x back off the stack
    SETLOCAL(oparg, v);            // set it as the new local variable
    goto fast_next_opcode;
           

The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.

The Python wiki page has more information about how CPython's virtual machine works.