Why does Python code run faster in a function?

Translated from: Why does Python code run faster in a function?

def main():
    for i in xrange(10**8):
        pass
main()

This piece of code in Python runs in the following time (note: the timing was done with the time command in Bash on Linux):

real    0m1.841s
user    0m1.828s
sys     0m0.012s

However, if the for loop isn't placed within a function,

for i in xrange(10**8):
    pass

then it runs for a much longer time:

real    0m4.543s
user    0m4.524s
sys     0m0.012s

Why is this?

Reply #1

Reference: https://stackoom.com/question/lAQt/為什麼Python代碼在函數中運作得更快

Reply #2

Inside a function, the bytecode is:

  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE

At top level, the bytecode is:

  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE

The difference is that STORE_FAST is faster (!) than STORE_NAME.

This is because in a function, i is a local, but at top level it is a global.

To examine bytecode, use the dis module.

I was able to disassemble the function directly, but to disassemble the top-level code I had to use the compile builtin.
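The comparison described above can be reproduced with a short script. This is a minimal sketch, assuming Python 3 (so range stands in for the question's xrange); the helper opnames and the constant TOP_LEVEL_SOURCE are illustrative names, not part of the original answer. Exact opcode names vary between CPython versions, so the script only filters for names starting with STORE.

```python
import dis


def main():
    for i in range(10):
        pass


# The same loop as top-level source; compile() turns it into a module code object.
TOP_LEVEL_SOURCE = "for i in range(10):\n    pass\n"


def opnames(code_or_func):
    """Return the opcode names of a function or code object (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(code_or_func)]


function_ops = opnames(main)
module_ops = opnames(compile(TOP_LEVEL_SOURCE, "<top-level>", "exec"))

# Show only the store instructions, which is where the two versions differ.
print("function :", [op for op in function_ops if op.startswith("STORE")])
print("top level:", [op for op in module_ops if op.startswith("STORE")])
```

Calling dis.dis(main) or dis.dis(compile(TOP_LEVEL_SOURCE, "<top-level>", "exec")) prints the full listings in the same layout as the ones shown in this answer.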

Reply #3

You might ask why it is faster to store local variables than globals.

This is a CPython implementation detail.

Remember that CPython compiles Python source to bytecode, which the interpreter then runs.

When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes.

This is possible because you can't dynamically add local variables to a function.
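The name-to-index assignment can be observed from Python through the function's code object. The following is a small sketch, assuming CPython 3; the function example and its variable names are made up for illustration.

```python
def example(a, b):
    total = a + b
    return total


code = example.__code__

# co_nlocals is the number of local-variable slots reserved for each call.
print("slots:", code.co_nlocals)

# co_varnames lists the local names; a name's position is its slot index,
# which is the integer argument that STORE_FAST / LOAD_FAST carry.
for index, name in enumerate(code.co_varnames):
    print(index, name)
```

Arguments come first in co_varnames, followed by the other locals in order of first assignment.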

Then retrieving a local variable is literally a pointer lookup into the array and a refcount increase on the PyObject, which is trivial.

Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on.

Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to.
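The effect of the global statement on the compiler's choice shows up in the code object's name tables. This is a hedged sketch, assuming CPython 3; assigns_local and assigns_global are hypothetical functions written for this illustration.

```python
def assigns_local():
    i = 1          # no declaration: the compiler treats i as a local


def assigns_global():
    global i       # declaration: the compiler must not use a local slot
    i = 1


# Local names live in co_varnames; names resolved through dicts live in co_names.
print(assigns_local.__code__.co_varnames, assigns_local.__code__.co_names)
print(assigns_global.__code__.co_varnames, assigns_global.__code__.co_names)
```

In the first function i occupies a local slot, while in the second it appears only in co_names and is written with a dict-based store.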

By the way, global lookups are still pretty optimised.

Attribute lookups (foo.bar) are the really slow ones!

Here is a small illustration of local variable efficiency.
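One way to measure the difference yourself is with timeit. This is a rough sketch, assuming Python 3; the functions bump_global and bump_local and the constant N are invented for the example. Absolute timings depend on the machine and the CPython version, so no expected numbers are given.

```python
import timeit

counter = 0


def bump_global(n):
    """Increment a module-level name n times (dict-based loads and stores)."""
    global counter
    counter = 0
    for _ in range(n):
        counter += 1
    return counter


def bump_local(n):
    """Increment a local name n times (array-slot loads and stores)."""
    counter = 0
    for _ in range(n):
        counter += 1
    return counter


N = 100_000
t_global = timeit.timeit(lambda: bump_global(N), number=20)
t_local = timeit.timeit(lambda: bump_local(N), number=20)
print("global: %.3fs  local: %.3fs" % (t_global, t_local))
```

Both functions do the same work and return the same value; only the kind of variable being updated differs.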

Reply #4

Aside from local/global variable store times, opcode prediction makes the function faster.

As the other answers explain, the function uses the STORE_FAST opcode in the loop.

Here's the bytecode for the function's loop:

>>   13 FOR_ITER                 6 (to 22)   # get next value from iterator
     16 STORE_FAST               0 (x)       # set local variable
     19 JUMP_ABSOLUTE           13           # back to FOR_ITER

Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed.

Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.

In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute.

Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST.

This has the effect of squeezing the two opcodes into a single opcode.

On the other hand, the STORE_NAME opcode is used in the loop at the global level.

Python does *not* make similar predictions when it sees this opcode.

Instead, it must go back to the top of the evaluation loop, which has obvious implications for the speed at which the loop is executed.

To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):

Some opcodes tend to come in pairs thus making it possible to predict the second code when the first is run. For example, GET_ITER is often followed by FOR_ITER. And FOR_ITER is often followed by STORE_FAST or UNPACK_SEQUENCE. Verifying the prediction costs a single high-speed test of a register variable against a constant. If the pairing was good, then the processor's own internal branch predication has a high likelihood of success, resulting in a nearly zero-overhead transition to the next opcode. A successful prediction saves a trip through the eval-loop including its two unpredictable branches, the HAS_ARG test and the switch-case. Combined with the processor's internal branch prediction, a successful PREDICT has the effect of making the two opcodes run as if they were a single new opcode with the bodies combined.

We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:

case FOR_ITER:                         // the FOR_ITER opcode case
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v); // x is the next value from iterator
    if (x != NULL) {                     
        PUSH(x);                       // put x on top of the stack
        PREDICT(STORE_FAST);           // predict STORE_FAST will follow - success!
        PREDICT(UNPACK_SEQUENCE);      // this and everything below is skipped
        continue;
    }
    // error-checking and more code for when the iterator ends normally

The PREDICT macro expands to

if (*next_instr == op) goto PRED_##op

i.e. we just jump to the start of the predicted opcode.

In this case, we jump here:

PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();                     // pop x back off the stack
    SETLOCAL(oparg, v);            // set it as the new local variable
    goto fast_next_opcode;

The local variable is now set and the next opcode is up for execution.

Python continues through the iterable until it reaches the end, making the successful prediction each time.
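To make the saving concrete, here is a toy model of the idea written in Python. It is only a sketch under loud assumptions: it is not CPython's C implementation, the names run, PROGRAM and dispatches are hypothetical, and the "cost" it counts is simply the number of trips through the top of a simplified evaluation loop.

```python
def run(program, iterable, predict):
    """Run a toy bytecode loop; return (last stored value, dispatch count)."""
    iterator = iter(iterable)
    stack = []
    fast_locals = [None]   # fixed-size array of local slots
    pc = 0
    dispatches = 0         # trips through the top of the eval loop
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        dispatches += 1
        if op == "FOR_ITER":
            try:
                stack.append(next(iterator))
            except StopIteration:
                pc = arg   # iterator exhausted: jump past the loop
                continue
            # Prediction: peek at the next opcode and, if it is STORE_FAST,
            # execute it right away instead of going back to the top.
            if predict and pc < len(program) and program[pc][0] == "STORE_FAST":
                fast_locals[program[pc][1]] = stack.pop()
                pc += 1
        elif op == "STORE_FAST":
            fast_locals[arg] = stack.pop()
        elif op == "JUMP_ABSOLUTE":
            pc = arg
        else:
            raise ValueError("unknown opcode: %s" % op)
    return fast_locals[0], dispatches


PROGRAM = [
    ("FOR_ITER", 3),        # on exhaustion, jump to index 3 (the end)
    ("STORE_FAST", 0),      # store into local slot 0
    ("JUMP_ABSOLUTE", 0),   # back to FOR_ITER
]

print(run(PROGRAM, range(5), predict=False))  # (4, 16)
print(run(PROGRAM, range(5), predict=True))   # (4, 11)
```

Without prediction each iteration costs three dispatches; with it, FOR_ITER and STORE_FAST share one, which is the "two opcodes as a single opcode" effect described in the quote.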

The Python wiki page has more information about how CPython's virtual machine works.
