This article is translated from: Why does Python code run faster in a function?
def main():
    for i in xrange(10**8):
        pass

main()
This piece of code in Python runs in (note: the timing is done with the time function of BASH in Linux):

real    0m1.841s
user    0m1.828s
sys     0m0.012s
However, if the for loop isn't placed within a function,

for i in xrange(10**8):
    pass
then it runs for a much longer time:

real    0m4.543s
user    0m4.524s
sys     0m0.012s
Why is this?

#1 Floor

Reference: https://stackoom.com/question/lAQt/為什麼Python代碼在函數中運作得更快
#2 Floor
Inside a function, the bytecode is:

  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
At top level, the bytecode is:

  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE
The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local but at toplevel it is a global.

To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the toplevel code I had to use the compile builtin.
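For example, both disassemblies can be produced like this (a minimal sketch; range is used instead of xrange so it also runs on Python 3, where xrange no longer exists, and the exact opcodes shown vary by Python version):

```python
import dis

def main():
    for i in range(10**8):
        pass

# Disassembling the function directly shows STORE_FAST for `i`:
dis.dis(main)

# Top-level code must be compiled first; its disassembly shows
# STORE_NAME for `i` instead:
dis.dis(compile("for i in range(10**8):\n    pass", "<toplevel>", "exec"))
```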
#3 Floor
You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.

Remember that CPython is compiled to bytecode, which the interpreter runs. When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes. This is possible because you can't dynamically add local variables to a function. Then retrieving a local variable is literally a pointer lookup into the list and a refcount increase on the PyObject, which is trivial. Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on.
Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to.
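A minimal sketch of that behaviour (illustrative names; assignment without the global declaration creates a new local instead of rebinding the module-level name):

```python
counter = 0

def bump_local():
    # Assignment makes `counter` a local here (STORE_FAST);
    # the module-level `counter` is untouched.
    counter = 1

def bump_global():
    # The `global` declaration tells the compiler to emit
    # STORE_GLOBAL instead, rebinding the module-level name.
    global counter
    counter = 1

bump_local()
print(counter)  # still 0
bump_global()
print(counter)  # now 1
```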
By the way, global lookups are still pretty optimised. Attribute lookups foo.bar are the really slow ones!

Here is a small illustration of local variable efficiency.
#4 Floor
Aside from local/global variable store times, opcode prediction makes the function faster.

As the other answers explain, the function uses the STORE_FAST opcode in the loop. Here's the bytecode for the function's loop:

    >>   13 FOR_ITER                 6 (to 22)   # get next value from iterator
         16 STORE_FAST               0 (x)       # set local variable
         19 JUMP_ABSOLUTE           13           # back to FOR_ITER
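This pairing can be checked from Python itself; a short sketch (range instead of xrange so it runs on Python 3; the surrounding opcodes vary by version, but the pairing holds):

```python
import dis

def main():
    for x in range(10**8):
        pass

# In the disassembly of the loop, STORE_FAST is the instruction
# immediately after FOR_ITER - the pairing that prediction exploits.
ops = [ins.opname for ins in dis.get_instructions(main)]
after_for_iter = ops[ops.index("FOR_ITER") + 1]
print(after_for_iter)  # STORE_FAST
```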
Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.

In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.

On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation loop, which has obvious implications for the speed at which the loop is executed.

To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):
    Some opcodes tend to come in pairs thus making it possible to predict the second code when the first is run. For example, GET_ITER is often followed by FOR_ITER. And FOR_ITER is often followed by STORE_FAST or UNPACK_SEQUENCE.

    Verifying the prediction costs a single high-speed test of a register variable against a constant. If the pairing was good, then the processor's own internal branch predication has a high likelihood of success, resulting in a nearly zero-overhead transition to the next opcode. A successful prediction saves a trip through the eval-loop including its two unpredictable branches, the HAS_ARG test and the switch-case. Combined with the processor's internal branch prediction, a successful PREDICT has the effect of making the two opcodes run as if they were a single new opcode with the bodies combined.
We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:

case FOR_ITER:                            // the FOR_ITER opcode case
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v);    // x is the next value from iterator
    if (x != NULL) {
        PUSH(x);                          // put x on top of the stack
        PREDICT(STORE_FAST);              // predict STORE_FAST will follow - success!
        PREDICT(UNPACK_SEQUENCE);         // this and everything below is skipped
        continue;
    }
    // error-checking and more code for when the iterator ends normally
The PREDICT function expands to if (*next_instr == op) goto PRED_##op, ie we just jump to the start of the predicted opcode. In this case, we jump here:

PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();                            // pop x back off the stack
    SETLOCAL(oparg, v);                   // set it as the new local variable
    goto fast_next_opcode;
The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.

The Python wiki page has more information about how CPython's virtual machine works.
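To make the mechanism concrete, here is a toy dispatch loop in pure Python (hypothetical opcodes and a single local slot, nothing like CPython's real eval loop) showing what prediction buys: one cheap comparison replaces a full trip back through the dispatcher:

```python
FOR_ITER, STORE_FAST, HALT = range(3)

def run(code, iterator):
    """Toy interpreter: FOR_ITER fetches the next value, STORE_FAST
    stores it in the single 'fastlocals' slot, HALT returns it."""
    slot = None
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == FOR_ITER:
            value = next(iterator)
            # PREDICT(STORE_FAST): peek at the next opcode and, if the
            # guess is right, handle it here instead of re-entering
            # the dispatch loop.
            if code[pc] == STORE_FAST:
                pc += 1
                slot = value
                continue
        elif op == STORE_FAST:
            slot = value
        elif op == HALT:
            return slot

print(run([FOR_ITER, STORE_FAST, FOR_ITER, STORE_FAST, HALT], iter([10, 20])))  # 20
```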