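Go 1.8 toolchain improvements
This is a progress report on the Go toolchain improvements made during the Go 1.8 development cycle.
It is now November, and only the last few changelists remain to land for Go 1.8. Everything else will move to Go 1.9. Go 1.8 is scheduled for release in February 2017.
For more in this series, see my previous post on Go 1.8 toolchain improvements from September, and my post on the Go 1.7 toolchain improvements.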
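Faster compilation
Since Go 1.5, released in August 2015, compile times have been significantly longer than in Go 1.4. Work to address this slowdown began in Go 1.7 and is still ongoing.
Robert Griesemer and Matthew Dempsky have been working on a faster parser, and along the way removed many of the package-level variables inherited from the previous yacc-based parser. The new parser produces a new abstract syntax tree, but the rest of the compiler still expects the tree produced by the old yacc-based parser. For 1.8, the new parser therefore has to convert its output into the old tree for the rest of the compiler to consume. Even with this extra conversion step, the new parser is no slower than the old one, and the conversion is planned to be removed in Go 1.9.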
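The conversion step described above can be pictured as a tree-to-tree adapter: the new parser builds one kind of node, and an extra pass rebuilds the equivalent tree in the shape the rest of the compiler expects. The sketch below is purely illustrative; the types newExpr, newName, newBinary and oldNode and the function convert are hypothetical and are not the compiler's real internal types.

```go
package main

import "fmt"

// newExpr is a hypothetical node type produced by the "new" parser.
type newExpr interface{ isNewExpr() }

type newName struct{ Value string }

type newBinary struct {
	Op   string
	X, Y newExpr
}

func (newName) isNewExpr()   {}
func (newBinary) isNewExpr() {}

// oldNode is a hypothetical node in the "old" tree that the rest of
// the compiler still consumes.
type oldNode struct {
	Op          string   // "NAME", or an operator such as "+"
	Sym         string   // identifier, for NAME nodes only
	Left, Right *oldNode // operands, for binary nodes only
}

// convert walks the new tree and builds the equivalent old tree:
// the kind of extra pass that costs time on every compile.
func convert(e newExpr) *oldNode {
	switch e := e.(type) {
	case newName:
		return &oldNode{Op: "NAME", Sym: e.Value}
	case newBinary:
		return &oldNode{Op: e.Op, Left: convert(e.X), Right: convert(e.Y)}
	default:
		panic(fmt.Sprintf("unhandled node %T", e))
	}
}

// String renders an old tree in prefix form for inspection.
func (n *oldNode) String() string {
	if n.Op == "NAME" {
		return n.Sym
	}
	return fmt.Sprintf("(%s %s %s)", n.Op, n.Left.String(), n.Right.String())
}

func main() {
	// a + b*c, as the new parser might represent it.
	tree := newBinary{Op: "+", X: newName{"a"}, Y: newBinary{Op: "*", X: newName{"b"}, Y: newName{"c"}}}
	fmt.Println(convert(tree)) // prints (+ a (* b c))
}
```

Once every consumer understands the new tree directly, a pass like convert can simply be deleted, which is what the plan for Go 1.9 amounts to.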
[Figure: Compile time for full build relative to Go 1.4.3]
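The goal for Go 1.8 is to cut compile time by 15% on average relative to Go 1.7. Compared with the 3-5% improvement reported two months ago, there is still work to be done.
Note: the benchmark scripts for jujud, kube-controller-manager and gogs are available online. Please try them yourself and report your findings.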
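Code generation improvements
The headline feature of the Go 1.7 cycle was the new SSA backend for 64-bit Intel. In Go 1.8, the SSA backend has been extended to all of the other architectures Go supports, and the old backend code has been deleted.
As the most popular production architecture, amd64 has always been the first to receive new features. As I reported a few months ago, comparing Go 1.8 with Go 1.7 on Intel shows improvements from code generation, escape analysis and optimizations to the standard library. Most of the benchmarks below improve by roughly 2 to 11%, although a few, such as Fannkuch11 and Gzip, regress by a few percent: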
```
name                     old time/op     new time/op     delta
BinaryTree17-4           3.04s ± 1%      3.03s ± 0%      ~        (p=0.222 n=5+5)
Fannkuch11-4             3.27s ± 0%      3.39s ± 1%      +3.74%   (p=0.008 n=5+5)
FmtFprintfEmpty-4        60.0ns ± 3%     58.3ns ± 1%     -2.70%   (p=0.008 n=5+5)
FmtFprintfString-4       177ns ± 2%      164ns ± 2%      -7.47%   (p=0.008 n=5+5)
FmtFprintfInt-4          169ns ± 2%      157ns ± 1%      -7.22%   (p=0.008 n=5+5)
FmtFprintfIntInt-4       264ns ± 1%      243ns ± 1%      -8.10%   (p=0.008 n=5+5)
FmtFprintfPrefixedInt-4  254ns ± 2%      244ns ± 1%      -4.02%   (p=0.008 n=5+5)
FmtFprintfFloat-4        357ns ± 1%      348ns ± 2%      -2.35%   (p=0.032 n=5+5)
FmtManyArgs-4            1.10µs ± 1%     0.97µs ± 1%     -11.03%  (p=0.008 n=5+5)
GobDecode-4              9.85ms ± 1%     9.31ms ± 1%     -5.51%   (p=0.008 n=5+5)
GobEncode-4              8.75ms ± 1%     8.17ms ± 1%     -6.67%   (p=0.008 n=5+5)
Gzip-4                   282ms ± 0%      289ms ± 1%      +2.32%   (p=0.008 n=5+5)
Gunzip-4                 50.9ms ± 1%     51.7ms ± 0%     +1.67%   (p=0.008 n=5+5)
HTTPClientServer-4       195µs ± 1%      196µs ± 1%      ~        (p=0.095 n=5+5)
JSONEncode-4             21.6ms ± 6%     19.8ms ± 3%     -8.37%   (p=0.008 n=5+5)
JSONDecode-4             70.2ms ± 3%     71.0ms ± 1%     ~        (p=0.310 n=5+5)
Mandelbrot200-4          5.20ms ± 0%     4.73ms ± 1%     -9.05%   (p=0.008 n=5+5)
GoParse-4                4.38ms ± 3%     4.28ms ± 2%     ~        (p=0.056 n=5+5)
RegexpMatchEasy0_32-4    96.7ns ± 2%     98.1ns ± 0%     ~        (p=0.127 n=5+5)
RegexpMatchEasy0_1K-4    311ns ± 1%      313ns ± 0%      ~        (p=0.214 n=5+5)
RegexpMatchEasy1_32-4    97.9ns ± 2%     89.8ns ± 2%     -8.33%   (p=0.008 n=5+5)
RegexpMatchEasy1_1K-4    519ns ± 0%      510ns ± 2%      -1.70%   (p=0.040 n=5+5)
RegexpMatchMedium_32-4   158ns ± 2%      146ns ± 0%      -7.71%   (p=0.016 n=5+4)
RegexpMatchMedium_1K-4   46.3µs ± 1%     47.8µs ± 2%     +3.12%   (p=0.008 n=5+5)
RegexpMatchHard_32-4     2.53µs ± 3%     2.46µs ± 0%     -2.91%   (p=0.008 n=5+5)
RegexpMatchHard_1K-4     76.1µs ± 0%     74.5µs ± 2%     -2.12%   (p=0.008 n=5+5)
Revcomp-4                563ms ± 2%      531ms ± 1%      -5.78%   (p=0.008 n=5+5)
Template-4               86.7ms ± 1%     82.2ms ± 1%     -5.16%   (p=0.008 n=5+5)
TimeParse-4              433ns ± 3%      399ns ± 4%      -7.90%   (p=0.008 n=5+5)
TimeFormat-4             467ns ± 2%      430ns ± 1%      -7.76%   (p=0.008 n=5+5)

name                     old speed       new speed       delta
GobDecode-4              77.9MB/s ± 1%   82.5MB/s ± 1%   +5.84%   (p=0.008 n=5+5)
GobEncode-4              87.7MB/s ± 1%   94.0MB/s ± 1%   +7.15%   (p=0.008 n=5+5)
Gzip-4                   68.8MB/s ± 0%   67.2MB/s ± 1%   -2.27%   (p=0.008 n=5+5)
Gunzip-4                 381MB/s ± 1%    375MB/s ± 0%    -1.65%   (p=0.008 n=5+5)
JSONEncode-4             89.9MB/s ± 5%   98.1MB/s ± 3%   +9.11%   (p=0.008 n=5+5)
JSONDecode-4             27.6MB/s ± 3%   27.3MB/s ± 1%   ~        (p=0.310 n=5+5)
GoParse-4                13.2MB/s ± 3%   13.5MB/s ± 2%   ~        (p=0.056 n=5+5)
RegexpMatchEasy0_32-4    331MB/s ± 2%    326MB/s ± 0%    ~        (p=0.151 n=5+5)
RegexpMatchEasy0_1K-4    3.29GB/s ± 1%   3.27GB/s ± 0%   ~        (p=0.222 n=5+5)
RegexpMatchEasy1_32-4    327MB/s ± 2%    357MB/s ± 2%    +9.20%   (p=0.008 n=5+5)
RegexpMatchEasy1_1K-4    1.97GB/s ± 0%   2.01GB/s ± 2%   +1.76%   (p=0.032 n=5+5)
RegexpMatchMedium_32-4   6.31MB/s ± 2%   6.83MB/s ± 1%   +8.31%   (p=0.008 n=5+5)
RegexpMatchMedium_1K-4   22.1MB/s ± 1%   21.4MB/s ± 2%   -3.01%   (p=0.008 n=5+5)
RegexpMatchHard_32-4     12.6MB/s ± 3%   13.0MB/s ± 0%   +2.98%   (p=0.008 n=5+5)
RegexpMatchHard_1K-4     13.4MB/s ± 0%   13.7MB/s ± 2%   +2.19%   (p=0.008 n=5+5)
Revcomp-4                451MB/s ± 2%    479MB/s ± 1%    +6.12%   (p=0.008 n=5+5)
Template-4               22.4MB/s ± 1%   23.6MB/s ± 1%   +5.43%   (p=0.008 n=5+5)
```
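Escape analysis, mentioned above, is the compiler pass that decides whether a value can stay on the stack or has to be moved to the heap. Fewer heap allocations generally mean less work for the garbage collector. The sketch below shows the two basic cases, using illustrative function names. Building it with go build -gcflags=-m asks the compiler to print its escape decisions. The exact wording of those messages differs between Go releases, so none are reproduced here.

```go
package main

import "fmt"

type point struct{ X, Y int }

// sumLocal builds a point that never leaves the function, so escape
// analysis can keep it on the stack.
func sumLocal(x, y int) int {
	p := point{x, y}
	return p.X + p.Y
}

// newPoint returns the address of its local variable. Compiled as a
// standalone function, p outlives the call and must be heap allocated
// ("escapes"). If the call is inlined, the caller may avoid that.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	fmt.Println(sumLocal(1, 2))  // prints 3
	fmt.Println(*newPoint(3, 4)) // prints {3 4}
}
```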
The big improvements from the switch to the SSA backend show up on the non-Intel architectures. Here are the results for arm64:
```
name                     old time/op     new time/op     delta
BinaryTree17-8           10.6s ± 0%      8.1s ± 1%       -23.62%  (p=0.016 n=4+5)
Fannkuch11-8             9.19s ± 0%      5.95s ± 0%      -35.27%  (p=0.008 n=5+5)
FmtFprintfEmpty-8        136ns ± 0%      118ns ± 1%      -13.53%  (p=0.008 n=5+5)
FmtFprintfString-8       472ns ± 1%      331ns ± 1%      -29.82%  (p=0.008 n=5+5)
FmtFprintfInt-8          388ns ± 3%      273ns ± 0%      -29.61%  (p=0.008 n=5+5)
FmtFprintfIntInt-8       640ns ± 2%      438ns ± 0%      -31.61%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8  580ns ± 0%      423ns ± 0%      -27.09%  (p=0.008 n=5+5)
FmtFprintfFloat-8        823ns ± 0%      613ns ± 1%      -25.57%  (p=0.008 n=5+5)
FmtManyArgs-8            2.69µs ± 0%     1.96µs ± 0%     -27.12%  (p=0.016 n=4+5)
GobDecode-8              24.4ms ± 0%     17.3ms ± 0%     -28.88%  (p=0.008 n=5+5)
GobEncode-8              18.6ms ± 0%     15.1ms ± 1%     -18.65%  (p=0.008 n=5+5)
Gzip-8                   1.20s ± 0%      0.74s ± 0%      -38.02%  (p=0.008 n=5+5)
Gunzip-8                 190ms ± 0%      130ms ± 0%      -31.73%  (p=0.008 n=5+5)
HTTPClientServer-8       205µs ± 1%      166µs ± 2%      -19.27%  (p=0.008 n=5+5)
JSONEncode-8             50.7ms ± 0%     41.5ms ± 0%     -18.10%  (p=0.008 n=5+5)
JSONDecode-8             201ms ± 0%      155ms ± 1%      -22.93%  (p=0.008 n=5+5)
Mandelbrot200-8          13.0ms ± 0%     10.1ms ± 0%     -22.78%  (p=0.008 n=5+5)
GoParse-8                11.4ms ± 0%     8.5ms ± 0%      -24.80%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8    271ns ± 0%      225ns ± 0%      -16.97%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K-8    1.69µs ± 0%     1.92µs ± 0%     +13.42%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-8    292ns ± 0%      255ns ± 0%      -12.60%  (p=0.000 n=4+5)
RegexpMatchEasy1_1K-8    2.20µs ± 0%     2.38µs ± 0%     +8.38%   (p=0.008 n=5+5)
RegexpMatchMedium_32-8   411ns ± 0%      360ns ± 0%      -12.41%  (p=0.000 n=5+4)
RegexpMatchMedium_1K-8   118µs ± 0%      104µs ± 0%      -12.07%  (p=0.008 n=5+5)
RegexpMatchHard_32-8     6.83µs ± 0%     5.79µs ± 0%     -15.27%  (p=0.016 n=4+5)
RegexpMatchHard_1K-8     205µs ± 0%      176µs ± 0%      -14.19%  (p=0.008 n=5+5)
Revcomp-8                2.01s ± 0%      1.43s ± 0%      -29.02%  (p=0.008 n=5+5)
Template-8               259ms ± 0%      158ms ± 0%      -38.93%  (p=0.008 n=5+5)
TimeParse-8              874ns ± 1%      733ns ± 1%      -16.16%  (p=0.008 n=5+5)
TimeFormat-8             1.00µs ± 1%     0.86µs ± 1%     -13.88%  (p=0.008 n=5+5)

name                     old speed       new speed       delta
GobDecode-8              31.5MB/s ± 0%   44.3MB/s ± 0%   +40.61%  (p=0.008 n=5+5)
GobEncode-8              41.3MB/s ± 0%   50.7MB/s ± 1%   +22.92%  (p=0.008 n=5+5)
Gzip-8                   16.2MB/s ± 0%   26.1MB/s ± 0%   +61.33%  (p=0.008 n=5+5)
Gunzip-8                 102MB/s ± 0%    150MB/s ± 0%    +46.45%  (p=0.016 n=4+5)
JSONEncode-8             38.3MB/s ± 0%   46.7MB/s ± 0%   +22.10%  (p=0.008 n=5+5)
JSONDecode-8             9.64MB/s ± 0%   12.49MB/s ± 0%  +29.54%  (p=0.016 n=5+4)
GoParse-8                5.09MB/s ± 0%   6.78MB/s ± 0%   +33.02%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8    118MB/s ± 0%    142MB/s ± 0%    +20.29%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K-8    605MB/s ± 0%    534MB/s ± 0%    -11.85%  (p=0.016 n=5+4)
RegexpMatchEasy1_32-8    110MB/s ± 0%    125MB/s ± 0%    +14.23%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-8    465MB/s ± 0%    430MB/s ± 0%    -7.72%   (p=0.008 n=5+5)
RegexpMatchMedium_32-8   2.43MB/s ± 0%   2.77MB/s ± 0%   +13.99%  (p=0.016 n=5+4)
RegexpMatchMedium_1K-8   8.68MB/s ± 0%   9.87MB/s ± 0%   +13.71%  (p=0.008 n=5+5)
RegexpMatchHard_32-8     4.68MB/s ± 0%   5.53MB/s ± 0%   +18.08%  (p=0.016 n=4+5)
RegexpMatchHard_1K-8     5.00MB/s ± 0%   5.83MB/s ± 0%   +16.60%  (p=0.008 n=5+5)
Revcomp-8                126MB/s ± 0%    178MB/s ± 0%    +40.88%  (p=0.008 n=5+5)
Template-8               7.48MB/s ± 0%   12.25MB/s ± 0%  +63.74%  (p=0.008 n=5+5)
```
These are substantial improvements, just from recompiling your code.
Defer and cgo improvements
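Two details help when reading the benchstat tables above. First, benchstat's default significance test is the Mann-Whitney U-test, and a delta is shown as ~ when the p-value is above its default 0.05 threshold. With only four or five runs per side, the smallest two-sided p-value an exact U-test can produce, reached when every run of one toolchain beats every run of the other, is 2/C(n1+n2, n1). That is why the same values recur: 0.008 for n=5+5, 0.016 for n=4+5 and 0.029 for n=4+4. The rows showing p=0.000 at these sample sizes fall outside this simple calculation, which assumes no tied measurements. Second, the speed rows are the reciprocal of the time rows, so a fractional time change d corresponds to a throughput change of 1/(1+d) - 1. Gzip-8's -38.02% time becomes about +61.3% throughput, in line with the table's +61.33%. The helper names in this sketch are illustrative.

```go
package main

import "fmt"

// binomial returns C(n, k) as a float64.
func binomial(n, k int) float64 {
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// minUTestP is the smallest two-sided p-value an exact Mann-Whitney
// U-test can return for untied samples of sizes n1 and n2. Complete
// separation of the samples has probability 1/C(n1+n2, n1) under the
// null hypothesis; a two-sided test counts both directions.
func minUTestP(n1, n2 int) float64 {
	return 2 / binomial(n1+n2, n1)
}

// speedDelta converts a fractional change in time per operation into
// the matching fractional change in throughput (bytes per second).
func speedDelta(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	for _, n := range [][2]int{{5, 5}, {4, 5}, {4, 4}} {
		fmt.Printf("n=%d+%d  min p=%.3f\n", n[0], n[1], minUTestP(n[0], n[1]))
	}
	// Gzip-8 on arm64: time/op fell by 38.02%.
	fmt.Printf("speed: %+.1f%%\n", speedDelta(-0.3802)*100)
}
```

Running it prints min p values of 0.008, 0.016 and 0.029, and a speed change of +61.3%. This is a reminder that, with so few runs, a p-value at that floor already means complete separation between the two toolchains.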
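Whether defer is cheap enough for hot code paths is still an open question, but in Go 1.8 Austin reduced the overhead of defer by about half, according to some benchmarks.
The runtime package's own benchmarks are less rosy, showing a reduction of about a third: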
```
name          old time/op    new time/op    delta
Defer-4       101ns ± 1%     66ns ± 0%      -34.73%  (p=0.000 n=20+20)
Defer10-4     93.2ns ± 1%    62.5ns ± 8%    -33.02%  (p=0.000 n=20+20)
DeferMany-4   148ns ± 3%     131ns ± 3%     -11.42%  (p=0.000 n=19+19)
```
In the most common case, where the deferred statement closes over at most one variable, the cost of defer falls by about a third.
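To gauge the cost of defer in your own code, compare a function that uses it with one that does not. This is a rough, hypothetical sketch: withDefer, withoutDefer and nsPerOp are illustrative names, not the runtime's benchmarks. For numbers you can rely on, write the same comparison as a testing.B benchmark, run it with go test -bench and -count, and compare toolchains with benchstat. Results depend heavily on the Go version and hardware, so none are shown, and later releases reduced the cost of defer further.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	mu      sync.Mutex
	counter int
)

// withDefer increments counter under mu and releases the lock with a
// deferred call, a typical use of defer.
func withDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// withoutDefer does the same work but unlocks explicitly.
func withoutDefer() {
	mu.Lock()
	counter++
	mu.Unlock()
}

// nsPerOp calls fn n times and returns the average wall-clock cost of
// one call in nanoseconds. It is a crude timer: unlike go test -bench,
// it does no warm-up and does not choose the iteration count itself.
func nsPerOp(fn func(), n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		fn()
	}
	return float64(time.Since(start).Nanoseconds()) / float64(n)
}

func main() {
	const n = 1000000
	fmt.Printf("with defer:    %.1f ns/op\n", nsPerOp(withDefer, n))
	fmt.Printf("without defer: %.1f ns/op\n", nsPerOp(withoutDefer, n))
}
```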
In addition, an optimization by David Crawshaw cut the overhead of the cgo call path by nearly half:
```
name          old time/op    new time/op    delta
CgoNoop-8     93.5ns ± 0%    51.1ns ± 1%    -45.34%  (p=0.016 n=4+5)
```
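Go 1.8 also adds support for 32-bit MIPS on Linux, in both big-endian (mips) and little-endian (mipsle) variants. Cross-compiling for it works like any other target:

```
% env GOARCH=mips go build -o godoc.mips golang.org/x/tools/cmd/godoc
% file godoc.mips
godoc.mips: ELF 32-bit MSB executable, MIPS, MIPS32 version 1 (SYSV), statically linked, not stripped
```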