Investigating GPU Memory Allocation During PyTorch Training

Code:

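import torch
from torch import cuda
import time

# 1 x 1024 x 1024 x 256 float32 values = 1024 MB
x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0')

# memory_allocated() returns bytes; divide by 1024**2 to print MB
print("1", cuda.memory_allocated()/1024**2)

y = 5 * x  # non-leaf tensor, another 1024 MB
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)


torch.mean(y).backward()  # fills x.grad
print("3", cuda.memory_allocated()/1024**2)
print(cuda.memory_summary())


# keep the process alive so GPU usage can be inspected from outside
time.sleep(60)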
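
As a quick sanity check on the numbers in this post, the size of x can be worked out by hand. The sketch below assumes the default dtype float32 (4 bytes per element), which the script above does not change; the variable names are only illustrative.

```python
# Shape taken from the script above; float32 is assumed (torch.zeros' default dtype).
shape = [1, 1024, 1024, 128 * 2]
bytes_per_element = 4  # float32

num_elements = 1
for dim in shape:
    num_elements *= dim

# Same conversion the script uses: bytes / 1024**2 gives MB.
size_mb = num_elements * bytes_per_element / 1024 ** 2
print(num_elements, size_mb)
```

Every tensor with this shape (x, y, x.grad, y.grad) therefore takes 1024 MB, which is the unit that all later figures are multiples of.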

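The output shows that PyTorch occupies 4777 MB of GPU memory in total, of which tensors and cache together account for 4096 MB. Of that 4096 MB, 1024 MB is cache, which can be released manually. The modified code: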

import torch 
from torch import cuda 
import time
 
x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0') 

print("1", cuda.memory_allocated()/1024**2)  

y = 5 * x 
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)  


torch.mean(y).backward()     
print("3", cuda.memory_allocated()/1024**2)    


torch.cuda.empty_cache()
print(cuda.memory_summary())



time.sleep(60)

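According to the reference article, 3 × 1024 MB is tensor memory and the remaining roughly 700 MB is other memory. Within the tensor memory, 1024 MB belongs to x.grad. The peak allocation while the program runs is 4096 MB, as the figure below shows: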

This peak includes 1024 MB each for x.grad and y.grad.
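
The figures quoted above can be reconciled with a little arithmetic. This is only a bookkeeping sketch using the numbers reported in the text; the variable names are made up for illustration.

```python
TENSOR_MB = 1024  # size of x, and of every tensor with the same shape

# Figures reported in the text.
total_mb = 4777               # GPU memory used by the whole process
tensors_plus_cache_mb = 4096  # tensors and cache together
cache_mb = 1024               # the part that can be released manually

live_tensors_mb = tensors_plus_cache_mb - cache_mb  # x, y and x.grad
other_mb = total_mb - tensors_plus_cache_mb         # the "roughly 700 MB" of other memory

# Peak during backward: x, y, x.grad plus the temporary y.grad.
peak_mb = 4 * TENSOR_MB

print(live_tensors_mb, other_mb, peak_mb)
```

The 1024 MB of cache matches the size of one tensor, which is consistent with y.grad being allocated during backward and then returned to PyTorch's cache.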

If we keep the gradient of a non-leaf node, that is, retain y.grad, and run:

import torch 
from torch import cuda 
import time
 
x = torch.zeros([1,1024,1024,128*2], requires_grad=True, device='cuda:0') 

print("1", cuda.memory_allocated()/1024**2)  

y = 5 * x 
y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)  


torch.mean(y).backward()     
print("3", cuda.memory_allocated()/1024**2)    


torch.cuda.empty_cache()
print(cuda.memory_summary())



time.sleep(60)

The GPU ran out of memory. In other words, once y.grad is retained, total GPU memory use comes close to 5.9 GB. So I ran the same code on a Titan:

Total GPU memory observed:

Run output:

================================================

The analysis diagram from the reference article:

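One personal takeaway: when several people share a GPU, manually releasing GPU memory with

torch.cuda.empty_cache()

is very unwise. The call hands back to the driver the cached memory that intermediate gradients had been using. If another process then requests GPU memory, it may take over that space, and if the original program computes gradients again when the GPU no longer has enough free memory, it will fail with an out-of-memory error.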
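
To make the point concrete, here is a minimal sketch of what empty_cache() does to the allocator's counters. The function name empty_cache_effect and the 16 MB tensor size are assumptions chosen for illustration; the code returns None on a machine without PyTorch or without a CUDA device.

```python
try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None

MB = 1024 ** 2


def empty_cache_effect(num_floats=4 * 1024 * 1024):
    """Return (allocated, reserved) in MB before and after empty_cache(), or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    x = torch.ones(num_floats, device="cuda:0")  # 16 MB live tensor
    tmp = 5 * x                                  # 16 MB temporary
    del tmp  # the block returns to PyTorch's cache, not to the driver
    before = (torch.cuda.memory_allocated() / MB, torch.cuda.memory_reserved() / MB)
    torch.cuda.empty_cache()  # releases only cached blocks that no tensor is using
    after = (torch.cuda.memory_allocated() / MB, torch.cuda.memory_reserved() / MB)
    return {"before": before, "after": after}


print(empty_cache_effect())
```

Live tensors are untouched, so memory_allocated() stays the same; only memory_reserved() can shrink. The space given up is exactly what the next backward pass would have reused, which is why the call is risky on a shared card.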

======================================

Still, I have some doubts about the original article's analysis. The code I ran on the Titan, and its results:

import torch 
from torch import cuda 
import time
 
x = torch.ones([1,1024,1024,128*2], requires_grad=True, device='cuda:0') 

print("1", cuda.memory_allocated()/1024**2)  

y = 5 * x 
y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)  


print(cuda.memory_summary())
#time.sleep(30)


torch.mean(y).backward()     
print("3", cuda.memory_allocated()/1024**2)    


# torch.cuda.empty_cache()
print(cuda.memory_summary())


print(y.grad)
print("."*100)
print(x.grad)
time.sleep(60)
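
The values printed for y.grad and x.grad by the script above can be predicted by hand, which helps confirm that both gradients really are stored. Writing L for the scalar passed to backward() and N for the number of elements (symbols introduced here, not in the original post):

```latex
\begin{align*}
N &= 1 \cdot 1024 \cdot 1024 \cdot 256 = 2^{28} = 268435456, \\
y_i &= 5\,x_i, \qquad L = \frac{1}{N}\sum_{i=1}^{N} y_i, \\
\frac{\partial L}{\partial y_i} &= \frac{1}{N} \approx 3.7253 \times 10^{-9}, \\
\frac{\partial L}{\partial x_i} &= \frac{\partial L}{\partial y_i}\cdot\frac{\partial y_i}{\partial x_i} = \frac{5}{N} \approx 1.8626 \times 10^{-8}.
\end{align*}
```

So every element of y.grad should be about 3.7253e-09 and every element of x.grad about 1.8626e-08.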

=================================================

Code:

import torch 
from torch import cuda 
import time
 
x = torch.ones([1,1024,1024,128*2], requires_grad=True, device='cuda:0') 

print("1", cuda.memory_allocated()/1024**2)  

y = 5 * x 
# y.retain_grad()
print("2", cuda.memory_allocated()/1024**2)  


print(cuda.memory_summary())
#time.sleep(30)


torch.mean(y).backward()     
print("3", cuda.memory_allocated()/1024**2)    


# torch.cuda.empty_cache()
print(cuda.memory_summary())


print(y.grad)  # None here: y is a non-leaf tensor and retain_grad() was not called
print("."*100)
print(x.grad)
time.sleep(60)

On the Titan, when y.grad is retained, GPU memory ends up holding five 1024 MB blocks. x, y, x.grad and y.grad each take one 1024 MB block, so what is the extra 1024 MB? Let us call this mysterious cache X.

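So when y.grad is not retained, as in the original article, could x.grad have been computed in place on top of y.grad's buffer? And could the cache released at the end be the mysterious X rather than y.grad's space? I think both are possible.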

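Personally I still lean toward the view that y.grad's space is overwritten by x.grad. As for the extra 1024 MB of X, all I can say is that it may be cache produced by implicit operations PyTorch performs while computing gradients during backpropagation.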
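
One way to probe where X comes from is to compare what is live after backward() with the peak reached during it and with what the allocator still holds. The following is a hedged sketch, not a definitive diagnosis: the function name backward_memory_profile and the small 16 MB shape are assumptions for illustration, and the code returns None without PyTorch or a CUDA device.

```python
try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None

MB = 1024 ** 2


def backward_memory_profile(shape=(1, 1024, 1024, 4), retain_y_grad=True):
    """Return allocator statistics (in MB) around one backward pass, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    x = torch.ones(shape, requires_grad=True, device="cuda:0")  # 16 MB
    y = 5 * x                                                   # 16 MB
    if retain_y_grad:
        y.retain_grad()
    torch.cuda.reset_peak_memory_stats()  # measure the peak from this point on
    before = torch.cuda.memory_allocated()
    torch.mean(y).backward()
    torch.cuda.synchronize()
    return {
        "allocated_before_mb": before / MB,
        "peak_during_backward_mb": torch.cuda.max_memory_allocated() / MB,
        "allocated_after_mb": torch.cuda.memory_allocated() / MB,
        "reserved_after_mb": torch.cuda.memory_reserved() / MB,
    }


print(backward_memory_profile(retain_y_grad=True))
print(backward_memory_profile(retain_y_grad=False))
```

If reserved_after_mb exceeds allocated_after_mb by about one tensor size, the extra block is cache left behind by a temporary created during backward rather than a fifth live tensor. Comparing the two calls shows how much of the peak is due to retaining y.grad.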
