[Show and Tell: A Neural Image Caption Generator](https://arxiv.org/pdf/1411.4555v1.pdf)
Overview

This post introduces the NIC model, which combines a CNN with an LSTM to do, in effect, the "describe the picture" exercise from elementary school. The CNN extracts features from the image, and those features are fed into the LSTM as the input at step t = -1. Each word of the description is converted to a one-hot vector and mapped through an embedding layer to produce the input S_t to the LSTM.
Formulas

Training maximizes the log-likelihood of the correct caption given the image:
$$\theta^{\star}=\arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I ; \theta)$$
By the chain rule, the log-likelihood factorizes over the words of the caption:
$$\log p(S \mid I)=\sum_{t=0}^{N} \log p\left(S_{t} \mid I, S_{0}, \ldots, S_{t-1}\right)$$
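The factorization above just restates the chain rule of probability: the joint probability of a caption is the product of per-word conditionals, so its log is the sum of the per-word log-probabilities. A minimal numeric check, using illustrative conditional probabilities (not values from the paper):

```python
import math

# Hypothetical per-token conditionals p(S_t | I, S_0, ..., S_{t-1})
# for a 4-token caption; the numbers are illustrative only.
cond_probs = [0.9, 0.6, 0.7, 0.8]

# Joint probability of the full caption given the image...
joint = math.prod(cond_probs)

# ...whose log equals the sum of the per-token log-probabilities.
log_likelihood = sum(math.log(p) for p in cond_probs)

assert abs(joint - math.exp(log_likelihood)) < 1e-12
```

Summing logs rather than multiplying probabilities is also what keeps training numerically stable for long captions.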
In the LSTM:
$$h_{t+1}=f\left(h_{t}, x_{t}\right)$$
$$\begin{aligned}
i_{t} &=\sigma\left(W_{ix} x_{t}+W_{im} m_{t-1}\right) \\
f_{t} &=\sigma\left(W_{fx} x_{t}+W_{fm} m_{t-1}\right) \\
o_{t} &=\sigma\left(W_{ox} x_{t}+W_{om} m_{t-1}\right) \\
c_{t} &=f_{t} \odot c_{t-1}+i_{t} \odot h\left(W_{cx} x_{t}+W_{cm} m_{t-1}\right) \qquad (7) \\
m_{t} &=o_{t} \odot c_{t} \\
p_{t+1} &=\operatorname{Softmax}\left(m_{t}\right)
\end{aligned}$$
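Equation (7) can be sketched as a single numpy step. The weight matrices below are random stand-ins for trained parameters, the dimensions are illustrative, and h(.) is taken to be tanh; since the equations put the softmax directly on m_t, the memory size here equals the vocabulary size:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_x, d_m = 8, 10  # input and memory sizes; Softmax(m_t) makes d_m the vocabulary size here

# Random stand-ins for the trained matrices W_ix, W_im, W_fx, W_fm, W_ox, W_om, W_cx, W_cm.
W = {k: rng.normal(scale=0.1, size=(d_m, d_x if k.endswith("x") else d_m))
     for k in ["ix", "im", "fx", "fm", "ox", "om", "cx", "cm"]}

def lstm_step(x_t, m_prev, c_prev):
    """One application of equation (7)."""
    i_t = sigmoid(W["ix"] @ x_t + W["im"] @ m_prev)   # input gate
    f_t = sigmoid(W["fx"] @ x_t + W["fm"] @ m_prev)   # forget gate
    o_t = sigmoid(W["ox"] @ x_t + W["om"] @ m_prev)   # output gate
    c_t = f_t * c_prev + i_t * np.tanh(W["cx"] @ x_t + W["cm"] @ m_prev)
    m_t = o_t * c_t
    p_next = softmax(m_t)                             # distribution over the next word
    return m_t, c_t, p_next

m, c = np.zeros(d_m), np.zeros(d_m)
m, c, p = lstm_step(rng.normal(size=d_x), m, c)
```

The gates i_t, f_t, o_t act as element-wise multipliers that decide how much of the new input enters the cell, how much of the old cell state survives, and how much of the cell state is exposed as the output m_t.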
Inputs and outputs:
$$\begin{aligned}
x_{-1} &=\mathrm{CNN}(I) \\
x_{t} &=W_{e} S_{t}, \quad t \in\{0 \ldots N-1\} \\
p_{t+1} &=\operatorname{LSTM}\left(x_{t}\right), \quad t \in\{0 \ldots N-1\}
\end{aligned}$$
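Putting the pieces together, inference follows these equations: the CNN feature vector enters the recurrent network exactly once at step t = -1, after which the embedded words x_t = W_e S_t drive generation. The sketch below uses a toy tanh recurrence standing in for the full LSTM of equation (7), random stand-ins for trained weights, hypothetical start/end token ids, and greedy decoding (the paper's actual decoder uses beam search):

```python
import numpy as np

rng = np.random.default_rng(1)

feat_dim, vocab = 16, 12                 # illustrative sizes
START, END = 1, 0                        # hypothetical start/end-of-sentence token ids
W_e = rng.normal(scale=0.1, size=(feat_dim, vocab))  # embedding: one-hot S_t -> x_t

def cnn(image):
    # Hypothetical stand-in for the paper's CNN feature extractor.
    return image.reshape(-1, feat_dim).mean(axis=0)

# Toy recurrent cell standing in for the LSTM; random stand-in weights.
W_h = rng.normal(scale=0.1, size=(feat_dim, feat_dim))
W_p = rng.normal(scale=0.1, size=(vocab, feat_dim))

def lstm(x_t, h):
    h = np.tanh(W_h @ h + x_t)
    logits = W_p @ h
    e = np.exp(logits - logits.max())
    return e / e.sum(), h                # next-word distribution, new state

# Step t = -1: the image features enter once, seeding the state.
image = rng.normal(size=(4, 4, feat_dim))
_, h = lstm(cnn(image), np.zeros(feat_dim))   # x_{-1} = CNN(I)

# Greedy decoding: feed back the most likely word until END or a length cap.
caption, word = [], START
for t in range(20):
    s_t = np.zeros(vocab)
    s_t[word] = 1.0                      # one-hot S_t
    p, h = lstm(W_e @ s_t, h)            # x_t = W_e S_t
    word = int(p.argmax())
    if word == END:
        break
    caption.append(word)
```

Feeding the image only at t = -1, rather than at every step, is a deliberate choice in the paper: re-injecting it each step was found to make the network overfit to image noise.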