
My Experience Submitting Code to the OGB Leaderboards

Models covered: ogbn_arxiv_GCN_res, GCN_res-FLAG, PyG-GCN_res-CS, GAT w/NS + C&S, GraphSAGE w/NS + C&S + node2vec, GraphSAINT + metapath2vec, GAT + labels + node2vec

Our paper: https://arxiv.org/pdf/2105.08330.pdf.

Our code: https://github.com/ytchx1999/PyG-OGB-Tricks/tree/main

ogbn_arxiv_GCN_res

This is an improvement over the baseline on the ogbn-arxiv dataset.

My code: https://github.com/ytchx1999/ogbn_arxiv_GCN_res

ogbn-arxiv

The framework is shown in the figure.


Improvement Strategy:

  • add skip-connection inspired by DeeperGCN

    Adding a proportion of the previous layer's output to the current layer's output both speeds up convergence and alleviates over-smoothing.

  • add initial residual connection inspired by GCNII

    Borrowing the initial residual idea from GCNII, a proportion of X^(0) is added to the output of later layers, which alleviates over-smoothing and also gives a small accuracy gain.

  • add jumping knowledge inspired by JKNet

    As in JKNet, the output of every layer is stored; the stored outputs are then combined by a softmax-weighted sum to produce the final node representations, which effectively alleviates over-smoothing.
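
Put together, the three tricks look roughly like this (a minimal NumPy sketch of the forward pass, not the actual PyG model; `gcn_res_forward` and `layer_fns` are illustrative stand-ins, and alpha/beta match the hyperparameters listed below):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def gcn_res_forward(x0, layer_fns, alpha=0.2, beta=0.5):
    """Sketch of the three tricks combined (hypothetical helper, not the repo code).

    layer_fns: one callable per GCN layer (propagation + linear transform).
    beta scales the skip-connection; alpha scales the initial residual X^(0).
    """
    h = x0
    layer_outs = []
    for layer in layer_fns:
        h_new = layer(h)
        h_new = h_new + beta * h      # skip-connection (DeeperGCN-style)
        h_new = h_new + alpha * x0    # initial residual (GCNII-style)
        h = h_new
        layer_outs.append(h)
    # jumping knowledge (JKNet-style): softmax-weighted sum over all layers;
    # the weights are learnable in the real model, uniform in this sketch
    w = softmax(np.zeros(len(layer_outs)))
    return sum(wi * oi for wi, oi in zip(w, layer_outs))
```

With identity layers and alpha=0.2, beta=0.5, each layer amplifies the features and the JK step averages the per-layer outputs.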

Experiment Setup:

The model has 8 layers and is trained for 500 epochs.

python ogbn_gcn_res.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.5
           

Result:

All runs:
Highest Train: 77.94 ± 0.50
Highest Valid: 73.69 ± 0.21
  Final Train: 77.72 ± 0.46
   Final Test: 72.62 ± 0.37
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res | 0.7262 ± 0.0037 | 0.7369 ± 0.0021 | 155824 | Tesla T4 (16GB)

I've submitted my entry to the OGB leaderboard; the OGB team is validating my model now. Let's see whether it makes it.

Update (2021.2.22)

The OGB team accepted my code, and it currently sits at 19th place. So happy!

https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-arxiv

我向OGB排行榜送出代碼的經曆ogbn_arxiv_GCN_resGCN_res-FLAGPyG-GCN_res-CSGAT w/NS + C&SGraphSAGE w/NS + C&S + node2vecGraphSAINT + metapath2vecGAT + labels + node2vec

It now beats baselines such as GCN and GraphSAGE, so I'd call the initial goal of my thesis project achieved.

Feedback and corrections are very welcome!

Update (2021.2.25)

I later noticed that my model's standard deviation was a bit large, so I adopted the FLAG method for adversarial data augmentation to stabilize the model and gain a small accuracy boost.

GCN_res-FLAG

This is an improvement of the (GCN_res + 8 layers) model, using the FLAG method.

My code: https://github.com/ytchx1999/GCN_res-FLAG

ogbn-arxiv

  • Check out the model: (GCN_res + 8 layers)
  • Check out the FLAG method: FLAG

Improvement Strategy:

  • add FLAG method
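
FLAG augments training by perturbing the input node features adversarially instead of feeding them in clean. A simplified PyTorch sketch of one training step (`train_with_flag` is a hypothetical helper using sign-based ascent; the official FLAG implementation differs in details such as gradient normalization):

```python
import torch

def train_with_flag(model, x, y, loss_fn, optimizer, step_size=1e-3, m=3):
    """One training step with FLAG-style adversarial augmentation (sketch).

    A small input perturbation is grown by m gradient-ascent steps while the
    model's own gradients accumulate across the same m forward/backward passes.
    """
    model.train()
    optimizer.zero_grad()
    perturb = torch.empty_like(x).uniform_(-step_size, step_size)
    perturb.requires_grad_(True)
    loss = loss_fn(model(x + perturb), y) / m
    for _ in range(m - 1):
        loss.backward()
        # ascend on the perturbation (sign of its gradient), then recompute loss
        perturb.data.add_(step_size * torch.sign(perturb.grad))
        perturb.grad.zero_()
        loss = loss_fn(model(x + perturb), y) / m
    loss.backward()
    optimizer.step()
    return float(loss)
```

The key point is that the model sees m slightly different adversarial views of the same batch per optimizer step, which acts as a regularizer and reduces run-to-run variance.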

Environmental Requirements

  • pytorch == 1.7.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.2.4

Experiment Setup:

The model has 8 layers and is trained for 10 runs of 500 epochs each.

python ogbn_gcn_res_flag.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.7
           

Result:

All runs:
Highest Train: 78.61 ± 0.49
Highest Valid: 73.89 ± 0.12
  Final Train: 78.44 ± 0.46
   Final Test: 72.76 ± 0.24
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res + FLAG | 0.7276 ± 0.0024 | 0.7389 ± 0.0012 | 155824 | Tesla T4 (16GB)

As the numbers show, the accuracy improved slightly and the standard deviation came down as well, which is exactly what I was hoping for.


After I submitted the code to the OGB leaderboard, it took about a day for the team to accept it. The model now beats GCNII and sits at 18th place.

https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-arxiv


Update (2021.3.25)

将CorrectAndSmooth(C&S)方法應用到了我的模型GCN_res當中,經過了一番機械調參後,模型取得了不錯的效果!

My code: https://github.com/ytchx1999/PyG-GCN_res-CS

PyG-GCN_res-CS

This is an improvement of the (GCN_res + 8 layers) model, using the C&S method.

ogbn-arxiv

  • Check out the model: (GCN_res + 8 layers)
  • Check out the C&S method: C&S

Improvement Strategy:

  • add C&S method
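
C&S is pure post-processing on top of the trained model's soft predictions: a "correct" pass propagates the residual error seen on training nodes, then a "smooth" pass propagates labels. A dense NumPy toy version (the real runs use sparse propagation; `correct_and_smooth` here is an illustrative sketch, with the 'DAD' symmetric normalization named in the hyperparameters):

```python
import numpy as np

def sym_norm(A):
    """D^-1/2 A D^-1/2 -- the 'DAD' normalization used for both passes."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def correct_and_smooth(A, soft, y_onehot, train_idx,
                       n_corr=50, alpha_c=0.8,
                       n_smooth=50, alpha_s=0.8, scale=1.0):
    """Dense toy version of C&S post-processing.

    soft: base-model soft predictions (N, C); y_onehot: one-hot labels (N, C).
    """
    S = sym_norm(A)
    # correct step: propagate the residual error observed on the training nodes
    e0 = np.zeros_like(soft)
    e0[train_idx] = y_onehot[train_idx] - soft[train_idx]
    e = e0.copy()
    for _ in range(n_corr):
        e = alpha_c * (S @ e) + (1 - alpha_c) * e0
    z = soft + scale * e
    # smooth step: clamp training nodes to their true labels, then propagate
    z[train_idx] = y_onehot[train_idx]
    z0 = z.copy()
    for _ in range(n_smooth):
        z = alpha_s * (S @ z) + (1 - alpha_s) * z0
    return z
```

Because it never touches the base model's weights, C&S adds zero parameters, which is why the parameter count in the table below stays unchanged.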

Environmental Requirements

  • pytorch == 1.7.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.2.4

Experiment Setup:

The model has 8 layers and is trained for 10 runs of 500 epochs each.

python gcn_res_cs.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.7
num_correction_layers = 50
correction_alpha = 0.8
num_smoothing_layers = 50
smoothing_alpha = 0.8
scale = 1.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 95.40 ± 0.02
Highest Valid: 74.23 ± 0.14
  Final Train: 95.40 ± 0.02
   Final Test: 72.97 ± 0.22
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res + C&S | 0.7297 ± 0.0022 | 0.7423 ± 0.0014 | 155824 | Tesla T4 (16GB)

The code was accepted in under an hour this time, seriously fast! With that, my ranking moved up a few more places: GCN_res + C&S currently sits at 16th (19 -> 16).

A little progress every time.


Update (2021.3.27)

Yesterday I set up the environment on the institute's server, then used a Tesla V100 to mechanically tune the C&S part of GCN_res + C&S (free compute is a joy). That produced a better-performing v2 (GCN_res + C&S_v2), which I then submitted.

Code: https://github.com/ytchx1999/GCN_res-CS-v2


Another jump in the ranking: 13th so far, which is the best performance achieved with a GCN backbone on this board! Further improvements will probably have to target attention-based models such as GAT or UniMP.

My thesis focuses on improving the GCN backbone, and the paper is written around that, so as far as the thesis goes this is probably as far as it gets.


The thesis work is basically done, so time to start a new side quest: next I'll submit code for the ogbn-products and ogbn-mag datasets. This is mostly for practice, to learn from other people's methods and code, and to submit results along the way.

Update (2021.4.5)

I ran GAT + C&S on the ogbn-products dataset. The C&S part I wired up by hand, since I couldn't find a version that runs out of the box.

GAT w/NS + C&S

This is an improvement of the (GAT with NeighborSampling) model, using the C&S method.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/gat

ogbn-products

  • Check out the model: (GAT with NeighborSampling)
  • Check out the C&S method: C&S

Improvement Strategy:

  • add C&S method
  • add BatchNorm

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  • Let the program run in the foreground.
python gat_cs_mini.py
           
  • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

num_layers = 3
hidden_dim = 128
heads = 4
dropout = 0.5
lr = 0.001
batch_size = 512
sizes = [10, 10, 10]
runs = 10
epochs = 100
num_correction_layers = 100
correction_alpha = 0.8
num_smoothing_layers = 100
smoothing_alpha = 0.8
scale = 10.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 97.28 ± 0.06
Highest Valid: 92.63 ± 0.08
  Final Train: 97.28 ± 0.06
   Final Test: 80.92 ± 0.37
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GAT w/NS + C&S | 0.8092 ± 0.0037 | 0.9263 ± 0.0008 | 753622 | Tesla V100 (32GB)

Currently ranked 8th. MLP + C&S performs remarkably well; it seems the simpler the model, the bigger the contribution C&S makes.


One more update on the same day: I also ran experiments with GraphSAGE. It can't match GAT, but relatively speaking the ranking gain is already substantial.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/sage


Update (2021.4.7)

I added node2vec embeddings to the GraphSAGE + C&S model and made another submission to the ogbn-products leaderboard.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/sage%2Bnode2vec

GraphSAGE w/NS + C&S + node2vec

This is an improvement of the (NeighborSampling (SAGE aggr)) model, using the C&S method and node2vec embedding.

ogbn-products

  • Check out the model: (NeighborSampling (SAGE aggr))
  • Check out the C&S method: C&S
  • Check out the node2vec model: node2vec

Improvement Strategy:

  • add C&S method
  • add BatchNorm
  • add node2vec embedding
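
The node2vec trick is just feature concatenation: the pretrained embeddings (the embedding.pt produced in step 1 below) are appended to the raw node features before the first SAGE layer. A minimal sketch with hypothetical shapes:

```python
import numpy as np

# hypothetical sizes: 5 nodes, 100 raw features, 64-dim node2vec embeddings
x = np.random.rand(5, 100)      # raw node features
emb = np.random.rand(5, 64)     # pretrained node2vec embeddings (loaded from disk)

# the model's input layer then sees the concatenated features
x_in = np.concatenate([x, emb], axis=1)
print(x_in.shape)  # (5, 164)
```

Since the embeddings are trained offline, this adds structural information without changing the GNN's training loop.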

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  1. Generate node2vec embeddings, which are saved in

    embedding.pt

    python node2vec_products.py
               
  2. Run the real model
    • Let the program run in the foreground.
    python sage_cs_em.py
               
    • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

num_layers = 3
hidden_dim = 256
dropout = 0.5
lr = 0.003
batch_size = 1024
sizes = [15, 10, 5]
runs = 10
epochs = 20
num_correction_layers = 100
correction_alpha = 0.8
num_smoothing_layers = 100
smoothing_alpha = 0.8
scale = 10.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 97.13 ± 0.07
Highest Valid: 92.38 ± 0.06
  Final Train: 97.13 ± 0.07
   Final Test: 81.54 ± 0.50
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GraphSAGE w/NS + C&S + node2vec | 0.8154 ± 0.0050 | 0.9238 ± 0.0006 | 103983 | Tesla V100 (32GB)

The final result ranks 7th, surpassing the GAT + C&S entry.


Update (2021.4.10)

Switched to yet another dataset: ogbn-mag. It is a heterogeneous graph and quite complex.

My code: https://github.com/ytchx1999/PyG-ogbn-mag/tree/main/saint%2Bmetapath2vec

GraphSAINT + metapath2vec

This is an improvement of the (GraphSAINT (R-GCN aggr)) model, using metapath2vec embedding.

ogbn-mag

  • Check out the model: (GraphSAINT (R-GCN aggr))
  • Check out the metapath2vec model: metapath2vec

Improvement Strategy:

  • adjust hidden_dim
  • add metapath2vec embedding
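
metapath2vec differs from node2vec in that its random walks must follow a fixed sequence of edge types, so the skip-gram contexts respect the heterogeneous schema. A toy walk generator (purely illustrative; the actual runs use the PyG MetaPath2Vec model, and the schema below is made up):

```python
import random

def metapath_walk(adj_by_rel, metapath, start, walk_length, rng):
    """adj_by_rel: {relation: {node: [neighbors]}}; metapath: cycle of relations.

    At step i the walk may only follow an edge of type metapath[i % len(metapath)].
    """
    walk = [start]
    node = start
    for i in range(walk_length):
        rel = metapath[i % len(metapath)]
        neighbors = adj_by_rel.get(rel, {}).get(node, [])
        if not neighbors:          # dead end: no edge of the required type
            break
        node = rng.choice(neighbors)
        walk.append(node)
    return walk

# toy schema: author -writes-> paper -written_by-> author
adj = {
    "writes":     {"a1": ["p1"], "a2": ["p1"]},
    "written_by": {"p1": ["a1", "a2"]},
}
walk = metapath_walk(adj, ["writes", "written_by"], "a1", 4, random.Random(0))
```

The resulting walks alternate node types (author, paper, author, ...), which is exactly the constraint a plain node2vec walk would violate on a heterogeneous graph.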

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  1. Generate metapath2vec embeddings, which are saved in

    mag_embedding.pt

    python metapath2vec.py
               
  2. Run the real model
    python rgcn_saint.py
               

Detailed Hyperparameter:

GraphSAINT:

num_layers = 2
hidden_dim = 256
dropout = 0.5
lr = 0.005
batch_size = 20000
walk_length = 2
runs = 10
epochs = 30
num_steps = 30
           

Metapath2vec:

embedding_dim = 128
lr = 0.01
batch_size = 20000
walk_length = 64
epochs = 5
           

Result:

All runs:
Highest Train: 84.01 ± 2.72
Highest Valid: 50.66 ± 0.17
  Final Train: 84.01 ± 2.72
   Final Test: 49.66 ± 0.22
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GraphSAINT + metapath2vec | 0.4966 ± 0.0022 | 0.5066 ± 0.0017 | 309764724 | Tesla V100 (32GB)

With these improvements to the GraphSAINT model, it currently ranks 5th.


Update (2021.5.24)

I added metapath2vec embeddings to the R-GSN model and climbed to 2nd place on the ogbn-mag leaderboard.


Update (2021.6.8)

Took a shot at the ogbn-proteins leaderboard as well.

Code: https://github.com/ytchx1999/PyG-OGB-Tricks/tree/main/DGL-ogbn-proteins

GAT + labels + node2vec

This is an improvement of the GAT model by Wang (DGL), using node2vec embedding.

Our paper is available at https://arxiv.org/pdf/2105.08330.pdf.

ogbn-proteins

Improvement Strategy:

  • adjust hidden and embedding dim.
  • add node2vec embedding: the node2vec features greatly accelerate the convergence of GAT.
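
The "labels" part of the model name corresponds to the --use-labels option shown below: labels of training nodes are fed back in as extra input features. A toy single-label version of that augmentation (`add_label_features` is an illustrative helper; ogbn-proteins itself is multi-label, where the binary label matrix of training nodes is appended directly):

```python
import numpy as np

def add_label_features(x, y, train_idx, num_classes):
    """Append one-hot labels of the training nodes as extra input features.

    Non-training rows get all-zero label features, so no test label leaks in.
    """
    onehot = np.zeros((x.shape[0], num_classes))
    onehot[train_idx, y[train_idx]] = 1.0
    return np.concatenate([x, onehot], axis=1)
```

During training, implementations of this trick typically mask out a random subset of training labels and predict them, so the model cannot just copy its input.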

Environmental Requirements

  • dgl >= 0.5.0
  • torch >= 1.6.0
  • torch_geometric >= 1.6.0
  • ogb == 1.3.0

Experiment Setup:

  1. Generate node2vec embeddings, which are saved in

    proteins_embedding.pt

    python node2vec_proteins.py
               
  2. Run the real model
    • Let the program run in the foreground.
    python gat.py --use-labels
               
    • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

GAT:

Namespace(attn_drop=0.0, cpu=False, dropout=0.25, edge_drop=0.1, eval_every=5, gpu=0, input_drop=0.1, log_every=5, lr=0.01, n_epochs=1200, n_heads=6, n_hidden=128, n_layers=6, n_runs=10, no_attn_dst=False, plot_curves=False, save_pred=False, seed=0, use_embed=True, use_labels=True, wd=0)

--n-runs N_RUNS         running times (default: 10)
--n-epochs N_EPOCHS     number of epochs (default: 1200)
--use-labels            Use labels in the training set as input features. (default: False)
--lr LR                 learning rate (default: 0.01)
--n-layers N_LAYERS     number of layers (default: 6)
--n-heads N_HEADS       number of heads (default: 6)
--n-hidden N_HIDDEN     number of hidden units (default: 128)
--dropout DROPOUT       dropout rate (default: 0.25)
--input-drop INPUT_DROP input drop rate (default: 0.1)
           

node2vec:

embedding_dim = 16
lr = 0.01
batch_size = 256
walk_length = 80
epochs = 5
           

Result:

Val scores: [0.9229285934246892, 0.9211608885028892, 0.9213509308888836, 0.9219311666881109, 0.922188157691978, 0.9233155178378067, 0.9226761093114175, 0.9207967425451954, 0.9192225312946334, 0.9216411187053957]
Test scores: [0.8705177963169082, 0.8718678325708628, 0.871026339976343, 0.8713582109483052, 0.8706036035560922, 0.8709027982169764, 0.8704158483168263, 0.8704708862546975, 0.8713362807645616, 0.8726814140948117]

Average val score: 0.9217211756890998 ± 0.0011282315196969204
Average test score: 0.8711181011016385 ± 0.0006857984340481437
           
Model | Test ROC-AUC | Valid ROC-AUC | Parameters | Hardware
GAT + labels + node2vec | 0.8711 ± 0.0007 | 0.9217 ± 0.0011 | 6360470 | Tesla V100 (32GB)

Update (2021.7.27)

I rewrote MLP + C&S in PyG, and it even seems to edge out the original by a tiny bit.

