
My Experience Submitting Code to the OGB Leaderboards

Models covered: ogbn_arxiv_GCN_res, GCN_res-FLAG, PyG-GCN_res-CS, GAT w/NS + C&S, GraphSAGE w/NS + C&S + node2vec, GraphSAINT + metapath2vec, GAT + labels + node2vec

Our paper: https://arxiv.org/pdf/2105.08330.pdf.

Our code: https://github.com/ytchx1999/PyG-OGB-Tricks/tree/main

ogbn_arxiv_GCN_res

This is an improvement over the baseline on the ogbn-arxiv dataset.

My code: https://github.com/ytchx1999/ogbn_arxiv_GCN_res

ogbn-arxiv

The framework is shown in the figure.


Improvement Strategy:

  • add skip-connection inspired by DeeperGCN

    Adding a proportion of the previous layer's output to the current layer's output both speeds up convergence and alleviates over-smoothing.

  • add initial residual connection inspired by GCNII

    Borrowing the initial residual idea from GCNII, a proportion of X^(0) is added to the output of later layers, which alleviates over-smoothing and also gives a small accuracy gain.

  • add jumping knowledge inspired by JKNet

    As in JKNet, the output of every layer is stored; the stored outputs are then combined by a softmax-weighted sum to produce the final node representations, which effectively alleviates over-smoothing.
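
Put together, the three tricks look roughly like this (a minimal NumPy sketch of the forward pass, not the actual PyG model; `gcn_res_forward` and `layer_fns` are illustrative stand-ins, and alpha/beta match the hyperparameters listed below):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def gcn_res_forward(x0, layer_fns, alpha=0.2, beta=0.5):
    """Sketch of the three tricks combined (hypothetical helper, not the repo code).

    layer_fns: one callable per GCN layer (propagation + linear transform).
    beta scales the skip-connection; alpha scales the initial residual X^(0).
    """
    h = x0
    layer_outs = []
    for layer in layer_fns:
        h_new = layer(h)
        h_new = h_new + beta * h      # skip-connection (DeeperGCN-style)
        h_new = h_new + alpha * x0    # initial residual (GCNII-style)
        h = h_new
        layer_outs.append(h)
    # jumping knowledge (JKNet-style): softmax-weighted sum over all layers;
    # the weights are learnable in the real model, uniform in this sketch
    w = softmax(np.zeros(len(layer_outs)))
    return sum(wi * oi for wi, oi in zip(w, layer_outs))
```

With identity layers and alpha=0.2, beta=0.5, each layer amplifies the features and the JK step averages the per-layer outputs.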

Experiment Setup:

The model has 8 layers and is trained for 500 epochs.

python ogbn_gcn_res.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.5
           

Result:

All runs:
Highest Train: 77.94 ± 0.50
Highest Valid: 73.69 ± 0.21
  Final Train: 77.72 ± 0.46
   Final Test: 72.62 ± 0.37
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res | 0.7262 ± 0.0037 | 0.7369 ± 0.0021 | 155824 | Tesla T4 (16GB)

I've submitted my entry to the OGB leaderboard; the OGB team is validating my model now. Let's see whether it makes it.

Update (2021.2.22)

The OGB team accepted my code, and it currently sits at 19th place. So happy!

https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-arxiv

我向OGB排行榜送出代碼的經曆ogbn_arxiv_GCN_resGCN_res-FLAGPyG-GCN_res-CSGAT w/NS + C&SGraphSAGE w/NS + C&S + node2vecGraphSAINT + metapath2vecGAT + labels + node2vec

It now beats baselines such as GCN and GraphSAGE, so I'd call the initial goal of my thesis project achieved.

Feedback and corrections are very welcome!

Update (2021.2.25)

I later noticed that my model's standard deviation was a bit large, so I adopted the FLAG method for adversarial data augmentation to stabilize the model and gain a small accuracy boost.

GCN_res-FLAG

This is an improvement of the (GCN_res + 8 layers) model, using the FLAG method.

My code: https://github.com/ytchx1999/GCN_res-FLAG

ogbn-arxiv

  • Check out the model: (GCN_res + 8 layers)
  • Check out the FLAG method: FLAG

Improvement Strategy:

  • add FLAG method
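
FLAG augments training by perturbing the input node features adversarially instead of feeding them in clean. A simplified PyTorch sketch of one training step (`train_with_flag` is a hypothetical helper using sign-based ascent; the official FLAG implementation differs in details such as gradient normalization):

```python
import torch

def train_with_flag(model, x, y, loss_fn, optimizer, step_size=1e-3, m=3):
    """One training step with FLAG-style adversarial augmentation (sketch).

    A small input perturbation is grown by m gradient-ascent steps while the
    model's own gradients accumulate across the same m forward/backward passes.
    """
    model.train()
    optimizer.zero_grad()
    perturb = torch.empty_like(x).uniform_(-step_size, step_size)
    perturb.requires_grad_(True)
    loss = loss_fn(model(x + perturb), y) / m
    for _ in range(m - 1):
        loss.backward()
        # ascend on the perturbation (sign of its gradient), then recompute loss
        perturb.data.add_(step_size * torch.sign(perturb.grad))
        perturb.grad.zero_()
        loss = loss_fn(model(x + perturb), y) / m
    loss.backward()
    optimizer.step()
    return float(loss)
```

The key point is that the model sees m slightly different adversarial views of the same batch per optimizer step, which acts as a regularizer and reduces run-to-run variance.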

Environmental Requirements

  • pytorch == 1.7.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.2.4

Experiment Setup:

The model has 8 layers and is trained for 10 runs of 500 epochs each.

python ogbn_gcn_res_flag.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.7
           

Result:

All runs:
Highest Train: 78.61 ± 0.49
Highest Valid: 73.89 ± 0.12
  Final Train: 78.44 ± 0.46
   Final Test: 72.76 ± 0.24
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res + FLAG | 0.7276 ± 0.0024 | 0.7389 ± 0.0012 | 155824 | Tesla T4 (16GB)

As the numbers show, the accuracy improved slightly and the standard deviation came down as well, which is exactly what I was hoping for.


After I submitted the code to the OGB leaderboard, it took about a day for the team to accept it. The model now beats GCNII and sits at 18th place.

https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-arxiv


Update (2021.3.25)

将CorrectAndSmooth(C&S)方法應用到了我的模型GCN_res當中,經過了一番機械調參後,模型取得了不錯的效果!

My code: https://github.com/ytchx1999/PyG-GCN_res-CS

PyG-GCN_res-CS

This is an improvement of the (GCN_res + 8 layers) model, using the C&S method.

ogbn-arxiv

  • Check out the model: (GCN_res + 8 layers)
  • Check out the C&S method: C&S

Improvement Strategy:

  • add C&S method
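
C&S is pure post-processing on top of the trained model's soft predictions: a "correct" pass propagates the residual error seen on training nodes, then a "smooth" pass propagates labels. A dense NumPy toy version (the real runs use sparse propagation; `correct_and_smooth` here is an illustrative sketch, with the 'DAD' symmetric normalization named in the hyperparameters):

```python
import numpy as np

def sym_norm(A):
    """D^-1/2 A D^-1/2 -- the 'DAD' normalization used for both passes."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def correct_and_smooth(A, soft, y_onehot, train_idx,
                       n_corr=50, alpha_c=0.8,
                       n_smooth=50, alpha_s=0.8, scale=1.0):
    """Dense toy version of C&S post-processing.

    soft: base-model soft predictions (N, C); y_onehot: one-hot labels (N, C).
    """
    S = sym_norm(A)
    # correct step: propagate the residual error observed on the training nodes
    e0 = np.zeros_like(soft)
    e0[train_idx] = y_onehot[train_idx] - soft[train_idx]
    e = e0.copy()
    for _ in range(n_corr):
        e = alpha_c * (S @ e) + (1 - alpha_c) * e0
    z = soft + scale * e
    # smooth step: clamp training nodes to their true labels, then propagate
    z[train_idx] = y_onehot[train_idx]
    z0 = z.copy()
    for _ in range(n_smooth):
        z = alpha_s * (S @ z) + (1 - alpha_s) * z0
    return z
```

Because it never touches the base model's weights, C&S adds zero parameters, which is why the parameter count in the table below stays unchanged.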

Environmental Requirements

  • pytorch == 1.7.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.2.4

Experiment Setup:

The model has 8 layers and is trained for 10 runs of 500 epochs each.

python gcn_res_cs.py
           

Detailed Hyperparameter:

num_layers = 8
hidden_dim = 128
dropout = 0.5
lr = 0.01
runs = 10
epochs = 500
alpha = 0.2
beta = 0.7
num_correction_layers = 50
correction_alpha = 0.8
num_smoothing_layers = 50
smoothing_alpha = 0.8
scale = 1.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 95.40 ± 0.02
Highest Valid: 74.23 ± 0.14
  Final Train: 95.40 ± 0.02
   Final Test: 72.97 ± 0.22
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GCN_res + C&S | 0.7297 ± 0.0022 | 0.7423 ± 0.0014 | 155824 | Tesla T4 (16GB)

The code was accepted in under an hour this time, seriously fast! With that, my ranking moved up a few more places: GCN_res + C&S currently sits at 16th (19 -> 16).

A little progress every time.


Update (2021.3.27)

Yesterday I set up the environment on the institute's server, then used a Tesla V100 to mechanically tune the C&S part of GCN_res + C&S (free compute is a joy). That produced a better-performing v2 (GCN_res + C&S_v2), which I then submitted.

Code: https://github.com/ytchx1999/GCN_res-CS-v2


Another jump in the ranking: 13th so far, which is the best performance achieved with a GCN backbone on this board! Further improvements will probably have to target attention-based models such as GAT or UniMP.

My thesis focuses on improving the GCN backbone, and the paper is written around that, so as far as the thesis goes this is probably as far as it gets.


The thesis work is basically done, so time to start a new side quest: next I'll submit code for the ogbn-products and ogbn-mag datasets. This is mostly for practice, to learn from other people's methods and code, and to submit results along the way.

Update (2021.4.5)

I ran GAT + C&S on the ogbn-products dataset. The C&S part I wired up by hand, since I couldn't find a version that runs out of the box.

GAT w/NS + C&S

This is an improvement of the (GAT with NeighborSampling) model, using the C&S method.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/gat

ogbn-products

  • Check out the model: (GAT with NeighborSampling)
  • Check out the C&S method: C&S

Improvement Strategy:

  • add C&S method
  • add BatchNorm

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  • Let the program run in the foreground.
python gat_cs_mini.py
           
  • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

num_layers = 3
hidden_dim = 128
heads = 4
dropout = 0.5
lr = 0.001
batch_size = 512
sizes = [10, 10, 10]
runs = 10
epochs = 100
num_correction_layers = 100
correction_alpha = 0.8
num_smoothing_layers = 100
smoothing_alpha = 0.8
scale = 10.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 97.28 ± 0.06
Highest Valid: 92.63 ± 0.08
  Final Train: 97.28 ± 0.06
   Final Test: 80.92 ± 0.37
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GAT w/NS + C&S | 0.8092 ± 0.0037 | 0.9263 ± 0.0008 | 753622 | Tesla V100 (32GB)

Currently ranked 8th. MLP + C&S performs remarkably well; it seems the simpler the model, the bigger the contribution C&S makes.


One more update on the same day: I also ran experiments with GraphSAGE. It can't match GAT, but relatively speaking the ranking gain is already substantial.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/sage


Update (2021.4.7)

I added node2vec embeddings to the GraphSAGE + C&S model and made another submission to the ogbn-products leaderboard.

My code: https://github.com/ytchx1999/PyG-ogbn-products/tree/main/sage%2Bnode2vec

GraphSAGE w/NS + C&S + node2vec

This is an improvement of the (NeighborSampling (SAGE aggr)) model, using the C&S method and node2vec embedding.

ogbn-products

  • Check out the model: (NeighborSampling (SAGE aggr))
  • Check out the C&S method: C&S
  • Check out the node2vec model: node2vec

Improvement Strategy:

  • add C&S method
  • add BatchNorm
  • add node2vec embedding
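
The node2vec trick is just feature concatenation: the pretrained embeddings (the embedding.pt produced in step 1 below) are appended to the raw node features before the first SAGE layer. A minimal sketch with hypothetical shapes:

```python
import numpy as np

# hypothetical sizes: 5 nodes, 100 raw features, 64-dim node2vec embeddings
x = np.random.rand(5, 100)      # raw node features
emb = np.random.rand(5, 64)     # pretrained node2vec embeddings (loaded from disk)

# the model's input layer then sees the concatenated features
x_in = np.concatenate([x, emb], axis=1)
print(x_in.shape)  # (5, 164)
```

Since the embeddings are trained offline, this adds structural information without changing the GNN's training loop.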

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  1. Generate node2vec embeddings, which are saved in

    embedding.pt

    python node2vec_products.py
               
  2. Run the real model
    • Let the program run in the foreground.
    python sage_cs_em.py
               
    • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

num_layers = 3
hidden_dim = 256
dropout = 0.5
lr = 0.003
batch_size = 1024
sizes = [15, 10, 5]
runs = 10
epochs = 20
num_correction_layers = 100
correction_alpha = 0.8
num_smoothing_layers = 100
smoothing_alpha = 0.8
scale = 10.
A1 = 'DAD'
A2 = 'DAD'
           

Result:

All runs:
Highest Train: 97.13 ± 0.07
Highest Valid: 92.38 ± 0.06
  Final Train: 97.13 ± 0.07
   Final Test: 81.54 ± 0.50
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GraphSAGE w/NS + C&S + node2vec | 0.8154 ± 0.0050 | 0.9238 ± 0.0006 | 103983 | Tesla V100 (32GB)

The final result ranks 7th, surpassing the GAT + C&S entry.


Update (2021.4.10)

Switched to yet another dataset: ogbn-mag. It is a heterogeneous graph and quite complex.

My code: https://github.com/ytchx1999/PyG-ogbn-mag/tree/main/saint%2Bmetapath2vec

GraphSAINT + metapath2vec

This is an improvement of the (GraphSAINT (R-GCN aggr)) model, using metapath2vec embedding.

ogbn-mag

  • Check out the model: (GraphSAINT (R-GCN aggr))
  • Check out the metapath2vec model: metapath2vec

Improvement Strategy:

  • adjust hidden_dim
  • add metapath2vec embedding
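
metapath2vec differs from node2vec in that its random walks must follow a fixed sequence of edge types, so the skip-gram contexts respect the heterogeneous schema. A toy walk generator (purely illustrative; the actual runs use the PyG MetaPath2Vec model, and the schema below is made up):

```python
import random

def metapath_walk(adj_by_rel, metapath, start, walk_length, rng):
    """adj_by_rel: {relation: {node: [neighbors]}}; metapath: cycle of relations.

    At step i the walk may only follow an edge of type metapath[i % len(metapath)].
    """
    walk = [start]
    node = start
    for i in range(walk_length):
        rel = metapath[i % len(metapath)]
        neighbors = adj_by_rel.get(rel, {}).get(node, [])
        if not neighbors:          # dead end: no edge of the required type
            break
        node = rng.choice(neighbors)
        walk.append(node)
    return walk

# toy schema: author -writes-> paper -written_by-> author
adj = {
    "writes":     {"a1": ["p1"], "a2": ["p1"]},
    "written_by": {"p1": ["a1", "a2"]},
}
walk = metapath_walk(adj, ["writes", "written_by"], "a1", 4, random.Random(0))
```

The resulting walks alternate node types (author, paper, author, ...), which is exactly the constraint a plain node2vec walk would violate on a heterogeneous graph.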

Environmental Requirements

  • pytorch == 1.8.1
  • pytorch_geometric == 1.6.3
  • ogb == 1.3.0

Experiment Setup:

  1. Generate metapath2vec embeddings, which are saved in

    mag_embedding.pt

    python metapath2vec.py
               
  2. Run the real model
    python rgcn_saint.py
               

Detailed Hyperparameter:

GraphSAINT:

num_layers = 2
hidden_dim = 256
dropout = 0.5
lr = 0.005
batch_size = 20000
walk_length = 2
runs = 10
epochs = 30
num_steps = 30
           

Metapath2vec:

embedding_dim = 128
lr = 0.01
batch_size = 20000
walk_length = 64
epochs = 5
           

Result:

All runs:
Highest Train: 84.01 ± 2.72
Highest Valid: 50.66 ± 0.17
  Final Train: 84.01 ± 2.72
   Final Test: 49.66 ± 0.22
           
Model | Test Accuracy | Valid Accuracy | Parameters | Hardware
GraphSAINT + metapath2vec | 0.4966 ± 0.0022 | 0.5066 ± 0.0017 | 309764724 | Tesla V100 (32GB)

With these improvements to the GraphSAINT model, it currently ranks 5th.


Update (2021.5.24)

I added metapath2vec embeddings to the R-GSN model and climbed to 2nd place on the ogbn-mag leaderboard.


Update (2021.6.8)

Took a shot at the ogbn-proteins leaderboard as well.

Code: https://github.com/ytchx1999/PyG-OGB-Tricks/tree/main/DGL-ogbn-proteins

GAT + labels + node2vec

This is an improvement of the GAT model by Wang (DGL), using node2vec embedding.

Our paper is available at https://arxiv.org/pdf/2105.08330.pdf.

ogbn-proteins

Improvement Strategy:

  • adjust hidden and embedding dim.
  • add node2vec embedding: the node2vec features greatly accelerate the convergence of GAT.
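
The "labels" part of the model name corresponds to the --use-labels option shown below: labels of training nodes are fed back in as extra input features. A toy single-label version of that augmentation (`add_label_features` is an illustrative helper; ogbn-proteins itself is multi-label, where the binary label matrix of training nodes is appended directly):

```python
import numpy as np

def add_label_features(x, y, train_idx, num_classes):
    """Append one-hot labels of the training nodes as extra input features.

    Non-training rows get all-zero label features, so no test label leaks in.
    """
    onehot = np.zeros((x.shape[0], num_classes))
    onehot[train_idx, y[train_idx]] = 1.0
    return np.concatenate([x, onehot], axis=1)
```

During training, implementations of this trick typically mask out a random subset of training labels and predict them, so the model cannot just copy its input.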

Environmental Requirements

  • dgl >= 0.5.0
  • torch >= 1.6.0
  • torch_geometric >= 1.6.0
  • ogb == 1.3.0

Experiment Setup:

  1. Generate node2vec embeddings, which are saved in

    proteins_embedding.pt

    python node2vec_proteins.py
               
  2. Run the real model
    • Let the program run in the foreground.
    python gat.py --use-labels
               
    • Or let the program run in the background and save the results to a log file.

Detailed Hyperparameter:

GAT:

Namespace(attn_drop=0.0, cpu=False, dropout=0.25, edge_drop=0.1, eval_every=5, gpu=0, input_drop=0.1, log_every=5, lr=0.01, n_epochs=1200, n_heads=6, n_hidden=128, n_layers=6, n_runs=10, no_attn_dst=False, plot_curves=False, save_pred=False, seed=0, use_embed=True, use_labels=True, wd=0)

--n-runs N_RUNS         running times (default: 10)
--n-epochs N_EPOCHS     number of epochs (default: 1200)
--use-labels            Use labels in the training set as input features. (default: False)
--lr LR                 learning rate (default: 0.01)
--n-layers N_LAYERS     number of layers (default: 6)
--n-heads N_HEADS       number of heads (default: 6)
--n-hidden N_HIDDEN     number of hidden units (default: 128)
--dropout DROPOUT       dropout rate (default: 0.25)
--input-drop INPUT_DROP input drop rate (default: 0.1)
           

node2vec:

embedding_dim = 16
lr = 0.01
batch_size = 256
walk_length = 80
epochs = 5
           

Result:

Val scores: [0.9229285934246892, 0.9211608885028892, 0.9213509308888836, 0.9219311666881109, 0.922188157691978, 0.9233155178378067, 0.9226761093114175, 0.9207967425451954, 0.9192225312946334, 0.9216411187053957]
Test scores: [0.8705177963169082, 0.8718678325708628, 0.871026339976343, 0.8713582109483052, 0.8706036035560922, 0.8709027982169764, 0.8704158483168263, 0.8704708862546975, 0.8713362807645616, 0.8726814140948117]

Average val score: 0.9217211756890998 ± 0.0011282315196969204
Average test score: 0.8711181011016385 ± 0.0006857984340481437
           
Model | Test ROC-AUC | Valid ROC-AUC | Parameters | Hardware
GAT + labels + node2vec | 0.8711 ± 0.0007 | 0.9217 ± 0.0011 | 6360470 | Tesla V100 (32GB)

Update (2021.7.27)

I rewrote MLP + C&S in PyG, and it even seems to edge out the original by a tiny bit.

