
Understanding the YOLO Configuration File (reposted)

[net]
batch=64                           Parameters are updated once every batch samples.
subdivisions=8                     If GPU memory is insufficient, the batch is split into subdivisions sub-batches, each of size batch/subdivisions.
                                   In the darknet code, batch/subdivisions is itself named batch.
height=416                         Height of the input image
width=416                          Width of the input image
channels=3                         Number of channels of the input image
momentum=0.9                       Momentum
decay=0.0005                       Weight-decay regularization term, to prevent overfitting
angle=0                            Generate more training samples by rotating images
saturation = 1.5                   Generate more training samples by adjusting saturation
exposure = 1.5                     Generate more training samples by adjusting exposure
hue=.1                             Generate more training samples by adjusting hue

learning_rate=0.0001               Initial learning rate
max_batches = 45000                Training stops once max_batches is reached
policy=steps                       Policy for adjusting the learning rate; available policies: CONSTANT, STEP, EXP, POLY, STEPS, SIG, RANDOM
steps=100,25000,35000              Adjust the learning rate according to batch_num
scales=10,.1,.1                    Factors by which the learning rate changes, multiplied cumulatively

[convolutional]
batch_normalize=1                  Whether to apply batch normalization (BN)
filters=32                         Number of output feature maps
size=3                             Size of the convolution kernel
stride=1                           Stride of the convolution
pad=1                              If pad is 0, padding is given by the padding parameter; if pad is 1, padding is size/2
activation=leaky                   Activation function; options:
                                   logistic, loggy, relu, elu, relie, plse, hardtan, lhtan, linear, ramp, leaky, tanh, stair

[maxpool]
size=2                             Pooling window size
stride=2                           Pooling stride

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

......
......


#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]                            the route layer is to bring finer grained features in from earlier in the network
layers=-9

[reorg]                            the reorg layer is to make these features match the feature map size at the later layer. 
                                   The end feature map is 13x13, the feature map from earlier is 26x26x512. 
                                   The reorg layer maps the 26x26x512 feature map onto a 13x13x2048 feature map 
                                   so that it can be concatenated with the feature maps at 13x13 resolution.
stride=2

[route]
layers=-1,-3

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=125                        The number of filters in the last conv layer before region is fixed: filters = num*(classes+5).
                                   The 5 stands for the 5 coordinates tx, ty, tw, th, to from the paper.
activation=linear

[region]
anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52          Anchor boxes; they can be picked by hand
                                                                                or learned from the training samples via k-means
bias_match=1
classes=20                         Number of object classes the network should recognize
coords=4                           The 4 coordinates tx, ty, tw, th of each box
num=5                              Number of boxes predicted per grid cell, equal to the number of anchors. To use more anchors, increase num; if Obj tends to 0 during training after increasing num, try increasing object_scale
softmax=1                          Use softmax as the activation function
jitter=.2                          Suppress overfitting by adding noise through jitter
rescore=1                          Roughly a switch: when nonzero, l.delta (the difference between prediction and ground truth) is adjusted by rescoring

object_scale=5                     Weight of the bbox confidence loss in the total loss when a grid cell contains an object
noobject_scale=1                   Weight of the bbox confidence loss in the total loss when a grid cell contains no object
class_scale=1                      Weight of the classification loss in the total loss
coord_scale=1                      Weight of the bbox coordinate prediction loss in the total loss

absolute=1
thresh = .6
random=0                           When random is 1, Multi-Scale Training is enabled: images of different sizes are used randomly during training.
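The filters formula above can be sanity-checked with a tiny helper. This is just a sketch; `region_filters` is a hypothetical name, not a darknet function:

```c
#include <assert.h>

/* Filters of the conv layer before [region]:
 * num anchors * (classes + 4 box coords + 1 objectness score). */
int region_filters(int num, int classes) {
    return num * (classes + 5);
}
```

For the 20-class VOC config above, 5 * (20 + 5) = 125, matching filters=125 in the cfg.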

Corresponding darknet code

Let's find the code that parses the cfg file, taking the detector demo as the entry point.

Starting from main in darknet.c:

} else if (0 == strcmp(argv[1], "detector")){
    run_detector(argc, argv);

The run_detector function in Detector.c:

char *prefix = find_char_arg(argc, argv, "-prefix", 0);
float thresh = find_float_arg(argc, argv, "-thresh", .24);
float hier_thresh = find_float_arg(argc, argv, "-hier", .5);
int cam_index = find_int_arg(argc, argv, "-c", 0);
int frame_skip = find_int_arg(argc, argv, "-s", 0);
if(argc < 4){
    fprintf(stderr, "usage: %s %s [train/test/valid] [cfg] [weights (optional)]\n", argv[0], argv[1]);
    return;
}
char *gpu_list = find_char_arg(argc, argv, "-gpus", 0);
char *outfile = find_char_arg(argc, argv, "-out", 0);

......
......

else if(0==strcmp(argv[2], "demo")) {
    list *options = read_data_cfg(datacfg);
    int classes = option_find_int(options, "classes", 20);
    char *name_list = option_find_str(options, "names", "data/names.list");
    char **names = get_labels(name_list);
    demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, hier_thresh);
}

The read_data_cfg function parses the configuration file and stores the result in the options pointer.

classes

int classes = option_find_int(options, "classes", 20);

classes is the number of object categories YOLO can recognize.

batch, learning_rate, momentum, decay and subdivisions

The demo function in demo.c:

net = parse_network_cfg(cfgfile);

The parse_network_cfg function in Parser.c:

list *sections = read_cfg(filename);
node *n = sections->front;
if(!n) error("Config file has no sections");
network net = make_network(sections->size - 1);
net.gpu_index = gpu_index;
size_params params;

section *s = (section *)n->val;
list *options = s->options;
if(!is_network(s)) error("First section must be [net] or [network]");
parse_net_options(options, &net);

The parse_net_options function:

net->batch = option_find_int(options, "batch",1);
net->learning_rate = option_find_float(options, "learning_rate", .001);
net->momentum = option_find_float(options, "momentum", .9);
net->decay = option_find_float(options, "decay", .0001);
int subdivs = option_find_int(options, "subdivisions",1);
net->time_steps = option_find_int_quiet(options, "time_steps",1);
net->batch /= subdivs;
net->batch *= net->time_steps;
net->subdivisions = subdivs;

learning_rate is the initial learning rate; the actual learning rate during training depends on the learning-rate policy as well as this initial value.

momentum is the momentum term; adding momentum during training helps escape local minima and saddle points.

decay is the weight-decay regularization term, used to prevent overfitting.

batch equals batch/subdivisions from the cfg file, multiplied by time_steps.

time_steps is not set in YOLO's default cfg, so it takes the default value 1.

Therefore batch can be taken to be simply batch/subdivisions from the cfg file.

As mentioned earlier, batch means that parameters are updated once every batch samples.

The point of subdivisions is to lower the GPU memory requirement.

darknet splits the batch into subdivisions sub-batches, each of size batch/subdivisions, and names the sub-batch batch.

Let's look at the batch-related code at training time.

The train_detector function in Detector.c:

#ifdef GPU
    if(ngpus == 1){
        loss = train_network(net, train);
    } else {
        loss = train_networks(nets, ngpus, train, 4);
    }
#else
    loss = train_network(net, train);
#endif

The train_network function in Network.c:

int batch = net.batch;
int n = d.X.rows / batch;
float *X = calloc(batch*d.X.cols, sizeof(float));
float *y = calloc(batch*d.y.cols, sizeof(float));

int i;
float sum = 0;
for(i = 0; i < n; ++i){
    get_next_batch(d, batch, i*batch, X, y);
    float err = train_network_datum(net, X, y);
    sum += err;
}

The train_network_datum function:

*net.seen += net.batch;
......
......
forward_network(net, state);
backward_network(net, state);
float error = get_network_cost(net);
if(((*net.seen)/net.batch)%net.subdivisions == 0) update_network(net);

We can see that the network parameters are updated only when ((*net.seen)/net.batch) % net.subdivisions == 0.

net.seen counts the samples trained so far, so (*net.seen)/net.batch is the number of sub-batches trained; the condition holds exactly when a whole real batch has just been completed.
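The update condition can be simulated with a small sketch. `updates_after` is a hypothetical helper that feeds sub-batches through the same counter logic and counts how many weight updates occur:

```c
#include <assert.h>

/* Simulates the counter logic of train_network_datum: net.seen grows by
 * the sub-batch size on every call, and the weights update only after a
 * full real batch (subdivisions sub-batches) has been seen. */
int updates_after(int datums, int sub_batch, int subdivisions) {
    int seen = 0, updates = 0, i;
    for (i = 0; i < datums; ++i) {
        seen += sub_batch;                          /* *net.seen += net.batch */
        if ((seen / sub_batch) % subdivisions == 0) /* same test as darknet  */
            ++updates;                              /* update_network(net)   */
    }
    return updates;
}
```

With batch=64 and subdivisions=8, eight sub-batches of 8 samples trigger exactly one parameter update.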

policy, steps and scales

char *policy_s = option_find_str(options, "policy", "constant");
net->policy = get_policy(policy_s);
net->burn_in = option_find_int_quiet(options, "burn_in", 0);
if(net->policy == STEP){
    net->step = option_find_int(options, "step", 1);
    net->scale = option_find_float(options, "scale", 1);
} else if (net->policy == STEPS){
    char *l = option_find(options, "steps");   
    char *p = option_find(options, "scales");   
    if(!l || !p) error("STEPS policy must have steps and scales in cfg file");

    int len = strlen(l);
    int n = 1;
    int i;
    for(i = 0; i < len; ++i){
        if (l[i] == ',') ++n;
    }
    int *steps = calloc(n, sizeof(int));
    float *scales = calloc(n, sizeof(float));
    for(i = 0; i < n; ++i){
        int step    = atoi(l);
        float scale = atof(p);
        l = strchr(l, ',')+1;
        p = strchr(p, ',')+1;
        steps[i] = step;
        scales[i] = scale;
    }
    net->scales = scales;
    net->steps = steps;
    net->num_steps = n;
} else if (net->policy == EXP){
    net->gamma = option_find_float(options, "gamma", 1);
} else if (net->policy == SIG){
    net->gamma = option_find_float(options, "gamma", 1);
    net->step = option_find_int(options, "step", 1);
} else if (net->policy == POLY || net->policy == RANDOM){
    net->power = option_find_float(options, "power", 1);
}

The get_policy function:

if (strcmp(s, "random")==0) return RANDOM;
if (strcmp(s, "poly")==0) return POLY;
if (strcmp(s, "constant")==0) return CONSTANT;
if (strcmp(s, "step")==0) return STEP;
if (strcmp(s, "exp")==0) return EXP;
if (strcmp(s, "sigmoid")==0) return SIG;
if (strcmp(s, "steps")==0) return STEPS;
fprintf(stderr, "Couldn't find policy %s, going with constant\n", s);
return CONSTANT;

學習率動态調整的政策有多種,YOLO預設使用的是steps。

yolo-voc.cfg檔案:

steps=100,25000,35000

scales=10,.1,.1

Network.c檔案get_current_rate函數

int batch_num = get_current_batch(net);
int i;
float rate;
switch (net.policy) {
    case CONSTANT:
        return net.learning_rate;
    case STEP:
        return net.learning_rate * pow(net.scale, batch_num/net.step);
    case STEPS:
        rate = net.learning_rate;
        for(i = 0; i < net.num_steps; ++i){
            if(net.steps[i] > batch_num) return rate;
            rate *= net.scales[i];
            //if(net.steps[i] > batch_num - 1 && net.scales[i] > 1) reset_momentum(net);
        }
        return rate;

get_current_batch returns (*net.seen)/(net.batch*net.subdivisions), i.e. the number of real batches.

The stages of steps are delimited by batch_num; with this configuration, the learning rate changes when batch_num reaches 100, 25000 and 35000.

The current learning rate is the initial learning rate multiplied by the scales of the current stage and of all earlier stages.
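The cumulative-product rule can be expressed as a standalone function. This mirrors the STEPS branch of get_current_rate but is a sketch, not the darknet source:

```c
#include <assert.h>
#include <math.h>

/* policy=STEPS: start from the base learning rate and multiply by each
 * scale whose step threshold has already been passed. */
float steps_rate(float base, const int *steps, const float *scales,
                 int num_steps, int batch_num) {
    float rate = base;
    int i;
    for (i = 0; i < num_steps; ++i) {
        if (steps[i] > batch_num) return rate; /* threshold not reached yet */
        rate *= scales[i];                     /* cumulative product        */
    }
    return rate;
}
```

With yolo-voc.cfg values, the rate is 0.0001 before batch 100, 0.001 from batch 100, back to 0.0001 from batch 25000, and 0.00001 from batch 35000.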

Loading the convolutional hyperparameters

The parse_network_cfg function in Parser.c:

LAYER_TYPE lt = string_to_layer_type(s->type);
        if(lt == CONVOLUTIONAL){
            l = parse_convolutional(options, params);

The parse_convolutional function:

int n = option_find_int(options, "filters",1);
int size = option_find_int(options, "size",1);
int stride = option_find_int(options, "stride",1);
int pad = option_find_int_quiet(options, "pad",0);
int padding = option_find_int_quiet(options, "padding",0);
if(pad) padding = size/2;

char *activation_s = option_find_str(options, "activation", "logistic");
ACTIVATION activation = get_activation(activation_s);

int batch,h,w,c;
h = params.h;
w = params.w;
c = params.c;
batch=params.batch;
if(!(h && w && c)) error("Layer before convolutional layer must output image.");
int batch_normalize = option_find_int_quiet(options, "batch_normalize", 0);

Note that if pad is enabled, the padding value in the cfg file does not take effect; the actual padding is size/2.
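The padding override and the resulting output size can be sketched as follows. `conv_padding` mirrors the two lines of parse_convolutional above; `conv_out_dim` is the standard convolution output formula, not copied from darknet:

```c
#include <assert.h>

/* Mirrors parse_convolutional: when pad=1 the explicit "padding" key is
 * overridden by size/2. */
int conv_padding(int pad, int padding, int size) {
    return pad ? size / 2 : padding;
}

/* Standard formula for the output spatial dimension of a convolution. */
int conv_out_dim(int in, int size, int stride, int padding) {
    return (in + 2 * padding - size) / stride + 1;
}
```

For the first layer above (size=3, stride=1, pad=1) a 416-pixel input stays 416 pixels wide, which is why stacking such layers preserves resolution until a maxpool halves it.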

random

YOLOv2 adds several new training tricks, Multi-Scale Training being one of them. If random is set to 1, Multi-Scale Training is enabled.

With Multi-Scale Training enabled, the network randomly picks a new image size every 10 batches. Since the downsampling factor is 32, the candidate sizes are multiples of 32, {320, 352, ..., 608}: the smallest is 320×320 and the largest 608×608. The network resizes itself accordingly and continues training.

This strategy lets the network predict well across different input sizes, so the same network can detect at different resolutions. It runs faster on small inputs and is more accurate on large ones.
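The size draw described above can be sketched like this; it is an illustration of the stated rule (multiples of 32 between 320 and 608), and the exact darknet implementation may differ:

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a random input dimension for Multi-Scale Training: a multiple of
 * 32 in [320, 608], i.e. 10*32 .. 19*32. */
int random_dim(void) {
    return (rand() % 10 + 10) * 32;
}
```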

route and reorg

YOLOv2 also adds the Fine-Grained Features trick: drawing on feature pyramids and ResNet, it links high-resolution features with low-resolution ones, improving detection accuracy on small objects.

(The original post shows ResNet's identity-mappings diagram here.)

YOLOv2 adds a Passthrough Layer that picks up features from an earlier layer at 26×26 resolution. This passthrough layer links the 26×26 feature map with the 13×13 feature map, stacking adjacent spatial features into separate channels, similar to ResNet's identity mapping, and thereby turns 26×26×512 into 13×13×2048.

The route layer makes the connection, and the reorg layer matches the feature-map sizes.
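The reorg shape change can be captured in two small helpers. These are hypothetical names that only compute the output dimensions, not the darknet data shuffle itself:

```c
#include <assert.h>

/* A reorg layer with stride s turns a W x H x C map into
 * (W/s) x (H/s) x (C*s*s): spatial blocks are folded into channels. */
int reorg_dim(int dim, int stride) {
    return dim / stride;
}

int reorg_channels(int channels, int stride) {
    return channels * stride * stride;
}
```

With stride=2, the 26×26×512 map from earlier in the network becomes 13×13×2048, ready to be concatenated with the 13×13 maps by the following [route] layer.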
