天天看點

TCP CDG算法與TCP Westwood算法

自從我修改了CDG算法以及Westwood算法之後,通過微信,QQ還有郵件與很多志同道合的朋友進行了交流,本文将這些交流的結論總結一下。

TCP CDG算法有多好?

很多人不屑于這個算法,包括社群的一些自诩清高的人,然而把問題細化的話,如果無法回答“TCP CDG算法有多不好?”這個問題,那就是無知了。CDG有多好取決于你的網絡品質,你的網絡有多大的機率是線路噪聲引發的丢包決定了這個問題的答案。

1.抗噪聲丢包

修改過的CDG對噪聲丢包的抵抗性非常強,在它明确判斷出是由于噪聲而發生丢包而不是由于擁塞丢包的時候,在重傳期間(TCP的快速恢複期間),仍然可以保持幾乎不變的視窗發送重傳資料段以及新資料。

        可以多說一句,CDG抗噪聲的能力甚至比BBR都好。首先BBR+RACK可能會在多個短連接配接流競争隊列時帶來頻繁的盲目重傳,其次,BBR可能根本就不适合普通營運商網絡,是以很多關于BBR的迷信都是盲目的。然而CDG由傳統的擁塞控制算法衍生而來(還包括Westwood也是這麼衍生過來的),非常容易切進沒有Google B4基因的國内營運商網絡。

2.公平性相關

公平收斂性是所有TCP擁塞控制算法必須遵守的戒律,凡違反該戒律的,都是傻逼,數不清的越界行為就是讓我越發惡心TCP的首要原因。CDG的公平性由其Backoff機制來保障,即在RTT持續增大但尚未丢包的期間,CDG預判此時開始有擁塞迹象,大家都開始搶節點隊列緩存了,此時CDG會主動随機退避降低發送速率,即降窗。然而這隻是CDG單方面的君子行為,在降窗期間,一個shadow視窗默默地繼續增加,如果再發生丢包,CDG就不再謙讓了,其不會像Reno或者CUBIC那樣把目前視窗降低一定比例,而是将已經在Backoff階段默默增長的shadow視窗降低一定比例作為新的擁塞視窗。

3.擁塞導緻的丢包處理

這裡你會發現,實際上CDG是在傳統的NewReno/CUBIC算法上做了以上兩項優化,首先它跟蹤RTT變化梯度以差別擁塞與噪聲,其次在噪聲情況下不降窗,而在擁塞未丢包情況下随機降窗并與此同時保持shadow增長。然而在擁塞丢包時,CDG的行為會像NewReno那樣乘性減半目前視窗并設定新的ssthresh值。

--------------------------------

上面描述的CDG使用場景看明白了嗎?如果你的網絡經常是擁塞發生丢包,那麼CDG的表現其實幾乎與NewReno無異,大量的測試資料已經表明NewReno已經是要被淘汰的算法了,CUBIC早就取代了NewReno。

        然而我接下來并無意要移植CUBIC的邏輯到CDG。

        注意分割線以上的第3點,這個第3點明确的抛出一個問題,顯然在擁塞丢包發生的時候,NewReno對這個問題回答的并不好,CUBIC似乎比NewReno要好些,但是顯然二者師出同門,直線變成了三次曲線而已,一個回答不好,能指望另一個回答好嗎,顯然這是徒勞的。

        Westwood算法完美回答了這個問題。

        接下來,讓我們簡單看一下Westwood。

        在CDG與Westwood的PK中,你可能會發現Westwood會持續性占據優勢,然而當你用tc工具設定x%丢包率(注意這是随機的丢包)的時候,CDG的優勢便表現了出來。這正是CDG發力的地方,然而一旦真正發生了擁塞而頻繁丢包,特别是在淺隊列環境中,Westwood算法将壓倒性超越CDG,即便我對CDG進行了修改,所占優勢也隻是略微。現在的問題是,是什麼機制讓Westwood可以如此坦然面對擁塞?!

        事實上,Westwood什麼新東西都沒有,它的核心就是精确定位擁塞丢包重傳恢複後視窗應該降低到多少(當然必須使用我修改過的版本),除此之外,沒别的什麼了。它是怎麼做到的呢?

4.[這裡前面的标号4沒有任何問題]Westwood在TCP連接配接采樣了最小的RTT,同時周期性采集帶寬,然後Westwood會認為二者的乘積就是不排隊時的BDP,即擁塞丢包發生後比較合理的擁塞視窗值,其實更加合理的是應該在一個時間視窗内采樣最小RTT和最大帶寬,以應對波動,然而在CDG和BBR之前,幾乎所有算法都是采用“移動指數平均”的方式來進行低通濾波,包括Fast TCP。現在,我們知道了,Westwood的核心在于“丢包并處理完重傳後cwnd以及ssthresh的設定”:

case CA_EVENT_COMPLETE_CWR:
    tp->snd_cwnd = tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk);
    break;
           

遺憾的是,Westwood并無法區分丢包是由于噪聲引起的,還是由于擁塞引起的,是以即便是“加性減”,也扛不住頻繁加性減。幸運的是,CDG可以來補充!

        現在你應該已經知道了Westwood的應用場景,總結起來就是Westwood試圖用加性增,加性減來應對擁塞避免和快速恢複的過程。是以在正常的擁塞避免階段,Westwood同樣采用NewReno算法,這點和CDG是完全一樣的。

--------------------------------

你還在想用CUBIC替換Westwood或者CDG裡預設的NewReno算法嗎?嗯,這樣想是對的,但是收益不大。

        更加合理的做法難道不是将上面的1,2,3,4結合起來嗎?!如果抛開BBR的BW & RTT模型不論,其基本思路實際上就是CDG+Westwood的公平收斂版:

TCP CDG算法與TCP Westwood算法

比BBR稍微有點過頭,且好像丢失了乘性減的公平性,但我依然稱其為公平收斂版本。不管怎麼說,誰在乎呢?隻要保證自己不吃虧即可吧。

--------------------------------

經常有人覺得BBR非常複雜,事實上其特别簡單,同樣簡單的還有CDG和Westwood。

--------------------------------

溫州皮鞋廠老闆的測試結論非常好,我把初版代碼貼在本文的最後,特意修改了名字,該算法改名為bugs,非常兇險的一個名字。該代碼也放在 github上:

#include <linux/kernel.h>
#include <linux/random.h>
#include <linux/module.h>
#include <net/tcp.h>


#define HYSTART_ACK_TRAIN	1
#define HYSTART_DELAY		2
#define TCP_EASTWOOD_RTT_MIN   (HZ/20)	/* 50ms */
#define TCP_EASTWOOD_INIT_RTT  (20*HZ)	/* maybe too conservative?! */

static int window __read_mostly = 8;
static unsigned int backoff_beta __read_mostly = 0.7071 * 1024; /* sqrt 0.5 */
static unsigned int backoff_factor __read_mostly = 42;
static unsigned int hystart_detect __read_mostly = 3;
static unsigned int use_ineff __read_mostly = 5;
static unsigned int use_shadow __read_mostly = 1;
static unsigned int backoff __read_mostly = 0;
static unsigned int use_tolerance __read_mostly = 1;
static unsigned int shadow_fast __read_mostly = 1;
static unsigned int shadow_grow __read_mostly = 1;
static unsigned int recovery_restore __read_mostly = 0;
static unsigned int loss_push __read_mostly = 1;
static unsigned int use_sack __read_mostly = 1;

module_param(window, int, 0444);
MODULE_PARM_DESC(window, "gradient window size (power of two <= 256)");
module_param(backoff_beta, uint, 0644);
MODULE_PARM_DESC(backoff_beta, "backoff beta (0-1024)");
module_param(backoff_factor, uint, 0644);
MODULE_PARM_DESC(backoff_factor, "backoff probability scale factor");
module_param(hystart_detect, uint, 0644);
MODULE_PARM_DESC(hystart_detect, "use Hybrid Slow start "
		 "(0: disabled, 1: ACK train, 2: delay threshold, 3: both)");
module_param(use_ineff, uint, 0644);
MODULE_PARM_DESC(use_ineff, "use ineffectual backoff detection (threshold)");
module_param(use_shadow, uint, 0644);
MODULE_PARM_DESC(use_shadow, "use shadow window heuristic");
module_param(backoff, uint, 0644);
MODULE_PARM_DESC(backoff, "back");
module_param(use_tolerance, uint, 0644);
MODULE_PARM_DESC(use_tolerance, "use loss tolerance heuristic");
module_param(shadow_fast, uint, 0644);
MODULE_PARM_DESC(shadow_fast, "back");
module_param(shadow_grow, uint, 0644);
MODULE_PARM_DESC(shadow_grow, "back");
module_param(recovery_restore, uint, 0644);
MODULE_PARM_DESC(recovery_restore, "back");
module_param(loss_push, uint, 0644);
MODULE_PARM_DESC(loss_push, "back");
module_param(use_sack, uint, 0644);
MODULE_PARM_DESC(use_sack, "back");

struct bugs_minmax {
    union {
	struct {
	    s32 min;
	    s32 max;
	};
	u64 v64;
    };
};

enum bugs_state {
    BU_GS_UNKNOWN = 0,
    BU_GS_NONFULL = 1,
    BU_GS_FULL    = 2,
    BU_GS_BACKOFF = 3,
};

struct bugs {
    struct bugs_minmax rtt;
    struct bugs_minmax rtt_prev;
    struct bugs_minmax *gradients;
    struct bugs_minmax gsum;
    bool gfilled;
    u8  tail;
    u8  state;
    u8  delack;
    u8     first_ack;        /* flag which infers that this is the first ack */
    u8     reset_rtt_min;    /* Reset RTT min to next RTT sample*/
    u32 rtt_seq;
    u32 undo_cwnd;
    u32 shadow_wnd;
    u32 snd_cwnd_cnt;
    u16 backoff_cnt;
    u16 sample_cnt;
    s32 delay_min;
    s32 ack_sack_cnt;
    u32 last_ack;
    u32 round_start;

    u32    bw_ns_est;        /* first bandwidth estimation..not too smoothed 8) */
    u32    bw_est;           /* bandwidth estimate */
    u32    rtt_win_sx;       /* here starts a new evaluation... */
    u32    bk;
    u32    snd_una;          /* used for evaluating the number of acked bytes */
    u32    cumul_ack;
    u32    accounted;
    u32    east_rtt;
    u32    rtt_min;          /* minimum observed RTT */
};

/**
 * nexp_u32 - negative base-e exponential
 * @ux: x in units of micro
 *
 * Returns exp(ux * -1e-6) * U32_MAX.
 */
static u32 __pure nexp_u32(u32 ux)
{
    static const u16 v[] = {
	/* exp(-x)*65536-1 for x = 0, 0.000256, 0.000512, ... */
	65535,
	65518, 65501, 65468, 65401, 65267, 65001, 64470, 63422,
	61378, 57484, 50423, 38795, 22965, 8047,  987,   14,
    };
    u32 msb = ux >> 8;
    u32 res;
    int i;
    /* Cut off when ux >= 2^24 (actual result is <= 222/U32_MAX). */
    if (msb > U16_MAX)
	return 0;

    /* Scale first eight bits linearly: */
    res = U32_MAX - (ux & 0xff) * (U32_MAX / 1000000);

    /* Obtain e^(x + y + ...) by computing e^x * e^y * ...: */
    for (i = 1; msb; i++, msb >>= 1) {
	u32 y = v[i & -(msb & 1)] + U32_C(1);

	res = ((u64)res * y) >> 16;
    }

    return res;
}

/* Based on the HyStart algorithm (by Ha et al.) that is implemented in
 * tcp_cubic. Differences/experimental changes:
 *   o Using Hayes' delayed ACK filter.
 *   o Using a usec clock for the ACK train.
 *   o Reset ACK train when application limited.
 *   o Invoked at any cwnd (i.e. also when cwnd < 16).
 *   o Invoked only when cwnd < ssthresh (i.e. not when cwnd == ssthresh).
 */
static void tcp_bugs_hystart_update(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    ca->delay_min = min_not_zero(ca->delay_min, ca->rtt.min);
    if (ca->delay_min == 0)
	return;

    if (hystart_detect & HYSTART_ACK_TRAIN) {
	u32 now_us = div_u64(local_clock(), NSEC_PER_USEC);

	if (ca->last_ack == 0 || !tcp_is_cwnd_limited(sk, tcp_packets_in_flight(tp))) {
	    ca->last_ack = now_us;
	    ca->round_start = now_us;
	} else if (before(now_us, ca->last_ack + 3000)) {
	    u32 base_owd = max(ca->delay_min / 2U, 125U);

	    ca->last_ack = now_us;
	    if (after(now_us, ca->round_start + base_owd)) {
		tp->snd_ssthresh = tp->snd_cwnd;
		return;
	    }
	}
    }

    if (hystart_detect & HYSTART_DELAY) {
	if (ca->sample_cnt < 8) {
	    ca->sample_cnt++;
	} else {
	    s32 thresh = max(ca->delay_min + ca->delay_min / 8U, 125U);

	    if (ca->rtt.min > thresh) {
		tp->snd_ssthresh = tp->snd_cwnd;
	    }
	}
    }
}

static s32 tcp_bugs_grad(struct bugs *ca)
{
    s32 gmin = ca->rtt.min - ca->rtt_prev.min;
    s32 gmax = ca->rtt.max - ca->rtt_prev.max;
    s32 grad;

    if (ca->gradients) {
	ca->gsum.min += gmin - ca->gradients[ca->tail].min;
	ca->gsum.max += gmax - ca->gradients[ca->tail].max;
	ca->gradients[ca->tail].min = gmin;
	ca->gradients[ca->tail].max = gmax;
	ca->tail = (ca->tail + 1) & (window - 1);
	gmin = ca->gsum.min;
	gmax = ca->gsum.max;
    }

    /* We keep sums to ignore gradients during cwnd reductions;
     * the paper's smoothed gradients otherwise simplify to:
     * (rtt_latest - rtt_oldest) / window.
     *
     * We also drop division by window here.
     */
    grad = gmin > 0 ? gmin : gmax;

    /* Extrapolate missing values in gradient window: */
    if (!ca->gfilled) {
	if (!ca->gradients && window > 1)
	    grad *= window; /* Memory allocation failed. */
	else if (ca->tail == 0)
	    ca->gfilled = true;
	else
	    grad = (grad * window) / (int)ca->tail;
    }

    /* Backoff was effectual: */
    if (gmin <= -32 || gmax <= -32)
	ca->backoff_cnt = 0;

    if (use_tolerance) {
	/* Reduce small variations to zero: */
	gmin = DIV_ROUND_CLOSEST(gmin, 64);
	gmax = DIV_ROUND_CLOSEST(gmax, 64);
	if (gmin > 0 && gmax <= 0)
	    ca->state = BU_GS_FULL;
	else if ((gmin > 0 && gmax > 0) || gmax < 0)
	    ca->state = BU_GS_NONFULL;
    }
    return grad;
}

void tcp_enter_cwr_1(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);

    tp->prior_ssthresh = 0;
    if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
	tp->undo_marker = 0;
	tp->high_seq = tp->snd_nxt;
	tp->tlp_high_seq = 0;
	tp->snd_cwnd_cnt = 0;
	tp->prior_cwnd = tp->snd_cwnd;
	tp->prr_delivered = 0;
	tp->prr_out = 0;
	tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
	if (tp->ecn_flags & TCP_ECN_OK)
	    tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
	tcp_set_ca_state(sk, TCP_CA_CWR);
    }
}

static bool tcp_bugs_backoff(struct sock *sk, u32 grad)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    if (prandom_u32() <= nexp_u32(grad * backoff_factor))
	return false;

    if (use_ineff) {
	ca->backoff_cnt++;
	if (ca->backoff_cnt > use_ineff)
	    return false;
    }

    ca->shadow_wnd = max(ca->shadow_wnd, tp->snd_cwnd);
    ca->state = BU_GS_BACKOFF;
    tcp_enter_cwr_1(sk);
    return true;
}

void tcp_cong_avoid_ai_shadow(struct sock *sk, u32 w, u32 acked)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct bugs *ca = inet_csk_ca(sk);
    if (ca->snd_cwnd_cnt >= w) {
	ca->snd_cwnd_cnt = 0;
	ca->shadow_wnd ++;
    }

    ca->snd_cwnd_cnt += acked;
    if (ca->snd_cwnd_cnt >= w) {
	u32 delta = ca->snd_cwnd_cnt / w;

	ca->snd_cwnd_cnt -= delta * w;
	ca->shadow_wnd += delta;
    }
    ca->shadow_wnd = min(ca->shadow_wnd, tp->snd_cwnd_clamp);
}

/* Not called in CWR or Recovery state. */
static void tcp_bugs_cong_avoid(struct sock *sk, u32 ack, u32 acked)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    u32 prior_snd_cwnd;
    u32 incr;

    if (tp->snd_cwnd <= tp->snd_ssthresh && hystart_detect)
	tcp_bugs_hystart_update(sk);

    if (after(ack, ca->rtt_seq) && ca->rtt.v64) {
	s32 grad = 0;

	if (ca->rtt_prev.v64)
	    grad = tcp_bugs_grad(ca);
	ca->rtt_seq = tp->snd_nxt;
	ca->rtt_prev = ca->rtt;
	ca->rtt.v64 = 0;
	ca->last_ack = 0;
	ca->sample_cnt = 0;

	if (backoff && grad > 0 && tcp_bugs_backoff(sk, grad))
	    return;
    }

    if (!tcp_is_cwnd_limited(sk, tcp_packets_in_flight(tp))) {
	ca->shadow_wnd = min(ca->shadow_wnd, tp->snd_cwnd);
	return;
    }

    prior_snd_cwnd = tp->snd_cwnd;
    tcp_reno_cong_avoid(sk, ack, acked);

    incr = tp->snd_cwnd - prior_snd_cwnd;
    ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
}

static void tcp_bugs_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    if (rtt_us <= 0)
	return;

    /* A heuristic for filtering delayed ACKs, adapted from:
     * D.A. Hayes. "Timing enhancements to the FreeBSD kernel to support
     * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010.
     */
    if (tp->sacked_out == 0) {
	if (num_acked == 1 && ca->delack) {
	/* A delayed ACK is only used for the minimum if it is
	 * provenly lower than an existing non-zero minimum.
	 */
	ca->rtt.min = min(ca->rtt.min, rtt_us);
	ca->delack--;
	return;
	} else if (num_acked > 1 && ca->delack < 5) {
	    ca->delack++;
	}
    }

    ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
    ca->rtt.max = max(ca->rtt.max, rtt_us);

    if (rtt_us > 0)
	ca->east_rtt = usecs_to_jiffies(rtt_us);
}

static u32 tcp_bugs_ssthresh(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    ca->undo_cwnd = tp->snd_cwnd;
    ca->snd_cwnd_cnt = 0;
    ca->ack_sack_cnt = tcp_packets_in_flight(tp);

    if (ca->state == BU_GS_BACKOFF)
	return max(2U, (tp->snd_cwnd * min(1024U, backoff_beta)) >> 10);

    if (ca->state == BU_GS_NONFULL && use_tolerance)
	return tp->snd_cwnd;

    ca->shadow_wnd = max(min(ca->shadow_wnd >> 1, tp->snd_cwnd), 2U);
    if (use_shadow)
	return max3(2U, ca->shadow_wnd, tp->snd_cwnd >> 1);
    return max(2U, tp->snd_cwnd >> 1);
}

static u32 tcp_bugs_undo_cwnd(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    return max3(2U, ca->shadow_wnd, max(tp->snd_cwnd, ca->undo_cwnd));
}

static void tcp_bugs_init(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    if (window > 1)
	ca->gradients = kcalloc(window, sizeof(ca->gradients[0]),
					GFP_NOWAIT | __GFP_NOWARN);
    ca->rtt_seq = tp->snd_nxt;
    ca->shadow_wnd = tp->snd_cwnd;
    ca->ack_sack_cnt = 0;
    // eastwood
    ca->bk = 0;
    ca->bw_ns_est = 0;
    ca->bw_est = 0;
    ca->accounted = 0;
    ca->cumul_ack = 0;
    ca->reset_rtt_min = 1;
    ca->rtt_min = ca->east_rtt = TCP_EASTWOOD_INIT_RTT;
    ca->rtt_win_sx = tcp_time_stamp;
    ca->snd_una = tcp_sk(sk)->snd_una;
    ca->first_ack = 1;
}

static void tcp_bugs_release(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);

    kfree(ca->gradients);
}

static inline u32 eastwood_do_filter(u32 a, u32 b)
{
    return ((7 * a) + b) >> 3;
}

static void eastwood_filter(struct bugs *ca, u32 delta)
{
    if (ca->bw_ns_est == 0 && ca->bw_est == 0) {
	ca->bw_ns_est = ca->bk / delta;
	ca->bw_est = ca->bw_ns_est;
    } else {
	ca->bw_ns_est = eastwood_do_filter(ca->bw_ns_est, ca->bk / delta);
	ca->bw_est = eastwood_do_filter(ca->bw_est, ca->bw_ns_est);
    }
}

static void eastwood_update_window(struct sock *sk)
{
    struct bugs *ca = inet_csk_ca(sk);
    s32 delta = tcp_time_stamp - ca->rtt_win_sx;

    if (ca->first_ack) {
	ca->snd_una = tcp_sk(sk)->snd_una;
	ca->first_ack = 0;
    }

    if (ca->east_rtt && delta > max_t(u32, ca->east_rtt, TCP_EASTWOOD_RTT_MIN)) {
	eastwood_filter(ca, delta);
	ca->bk = 0;
	ca->rtt_win_sx = tcp_time_stamp;
    }
}

static inline void update_rtt_min(struct bugs *ca)
{
    if (ca->reset_rtt_min) {
	ca->rtt_min = ca->east_rtt;
	ca->reset_rtt_min = 0;
    } else
	ca->rtt_min = min(ca->east_rtt, ca->rtt_min);
}

static inline void eastwood_fast_bw(struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    struct bugs *ca = inet_csk_ca(sk);
    
    eastwood_update_window(sk);

    ca->bk += tp->snd_una - ca->snd_una;
    ca->snd_una = tp->snd_una;
    update_rtt_min(ca);
}

static inline u32 eastwood_acked_count(struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    struct bugs *ca = inet_csk_ca(sk);

    ca->cumul_ack = tp->snd_una - ca->snd_una;

    if (!ca->cumul_ack) {
	ca->accounted += tp->mss_cache;
	ca->cumul_ack = tp->mss_cache;
    }

    if (ca->cumul_ack > tp->mss_cache) {
	/* Partial or delayed ack */
	if (ca->accounted >= ca->cumul_ack) {
	    ca->accounted -= ca->cumul_ack;
	    ca->cumul_ack = tp->mss_cache;
	} else {
	    ca->cumul_ack -= ca->accounted;
	    ca->accounted = 0;
	}
    }
    ca->snd_una = tp->snd_una;
    return ca->cumul_ack;
}

static u32 tcp_eastwood_bw_rttmin(const struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    const struct bugs *ca = inet_csk_ca(sk);
    return max_t(u32, (ca->bw_est * ca->rtt_min) / tp->mss_cache, 2);
}

static int bugs_main(struct sock *sk, struct rate_sample *rs)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    struct bugs *ca = inet_csk_ca(sk);
	
    if (!shadow_grow) {
	goto eastwood;
    }
		
    if (icsk->icsk_ca_state != TCP_CA_Open) {
	if (rs->rtt_us) {
	    ca->rtt.min = min_not_zero(ca->rtt.min, (s32)rs->rtt_us);
	    ca->rtt.max = max(ca->rtt.max, (s32)rs->rtt_us);
	}
		
	if (ca->state == BU_GS_NONFULL && use_tolerance) {	
	    if (!shadow_fast && (ca->ack_sack_cnt < 0 || ca->ack_sack_cnt == 0) && ca->rtt.v64) {
		s32 grad = 0;

		if (ca->rtt_prev.v64)
		    grad = tcp_bugs_grad(ca);
		ca->rtt_prev = ca->rtt;
		ca->ack_sack_cnt = tcp_packets_in_flight(tp);
		ca->rtt.v64 = 0;
	    }
	    ca->ack_sack_cnt -= rs->acked_sacked;
	    if (ca->state == BU_GS_NONFULL || shadow_fast) {
		tcp_cong_avoid_ai_shadow(sk, ca->shadow_wnd, rs->acked_sacked);	
		tp->snd_cwnd = ca->shadow_wnd;
	    }
	}
    } 

eastwood:
    if (use_sack) {
	eastwood_update_window(sk);	
	ca->bk += rs->acked_sacked * tp->mss_cache;
	update_rtt_min(ca);
    }
    return CONG_CONT;
}

static void bugs_state(struct sock *sk, u8 new_state)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    if (!recovery_restore) {
	return;
    }

    if (icsk->icsk_ca_state && new_state == TCP_CA_Open) {
	ca->rtt_seq = tp->snd_nxt;
	ca->rtt_prev = ca->rtt;
	ca->rtt.v64 = 0;
	ca->state = BU_GS_UNKNOWN;
    }
    if (new_state == TCP_CA_Open && ca->state == BU_GS_NONFULL) {
	tp->snd_cwnd = max(max(tp->snd_cwnd, ca->shadow_wnd), 2U);
    }
    if (new_state == TCP_CA_Loss) {
	if (ca->state == BU_GS_NONFULL && use_tolerance) {
	    tp->snd_cwnd = ca->shadow_wnd;
	    if (loss_push)
		tcp_push_pending_frames(sk);
	} 
    }
}

static void tcp_bugs_cwnd_event(struct sock *sk, const enum tcp_ca_event ev)
{
    struct bugs *ca = inet_csk_ca(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    struct bugs_minmax *gradients;

    switch (ev) {
	case CA_EVENT_FAST_ACK:
	    if (!use_sack)
		eastwood_fast_bw(sk);
	    break;
	case CA_EVENT_SLOW_ACK:
	    if (!use_sack) {
		eastwood_update_window(sk);
		ca->bk += eastwood_acked_count(sk);
		update_rtt_min(ca);
	    }
	    break;
	case CA_EVENT_CWND_RESTART:
	    gradients = ca->gradients;
	    if (gradients)
		memset(gradients, 0, window * sizeof(gradients[0]));
	    memset(ca, 0, sizeof(*ca));

	    ca->gradients = gradients;
	    ca->rtt_seq = tp->snd_nxt;
	    ca->shadow_wnd = tp->snd_cwnd;
	    break;
	case CA_EVENT_COMPLETE_CWR:
	    tp->snd_cwnd = tp->snd_ssthresh = tcp_eastwood_bw_rttmin(sk);
	    break;
	case CA_EVENT_LOSS:
	    tp->snd_ssthresh = tcp_eastwood_bw_rttmin(sk);
	    ca->reset_rtt_min = 1;
	    break;
	default:
	    break;
    }
}


struct tcp_congestion_ops tcp_bugs __read_mostly = {
    .cong_avoid = tcp_bugs_cong_avoid,
    .cong_collect	= bugs_main,
    .set_state = bugs_state,
    .cwnd_event = tcp_bugs_cwnd_event,
    .pkts_acked = tcp_bugs_acked,
    .undo_cwnd = tcp_bugs_undo_cwnd,
    .ssthresh = tcp_bugs_ssthresh,
    .release = tcp_bugs_release,
    .init = tcp_bugs_init,
    .owner = THIS_MODULE,
    .name = "bugs",
};

static int __init tcp_bugs_register(void)
{
    if (backoff_beta > 1024 || window < 1 || window > 256)
	return -ERANGE;
    if (!is_power_of_2(window))
	return -EINVAL;

    BUILD_BUG_ON(sizeof(struct bugs) > ICSK_CA_PRIV_SIZE);
    tcp_register_congestion_control(&tcp_bugs);
    return 0;
}

static void __exit tcp_bugs_unregister(void)
{
    tcp_unregister_congestion_control(&tcp_bugs);
}

module_init(tcp_bugs_register);
module_exit(tcp_bugs_unregister);
MODULE_AUTHOR("CAO NI MA");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("TCP BU_GS");
           

繼續閱讀