

A/B Test Is Not Enough

A common opinion holds that the A/B test is a universal, half-automatic tool that always helps to increase conversion, loyalty, and UX. However, misinterpreted results or wrong sampling lead to the loss of a loyal audience and a decrease in margin. Why? A/B testing rests on the basic assumptions that the sample is homogeneous and representative and that the results scale. In reality, the audience is heterogeneous: recall the "20/80" distribution for income. Heterogeneity means that sensitivity to A/B varies significantly within the sample.

Audience clustering is a real effect (the rule, not the exception, according to Pareto), which means that different psychological profile groups of clients are present in one pool. Estimating a confidence interval for conversion assumes uniformity. Violating this assumption therefore means that the accuracy of the results cannot be measured, and a result without accuracy is garbage. Each unique psychological profile reacts to a campaign or feature with different sensitivity. We assume that a profile is a unique set of features. For simplicity, consider two profile sets, X and Y. Features of different profiles may intersect: your girlfriend may love both coffee and chocolate. Let us illustrate this effect in the form of three topologies:

[Figure: three topologies of profile sets X and Y (Cases I, II, and III)]

By default, we assume that we cover all segments at once: Case I. Cases II and III involve non-trivial scenarios. Consider a typical Case II scenario. Conversion increases significantly: set Y reacts positively, while set X reacts negatively and shows a negative NPS change. Set Y is larger in the unweighted random sample, so the cumulative effect is positive, and overall conversion doubles. Now imagine that the average check of X is 10 times higher and that the conversion of segment X has fallen by half. The final result: conversion increases, part of the audience is lost, and profit declines. The problem is aggravated by intuitive shortcuts. Sometimes automated models test a hypothesis on segment X (Case III) and try to generalize it to the union (X + Y). What is wrong? The sampling technique does not take segmentation into account. Solutions?
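The Case II arithmetic can be made concrete with a small sketch. All figures here (segment sizes, order counts, average checks) are invented for illustration and do not come from the article; they are chosen only so that the conversion of X halves, the average check of X is 10 times that of Y, and overall conversion doubles. Revenue is used as a stand-in for margin (conversion x traffic x average check).

```python
# Hypothetical A/B outcome for Case II; every number below is an assumption.
# Segment X: few users, high average check. Segment Y: many users, low check.
segments = {
    "X": {"users": 200, "avg_check": 100, "orders_a": 20, "orders_b": 10},
    "Y": {"users": 800, "avg_check": 10, "orders_a": 40, "orders_b": 110},
}


def summarize(variant):
    """Return (overall conversion, total revenue) for variant 'a' or 'b'."""
    key = f"orders_{variant}"
    users = sum(s["users"] for s in segments.values())
    orders = sum(s[key] for s in segments.values())
    revenue = sum(s[key] * s["avg_check"] for s in segments.values())
    return orders / users, revenue


conv_a, revenue_a = summarize("a")
conv_b, revenue_b = summarize("b")

# Per-segment conversion, variant A -> variant B
for name, s in segments.items():
    print(name, s["orders_a"] / s["users"], "->", s["orders_b"] / s["users"])

print("overall conversion:", conv_a, "->", conv_b)
print("revenue:", revenue_a, "->", revenue_b)
```

With these assumed numbers, overall conversion rises from 6% to 12% while revenue falls from 2400 to 2100: the pooled metric reports a win that the high-value segment pays for.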

  • Way #1

    Cluster the audience using k-means, other ML models, or RFM analysis. You need to know a hyperparameter, the number of groups, as input, and choosing it is not trivial. The next step is to determine the individual conversion of each segment. Then personalize the campaign: offer script A or B depending on the profile.

  • Way #2

    Measure the A/B margin. Recall that margin is the product of conversion, traffic, and the average price. The last two are slow parameters and can be held fixed by selecting a separate category of goods and choosing a period of uniform traffic. You may also measure traffic at discrete, comparable intervals (every Monday for a month) to reduce the random component.

  • Way #3

    Stability analysis. Sampling with replacement is used in this case. All segments are considered, and the sample size is gradually increased. A log-log plot of conversion versus sample size gives the regression slope (Hurst factor), which indicates how uniform the sample is and how stable the result is under rescaling (renormalization).
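One possible reading of the stability analysis in Way #3 is sketched below. The article does not say which quantity goes on the vertical axis, so this sketch assumes it is the spread (standard deviation) of the bootstrapped conversion estimate. The function names, the synthetic pool, and the sample sizes are all hypothetical, and how the resulting slope maps to the Hurst factor mentioned in the text is not specified in the source.

```python
import math
import random
import statistics


def bootstrap_spread(outcomes, n, n_boot, rng):
    """Std. dev. of the conversion estimate over n_boot resamples of size n."""
    # rng.choices samples WITH replacement, as Way #3 requires.
    means = [sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot)]
    return statistics.stdev(means)


def loglog_slope(outcomes, sizes, n_boot=200, seed=0):
    """Least-squares slope of log(spread) against log(sample size)."""
    rng = random.Random(seed)
    xs = [math.log(n) for n in sizes]
    ys = [math.log(bootstrap_spread(outcomes, n, n_boot, rng)) for n in sizes]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Assumed synthetic pool: 1 = converted, 0 = did not convert (10% conversion).
pool = [1] * 2000 + [0] * 18000
sizes = [100, 200, 400, 800, 1600]
slope = loglog_slope(pool, sizes)
print("log-log slope:", slope)
```

For independent draws from one pool, the spread of a sample mean shrinks as n^(-1/2), so a slope near -0.5 is the baseline. On real data collected over growing time windows, a clear departure from that baseline would be a hint that the audience is not uniform and deserves a closer, segmented look.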

However, no matter which path you choose, the audience changes frequently. This means that the A/B test is a regularly repeated experiment, and one that should be supervised by an experienced analyst despite the significant number of commercial automated solutions. Do not forget that all models are wrong, but some are temporarily useful... under certain conditions.

Dedicated to my father, who taught me that intuition is just as important as math.

Translated from: https://habr.com/en/post/468329/
