A/B test is not enough

There is a common opinion that the A/B test is a universal, semi-automatic tool that always helps to increase conversion, loyalty, and UX. However, misinterpreting the results or sampling incorrectly leads to the loss of a loyal audience and a decrease in margin. Why? A/B testing rests on the basic assumptions that the sample is homogeneous and representative and that the results scale. In reality, the audience is heterogeneous: recall the “20/80” distribution of income. Heterogeneity means that sensitivity to A/B varies significantly within the sample.

Audience clustering is a real effect (the rule, not the exception, according to Pareto), which means that groups of clients with different psychological profiles are present in one pool. Evaluating a confidence interval for conversion implies uniformity. Violating this assumption therefore means that the accuracy of the results cannot be measured, and a result without accuracy is garbage. Each unique psychological profile reacts with different sensitivity to a campaign or feature. We assume that a profile is a unique set of features. For simplicity, consider two profile sets, X and Y. Some features of different profiles may intersect: your girlfriend may love both coffee and chocolate. Let us illustrate this effect in the form of three topologies:

[Figure: the three topologies, Cases I, II, and III]

By default, we assume that we cover all segments at once: Case I. Cases II and III involve non-trivial scenarios. Consider a typical Case II scenario. Conversion increased significantly: set Y reacted positively, while set X reacted negatively and showed a negative NPS change. Set Y is larger in the unweighted random sample, so the cumulative effect is positive. Conversion doubled. Now imagine that the average check of X is 10 times higher and that the conversion of segment X has fallen by half. The final result: an increase in conversion, a loss of audience, and a decline in profit. The problem is aggravated by intuitive shortcuts. Sometimes automated models test the hypothesis on segment X (Case III) and try to generalize it to the union (X + Y). What is wrong? The sampling technique does not take segmentation into account. Solutions?

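To make the Case II arithmetic concrete, here is a minimal Python sketch. All numbers, the `Segment` class, and the `summarize` helper are hypothetical and chosen only for illustration; margin is approximated as conversion × traffic × average check, with no costs modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    visitors: int      # traffic during the test period (hypothetical)
    conv_a: float      # conversion under variant A
    conv_b: float      # conversion under variant B
    avg_check: float   # average check (order value) of the segment


def summarize(segments, variant):
    """Return (pooled conversion, margin proxy) for one variant."""
    visitors = sum(s.visitors for s in segments)
    orders = 0.0
    margin = 0.0
    for s in segments:
        conv = s.conv_a if variant == "A" else s.conv_b
        seg_orders = s.visitors * conv
        orders += seg_orders
        # Margin proxy: conversion * traffic * average check.
        margin += seg_orders * s.avg_check
    return orders / visitors, margin


# Made-up numbers: X is small, with a 10x higher average check, and its
# conversion halves under B; Y is large and reacts positively to B.
segments = [
    Segment("X", visitors=200, conv_a=0.10, conv_b=0.05, avg_check=100.0),
    Segment("Y", visitors=800, conv_a=0.05, conv_b=0.1375, avg_check=10.0),
]

conv_a, margin_a = summarize(segments, "A")
conv_b, margin_b = summarize(segments, "B")
print(f"A: conversion {conv_a:.2%}, margin proxy {margin_a:.0f}")
print(f"B: conversion {conv_b:.2%}, margin proxy {margin_b:.0f}")
```

With these made-up numbers, pooled conversion doubles from 6% to 12%, while the margin proxy falls from 2400 to 2100, because segment X halves its conversion and carries a tenfold larger average check. Only a per-segment breakdown exposes the loss, which is the blind spot the approaches listed next try to address.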

  • Way #1

    Cluster the audience using k-means, other ML models, or RFM analysis. You need to supply a hyperparameter, the number of groups, as input, and choosing it is not trivial. The next step is to determine the conversion of each segment individually. Then personalize the campaign: offer script A or script B, depending on the profile.

  • Way #2

    Measure the A/B margin. Recall that margin is the product of conversion, traffic, and the average price. The last two parameters are slow and can be held fixed by selecting a single category of goods and choosing a period of uniform traffic. You may measure traffic at fixed, comparable intervals (for example, every Monday for a month) to reduce the random component.

  • Way #3

    Stability analysis. Sampling with replacement is used in this case, and all segments are considered. The sample size is gradually increased. A log-log plot of conversion versus sample size gives a regression slope (the Hurst factor), which indicates how uniform the sample is and how stable the result is under rescaling (renormalization).

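The stability analysis in Way #3 can be sketched roughly as follows. The article does not spell out the exact procedure, so this is only one possible reading: draw bootstrap samples (sampling with replacement) of increasing size from the pooled outcomes, average the conversion at each size, and fit an ordinary least-squares slope of log(conversion) against log(sample size). The function name `loglog_slope`, the sample sizes, and the synthetic 10% conversion data are assumptions made for illustration.

```python
import math
import random


def loglog_slope(outcomes, sizes, repeats=50, seed=0):
    """Slope of log(mean conversion) against log(sample size)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        # Draw `repeats` samples of size n with replacement and
        # average their conversion rates.
        rates = [sum(rng.choices(outcomes, k=n)) / n for _ in range(repeats)]
        mean_rate = sum(rates) / repeats
        if mean_rate <= 0:
            raise ValueError(f"no conversions observed at sample size {n}")
        xs.append(math.log(n))
        ys.append(math.log(mean_rate))
    # Ordinary least-squares slope of ys on xs.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical pooled outcomes: 1 = converted, 0 = did not (10% overall).
outcomes = [1] * 1000 + [0] * 9000
sizes = [100, 200, 400, 800, 1600, 3200]

slope = loglog_slope(outcomes, sizes)
print(f"log-log slope: {slope:.3f}")
```

For this homogeneous synthetic pool the slope should come out close to zero, meaning the estimated conversion does not drift as the sample grows. The article gives no threshold for how far from zero the slope must be before it signals a problem, so that call is left to the analyst.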

However, whichever path you choose, the audience will keep changing, and often. This means that an A/B test is a regularly repeated experiment, one that should be supervised by an experienced analyst despite the significant number of commercial automated solutions. Do not forget that all models are wrong, but some are temporarily useful... under certain conditions.

Dedicated to my father, who taught me that intuition is just as important as math.

Translated from: https://habr.com/en/post/468329/
