Paper: https://arxiv.org/pdf/1804.07931.pdf
The paper also releases a public dataset, available on Alibaba Cloud Tianchi.
ABSTRACT
The abstract points out two problems with conventional CVR models:
- Sample selection bias (SSB). The model is trained only on samples of clicked impressions, yet online it is used to make inferences over the entire space of all impressions. The clicked impressions are only a part of that inference space, and this mismatch hurts the generalization performance of the trained model.
- Data sparsity (DS). The sample space used to train a CVR model is a subset of the CTR sample space, so in practice there is far less data than for the CTR task. Features constructed only within this subspace may also be statistically unreliable because of the sparsity.
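To make the two problems concrete, here is a minimal sketch that builds a conventional CVR training set from an impression log. The log, its field layout, and all values are made up for illustration and are not taken from the paper's dataset.

```python
# Hypothetical impression log: (impression_id, clicked, converted).
# The values are invented purely for illustration.
impressions = [
    ("imp1", 0, 0),
    ("imp2", 1, 0),
    ("imp3", 0, 0),
    ("imp4", 1, 1),
    ("imp5", 0, 0),
    ("imp6", 0, 0),
]

# Conventional CVR training set: clicked impressions only, label = converted.
cvr_train = [(imp_id, z) for imp_id, y, z in impressions if y == 1]

# Inference space: every impression, clicked or not.
inference_space = [imp_id for imp_id, _, _ in impressions]

# Fraction of the inference space the CVR model ever sees during training.
coverage = len(cvr_train) / len(inference_space)

print(cvr_train)
print(coverage)
```

The CVR model here is trained on two samples but has to score all six impressions: the small training set is the data sparsity problem, and the gap between the training set and the inference space is the sample selection bias.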
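ESMM makes full use of the sequential pattern of user actions, impression → click → conversion, and addresses both problems of conventional CVR models described above:
- It models CVR directly over the entire space.
- It employs a feature representation transfer learning strategy: the CTR network and the CVR network share their feature representation.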
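The shared feature representation can be pictured as one embedding table feeding two task-specific towers. The sketch below is a toy stand-in written in plain Python: the feature ids, the embedding size, the sum pooling, and the single-layer "towers" are all assumptions for illustration, not the architecture reported in the paper.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4  # assumed size, for illustration only

# One embedding table shared by both networks (hypothetical feature ids).
shared_embedding = {
    feat: [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for feat in ["user:42", "item:7", "item:9"]
}

def embed(features):
    """Sum-pool the shared embeddings of the given feature ids."""
    pooled = [0.0] * EMBED_DIM
    for feat in features:
        for i, value in enumerate(shared_embedding[feat]):
            pooled[i] += value
    return pooled

def make_tower():
    """A one-layer stand-in for an MLP tower: weights plus a bias."""
    return {"w": [random.uniform(-1, 1) for _ in range(EMBED_DIM)], "b": 0.0}

def tower_forward(tower, x):
    """Linear layer followed by a sigmoid, returning a probability."""
    logit = sum(w * v for w, v in zip(tower["w"], x)) + tower["b"]
    return 1.0 / (1.0 + math.exp(-logit))

ctr_tower = make_tower()  # task-specific parameters
cvr_tower = make_tower()  # task-specific parameters

x = embed(["user:42", "item:7"])  # the same representation feeds both towers
p_ctr = tower_forward(ctr_tower, x)
p_cvr = tower_forward(cvr_tower, x)
```

Because both towers read from `shared_embedding`, any update driven by the data-rich CTR task also changes the representation the CVR tower sees.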
INTRODUCTION
In the setting considered, user actions follow the sequence impression → click → conversion. CVR modeling refers to the task of estimating the post-click conversion rate.
ESMM introduces two auxiliary tasks: predicting the post-view click-through rate (CTR) and predicting the post-view click-through & conversion rate (CTCVR). ESMM treats pCVR as an intermediate variable which, multiplied by pCTR, equals pCTCVR, that is, pCTCVR = pCTR × pCVR. Both pCTCVR and pCTR are estimated over the entire space with samples of all impressions, so the derived pCVR is also applicable over the entire space, which eliminates the sample selection bias problem.
The parameters of the CVR network's feature representation are shared with the CTR network. The CTR network is trained with much richer samples, and this kind of parameter transfer learning helps to alleviate the DS problem remarkably.
The full dataset consists of 8.9 billion samples with sequential labels of click and conversion.
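The relation pCTCVR = pCTR × pCVR can be turned into a training objective in which pCVR never needs its own label. The sketch below assumes both auxiliary tasks are supervised with binary cross-entropy over all impressions; the function names and the sample values are hypothetical.

```python
import math

def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one sample, with clipping for stability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esmm_style_loss(samples):
    """samples: list of (click_label, conversion_label, p_ctr, p_cvr).

    Every impression contributes, whether or not it was clicked.
    """
    total = 0.0
    for y, z, p_ctr, p_cvr in samples:
        p_ctcvr = p_ctr * p_cvr       # pCTCVR = pCTR x pCVR
        total += bce(y, p_ctr)        # CTR task: label is the click
        total += bce(y * z, p_ctcvr)  # CTCVR task: label is click AND conversion
    return total / len(samples)

# Made-up batch: one unclicked, one clicked-only, one clicked-and-converted.
batch = [
    (0, 0, 0.10, 0.30),
    (1, 0, 0.60, 0.20),
    (1, 1, 0.70, 0.50),
]
loss = esmm_style_loss(batch)
print(loss)
```

Note that the pCVR value of the unclicked impression still influences the loss through the CTCVR term, which is how the CVR estimate ends up being learned over the entire space rather than only on clicked samples.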
THE PROPOSED APPROACH