Preface
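Early H.265/HEVC video steganography based on intra prediction modes (IPMs) mostly relied on hand-crafted mapping rules between IPMs and binary secret bits: the secret message is embedded by modifying IPMs according to those rules.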
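The following papers can all be found on CNKI (中国知网):
[1] Wang Jiaji, Wang Rangding, Li Wei, Xu Dawen, Yan Diqun. 一种基于帧内预测模式的HEVC视频信息隐藏算法 (An HEVC video information hiding algorithm based on intra prediction modes) [J]. 光电子·激光, 2014, 25(08): 1578-1585. DOI: 10.16136/j.joel.2014.08.018.
[2] Wang Jiaji, Wang Rangding, Li Wei, Xu Dawen, Xu Jian. HEVC帧内预测模式和分组码的视频信息隐藏 (Video information hiding using HEVC intra prediction modes and block codes) [J]. 光电子·激光, 2015, 26(05): 942-950. DOI: 10.16136/j.joel.2015.05.0954.
[3] Xu Jian, Wang Rangding, Huang Meiling, Li Qian, Xu Dawen. 一种基于预测模式差值的HEVC信息隐藏算法 (An HEVC information hiding algorithm based on prediction mode differences) [J]. 光电子·激光, 2015, 26(09): 1753-1760. DOI: 10.16136/j.joel.2015.09.0322.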
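Later, benefiting from research on adaptive image steganography, video steganography also gradually moved from non-adaptive to adaptive embedding: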
The following papers can be found on IEEE Xplore, SpringerLink, or the ACM Digital Library:
[1] Yi Dong, Xinghao Jiang, Tanfeng Sun, Dawen Xu. Coding Efficiency Preserving Steganography Based on HEVC Steganographic Channel Model[C]// IWDW 2017: 149-162.
[2] Y. Dong, T. Sun, X. Jiang. A High Capacity HEVC Steganographic Algorithm Using Intra Prediction Modes in Multi-sized Prediction Blocks[C]// IWDW 2018. https://doi.org/10.1007/978-3-030-11389-6_18
[3] Y. Wang, Y. Cao, X. Zhao, Z. Xu, M. Zhu. Maintaining Rate Distortion Optimization for IPM-based Video Steganography by Constructing Isolated Channels in HEVC[C]// Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security. ACM, 2018, pp. 97-107.
[4] Y. Dong, X. Jiang, Z. Li, T. Sun, Z. Zhang. Multi-Channel HEVC Steganography by Minimizing IPM Steganographic Distortions[J]. IEEE Transactions on Multimedia, doi: 10.1109/TMM.2022.3150180.
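Most of these adaptive IPM steganography algorithms derive the embedding distortion cost from the rate-distortion (RD) cost of the IPMs. I ran into a few pitfalls while reproducing the code recently, so I am writing them down here. Taking the first paper ([1] Yi Dong, Xinghao Jiang, Tanfeng Sun, Dawen Xu. Coding Efficiency Preserving Steganography Based on HEVC Steganographic Channel Model[C]// IWDW 2017: 149-162) as an example, this post explains how the RD costs of the IPMs are obtained and how they are used for adaptive steganography.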
I. The intra prediction process in H.265/HEVC
I will not repeat the theory of intra prediction here; any video coding reference book covers it. For the code, see the blog posts by "NB_vol_1".
Link: https://pan.baidu.com/s/1Ss3XuebHhK99zxKjBXYHNQ Extraction code: cy4i (Baidu Netdisk: HEVC video reference book)
Link: https://blog.csdn.net/NB_vol_1/article/details/55522822?spm=1001.2014.3001.5502 (NB_vol_1's walkthrough of the HM intra prediction code)
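In short, the intra prediction process can be divided into six steps:
1. Compute a first-pass distortion and a first-pass RD cost for all 35 IPMs. At this stage the distortion is SATD-based.
2. Sort the 35 IPMs by RD cost in ascending order and, as a first screening, keep the 8 IPMs with the smallest RD cost as the candidate list uiRdModeList[]. (Video steganography mostly operates on 4x4 blocks, so the number 8 is the 4x4 case.)
3. Apply the MPM (most probable mode) mechanism: using the IPMs of the above block and the left block as reference, extend the candidate list uiRdModeList[].
4. Traverse the candidate list and compute a second-pass distortion and RD cost for each candidate IPM. At this stage the distortion is SSD-based.
5. While traversing uiRdModeList[], compare each candidate's RD cost with that of the current best IPM, and keep whichever has the smaller RD cost as the best IPM.
6. When the traversal finishes, the best IPM of the current 4x4 block (uiBestPUMode) is known.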
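As a rough illustration of steps 2 and 3 (this is not HM code), building the candidate list can be sketched as follows. The names roughCandidates and appendMissingMpms are made up for this post; the first-pass costs and the MPM list are assumed to be already available.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an HM function): return the `keep` modes with the
// smallest first-pass cost, ordered by ascending cost.
// firstPassCost[m] is the first-pass cost of mode m.
std::vector<int> roughCandidates(const std::vector<double>& firstPassCost, std::size_t keep)
{
  std::vector<int> modes(firstPassCost.size());
  for (std::size_t m = 0; m < modes.size(); ++m)
  {
    modes[m] = static_cast<int>(m);
  }
  // stable_sort keeps the lower mode index first when two costs are equal.
  std::stable_sort(modes.begin(), modes.end(),
                   [&](int a, int b) { return firstPassCost[a] < firstPassCost[b]; });
  modes.resize(std::min(keep, modes.size()));
  return modes;
}

// Hypothetical helper: append every most probable mode that is not yet in the list.
void appendMissingMpms(std::vector<int>& candidates, const std::vector<int>& mpms)
{
  for (int mpm : mpms)
  {
    if (std::find(candidates.begin(), candidates.end(), mpm) == candidates.end())
    {
      candidates.push_back(mpm);
    }
  }
}
```

For a 4x4 block one would call roughCandidates(costs, 8) and then appendMissingMpms with the three MPMs, so the final list holds between 8 and 11 modes.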
In the HM encoder (HM16.15 is used here as the example), the corresponding code is in TEncSearch::estIntraPredLumaQT():
Void TEncSearch::estIntraPredLumaQT(TComDataCU* pcCU,
TComYuv* pcOrgYuv,
TComYuv* pcPredYuv,
TComYuv* pcResiYuv,
TComYuv* pcRecoYuv,
Pel resiLuma[NUMBER_OF_STORED_RESIDUAL_TYPES][MAX_CU_SIZE * MAX_CU_SIZE]
DEBUG_STRING_FN_DECLARE(sDebug))
{
const UInt uiDepth = pcCU->getDepth(0);
const UInt uiInitTrDepth = pcCU->getPartitionSize(0) == SIZE_2Nx2N ? 0 : 1;
const UInt uiNumPU = 1 << (2 * uiInitTrDepth);
UInt CandNum;
Double CandCostList[FAST_UDI_MAX_RDMODE_NUM];
Pel resiLumaPU[NUMBER_OF_STORED_RESIDUAL_TYPES][MAX_CU_SIZE * MAX_CU_SIZE];
TComTURecurse tuRecurseCU(pcCU, 0);
TComTURecurse tuRecurseWithPU(tuRecurseCU, false, (uiInitTrDepth == 0) ? TComTU::DONT_SPLIT : TComTU::QUAD_SPLIT);
do
{
Int numModesAvailable = 35;
UInt uiRdModeList[FAST_UDI_MAX_RDMODE_NUM];
Int numModesForFullRD = m_pcEncCfg->getFastUDIUseMPMEnabled() ? g_aucIntraModeNumFast_UseMPM[uiWidthBit] : g_aucIntraModeNumFast_NotUseMPM[uiWidthBit];
assert(tuRecurseWithPU.ProcessComponentSection(COMPONENT_Y));
initIntraPatternChType(tuRecurseWithPU, COMPONENT_Y, true DEBUG_STRING_PASS_INTO(sTemp2));
Bool doFastSearch = (numModesForFullRD != numModesAvailable);
if (doFastSearch)
{
assert(numModesForFullRD < numModesAvailable);
for (Int i = 0; i < numModesForFullRD; i++)
{
CandCostList[i] = MAX_DOUBLE;
}
CandNum = 0;
const TComRectangle& puRect = tuRecurseWithPU.getRect(COMPONENT_Y);
const UInt uiAbsPartIdx = tuRecurseWithPU.GetAbsPartIdxTU();
Pel* piOrg = pcOrgYuv->getAddr(COMPONENT_Y, uiAbsPartIdx);
Pel* piPred = pcPredYuv->getAddr(COMPONENT_Y, uiAbsPartIdx);
UInt uiStride = pcPredYuv->getStride(COMPONENT_Y);
DistParam distParam;
const Bool bUseHadamard = pcCU->getCUTransquantBypass(0) == 0;
m_pcRdCost->setDistParam(distParam, sps.getBitDepth(CHANNEL_TYPE_LUMA), piOrg, uiStride, piPred, uiStride, puRect.width, puRect.height, bUseHadamard);
distParam.bApplyWeight = false;
// Pass 1: rough cost for each of the 35 modes
for (Int modeIdx = 0; modeIdx < numModesAvailable; modeIdx++)
{
UInt uiMode = modeIdx;
Distortion uiSad = 0;
const Bool bUseFilter = TComPrediction::filteringIntraReferenceSamples(COMPONENT_Y, uiMode, puRect.width, puRect.height, chFmt, sps.getSpsRangeExtension().getIntraSmoothingDisabledFlag());
predIntraAng(COMPONENT_Y, uiMode, piOrg, uiStride, piPred, uiStride, tuRecurseWithPU, bUseFilter, TComPrediction::UseDPCMForFirstPassIntraEstimation(tuRecurseWithPU, uiMode));
uiSad += distParam.DistFunc(&distParam);
UInt iModeBits = 0;
iModeBits += xModeBitsIntra(pcCU, uiMode, uiPartOffset, uiDepth, CHANNEL_TYPE_LUMA);
// First-pass cost: distortion + mode bits * sqrtLambdaForFirstPass
Double cost = (Double)uiSad + (Double)iModeBits * sqrtLambdaForFirstPass;
CandNum += xUpdateCandList(uiMode, cost, numModesForFullRD, uiRdModeList, CandCostList);
}
// Append each most probable mode (MPM) that is missing from the candidate list
if (m_pcEncCfg->getFastUDIUseMPMEnabled())
{
Int uiPreds[NUM_MOST_PROBABLE_MODES] = { -1, -1, -1 };
Int iMode = -1;
pcCU->getIntraDirPredictor(uiPartOffset, uiPreds, COMPONENT_Y, &iMode);
const Int numCand = (iMode >= 0) ? iMode : Int(NUM_MOST_PROBABLE_MODES);
for (Int j = 0; j < numCand; j++)
{
Bool mostProbableModeIncluded = false;
Int mostProbableMode = uiPreds[j];
for (Int i = 0; i < numModesForFullRD; i++)
{
mostProbableModeIncluded |= (mostProbableMode == uiRdModeList[i]);
}
if (!mostProbableModeIncluded)
{
uiRdModeList[numModesForFullRD++] = mostProbableMode;
}
}
}
}
else
{
for (Int i = 0; i < numModesForFullRD; i++)
{
uiRdModeList[i] = i;
}
}
// Pass 2: full RD cost for every candidate in uiRdModeList
for (UInt uiMode = 0; uiMode < numModesForFullRD; uiMode++)
#endif
{
UInt uiOrgMode = uiRdModeList[uiMode];
Distortion uiPUDistY = 0;
Double dPUCost = 0.0;
#if HHI_RQT_INTRA_SPEEDUP
xRecurIntraCodingLumaQT(pcOrgYuv, pcPredYuv, pcResiYuv, resiLumaPU, uiPUDistY, true, dPUCost, tuRecurseWithPU DEBUG_STRING_PASS_INTO(sMode));
#else
xRecurIntraCodingLumaQT(pcOrgYuv, pcPredYuv, pcResiYuv, resiLumaPU, uiPUDistY, dPUCost, tuRecurseWithPU DEBUG_STRING_PASS_INTO(sMode));
#endif
if (dPUCost < dBestPUCost)
{
DEBUG_STRING_SWAP(sPU, sMode)
#if HHI_RQT_INTRA_SPEEDUP_MOD
uiSecondBestMode = uiBestPUMode;
dSecondBestPUCost = dBestPUCost;
#endif
uiBestPUMode = uiOrgMode;
uiBestPUDistY = uiPUDistY;
dBestPUCost = dPUCost;
xSetIntraResultLumaQT(pcRecoYuv, tuRecurseWithPU);
if (pps.getPpsRangeExtension().getCrossComponentPredictionEnabledFlag())
{
const Int xOffset = tuRecurseWithPU.getRect(COMPONENT_Y).x0;
const Int yOffset = tuRecurseWithPU.getRect(COMPONENT_Y).y0;
for (UInt storedResidualIndex = 0; storedResidualIndex < NUMBER_OF_STORED_RESIDUAL_TYPES; storedResidualIndex++)
{
if (bMaintainResidual[storedResidualIndex])
{
xStoreCrossComponentPredictionResult(resiLuma[storedResidualIndex], resiLumaPU[storedResidualIndex], tuRecurseWithPU, xOffset, yOffset, MAX_CU_SIZE, MAX_CU_SIZE);
}
}
}
UInt uiQPartNum = tuRecurseWithPU.GetAbsPartIdxNumParts();
::memcpy(m_puhQTTempTrIdx, pcCU->getTransformIdx() + uiPartOffset, uiQPartNum * sizeof(UChar));
for (UInt component = 0; component < numberValidComponents; component++)
{
const ComponentID compID = ComponentID(component);
::memcpy(m_puhQTTempCbf[compID], pcCU->getCbf(compID) + uiPartOffset, uiQPartNum * sizeof(UChar));
::memcpy(m_puhQTTempTransformSkipFlag[compID], pcCU->getTransformSkip(compID) + uiPartOffset, uiQPartNum * sizeof(UChar));
}
}
UInt uiOrgMode = uiBestPUMode;
#endif
pcCU->setIntraDirSubParts(CHANNEL_TYPE_LUMA, uiOrgMode, uiPartOffset, uiDepth + uiInitTrDepth);
DEBUG_STRING_NEW(sModeTree)
m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[uiDepth][CI_CURR_BEST]);
Distortion uiPUDistY = 0;
Double dPUCost = 0.0;
xRecurIntraCodingLumaQT(pcOrgYuv, pcPredYuv, pcResiYuv, resiLumaPU, uiPUDistY, false, dPUCost, tuRecurseWithPU DEBUG_STRING_PASS_INTO(sModeTree));
if (dPUCost < dBestPUCost)
{
DEBUG_STRING_SWAP(sPU, sModeTree)
uiBestPUMode = uiOrgMode;
uiBestPUDistY = uiPUDistY;
dBestPUCost = dPUCost;
xSetIntraResultLumaQT(pcRecoYuv, tuRecurseWithPU);
if (pps.getPpsRangeExtension().getCrossComponentPredictionEnabledFlag())
{
const Int xOffset = tuRecurseWithPU.getRect(COMPONENT_Y).x0;
const Int yOffset = tuRecurseWithPU.getRect(COMPONENT_Y).y0;
for (UInt storedResidualIndex = 0; storedResidualIndex < NUMBER_OF_STORED_RESIDUAL_TYPES; storedResidualIndex++)
{
if (bMaintainResidual[storedResidualIndex])
{
xStoreCrossComponentPredictionResult(resiLuma[storedResidualIndex], resiLumaPU[storedResidualIndex], tuRecurseWithPU, xOffset, yOffset, MAX_CU_SIZE, MAX_CU_SIZE);
}
}
}
const UInt uiQPartNum = tuRecurseWithPU.GetAbsPartIdxNumParts();
::memcpy(m_puhQTTempTrIdx, pcCU->getTransformIdx() + uiPartOffset, uiQPartNum * sizeof(UChar));
for (UInt component = 0; component < numberValidComponents; component++)
{
const ComponentID compID = ComponentID(component);
::memcpy(m_puhQTTempCbf[compID], pcCU->getCbf(compID) + uiPartOffset, uiQPartNum * sizeof(UChar));
::memcpy(m_puhQTTempTransformSkipFlag[compID], pcCU->getTransformSkip(compID) + uiPartOffset, uiQPartNum * sizeof(UChar));
}
}
}
#endif
pcCU->setIntraDirSubParts(CHANNEL_TYPE_LUMA, uiBestPUMode, uiPartOffset, uiDepth + uiInitTrDepth);
} while (tuRecurseWithPU.nextSection(tuRecurseCU));
}
Here is an example from a real encoding run, again with HM16.15, at QP = 28, encoding the sequence BasketballPass_416x240_50.yuv:
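Consider the 4x4 PU at coordinates (56, 56) with offset = 0, whose best IPM is mode 0. Printing this block's candidate IPMs together with their SSD and RD cost shows that the candidates are modes 0, 14, 13, 10, 15, 12, 11, 16 and 26, with RD costs 316.867, 380.772, 394.772, 537.657, 431.733, 492.772, 553.657, 432.733 and 463.772 respectively. The smallest RD cost is 316.867, which belongs to mode 0, so the best IPM of this PU is mode 0.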
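The comparison in steps 4 to 6 can be replayed on these numbers with a few lines of standalone C++. This is only a sketch: Candidate, pickBestMode and examplePu are names invented for this post, and the RD costs are simply the values quoted above rather than something the code computes.

```cpp
#include <vector>

struct Candidate
{
  int mode;       // intra prediction mode index
  double rdCost;  // second-pass (SSD-based) RD cost
};

// Return the mode with the smallest RD cost, or -1 for an empty list.
// The comparison is strict, so on a tie the earlier candidate is kept.
int pickBestMode(const std::vector<Candidate>& candidates)
{
  int bestMode = -1;
  double bestCost = 0.0;
  for (const Candidate& c : candidates)
  {
    if (bestMode < 0 || c.rdCost < bestCost)
    {
      bestMode = c.mode;
      bestCost = c.rdCost;
    }
  }
  return bestMode;
}

// Candidate modes and RD costs quoted in the text for the 4x4 PU at (56, 56).
std::vector<Candidate> examplePu()
{
  return { {0, 316.867},  {14, 380.772}, {13, 394.772},
           {10, 537.657}, {15, 431.733}, {12, 492.772},
           {11, 553.657}, {16, 432.733}, {26, 463.772} };
}
```

Calling pickBestMode(examplePu()) returns 0, matching the best IPM reported by the encoder.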
II. Paper [1] and how to reproduce it
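In paper [1] (Coding Efficiency Preserving Steganography Based on HEVC Steganographic Channel Model), the authors first group adjacent IPMs into pairs, e.g. (0,1), (2,3), ..., (32,33). They do not state which IPM mode 34 is paired with; here we assume it is paired with mode 33.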
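A minimal sketch of this pairing is given below. The function name partnerMode is made up for this post, and the handling of mode 34 follows the assumption stated above, not anything the paper specifies.

```cpp
constexpr int kNumIntraModes = 35;  // HEVC luma intra modes 0..34

// Partner of a mode under the pairing (0,1), (2,3), ..., (32,33).
// Returns -1 for an index outside 0..34.
int partnerMode(int mode)
{
  if (mode < 0 || mode >= kNumIntraModes)
  {
    return -1;
  }
  if (mode == 34)
  {
    return 33;  // assumption of this post: mode 34 is paired with mode 33
  }
  return mode ^ 1;  // flip the lowest bit: 2k <-> 2k+1
}
```

Note that under this assumption the relation is not symmetric for the last mode: the partner of 34 is 33, but the partner of 33 is still 32.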
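Because the two IPMs in a pair have different parity, binary STC (syndrome-trellis codes) can be used to embed the binary secret message into the cover IPMs. Take the pair (14,15): if the cover IPM is mode 15, it is mapped to the cover bit 1. After binary STC embedding, if the stego bit is 1 the IPM is left unchanged; if the stego bit is 0 the IPM is changed to mode 14.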
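The mapping between modes and bits can be sketched as follows. STC itself is not implemented here: the stego bit is assumed to come from an STC encoder, and coverBit and applyStegoBit are names invented for this post. Mode 34 is again moved to mode 33, which is this post's assumption.

```cpp
// Cover bit of a mode: its parity (mode 15 -> 1, mode 14 -> 0).
int coverBit(int mode)
{
  return mode & 1;
}

// Return the mode to encode after embedding: unchanged if its parity already
// equals the stego bit, otherwise the other mode of its pair.
// `mode` is expected to be in 0..34 and `stegoBit` to be 0 or 1.
int applyStegoBit(int mode, int stegoBit)
{
  if (coverBit(mode) == stegoBit)
  {
    return mode;
  }
  return (mode == 34) ? 33 : (mode ^ 1);  // 34 -> 33 is an assumption
}
```

With the example from the text, applyStegoBit(15, 1) leaves mode 15 untouched and applyStegoBit(15, 0) returns mode 14. On the extraction side only the parity of each decoded IPM is needed.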
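As for the distortion of each cover IPM: since the mapping/modification rule above only moves between the two adjacent IPMs of a pair, the distortion of a cover element is the distortion between those two IPMs. For example, the distortion of mode 15 is the absolute difference between the RD cost of the current block under mode 15 and its RD cost under mode 14, i.e. |J15-J14|.
The analysis of the HM code in Section I shows that the distortion and RD cost of all IPMs are only available during the rough screening over the 35 IPMs; the later pass over the candidate list does not produce an RD cost for every IPM. The RD cost in the paper should therefore be the SATD-based RD cost. We can save these values inside the HM encoder: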
for (Int modeIdx = 0; modeIdx < numModesAvailable; modeIdx++)
{
UInt uiMode = modeIdx;
Distortion uiSad = 0;
const Bool bUseFilter = TComPrediction::filteringIntraReferenceSamples(COMPONENT_Y, uiMode, puRect.width, puRect.height, chFmt, sps.getSpsRangeExtension().getIntraSmoothingDisabledFlag());
predIntraAng(COMPONENT_Y, uiMode, piOrg, uiStride, piPred, uiStride, tuRecurseWithPU, bUseFilter, TComPrediction::UseDPCMForFirstPassIntraEstimation(tuRecurseWithPU, uiMode));
uiSad += distParam.DistFunc(&distParam);
UInt iModeBits = 0;
iModeBits += xModeBitsIntra(pcCU, uiMode, uiPartOffset, uiDepth, CHANNEL_TYPE_LUMA);
Double cost = (Double)uiSad + (Double)iModeBits * sqrtLambdaForFirstPass;
#if DEBUG_INTRA_SEARCH_COSTS
std::cout << "1st pass mode " << uiMode << " SAD = " << uiSad << ", mode bits = " << iModeBits << ", cost = " << cost << "\n";
#endif
CandNum += xUpdateCandList(uiMode, cost, numModesForFullRD, uiRdModeList, CandCostList);
}
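Taking the same 4x4 PU as before, its RD cost under mode 15 is 135.145 and under mode 14 is 109.145, so the distortion of this cover element is 135.145 - 109.145 = 26. Doing this for every cover element gives the cover sequence and the corresponding cost sequence, which are then fed to STC to complete the embedding.
PS: when saving the RD costs there is no need to store all 35 IPMs as I did. Storing the RD costs of the two adjacent IPMs is enough; for this block, only modes 15 and 14 are needed. This greatly reduces the encoding time overhead.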
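Assuming the first-pass costs of each 4x4 PU have been dumped into an array indexed by mode, the cover and cost sequences could be assembled roughly like this. FirstPassCosts, CoverElement, embeddingCost and buildCover are hypothetical names for this post, not part of HM or of the paper, and mode 34 is again paired with mode 33 by assumption.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kNumIntraModes = 35;
// First-pass (SATD-based) cost of every mode for one PU, indexed by mode.
using FirstPassCosts = std::array<double, kNumIntraModes>;

// Embedding cost of a cover mode: |J(mode) - J(partner)|.
// `mode` must be in 0..34.
double embeddingCost(const FirstPassCosts& cost, int mode)
{
  const int partner = (mode == 34) ? 33 : (mode ^ 1);  // 34 -> 33 is an assumption
  return std::fabs(cost[mode] - cost[partner]);
}

struct CoverElement
{
  int bit;      // parity of the cover mode
  double cost;  // cost of flipping that bit
};

// Build the (bit, cost) sequence that would be handed to an STC encoder.
// modes[i] is the best mode of PU i and costs[i] its first-pass costs.
std::vector<CoverElement> buildCover(const std::vector<int>& modes,
                                     const std::vector<FirstPassCosts>& costs)
{
  std::vector<CoverElement> cover;
  const std::size_t n = std::min(modes.size(), costs.size());
  cover.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    cover.push_back({modes[i] & 1, embeddingCost(costs[i], modes[i])});
  }
  return cover;
}
```

With cost[15] = 135.145 and cost[14] = 109.145, embeddingCost returns 26 (up to floating-point rounding) for mode 15. Following the PS above, storing only the two costs of the relevant pair instead of the full array would work just as well.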
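Finally, we ran steganalysis on the resulting stego videos using the IPM-C steganalysis method from reference [4] (Multi-Channel HEVC Steganography by Minimizing IPM Steganographic Distortions). The numbers we obtained are basically consistent with those in the paper, so we consider the whole reproduction process reasonable and correct.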
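PS: "Dong[14]" refers to adaptive reference [2] listed in this post. Since the authors state in reference [4] that modifying large blocks has little effect on steganalysis accuracy, we use the results of modifying 4x4 PUs to approximate the performance of Dong[14].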