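Preface
Recently I had some free time and wanted to get familiar with UE4's graphics module (this is pretty much my first time touching it), so I went looking for a requirement to implement. As of UE4 4.27, Mobile Forward does not support many lights: by default it handles at most 4 point lights. That gave me an idea: modify UE's Mobile Forward pipeline to support culling for many lights.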
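Analyzing the multi point light implementation in UE MobileForward
Look at MobileBasePassPixelShader.usf and you will find that UE implements multiple lights "by hand", hard-coding a maximum of 4 point lights, as shown below.
So to support many lights, this part has to be changed.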
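Thinking about a new approach for UE MobileForward
First of all, this article does not try to solve GPU bandwidth or overdraw, because there is no PreZ pass. It is purely about light culling.
Brute-force for loop
The most direct approach is to loop over all lights in the shader and accumulate their contributions. Looping over several hundred lights per pixel kills performance, so I rejected this option outright.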
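Tiled Based Cull
Tile-based point light culling, as the name suggests, divides the screen into many tiles and uses a compute shader to compute, in parallel, the list of point light indices stored for each tile. During pixel shading, the pixel position determines which tile the pixel belongs to, that tile's light list is fetched, and lighting is computed from it. Culling lights this way is more efficient than brute-force iteration over all lights.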
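As a rough illustration of the pixel-to-tile lookup described above, here is a minimal C++ sketch. The helper name `PixelToTileIndex` and the `TileSizePx` parameter are my own assumptions for illustration and are not taken from the UE source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a pixel coordinate to the flat index of the
// screen tile containing it. TileSizePx is an assumed tile edge in pixels.
inline uint32_t PixelToTileIndex(uint32_t PixelX, uint32_t PixelY,
                                 uint32_t ViewWidth, uint32_t TileSizePx)
{
    assert(TileSizePx > 0 && PixelX < ViewWidth);
    // Tiles per row, rounded up so a partial tile at the right edge counts.
    const uint32_t TilesPerRow = (ViewWidth + TileSizePx - 1) / TileSizePx;
    const uint32_t TileX = PixelX / TileSizePx;
    const uint32_t TileY = PixelY / TileSizePx;
    return TileX + TileY * TilesPerRow;
}
```

With a 1920-pixel-wide view and 16-pixel tiles there are 120 tiles per row, so pixel (0, 16) lands in tile 120. The per-tile light list would then be fetched with this index.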
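Tiled Based Cull culls lights quite efficiently, but that efficiency depends to some extent on PreZ, because besides the frustum cull it also relies on the DepthRange culling optimization. As mentioned above, there is no PreZ here, so the DepthRange optimization is gone and efficiency drops further. As shown below:
Of course, even with PreZ the DepthRange optimization is not fully reliable. Suppose a tile happens to contain a small object at the far end of its frustum and another object at the near end. The DepthBounds then lose their meaning, and many pixels in the tile end up evaluating lights that do not affect them. This phenomenon is called "depth discontinuity".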
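The depth discontinuity problem can be shown with a small CPU-side sketch. The struct and function names below are hypothetical and only model the idea of per-tile depth bounds; they are not engine code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct FDepthBounds { float MinZ; float MaxZ; };

// Min/max view-space depth over the pixels of one tile.
inline FDepthBounds ComputeTileDepthBounds(const std::vector<float>& TileDepths)
{
    assert(!TileDepths.empty());
    const auto MinMax = std::minmax_element(TileDepths.begin(), TileDepths.end());
    return { *MinMax.first, *MinMax.second };
}

// A light sphere passes the depth test when [LightZ - Radius, LightZ + Radius]
// overlaps the tile's depth bounds.
inline bool LightOverlapsDepthBounds(float LightZ, float Radius, const FDepthBounds& Bounds)
{
    return (LightZ + Radius) >= Bounds.MinZ && (LightZ - Radius) <= Bounds.MaxZ;
}
```

If a tile's depths are mostly around 100 but one pixel sits at 5000, the bounds become [100, 5000]. A light at depth 2500 with radius 100 then passes the test even though no pixel in the tile is anywhere near it.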
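Given all of the above, I set the Tiled Based Cull approach aside for now.
Cluster Based Cull
Cluster-based point light culling differs from Tiled Based Cull, which only partitions screen space. It adds Z on top of Tiled Based Cull and partitions the whole view frustum directly into clusters, then uses a compute shader to compute the light index list of each cluster. It culls effectively even without a PreZ pass.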
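There are three steps:
(1) At game start, allocate the clusters in camera space ahead of time, each represented by an AABB (no need to recompute every frame).
(2) Iterate over all clusters and compute the set of lights affecting each cluster.
(3) Shade: from the pixel's ScreenPos.xy and ViewZ, work out which cluster the pixel is in, then fetch that cluster's light set and compute lighting.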
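Implementation in UE4
Modifying the UE4 MobileForward rendering pipeline
Implementation steps
MobileBuildCluster
At game start the clusters are allocated in camera space ahead of time, each represented by an AABB. This stage does not need to run every frame, but a change in factors such as the size of the view frustum means the clusters have to be rebuilt.
The X and Y subdivision is the same as in tiled shading, but there are several schemes for subdividing Z.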
The Doom (id Software) scheme
The clusters here are computed with the DOOM scheme, and the subdivision is 16 x 12 x 24 = 4608 clusters.
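The exponential Z split used by this scheme places the near boundary of slice k at Near * (Far / Near)^(k / NumSlices), which is the same expression the build shader uses for TileNear and TileFar. A minimal C++ sketch of that formula follows; the function name `SliceNearDepth` is my own.

```cpp
#include <cassert>
#include <cmath>

// Near boundary of depth slice SliceIndex:
//   Z = Near * (Far / Near)^(SliceIndex / NumSlices)
// The far boundary of a slice is the near boundary of SliceIndex + 1.
inline float SliceNearDepth(float Near, float Far, int SliceIndex, int NumSlices)
{
    assert(Near > 0.0f && Far > Near && NumSlices > 0);
    const float Exponent = static_cast<float>(SliceIndex) / static_cast<float>(NumSlices);
    return Near * std::pow(Far / Near, Exponent);
}
```

For example, with Near = 1, Far = 16 and 4 slices, the boundaries fall at roughly 1, 2, 4, 8 and 16, so slices close to the camera are thin and distant slices are thick.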
class FMobileBuildClusterCS : public FGlobalShader
{
DECLARE_SHADER_TYPE(FMobileBuildClusterCS, Global);
SHADER_USE_PARAMETER_STRUCT(FMobileBuildClusterCS, FGlobalShader);
BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
SHADER_PARAMETER(float, FarPlane)
SHADER_PARAMETER(float, NearPlane)
SHADER_PARAMETER(FVector4, TileSizes)
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<FLightCluster>, ClusterList)
END_SHADER_PARAMETER_STRUCT()
public:
static bool ShouldCompilePermutation(const FShaderPermutationParameters& Parameters)
{
return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::ES3_1);
}
};
MobileLightClusterShader.usf
#include "/Engine/Private/Common.ush"
struct FLightCluster
{
float4 MinPoint;
float4 MaxPoint;
};
float FarPlane; //LightCullFarPlane
float NearPlane; //LightCullNearPlane
float4 TileSizes;
float3 LineIntersectionToZPlane(float3 a, float3 b, float z)
{
float3 normal = float3(0.0, 0.0, 1.0); // plane normal along the view-space Z axis
float3 ab = b - a;
float t = (z - dot(normal, a)) / dot(normal, ab);
float3 result = a + t * ab;
return result;
}
RWStructuredBuffer<FLightCluster> ClusterList;
[numthreads(1, 1, 1)]
void MainCS(
uint3 GroupId : SV_GroupID,
uint3 GroupThreadId : SV_GroupThreadID,
uint GroupIndex : SV_GroupIndex,
uint3 DispatchThreadId : SV_DispatchThreadID)
{
const float3 EyePos = float3(0.0, 0.0, 0.0);
uint TileIndex = GroupId.x + GroupId.y * (uint)TileSizes.x + GroupId.z * (uint)TileSizes.x * (uint)TileSizes.y;
float Px = 1.0 / (float)TileSizes.x;
float Py = 1.0 / (float)TileSizes.y;
//Min and max corners of the tile in screen space, taken at the far plane (using the near plane gave a wrong result: always zero)
float2 MaxPointViewportUV = float2(GroupId.x + 1, GroupId.y + 1) * float2(Px, Py);
float2 MinPointViewportUV = float2(GroupId.xy) * float2(Px, Py);
float3 MaxPointViewPos = ScreenToViewPos(MaxPointViewportUV, FarPlane);
float3 MinPointViewPos = ScreenToViewPos(MinPointViewportUV, FarPlane);
//Near and far values of the cluster in view space, the split cluster method from siggraph 2016 idtech6
float TileNear = NearPlane * pow(FarPlane / NearPlane, GroupId.z / TileSizes.z);
float TileFar = NearPlane * pow(FarPlane / NearPlane, (GroupId.z + 1) / TileSizes.z);
//find cluster min/max 4 point in view space
float3 MinPointNear = LineIntersectionToZPlane(EyePos, MinPointViewPos, TileNear);
float3 MinPointFar = LineIntersectionToZPlane(EyePos, MinPointViewPos, TileFar);
float3 MaxPointNear = LineIntersectionToZPlane(EyePos, MaxPointViewPos, TileNear);
float3 MaxPointFar = LineIntersectionToZPlane(EyePos, MaxPointViewPos, TileFar);
float3 MinPointAABB = min(min(MinPointNear, MinPointFar), min(MaxPointNear, MaxPointFar));
float3 MaxPointAABB = max(max(MinPointNear, MinPointFar), max(MaxPointNear, MaxPointFar));
ClusterList[TileIndex].MinPoint = float4(MinPointAABB, 1.0);
ClusterList[TileIndex].MaxPoint = float4(MaxPointAABB, 1.0);
}
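To check the cluster corners on the CPU, the LineIntersectionToZPlane helper can be mirrored in plain C++. This is only a sketch for testing the math, with a hypothetical `FVec3` struct standing in for the engine's vector type.

```cpp
#include <cassert>
#include <cmath>

struct FVec3 { float X, Y, Z; };

// Intersects the line through A and B with the view-space plane z = PlaneZ.
// Same math as the shader helper, with the plane normal fixed to (0, 0, 1).
inline FVec3 LineIntersectionToZPlane(const FVec3& A, const FVec3& B, float PlaneZ)
{
    const float DeltaZ = B.Z - A.Z;
    assert(std::fabs(DeltaZ) > 0.0f); // the line must not be parallel to the plane
    const float T = (PlaneZ - A.Z) / DeltaZ;
    return { A.X + T * (B.X - A.X), A.Y + T * (B.Y - A.Y), A.Z + T * DeltaZ };
}
```

Shooting a ray from the eye at the origin through the far-plane point (100, 50, 1000) and cutting it at z = 10 gives approximately (1, 0.5, 10). Doing this for the min and max corners at TileNear and TileFar yields the four points the shader feeds into the AABB.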
void FMobileSceneRenderer::AddBuildLightPass(FRDGBuilder& GraphBuilder, const FViewInfo* View, FRDGBufferRef ClusterListBuffer)
{
FMobileBuildClusterCS::FParameters* BuildClusterParameters = GraphBuilder.AllocParameters<FMobileBuildClusterCS::FParameters>();
BuildClusterParameters->ClusterList = GraphBuilder.CreateUAV(ClusterListBuffer);
FIntPoint ViewPortSize = View->ViewRect.Size();
BuildClusterParameters->TileSizes = FVector4(MobileClusterSizeX, MobileClusterSizeY, MobileClusterSizeZ, (float)ViewPortSize.X / (float)MobileClusterSizeX);
BuildClusterParameters->NearPlane = View->NearClippingDistance;
BuildClusterParameters->FarPlane = GPointLightFarClippingPlane;
BuildClusterParameters->View = View->ViewUniformBuffer;
TShaderMapRef<FMobileBuildClusterCS> ComputeShader(View->ShaderMap);
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("MobileBuildLightClusterPass"),
ComputeShader,
BuildClusterParameters,
FIntVector(MobileClusterSizeX, MobileClusterSizeY, MobileClusterSizeZ));
}
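MobileCullLight
Iterate over all clusters, compute the set of lights affecting each one, and record the result in a global index offset table and a global light index table. To keep the data structures simple, both tables are one-dimensional arrays. Each element of the global index offset table records the number of lights in a cluster and the offset of that cluster's lights in the global light index table.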
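The layout of the two tables can be sketched on the CPU as follows. The field names mirror the shader buffers LightGridList and GlobalLightIndexList, but `FLightLists`, `PackLightLists` and `GetClusterLight` are hypothetical names of mine. The sketch packs sequentially, whereas the compute shader reserves each cluster's range with an atomic add, so the order of ranges on the GPU is not deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct FLightLists
{
    std::vector<uint32_t> LightGridList;        // [2 * c] = offset, [2 * c + 1] = count
    std::vector<uint32_t> GlobalLightIndexList; // light indices of all clusters, packed
};

// Packs per-cluster light lists into the two flat arrays.
inline FLightLists PackLightLists(const std::vector<std::vector<uint32_t>>& PerCluster)
{
    FLightLists Out;
    Out.LightGridList.reserve(PerCluster.size() * 2);
    for (const std::vector<uint32_t>& Lights : PerCluster)
    {
        Out.LightGridList.push_back(static_cast<uint32_t>(Out.GlobalLightIndexList.size()));
        Out.LightGridList.push_back(static_cast<uint32_t>(Lights.size()));
        Out.GlobalLightIndexList.insert(Out.GlobalLightIndexList.end(), Lights.begin(), Lights.end());
    }
    return Out;
}

// Returns the LocalIndex-th light of a cluster, the same lookup the pixel shader does.
inline uint32_t GetClusterLight(const FLightLists& Lists, uint32_t Cluster, uint32_t LocalIndex)
{
    const uint32_t Offset = Lists.LightGridList[2 * Cluster];
    const uint32_t Count = Lists.LightGridList[2 * Cluster + 1];
    assert(LocalIndex < Count);
    return Lists.GlobalLightIndexList[Offset + LocalIndex];
}
```

Three clusters with lights {3, 7}, {} and {1} pack into a grid of {0, 2, 2, 0, 2, 1} and an index list of {3, 7, 1}.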
class FMobileLightCullCS : public FGlobalShader
{
DECLARE_SHADER_TYPE(FMobileLightCullCS, Global);
SHADER_USE_PARAMETER_STRUCT(FMobileLightCullCS, FGlobalShader);
BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
SHADER_PARAMETER_STRUCT_REF(FMobileLocalLightData, LocalLightData)
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<FLightCluster>, ClusterList)
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<uint>, GlobalIndexCount)
SHADER_PARAMETER_UAV(RWBuffer<uint>, LightGridList)
SHADER_PARAMETER_UAV(RWBuffer<uint>, GlobalLightIndexList)
SHADER_PARAMETER_SRV(StrongTypedBuffer<float4>, LightViewSpacePositionAndRadius)
END_SHADER_PARAMETER_STRUCT()
public:
static bool ShouldCompilePermutation(const FShaderPermutationParameters& Parameters)
{
return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::ES3_1);
}
};
bool ShouldPipelineCompileLightClusterShader(const FScene* Scene);
MobileLightCullShader.usf
#include "/Engine/Private/Common.ush"
#define THREAD_GROUD_X 16
#define THREAD_GROUD_Y 12
#define THREAD_GROUD_Z 2
#define GROUD_THREAD_TOTAL_NUM (THREAD_GROUD_X * THREAD_GROUD_Y * THREAD_GROUD_Z)
struct FLightCluster
{
float4 MinPoint;
float4 MaxPoint;
};
/*struct FLightGrid
{
uint Offset;
uint Count;
};*/
Buffer<float4> LightViewSpacePositionAndRadius;
RWStructuredBuffer<FLightCluster> ClusterList;
RWBuffer<uint> LightGridList;
RWBuffer<uint> GlobalLightIndexList;
RWStructuredBuffer<uint> GlobalIndexCount;
float GetSqdisPointAABB(float3 SphereViewPos, uint CluterIndex)
{
float SqDistance = 0.0;
FLightCluster Cluster = ClusterList[CluterIndex];
for (int Index = 0; Index < 3; Index++)
{
float V = SphereViewPos[Index];
if (V < Cluster.MinPoint[Index])
{
float Diff = Cluster.MinPoint[Index] - V;
SqDistance += Diff * Diff;
}
if (V > Cluster.MaxPoint[Index])
{
float Diff = V - Cluster.MaxPoint[Index];
SqDistance += Diff * Diff;
}
}
return SqDistance;
}
bool TestSphereAABB(float3 LightViewPos, float LightRadius, uint CluterIndex)
{
//FPointLight Light = PointLights[LightIndex];
//float3 SphereViewPos = mul(float4(Light.Pos + View.PreViewTranslation.xyz, 1), View.TranslatedWorldToView).xyz;
float SqDistance = GetSqdisPointAABB(LightViewPos, CluterIndex);
return SqDistance <= (LightRadius * LightRadius);
}
[numthreads(THREAD_GROUD_X, THREAD_GROUD_Y, THREAD_GROUD_Z)]
void MainCS(
uint3 GroupId : SV_GroupID,
uint3 GroupThreadId : SV_GroupThreadID,
uint GroupIndex : SV_GroupIndex,
uint3 DispatchThreadId : SV_DispatchThreadID)
{
const uint ThreadCount = GROUD_THREAD_TOTAL_NUM;
uint LightCountInt = (uint)MobileLocalLightData.NumLocalLights;
uint PassCount = (LightCountInt + ThreadCount - 1) / ThreadCount;
uint ClusterIndex = GroupIndex + ThreadCount * GroupId.z;
uint VisibleLightCount = 0;
//one cluster max light num <= GROUD_THREAD_TOTAL_NUM
uint VisibleLightIndexs[GROUD_THREAD_TOTAL_NUM];
//TODO: directly loop for all points
for (uint PassIndex = 0; PassIndex < PassCount; ++PassIndex)
{
for (uint Light = 0; Light < ThreadCount; ++Light)
{
uint LightRealIndex = Light + PassIndex * ThreadCount;
if (LightRealIndex < LightCountInt)
{
float4 LightPositionAndRadius = LightViewSpacePositionAndRadius[LightRealIndex];
float3 ViewSpaceLightPosition = LightPositionAndRadius.xyz;
float LightRadius = LightPositionAndRadius.w;
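//Stop adding lights once the per-thread list is full, otherwise the array write below overflows
if (VisibleLightCount < GROUD_THREAD_TOTAL_NUM && TestSphereAABB(ViewSpaceLightPosition, LightRadius, ClusterIndex))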
{
VisibleLightIndexs[VisibleLightCount] = LightRealIndex;
VisibleLightCount += 1;
}
}
}
}
//Wait for all threads in this group to finish their light tests (this barrier does not synchronize across thread groups)
GroupMemoryBarrierWithGroupSync();
uint Offset;
InterlockedAdd(GlobalIndexCount[0], VisibleLightCount, Offset);
for (uint Index = 0; Index < VisibleLightCount; ++Index)
{
GlobalLightIndexList[Offset + Index] = VisibleLightIndexs[Index];
}
LightGridList[ClusterIndex * 2] = Offset;
LightGridList[ClusterIndex * 2 + 1] = VisibleLightCount;
}
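The sphere vs AABB test at the core of the culling shader is easy to verify on the CPU. Below is a minimal C++ mirror of GetSqdisPointAABB and TestSphereAABB. The `FAabb` struct and the function names are stand-ins of mine, and the box is passed directly instead of being read from ClusterList.

```cpp
#include <cassert>

struct FAabb { float Min[3]; float Max[3]; };

// Squared distance from point P to the box (zero when P is inside).
inline float SquaredDistancePointAabb(const float P[3], const FAabb& Box)
{
    float SqDistance = 0.0f;
    for (int Axis = 0; Axis < 3; ++Axis)
    {
        assert(Box.Min[Axis] <= Box.Max[Axis]);
        if (P[Axis] < Box.Min[Axis])
        {
            const float Diff = Box.Min[Axis] - P[Axis];
            SqDistance += Diff * Diff;
        }
        else if (P[Axis] > Box.Max[Axis])
        {
            const float Diff = P[Axis] - Box.Max[Axis];
            SqDistance += Diff * Diff;
        }
    }
    return SqDistance;
}

// The sphere touches the box when the closest box point is within Radius.
inline bool TestSphereAabb(const float Center[3], float Radius, const FAabb& Box)
{
    return SquaredDistancePointAabb(Center, Box) <= Radius * Radius;
}
```

For a unit box and a light centered at (2, 0.5, 0.5), the squared distance is 1, so a radius of 1 just touches the box and a radius of 0.5 misses it.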
void FMobileSceneRenderer::AddLightCullPass(FRDGBuilder& GraphBuilder, const FViewInfo* View, int32 ViewIndex, FRDGBufferRef ClusterListBuffer, FSortedLightSetSceneInfo &SortedLightSet, bool bCullLightsToGrid)
{
//.................... a large chunk of code is omitted here, see the GitHub commit for details
if (ForwardLocalLightData.Num() == 0)
{
// Make sure the buffer gets created even though we're not going to read from it in the shader, for platforms like PS4 that assert on null resources being bound
ForwardLocalLightData.AddZeroed();
}
FIntPoint ViewPortSize = View->ViewRect.Size();
FVector4 TileSizes = FVector4(MobileClusterSizeX, MobileClusterSizeY, MobileClusterSizeZ, 0);
FVector2D ClusterFactor;
ClusterFactor.X = (float)MobileClusterSizeZ / FMath::Log2(GPointLightFarClippingPlane / View->NearClippingDistance);
ClusterFactor.Y = -((float)MobileClusterSizeZ * FMath::Log2(View->NearClippingDistance)) / FMath::Log2(GPointLightFarClippingPlane / View->NearClippingDistance);
UpdateDynamicVector4BufferData(ForwardLocalLightData, View->ForwardLightingResources->ForwardLocalLightBuffer);
LocalLightData.ForwardLocalLightBuffer = View->ForwardLightingResources->ForwardLocalLightBuffer.SRV;
LocalLightData.NumLocalLights = NumLocalLightsFinal;
LocalLightData.TileSizes = TileSizes;
LocalLightData.ClusterFactor = ClusterFactor;
const bool bShouldCacheTemporaryBuffers = View->ViewState != nullptr;
FForwardLightingCullingResources& ForwardLightingCullingResources = bShouldCacheTemporaryBuffers
? View->ViewState->ForwardLightingCullingResources
: *GraphBuilder.AllocObject<FForwardLightingCullingResources>();
if (ViewSpacePosAndRadiusData.Num() == 0)
{
// Make sure the buffer gets created even though we're not going to read from it in the shader, for platforms like PS4 that assert on null resources being bound
ViewSpacePosAndRadiusData.AddZeroed();
ViewSpaceDirAndPreprocAngleData.AddZeroed();
}
// Alloc Large RWBuffer
if (Scene->UniformBuffers.MobileLightGrid.NumBytes != sizeof(uint32) * 2 * MobileClusterNum)
{
Scene->UniformBuffers.MobileLightGrid.Initialize(sizeof(uint32), 2 * MobileClusterNum, EPixelFormat::PF_R32_UINT);
}
if (Scene->UniformBuffers.MobileGlobalLightIndexList.NumBytes != sizeof(uint32) * 10 * MobileClusterNum)
{
Scene->UniformBuffers.MobileGlobalLightIndexList.Initialize(sizeof(uint32), 10 * MobileClusterNum, EPixelFormat::PF_R32_UINT);
}
check(ViewSpacePosAndRadiusData.Num() == ForwardLocalLightData.Num());
check(ViewSpaceDirAndPreprocAngleData.Num() == ForwardLocalLightData.Num());
UpdateDynamicVector4BufferData(ViewSpacePosAndRadiusData, ForwardLightingCullingResources.ViewSpacePosAndRadiusData);
UpdateDynamicVector4BufferData(ViewSpaceDirAndPreprocAngleData, ForwardLightingCullingResources.ViewSpaceDirAndPreprocAngleData);
if (!Scene->UniformBuffers.MobileLocalLightUniformBuffer.IsValid())
{
Scene->UniformBuffers.MobileLocalLightUniformBuffer = TUniformBufferRef<FMobileLocalLightData>::CreateUniformBufferImmediate(LocalLightData, UniformBuffer_MultiFrame);
}
else
{
Scene->UniformBuffers.MobileLocalLightUniformBuffer.UpdateUniformBufferImmediate(LocalLightData);
}
// Add Clear GlobalIndexCountUAV Pass
FRDGBufferDesc GlobalIndexCountDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(uint32), 1);
FRDGBufferRef GlobalIndexCountBuffer = GraphBuilder.CreateBuffer(GlobalIndexCountDesc, TEXT("GlobalIndexCount"));
FRDGBufferUAVRef GlobalIndexCountUAV = GraphBuilder.CreateUAV(GlobalIndexCountBuffer);
AddClearUAVPass(GraphBuilder, GlobalIndexCountUAV, 0);
FMobileLightCullCS::FParameters* LightCullParameters = GraphBuilder.AllocParameters<FMobileLightCullCS::FParameters>();
LightCullParameters->ClusterList = GraphBuilder.CreateUAV(ClusterListBuffer);
LightCullParameters->LightGridList = Scene->UniformBuffers.MobileLightGrid.UAV;
LightCullParameters->GlobalLightIndexList = Scene->UniformBuffers.MobileGlobalLightIndexList.UAV;
LightCullParameters->GlobalIndexCount = GlobalIndexCountUAV;
LightCullParameters->LocalLightData = Scene->UniformBuffers.MobileLocalLightUniformBuffer;
LightCullParameters->LightViewSpacePositionAndRadius = ForwardLightingCullingResources.ViewSpacePosAndRadiusData.SRV;
TShaderMapRef<FMobileLightCullCS> ComputeShader(View->ShaderMap);
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("MobileLightCullPass"),
ComputeShader,
LightCullParameters,
FIntVector(1, 1, MobileClusterSizeZ / 2));
}
MobileBasePass
Finally, modify MobileBasePass: find the cluster each pixel falls in, then iterate over that cluster's light list to shade the pixel.
#if !MATERIAL_SHADINGMODEL_SINGLELAYERWATER
// Local lights
float DeviceZ = SvPosition.z / SvPosition.w;
float PixelDepth = GetPixelDepth(MaterialParameters);
// ViewPosZ
float2 ScreenUV = SvPositionToBufferUV(SvPosition);
float ViewPosZ = PixelDepth;
float2 ClusterFactor = MobileLocalLightData.ClusterFactor;
float4 TileSizes = MobileLocalLightData.TileSizes;
uint ClusterZ = uint(max(log2(ViewPosZ) * ClusterFactor.x + ClusterFactor.y, 0.0));
uint3 Clusters = uint3(uint(ScreenUV.x * TileSizes.x), uint(ScreenUV.y * TileSizes.y), ClusterZ);
uint ClusterIndex = Clusters.x + Clusters.y * (uint)TileSizes.x + Clusters.z * (uint)TileSizes.x * (uint)TileSizes.y;
uint LightOffset = LightGridList[2 * ClusterIndex];
uint LightCount = LightGridList[2 * ClusterIndex + 1];
for (uint Index = 0; Index < LightCount; Index++)
{
uint LocalLightIndex = GlobalLightIndexList[LightOffset + Index];
uint LocalLightBaseIndex = LocalLightIndex * LOCAL_LIGHT_DATA_STRIDE;
float4 LightPositionAndInvRadius = MobileLocalLightData.ForwardLocalLightBuffer[LocalLightBaseIndex + 0];
float4 LightColorAndFalloffExponent = MobileLocalLightData.ForwardLocalLightBuffer[LocalLightBaseIndex + 1];
float4 LightDirectionAndShadowMask = MobileLocalLightData.ForwardLocalLightBuffer[LocalLightBaseIndex + 2];
float4 SpotAnglesAndSourceRadiusPacked = MobileLocalLightData.ForwardLocalLightBuffer[LocalLightBaseIndex + 3];
float4 LightTangentAndSoftSourceRadius = MobileLocalLightData.ForwardLocalLightBuffer[LocalLightBaseIndex + 4];
AccumulateLightingOfDynamicPointLight(MaterialParameters,
ShadingModelContext,
GBuffer,
LightPositionAndInvRadius,
LightColorAndFalloffExponent,
float4(0, 0, 0, 1),
float4(0, 0, 0, 1),
Color);
}
#endif
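The ClusterFactor pair is just the exponential slice formula inverted: from Z = Near * (Far / Near)^(k / N) it follows that k = log2(Z) * N / log2(Far / Near) - N * log2(Near) / log2(Far / Near). Here is a C++ sketch of that mapping with hypothetical helper names. One assumption of mine: the sketch also clamps the slice to NumSlices - 1, which the shader above does not do, so that depths beyond the far plane cannot index past the last cluster.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct FClusterFactor { float Scale; float Bias; };

// Same values as ClusterFactor.X and ClusterFactor.Y in AddLightCullPass.
inline FClusterFactor MakeClusterFactor(float Near, float Far, int NumSlices)
{
    assert(Near > 0.0f && Far > Near && NumSlices > 0);
    const float LogRatio = std::log2(Far / Near);
    return { NumSlices / LogRatio, -(NumSlices * std::log2(Near)) / LogRatio };
}

// Depth slice of a view-space depth. The upper clamp is an addition of this sketch.
inline uint32_t DepthToSlice(float ViewZ, const FClusterFactor& Factor, int NumSlices)
{
    assert(ViewZ > 0.0f && NumSlices > 0);
    const float Slice = std::log2(ViewZ) * Factor.Scale + Factor.Bias;
    const float Clamped = std::min(std::max(Slice, 0.0f), static_cast<float>(NumSlices - 1));
    return static_cast<uint32_t>(Clamped);
}

// Flat cluster index, matching the layout used by the build and cull shaders.
inline uint32_t FlattenClusterIndex(uint32_t X, uint32_t Y, uint32_t Z,
                                    uint32_t SizeX, uint32_t SizeY)
{
    return X + Y * SizeX + Z * SizeX * SizeY;
}
```

With Near = 1, Far = 16 and 4 slices, the factor is (1, 0), so a depth of 3 lands in slice 1 and a depth of 5 lands in slice 2. Cluster (3, 2, 1) in a 16 x 12 grid flattens to index 227.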
Demo
Code link
https://github.com/2047241149/UnrealEngine/commit/4acdcb34933c7f6910e53834d2947f57be4e49bb
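Summary
Many parts are still unoptimized and untested, and spot lights have been dropped for now. I will work on these bit by bit when I have time.
References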
[1] http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf
[2] https://ubm-twvideo01.s3.amazonaws.com/o1/vault/gdc2015/presentations/Thomas_Gareth_Advancements_in_Tile-Based.pdf?tdsourcetag=s_pctim_aiomsg
[3] http://www.humus.name/Articles/PracticalClusteredShading.pdf
[4] https://www.slideshare.net/TiagoAlexSousa/siggraph2016-the-devil-is-in-the-details-idtech-666?next_slideshow=1
[5] https://newq.net/dl/pub/SA2014Practical.pdf
[6] A Primer On Efficient Rendering Algorithms & Clustered Shading.