Background

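I have been continuing work on a VR meeting app and ran into a tricky problem with desktop sharing. In a meeting, the shared desktop is delivered as a stream of individual image frames, and Unity has to render each of those images.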

Three difficulties

Rendering images in real time in Unity involves three difficulties:

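Filling a texture with data is expensive. If it is done on the main thread, it hurts the frame rate.
The server sends the image data in RABG byte order. This is a non-standard format that cannot be written into a texture directly; it first has to be converted to RGBA or ARGB.
texture.Apply uploads the data from the CPU to the GPU. It must be called on the main thread, and it is also a fairly expensive operation.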
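For scale, at the 1920x1280 resolution used in the tests below, each frame carries this much data, and any format conversion has to touch every pixel:

```latex
\begin{aligned}
1920 \times 1280 &= 2{,}457{,}600\ \text{pixels} \\
2{,}457{,}600 \times 4\ \text{bytes} &= 9{,}830{,}400\ \text{bytes} = 9.375\ \text{MiB per frame}
\end{aligned}
```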

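To address two of these three difficulties, we made the following optimizations.

1. Approaches for filling the texture

1. Use unsafe code and copy the memory directly. Even with this approach, copying a 1920x1280 image still takes 4-5 ms.

2. The copy into the texture does not have to run on the main thread.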

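    using System;
    using System.Runtime.InteropServices;
    using System.Threading.Tasks;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;
    using UnityEngine;

    // Fields of the MonoBehaviour that displays the shared screen.
    // imageRenderer and mainTexPropertyName are further fields of the same class (not shown).
    private Texture2D imageTexture;
    private NativeArray<byte> textureNaitve;

    // Create the texture first, then keep a NativeArray view of its raw data (textureNaitve).
    void CreateTextureIfNeeded(int width, int height)
    {
        if (imageTexture != null && (imageTexture.width != width || imageTexture.height != height))
        {
            DestroyImmediate(imageTexture);
            imageTexture = null;
        }

        if (imageTexture == null)
        {
            imageTexture = new Texture2D(width, height, TextureFormat.RGBA32, false, true);
            imageTexture.wrapMode = TextureWrapMode.Clamp;
            imageRenderer.material.SetTexture(mainTexPropertyName, imageTexture);
            // The view stays valid until the texture is destroyed or resized.
            textureNaitve = imageTexture.GetRawTextureData<byte>();
        }
    }

    // Copies one received frame into the texture's raw data on a thread-pool thread.
    // texture.Apply() still has to be called on the main thread afterwards.
    // Requires "Allow 'unsafe' Code" in the Player settings.
    public async Task LoadIntoTextureAsync(Texture2D texture, SByte[] data, long len)
    {
        if (len > textureNaitve.Length || len > data.Length)
            throw new ArgumentOutOfRangeException(nameof(len));

        await Utils.AwaitMethod<int>(false, () =>
        {
            unsafe
            {
                // sbyte[] and byte[] share the same layout, and the CLR allows
                // this cast through Array, so no per-element conversion happens.
                Marshal.Copy((byte[])(Array)data, 0,
                    (IntPtr)NativeArrayUnsafeUtility.GetUnsafeBufferPointerWithoutChecks(textureNaitve),
                    (int)len);
            }

            return 0;
        });
    }

    // Utils.cs
    public static class Utils
    {
        // Runs cb on a thread-pool thread. Pass isAttachCurrentThread = true only when cb
        // calls into Java via AndroidJNI, so the worker thread is attached to the JVM.
        public static Task<ReturnType> AwaitMethod<ReturnType>(bool isAttachCurrentThread, Func<ReturnType> cb)
        {
            return Task.Run(() =>
            {
                if (isAttachCurrentThread)
                    AndroidJNI.AttachCurrentThread();
                try
                {
                    return cb.Invoke();
                }
                finally
                {
                    if (isAttachCurrentThread)
                        AndroidJNI.DetachCurrentThread();
                }
            });
        }
    }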
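The core of the copy step can be tried outside Unity. The sketch below is plain .NET, not Unity code: a `byte[]` stands in for the texture's raw data, and `CopyFrame` is a hypothetical helper. It uses `MemoryMarshal.Cast` as a safe alternative to the `(byte[])(Array)data` trick for viewing `sbyte` data as bytes, followed by a single bulk copy. Whether `MemoryMarshal` is available in a particular Unity project depends on its .NET API compatibility level, so treat this as an assumption.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: copy `length` bytes of a received sbyte[] frame into a
// byte buffer that stands in for the texture's raw data.
static void CopyFrame(sbyte[] source, byte[] destination, int length)
{
    if (length > source.Length || length > destination.Length)
        throw new ArgumentOutOfRangeException(nameof(length));

    // Reinterpret the sbyte data as bytes (no per-element conversion),
    // then copy it in one bulk operation.
    ReadOnlySpan<byte> asBytes = MemoryMarshal.Cast<sbyte, byte>(source.AsSpan(0, length));
    asBytes.CopyTo(destination);
}

sbyte[] received = { 1, -1, 127, -128 };
byte[] textureBuffer = new byte[4];
CopyFrame(received, textureBuffer, received.Length);
Console.WriteLine(string.Join(",", textureBuffer)); // prints "1,255,127,128"
```

Negative `sbyte` values come out as their two's-complement byte values (-1 becomes 255), which is what a raw memory copy produces as well.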

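Note: in our tests, the copy into the texture itself took only about 4-5 ms, but waiting for the Task.Run to complete took about 20 ms. Check this against your own use case: if spending 4-5 ms on the main thread does not hurt your frame rate, you can simply do the copy on the main thread.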
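To check whether offloading pays off in a given setup, a rough timing comparison like the one below can help. It is plain .NET with hypothetical helper names (`TimeMs`, `TimeOffloadedMs`) and does not reproduce Unity's behaviour: in a console app the code after `await` continues on a pool thread immediately, whereas inside Unity the continuation is posted back to the main thread through Unity's synchronization context and runs during a later player-loop update. We assume, without having measured it, that this hand-off accounts for much of the ~20 ms wait.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Time the work when run inline on the calling thread.
static double TimeMs(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

// Time the same work offloaded with Task.Run, including the wait for completion.
static async Task<double> TimeOffloadedMs(Action action)
{
    var sw = Stopwatch.StartNew();
    await Task.Run(action);
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

// One 1920x1280 RGBA frame.
byte[] src = new byte[1920 * 1280 * 4];
byte[] dst = new byte[src.Length];
Action copy = () => Buffer.BlockCopy(src, 0, dst, 0, src.Length);

double inlineMs = TimeMs(copy);
double offloadedMs = await TimeOffloadedMs(copy);
Console.WriteLine($"inline: {inlineMs:F2} ms, via Task.Run: {offloadedMs:F2} ms");
```

Run the equivalent measurement inside the actual Unity scene before deciding, since the console numbers leave out the main-thread hand-off.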

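2. Image format conversion

In our setup, the image bytes sent by the server are in RABG order, and Unity's Texture2D has no matching format, so the data has to be converted. Because every single frame needs this conversion, the volume of data is large and the work is expensive.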
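Concretely, turning RABG into RGBA only requires swapping the second and fourth byte of every pixel (A and G); R and B already sit in the right place. A minimal single-threaded sketch in plain .NET, where `RabgToRgbaInPlace` is a hypothetical name used for illustration:

```csharp
using System;

// Hypothetical helper: convert an RABG32 buffer (bytes R, A, B, G per pixel)
// to RGBA32 in place by swapping byte 1 (A) and byte 3 (G) of every pixel.
static void RabgToRgbaInPlace(byte[] pixels)
{
    if (pixels.Length % 4 != 0)
        throw new ArgumentException("Buffer length must be a multiple of 4.", nameof(pixels));

    for (int i = 0; i < pixels.Length; i += 4)
    {
        // pixels[i] (R) and pixels[i + 2] (B) stay where they are.
        (pixels[i + 1], pixels[i + 3]) = (pixels[i + 3], pixels[i + 1]);
    }
}

// One pixel with R=10, A=255, B=30, G=20 stored in RABG order.
byte[] frame = { 10, 255, 30, 20 };
RabgToRgbaInPlace(frame);
Console.WriteLine(string.Join(",", frame)); // prints "10,20,30,255"
```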

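We came up with three approaches:

Approach 1: do all the work on the CPU using Unity's IJobParallelFor. A parallel job uses a NativeArray as its data source and runs across multiple cores: the work is split into batches, and each worker thread processes a portion of it. IJobParallelFor behaves much like IJob, but instead of calling Execute once, it calls Execute once for every item in the data source. Execute takes an integer parameter, the index of the item, which the job uses to read and modify that single element.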
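The batching model can be imitated in plain .NET to see how the per-index `Execute` maps onto byte offsets. This is not Unity's job system: `Parallel.ForEach` over `Partitioner.Create` ranges stands in for `Schedule(length, batchSize)`, and `ParallelRabgToRgba` is a hypothetical name. Each range is one batch of pixel indices, and the loop body does for one pixel what `Execute(index)` does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Plain .NET analogue of an IJobParallelFor schedule (not Unity code):
// pixel indices are cut into batches of `batchSize`, batches run on pool threads,
// and the body handles pixel i the way Execute(int index) would.
static void ParallelRabgToRgba(byte[] pixels, int batchSize)
{
    int pixelCount = pixels.Length / 4;
    var batches = Partitioner.Create(0, pixelCount, batchSize);

    Parallel.ForEach(batches, range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            // Same work as the job: swap bytes 4i+1 (A) and 4i+3 (G).
            int a = 4 * i + 1, g = 4 * i + 3;
            (pixels[a], pixels[g]) = (pixels[g], pixels[a]);
        }
    });
}

byte[] frame = new byte[1920 * 1280 * 4];
for (int p = 0; p < frame.Length; p += 4)
{
    frame[p + 1] = 255; // A
    frame[p + 3] = 7;   // G
}
ParallelRabgToRgba(frame, 8192);
Console.WriteLine($"{frame[1]},{frame[3]}"); // prints "7,255"
```

Batches never overlap, so no locking is needed: each pixel's four bytes are written by exactly one thread.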

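    [BurstCompile(CompileSynchronously = true)]
    struct BGRToRGBJob : IJobParallelFor
    {
        public delegate void BGRToRGBDelegate(ref NativeSlice<byte> textureData, int index);

        // Burst-compiled function pointer to the per-pixel conversion below.
        public static readonly FunctionPointer<BGRToRGBDelegate> RABG32ToRGBA32FP = BurstCompiler.CompileFunctionPointer<BGRToRGBDelegate>(RABG32ToRGBA32);

        // Pixel `index` occupies bytes 4*index .. 4*index+3 in R, A, B, G order.
        // Swapping byte 1 (A) with byte 3 (G) yields R, G, B, A.
        // mad(4, index, 1) is Unity.Mathematics' multiply-add: 4 * index + 1.
        [BurstCompile(CompileSynchronously = true)]
        static void RABG32ToRGBA32(ref NativeSlice<byte> textureData, int index)
        {
            var temp = textureData[mad(4, index, 1)];
            textureData[mad(4, index, 1)] = textureData[mad(4, index, 3)];
            textureData[mad(4, index, 3)] = temp;
        }

        // The byte offsets lie outside the pixel-index range the safety system would
        // allow, but each Execute only touches its own pixel's bytes.
        [NativeDisableParallelForRestriction]
        public NativeSlice<byte> textureData;
        public FunctionPointer<BGRToRGBDelegate> processFunction;

        public void Execute(int index) => processFunction.Invoke(ref textureData, index);
    }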

Two packages are required: com.unity.burst and com.unity.mathematics. How is the job scheduled and run?

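using static Unity.Mathematics.math;
using System.Threading.Tasks;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

        // View the whole RGBA32 buffer (4 bytes per pixel) as one slice.
        var mipmapSlice = new NativeSlice<byte>(textureNaitve, 0, texture.width * texture.height * 4);

        // One Execute call per pixel, handed to worker threads in batches of 8192 pixels.
        JobHandle jobHandle = new BGRToRGBJob
        {
            textureData = mipmapSlice,
            processFunction = BGRToRGBJob.RABG32ToRGBA32FP
        }.Schedule(texture.width * texture.height, 8192);

        // Scheduled jobs are first put on a local queue; flush it so the workers
        // can start right away instead of whenever the queue is next flushed.
        JobHandle.ScheduleBatchedJobs();

        // Poll without blocking the main thread, then Complete() to finish the job.
        while (!jobHandle.IsCompleted) await Task.Yield();
        jobHandle.Complete();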

This spreads the work across multiple worker threads. In our tests, however, it was still slow for 1920x1280x4 bytes of data: the conversion took close to 20 ms.
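One idea that might be worth measuring on the CPU path (we did not test it) is to do less work per pixel. The job above calls through a FunctionPointer for every pixel, and an indirect call like that cannot be inlined. The sketch below instead treats each pixel as one 32-bit word and moves A and G with masks and shifts. It is plain .NET, assumes a little-endian CPU, and `RabgToRgbaWords` is a hypothetical name. In Unity, a uint view of the data could come from something like `GetRawTextureData<uint>()`, but any speedup there is unverified.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical word-at-a-time RABG -> RGBA conversion.
// Assumes little-endian: bytes R,A,B,G load as the word G<<24 | B<<16 | A<<8 | R.
static void RabgToRgbaWords(byte[] pixels)
{
    if (!BitConverter.IsLittleEndian)
        throw new PlatformNotSupportedException("Word layout assumes a little-endian CPU.");
    if (pixels.Length % 4 != 0)
        throw new ArgumentException("Buffer length must be a multiple of 4.", nameof(pixels));

    Span<uint> words = MemoryMarshal.Cast<byte, uint>(pixels.AsSpan());
    for (int i = 0; i < words.Length; i++)
    {
        uint v = words[i];
        words[i] = (v & 0x00FF00FFu)          // keep R (byte 0) and B (byte 2)
                 | ((v & 0x0000FF00u) << 16)  // A: byte 1 -> byte 3
                 | ((v & 0xFF000000u) >> 16); // G: byte 3 -> byte 1
    }
}

byte[] frame = { 10, 255, 30, 20, 1, 2, 3, 4 };
RabgToRgbaWords(frame);
Console.WriteLine(string.Join(",", frame)); // prints "10,20,30,255,1,4,3,2"
```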

Approach 2: do the conversion at render time by writing our own shader.

Shader "Custom/ShareScreen"
{
    Properties
    {
        _Color ("Color", Color) = (1,1,1,1)
        _MainTex ("Albedo (RGB)", 2D) = "white" {}
        _Glossiness ("Smoothness", Range(0,1)) = 0.5
        _Metallic ("Metallic", Range(0,1)) = 0.0
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 200

        CGPROGRAM
        // Physically based Standard lighting model, and enable shadows on all light types
        #pragma surface surf Standard fullforwardshadows

        // Use shader model 3.0 target, to get nicer looking lighting
        #pragma target 3.0

        sampler2D _MainTex;

        struct Input
        {
            float2 uv_MainTex;
        };

        half _Glossiness;
        half _Metallic;
        fixed4 _Color;

        // Add instancing support for this shader. You need to check 'Enable Instancing' on materials that use the shader.
        // See https://docs.unity3d.com/Manual/GPUInstancing.html for more information about instancing.
        // #pragma instancing_options assumeuniformscaling
        UNITY_INSTANCING_BUFFER_START(Props)
            // put more per-instance properties here
        UNITY_INSTANCING_BUFFER_END(Props)

        void surf (Input IN, inout SurfaceOutputStandard o)
        {
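            // The texture holds the raw RABG bytes, so the sampler returns
            // (r=R, g=A, b=B, a=G). The .rabg swizzle reorders this to (R, G, B, A).
            // It is applied before the tint so that _Color's channels line up with the real ones.
            fixed4 c = tex2D (_MainTex, IN.uv_MainTex).rabg * _Color;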
            o.Albedo = c.rgb;
            // Metallic and smoothness come from slider variables
            o.Metallic = _Metallic;
            o.Smoothness = _Glossiness;
            o.Alpha = c.a;
        }
        ENDCG
    }
    FallBack "Diffuse"
}

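With this, the format conversion costs essentially nothing: the GPU reorders the channels while sampling, so no per-frame CPU pass is needed.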