[游戏开发] LDCP 02 - 奇偶校验码发展史

开发: C++知识库 Java知识库 JavaScript Python PHP知识库人工智能区块链大数据移动开发嵌入式开发工具数据结构与算法开发测试游戏开发网络协议系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑笔记本显卡显示器固态硬盘硬盘耳机手机 iphone vivo oppo 小米华为单反装机图拉丁

-> 游戏开发 -> LDCP 02 - 奇偶校验码发展史 -> 正文阅读

[游戏开发]LDCP 02 - 奇偶校验码发展史

前言：

? ? ? ? ? LDPC码是麻省理工学院Robert Gallager于1963年在博士论文中提出的一种具有稀疏校验矩阵的分组纠错码。

? ? ? ? ?为什么要稀疏校验呢，这个需要从校验码历史发展开看。

? ?这段是LDPC发明者原话：??

? ? ? ? ?I invented LDPC about ?60 years ago.there are Nelson those cellphones
then data transmission was painfully slow expensive and error-prone.
? ? ? ? ?my main computer at a computational power,halfway between a modern cell?
phone and a coffee machine . error correction coding was fascinating theoretical
problem that studied by many mathematicians and engineers.
? ? ? ? simple schemes were ineffective but complex schemes were too expensive for?
1960 technology.
? ? ? ?LDPC? was an interesting midpoint cute and interesting?to theoreticians but thirty five years before technological feasibility.
? ? ? ?that's the way research is and the way it should be hard research problem take?
years to solve and should not be overly dependent on details of current technological.
? ? ? ? technological capabilities and research projects both proceed at their own pace
and applications? ,resolve when both are ready .

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? thank you?

?上世纪60s年代发明了LDPC 码。那个时候数据传输一方面特别慢，另一方面容易出错。

在60年代,简单的调度没有什么用,复杂的调度又太昂贵.?LDPC理论码是特别有意思的，但是直到35年后才真正应用起来。

? ?

? ? ? Polar 思想是:? ? 信息传输的时候不要出错

? ? ? LDPC 思想：? ? 信息出错了不要紧,我把它纠正过来。

? 一? ?问题起源

? ? ? ? ?

? ? ? ? ? ? ? 通常我们传输数据的时候，是肯定会出错的。发送的bits流

可能会在某一位0 变成1 ,1变成0，也可能被erase掉了,变成一个不确定的状态。

这里主要讨论erase状态，也就是BEC信道

? ? ? ??

? ? ? ?often we need to transmit sensitive? data, such as banking information where
every symbol in the message ?is vital . one mistake can corrupt,?the entire message and
no matter how we communicate . we face this problem one kind of corruption? occurs when a bit flips the transmitter sends a 1 but recived as 0 ,or a 0 which is received as a 1 .

? ? ? ?these are called? errors, another kind of corruption is an erasure which occurs when the bits are so distorted ，they're considered unknowns or ?blanks in the message for this video we will ?discuss this? problem in terms ?of erasers ，because it simplifies the main ideas while some communication suffer from less? corruption than others .

? ? ? ?it's important to note that corruptions are bound to occur, no channels are truly noiseless . so the problem communication engineers will always face ,is how can ?we ?transmit our messages perfectly when? we know some symbols will be lost in transmission?

二? ? repetion code?

? ? ? 最早使用repetion code方式

? ?

如场景1：

? ? ? ? ? ? 一个bits copy 一份，同时发送两份:。

? ? ? ? ?接收方收到bit流时,逐个比特对比，当其中一个被erase掉了，就用另一个的值。

, 但是也会发生场景2的情况，同一个位置，同时被erase掉了。

?也可以理解每个bit都需要一个bit 做奇偶校验。

? 如场景2：

? ? ? ? ? ? ?如果两个copy里面同一个bit 都发生错了，上面依然会校验失败.

解决方法：

? ? ? ? ? ?也可以重复发3或者4次，但是又带来另一个问题，码率降低了

? $rate=\frac{k}{n*k}$

? ? ? ? ? ? ? k:信息bit ,n 为重复次数。

这里面就有个矛盾：

? ? ? ? ? ? ? ? 码率?和? ?降低解码失败概率?

? ? realize we all face this problem during ordinary voice communication,when a word
is misheard in a conversation to recover from this corruption .we simply repeat ourselfs.?

? ? ?technically this is called a repetion code, it's simplest error correction code, for example if we have a binary message ,it would be encoded before transmission into two copies of itself . this might received, for example,if we have a binary message it would be encoded before transmission into two copies itself this might be received. for?example with some erasers as follows,these erasers can easily be decoded by comparing the two copies bit? by bit.?

? ? ?thus the message would be decoded correctly this compare operation is simple and thus can easily?keep us with the speed of communication ,unfortunately simple repetion fails when the same bit is erased in both copies and because this quite common this strategy? leaves us very vulnerable ?to failure ,also it doubles the message length no matter the number erasers we?express this efficiency ?[??f??nsi] .

? ?issue in terms of code rate ?where code rate is the orignial? divied by the total number ?of transmitted bits thus the code rate in this repetition scheme? is 50% that means half of the bis are ?need to describle our original message and the other half are there?
to protect the message if we use triple repetion ?[?tr?pl] ?三部分的; the code rate would drop to? 1/3.???as of the rate drops the protection increases but there are fewer message sent so a lower code rate is a more expensive design since bit transmit has some cost so there is a balance to strike between code rate and probality of failure we ?need a coding strategy with just enough protection? bits to prevent failure for any channel whether we expect a high or lowe erasure any channel whether?we expect a high or low erasure rate while also maintaining fast code and decode operations to solve this problem

三??parity-check 奇偶校验和二项分布

? ? ? ? ?repetion code ,每个bit 只会保护一个bit.

? ? ? ? ?效率比较低，后来到50年底出现了奇偶校验码

? ? 3.1 单奇偶校验

? ? ? ? ?如上图,校验码为1个bit, $p=a_1+a_2+a_3+a_4$ （或者求出其中1的个数,如果为奇数 p=1）

?最后组成一个5bit的code word 发送出去

? ? ? ? ?很多外文把 $[a_1,a_2,a_3,a_4]$ ?集合叫做一个set

? ? ? ? 例子：

?编码过程：

? 要发送的信息为 1 000 11 ，加上一个奇偶校验位1（1的个数和为3,为奇数）

? 解码过程：

? ? ? ? ? ? ?当接收方收到数据 100?11 1 时. 因为其 $r_1+r_2+r_3+r_4+r_5=0$ ,

? ? ? ? ? ? ?可以直接解码出原码为 1000 11

? ? ? ? ? ? ? ? $code rate =\frac{n}{n+1}$ ,是不是高很多。

? ?问题：?

? ? ? ? ? 当超过一个bit 被 erase ,就无法解码了

? ? ? ? ? ?结合二项分布，p为0.5, 这种只能纠正10%的场景。（传输7个bit,只有一个出错，其它都正确的概率为10%）

? ? ? ? ? ?

# -*- coding: utf-8 -*-
"""
Created on Wed Apr 27 15:39:14 2022

@author: chengxf2
"""
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

n = 6
x = np.arange(n)
p =0.5

#计算每次观测的二项分布概率密度函数为: stats.binom.pmf(k,n,p) 其中n 表示测量次数，
# k 是目标结果的次数， p 为抽样概率。输出为k个元素的数组，数组的每个元素表示出现k次目标结果的概率。
y = binom.pmf(x,n,p)
plt.plot(x, y, 'o-', label='p=0.5')

plt.legend()
plt.title( 'p' )
plt.show()
print('当p不同时，成功m次的能性的最大值都出现在均值处，对应概率为n*p')

? ?

?3.2 多奇偶校验位

? ? ?这里举例解码 $a_1$ ，当其被erase掉：

? ? ? 单奇偶校验简单的理解：

? ? ? ? ? ? ? $[a_1,a_2,a_3,a_4,p]$ ? 只受到一个set的保护.?

? ? ? ? ? ? ?如果要解码成功这个，需要 $a_2,a_3,a_4,p$ 全部传输成功，

? ? ? ? ? ? 假设条件独立:

? ? ? ? ? ? ? ? ? ? ? ? ? ? ?假设都是50%的概率erase,解码成的概率：

? ? ? ? ? ?? ? ? ? ? ? ? ? ?? $p(a_2)p(a_3)p(a_4)p(P)=\frac{1}{2}^4=\frac{1}{16}$

? ? ? ? 多奇偶校验：

? ? ? ? ? ? $[a_1,a_2,a_3,p_1]$

? ? ? ? ? ? $[a_1,a_2,a_4,p_3]$

? ? ? ? ? ? $[a_1,a_3,a_4,p_4]$

? ? ? ? ? ? ? ? ? ? ? ? ? ? ?同样其解码成功的概率为

? ? ? ? ? ? ? ? ? ? ? ? ? ?? $p= 1- (\frac{7}{8})^3$ =0.669

? ? ? ? ? ? ? ? ? ? ? ? ? ? 其中

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? $\frac{7}{8}=1-\frac{1}{2}^3$ : 是任意的一个set中，其解码

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 失败的概率

? ? ? ? ? ? 可以看到增加其set的数量,会提高其解码成功率，

但也会带来码率的降低

? ? ? need to be smarter about how we use the protection bits first realize with repetition codes each bit is only? protecting one bit of information instead we could define a single protection bit that?protects the entire message。

? ? ? from a single erasure no matter where it occurs to to this we use an?encoder which counts the number of ones in the message and then chooses a protection bit to ensure that the total message is even .

? ? ? ? this way we call this protection bit a parity bit the sender includes this?
parity at the end of the message now consider what happens when a ?single erasure during transmission.the receiving decoder can correct the erasure with a simple operation it counts the number of ones in?the received message ，and so it know the erased bit must have been a 1.

? ? ? notice that a single?parity-check bit yields quite an improvement to the code rate compared ?repetition coding,however the chance of failure has increased compared to repetition with a single parity-check bit.
? ? ? ? ?we could only correct a single erasure and if two of more erasers occur the code fails because there?isn't enough information to reconstruct the bits to decrease the chances of failure we can divied up the message into distinct sets and include multiple parity check bits where now each parity check bit protects a different set of bits from a single erasure?.we call each of these sets? a parity check set and as we add more parity check bits we increase the number of erasure as we can?potentially correct ,at the cost of gradually decreasing ,the code rate in general.

? ? ? ?the number of? parity should be at least equal to the number of erases if we want to have a chance of correcting them all
? ? ? ?however we face the same problem with this approach ?[??pro?t?] ?方法 the code fails if two or more? erasers occur in any parity check set in that case the bits can't be recoverd.

四? ?Hamming 码

? ? ? 1950 年??Richard Hamming 找到了一种简单的方法纠错多个错误。当传输一个序列,

消息bit为4个,3个为校验码，可以纠正任意两个erasers.

? ? ? 这是一种重叠法解码思想（overlapping approach)

? ? ? ? ? ? 先解码集合A

? ? ? ? ? ? 解码出的bit 集合可能有部分跟集合B 重叠

? ? ? ? ? ?去更新集合B重叠的部分

? ? ? ? ? ?这个时候多个集合B中的子集就有可能变成集合A中的子集

? ? ? ? 每次解码时候，里面只能有一个bit是erase?

? ? ? ?如下图 $a_1,a_2$ ?被erase掉后，可以基于 $p_2,p_3$ ?校验子集分别纠正出 $a_1,a_2$

erase 状态。

? ? ? ?问题：

? ? ? ? ? ? 重叠码思想后来有大量的数学家以及工程师在这个方向做了研究

? ? ? ? ? ? 但是把这个扩展到数以千计的bit,我们遇到新的问题，

? ? ? ? ? ?这是因为对于较长的消息，大多数人认为我们可以使用更大的重叠子集，但是有许多大的重叠子集将奇偶校验位的解码过程从一个简单的操作变成了一个复杂的谜题，这是因为当子集非常大时，得到只有一个擦除的子集的机会非常小.

? ? ? ? ? ?这防止了级联效应的开始或继续，相反，奇偶校验位必须使用更复杂的解码操作来解决，因为现在你要解决一个方程组，这个方程组随着问题数量的增加而变得越来越复杂，但这些更复杂。解码操作会将通信过程减慢到不再复杂的程度

? ? ? ? ? ?实际使用我们需要的是一个具有非常快速解码操作的代码，所以我们剩下的最后一个问题是，我们如何扩展这种重叠子集方法，以便在非常长的消息上快速工作，从而能够处理我们遇到的任何擦除率，同时保持良好的代码速率。

? ? ? ? ?是不是跟Polar 码一样，要提高解码成功率，用SCL，但是又会影响解码速度，带来一系列的问题。

? ? ? ? ?1950 Richard Hamming looking at simple ways of correcting multiple errors when transmitting? a sequence of bits。 his trick was to use overlapping parity check sets which made it possible to?correct ?any two erasers as a 4-bit message and three parity check bits each parity check bit covers three message bits now no matter where to ?erasers occur the code will always be able to correct in?this case.
? ? ? ?we can't recover the first erasure right away since both both black and blue sets contain
two erasers .the second erasure can be recoverd first beacuse the read set has one erasure?and ?this provides the information we need ?to recover the first bit this ?casecade?
( effect where one correction allows us to solve another is very important insight ,
because instead of correcting all erasers ?in one step we can remove erasers in multiple passsesor iterations it's important to note that this kind of cascade effect only works if we have at least one set with only one erasure at each step ,otherwise it would fail .

? ? ? ?this ?led to a decade of excellent research by many people built on this idea of overlapping parity sets much of this work was aimed at?finding the right structure for the parity check set arrangement 。

? ? ? ? but when we extended this? overlapping approach to very long messages with many thousands of bits we ran into a problem。this? because with longer messages most assumed that we could use larger overlapping subsets,but having many large overlapping subsets turns the process of decoding the parity?check bits from a simple ?operation into a complex puzzle。

? ? ? ? this is because when the subsets are very large the chance of getting a subset in which there is only one erasure is very small ,and this?prevents that cascade effect from starting or continuing, instead the parity check bits must be solved using a more complex decoding operation because now you are faced with solving a system of equations which becomes more and more complex as the number of questions grows ,but these more?complex。 decoding operations slow down the communication process to the point where it's no longer?
? ? ? ? ?practical to use what we need is a code with very fast decode operations ,so we are left with a final? problem how can we extend this overlapping subset approach to work quickly over very long messages in?a way that can handle any erasure rate we encounter while maintaining a good code Rate .

?五 LDPC code??

? ? ?

? ? ? ? 1960 Robert Gallagher 发明了一种新的方法解决通讯编码问题（码率和重叠码冲突问题)

在数据存储以及无线网络中广泛使用。

? ? ?算法的核心是使用小的奇偶校验集,小的奇偶校验集相对大的奇偶校验集里面的

只有一个erase概率更高（可以通过二项分布计算），可以充分利用硬件的并行计算能力，

同时解决多个简单的问题，而不是一个复杂的问题。

? ?

? ? ? 一个消息bit 关联的奇偶校验位越多,密度越高.后面也用Tanner图来表示 ,他的研究受到?Peter Elias的启发。

? ? ? LDPC 的模型，编码，解码后面会系统的介绍一下。

? ? ? ?in 1960 a doctoral thesis by Robert Gallagher discovered a new approach to the communication coding problem,which is in wide use today particularyly? ?in cellular networks and?data storage applications.?
? ? ?Gallagher combined to key in the first was to use?smaller parity-check sets instead of larger ones this improves the?chances that many parity-check sets will contain only a single erasure and enables the?casecade effct to happen?this was key to keeping the decoding speed fast over?long message because it's faster for a computer to solve many simple problems instead?of a single complex problem. which solves all of them at once.
? ? ? Gallaher realized that we need an amount of overlap that is large
enough to protect each bit well, but not so large that the correction problem?
becomes very complex he measures the amount of overlap by something he calls?
density that is how densely the message bits are connected to parity
check bits with larger subsets we have a high density that is each message bit is connected?to many parity check bits 。

? ? ?with smaller subsets we have a lower density that is?
each message bit is connected to fewer parity check bits and this is where the name?
low-density parity-check codes comes from,。

? ? ?but aside from the size of the sets there is also the problem of how to structure the connection between parity and?message bits and that leads to his secode insight which was inspired by the work
of Peter Elias， who had recently shown that looking for the right structure
for large subsets was relatively unimportant instead a random subset arrangement?
this simplified the problem compared to trying to design an idea structure for?
the subsets ，and remarkably it did very well and that's key idea behind low-density?
parity-check codes use small overlapping randomly assigned parity check sets this?
allows us to solve eraser in a rapid casecade of simple operation no matter?
how long the meessage or what the erasure rate is while maintaining a fast decoding
speed ,but there is one last insight to understand when it came to the actual
construction of these codes recall that Hammonds construction foucused on
protecting each message bit with multiple parity check in order to better protect?
those bits in case of multiple erasers the problem is this dosent's proivde
the same protection to parity bits which can also be lost during transmission?
notice that each parity check bit is itself only connected to one parity check?
set since any bit that is connected to only one parity check set is vulnerable,
?v?ln?r?bl] ?脆弱的，this leaves all parity check bits vulnerable.
this can make correction impossible,for example this would happen when one message
bit is erased and two parity bits it's connected to is also erased,
in this case we are stuck。

? ? ? ? ? because the casecade can't happen and over very long messages
with many small parity check sets this kind of failure would become more and?
more likely to happen, the solution to this dilemma ,is to instead
view things in a more general way and ?protect the parity bits and the?
message bits equally,that is we make sure all bits message and parity?
are contained in multiple check sets and now we need to understand the basics of
low-density parity-check codes or LDPC codes to see this in action , lets's do a?
very simple example now say our message is ?101,recall that we chose our?
subsets randomly,to generate the code ,encoding is now done by finding a
collection of parity bits so all the check sets have an even number of ones this
involves solving a simple ?set of linear equations ,
? ? ? ? now imagine our message is received with 3 erasers as follows to decode?

this message we begin with the check sets that have a single erasure?
first such as this one and we repeat this process for every check bit
until there are no more erasers left that's one way iterative can work using?
low-density parity-check codes it's simple fast reliable,and can be extended
to messages of any length ,today many communication devices use LDPC codes
and versions of LDPC codes are being proposed for fifth generation wireless
system known as 5g but important to note that when these coats came out they?
did not have a big impact at first and ?were almost forgotten for many years