Programmer notes on CRC computation

Introduction

CRC is a function that receives a sequence of bits of arbitrary size and returns a sequence of bits of fixed size:

uint8_t Result = CRC(ArbitraryLengthData);

The function name depends on how many bits it returns:

uint8 ResultA = CRC8(Data); // Returns 8 bits
uint16 ResultB = CRC16(Data); // Returns 16 bits
uint32 ResultC = CRC32(Data); // Returns 32 bits

Given the same input the function always returns the same output:

uint8_t ResultA = CRC8(DataA);
uint8_t ResultB = CRC8(DataA);
assert(ResultA == ResultB);

However two different inputs could return the same output, but dependending on the CRC polynomial in use the chances can be low (As a general note if the polynomial has a name it means that a couple of PHDs crafted the polynomial and demonstrated the low chances of repetition under certain criteria).

uint8_t ResultA = CRC8(DataA);
uint8_t ResultB = CRC8(DataB);
if (DataA != DataB &&
    ResultA == ResultB)
{
    printf("Collision");
}

If you transmit a message of arbitrary size along with the respective computed CRC it's possible for a receiver to recalculate the CRC and compare if it is equal to the received CRC (Used like this by USB, RFID, SD, Bluetooth, Ethernet, SCTP, PNG, SATA, cksum, Layers on OSI model and much more).

// At transmitter
Transmit(Message, CRC32(Message));

// At receiver
DataFrame Received = GetData();
if (Received.GetCRC() == CRC32(Received.GetMessage()))
{
    printf("Received CRC matches recomputed CRC");
}

You can also use the CRC function as a hash function, which is useful if you're building a hash map or something else that requires a hash (For example, Unreal Engine uses CRC to hash various data structures inside TMaps).

// Unreal Engine function that creates a
// hash value from an IntVector2 data
template<typename T>
uint32 GetTypeHash(const TIntVector2<T>& Vector)
{
    return FCrc::MemCrc32(&Vector, sizeof(Vector));
}

Calculation

CRC computation consists on calculating:

remainder \left( \frac{d(x)}{g(x)} \right)=r(x)

Where:

d(x)

Is a polynomial whose coefficients encode an arbitrary chunk of data, and:

g(x)

Is a polynomial known as the divisor, generator or CRC polynomial. The degree of the generator polynomial states the name of the CRC in use and also the max degree of:

r(x)

For example, the following CRC polynomial is known as CRC16-CCITT:

x^{16}+x^{12}+x^{5}+1

The coefficients of these polynomials can be only 1 or 0, or more rigurously, these coefficients are elements of the underlying set of a mathematical field known as GF(2), coders can interpret this as fancy terminology to refer to boolean true, false, XOR and AND.

Description of GF(2)

Underlaying set of the field is defined as:

S=\{0, 1\}

Binary operations of the field are:

\{+, *\}

The field is defined as:

GF(2)=(S,+, *)

Intuitively it means that addition, subtraction, multiplication, and division are defined and behave as the corresponding operations on rational and real numbers.

All the valid operations in GF(2) are:

`A`	`B`	`A + B`	`A - B`
0	0	0	0
0	1	1	1
1	0	1	1
1	1	0	0

Note above that subtraction and addition looks the same and is equivalent to the boolean XOR operation.

`A`	`B`	`A * B`	`A / B`
0	0	0	Undefined
0	1	0	0
1	0	0	Undefined
1	1	1	1

We conventionally state division by zero to be undefined, apart from that, multiplication and division looks very similar and are equivalent to the boolean AND operation.

Polynomials over GF(2)

Some example of operations between polynomials whose coefficients are elements of GF(2).

x^2+x^2=(1+1)x^2=0x^2=0

x^7-x^7=(1-1)x^7=0x^7=0

x^2(x^3+x^2+1)=x^{3+2}+x^{2+2}+x^{0+2}=x^5+x^4+x^2

The following github repo contains a toy C++ class to perform operations between polynomials over GF(2) using a syntax similar to the one of math texts (Sacrificing performance as is intended only for small experimentation) and it will be used in the following sections.

Polynomial A = (1 * x{6} + 1 * x{ 0 }) * x{ 1 };
Polynomial B = x{ 7 } + x{ 1 };
Polynomial SumResult = SumA + SumB;
assert(SumResult == Polynomial::Zero);

Polynomials to binary

With the tools previously described we can encode a polynomial using binary strings:

x^3+x^1+1 \leftrightarrow (1011)

x^2+1 \leftrightarrow (0101)

And perform polynomial operations using bitwise XOR and AND:

(x^3+x^1+1)+(x^2+1) = x^3+x^1+x^2 \leftrightarrow 1011 \oplus 0101 = 1110

CRC computation 1 bit at a time

This is based on the standard long division algorithm taught in school, where multiples of the divisor g(x) are subtracted from the dividend d(x) until the remainder is of smaller degree than g(x).

The following slide illustrates the computation of:

remainder \left( \frac{x^3+x^2+1}{x^3+x+1} \right) \leftrightarrow remainder \left( \frac{1101}{1011} \right)

Let n be the degree of the CRC polynomial, in this case n=3, the algorithm above consists on

Load the first (n + 1) terms of the polynomial d(x)
Subtract the CRC polynomial
(1) Store the result
(2) Left shift the result
Subtract the CRC polynomial, repeat...
Stop after 4 iterations because d(x) encodes a 4 bits long message

C++ example implementation below (see full source code here):

Polynomial Message = x{3} + x{2} + x{0};
Polynomial Crc3 = x{3} + x{1} + x{0};

// Load n+1 bits of the message
// (we depend here on Message
// and CRC having the same degree,
// otherwise this is incorrect,
// read next examples for a demostration on how to handle
// message and CRC not having the same degree)
Polynomial Remainder = Message;
assert(Remainder.GetDegree() == Crc3.GetDegree());

for (size_t Counter = 0;
    Counter < Message.TotalTerms();
    ++Counter)
{
    if (Remainder.GetDegree() == Crc3.GetDegree())
    {
        Remainder = Remainder - Crc3;
    }

    // Left shift the remainder one bit
    Remainder = Remainder * x{ 1 };
}

return Remainder;

The above code won't work if the message has greater degree than the CRC polynomial, next we will handle the loading of abitrary sized messages:

// This example is based on
// https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Computation
Polynomial Crc3 = x{3} + x{1} + x{0};

// A message polynomial of degree 13, encoding 14 bits
PolynomialBitString Message{ 0b11'0100'1110'1100, 14 };

// Load first n bits of the message a.k.a 110
Polynomial Remainder =
    Polynomial::FromBitString(
        Message.Substring(Crc3.GetDegree())
    );

for (size_t Index = 0;
    Index < Message.TotalTerms();
    ++Index)
{
    Remainder =
        // Align remainder and CRC, e.g. make 110, to be 1100
        // so it has the same degree of CRC
        Remainder * x{ 1 }

        // Load the next bit of the message
        + Message[Crc3.GetDegree() + Index] * x{ 0 };

    // Subtract and update the remainder
    // if both polynomials share the same degree
    if (Remainder.GetDegree() == Crc3.GetDegree())
    {
        Remainder = Remainder - Crc3;
    }
}

return Remainder;

Note that we can ignore the term of degree n as it always gets zeroed when subtracting the CRC from the Remainder.

The following example is the most used in hardware, it doesn't requires preloading the first n bits of the message and only uses n registers to store the result (instead of n + 1), to achive this, it leverages on the associative property of addition, i.e. it performs the same additions per digit noted above, but in a distinct order, continue reading for an illustrated explanation.

// This example is based on
// https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Computation
Polynomial Crc3 = x{3} + x{1} + x{0};

// A message polynomial of degree 13, encoding 14 bits
PolynomialBitString Message{ 0b11'0100'1110'1100, 14 };

// No more need to load first n bits of the message
Polynomial Remainder = Polynomial::Zero();

for (size_t Index = 0;
    Index < Message.TotalTerms();
    ++Index)
{
    Remainder =
        // No more need of left shifting one bit
        Remainder

        // Load the next bit of the message
        + Message[Index] * x{ Crc3.GetDegree() - 1 };

    // Is the term n-1 at remainder 1? will the remainder
    // be aligned to the crc after one left shift?
    // 0110 (This will be aligned to CRC after one left shift)
    // 1011
    if (Remainder.GetDegree() == (Crc3.GetDegree() - 1))
    {
        // Each register of remainder will accumulate
        // the operations consider the first iteration:
        // r= x*1x^(n-1) = 1x^n
        // crc=            1x^n + x + 1
        // updated r=             x + 1
        // the terms x and 1 will eventually be subtracted
        // from a new loaded bit
        Remainder = Remainder * x{ 1 } - Crc3;
    }
    else
    {
        // Shift left everything if loaded n-1 term was zero
        Remainder = Remainder * x{ 1 };
    }
}

return Remainder;

Below is an illustrated step by step debugging of the above algorithm, each bit has an identifier tracking when was it read for first time so you can track the order of operations.

For more credibility compare the above graphic with all the operations "expected" to happen on each digit in the first algorithm studied.

Iteration 0
                result bits
1   1   0   1   0   0   0   =   d(x)
1   0   1   1               =   g(x)
1+1 1+0 0+1 1+1                 operations at iteration 0

Iteration 1
                result bits
0   1   1   0   0   0   0   =   d(x)
    1   0   1   1           =   g(x)
1+1 1+0 0+1 1+1                 operations at iteration 0
    1+1 1+0 0+1 0+1             operations at iteration 1

Iteration 2
                result bits
0   0   1   1   1   0   0   =   d(x)
        1   0   1   1       =   g(x)
1+1 1+0 0+1 1+1                 operations at iteration 0
    1+1 1+0 0+1 0+1             operations at iteration 1
        1+1 1+0 1+1 0+1         operations at iteration 2

Iteration 3
                result bits
0   0   0   1   0   1   0   =   d(x)
            1   0   1   1   =   g(x)
1+1 1+0 0+1 1+1                 operations at iteration 0
    1+1 1+0 0+1 0+1             operations at iteration 1
        1+1 1+0 1+1 0+1         operations at iteration 2
            1+1 0+0 1+1 0+1     operations at iteration 3

Which expands to:
digit 6 = 1+1
digit 5 = (1+0)+1
digit 4 = ((0+1)+0)+1
digit 3 = ((((1+1)+1)+0)+1)
digit 2 = (((0+1)+1)+0)
digit 1 = ((0+1)+1)
digit 0 = (0+1)

Where digit 2, 1 and 0 conform the result.

Next example uses bitwise operators directly to removes usage of the class Polynomial.

// Binary message
// 00010000 00010001 00010010 00010011
// 00010100 00010101 00010110 00010111
constexpr uint8_t Message[] {
        0x10, 0x11, 0x12, 0x13,
        0x14, 0x15, 0x16, 0x17
    };

// Most significant bit first (big-endian),
// note that the x power n bit is ignored,
// aka the coefficient of the msb bit
// (x16)+x12+x5+1 = (1) 0001 0000 0010 0001 = 0x1021
constexpr uint32_t Crc = 0x1021;
constexpr uint32_t CrcN = 16;

// Result of the CRC function
uint32_t Remainder = 0;

for (uint8_t MessageByte : Message)
{
    // Load the next byte and
    // Align the byte with the generator
    // polynomial e.g. make 0x12 to be 0x1200 so
    // it represents a polynomial of degree n-1
    uint32_t MessageByteShifted = MessageByte << 8;

    // Calculate the remainder
    Remainder = Remainder ^ MessageByteShifted;

    // Process the next 8 bits
    for (uint32_t BitIndex = 0; BitIndex < 8; ++BitIndex)
    {
        // Check the coefficient of polynomial n-1,
        // i.e. 1000 0000 0000 0000
        if (Remainder & 0x8000)
        {
            Remainder = (Remainder << 1) ^ Crc;
        }
        else
        {
            Remainder = Remainder << 1;
        }
    }

    // We need to mask 16bits on the remainder because we are ignoring the n coefficient of the crc polynomial
    // so we are not guaranteeing the n coefficient at remainder and crc to cancel each other
    Remainder = Remainder & 0xffff;
}

return Remainder;

If we encode the polynomial coefficients using a LSB endianess the code simplifies to:

// Binary message
// 00010000 00010001 00010010 00010011
// 00010100 00010101 00010110 00010111
constexpr uint8_t Message[] {
        0x10, 0x11, 0x12, 0x13,
        0x14, 0x15, 0x16, 0x17
    };

// Least significant bit first (little-endian)
// 1+x5+x12+(x16) = 1000 0100 0000 1000 (1) = 0x8408
constexpr uint32_t Crc = 0x8408;
constexpr uint32_t CrcN = 16;

// Result of the CRC function
uint32_t Remainder = 0;

for (uint8_t MessageByte : Message)
{
    // Calculate the remainder
    Remainder = Remainder ^ MessageByte

    // Process the next 8 bits
    for (uint32_t BitIndex = 0; BitIndex < 8; ++BitIndex)
    {
        // Check the coefficient of polynomial n-1, i.e. 0000 0001
        if (Remainder & 0x1)
        {
            Remainder = (Remainder >> 1) ^ Crc;
        }
        else
        {
            Remainder = Remainder >> 1;
        }
    }
}

CRC computation using Dilip V. Sarwate algorithm

Sarwate's algorithm consists of a table containing 256 rows where each element represents the remainder of a polynomial division:

remainder \left( \frac{T_p(x)}{g(x)} \right)=r(x)

Such that:

0 \leq p \leq255

And:

T_p(x)=i_7x^7 + i_6x^6 + ... + i_2x^2 + i_1x^1 + i_0x^0

Where:

DecimalToBinary(p) = (i_7,i_6,i_5,i_4,i_3,i_2,i_1,i_0)

This way, if we wanted to calculate the remainder of:

remainder \left( \frac{x^6+x^4+x^3+x+1}{g(x)} \right)=r(x)

We would only need to look-up an entry from a table T.

x^6+x^4+x^3+x+1 \leftrightarrow 0b01011011 \leftrightarrow 91

remainder \left( \frac{T_{91}(x)}{g(x)} \right)=F[91]

Next example shows how to compute a CRC16 from a 4 bytes long message.

Iteration                                                                           Result bytes
            A3          A2              A1                      A0                  A{-1}           A{-2}
0:          T=A3        C1=F1[T]        C0=F0[T]
1:                      T=C1+A2         C1=C0+F1[T]             C0=F0[T]
1:                                      T=C1+A1                 C1=C0+F1[T]         C0=F0[T]
1:                                                              T=C1+A0             C1=C0+F1[T]     C0=F0[T]

In the above example consider that only one byte is loaded from the message at a time, e.g. at iteration 0 only A3 is loaded, at iteration 1 only A2 and so on.

For a detailed explanation on how to generate the table F[T] please refer to appendix at Sarwate's paper.

Further lecture

Sarwate's work was extendend by Kounavis and Berry to perform 4 bytes and 8 bytes per iteration, this is the most performant software implementation so far and is widely used in a lot of mainstream systems like the Linux kernel and Unreal Engine (Requires access to the github repo).

Check the slide by 8 implementation used by Unreal with a lot of extra comments here

Credits

Written by Romualdo Villalobos