10 minute read

Introduction

Early last year, James Munns posed this “fun Sunday puzzle”:

Here's a fun Sunday puzzle for data minded people out there: * COBS is a bit-stuffing technique, it changes the bitstream when encoding * CRC is designed to defend against corruptions of the EXACT bit pattern on the wire If I want to frame my messages with COBS, how do I also CRC them?

— James Munns (@jamesmunns.com) January 26, 2025 at 7:03 AM

I had some initial ideas that were dead ends. And eventually I came to the same conclusion others suggested: adding an extra “counter” byte to ensure the CRC was non-zero could work. But the discussion online raised concerns as to whether this solution was bounded or efficient.

A few months later (and stuck home recovering from the flu) this puzzle caught my interest again. I spent some time checking the boundedness and handling some COBS-encoding edge cases. I convinced myself it would work in a practical way.

In this article I’ll lay out the approach I settled on and answer the boundedness question.

Background

In case the reader isn’t familiar with COBS encoding or the CRC problem, here’s a brief description and some links with background information. Otherwise, skip to the solution.

COBS Encoding

COBS, or Consistent Overhead Byte Stuffing is a technique for encoding frames (think packets or records) so they don’t contain one special byte value, called the sentinel. The input packets can be any length and contain arbitrary binary data. After encoding, the frames can be concatenated into a byte stream or transmitted on a serial interface, with sentinel bytes between frames to denote their boundaries.

COBS Encoding Records into a byte stream

Diagram 1: COBS encoded records in a data stream.

The encoding process produces frames that are longer than the original data but only slightly so. The worst case bound is 1 additional byte per 254 bytes of original data.

CRC

CRCs, or Cyclic Redundancy Checks, are frequently used to ensure data integrity when bit errors are possible during transmission or retrieval.

CRC Generation and Checking

Diagram 2: CRC computation and check processes.

The technique involves passing all bits of a frame through a computation that produces a small check value. This is then appended to the original data for storage or transmission.

On reception, the same calculation is performed but if the received check value doesn’t match the computed one, the received frame is treated as corrupted.

A given CRC algorithm can guarantee some minimum number of bit errors are always detectable, given the length of the input frame. Some also have good properties for multiple bit errors in a short span of the transmission.

One particulary popular CRC algorithm is CRC-32.

CRC for COBS framed data?

Naturally, there are cases where we would like to have CRC checks for data frames that are also COBS encoded. Here we run into a subtle problem, though.

An obvious problem appears if we perform COBS encoding, then generate a CRC value in the usual way. The computed CRC value may contain bytes that match the sentinel value. Those will be treated as end-of-frame markers at the receiving end result in corrupted frames.

COBS encoding followed by CRC

Diagram 3: COBS framing followed by CRC.

Instead, we could compute a CRC on the data first, then COBS encode the whole thing. Now the COBS deframing and decoding work fine. And after we have the decoded frame, we can check the CRC. This isn’t bad. But the fact that the CRC-protected data was transformed by COBS means the protection against bit errors is weakened. In the initial bluesky post, James pointed out how a similar situation in CAN bus encoding reduces the effectiveness of CRCs there. This is what we want to prevent.

CRC Generation followed by COBS encoding

So can we have both COBS framing and fully effective CRC checks?

Solution Approach

To solve this problem, we’re going to construct a frame that is simultaneously:

  • A valid COBS-encoded frame
  • A valid CRC-32 protected frame

(For the rest of this article, I’ll assume we are working with CRC-32.)

The next sections describe the construction and de-construction of such frames. Implementation and efficiency concerns are addressed afterwards.

Framing / Transmit process

To build such a frame, we begin by augmenting the original payload data with a 6 byte trailer. We’ll name these bytes S, F, C1, C2, C3, C4.

Augmented Record

S will have the value of sentinel (usually zero)

F is the fixer value. (Must be non-sentinel value)

C1 - C4 are placeholders for the CRC-32 value. (Must be non-sentinel values.)

The last 5 bytes have values that aren’t determined yet but we can set them to any non-sentinel value for now. They will be changed to other non-sentinel values in the following steps.

Now we COBS encode this augmented frame.

The presence of a sentinel as the first byte of the trailer causes the COBS byte stuffing counter to be reset. This guarantees the remaining non-sentinel bytes will pass through the COBS encoding unchanged.

Encoded Augmented Record

Now we can construct the CRC for this frame.

First set the fixer value, F, to sentinel+1. Then compute the CRC for the COBS encoded frame through F. Check to see if the CRC contains a byte with the sentinel value. If not, we adopt this value for F and set C1-C4 to the computed CRC.

If the CRC DOES contain the sentinel byte, try again with F set to sentinel+2. Repeat the check above and continue trying different values of F until a CRC is found that doesn’t contain the sentinel.

CRC in Encoded Frame

Because the values for F, C1-C4, were non-sentinel values during COBS encoding and are still non-sentinel values, we can modify these in the encoded frame without breaking the COBS-encoded properties.

And because C1-C4 form a valid CRC of the COBS encoded data through the fixer byte, F, we have full CRC protection of this frame.

Deframing / Receive process

Upon reception or retrieval of these frames, we can rely on the COBS properties to find the frame boundaries using the sentinel in the usual way.

The CRC should then be checked on the COBS-encoded frame. The last four bytes can be checked against a CRC computed over the first N-4 bytes. (Or CRC checks frequenty compute a CRC over all bytes and expect a zero result.)

Finally, if the CRC check passes, COBS decoding can proceed to convert the received data back into the original data frame. The output will include the 6 byte trailer that accomodated the CRC calculation. This can be discarded and the result is the original data frame.

Deframing COBS with CRC

Implementation tips

To achieve an efficient implementation, several optimizations can be made to the process described above.

Instead of actually appending the trailer to the original data, a COBS encoder can be fed the original data followed by the single byte S. Then the COBS encoding of that data can be read from the encoder.

And instead of repeatedly computing the CRC over the whole COBS-encoded frame with various fixers, F, we can do the CRC calculation over just the encoding through S. The state of this calculation can then be saved before adding in the trial value for the fixer. If the first fixer value fails, simply restore the CRC state and try with the next value.

When a fixer value, F, is found, it and the corresponding C1-C4 can be appended to the output frame.

When receiving COBS data, frame delimiting can proceed in the usual way. The frame’s contents should be passed simultaneously to both a COBS decoder and a CRC computation. Upon receiving a sentinel, the CRC computation can be finalized and the COBS reception finalized if the CRC is good. The decoded COBS frame will have the 6-byte trailer appended so the user of the data should be given a slice that doesn’t include that.

Boundedness

The process of repeatedly trying CRC calculations until one fits may seem problematic. As mentioned above, though, each additional try to fix the CRC should only have a cost of adding one byte to a CRC calculation in progress.

But how many tries are needed? And is it guaranteed that a fix will always be found?

To get an intuitive feel for this, consider that the CRC calculation produces a random-ish 32-bit value. (That is, all 2^32 values are produced with uniform distribution over all possible inputs.) The chances that such a CRC-32 value contains a sentinel is about 4/256. (Because each of the four bytes has a chance of 1/256 of being zero.) If we repeat that process twice, we have about 1/2^12 chance of failing. After six trials the odds of failing are 1/2^36. Since there are only 2^32 possible CRC-32 values, it seems likely that all failure cases might be eliminated at that point.

We can check this by brute force. Simply create a CRC calculator, seed it with all 2^32 possible states, and for each one, try to produce a COBS-compliant CRC value using fixer values of 1, 2, 3,…. When I did this, the expected distribution of tries was confirmed and no CRC state needed more than five fix attempts. (In the test code below, I check for COBS compliance by ensuring none of the four CRC bytes are either 0 or 0xFF. I wanted this to be robust in the presence of an inverted bit stream.)

use crc32fast::Hasher;

#[cfg(test)]

mod test {
        #[test]
    fn test_fixability() {
        // Count how many initial states required 1, 2, 3... fixups before success.
        let mut counts = [0_usize; 7];

        // Values to add to CRC32, when attempting to fix it. (Entry 0 not used.)
        const FIXUP: [[u8; 1]; 7] = [[0], [1], [2], [3], [4], [5], [6]];

        // Test all possible CRC32 states
        for crc_state in 0..=0xFFFFFFFF {
            let mut fixed = false;

            // Try fixing it with up to 6 different values.
            for n in 1..=6 {

                // Start off with CRC calculation in state n
                let mut hasher = Hasher::new_with_initial(crc_state);

                // Fix CRC with fixup n
                hasher.update(&FIXUP[n]);

                let crc = hasher.finalize();

                // Test whether this crc is COBS-compliant.
                let bad = ((crc & 0xFF000000) == 0x00000000) || 
                          ((crc & 0xFF000000) == 0xFF000000) ||
                          ((crc & 0x00FF0000) == 0x00000000) ||
                          ((crc & 0x00FF0000) == 0x00FF0000) ||
                          ((crc & 0x0000FF00) == 0x00000000) ||
                          ((crc & 0x0000FF00) == 0x0000FF00) ||
                          ((crc & 0x000000FF) == 0x00000000) ||
                          ((crc & 0x000000FF) == 0x000000FF);

                if !bad {
                    // we fixed it!
                    fixed = true;
                    counts[n] += 1;  // record how many tries this needed.// 
                    break;           // go on to test next CRC state
                }

            }

            // If it didn't work in 6 tries, panic.
            assert!(fixed, "CRC wasn't fixed after the max attempts.");
        }

        // Print distribution of fixup attempts
        for n in 1..=6 {
            println!("Fixed with {n} tries: {}", counts[n]);
        }
    }
}

The results show the vast majority of cases are fixed on the first try and none require more than 5 tries.

running 1 test
test cobs_crc32::test::test_fixability has been running for over 60 seconds
Fixed with 1 tries: 4162314256
Fixed with 2 tries: 129556240
Fixed with 3 tries: 3048224
Fixed with 4 tries: 48192
Fixed with 5 tries: 384
Fixed with 6 tries: 0
test cobs_crc32::test::test_fixability ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 4 filtered out; finished in 218.01s

So all possible CRC-32 states can be fixed, and the cost to find the fix is bounded to 5 bytes of CRC calculation.

Cost Analysis

If we accept the storage space and processing time of COBS-encoding the original data and performing a CRC-32 calculation as baseline costs, what are the additional costs required for this approach?

The 6-byte trailer is 2 bytes longer than a simple CRC alone. And, worst case, these extra bytes could incur the overhead of one byte of COBS stuffing. So three bytes is the maximum storage/transmission overhead. (Two bytes is the typical case.)

And the worst case time needed to find the right fixer value is 5x the time to reset the CRC state and process one byte. On average it’s only 1.016 times the cost to process one byte, since the first try works most of the time.

These are constant, small, time and space requirements.

Conclusion

By adding two bytes after the payload data, one to reset the COBS byte stuffing count and the other to allow fixing the CRC value, it is possible to use COBS encoding and CRC-32 together without compromising CRC error detection performance.

The overhead cost of doing so is quite low, even in the worst case.