
nanocobs

nanocobs is a C99 implementation of the Consistent Overhead Byte Stuffing ("COBS") algorithm, defined in the paper by Stuart Cheshire and Mary Baker.

Users can encode and decode data in-place or into separate target buffers. Encoding can be incremental; users can encode multiple small buffers (e.g. header, then payloads) into one target. The nanocobs runtime requires no extra memory overhead. No standard library headers are included, and no standard library functions are called.

Rationale

Some communication buses (e.g. two-wire UART) are inherently unreliable and have no built-in flow control, integrity guarantees, etc. Multi-hop ecosystem protocols (e.g. Device → (BLE) → Phone → (HTTPS) → Server) can also be unreliable, despite being composed entirely of reliable protocols! Adding integrity checks like CRC can help, but only if the data is framed; without framing, the receiver and transmitter are unable to agree exactly what data needs to be retransmitted when loss is detected. Loss does not always have to be due to interference or corruption during transmission, either. Application-level backpressure can exhaust receiver-side storage and result in dropped frames.

Traditional solutions (like CAN) rely on bit stuffing to define frame boundaries. This works fine, but can be subtle and complex to implement in software without dedicated hardware.

nanocobs is not a general-purpose reliability solution, but it can be used as the lowest-level framing algorithm required by a reliable transport.

You probably only need nanocobs for things like inter-chip communications protocols on embedded devices. If you already have a reliable transport from somewhere else, you might enjoy using that instead of building your own :)

Why another COBS?

There are a few out there, but I haven't seen any that optionally encode in-place. This can be handy if you're memory-constrained and would enjoy CPU + RAM optimizations that come from using small frames. Also, the cost of in-place decoding is only as expensive as the number of zeroes in your payload; exploiting that if you're designing your own protocols can make decoding very fast.

None of the other COBS implementations I saw supported incremental encoding. It's often the case in communication stacks that a layer above the link provides a tightly-sized payload buffer, and the link has to encode both a header and this payload into a single frame. That requires an extra buffer for assembling which then immediately gets encoded into yet another buffer. With incremental encoding, a header structure can be created on the stack and encoded into the target, then the payload can follow into the same target.

Finally, I didn't see as many unit tests as I'd have liked in the other libraries, especially around invalid payload handling. Framing protocols make for lovely attack surfaces, and malicious COBS frames can easily instruct decoders to jump outside of the frame itself.

Metrics

It's pretty small, and you probably need either cobs_[en|de]code_tinyframe or cobs_[en|de]code[_inc*], but not both.

❯ arm-none-eabi-gcc -mthumb -mcpu=cortex-m4 -Os -c cobs.c
❯ arm-none-eabi-nm --print-size --size-sort cobs.o

0000011c 0000001e T cobs_encode_inc_end    (30 bytes)
0000007a 00000022 T cobs_encode_inc_begin  (34 bytes)
00000048 00000032 T cobs_decode_tinyframe  (50 bytes)
0000013a 00000034 T cobs_encode            (52 bytes)
00000000 00000048 T cobs_encode_tinyframe  (72 bytes)
0000009c 00000080 T cobs_encode_inc        (128 bytes)
0000016e 00000090 T cobs_decode            (144 bytes)
Total 1fe (510 bytes)

Usage

Compile cobs.c and link it into your app. #include "path/to/cobs.h" in your source code. Call functions.

Encoding With Separate Buffers

Fill a buffer with the data you'd like to encode. Prepare a larger buffer to hold the encoded data. Then, call cobs_encode to encode the data into the destination buffer.

char decoded[64];
unsigned const len = fill_with_decoded_data(decoded);

char encoded[128];
unsigned encoded_len;
cobs_ret_t const result = cobs_encode(decoded, len, encoded, sizeof(encoded), &encoded_len);

if (result == COBS_RET_SUCCESS) {
  // encoding succeeded, 'encoded' and 'encoded_len' hold details.
} else {
  // encoding failed, look to 'result' for details.
}

Decoding

Decoding works similarly; receive an encoded buffer from somewhere, prepare a buffer to hold the decoded data, and call cobs_decode. Decoding can always be performed in-place, since the encoded frames are always larger than the decoded data. Simply pass the same buffer to the encoded and decoded parameters and the frame will be decoded in-place.

char encoded[128];
unsigned encoded_len;
get_encoded_data_from_somewhere(encoded, &encoded_len);

char decoded[128];
unsigned decoded_len;
cobs_ret_t const result = cobs_decode(encoded, encoded_len, decoded, sizeof(decoded), &decoded_len);

if (result == COBS_RET_SUCCESS) {
  // decoding succeeded, 'decoded' and 'decoded_len' hold details.
} else {
  // decoding failed, look to 'result' for details.
}

Incremental Encoding

Sometimes it's helpful to be able to encode multiple separate buffers into one target. To do this, use the cobs_encode_inc family of functions: initialize a cobs_enc_ctx_t with cobs_encode_inc_begin, then call cobs_encode_inc multiple times, and finish encoding with cobs_encode_inc_end.

cobs_enc_ctx_t ctx;
char encoded[128];
cobs_ret_t r = cobs_encode_inc_begin(encoded, 128, &ctx);
if (r != COBS_RET_SUCCESS) { /* handle the error */ }

char header[8];
unsigned const header_len = get_header_from_somewhere(header);
r = cobs_encode_inc(&ctx, header, header_len); // encode the header
if (r != COBS_RET_SUCCESS) { /* handle the error */ }

char const *payload;
unsigned const payload_len = get_payload_from_somewhere(&payload);
r = cobs_encode_inc(&ctx, payload, payload_len); // encode the payload
if (r != COBS_RET_SUCCESS) { /* handle the error */ }

unsigned encoded_len;
r = cobs_encode_inc_end(&ctx, &encoded_len);
if (r != COBS_RET_SUCCESS) { /* handle your error, return / assert, whatever */ }

/* At this point, |encoded| contains the encoded header and payload.
   |encoded_len| contains the length of the encoded buffer. */

Encoding "Tiny Frames"

If you can guarantee that your payloads are shorter than 254 bytes, then you can use the "tinyframe" API, which lets you both decode and encode in-place in a single buffer. The COBS protocol requires an extra byte at the beginning and end of the payload. If encoding and decoding in-place, it becomes your responsibility to reserve these extra bytes. It's easy to mess this up and just put your own data at byte 0, but your data must start at byte 1. For safety and sanity, cobs_encode_tinyframe will error with COBS_RET_ERR_BAD_PAYLOAD if the first and last bytes aren't explicitly set to the sentinel value. You have to put them there.

(Note that 64 is an arbitrary size in this example; you can use any size you want up to COBS_TINYFRAME_SAFE_BUFFER_SIZE.)

char buf[64];
buf[0] = COBS_TINYFRAME_SENTINEL_VALUE; // You have to do this.
buf[63] = COBS_TINYFRAME_SENTINEL_VALUE; // You have to do this.

// Now, fill buf[1 .. 62] with whatever data you want.

cobs_ret_t const result = cobs_encode_tinyframe(buf, 64);

if (result == COBS_RET_SUCCESS) {
  // encoding succeeded, 'buf' now holds the encoded data.
} else {
  // encoding failed, look to 'result' for details.
}

Decoding "Tiny Frames"

cobs_decode_tinyframe is also provided and offers byte-layout-parity to cobs_encode_tinyframe. This lets you, for example, decode a payload, change some bytes, and re-encode it all in the same buffer:

Accumulate data from your source until you encounter a COBS frame delimiter byte of 0x00. Once you've got that, call cobs_decode_tinyframe on that region of the buffer to do an in-place decoding. The zeroth and final bytes of your payload will be replaced with the COBS_TINYFRAME_SENTINEL_VALUE bytes that, were you encoding in-place, you would have had to place there anyway.

char buf[64];

// You fill 'buf' with an encoded cobs frame (from uart, etc) that ends with 0x00.
unsigned const length = you_fill_buf_with_data(buf);

cobs_ret_t const result = cobs_decode_tinyframe(buf, length);
if (result == COBS_RET_SUCCESS) {
  // decoding succeeded, 'buf' bytes 0 and length-1 are COBS_TINYFRAME_SENTINEL_VALUE.
  // your data is in 'buf[1 ... length-2]'
} else {
  // decoding failed, look to 'result' for details.
}

Developing

nanocobs uses doctest for unit and functional testing; its unified mega-header is checked into the tests directory. To build and run all tests on macOS or Linux, run make -j from a terminal. To build and run all tests on Windows, run the vsvarsXX.bat of your choice to set up the VS environment, then run make-win.bat (if you want to make that part better, pull requests are very welcome).

The presubmit workflow compiles nanocobs on macOS, Linux (gcc) 32/64, Windows (msvc) 32/64. It also builds weekly against a fresh docker image so I know when newer stricter compilers break it.

nanocobs's People

Contributors

charlesnicholson, oreparaz, redfast00


nanocobs's Issues

SPDX License headers

Thank you for the extremely free license for this, this definitely makes my life easier. For me, it would be useful if the license information is tracked per-file with standard SPDX headers. https://spdx.dev/learn/handling-license-info/#why. Is this something you would be willing to implement? I can also make a PR if that's easier for you.

// SPDX-License-Identifier: Unlicense OR 0BSD

invalid frame case for cobs_inplace_decode

See here for an executable version: https://godbolt.org/z/MdT5befeq

Driver Program:

#include "cobs.h"
#include <iostream>

using namespace std;
int main()
{
    char buf[] = {0x2, 0, 0};
    auto ret = cobs_decode_inplace(buf, sizeof(buf));

    cout << "returned COBS_RET_SUCCESS: " << (ret == COBS_RET_SUCCESS) << endl;

    cout << "buffer: " << hex << endl;
    for(int i = 0; i < sizeof(buf); i++)
    {
        cout << (int)(buf[i]) << " ";
    }
    cout << endl;
}

Output:

returned COBS_RET_SUCCESS: 1
buffer: 
5a 0 5a 

Expected Output:
Return COBS_RET_ERR_BAD_PAYLOAD since the input was invalid due to the intermediate 0 byte.

Incremental decoder

Handy for sending COBS frames over a streaming link where you don't want or need to have the entire frame in RAM. Get a chunk, decode it, do whatever you want with the newly-decoded chunk, flush it, get another chunk, repeat.

Probably just the natural opposite of the incremental encoder.

`cobs_decode_inplace` does not work for >=254 consecutive non-zero bytes

cobs_decode_inplace claims the decoded data sits in 1 ... size - 1, but this is clearly wrong: if there are 254 consecutive non-zero bytes, the next byte (the next header byte) should be truncated and not replaced with 0. Apparently the current implementation of cobs_decode_inplace doesn't consider this case, and indeed round-trip tests for such input data fail.

Aside from asking for a fix: is there a reason why you didn't include round_trip_inplace in these two cases? Adding them indeed makes the tests fail.

Rewrite `cobs_encode_inc` to support small buffers

Make cobs_encode_inc look like cobs_decode_inc - each call to cobs_encode_inc should take src + dst pointers, so that you don't need enough RAM to hold a fully-encoded buffer if you don't need that.

encode API overhaul: incremental encoding

If the frames you're encoding have a user-supplied payload and a link-layer-supplied header, it costs an extra buffer and a copy to make the final frame contiguous; you have to do something wasteful like this:

unsigned char final_buf[128];
header_t const h = {.magic = 0x12, .len = payload_len}; // packed
memcpy(final_buf, &h, sizeof(h));
memcpy(&final_buf[sizeof(h)], payload, payload_len);

If the COBS encoding interface instead exposed a context struct, the API could look like this:

cobs_ret_t cobs_encode_inc_begin(void *dst, unsigned dst_max, cobs_encode_ctx_t *out_ctx);
cobs_ret_t cobs_encode_inc(cobs_encode_ctx_t *ctx, void const *src, unsigned src_len);
cobs_ret_t cobs_encode_inc_end(cobs_encode_ctx_t *ctx, unsigned *out_enc_len);

Then the current cobs_encode function could be reimplemented in terms of begin/inc/end.

Users without full header/payload frame buffers would then call

unsigned char buf[128]; // or whatever

cobs_encode_ctx_t ctx;
cobs_encode_inc_begin(buf, sizeof(buf), &ctx);
cobs_encode_inc(&ctx, header, sizeof(header));
cobs_encode_inc(&ctx, payload, payload_len);

unsigned enc_len;
cobs_encode_inc_end(&ctx, &enc_len);

phy_tx(buf, enc_len);

cobs_encode_inc and cobs_encode_inc_end need return values for buffer exhaustion.

thorough unit tests for `cobs_decode`

It's the straggler. Enough of the other tests (wiki, paper, encode, decode-inplace) work that I'm pretty sure it's good, but it's bad to leave it untested.

Variable-length in-place decoding: where does the data end?

We get in variable-length COBS encoded messages. In the README, there's an example with a fixed, 64-byte long buffer:

char buf[64];

// You fill 'buf' with an encoded cobs frame (from uart, etc) that ends with 0x00.
unsigned const length = you_fill_buf_with_data(buf);

cobs_ret_t const result = cobs_decode_inplace(buf, length);
if (result == COBS_RET_SUCCESS) {
  // decoding succeeded, 'buf' bytes 0 and length-1 are COBS_SENTINEL_VALUE.
  // your data is in 'buf[1 ... length-2]'
} else {
  // decoding failed, look to 'result' for details.
}

However, if the buffer buf is variable-length and, for example, 10000 bytes long, I don't think the last valid byte will be at length-2. Is there an easy way to figure out the length of the decoded message? Maybe cobs_decode_inplace could take a pointer to an int as the len parameter and update it to the decoded length?

Unsigned integer underflow

When running the tests using the code from #39 with the addition of the integer option to -fsanitize, I get this warning:

cobs.c:107:17: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
