Giter Club home page Giter Club logo

Comments (5)

irauta avatar irauta commented on September 28, 2024 2

I guess the main problem with little-endian data is that there needs to be some higher-level type around than just the slice of bytes and the bits within them - endianness makes only sense for multi-byte types.

After pondering about this, I came up with a potential way to tell to the reader that the upcoming bits come from a type stored in little-endian order. Here's an imaginary use case with lots of comments:

// The "reader" variable is a BitReader, in a usable state
// Just some regular method call for context, this is just BitReader doing what
// it normally does:
let some_value = reader.read_u8(8)?;
   
// To read bits in little-endian order you need to create a "sub-reader",
// that's valid only for a small amount of bits:
let le = reader.read_le_32()?;
// "le" can provide up to 32 bits (the alternatives would be 16 and 64),
// read from a u32 variable stored in little-endian order in the byte slice.
// Also read_le_* takes the reader as a mutable reference, so the main
// reader is not available during its existence

// Reading bits with it looks as expected, it's just that source data is
// interpreted differently
let le_value = le.read_u16(14)?;
let another_le_value = le.read_u8(8)?;
// ...and so on, the little-endian reader provides similar methods as BitReader

// To stop using the sub-reader and gain back control of the main BitReader, you need
// to "close" it:
le.close();
// "close" can't fail, and will always increment the cursor position by
// 32 bits (or 16 or 64, depending on the originally requested amount)

The close method could return a &mut BitReader, to allow chaining:

let le = reader.read_le_32()?;
let x = le.read_u8(num_bits)?;
let y = le.read_u8(more_bits)?;
// ...and more calls to le.read_* here, up to 32 bits, until finally another sub-reader
// is set up to interpret the next 32 bits in little-endian order:
let le = le.close().read_le_32()?;
let z = le.read_u16(bits_never_end)?;

For another perspective, some fn signatures in pseudo-Rust:

impl BitReader {
    // The return types actually should have some lifetime parameter too,
    // or did lifetime elision work here too, can't remember right now.
    fn read_le_16(&mut self) -> LittleEndianReader16;
    fn read_le_32(&mut self) -> LittleEndianReader32;
    fn read_le_64(&mut self) -> LittleEndianReader64;
}

impl LittleEndianReader32 {
    fn read_u8(&mut self, bits: u8) -> BitReaderResult;
    fn read_u16(&mut self, bits: u8) -> BitReaderResult;
    // etc. for all the types

    // Consumes self, advances the BitReader cursor to the end of 32-bit segment
    fn close(self) -> &mut BitReader;
    // Maybe there should be a version that *doesn't* advance the cursor, to allow "peeking"
    // into the upcoming data
    fn cancel(self) -> &mut BitReader;
}

from bitreader.

irauta avatar irauta commented on September 28, 2024

Does the OpenSimulator protocol really use that complicated encoding, that the bits have to be shifted, or padded as you phrase it, before doing endianness conversion?

I honestly don't have particularly wide experience in bit-level stuff in general - I ended up making this crate because I was interested in the MPEG-2 format, which seems pretty friendly in the sense that big endian support is enough for pretty much all parts of it I came across.

Anyway, I suppose one reason for strange requirements to extract a value could be that some original implementation has just cast a piece of memory into a struct of u32, u16 and other wider types while assuming a little endian platform, and only then smaller bitfields are extracted with shifts and masking.

from bitreader.

leoschwarz avatar leoschwarz commented on September 28, 2024

I'm not sure yet what all the specifics of the "protocol" (honestly it's more of a combination of an actual protocol usually called "LLUDP" and many ressource encodings) entail, but for one decoder I have already encountered 10 bit long (and at times dynamic size) unsigned integers, which tend to be encoded in little endian and not big endian. I cannot really say yet what the exact need is since I haven't been able to perform a successful test of the decoder I'm working on the last couple days. :')

But anyway I'll probably write my own Reader/Decoder helper anyway, if I end up needing even more specific things, maybe it would be better to see what kind of little endian support other people need, if there's any such demand at all.

from bitreader.

irauta avatar irauta commented on September 28, 2024

I was thinking about my earlier comment, and the assumption that the endianness comes from byte level of the source data. The following may have some fatal error in it. In a big endian encoding a 14 bit value could span three bytes like this (just a contrived example of a hard case; the numbers are there just to show from which byte each bit comes and x marks an irrelevant bit):

xxxxxxx1 22222222 33333xxx

But if the bytes were in little endian order, the same bits would be laid down on the consecutive bytes like this:

33333xxx 22222222 xxxxxxx1

That is not a problem at all, if the sequence of bytes is cast into a u32 (just imagine an xxxxxxxx as a most significant byte after the three bytes to fill out the entire u32). Then extracting the value would only require a >> 3 and a &. However, it's hard to imagine how the big-endian-by-default BitReader should act when read_u32_little_endian(14) were called - where should the cursor end up after the call?

I don't mean to discourage you by any means, but little-endian reading just started to seem a lot harder to handle.

from bitreader.

leoschwarz avatar leoschwarz commented on September 28, 2024

I think what I requested is very application specific, but at the same time the functionality of this crate also kind of is. There really are many points where bit encoding might be done differently.

One thing I'm still thinking about (but it's rather somewhere on my backlog → right now I'm going to use the code that I linked, which was wrong btw but I fixed it in the must ugly way right now) is whether someone could use the type system to generate a reader type with many configuration options, which would hopefully benefit from compile time optimizations. Or the easier option would be to provide the reader with a Config struct instance carrying information about the encoding which would be applied at run time, this would also allow the construction through a builder struct, however it's not zero cost for users of the current implementation.

Configuration like that could allow for parameters like cursor_follow: enum { Bit | Byte } and then it would clear "where the cursor ends up after the call". Or maybe I misunderstood the exact problem you mentioned.

from bitreader.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.