Giter Club home page Giter Club logo

Comments (10)

xogeny avatar xogeny commented on August 29, 2024

My recollection is that the issue I ran into was that the msgpack module only allowed writing objects, not raw ints. But I'll investigate further.

from recon.

xogeny avatar xogeny commented on August 29, 2024

I recognized one issue with this (and perhaps why I didn't do this in the first place). The implementation of msgpack that we're using doesn't just read one object at a time from a stream. It uses a whole buffer to do so. So it won't just read the bytes of the length, it will read additional stuff in the next object. Furthermore, when you unpack from a stream containing more than one object, you get an exception.

I see two possibilities (and I'll be exploring those shortly). The first is always to write the longest possible int. That is a little bit wasteful, but then we'll know exactly how many bytes to read in when we read the file and we can read them into a small memory buffer and have msgpack unpack that.

Another alternative would be to manually decode the integer by first reading the "type" byte to know how long the int was and then to read the rest of the int. The problem here is that it will trigger to reads just to get one integer.

In the first case, we need to implement our own write method but reads can use the normal methods. For the second option, we'd need to create our own read method. This means that in either case we'll need to re-implement one of those methods. Since reading is more common than writing, it seems like we should go with the first approach since it is likely to have the least impact on other implementors (since most will probably only read, not write).

from recon.

xogeny avatar xogeny commented on August 29, 2024

OK, I came up with a bit of a compromise. What I did was to write the lengths in normal msgpack format (normal write). Then, when reading, I make a single read of the worst case number of bytes (9). Then, looking at the first byte I can determine how many bytes I actually need. Then I send only those bytes to the msgpack.packb method to get the resulting integer and then I seek backward to make sure that the file pointer is aligned with the start of the next object in the stream.

This assumes that the file we are dealing with has a seek method, but that is already assumed by the library (in order to make multiple passes over entries for the wall format and to extract the specific bytes requires for a signal in the meld format).

In this way, the minimum number of reads is maintain, no further assumptions about the underlying file object are required and all data (with #47) is read and written in msgpack format.

from recon.

harmanpa avatar harmanpa commented on August 29, 2024

This seems a complex way of simplifying the format. When writing in crecon, we write to a memory buffer first, leaving 4 bytes free before each row. Once the row is complete we fill in the 4 bytes with the length of the row. If it isn't a fixed size, we have to put the whole row in a buffer, measure it's length and then copy the row buffer into the overall buffer after writing the size.

from recon.

xogeny avatar xogeny commented on August 29, 2024

Is your concern about the entries in the wall format specifically? I ask because I'm scratching my head now wondering why we worry about the length? We can always determine the length of the entry by actually reading the entry. I think Martin was trying to explain this issue to me and I didn't quite get it.

So the question is...what are the complications if we just skip the lengths for entries entirely? We can always cache the offsets of the entries when we read the file if necessary. But as I see it (now), the wall file isn't really a "random access" format anyway, so we shouldn't really worry so much about jumping around inside of it.

If we do away with entry lengths, then it seems like your concern goes away AND we simplify the format even more. Comments?

from recon.

harmanpa avatar harmanpa commented on August 29, 2024

I think that would work, it does require slightly changing the approach when reading.

from recon.

xogeny avatar xogeny commented on August 29, 2024

True. But I think from a performance standpoint, that might be required anyway. I've been benchmarking some of the wall reading and conversion stuff and I think the performance could improve significantly with such changes.

from recon.

xogeny avatar xogeny commented on August 29, 2024

Hmm...the problem that I see here is that the Python implementation we are using is very unhappy if you ask it to unpack an array of bytes and there are any extra bytes left over. It would be perfect if it had a mode that simply unpacked the next value and told you how many bytes it consumed. But it doesn't work that way.

Obviously, we could fork it and add such functionality. But the question for me is...how do other implementations handle this situation? Is it reasonable to expect that someone trying to read a wall or meld file will be able to read a sequence of data that isn't contained in an array (using an array is impractical for several reasons).

Comments?

from recon.

xogeny avatar xogeny commented on August 29, 2024

OK, I found a solution for the Python implementation of msgpack. What you do is use the Unpacker class as an generator, e.g.,

for o in msgpack.Unpacker(fp):
    print o # Do whatever you want to with the object

In this way, you could deserialize the entries in a wall (the main issue, as I see it).

But will this work for crecon, jrecon and/or using msgpack implementations in other languages?

from recon.

xogeny avatar xogeny commented on August 29, 2024

@harmanpa Keep in mind there is one other "remedy" for this issue of having the size change. You always have the option, when writing, of using the longest possible representation in msgpack. That's what I did in OpenModelica. I was too lazy to add all the logic to figure out the most efficient packing so I always stored all integers using 8 bytes (even when they weren't required). So just keep that in mind as one way to simplify things.

from recon.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.