
simc commented on June 9, 2024

Hey @Manuelbaun,

Flatbuffers is a great library, but I did a lot of research (basically trial and error) when writing the Hive binary encoder. My goal was to make it as fast as possible. It turned out that some of the things Flatbuffers does (for example varints) might be okay in other languages but are quite slow in Dart.

I managed to achieve three important goals with the Isar binary implementation:

  1. To read a primitive, only a single lookup in the buffer is required. To read Strings or Lists, two lookups are needed.
  2. Serialization in Dart only requires a single allocation and no buffer resizing is required.
  3. Switching between nullable and non-nullable fields does not require migration.

> The main benefit I see is that it does not require to deserialize the buffer and can read the values out of it directly.

My implementation also does not require deserialization and not even a vtable lookup.

> There are implementations of it in various languages also Dart and TypeScript

Isar-web will not need to deal with serialization. I'll let IndexedDB do the heavy lifting ;)

Edit:

Here are samples of the generated serialize and deserialize methods:

@Bank()
class User extends IsarObject {
  String name;

  int age;

  bool isCustomer;
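}

  // Generated serializer: the total size is known before allocating,
  // so a single allocation is enough.
  void serialize(User object, RawObject raw) {
    var dynamicSize = 0;
    var nameBytes = utf8Encoder.convert(object.name);
    dynamicSize += nameBytes.length;
    // 17 = static size: 8 (int) + 1 (bool) + 8 (String offset & length)
    var bufferSize = 17 + dynamicSize;
    var ptr = allocate<Uint8>(count: bufferSize);
    var buffer = ptr.asTypedList(bufferSize);
    var writer = BinaryWriter(buffer, 17);

    // Fields are written in sorted order, not declaration order.
    writer.writeInt(object.age);
    writer.writeBool(object.isCustomer);
    writer.writeBytes(nameBytes);

    raw.oid = object.id;
    raw.data = ptr;
    raw.length = bufferSize;
  }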

  User deserialize(RawObject raw) {
    var buffer = raw.data.asTypedList(raw.length);
    var reader = BinaryReader(buffer);
    var object = User();
    object.age = reader.readIntOrNull();
    object.isCustomer = reader.readBoolOrNull();
    object.name = reader.readStringOrNull();
    return object;
  }

The line "var bufferSize = 17 + dynamicSize" is why Isar only needs a single allocation. It knows the static size of the object up front, 8 (int) + 1 (bool) + 8 (String offset & length) = 17 bytes, and it calculates the dynamic size before allocating.
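The size arithmetic above can be sketched in plain Dart without the Isar internals. This is only an illustration, not the generated code: the User class here is a simplified stand-in, and the split of the 8 string bytes into a 4-byte offset followed by a 4-byte length is an assumption, as is the little-endian byte order.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Simplified stand-in for the User class above (no IsarObject base).
class User {
  String name = '';
  int age = 0;
  bool isCustomer = false;
}

// Static section: 8 (int) + 1 (bool) + 8 (String offset & length) = 17 bytes.
const staticSize = 8 + 1 + 8;

Uint8List serialize(User user) {
  final nameBytes = utf8.encode(user.name);
  // The total size is known before allocating, so one buffer is enough
  // and it never has to grow.
  final buffer = Uint8List(staticSize + nameBytes.length);
  final data = ByteData.sublistView(buffer);

  data.setInt64(0, user.age, Endian.little);
  data.setUint8(8, user.isCustomer ? 1 : 0);
  // Assumed layout: 4-byte offset, then 4-byte length.
  data.setUint32(9, staticSize, Endian.little);
  data.setUint32(13, nameBytes.length, Endian.little);
  // Dynamic data goes right after the static section.
  buffer.setRange(staticSize, buffer.length, nameBytes);
  return buffer;
}

User deserialize(Uint8List buffer) {
  final data = ByteData.sublistView(buffer);
  // A primitive needs one lookup; a String needs the offset/length
  // lookup plus the read of the bytes themselves.
  final offset = data.getUint32(9, Endian.little);
  final length = data.getUint32(13, Endian.little);
  return User()
    ..age = data.getInt64(0, Endian.little)
    ..isCustomer = data.getUint8(8) == 1
    ..name = utf8.decode(buffer.sublist(offset, offset + length));
}
```

Every static field sits at a fixed position, so a reader can jump straight to it without a vtable. Note that setInt64/getInt64 work on the Dart VM but not when compiled to JavaScript.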

from isar.

Manuelbaun commented on June 9, 2024

I did a simple comparison between the Hive binary_writer/reader, flatbuffers and message_pack. The results can be seen in the readme.md of my repo I created: https://github.com/Manuelbaun/serializer_libs_comparison. Before I ran those tests, I compiled them with dart2native on my Windows machine.

As far as I can tell, I did not really see, that flatbuffer was slower than the hive encoder/decoder. But again, this is not the Isar encoder/decoder. Honestly, I am not sure if that actually reflects real use cases.

I was looking into your binary writer implementation, and it got simpler compared to the Hive implementation. So you got rid of the field numbers completely?

How do you sort your fields? In the example, your class starts with String name, but your serialize method writes it last.

I like to contribute to this project somehow, let me know, how I can help you :)


simc commented on June 9, 2024

> The results can be seen in the readme.md of my repo I created

Very cool! Thanks for comparing the libraries.

> As far as I can tell, I did not really see, that flatbuffer was slower than the hive encoder/decoder

I'll need some time to check out your findings and the reason for the results.

> So you got rid of the field numbers completely?

Yes, the Isar binary format does not have much in common with the Hive format. Even tho the classes look similar. Field numbers are no longer needed since the schema is known at compile time.

> How do you sort your fields?

First by type: int, double, bool, string, bytes, List ... this ensures that fields with static size are first. Multiple fields with the same type are additionally sorted by name. This ensures a reproducible order.
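As a rough illustration of that ordering rule, here is a minimal Dart sketch. The Field class, the typeOrder list and sortFields are hypothetical names made up for this example, and representing types as strings is a simplification. It is not how the Isar generator is actually written.

```dart
/// Hypothetical description of one model field.
class Field {
  final String name;
  final String type;
  const Field(this.name, this.type);
}

// Type ranks in the order listed above; fixed-size types come first.
const typeOrder = ['int', 'double', 'bool', 'String', 'bytes', 'List'];

/// Sorts by type rank first, then by name, so the order is reproducible
/// regardless of declaration order. Types missing from typeOrder are not
/// handled in this sketch.
List<Field> sortFields(List<Field> fields) {
  final sorted = List<Field>.of(fields);
  sorted.sort((a, b) {
    final byType =
        typeOrder.indexOf(a.type).compareTo(typeOrder.indexOf(b.type));
    return byType != 0 ? byType : a.name.compareTo(b.name);
  });
  return sorted;
}
```

For the User class from earlier in the thread, this yields age, isCustomer, name, which matches the write order in the generated serialize method.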

> I like to contribute to this project somehow, let me know, how I can help you

Do you know Rust or TypeScript? Otherwise I'll need some time to finish the basics. After that PRs will be very welcome :)


simc commented on June 9, 2024

> I am not sure if sorting by name will work if you refactor your models.

Isar will know that the schema changed when the app is started and migrates all the data to the new schema.

> but TypeScript

Great, I need help developing isar-web. There are some challenges to solve. Especially because the IndexedDB transactions auto commit. It will be tricky to make them usable with Dart Futures.


Manuelbaun commented on June 9, 2024

> I'll need some time to check out your findings and the reason for the results.

Yes, please, and let me know what you find. I might not have considered some cases, or my assumptions about how it works might be wrong. I'm curious what you find out.

> Yes, the Isar binary format does not have much in common with the Hive format. Even tho the classes look similar. Field numbers are no longer needed since the schema is known at compile time.

Yes, that makes sense. It also gets rid of a few extra bytes => smaller => faster 👍

> First by type: int, double, bool, string, bytes, List ... this ensures that fields with static size are first. Multiple fields with the same type are additionally sorted by name. This ensures a reproducible order.

I am not sure if sorting by name will work if you refactor your models. Assume you got the class:

class MyZooVersion1 {
  int id;
  String dog;
}

then you extend it into:

class MyZooVersion2 {
  int someId;
  String dog;
  String cat;
}

When you add a cat to your class, cat will be sorted first, then dog. When you read your old buffer with the new version, the cat will have the name of the dog, while the dog will be something else (null?). Is my assumption correct?

> Do you know Rust or TypeScript?

I don't know Rust, but TypeScript. Eventually, I will learn Rust.


Manuelbaun commented on June 9, 2024

> Isar will know that the schema changed when the app is started and migrates all the data to the new schema.

ok that is good to know

> Great, I need help developing isar-web. There are some challenges to solve. Especially because the IndexedDB transactions auto commit. It will be tricky to make them usable with Dart Futures.

I will fork it first. Could you point me to where I should start looking? I've never done anything with IndexedDB :)

Edit:
I just saw, the isar-web repo is empty 😆


simc commented on June 9, 2024

> I just saw, the isar-web repo is empty

Haha yes, it needs A LOT of work ^^


MarcelGarus commented on June 9, 2024

> Isar will know that the schema changed when the app is started and migrates all the data to the new schema.

Whoa, that is impressive. How does that work?
Let's say I rename a field from "foo" to "bar". I see that JSON is generated from the class, so are the two versions somehow compared during code generation? And what are the heuristics for determining a migration strategy? Seems super interesting.


simc commented on June 9, 2024

> Whoa, that is impressive. How does that work?

Okay sorry I have not been clear and it is not as cool as it sounds xD

It just notices that you add or remove a field and transforms the existing data to be valid again e.g. it adds or removes the field everywhere.


simc commented on June 9, 2024

Because of #2 we might be forced to use FlexBuffers anyway.


MarcelGarus commented on June 9, 2024

> It just notices that you add or remove a field and transforms the existing data to be valid again e.g. it adds or removes the field everywhere.

Ah, I see 😅 Still cool


Manuelbaun commented on June 9, 2024

> Because of #2 we might be forced to use FlexBuffers anyway.

I have not tried out FlexBuffers yet. I am curious how they compare to plain Flatbuffers.
But I was looking a bit more into the Flatbuffer generated code:

  String get house => const fb.StringReader().vTableGet(_bc, _bcOffset, 6, null);
  Actor get playedBy => Actor.reader.vTableGet(_bc, _bcOffset, 8, null);
  int get age => const fb.Int32Reader().vTableGet(_bc, _bcOffset, 10, 0);
  String get firstSeen => const fb.StringReader().vTableGet(_bc, _bcOffset, 12, null);

To access an attribute from the buffer, you call the Reader and then its vTableGet method. The problem I see is that vTableGet converts the value from the buffer on every access of the class attribute.

This is the vTableGet method of the Reader class:

  T vTableGet(BufferContext object, int offset, int field, [T defaultValue]) {
    int vTableSOffset = object._getInt32(offset);
    int vTableOffset = offset - vTableSOffset;
    int vTableSize = object._getUint16(vTableOffset);
    int vTableFieldOffset = field;
    if (vTableFieldOffset < vTableSize) {
      int fieldOffsetInObject =
          object._getUint16(vTableOffset + vTableFieldOffset);
      if (fieldOffsetInObject != 0) {
        return read(object, offset + fieldOffsetInObject);
      }
    }
    return defaultValue;
  }

If you want to access an attribute of a class multiple times, it would make sense to cache it in the generated class. Essentially I would do this:

  String _name;
  String get name =>
      _name ??= const fb.StringReader().vTableGet(_bc, _bcOffset, 4, null);


Manuelbaun commented on June 9, 2024

I was looking at ObjectBox and, as far as I understand, you provide the FlatBuffers IDL file and it generates the database internals (tables, etc.) for you. Do you think that is also an option worth looking at?


simc commented on June 9, 2024

FlexBuffers allows schemaless objects. This will be required if we decide to use some kind of p2p sync. Imagine the following objects of two peers with different app versions:

class UserV1 extends IsarObject {
  String name;
}
class UserV2 extends IsarObject {
  String name;

  int age;
}

If the two peers synchronize, the older one would drop the age field when the data is stored in its local database. Once the v1 peer updates to v2, the existing data will still be missing the age field. To process future state updates like "increment age by 1", age must already have a value, which it does not.
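The difference can be sketched with plain Dart maps standing in for a schemaless record. This is only an illustration under stated assumptions: it does not use FlexBuffers at all, the UserV1 here is a simplified stand-in without IsarObject, and storeTyped and storeSchemaless are hypothetical names.

```dart
/// Simplified stand-in for the v1 model: it only knows about name.
class UserV1 {
  final String name;
  UserV1(this.name);

  factory UserV1.fromMap(Map<String, Object?> map) =>
      UserV1(map['name'] as String);

  Map<String, Object?> toMap() => {'name': name};
}

/// Schema-bound storage: the record passes through the v1 class,
/// so any field v1 does not know about is lost.
Map<String, Object?> storeTyped(Map<String, Object?> incoming) =>
    UserV1.fromMap(incoming).toMap();

/// Schemaless storage: the record is kept as received, and v1 simply
/// ignores the fields it does not understand when reading.
Map<String, Object?> storeSchemaless(Map<String, Object?> incoming) =>
    Map<String, Object?>.of(incoming);
```

With a record such as {'name': 'Ann', 'age': 30} coming from a v2 peer, storeTyped keeps only name, while storeSchemaless still holds age when the v1 peer later updates to v2.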


Manuelbaun commented on June 9, 2024

I added FlexBuffers to my benchmark. So far I only got the Object Reference working. When I manually encoded an object via FlexBuffers, using Reference.fromBuffer() did not work.

Runs 1000:

Hive                             Types.encode        70 bytes :     19.707 average ticks
Hive                             Types.decode        70 bytes :     18.647 average ticks
Flatbuffers objectBuilder        Types.encode       160 bytes :     26.845 average ticks
Flatbuffers objectBuilder        Types.decode       160 bytes :      1.517 average ticks
Flatbuffers buffersBuilder       Types.encode       160 bytes :     29.244 average ticks
Flatbuffers buffersBuilder       Types.decode       160 bytes :      1.289 average ticks
Message Pack with toMap Json     Types.encode       123 bytes :     53.601 average ticks
Message Pack with fromMap Json   Types.decode       123 bytes :     24.704 average ticks
Message Pack map, int keys       Types.encode        67 bytes :     21.645 average ticks
Message Pack map, int keys       Types.decode        67 bytes :     14.965 average ticks
FlexBuffers build from Object    Types.encode       200 bytes :     83.324 average ticks
FlexBuffers Reference Object     Types.decode       200 bytes :      1.965 average ticks
FlexBuffers build into Vector    Types.encode       130 bytes :     41.927 average ticks
FlexBuffers build into Map       Types.encode       154 bytes :     59.077 average ticks

If a schemaless serializer is needed, I think MessagePack is the better option at the moment. It produces a much smaller binary and also encodes faster in the benchmark above. It has also been ported to many different languages.

The classes could then use a toMap function that uses int keys instead of String keys. FlexBuffers does not support this at the moment (not sure if it will), but MessagePack does.

class UserV1 extends IsarObject {
  String name;

  Map<int, dynamic> toMap() {
    return {
      0: name,
    };
  }

  static UserV1 fromMap(Map<int, dynamic> map) {
    if (map == null) return null;

    return UserV1()..name = map[0];
  }
}
class UserV2 extends IsarObject {
  String name;
  int age;

  Map<int, dynamic> toMap() {
    return {
      0: name,
      1: age,
    };
  }

  static UserV2 fromMap(Map<int, dynamic> map) {
    if (map == null) return null;

    return UserV2()
      ..name = map[0]
      ..age = map[1];
  }
}

and the only requirement would be, as always, to not change the field numbers.


simc commented on June 9, 2024

This is obsolete since Isar has some special requirements that need a custom format.
