bencodex.net's Introduction

Bencodex codec for .NET

This library implements the Bencodex serialization format, which extends Bencoding.

Usage

It currently provides only the most basic encoder and decoder. See also these methods:

  • Bencodex.Codec.Encode(Bencodex.Types.IValue, System.IO.Stream)
  • Bencodex.Codec.Encode(Bencodex.Types.IValue)
  • Bencodex.Codec.Decode(System.IO.Stream)
  • Bencodex.Codec.Decode(System.Byte[])

It will provide type-extensible higher-level APIs as well in the future.

License

Distributed under LGPL 2.1 or later.

bencodex.net's People

Contributors

dahlia, earlbread, greymistcube, limebell, moreal, onedgelee

bencodex.net's Issues

Implement better support for `IKey` and `IValue` types

Example:

Dictionary dict = Dictionary.Empty;
string rawText = "foo";
long rawInteger = 1L;
Text text = new Text(rawText);
Integer i = new Integer(rawInteger);

dict.Add(rawText, rawInteger);      // 01 Works.
dict.Add(text, rawInteger);         // 02 Works.
dict.Add(rawText, i);               // 03 Doesn't work.
dict.Add(rawText, (IValue)i);       // 04 Works.
dict.Add(text, i);                  // 05 Doesn't work.
dict.Add((IKey)text, i);            // 06 Works.
dict.Add((IKey)text, (IValue)i);    // 07 Works.

List list = List.Empty;
list.Add(rawText);                  // 08 Works.
list.Add(rawInteger);               // 09 Works.
list.Add(text);                     // 10 Doesn't work.
list.Add(i);                        // 11 Doesn't work.

The pattern of which calls work and which don't is very confusing at a glance.
Providing convenience is nice and all, but in my opinion, following POLA (the principle of least astonishment) is more important.
That is, supporting 05, 10, and 11 should take precedence over other cases.
However, as seen in the example, it's entirely the other way around. 🙄

Remove `Fingerprint`

Rationale:

  • This is no longer used for offloading.
  • Overhead of computing an IValue's Fingerprint might be detrimental for small IValues.
    • I'd say there is a good reason why, in general, no such feature is part of the core functionality of generic data types like IValue. 🙄

High-level API to build dictionaries

Currently creating a new Bencodex.Types.Dictionary instance requires lines of code, e.g.:

new Bencodex.Types.Dictionary(new Dictionary<Bencodex.Types.IKey, Bencodex.Types.IValue>
{
    [(Text) "foo"] = (Text) "bar",
    [(Text) "baz"] = new Binary(Guid.NewGuid().ToByteArray()),
    [(Text) "qux"] = (Integer) 123,
})

As it involves unnecessary elements like casts and intermediate objects, I suggest adding a high-level “builder” API. The following are the code examples I propose. Please leave comments on which is better and why; your own proposals would be great too.

Proposal 1. Overloading SetItem(string, T)

Although Bencodex.Types.Dictionary already has a SetItem(IKey, IValue) method, since it implements IImmutableDictionary<IKey, IValue>, SetItem() is not convenient enough for building a long dictionary:

(Bencodex.Types.Dictionary) new Bencodex.Types.Dictionary()
    .SetItem((Text) "foo", (Text) "bar")
    .SetItem((Text) "baz", new Binary(Guid.NewGuid().ToByteArray()))
    .SetItem((Text) "qux", (Integer) 123)

The problem with the above code is that you still need a lot of casts. As the return type of SetItem() is not Bencodex.Types.Dictionary but IImmutableDictionary<IKey, IValue>, you also need to cast the result back to Bencodex.Types.Dictionary.

To work around this inconvenience, I suggest overloading SetItem(string, T) so that it returns Bencodex.Types.Dictionary (where T can be a frequently used type like string or int). With such overloads, the code becomes more concise:

new Bencodex.Types.Dictionary()
    .SetItem("foo", "bar")
    .SetItem("baz", Guid.NewGuid().ToByteArray())
    .SetItem("qux", 123)

(The above code assumes there are 3 overloads: SetItem(string, string), SetItem(string, byte[]), and SetItem(string, int)).

One of the cons is that the method name SetItem still needs to be repeated.

Proposal 2. ToBencodexDictionary(this IDictionary<string, IValue>) extension

The second suggestion is an extension method to convert an ordinary mutable dictionary into Bencodex.Types.Dictionary, in order to leverage C#'s collection initializer, which is unavailable to immutable objects at the moment (this syntax relies on mutability: Add() method).

The idea is simple: you build a mutable dictionary, and make it Bencodex.Types.Dictionary by freezing it. And for convenience, all string keys are converted to Text keys. It would look like:

new Dictionary<string, IValue>
{
    ["foo"] = (Text) "bar",
    ["baz"] = new Binary(Guid.NewGuid().ToByteArray()),
    ["qux"] = (Integer) 123,
}.ToBencodexDictionary()

A con here is that it still requires values to be converted to IValue.

Proposal 3. Declarative serializer using attributes

The last proposal is the most ambitious and also the most difficult to implement. It's basically a similar approach to .NET's standard serialization and Newtonsoft.Json: we provide declarative attributes, and attributed types are automagically serialized to Bencodex.Types.Dictionary:

using Bencodex.Declarative;

[DictionaryEncoded]
class Record
{
    [DictionaryKeyEncoded("foo")]
    public string Foo { get; set; }

    [DictionaryKeyEncoded("baz")]
    public Guid Baz { get; set; }

    [DictionaryKeyEncoded("qux")]
    public int Qux { get; set; }
}
new Encoder().Encode(
    new Record
    {
        Foo = "bar",
        Baz = Guid.NewGuid(),
        Qux = 123,
    }
)

This approach has many pros and cons: it gets rid of a lot of boilerplate code; the attributes can be used for deserialization as well; it can be extended to Bencodex.Types.List or any other Bencodex type as well as Bencodex.Types.Dictionary; it takes time to implement; it involves runtime reflection…

Remove unnecessary `IComparable<T>` and `IEquatable<T>` implementations

I'm convinced some are improperly implemented. For instance, we have

public struct Integer : IEquatable<Integer>, IEquatable<BigInteger>

If I'm not mistaken, there doesn't seem to be any circumstance where IEquatable<BigInteger>.Equals() would be called. From what I understand, IEquatable<T> exists to remove the overhead of boxing and unboxing, mainly for value types. The current codebase does the opposite by casting things around. 😑

For example, when we have a List<Integer> and try to use IndexOf(), we need to override Equals(object?):

public struct Integer
{
    public override bool Equals(object? other)
    {
        // some implementation
    }
}

That is, overriding Equals(object?) is sufficient for correctness, but IndexOf() then calls Integer.Equals(object?) with the comparison target boxed as object. IEquatable<T> exists to avoid this problem. If we have the following instead

public struct Integer : IEquatable<Integer>
{
    bool IEquatable<Integer>.Equals(Integer other)
    {
        // some implementation
        // should be the same as below
    }

    public override bool Equals(object? other)
    {
        // some implementation
    }
}

then calling IndexOf() on a List<Integer> bypasses Equals(object?) and calls Equals(Integer) instead. This is why we have the following rules of thumb:

  • A type T should generally implement IEquatable<T> only for T itself.
  • IEquatable<T> should generally be implemented where T is a struct.
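
The dispatch described above can be observed with a small self-contained sketch. Num is a hypothetical stand-in for Integer, not the library's type, and the two counters exist only to show which overload List<T>.IndexOf() ends up calling.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Bencodex.Types.Integer; not the library's code.
public readonly struct Num : IEquatable<Num>
{
    // Counters used only to observe which overload gets called.
    public static int TypedCalls;
    public static int BoxedCalls;

    private readonly long _value;

    public Num(long value) => _value = value;

    // Strongly typed overload: the argument is not boxed.
    public bool Equals(Num other)
    {
        TypedCalls++;
        return _value == other._value;
    }

    // Object overload: a struct argument has to be boxed to reach this.
    public override bool Equals(object? obj)
    {
        BoxedCalls++;
        return obj is Num other && _value == other._value;
    }

    public override int GetHashCode() => _value.GetHashCode();
}

public static class Program
{
    public static void Main()
    {
        var list = new List<Num> { new Num(1), new Num(2), new Num(3) };

        // List<T>.IndexOf() compares through EqualityComparer<T>.Default,
        // which uses IEquatable<T>.Equals(T) when T implements it.
        int index = list.IndexOf(new Num(3));

        Console.WriteLine($"index = {index}"); // index = 2
        Console.WriteLine($"typed calls = {Num.TypedCalls}, boxed calls = {Num.BoxedCalls}");
    }
}
```

In this sketch the boxed counter should stay at zero, because the default comparer picks the typed overload. If the IEquatable<Num> implementation is removed, the same lookup goes through Equals(object?) instead.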

Fix a bug where an unordered encoded "dictionary" can be decoded.

It seems there might be a bug where encoded "dictionary" data with unordered keys can be decoded into a Dictionary.

As per the specification, d3:bar4:spam3:fooi42ee is valid but d3:fooi42e3:bar4:spame is not, as keys are supposed to be lexicographically ordered. However, it seems that the encoded byte array d3:fooi42e3:bar4:spame is decoded as if it were d3:bar4:spam3:fooi42ee, because Decoder sorts all the KeyValuePairs it has decoded before creating a Dictionary. Decoder should not sort the KeyValuePairs; it should only check whether they are already sorted.

On the flip side, while fixing the issue, we might be able to further optimize Dictionary decoding. 🙃
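
A minimal sketch of "check, don't sort" follows. It assumes the decoder has already collected the keys as raw byte arrays in the order they appeared, and that all keys are of the same kind (byte strings, as in the example above). KeyOrder and its methods are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class KeyOrder
{
    // Bytewise lexicographic comparison; a proper prefix sorts first.
    public static int CompareBytes(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i])
            {
                return a[i].CompareTo(b[i]);
            }
        }

        return a.Length.CompareTo(b.Length);
    }

    // True only if every key is strictly greater than the one before it,
    // which also rejects duplicate keys.
    public static bool IsStrictlyAscending(IReadOnlyList<byte[]> keys)
    {
        for (int i = 1; i < keys.Count; i++)
        {
            if (CompareBytes(keys[i - 1], keys[i]) >= 0)
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        byte[] bar = Encoding.UTF8.GetBytes("bar");
        byte[] foo = Encoding.UTF8.GetBytes("foo");

        Console.WriteLine(IsStrictlyAscending(new[] { bar, foo })); // True
        Console.WriteLine(IsStrictlyAscending(new[] { foo, bar })); // False
    }
}
```

A single linear pass like this costs O(n) comparisons, whereas sorting costs O(n log n), which is one way the fix could also make decoding cheaper.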

`Null.Value`: the singleton value for `Null`

There are currently two ways to get a Bencodex.Types.Null value:

  1. new Null(): to explicitly call the default constructor
  2. default(Null): to apply default() operator

However, since C# 8 the two expressions have different types due to nullable reference types: the type of new Null() is Null, while that of default(Null) is Null?. If the target type is IValue rather than IValue?, default(Null) is no longer usable.

On the other hand, explicitly calling the default constructor triggers StyleCop's SA1129, so it is not usable either.

To work around these problems, Null should provide an official way to get the singleton value, in the same manner as string.Empty. IMHO Null.Value would be fine.
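
As a rough illustration of the idea, the following self-contained sketch uses hypothetical stand-ins for IValue and Null. Modelling Null as a struct is an assumption made here for illustration; the real type may be declared differently.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for Bencodex.Types.IValue.
public interface IValue
{
}

// Hypothetical stand-in for Bencodex.Types.Null (assumed to be a struct here).
public readonly struct Null : IValue
{
    // The one official way to obtain the value, analogous to string.Empty.
    public static readonly Null Value = default;
}

public static class Program
{
    public static void Main()
    {
        // Callers no longer write new Null() or default(Null) themselves.
        IValue value = Null.Value;
        Console.WriteLine(value is Null); // True
    }
}
```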

Change `Dictionary`'s internal data structure to trie

Currently Dictionary is a thin wrapper around ImmutableDictionary<K, V> (data-structure-wise). However, it does not share common data between copies, which is highly inefficient both space-wise and time-wise. I believe the memory consumption of Bencodex can be significantly reduced by changing its internal data structure from a naive hash map to a trie that shares common data between instances. As it would require much less memcpy, I expect it to be even faster than it is now.

Fix `GetHashCode()` for `List` and `Dictionary`

See planetarium/libplanet#2518.

We get the following result, which I don't think is the expected behavior.

HashSet<Bencodex.Types.List> hashSet = new HashSet<Bencodex.Types.List>();
hashSet.Add(Bencodex.Types.List.Empty.Add(1));
hashSet.Add(Bencodex.Types.List.Empty.Add(1));
Console.WriteLine(hashSet.Count);   // Outputs 2.
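
For comparison, here is a sketch of the content-based behavior one would probably expect. ValueList is a hypothetical stand-in that holds long values, not Bencodex.Types.List itself, and System.HashCode assumes .NET Core 2.1 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Bencodex.Types.List; not the library's code.
public sealed class ValueList : IEquatable<ValueList>
{
    public static readonly ValueList Empty = new ValueList(Array.Empty<long>());

    private readonly long[] _items;

    private ValueList(long[] items) => _items = items;

    // Returns a new list; the receiver is left unchanged.
    public ValueList Add(long item) =>
        new ValueList(_items.Concat(new[] { item }).ToArray());

    public bool Equals(ValueList? other) =>
        !(other is null) && _items.SequenceEqual(other._items);

    public override bool Equals(object? obj) => Equals(obj as ValueList);

    // Derived from the contents, so equal lists always hash equally.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (long item in _items)
        {
            hash.Add(item);
        }

        return hash.ToHashCode();
    }
}

public static class Program
{
    public static void Main()
    {
        var hashSet = new HashSet<ValueList>();
        hashSet.Add(ValueList.Empty.Add(1));
        hashSet.Add(ValueList.Empty.Add(1));
        Console.WriteLine(hashSet.Count); // 1
    }
}
```

The point is that Equals() and GetHashCode() must agree: two instances that compare equal have to produce the same hash code, otherwise hash-based collections treat them as distinct.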

Use not-so-generic names for types.

In my opinion, names such as Binary, List, and Text are too generic and prone to naming collisions. It might not be much of a problem within this project, but since this is a low-level library, it is quite possible that higher-level applications already use names like List and Text. Two main problems arise frequently when there are two Text types, such as Bencodex.Types.Text and some App.Text:

  • It isn't immediately apparent which Text type is being used at a glance. Yes, in most cases, it can be derived from context and IDEs help, but this still is unnecessarily confusing.
  • It is also likely that both Text types are needed in a single source file, since the whole point of Bencodex.Types.Text is to encode a string or an App.Text, in which case either aliasing or fully qualified names are needed.

Honestly, I think it'd be better to name such types as BList, BText, etc. like here. There are pros and cons but I'd say overall the benefits outweigh the costs. As we have no control over the naming scheme in end-user's codebase, we should do as much as possible to avoid name collisions happening in the first place.

Change `Text` from `struct` to `class`

Might be a personal preference, but there is no strong reason other than that string is a class in C#. 🙄
Also, I'd say that needing extra nullable handling internally, because a struct cannot have a default initializer, defeats the whole purpose of reducing overhead. 😶

ci(gh-actions): deploy NuGet package with timestamp suffix

Currently, the GitHub Actions workflow deploys the NuGet package with a Git commit hash suffix (e.g., 0.3.0-dev.d6bdbb581c5d9cd024c0b428e1c9404cdf671b75).

NuGet Gallery orders versions by name, not by upload time, so users have trouble finding the latest package.

So it should use a timestamp as its suffix (e.g., 0.3.0-dev.20210714071711) and move the commit hash to the build metadata section (e.g., 0.3.0-dev.20210714071711+d6bdbb581c5).
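
The proposed version scheme can be sketched as a small helper. DevVersion.Build is a hypothetical name, and the 11-character hash abbreviation simply mirrors the example above. The actual workflow would more likely compute this in a build script.

```csharp
using System;
using System.Globalization;

public static class DevVersion
{
    // Builds "<base>-dev.<UTC timestamp>+<abbreviated commit hash>".
    public static string Build(string baseVersion, DateTimeOffset builtAt, string commitHash)
    {
        // A fixed-width UTC timestamp sorts the same way lexically and chronologically.
        string timestamp = builtAt.UtcDateTime.ToString(
            "yyyyMMddHHmmss", CultureInfo.InvariantCulture);

        // The hash moves to build metadata, after '+'.
        string shortHash = commitHash.Substring(0, Math.Min(11, commitHash.Length));

        return $"{baseVersion}-dev.{timestamp}+{shortHash}";
    }

    public static void Main()
    {
        var builtAt = new DateTimeOffset(2021, 7, 14, 7, 17, 11, TimeSpan.Zero);
        string version = Build(
            "0.3.0", builtAt, "d6bdbb581c5d9cd024c0b428e1c9404cdf671b75");

        Console.WriteLine(version); // 0.3.0-dev.20210714071711+d6bdbb581c5
    }
}
```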

Equality behavior of `IValue`

Preamble

Although the example given here is explicitly for Integer, this applies to other IValue types in general.

Variables

int i = 2;
long l = 2;
BigInteger b = new BigInteger(2);
Integer x = new Integer(2);
object oi = (object)i;
object ol = (object)l;
object ob = (object)b;
object ox = (object)x;

Notes

  • X: Syntax error
  • ❕: C# language quirk
  • ❗: Non-intuitive behavior / Changed from original

Current Behavior

Table for ==

      i     l     b     x
i     T     T     T     X❗
l     T     T     T     X❗
b     T     T     T     T
x     X❗   X❗   T     T

Table for Equals()

      i     l     b     x     oi    ol    ob    ox
i     T     F❕   F❕   T     T     F     F     F
l     T     T     F❕   T     F     T     F     F
b     T     T     T     T     F     F     T     F
x     F❗   F❗   T     T     F     F     T❗   T
oi    T     F     F     F     T     F     F     F
ol    F     T     F     F     F     T     F     F
ob    F     F     T     F     F     F     T     F
ox    F     F     T❗   T     F     F     T❗   T

Proposal

Table for ==

      i     l     b     x
i     T     T     T     T❗
l     T     T     T     T❗
b     T     T     T     T
x     T❗   T❗   T     T

Table for Equals()

      i     l     b     x     oi    ol    ob    ox
i     T     F❕   F❕   T     T     F     F     F
l     T     T     F❕   T     F     T     F     F
b     T     T     T     T     F     F     T     F
x     T❗   T❗   T     T     F     F     F❗   T
oi    T     F     F     F     T     F     F     F
ol    F     T     F     F     F     T     F     F
ob    F     F     T     F     F     F     T     F
ox    F     F     F❗   T     F     F     F❗   T

Add `List.From<T>` factory method taking `IEnumerable<T> where T : IValue`

The current signature of the Bencodex.Types.List(IEnumerable<IValue>) constructor requires frequent type casting to IEnumerable<IValue>, because C# does not automatically convert G<T> where T : B to G<B> in this case. For example, the following code fails to build:

int[] ints = { 1, 2, 3, 4 };
IEnumerable<Bencodex.Types.Integer> bInts =
    ints.Select(i => (Bencodex.Types.Integer)i);
var list = new Bencodex.Types.List(bInts);  // Type error

It's because IEnumerable<Bencodex.Types.Integer> cannot be passed as IEnumerable<IValue>, even though Bencodex.Types.Integer implements IValue (generic variance in C# applies only to reference types, and Integer is a struct). Instead, we need to explicitly cast bInts so that the code compiles:

var list = new Bencodex.Types.List(bInts.Cast<IValue>());

This can be fixed by changing the parameter type from IEnumerable<IValue> to IEnumerable<T> where T : IValue. Unfortunately, C# does not allow constructors to have their own type parameters. Therefore, we need to add a separate factory method instead:

public static List From<T>(IEnumerable<T> values)
    where T : IValue;

Or, it might be better to have extension methods on IEnumerable<T> where T : IValue instead:

public static class EnumerableExtensions
{
    public static Bencodex.Types.List ToBencodexList<T>(this IEnumerable<T> source)
        where T : IValue;
    public static Bencodex.Types.Dictionary ToBencodexDictionary<TKey, TValue>(
        this IEnumerable<KeyValuePair<TKey, TValue>> source
    )
        where TKey : IKey
        where TValue : IValue;
}
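
The factory-method idea can be tried out in isolation with the sketch below. IValue, Integer, and BList are hypothetical stand-ins declared locally so that the example is self-contained; they are not the library's types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Bencodex.Types.IValue.
public interface IValue
{
}

// Hypothetical stand-in for Bencodex.Types.Integer (a struct, as in the issue).
public readonly struct Integer : IValue
{
    public Integer(long value) => Value = value;

    public long Value { get; }
}

// Hypothetical stand-in for Bencodex.Types.List.
public sealed class BList
{
    private readonly IValue[] _items;

    public BList(IEnumerable<IValue> items) => _items = items.ToArray();

    public int Count => _items.Length;

    // Generic factory: accepts any IEnumerable<T> whose T implements IValue,
    // so callers do not have to write Cast<IValue>() themselves.
    public static BList From<T>(IEnumerable<T> values)
        where T : IValue
        => new BList(values.Select(v => (IValue)v));
}

public static class Program
{
    public static void Main()
    {
        int[] ints = { 1, 2, 3, 4 };
        IEnumerable<Integer> bInts = ints.Select(i => new Integer(i));

        // new BList(bInts) would not compile here; From<T>() does.
        BList list = BList.From(bInts);
        Console.WriteLine(list.Count); // 4
    }
}
```

The type argument T is inferred at the call site, so the call reads the same as a constructor call would.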
