csharptest / csharptest.net.collections Goto Github PK

View Code? Open in Web Editor NEW

107.0 13.0 48.0 429 KB

BPlusTree and other collections moved from http://code.google.com/p/csharptest-net

C# 100.00%

csharptest.net.collections's Introduction

CSharpTest.Net.Collections

CSharpTest.Net.Collections (moved from https://code.google.com/p/csharptest-net/)

Change Log

2014-09-06 Initial clone and extraction from existing library.

Online Help

BPlusTree Help: http://help.csharptest.net/?CSharpTest.Net.BPlusTree~CSharpTest.Net.Collections.BPlusTree%602.html

Quick start

LurchTable Example

//Example producer/consumer queue where producer threads help when queue is full
using (var queue = new LurchTable<string, int>(LurchTableOrder.Insertion, 100))
{
    var stop = new ManualResetEvent(false);
    queue.ItemRemoved += kv => Console.WriteLine("[{0}] - {1}", Thread.CurrentThread.ManagedThreadId, kv.Key);
    //start some threads eating queue:
    var thread = new Thread(() => { while (!stop.WaitOne(0)) queue.Dequeue(); })
        { Name = "worker", IsBackground = true };
    thread.Start();

    var names = Directory.GetFiles(Path.GetTempPath(), "*", SearchOption.AllDirectories);
    if (names.Length <= 100) throw new Exception("Not enough trash in your temp dir.");
    var loops = Math.Max(1, 10000/names.Length);
    for(int i=0; i < loops; i++)
        foreach (var name in names)
            queue[name] = i;

    stop.Set();
    thread.Join();
}

BPlusTree Example

var options = new BPlusTree<string, DateTime>.OptionsV2(PrimitiveSerializer.String, PrimitiveSerializer.DateTime);
options.CalcBTreeOrder(16, 24);
options.CreateFile = CreatePolicy.Always;
options.FileName = Path.GetTempFileName();
using (var tree = new BPlusTree<string, DateTime>(options))
{
    var tempDir = new DirectoryInfo(Path.GetTempPath());
    foreach (var file in tempDir.GetFiles("*", SearchOption.AllDirectories))
    {
        tree.Add(file.FullName, file.LastWriteTimeUtc);
    }
}
options.CreateFile = CreatePolicy.Never;
using (var tree = new BPlusTree<string, DateTime>(options))
{
    var tempDir = new DirectoryInfo(Path.GetTempPath());
    foreach (var file in tempDir.GetFiles("*", SearchOption.AllDirectories))
    {
        DateTime cmpDate;
        if (!tree.TryGetValue(file.FullName, out cmpDate))
            Console.WriteLine("New file: {0}", file.FullName);
        else if (cmpDate != file.LastWriteTimeUtc)
            Console.WriteLine("Modified: {0}", file.FullName);
        tree.Remove(file.FullName);
    }
    foreach (var item in tree)
    {
        Console.WriteLine("Removed: {0}", item.Key);
    }
}

csharptest.net.collections's People

Contributors

Stargazers

Watchers

csharptest.net.collections's Issues

Atomic update

First of all, I would like to thank you for sharing such an awesome work with us. Although it took me a while, but eventually I succeeded in adapting it with my work.

Based on my work scenario, I have quite a lot of updates. Checking the update functions, it seems to me that "TryUpdate(TKey key, KeyValueUpdate<TKey, TValue> fnUpdate)" is a better solution. However, I'm not sure I know how to run a proper function for KeyValueUpdate.

I would sincerely appreciate if you could share an example with us.

Reverse Iteration

Hi, I should say: BPlusTree has lots of quite utile features specially for iterations.
Features such as: tree.EnumerateFrom, tree.EnumerateRange, or tree.GetEnumerator().
Leveraging on these functions most of common operations are supported out-of-box, however I was wondering if there is any possibility to do reverse iteration. For example a variation of tree.EnumerateFrom which starts iterating from given key toward BEGINNING of tree rather END. Maybe this aspect is covered, in that case I would appreciate if you could explain it. Otherwise, would you mind please guide me through ?

Reverse iteration is indeed useful when trying to perform some operations on key's surrounding the given key.

Adoption/Support?

I'm interested in potentially adopting this as part of BrightChain if you're no longer interested in maintaining it. But what lead you to abandon it? I'd love to keep your help, but would be happy to try and help you out as your code is ideal for what I'm doing. Please let me know if I can help monetarily or with labor.

BPlusTree: Meaningless AssertionFailedException if the lock was not acquired

If the CallLevel lock cannot be acquired due to timeout then the Assertion is triggered. In this case it would be better to throw some specific exception derived from TimeoutException, so the user code can react appropriately.

There are a couple of lines in the constructor of RootLock:

_locked = _exclusive ? _tree._selfLock.TryWrite(tree._options.LockTimeout) :   _tree._selfLock.TryRead(tree._options.LockTimeout);
Assert(_locked);

Instead I suggest to do this:

_locked = _exclusive ? _tree._selfLock.TryWrite(tree._options.LockTimeout) : _tree._selfLock.TryRead(tree._options.LockTimeout);
if (!_locked)
    throw new BPlusTreeTimeoutException("...");

BPlusTree: CallLevel lock is never released in the case of exception

There is a possibility that CallLevel lock would never be released.
And this is really happening if LockTimeout expires in NodeCache.

Here is the code in the constructor of RootLock:

_locked = _exclusive ? _tree._selfLock.TryWrite(tree._options.LockTimeout) : _tree._selfLock.TryRead(tree._options.LockTimeout);
Assert(_locked);
Pin = _tree._storage.LockRoot(type);

If an exception is thrown in '_tree._storage.LockRoot(type)' then '_tree._selfLock' will never be released.
So the code should be enclosed by try..catch:

try
{
    Pin = _tree._storage.LockRoot(type);
}
catch
{
    if (_exclusive)
       _tree._selfLock.ReleaseWrite();
    else
       _tree._selfLock.ReleaseRead();
    throw;
}

Does LurchTable depend on the implementation of new Guid(byte[]) and guid.ToByteArray()?

I am upgrading my project to .NET Core 2.0 and discovered that the LurchTable tests have a method that depends on the implementation of new Guid(byte[]) and guid.ToByteArray():

        private static Random random = new Random();
        private static int iCounter = 0x01010101;

        public static Guid NextHashCollision(Guid guid)
        {
            var bytes = guid.ToByteArray();

            // Modify bytes 8 & 9 with random number
            Array.Copy(
                BitConverter.GetBytes((short)random.Next()),
                0,
                bytes,
                8,
                2
            );

            // Increment bytes 11, 12, 13, & 14
            Array.Copy(
                BitConverter.GetBytes(
                    BitConverter.ToInt32(bytes, 11) +
                    Interlocked.Increment(ref iCounter)
                    ),
                0,
                bytes,
                11,
                4
            );

            Guid result = new Guid(bytes);
#if !NETCOREAPP2_0
            Assert.AreEqual(guid.GetHashCode(), result.GetHashCode());
#endif
            return result;
        }

The assert fails in .NET Core 2.0 because the underlying implementation has changed. What I am wondering is if there is some dependency of LurchTable on the Guid generation algorithm, or if this is just for testing? What reference you were following to come up with this logic?

Removing the offending assert and the TestNextHashCollision test seems to have no effect on the results of other tests, but I just wanted to be sure there isn't anything special about the design of LurchTable that relies on the Guid creation algorithm.

Possible bug in IOStream

Hi,
Just checking the code, I see on https://github.com/csharptest/CSharpTest.Net.Collections/blob/master/src/CSharpTest.Net.Collections/IO/IOStream.cs#L68 that you are initializing bytesRead with 0. I believe the correct value should be "offset". Without that, the offset argument is meaningless in the context of that method.

Enumeration out of sequence.

   IEnumerable<KeyValuePair<UInt32, ComboNode>> Values( ComboNode[] nodes ) {
      foreach( ComboNode cn in nodes )
        yield return new KeyValuePair<UInt32, ComboNode>( cn.id, cn );
    }

IEnumerable<KeyValuePair<UInt32, ComboNode>> bulkPairs = Combinations.Values( nodes.ToArray() );

// >>>> Enumeration out of sequence.    in next block
int cnt= combos.data.BulkInsert( bulkPairs, new BulkInsertOptions{
                    ReplaceContents = true,
                    InputIsSorted = true,
                    CommitOnCompletion = true,
                    DuplicateHandling = DuplicateHandling.RaisesException,
                } );

LurchTable Insertion Order Not Working

You have advertised LurchTable as a replacement for LinkedHashMap. Per this answer on StackOverflow:

LinkedHashMap will iterate in the order in which the entries were put into the map

Basically, the primary purpose of LinkedHashMap is to keep the insertion order regardless of deletions and re-additions, so I would expect this replacement for it should function the same way, but doesn't. Since there is an insertion order in the enumeration, but it doesn't function, I suspect this is a bug.

    class Program
    {
        static void Main(string[] args)
        {
            var lurchTable = new LurchTable<string, string>(16, LurchTableOrder.Insertion);

            for (int i = 0; i < 20; i++)
            {
                string entry = "entry" + i;
                lurchTable.Add(entry, entry);
            }

            foreach (var entry in lurchTable)
            {
                Console.WriteLine(entry.Key);
            }

            Console.ReadKey();
        }
    }

Results:

entry4
entry13
entry8
entry1
entry19
entry17
entry5
entry12
entry9
entry2
entry16
entry11
entry18
entry6
entry15
entry3
entry10
entry7
entry0
entry14

Unfortunately, it seems that .NET doesn't contain a dictionary that guarantees the results will be iterated in insertion order. Well, there is OrderedDictionary, but it is not generic.

BTree fails to initialize when FileBlockSize > 4096

If you have very large nodes (single-node size > 16 MB), you can increase the FileBlockSize to accommodate the larger nodes. This works fine when writing the data, however if you restart your process and it tries to initialize the BTree by reading the data from disk, it will fail in the TransactedCompoundFile.FileSection.Read method when it tries to validate the ActualBlocks count. This is because the Length written during the Write portion was greater than 16 MB, occupying more than 3 bytes, however the first byte was reserved for the header length. When trying to read the length back out, we only get the last 3 bytes of the length, which doesn't match the expected number of ActualBlocks, causing an error.

Make LurchTable work on Xamarin.iOS

We are trying to run Lucene.Net 4.8 on a Xamarin.iOS app, but it doesn't work due to the fact that it depends on LurchTable, that uses some instructions not fully supported by Xamarin.iOS compiler.

In particular, the compiler seems to have problems with the structs that implement generic interfaces, and within LurchTable there are several of them.

This problem has been discussed here mono/mono#7016
It seems that the guys at Mono tried to support these instructions, but I think it didn't work and in one of the last comments they say "The LurchTable was quite a complex code that was not necessary for the particular use. In fact it used so complex generics that it managed to break Mono AOT in some cases".

In Lucene.Net, LurchTable was preferred to other LruCache implementations because their performance weren't good enough, so it make sense to continue to use LurchTable, but we'd like to find a way to make it work on Xamarin.iOS as well (Lucene.Net is trying to support also this platform).

We made a test, converting the generic structs inside LurchTable with classes: the test worked, but we don't know if this fix is right, regarding both performance and correctness.

Can you suggest the correct way to make LurchTable work in Xamarin.iOS as well?

PS: just to be clear, LurchTable works on Xamarin.iOS when the app is compiled for iOS simulator, but it doesn't work when is compiled for the devices, using the AOT compiler.

Value update options for BulkInsert and AddRange

First of all, I would like to thank you (again) for sharing such an excellent work with us.

I usually use AddOrUpdate method to add a new item to the bplustree. The possibility of passing a struct of AddUpdateValue that implements ICreateOrUpdateValue<C, B>, IRemoveValue<C, B> is a perfect option that extends and facilitates value update procedure. However, I wonder why such a magnificent feature is avoided in BulkInsert and AddRange.

Depending on application, value update by replacing old value with new one might be preferred, however, some applications could have custom update procedures. Some applications may require updating the value of a given key with new value where the new value it self depends on old one. For instance, lets suppose value is the average of all observed measures for a key; you observe new measure, now you would like to update the value, not replace it. AddOrUpdate perfectly addresses this requirement, but BulkInsert and AddRange are very limited in this regard.

I would like to know if you have any plans for incorporating AddUpdateValue feature in these super fast :) insertions ?

BPlusTree: Deadlock through the Thread Pool

I have encountered some weird deadlock condition: a deadlock through the Thread Pool.
This can occur when the maximum number of threads in the pool is limited.
The root of the problem is the asynchronous flushing in StorageCache class.

How to reproduce this problem:
First of all there should be limited number of threads in ThreadPool. It could be achieved by calling SetMaxThreads in standard ThreadPool or by using custom thread pool implementations (my case).

Suppose we have 10 working threads.
All threads are busy. They perform some operations and sometimes call methods on BPlusTree.
One of them performs the operation that invokes 'Update' in StorageCache with this line of code:

this._asyncWriteBehind = this._writeBehindFunc.BeginInvoke(null, null);

But at this point we have no free working threads in the pool, so the job is queued.

Then one thread executes TryUpdate on BPlusTree. The writer lock is acquired there.
At the same time 9 other threads trying to perform any other operations on BPlusTree and become locked.
The update process goes to the final stage where it executes 'Commit' in StorageCache, that in turn executes 'CompleteAsync' with this code:

this._writeBehindFunc.EndInvoke(asyncWriteBehind);

But the delegate is still inside the queue, so this thread is also locked.
And we have the situation when all 10 threads are locked.

I see 3 possible solution to this problem:

The thread, that performs 'CompleteAsync' should detect such situation (by checking some flag) and execute the flushing process in synchronous way. This is the best solution.
Increase the number of threads in ThreadPool, which reduce the probability of this situation. But I don't like this way.
Create separate thread inside BPlusTree which will perform flushing process.

I've described only the situation that has been reproduced in my application.
There can be other places in the BPlusTree with the same pattern and the same problem.

Race condition in TransactionLog() c'tor

Issue:
Race condition in TransactionLog.cs, TransactionLog's c'tor, between the File.Exists(...) call and the new FileInfo(_options.FileName).Length statement. A file can be deleted in between leading to an exception later.

What steps will reproduce the problem?

Set V2 options with "CreateFile = CreatePolicy.Always"
Add via something like (note the using usage)

public void Add(MyEntity newItem)
{            
    // Open the BPlusTree using these options
    using (var bpt = new BPlusTree<string, KeyEntity>(_options))
    {
        bpt.Add(newItem.key, newItem.Value);
    }
 }

Add about 100K entities in rapid fire via the Add(MyEntity newItem) call.

There is a 1% error chance (on a high performance SSD) so by the 100th of so addition, you'll hit the race condition.

What is the expected output? What do you see instead?

See above

What version of the product are you using? On what operating system?

Tip of HG repo

Please provide any additional information below.

Not specific to this issue but at a project level, it would be nice if you had a forum or something where user's of this library could talk about uses/features/bugs. For example: A cool idea would be extending the storage to use Azure Blob Storage (in Page mode). Likely extending/replicating off the BTreeFileStoreV2 class. A forum could facilitate this sort of a discussion. Or perhaps having a multi master write model (extending on that azure thought).

Dotnet 6 Preview? :)

 /usr/lib/mono/msbuild/Current/bin/NuGet.targets(128,5): error : Package CSharpTest.Net.Collections 14.906.1403.1082 is not compatible with net6.0 (.NETCoreApp,Version=v6.0). Package CSharpTest.Net.Collections 14.906.1403.1082 supports: [/home/runner/work/BrightChain/BrightChain/BrightChain.sln]