
column's Introduction

kelindar/column

Columnar In-Memory Store with Bitmap Indexing

This package contains a high-performance, columnar, in-memory storage engine that supports fast querying, updates and iteration with zero allocations and bitmap indexing.

Features

  • Optimized, cache-friendly columnar data layout that minimizes cache-misses.
  • Optimized for zero heap allocation during querying (see benchmarks below).
  • Optimized batch updates/deletes; an update during a transaction takes around 12ns.
  • Support for SIMD-enabled aggregate functions such as "sum", "avg", "min" and "max".
  • Support for SIMD-enabled filtering (i.e. "where" clause) by leveraging bitmap indexing.
  • Support for columnar projection (i.e. "select" clause) for fast retrieval.
  • Support for computed indexes that are dynamically calculated based on a provided predicate.
  • Support for concurrent updates using sharded latches to keep things fast.
  • Support for transaction isolation, allowing you to create transactions and commit/rollback.
  • Support for expiration of rows based on time-to-live or expiration column.
  • Support for atomic merging of any values, transactionally.
  • Support for primary keys for use-cases where offset can't be used.
  • Support for a change data stream that streams all commits consistently.
  • Support for concurrent snapshotting, allowing you to store the entire collection into a file.

Documentation

The general idea is to leverage cache-friendly ways of organizing data in structures of arrays (SoA), otherwise known as "columnar" storage in database design. This, in turn, allows us to iterate and filter over columns very efficiently. On top of that, this package also adds bitmap indexing to the columnar storage, allowing you to build filter queries using binary AND, AND NOT, OR and XOR (see kelindar/bitmap with SIMD support).
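
To make this concrete, here is a minimal, illustrative sketch using kelindar/bitmap directly; the collection maintains such bitmaps for you internally, and the row numbers below are made up for illustration.

// Two hypothetical index bitmaps: one bit per row
var rogues, adults bitmap.Bitmap
rogues.Set(1) // rows 1 and 3 are rogues
rogues.Set(3)
adults.Set(3) // rows 3 and 4 are over 30
adults.Set(4)

// Intersect in-place: rogues AND adults
rogues.And(adults)
println(rogues.Count()) // 1 (only row 3 matches both)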

Collection and Columns

In order to get data into the store, you'll need to first create a Collection by calling the NewCollection() function. Each collection requires a schema, which can either be specified by calling CreateColumn() multiple times or inferred automatically from an object by calling the CreateColumnsOf() function. In the example below we create a new collection with several columns.

// Create a new collection with some columns
players := column.NewCollection()
players.CreateColumn("name", column.ForString())
players.CreateColumn("class", column.ForString())
players.CreateColumn("balance", column.ForFloat64())
players.CreateColumn("age", column.ForInt16())

Now that we have created a collection, we can insert a single record by using the Insert() method on the collection. In this example we're inserting a single row and manually specifying values. Note that this function returns the row index of the inserted row.

index, err := players.Insert(func(r column.Row) error {
	r.SetString("name", "merlin")
	r.SetString("class", "mage")
	r.SetFloat64("balance", 99.95)
	r.SetInt16("age", 107)
	return nil
})

While the previous example demonstrated how to insert a single row, inserting multiple rows this way is rather inefficient. This is because each Insert() call directly on the collection initiates a separate transaction, and there's a small performance cost associated with it. If you want to bulk-insert many values faster, call Insert() on a transaction instead, as demonstrated in the example below. Note that the only difference is instantiating a transaction by calling the Query() method and calling txn.Insert() on the transaction instead of on the collection. A fuller version of this loop is sketched right after the example.

players.Query(func(txn *column.Txn) error {
	for _, v := range myRawData {
		txn.Insert(...)
	}
	return nil // Commit
})
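
For illustration, here is one possible fuller version of the loop above; the myRawData slice and its fields are hypothetical, but the setters match the schema created earlier.

type rawPlayer struct {
	Name    string
	Class   string
	Balance float64
	Age     int16
}

myRawData := []rawPlayer{
	{"merlin", "mage", 99.95, 107},
	{"garrick", "rogue", 43.50, 35},
}

players.Query(func(txn *column.Txn) error {
	for _, v := range myRawData {
		if _, err := txn.Insert(func(r column.Row) error {
			r.SetString("name", v.Name)
			r.SetString("class", v.Class)
			r.SetFloat64("balance", v.Balance)
			r.SetInt16("age", v.Age)
			return nil
		}); err != nil {
			return err // Returning an error rolls the whole transaction back
		}
	}
	return nil // Commit
})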

Querying and Indexing

The store allows you to query the data based on the presence of certain attributes or their values. In the example below we are querying our collection and applying a filtering operation by using the WithValue() method on the transaction. This method scans the values and checks whether a certain predicate evaluates to true. In this case, we're scanning through all of the players and looking at their class; if the class is equal to "rogue", we keep the row. At the end, we call the Count() method, which simply counts the result set.

// This query performs a full scan of "class" column
players.Query(func(txn *column.Txn) error {
	count := txn.WithValue("class", func(v interface{}) bool {
		return v == "rogue"
	}).Count()
	return nil
})

Now, what if we need to run this query often? It is possible to simply create an index with the same predicate and have this computation applied every time (a) an object is inserted into the collection and (b) a value of the dependent column is updated. Let's look at the example below: we first create a rogue index which depends on the "class" column. This index applies the same predicate, which only returns true if a class is "rogue". We can then query this by simply calling the With() method and providing the index name.

An index is essentially akin to a boolean column, so you could technically also select its value when querying. Now, in this example the query would be around 10-100x faster to execute, as behind the scenes it uses bitmap indexing for the "rogue" index and performs a simple logical AND operation on two bitmaps when querying. This avoids scanning and applying the predicate during the query.

// Create the index "rogue" in advance
players.CreateIndex("rogue", "class", func(v interface{}) bool {
	return v == "rogue"
})

// This returns the same result as the query before, but much faster
players.Query(func(txn *column.Txn) error {
	count := txn.With("rogue").Count()
	return nil
})

The query can be further expanded as it allows indexed intersection, difference and union operations. This allows you to ask more complex questions of a collection. In the examples below let's assume we have a bunch of indexes on the class column and we want to ask different questions.

First, let's merge two queries by applying a Union() operation. Here, we first select only rogues, then merge them together with mages, resulting in a selection containing both rogues and mages.

// How many rogues and mages?
players.Query(func(txn *column.Txn) error {
	txn.With("rogue").Union("mage").Count()
	return nil
})

Next, let's count everyone who isn't a rogue. For that we can use the Without() method, which performs a difference (i.e. a binary AND NOT operation) on the selection. This will result in a count of all players in the collection except the rogues.

// How many players aren't rogues?
players.Query(func(txn *column.Txn) error {
	txn.Without("rogue").Count()
	return nil
})

Now, you can combine all of these methods to keep building more complex queries. When querying indexed and non-indexed fields together, it is important to know that every scan applies only to the current selection, speeding up the query. So if a filter on a specific index selects 50% of players and you then perform a scan on that selection (e.g. WithValue()), it will only scan 50% of the players and hence be about 2x faster.

// How many rogues are over 30 years old?
players.Query(func(txn *column.Txn) error {
	txn.With("rogue").WithFloat("age", func(v float64) bool {
		return v >= 30
	}).Count()
	return nil
})

Iterating over Results

In all of the previous examples we've only used the Count() operation, which counts the number of elements in the result set. In this section we'll look at how to iterate over the result set.

As before, a transaction needs to be started using the Query() method on the collection, after which we can call the txn.Range() method to iterate over the result set in the transaction. Note that it can be chained right after the With..() methods, as expected.

In order to access the results of the iteration, prior to calling the Range() method we first need to load the column reader(s) we are going to need, using methods such as txn.String(), txn.Float64(), etc. These prepare the read/write buffers necessary to perform efficient lookups while iterating.

In the example below we select all of the rogues from our collection and print out their names by using the Range() method, accessing the "name" column through a column reader created by calling the txn.String("name") method.

players.Query(func(txn *column.Txn) error {
	names := txn.String("name") // Create a column reader

	return txn.With("rogue").Range(func(i uint32) {
		name, _ := names.Get()
		println("rogue name", name)
	})
})

Similarly, if you need to access more columns, you can simply create the appropriate column reader(s) and use them as shown in the example below.

players.Query(func(txn *column.Txn) error {
	names := txn.String("name")
	ages  := txn.Int64("age")

	return txn.With("rogue").Range(func(i uint32) {
		name, _ := names.Get()
		age,  _ := ages.Get()

		println("rogue name", name)
		println("rogue age", age)
	})
})

Taking the Sum() of a (numeric) column reader takes into account the transaction's current filtering index.

players.Query(func(txn *column.Txn) error {
	totalAge := txn.With("rogue").Int64("age").Sum()
	totalRogues := int64(txn.Count())

	avgAge := totalAge / totalRogues

	txn.WithInt("age", func(v int64) bool {
		return v < avgAge
	})

	// Get the total balance for all rogues younger than the average rogue
	balance := txn.Float64("balance").Sum()
	return nil
})

Sorted Indexes

Along with bitmap indexing, collections support consistently sorted indexes. These indexes are transient and must be re-created when a collection loads a snapshot (a sketch of re-creating one after a restore follows the example below).

In the example below, we create a SortedIndex object and use it to sort filtered records in a transaction.

// Create the sorted index "richest" in advance
players.CreateSortIndex("richest", "balance")

// This filters the transaction with the `rogue` index before
// ranging through the remaining balances in ascending order
players.Query(func(txn *column.Txn) error {
	name    := txn.String("name")
	balance := txn.Float64("balance")

	txn.With("rogue").Ascend("richest", func(i uint32) {
		// save or do something with sorted record
		curName, _ := name.Get()
		balance.Set(newBalance(curName))
	})
	return nil
})
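
Because sorted indexes do not survive a snapshot, restoring needs to be followed by re-creating them. A minimal sketch, assuming a snapshot file written earlier with the same schema (see the snapshot section below for details):

src, err := os.Open("snapshot.bin")
if err != nil {
	panic(err)
}

// Restore the data first ...
if err := players.Restore(src); err != nil {
	panic(err)
}

// ... then re-create the transient sorted index
players.CreateSortIndex("richest", "balance")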

Updating Values

In order to update certain items in the collection, you can simply call the Range() method and use the column accessor's Set() or Add() methods to update the value of a certain column atomically. The updates won't be instantly reflected, given that our store supports transactions. Only when the transaction is committed will the updates be applied to the collection, allowing for isolation and rollbacks.

In the example below we're selecting all of the rogues and updating both their balance and age to certain values. The transaction returns nil, hence it will be automatically committed when the Query() method returns.

players.Query(func(txn *column.Txn) error {
	balance := txn.Float64("balance")
	age     := txn.Int64("age")

	return txn.With("rogue").Range(func(i uint32) {
		balance.Set(10.0) // Update the "balance" to 10.0
		age.Set(50)       // Update the "age" to 50
	})
})

In certain cases, you might want to atomically increment or decrement numerical values. In order to accomplish this, you can use the provided Merge() operation. Note that the indexes will also be updated accordingly and the predicates re-evaluated with the most up-to-date values. In the example below we're atomically incrementing the balance of all our rogues by 500.

players.Query(func(txn *column.Txn) error {
	balance := txn.Float64("balance")

	return txn.With("rogue").Range(func(i uint32) {
		balance.Merge(500.0) // Increment the "balance" by 500
	})
})

While atomic increment/decrement for numerical values is relatively straightforward, the merge behavior can be customized with the WithMerge() option and used for other data types, such as strings. In the example below we create a merge function that concatenates two strings together; when MergeString() is called, the new string gets appended automatically.

// A merging function that simply concatenates 2 strings together
concat := func(value, delta string) string {
	if len(value) > 0 {
		value += ", "
	}
	return value + delta
}

// Create a column with a specified merge function
db := column.NewCollection()
db.CreateColumn("alphabet", column.ForString(column.WithMerge(concat)))

// Insert letter "A"
db.Insert(func(r column.Row) error {
	r.SetString("alphabet", "A") // now contains "A"
	return nil
})

// Insert letter "B"
db.QueryAt(0, func(r column.Row) error {
	r.MergeString("alphabet", "B") // now contains "A, B"
	return nil
})

Expiring Values

Sometimes it is useful to automatically delete certain rows when you no longer need them. To support this, the library automatically adds an expire column to each new collection and asynchronously starts a cleanup goroutine that runs periodically and evicts expired objects. To set this up, simply use an Insert...() method on the collection and define a time-to-live for the row.

In the example below we are inserting an object to the collection and setting the time-to-live to 5 seconds from the current time. After this time, the object will be automatically evicted from the collection and its space can be reclaimed.

players.Insert(func(r column.Row) error {
	r.SetString("name", "Merlin")
	r.SetString("class", "mage")
	r.SetTTL(5 * time.Second) // time-to-live of 5 seconds
	return nil
})

Since the expire column that is automatically added to each collection is a normal column, you can query and even update it. In the example below we query and extend the time-to-live by 1 hour using the Extend() method.

players.Query(func(txn *column.Txn) error {
	ttl := txn.TTL()
	return txn.Range(func(i uint32) {
		ttl.Extend(1 * time.Hour) // Add some time
	})
})

Transaction Commit and Rollback

Transactions allow for isolation between two concurrent operations. In fact, all of the batch queries in this library must go through a transaction. The Query() method requires a function which takes a column.Txn pointer that contains various helper methods to support querying. In the example below we iterate over all of the players and update their balance by setting it to 10.0. The Query() method automatically calls txn.Commit() if the function returns without an error. On the flip side, if the provided function returns an error, the query will automatically call txn.Rollback(), so none of the changes will be applied.

// Range over all of the players and successfully update their balance
players.Query(func(txn *column.Txn) error {
	balance := txn.Float64("balance")
	txn.Range(func(i uint32) {
		balance.Set(10.0) // Update the "balance" to 10.0
	})

	// No error, transaction will be committed
	return nil
})

Now, in this example, we try to update the balance, but the query callback returns an error, in which case none of the updates will actually be reflected in the underlying collection.

// Range over all of the players and attempt to update their balance
players.Query(func(txn *column.Txn) error {
	balance := txn.Float64("balance")
	txn.Range(func(i uint32) {
		balance.Set(10.0) // Update the "balance" to 10.0
	})

	// Returns an error, transaction will be rolled back
	return fmt.Errorf("bug")
})

Using Primary Keys

In certain cases it is useful to access a specific row by its primary key instead of the index generated internally by the collection. For such use-cases, the library provides the Key column type that enables seamless lookup by a user-defined primary key. In the example below we create a collection with a primary key "name" using the CreateColumn() method with a ForKey() column type. Then, we use the InsertKey() method to insert a value.

players := column.NewCollection()
players.CreateColumn("name", column.ForKey())     // Create a "name" as a primary-key
players.CreateColumn("class", column.ForString()) // .. and some other columns

// Insert a player with "merlin" as its primary key
players.InsertKey("merlin", func(r column.Row) error {
	r.SetString("class", "mage")
	return nil
})

Similarly, you can use the primary key to query the data directly, without knowing the exact offset. Do note that using primary keys incurs some overhead, as it requires an additional lookup of the offset in an internally managed hash table.

// Query merlin's class
players.QueryKey("merlin", func(r column.Row) error {
	class, _ := r.String("class")
	return nil
})
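
Since QueryKey() hands you the same writable Row, it can presumably also be used to update a row addressed by its primary key, the same way QueryAt() is used for writes in the merge example further below; a small sketch building on the example above:

// Change merlin's class, addressed by primary key
players.QueryKey("merlin", func(r column.Row) error {
	r.SetString("class", "warlock")
	return nil
})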

Storing Binary Records

If you find yourself needing to encode a more complex structure as a single column, you may do so by using the column.ForRecord() function. This allows you to specify a type implementing BinaryMarshaler / BinaryUnmarshaler that will automatically be encoded as a single column. In the example below we create a Location type that implements the required methods.

type Location struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
}

func (l Location) MarshalBinary() ([]byte, error) {
	return json.Marshal(l)
}

func (l *Location) UnmarshalBinary(b []byte) error {
	return json.Unmarshal(b, l)
}

Now that we have a record implementation, we can create a column for this struct by using ForRecord() function as shown below.

players.CreateColumn("location", ForRecord(func() *Location {
	return new(Location)
}))

In order to manipulate the record, we can use the appropriate Record() and SetRecord() methods of the Row, similarly to other column types.

// Insert a new location
idx, _ := players.Insert(func(r column.Row) error {
	r.SetRecord("location", &Location{X: 1, Y: 2})
	return nil
})

// Read the location back
players.QueryAt(idx, func(r column.Row) error {
	location, ok := r.Record("location")
	return nil
})

Streaming Changes

This library also supports streaming out all transaction commits consistently, as they happen. This allows you to implement your own change data capture (CDC) listeners, or stream data into Kafka or a remote database for durability. In order to enable it, simply provide an implementation of the commit.Logger interface when creating the collection.

In the example below we take advantage of the commit.Channel implementation of commit.Logger, which simply publishes the commits into a Go channel. Here we create a buffered channel and keep consuming the commits with a separate goroutine, allowing us to view transactions as they happen in the store.

// Create a new commit writer (simple channel) and a new collection
writer  := make(commit.Channel, 1024)
players := column.NewCollection(column.Options{
	Writer: writer,
})

// Read the changes from the channel
go func() {
	for commit := range writer {
		fmt.Printf("commit %v\n", commit.ID)
	}
}()

// ... insert, update or delete

On a separate note, this change stream is guaranteed to be consistent and serialized. This means that you can replicate those changes on another database and keep the two synchronized. In fact, this library provides a Replay() method on the collection that allows you to do just that. In the example below we create two collections, primary and replica, and asynchronously replicate all of the commits from the primary to the replica using the Replay() method together with the change stream.

// Create a primary collection
writer  := make(commit.Channel, 1024)
primary := column.NewCollection(column.Options{
	Writer: &writer,
})
primary.CreateColumnsOf(object)

// Replica with the same schema
replica := column.NewCollection()
replica.CreateColumnsOf(object)

// Keep 2 collections in sync
go func() {
	for change := range writer {
		replica.Replay(change)
	}
}()

Snapshot and Restore

The collection can also be saved into a single binary format while transactions are running. This allows you to periodically schedule backups or make sure all of the data is persisted when your application terminates.

In order to take a snapshot, you must first create a valid io.Writer destination and then call the Snapshot() method on the collection, as demonstrated in the example below.

dst, err := os.Create("snapshot.bin")
if err != nil {
	panic(err)
}

// Write a snapshot into the dst
err = players.Snapshot(dst)

Conversely, in order to restore an existing snapshot, you need to first open an io.Reader and then call the Restore() method on the collection. Note that the collection and its schema must already be initialized, as snapshots do not carry this information within themselves.

src, err := os.Open("snapshot.bin")
if err != nil {
	panic(err)
}

// Restore from an existing snapshot
err = players.Restore(src)

Examples

Multiple complete usage examples of this library can be found in the examples directory in this repository.

Benchmarks

The benchmarks below were run on a collection of 100,000 items containing a dozen columns. Feel free to explore the benchmarks, but I strongly recommend testing on your actual dataset.

cpu: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
BenchmarkCollection/insert-8            2523     469481 ns/op    24356 B/op    500 allocs/op
BenchmarkCollection/select-at-8     22194190      54.23 ns/op        0 B/op      0 allocs/op
BenchmarkCollection/scan-8              2068     568953 ns/op      122 B/op      0 allocs/op
BenchmarkCollection/count-8           571449       2057 ns/op        0 B/op      0 allocs/op
BenchmarkCollection/range-8            28660      41695 ns/op        3 B/op      0 allocs/op
BenchmarkCollection/update-at-8      5911978      202.8 ns/op        0 B/op      0 allocs/op
BenchmarkCollection/update-all-8        1280     946272 ns/op     3726 B/op      0 allocs/op
BenchmarkCollection/delete-at-8      6405852      188.9 ns/op        0 B/op      0 allocs/op
BenchmarkCollection/delete-all-8     2073188      562.6 ns/op        0 B/op      0 allocs/op

When testing with larger collections, I added a small example (see the examples folder) and ran it with 20 million rows inserted, where each entry has 12 columns and 4 indexes that need to be calculated, plus a few queries and scans around them.

running insert of 20000000 rows...
-> insert took 20.4538183s

running snapshot of 20000000 rows...
-> snapshot took 2.57960038s

running full scan of age >= 30...
-> result = 10200000
-> full scan took 61.611822ms

running full scan of class == "rogue"...
-> result = 7160000
-> full scan took 81.389954ms

running indexed query of human mages...
-> result = 1360000
-> indexed query took 608.51µs

running indexed query of human female mages...
-> result = 640000
-> indexed query took 794.49µs

running update of balance of everyone...
-> updated 20000000 rows
-> update took 214.182216ms

running update of age of mages...
-> updated 6040000 rows
-> update took 81.292378ms

Contributing

We are open to contributions; feel free to submit a pull request and we'll review it as quickly as we can. This library is maintained by Roman Atachiants.

License

column is licensed under the MIT License.


column's Issues

Empty columns?

I'd like to have columns that contain no data, to be used as tags. Is a struct{} or other empty value possible? A Bool that is always true could be an option, but seems wasteful since you already have the concept of bitsets.

EDIT:
Reading through the code, it looks like Bool is actually doing what I hoped: it doesn't use a Go bool, but the bitset. Might be worth explaining this in the docs and/or the comments on the type.
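
For reference, a sketch of the tag-column idea under the current API; it assumes that With() intersects with the boolean column's underlying bitmap, which matches the behavior described above but is worth verifying.

tags := column.NewCollection()
tags.CreateColumn("name", column.ForString())
tags.CreateColumn("tagged", column.ForBool()) // backed by a bitset, not a []bool

tags.Insert(func(r column.Row) error {
	r.SetString("name", "alice")
	r.SetBool("tagged", true)
	return nil
})

tags.Query(func(txn *column.Txn) error {
	println(txn.With("tagged").Count()) // counts rows whose bit is set (assumption)
	return nil
})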

Enum vs String

In the million example you use enum for all string-like fields. For things with a low cardinality, like gender, it makes sense, but you also did it for names. Is there any more documentation or guidance on when to use each?

Panic when adding many values

I have an application that adds columns dynamically, and it causes a panic when there are more than 16384 rows in the collection being updated.

Here is what I am trying to do, structured as a test case in column_test.go:

func TestUpdate(t *testing.T) {
	coll := NewCollection()

	// set up an initial column of data
	coll.CreateColumn("foo", ForString())

	// up to 16384 it works, but at 16385
	// it panics
	for i := 0; i < 16385; i++ {
		coll.Insert(func(row Row) error {
			row.SetString("foo", fmt.Sprintf("foo-%d", i))
			return nil
		})
	}

	// set up a derived column of data
	coll.CreateColumn("bar", ForString())

	coll.Query(func(txn *Txn) error {
		src := txn.String("foo")
		dest := txn.String("bar")
		txn.Range(func(_ uint32) {
			if value, ok := src.Get(); ok {
				dest.Set("bar-" + value[4:])
			}
		})
		return nil
	})
}

here's the whole stack trace:

dkolbly@RT7NV42324:~/misc/column$ go test -run TestUpdate
--- FAIL: TestUpdate (0.01s)
panic: runtime error: index out of range [1] with length 1 [recovered]
	panic: runtime error: index out of range [1] with length 1

goroutine 36 [running]:
testing.tRunner.func1.2({0x100b41da0, 0x14000120318})
	/usr/local/go/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1529 +0x384
panic({0x100b41da0, 0x14000120318})
	/usr/local/go/src/runtime/panic.go:884 +0x204
github.com/kelindar/column.chunks[...].chunkAt(...)
	/Users/dkolbly/misc/column/column.go:290
github.com/kelindar/column.(*columnString).Apply(0x140001442c0?, 0x0?, 0x14000114240?)
	/Users/dkolbly/misc/column/column_strings.go:173 +0x308
github.com/kelindar/column.(*column).Apply(0x14000106b58?, 0x182c8?, 0x14000106b58?)
	/Users/dkolbly/misc/column/column.go:189 +0xe8
github.com/kelindar/column.(*Txn).commitUpdates.func1(0x14000120300?)
	/Users/dkolbly/misc/column/txn.go:571 +0x3c
github.com/kelindar/column/commit.(*Reader).Range(0x14000114240, 0x1400012e550, 0x1, 0x1400014bc60)
	/Users/dkolbly/misc/column/commit/reader.go:303 +0x168
github.com/kelindar/column.(*Txn).commitUpdates(0x140001a2090, 0x1)
	/Users/dkolbly/misc/column/txn.go:570 +0x1b4
github.com/kelindar/column.(*Txn).commit.func2(0x1768a10ef9fbaccb, 0x1, {0x14000185800?, 0x1009dce30?, 0x14000106d78?})
	/Users/dkolbly/misc/column/txn.go:530 +0x58
github.com/kelindar/column.(*Txn).rangeWrite.func1(0x1)
	/Users/dkolbly/misc/column/txn_lock.go:92 +0x208
github.com/kelindar/bitmap.Bitmap.Range({0x1400011c600, 0x1, 0x14000106e18?}, 0x1400014bdf8)
	/Users/dkolbly/go/pkg/mod/github.com/kelindar/[email protected]/range.go:33 +0x108
github.com/kelindar/column.(*Txn).rangeWrite(0x140001a2090?, 0x14000000001?)
	/Users/dkolbly/misc/column/txn_lock.go:80 +0x58
github.com/kelindar/column.(*Txn).commit(0x140001a2090)
	/Users/dkolbly/misc/column/txn.go:524 +0x178
github.com/kelindar/column.(*Collection).Query(0x1400013c420, 0x100b5f938)
	/Users/dkolbly/misc/column/collection.go:362 +0x108
github.com/kelindar/column.TestUpdate(0x0?)
	/Users/dkolbly/misc/column/column_test.go:764 +0xfc
testing.tRunner(0x1400010d380, 0x100b5f9d0)
	/usr/local/go/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1629 +0x368
exit status 2
FAIL	github.com/kelindar/column	0.810s

[Feature Request]: Simpler collection creation, data insertion and query

First, thank you very much for building such a cool project.
I have used it in my project for caching and it works well. After using it for some time, I found that collection creation, data insertion and querying could be simpler.

In my case I always use it to store a struct. Every time, I need to follow these steps:
-- save
1: create a collection
2: create the columns
3: insert rows and bindings.
-- query
1: query the row & binding
2: construct a struct.

Say I have a struct like this:

type Food struct {
	Id       string `json:"id"`
	Category string `json:"category"`
}

In that case, if I save a Food, I would like the Collection & Columns to be created automatically using the json tags, since in most cases even a simple key-value is still a struct. Such a simpler API would reduce a lot of coding.

For data queries, I hope it can automatically unmarshal results into a struct by matching the query properties to the struct's json tags.
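
Until such an API exists, one possible workaround is to round-trip the struct through encoding/json so that the tags become map keys; a hedged sketch, assuming the schema was first created with CreateColumnsOf on a sample object:

// insertStruct is a hypothetical helper, not part of the library
func insertStruct(c *column.Collection, v any) error {
	raw, err := json.Marshal(v) // uses the struct's json tags
	if err != nil {
		return err
	}
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return err
	}
	_, err = c.InsertObject(obj) // keys now match the column names
	return err
}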

Txn Commit can result in transaction guarantee violation

Hi

Looking at the txn source, it seems that Commit is not holding the table lock for the entire apply duration, but instead on a per delete/update/insert basis. This means that another concurrent transaction might read some deleted value (because the delete is applied first) and not see the update. Even worse, the first transaction might delete a record while a second one tries to update it, resulting in a lost update. I haven't thought deeply about the per-column lock problems, but they probably have the same issues.

For disclosure, I have not tried to reproduce this, but a full lock is needed to ensure only one transaction is updating until it finishes. This is what SQLite does. Concurrent-writer databases (e.g. Postgres/MySQL) have other protocols to handle this (one example is the xmin/xmax check in Postgres, or range/gap locks held until commit in MySQL).

The simplest solution is to just wrap the entire Commit call with a lock, at the cost of reduced concurrency.

What do you think?

large strings

If I do table.Insert(func(r column.Row) error { r.SetString("name", "Alice"+RandStringBytes(70000)); return nil }), it results in a "runtime error: index out of range [75901] with length 70009".
Is there a limit on large strings? Thank you

no way to issue where clause `a in ("1", "2") and b in ("5", "6")`

	c := column.NewCollection()
	c.CreateColumn("a", column.ForString())
	c.CreateColumn("b", column.ForString())
	c.CreateIndex("a_1", "a", func(r column.Reader) bool { return r.String() == "1" })
	c.CreateIndex("a_2", "a", func(r column.Reader) bool { return r.String() == "2" })
	c.CreateIndex("a_3", "a", func(r column.Reader) bool { return r.String() == "3" })
	c.CreateIndex("b_4", "b", func(r column.Reader) bool { return r.String() == "4" })
	c.CreateIndex("b_5", "b", func(r column.Reader) bool { return r.String() == "5" })
	c.CreateIndex("b_6", "b", func(r column.Reader) bool { return r.String() == "6" })

	c.Query(func(txn *column.Txn) error {
		fmt.Println(txn.InsertObject(map[string]interface{}{
			"a": "1",
			"b": "4",
		}))
		fmt.Println(txn.InsertObject(map[string]interface{}{
			"a": "2",
			"b": "5",
		}))
		fmt.Println(txn.InsertObject(map[string]interface{}{
			"a": "3",
			"b": "6",
		}))
		return nil
	})
	c.Query(func(txn *column.Txn) error {
		// no way to issue where clause `a in ("1", "2") and b in ("5", "6")`, count should be 1
		fmt.Println(txn.Union("a_1", "a_2").Union("b_5", "b_6").Count()) // 3
		fmt.Println(txn.With("a_1", "a_2").With("b_5", "b_6").Count()) // 0
		fmt.Println(txn.With("a_1", "a_2").Union("b_5", "b_6").Count()) // 2
		fmt.Println(txn.Union("a_1", "a_2").With("b_5", "b_6").Count()) // 0
		return nil
	})

index out of range

Hello. Found a bug.
It doesn't appear on every execution.

Stack trace:

panic: runtime error: index out of range [3030211] with length 3030211

goroutine 70 [running]:
github.com/kelindar/column.(*columnfloat64).Apply(0xc0004b6570, 0xc0c332ecc0)
        /root/go/golibs/src/github.com/kelindar/column/column_numbers.go:205 +0x1ad
github.com/kelindar/column.(*column).Apply(0xc0004b6660, 0xc0c332ecc0, 0x0)
        /root/go/golibs/src/github.com/kelindar/column/column.go:148 +0x5b
github.com/kelindar/column.(*Txn).commitUpdates.func1(0xc0c332ecc0)
        /root/go/golibs/src/github.com/kelindar/column/txn.go:445 +0x7e
github.com/kelindar/column/commit.(*Reader).Range(0xc0c332ecc0, 0xc000092780, 0xb8, 0xc0009077d8)
        /root/go/golibs/src/github.com/kelindar/column/commit/reader.go:219 +0x11e
github.com/kelindar/column.(*Txn).commitUpdates(0xc0c30fc6e0, 0xb8, 0xc006cdc000)
        /root/go/golibs/src/github.com/kelindar/column/txn.go:439 +0x14b
github.com/kelindar/column.(*Txn).commit.func2(0xc0000000b8, 0xc006cdc000, 0x100, 0x4800)
        /root/go/golibs/src/github.com/kelindar/column/txn.go:408 +0x1f0
github.com/kelindar/column.(*Txn).rangeWrite.func1(0xc0000000b8)
        /root/go/golibs/src/github.com/kelindar/column/txn_lock.go:71 +0x13b
github.com/kelindar/bitmap.Bitmap.Range(0xc0c3c2b2c0, 0x3, 0x4, 0xc000907a08)
        /root/go/golibs/src/github.com/kelindar/bitmap/range.go:28 +0xbc
github.com/kelindar/column.(*Txn).rangeWrite(0xc0c30fc6e0, 0xc000907a78)
        /root/go/golibs/src/github.com/kelindar/column/txn_lock.go:62 +0x7f
github.com/kelindar/column.(*Txn).commit(0xc0c30fc6e0)
        /root/go/golibs/src/github.com/kelindar/column/txn.go:402 +0x1a6
github.com/kelindar/column.(*Collection).Query(0xc0000b2000, 0xc09ba1fc38, 0x1485b00, 0x28)
        /root/go/golibs/src/github.com/kelindar/column/collection.go:264 +0xff

My code causing the issue:

e = d.Query(func(txn *column.Txn) (e error) {
	_ = txn.Range("col2", func(v column.Cursor) {
		v.SetFloat64(0)
	})
	return txn.Range("col1", func(v column.Cursor) {
		if a, h := someMap[v.String()]; h {
			v.SetFloat64At("col2", a[1].(float64))
			v.SetStringAt("col3", a[0].(string))
		}
	})
})

Possible undesired behavior when using query filters

In the With / Without / Union functions it is not checked whether the passed columns are indexes, which might cause unexpected/undesired behavior if used with "normal" columns.

Let's take the following example:

c := column.NewCollection()
c.CreateColumn("name", column.ForString())
c.CreateColumn("some-column", column.ForInt16())

c.Insert(func(row column.Row) error {
	row.SetString("name", "john")
	row.SetInt16("some-column", 100)
	return nil
})
c.Insert(func(row column.Row) error {
	row.SetString("name", "jane")
	return nil
})

If I now start a query over the rows and only want the rows that have a value in this column, it works fine:

c.Query(func(txn *column.Txn) error {
	// Prints "1"
	fmt.Println(txn.With("some-column").Count())
	return nil
})

If the attribute for "jane" is set somewhere later in the program, this is of course picked up correctly using With. The problem, however, is that you can no longer get "jane" out of the With, because you can't "unset" columns.

By the way, it is not even checked whether the column exists at all. In my opinion, this should cause a panic like it does in other places (e.g. the *ReaderFor functions).

c.Query(func(txn *column.Txn) error {
	// Column does not exist
	txn.With("abc").Count()
	return nil
})

Now I have to ask myself whether users should be able to pass any column (if so, there should be an Unset(column string) in Txn), or whether only the WithValue function should be used for normal columns (as I interpret it). In that case this should somehow be checked within Txn, possibly via the owner or column, since that struct contains the IsIndex function.

Union is buggy

package main

import (
	"fmt"

	"github.com/kelindar/column"
)

func main() {
	statistics := column.NewCollection()

	// schema
	(statistics.CreateColumn("d_a", column.ForString()))

	statistics.CreateIndex("d_a_1", "d_a", func(r column.Reader) bool { return r.String() == "1" })
	statistics.CreateIndex("d_a_2", "d_a", func(r column.Reader) bool { return r.String() == "2" })
	statistics.CreateIndex("d_a_3", "d_a", func(r column.Reader) bool { return r.String() == "3" })

	// insert
	statistics.InsertObject(map[string]interface{}{
		"d_a": "1",
	})
	statistics.InsertObject(map[string]interface{}{
		"d_a": "2",
	})
	statistics.InsertObject(map[string]interface{}{
		"d_a": "3",
	})

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_1").Count()) // 1, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_2").Count()) // 1, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_3").Count()) // 1, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.Union("d_a_1", "d_a_2").Count()) // 3, incorrect, should be 2
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.Union("d_a_1", "d_a_3").Count()) // 3, incorrect, should be 2
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.Union("d_a_2", "d_a_3").Count()) // 3, incorrect, should be 2
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_1", "d_a_2").Count()) // 0, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_1", "d_a_3").Count()) // 0, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_2", "d_a_3").Count()) // 0, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_1").Union("d_a_2").Count()) // 2, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_1").Union("d_a_3").Count()) // 2, correct
		return nil
	}))

	(statistics.Query(func(tx *column.Txn) error {
		fmt.Println(tx.With("d_a_2").Union("d_a_3").Count()) // 2, correct
		return nil
	}))
}

Exporting read-only column accessors

Hello,

When attempting to dynamically create N column accessors within a transaction, I'm having trouble finding a workaround for creating & defining the type (i.e. readerMap := make([]column.anyWriter)), as the current reader/writer types (anyReader, anyWriter, stringReader, intWriter, etc.) are all unexported. Attempting to reference these types results in an 'un-exported type: column.anyWriter' compilation error, in the case of the readerMap example above.

From my understanding, leaving these types unexported helps force the developer to create one accessor per column, as per the query examples in the README. In turn, this forces a consistent write buffer for all Writer types. Please correct me if I'm wrong.

I'm proposing that all Reader types become exported, or that exportable wrappers be added for all types. One constraint of this proposal would be read-only transactions, as I think creating a reader in a transaction after editing that column's data (or deleting the entire row) could result in undefined behavior.

With solely 'anyWriter' exported (renamed to AnyWriter), the read-only capabilities of the package are robust enough to handle even a very poorly coded example:

result_rows := make([]column.Object, 0)
collection.Query(func(txn *column.Txn) error {

	// filter rows with a map[string]string
	for colname, colval := range filters {
		txn = txn.WithValue(colname, func(v interface{}) bool {
			return v == colval
		})
	}

	return txn.Range(func(i uint32) {
		row_obj := make(column.Object)

		// create row readers INSIDE Range - bad coding practice, but
		// column's cursor pointers handle this
		readerMap := make(map[string]column.AnyWriter)
		for _, sel := range selectors {
			readerMap[sel] = txn.Any(sel)
		}

		for _, sel := range selectors {
			rd := readerMap[sel]
			value, _ := rd.Get()
			row_obj[sel] = value
		}
		result_rows = append(result_rows, row_obj)
	})
})

Let me know if there's other reasoning behind leaving them un-exported that I'm not grasping.

Thanks,
William

Non-deterministic filter results

Hey, it's me again

I've stumbled across a case where my filtering logic sometimes returns the correct result, a count of 5, and sometimes returns a count of 0.

I've created a gist to show the logic and the test I'm running
https://gist.github.com/james-bowers/55722b1cf2cb60d88093a3051c7e0c20

I'm wondering if there's something I'm doing wrong in the checkFilters function 🤔 since in the readme examples the "where" clause is achieved by creating various indexes. I'm confused as to why my approach sometimes works and sometimes doesn't.

Any light you can shed on the issue would be greatly appreciated.

If the recommended approach is to always create an index for each filter, what overhead is there in doing this?

Thanks so much! 🙏🏼

Index issue

Hello, maybe I don't understand how it works, but I prepared an example which completely confuses me. Here is the code:

package main

import (
	"fmt"
	
	"github.com/kelindar/column"

)

func main() {
	str()
	fmt.Println()
	int()
}

func str() {
	coll := column.NewCollection()
	coll.CreateColumn("id", column.ForInt64())
	coll.CreateColumn("data", column.ForString())
	coll.CreateIndex("1", "id", func(r column.Reader) bool {
		return r.Int() == 1
	})

	dd := []string{"aaa", "bbb", "ccc", "ddd"}

	for i, d := range dd {
		coll.Insert(map[string]interface{}{"id": i, "data": d})
	}

	coll.Query(func(tx *column.Txn) error {
		tx.With("1").Select(func(v column.Selector) {
			fmt.Printf("%v: %v\n", v.ValueAt("id"), v.ValueAt("data"))
		})

		return nil
	})
}

func int() {
	coll := column.NewCollection()
	coll.CreateColumn("id", column.ForInt64())
	coll.CreateColumn("data", column.ForInt64())
	coll.CreateIndex("1", "id", func(r column.Reader) bool {
		return r.Int() == 1
	})

	dd := []int64{100, 200, 300, 400}

	for i, d := range dd {
		coll.Insert(map[string]interface{}{"id": i, "data": d})
	}

	coll.Query(func(tx *column.Txn) error {
		tx.With("1").Select(func(v column.Selector) {
			fmt.Printf("%v: %v\n", v.ValueAt("id"), v.ValueAt("data"))
		})

		return nil
	})
}

and the output, which is the same on both Apple Silicon and Apple Intel Macs:

0: ddd
1: ddd
2: ddd
3: ddd

0: 100
1: 200
2: 300
3: 400

So, the behaviour that confused me: in the str func, I expect to see the output:

1: bbb

similarly, in int I expect something like:

1: 200

But the same code works differently for different data types and, as I see it, does not solve the problem of getting values by index.

sql dsl

cool project.

I saw in some issue or other that you would be happy about an SQL DSL.

Genji has one if you're interested: https://github.com/genjidb/genji

It's quite nice... Let me know if you have any questions.

Just saw in #5 that you're thinking about persistence too. Genji might be useful for that as well. But I imagine you want something closer to a simple file system rather than an LSM?

Allow for int column names

There are a lot of string lookups, especially around column names. With the generics in 1.18, is it possible to make columns addressable by integers for faster lookups?

how do I efficiently query for unique values of a field

say I get a stream of data: {machineCode: "", lat: , lon: }
And I want to display a count of such datums per machineCode.

Is there a way to efficiently get all the unique machine codes, or should I just keep track of them while inserting data?
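
Absent a built-in distinct operator, one way under the current API is to range over the column and collect values into a set; a sketch, assuming machineCode is stored as a string column in a collection named coll:

unique := make(map[string]struct{})
coll.Query(func(txn *column.Txn) error {
	codes := txn.String("machineCode")
	return txn.Range(func(i uint32) {
		if code, ok := codes.Get(); ok {
			unique[code] = struct{}{} // collect each distinct code once
		}
	})
})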

Unset a column?

const nPos = 9000
const nPosVel = 1000

func BenchmarkIterColumn(b *testing.B) {
	b.StopTimer()
	entities := column.NewCollection()
	if err := errors.Join(
		entities.CreateColumn("px", column.ForFloat64()),
		entities.CreateColumn("py", column.ForFloat64()),
		entities.CreateColumn("vx", column.ForFloat64()),
		entities.CreateColumn("vy", column.ForFloat64()),
		entities.CreateColumn("foo", column.ForBool()),
	); err != nil {
		b.Fatal(err)
	}

	entities.Query(func(txn *column.Txn) error {
		for i := 0; i < nPos; i++ {
			_, err := txn.Insert(func(r column.Row) error {
				r.SetFloat64("px", 1.0)
				r.SetFloat64("py", 2.0)

				return nil
			})
			if err != nil {
				return err
			}
		}

		for i := 0; i < nPosVel; i++ {
			_, err := txn.Insert(func(r column.Row) error {
				r.SetFloat64("px", 1.0)
				r.SetFloat64("py", 2.0)
				r.SetFloat64("vx", 1.0)
				r.SetFloat64("vy", 2.0)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})

	b.StartTimer()
	for i := 0; i < b.N; i++ {
		entities.Query(func(txn *column.Txn) error {
			posX := txn.Float64("px")
			posY := txn.Float64("py")
			velX := txn.Float64("vx")
			velY := txn.Float64("vy")
			txn.With("px", "py", "vx", "vy").Range(func(idx uint32) {
				px, _ := posX.Get()
				py, _ := posY.Get()
				vx, _ := velX.Get()
				vy, _ := velY.Get()
				posX.Set(px + vx)
				posY.Set(py + vy)
			})
			return nil
		})
	}
	b.StopTimer()

	count := 0
	entities.Query(func(txn *column.Txn) error {
		count = txn.With("px", "py", "vx", "vy").Count()
		return nil
	})
	assert.Equal(b, nPosVel, count)
}

Note I can add Position and Velocity easily, and the number of PosVel entries is correct (1000), but how would I then remove the vx/vy later?

The performance seems low...

I modified the million benchmark to a billion and it took forever to insert.

Is it possible to expand the commit.Buffer to speed up the transaction?

How to stream filtered values?

Imagine I get a stream of messages like the below from an incoming socket:

{"name": "Alice", "grandslams": 20, "country": "switzerland"}
{"name": "Bob", "grandslams": 10, "country": "spain"}
{"name": "Charlie", "grandslams": 12, "country": "serbia"}

Now I want to filter all the messages that satisfy certain criteria and push them to an outgoing socket in a streaming fashion, e.g. stream all messages where grandslams > 10 and country = "spain". While I am aware of a few ways to implement this, I wonder how to do it effectively using this library?
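
One possible shape under the current API, assuming grandslams is stored in an int column and the outgoing socket is abstracted as a channel (both hypothetical here):

out := make(chan string, 64)

players.Query(func(txn *column.Txn) error {
	names := txn.String("name")
	return txn.
		WithInt("grandslams", func(v int64) bool { return v > 10 }).
		WithString("country", func(v string) bool { return v == "spain" }).
		Range(func(i uint32) {
			if name, ok := names.Get(); ok {
				out <- name // push each matching row downstream
			}
		})
})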

question about chunk and buffer

Really great project! I am reading the source code and find it a bit hard to understand the chunk and buffer layout. Is there any documentation on the format of the buffer? It would help a lot with understanding!

Background reading?

This is a fascinating area, but somewhat obscure. Are there any papers or books you could recommend for someone trying to get a broad understanding of column stores in general and Kelindar in particular?

Different results from query

When I query with WithValue or WithString I'm able to return different results.

func TestQuery(t *testing.T) {
	model := column.NewCollection()
	model.CreateColumnsOf(map[string]interface{}{
		"ID": "",
	})

	model.Query(func(txn *column.Txn) error {
		for i := 0; i < 20000; i++ {

			txt := fmt.Sprint(i)
			txn.Insert(map[string]interface{}{
				"ID": txt,
			})
		}
		return nil
	})

	model.Query(func(txn *column.Txn) error {
		fmt.Println(txn.WithValue("ID", func(v interface{}) bool {
			return v.(string) == "5"
		}).Count())
		fmt.Println(txn.WithString("ID", func(v string) bool {
			return v == "5"
		}).Count())
		return nil
	})
}

The problem gets more pronounced the higher the loop count. For example, at 10,000 there are no issues, but at 1,000,000 there are 62. The first instance happens at 16,390 loops

Changing the txt format to txt := fmt.Sprintf("%2d", i) removes the problem (I tried up to 100,000,000 iterations).

I noticed this problem when generating with UUIDs; the example above is just simpler to generate.

Persistency (at least for fast recovery - maybe some log of requests?)

This question sounds rather dumb in the context of a memory DB, but I'm asking it anyway 😉.

Do you plan to support some kind of persistency in terms of power outage with the following two guarantees?

  1. the DB can deal with all corrupted data (e.g. by detecting them and throwing them away)
  2. the DB will "return" to the caller first after the data were safely persisted (i.e. any corruption happening after this "return" would be guaranteed to not affect such data)

Or maybe just abstract the low-level storage API into some "VFS" and let the community create their own backends? In this case the "VFS" API should account for a per-request choice by the client of whether to treat a request as best effort (minimizing latency and maximizing throughput, with no power-outage guarantee but still with all transactional guarantees) or with a persistency guarantee.

Any thoughts?

Count() incorrect for tx rollback.

package main_test

import (
	"errors"
	"fmt"
	"log"
	"strconv"
	"testing"

	"github.com/kelindar/column"
	"github.com/zeebo/assert"
)

func TestTransaction(t *testing.T) {
	players := column.NewCollection()
	err := errors.Join(players.CreateColumn("name", column.ForString()),
		players.CreateColumn("class", column.ForString()),
		players.CreateColumn("balance", column.ForFloat64()),
		players.CreateColumn("age", column.ForInt16()))
	if err != nil {
		log.Fatalf("failed to create columns")
	}
	addPlayers := func() {
		err = players.Query(func(tx *column.Txn) error {
			for i := 0; i < 20; i++ {
				_, err := tx.Insert(func(r column.Row) error {
					r.SetString("name", "merlin")
					r.SetString("class", "mage")
					r.SetFloat64("balance", 99.95)
					r.SetInt16("age", 107)
					return nil
				})
				if err != nil {
					return err
				}
			}
			return nil
		})
	}

	addPlayers()
	assert.Nil(t, err)
	assert.Equal(t, players.Count(), 20)

	addPlayersError := func() error {
		err = players.Query(func(tx *column.Txn) error {
			for i := 0; i < 20; i++ {
				_, err = tx.Insert(func(r column.Row) error {
					r.SetString("name", "merlin")
					r.SetString("class", "mage")
					r.SetFloat64("balance", 99.95)
					r.SetInt16("age", 107)
					return nil
				})
				if err != nil {
					return err
				}
			}
			return errors.New("SHOULD NOT PASS")
		})
		return err
	}

	err = addPlayersError()
	assert.Error(t, err)                 //should be error.
	assert.Equal(t, players.Count(), 20) //transaction failed should still be 20.
}

Seems like counts are not accounting for rollbacks. The last transaction should fail, and the count should remain 20 but it doesn't?

[Feature Request] more powerful index

Right now this project supports bitmap indexes, which work well in some cases. If it supported B-tree indexes, it would cover more usage scenarios.

Thank you very much for building such a simple and powerful project!

Duplicate index entries

I have a question about using Ascend with CreateSortIndex. Duplicate entries are not output in the result. Is there a way to have them returned?
In this case only one Alice is returned:

package main

import (
	"github.com/kelindar/column"
)

func main() {
	// Create a new table
	table := column.NewCollection()
	table.CreateColumn("name", column.ForString())
	table.CreateColumn("age", column.ForInt())
	table.CreateColumn("country", column.ForString())
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Charlie")
		r.SetInt("age", 20)
		r.SetString("country", "Großbritannien")
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Alice")
		r.SetInt("age", 25)
		r.SetString("country", "USA")
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Alice")
		r.SetInt("age", 34)
		r.SetString("country", "Österreich")
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Bob")
		r.SetInt("age", 30)
		r.SetString("country", "Kanada")
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "David")
		r.SetInt("age", 35)
		r.SetString("country", "Australien")
		return nil
	})
	table.CreateSortIndex("index1", "name")
	table.Query(func(txn *column.Txn) error {
		name := txn.String("name")
		age := txn.Int("age")
		country := txn.String("country")
		return txn.With("name").Ascend("index1", func(idx uint32) {
			print("idx: ", idx)
			valueName, _ := name.Get()
			print(" name: ", valueName)
			valueAge, _ := age.Get()
			print(" age: ", valueAge)
			valueCountry, _ := country.Get()
			println(" country: ", valueCountry)
		})
	})
}

Duplicate key records getting inserted

Thanks for such an excellent implementation, your efforts are commendable.

I am able to insert multiple records with the same key; is this the default behavior? If a column has been flagged ForKey, should it allow duplicate keys to be inserted?

// Create a new columnar collection
type Tcolumns struct {
	players *column.Collection
}

func (gC *Tcolumns) LoadData() {
	gC.players = column.NewCollection()

	gC.players.CreateColumn("dataelementid", column.ForKey())
	gC.players.CreateColumn("periodid", column.ForFloat64())
	gC.players.CreateColumn("sourceid", column.ForFloat64())
	gC.players.CreateColumn("catid", column.ForFloat64())
	gC.players.CreateColumn("attribid", column.ForFloat64())
	gC.players.CreateColumn("value", column.ForEnum())

	// Load the items into the collection
	loaded := loadFixture("pem1.json")
	gC.players.Query(func(txn *column.Txn) error {
		for _, v := range loaded {
			txn.InsertObject(v)
		}
		return nil
	})

	gC.players.Query(func(txn *column.Txn) error {
		b := []byte(`{"dataelementid": 56594, "periodid": 58044, "sourceid": 48464, "catid": 244, "attribid": 244, "value": "blah"}`)
		var f column.Object
		if err := json.Unmarshal(b, &f); err != nil {
			panic(err)
		}
		txn.InsertObject(f)

		c := []byte(`{"dataelementid": 56594, "periodid": 58044, "sourceid": 48464, "catid": 244, "attribid": 244, "value": "blah"}`)
		var g column.Object
		if err := json.Unmarshal(c, &g); err != nil {
			panic(err)
		}
		txn.InsertObject(g)
		return nil
	})
}

Running in the background?

Hi, thanks for this library, it's extremely helpful.

This might be a more general Go question, so I apologise in advance if it is, or if it's a really silly question... How do I go about running this in-memory database on a webserver, so that code executed from an inbound HTTP request can access it and run queries & insert data against an already established column database?

For context, I'm new to Go and come from an Elixir background, where we'd spin up a lightweight process to keep the ets (in-memory) cache alive, and each HTTP request is handled by a separate lightweight process that is allowed access to that ets memory space or can send messages to the process holding the in-memory db. Is there something similar in Go to achieve this? 🤔

Thanks so much 🙏🏼

possible memory leak when taking a snapshot

I created a PoC with column; it got hit with incoming data for about an hour, and after that I ran a loop that created a snapshot of the data once a minute.

I monitored the memory consumption and the memory usage jumped (and was not released) every time a snapshot was taken.

Other than closing the underlying writer, is there anything else I should have released?

If not, then I suspect a memory leak. pprof points the finger at column.grow and s2.NewWriter as possible culprits.

check for dynamic columns

Hey, I recently found this awesome project.
In my situation, there are cases where columns are dynamic, e.g. another column is added after some records have been inserted.
Then, when ranging over all the records, querying such a column on an old record can cause a panic.
Could you export the columnAt function for Txn, or is there any method to check whether a column exists?

Really cool project!

I stumbled onto this project from your other one, kelindar/bitmap, while evaluating bitmap libraries. I don't know if I'll use this library, but I wanted to let you know I thought it was really cool, and a great exhibition of how powerful thoughtful technology can be. :)

[Feature Request] set pk column after columns creation

When creating columns with CreateColumnsOf as below, there is no way to set a PK field.

obj := map[string]any{
	"name":   "Roman",
	"age":    35,
	"wallet": 50.99,
	"health": 100,
	"mana":   200,
}

col := NewCollection()
col.CreateColumnsOf(obj)

If there were a new API such as
collection.setPK(colName string)
then we could set an existing column as the PK. Of course, this API would only be available when the collection is empty.

Database query with time.Time

Hi, I have a question about BinaryMarshaler with the datatype time.Time.
When I make a query with txn.WithValue, I cannot compare against the datatype time.Time;
the comparison is always false. I've already tried a few things to convert the value before comparing.
Here's a short example; the problem is in the line 'which datatype should I use here?'

func main() {
	table := column.NewCollection()
	table.CreateColumn("name", column.ForString())
	table.CreateColumn("country", column.ForString())
	table.CreateColumn("birthdate", column.ForRecord(func() *time.Time { return new(time.Time) }))
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Charlie")
		r.SetString("country", "Großbritannien")
		r.SetRecord("birthdate", time.Date(1987, 10, 25, 12, 0, 0, 0, time.Local))
		return nil
	})
	srcDate := time.Date(1999, 3, 2, 12, 0, 0, 0, time.Local)
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Alice")
		r.SetString("country", "USA")
		r.SetRecord("birthdate", srcDate)
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Alice")
		r.SetString("country", "Österreich")
		r.SetRecord("birthdate", time.Date(1965, 1, 9, 12, 0, 0, 0, time.Local))
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "Bob")
		r.SetString("country", "Kanada")
		r.SetRecord("birthdate", time.Date(1965, 1, 9, 12, 0, 0, 0, time.Local))
		return nil
	})
	table.Insert(func(r column.Row) error {
		r.SetString("name", "David")
		r.SetString("country", "Australien")
		r.SetRecord("birthdate", time.Date(1978, 3, 4, 12, 0, 0, 0, time.Local))
		return nil
	})
	table.Query(func(txn *column.Txn) error {
		name := txn.String("name")
		country := txn.String("country")
		birthdate := txn.Record("birthdate")
		return txn.WithValue("birthdate", func(v interface{}) bool {
			return v == srcDate // which datatype should I use here?
		}).Range(func(idx uint32) {
			valueName, _ := name.Get()
			print(" Name: ", valueName)
			valueCountry, _ := country.Get()
			print(" Contry: ", valueCountry)
		        birthdate, _ := birthdate.Get()
			str := fmt.Sprintf("%v", birthdate)
			println(" Birthdate: ", str)
		})
	})
}

Cross collection transactions?

Since it's in-memory, would it be possible to have a single Txn across multiple collections? For example, in your grid/movement ECS example it would make sense to be able to update both as a single transaction.

Especially for ECS semantics: a system generally has full control over the state, and the Txn semantics are at a different level.
