go-pilosa's Introduction

Go Client for Pilosa

This repo was archived in September 2022 as part of the transition from Pilosa to FeatureBase. Please contact community[at]featurebase[dot]com with any questions.

Go client for Pilosa, the high-performance distributed index.

What's New?

See: CHANGELOG

Requirements

  • Go 1.12 or higher.

Install

Download the library into your GOPATH using:

go get github.com/pilosa/go-pilosa

After that, you can import the library in your code using:

import "github.com/pilosa/go-pilosa"

Usage

Quick overview

Assuming a Pilosa server is running at localhost:10101 (the default):

package main

import (
	"fmt"
	"log"

	"github.com/pilosa/go-pilosa"
)

func main() {
	// Create the default client
	client := pilosa.DefaultClient()

	// Retrieve the schema
	schema, err := client.Schema()
	if err != nil {
		log.Fatal(err)
	}

	// Create an Index object
	myindex := schema.Index("myindex")

	// Create a Field object
	myfield := myindex.Field("myfield")

	// Make sure the index and the field exist on the server
	err = client.SyncSchema(schema)
	if err != nil {
		log.Fatal(err)
	}

	// Send a Set query. If err is non-nil, response will be nil.
	response, err := client.Query(myfield.Set(5, 42))
	if err != nil {
		log.Fatal(err)
	}

	// Send a Row query. If err is non-nil, response will be nil.
	response, err = client.Query(myfield.Row(5))
	if err != nil {
		log.Fatal(err)
	}

	// Get the result
	result := response.Result()
	// Act on the result
	if result != nil {
		columns := result.Row().Columns
		fmt.Println("Got columns: ", columns)
	}

	// You can batch queries to improve throughput
	response, err = client.Query(myindex.BatchQuery(
		myfield.Row(5),
		myfield.Row(10)))
	if err != nil {
		log.Fatal(err)
	}

	for _, result := range response.Results() {
		// Act on the result
		fmt.Println(result.Row().Columns)
	}
}

Documentation

Data Model and Queries

See: Data Model and Queries

Executing Queries

See: Server Interaction

Importing and Exporting Data

See: Importing and Exporting Data

Other Documentation

Contributing

See: CONTRIBUTING

License

See: LICENSE

go-pilosa's People

Contributors

alanbernstein, benbjohnson, bmuller, codysoyland, jaffee, kcrodgers24, seebs, shaqque, tgruben, travisturner, yuce


go-pilosa's Issues

need to fix perf issue with integer import in importbatch.go

Detailed in a TODO comment as usual.

		// TODO(jaffee) I think this may be very inefficient. It looks
		// like we're copying the `ids` and `values` slices over
		// themselves (an O(n) operation) for each nullIndex so this
		// is effectively O(n^2). What we could do is iterate through
		// ids and values each once, while simultaneously iterating
		// through nullindices and keeping track of how many
		// nullIndices we've passed, and so how far back we need to
		// copy each item.
		//
		// It was a couple weeks ago that I wrote this code, and I
		// vaguely remember thinking about this, so I may just be
		// missing something now. We should benchmark on what should
		// be a bad case (an int field which is mostly null), and see
		// if the improved implementation helps a lot.

Now I've actually run into it:

		// Update: I ran into this on a largish batch size (4M) with a
		// very small percentage of nils (0.5%) - was very obvious in
		// the CPU profile
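
For reference, a minimal sketch of the single-pass approach the TODO describes, assuming nullIndices is sorted ascending (hypothetical helper, not the actual importbatch.go code):

// compact removes the entries at nullIndices from ids and values in a
// single O(n) pass instead of shifting the tail once per null index.
func compact(ids []uint64, values []int64, nullIndices []uint64) ([]uint64, []int64) {
	w := 0 // write cursor
	n := 0 // how many nullIndices we've passed so far
	for r := range ids {
		if n < len(nullIndices) && uint64(r) == nullIndices[n] {
			n++ // null slot: skip it instead of copying the tail back
			continue
		}
		// each kept item moves back by exactly n positions
		ids[w] = ids[r]
		values[w] = values[r]
		w++
	}
	return ids[:w], values[:w]
}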

Unable to read back bits set in pilosa

I'm running a Pilosa Docker container (https://hub.docker.com/u/pilosa/) on my OSX host.

When running the code from the quick overview on your GitHub page, I do not retrieve any results.
However, I can see in the Pilosa web UI that bit 42 is set on row 5 of myindex, myframe.
The code itself is not able to read the bit back out. It does not return an error, but rather empty result arrays:

go run cmd/test2/main.go
Got bits:  []
[]
[]

I see the same behaviour in the real application I'm developing: I can set bits and retrieve them from the web UI, but simple Bitmap queries from code do not work.

client.go ExperimentalReplayImport() races against client.go logImport.func1()

go version go1.14.4 darwin/amd64

At tip, 28cb67f, running against a tip pilosa/pilosa server (at 9dc1775b93464f78acc8573cfad2f405b1175fb5), make test-all-race detected the following race:

(base) jaten@Jasons-MacBook-Pro ~/go/src/github.com/pilosa/go-pilosa (master) $ make test-all-race
PILOSA_BIND=http://:10101 /Applications/Xcode.app/Contents/Developer/usr/bin/make test-all TESTFLAGS=-race
PILOSA_BIND=http://:10101 go test -count=1 ./... -race
==================
WARNING: DATA RACE
Write at 0x00c00012caa0 by goroutine 11:
  bytes.(*Buffer).Read()
      /usr/local/go/src/bytes/buffer.go:297 +0x4a
  io.ReadAtLeast()
      /usr/local/go/src/io/io.go:310 +0x98
  io.ReadFull()
      /usr/local/go/src/io/io.go:329 +0x93
  encoding/gob.decodeUintReader()
      /usr/local/go/src/encoding/gob/decode.go:120 +0x40
  encoding/gob.(*Decoder).recvMessage()
      /usr/local/go/src/encoding/gob/decoder.go:81 +0xa7
  encoding/gob.(*Decoder).decodeTypeSequence()
      /usr/local/go/src/encoding/gob/decoder.go:143 +0x1f2
  encoding/gob.(*Decoder).DecodeValue()
      /usr/local/go/src/encoding/gob/decoder.go:211 +0x17f
  encoding/gob.(*Decoder).Decode()
      /usr/local/go/src/encoding/gob/decoder.go:188 +0x236
  github.com/pilosa/go-pilosa.(*Client).ExperimentalReplayImport()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1302 +0x396
  github.com/pilosa/go-pilosa.TestImportWithReplayErrors.func1()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client_internal_it_test.go:159 +0x5b

Previous write at 0x00c00012caa0 by goroutine 77:
  bytes.(*Buffer).Write()
      /usr/local/go/src/bytes/buffer.go:169 +0x42
  encoding/gob.(*Encoder).writeMessage()
      /usr/local/go/src/encoding/gob/encoder.go:82 +0x41a
  encoding/gob.(*Encoder).EncodeValue()
      /usr/local/go/src/encoding/gob/encoder.go:253 +0x881
  encoding/gob.(*Encoder).Encode()
      /usr/local/go/src/encoding/gob/encoder.go:176 +0x5b
  github.com/pilosa/go-pilosa.(*Client).logImport.func1()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1237 +0x2b9

Goroutine 11 (running) created at:
  github.com/pilosa/go-pilosa.TestImportWithReplayErrors()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client_internal_it_test.go:158 +0x929
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb

Goroutine 77 (finished) created at:
  github.com/pilosa/go-pilosa.(*Client).logImport()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1225 +0xfb
  github.com/pilosa/go-pilosa.(*Client).importRoaringBitmap()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:885 +0x98a
  github.com/pilosa/go-pilosa.(*Client).importColumnsRoaring()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:606 +0x5fb
  github.com/pilosa/go-pilosa.(*Client).importColumns()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:533 +0x946
  github.com/pilosa/go-pilosa.(*Client).importColumns-fm()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:521 +0xca
  github.com/pilosa/go-pilosa.importRecords()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/import_manager.go:204 +0x1be
  github.com/pilosa/go-pilosa.recordImportWorker()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/import_manager.go:166 +0x5eb
==================
==================
WARNING: DATA RACE
Read at 0x00c00012ca80 by goroutine 11:
  bytes.(*Buffer).empty()
      /usr/local/go/src/bytes/buffer.go:69 +0x5f
  bytes.(*Buffer).Read()
      /usr/local/go/src/bytes/buffer.go:298 +0x94
  io.ReadAtLeast()
      /usr/local/go/src/io/io.go:310 +0x98
  io.ReadFull()
      /usr/local/go/src/io/io.go:329 +0x93
  encoding/gob.decodeUintReader()
      /usr/local/go/src/encoding/gob/decode.go:120 +0x40
  encoding/gob.(*Decoder).recvMessage()
      /usr/local/go/src/encoding/gob/decoder.go:81 +0xa7
  encoding/gob.(*Decoder).decodeTypeSequence()
      /usr/local/go/src/encoding/gob/decoder.go:143 +0x1f2
  encoding/gob.(*Decoder).DecodeValue()
      /usr/local/go/src/encoding/gob/decoder.go:211 +0x17f
  encoding/gob.(*Decoder).Decode()
      /usr/local/go/src/encoding/gob/decoder.go:188 +0x236
  github.com/pilosa/go-pilosa.(*Client).ExperimentalReplayImport()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1302 +0x396
  github.com/pilosa/go-pilosa.TestImportWithReplayErrors.func1()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client_internal_it_test.go:159 +0x5b

Previous write at 0x00c00012ca80 by goroutine 77:
  bytes.(*Buffer).tryGrowByReslice()
      /usr/local/go/src/bytes/buffer.go:108 +0x196
  bytes.(*Buffer).Write()
      /usr/local/go/src/bytes/buffer.go:170 +0x8f
  encoding/gob.(*Encoder).writeMessage()
      /usr/local/go/src/encoding/gob/encoder.go:82 +0x41a
  encoding/gob.(*Encoder).EncodeValue()
      /usr/local/go/src/encoding/gob/encoder.go:253 +0x881
  encoding/gob.(*Encoder).Encode()
      /usr/local/go/src/encoding/gob/encoder.go:176 +0x5b
  github.com/pilosa/go-pilosa.(*Client).logImport.func1()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1237 +0x2b9

Goroutine 11 (running) created at:
  github.com/pilosa/go-pilosa.TestImportWithReplayErrors()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client_internal_it_test.go:158 +0x929
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:991 +0x1eb

Goroutine 77 (finished) created at:
  github.com/pilosa/go-pilosa.(*Client).logImport()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:1225 +0xfb
  github.com/pilosa/go-pilosa.(*Client).importRoaringBitmap()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:885 +0x98a
  github.com/pilosa/go-pilosa.(*Client).importColumnsRoaring()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:606 +0x5fb
  github.com/pilosa/go-pilosa.(*Client).importColumns()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:533 +0x946
  github.com/pilosa/go-pilosa.(*Client).importColumns-fm()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/client.go:521 +0xca
  github.com/pilosa/go-pilosa.importRecords()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/import_manager.go:204 +0x1be
  github.com/pilosa/go-pilosa.recordImportWorker()
      /Users/jaten/go/src/github.com/pilosa/go-pilosa/import_manager.go:166 +0x5eb
==================
--- FAIL: TestImportWithReplayErrors (0.43s)
    testing.go:906: race detected during execution of test
go-pilosa 2020/06/22 15:51:20 invalidating shard node cache, uri mismatch at 0 old: [zzz://:0], new: [http://localhost:10101]
go-pilosa 2020/06/22 15:51:58 299 pilosa/2.0 "FAKE WARNING: Deprecated PQL version: PQL v2 will remove support for SetBit() in Pilosa 2.1. Please update your client to support Set() (See https://docs.pilosa.com/pql#versioning)." "Sat, 25 Aug 2019 23:34:45 GMT"
FAIL
FAIL	github.com/pilosa/go-pilosa	44.363s
ok  	github.com/pilosa/go-pilosa/csv	0.051s
?   	github.com/pilosa/go-pilosa/examples/multicol-csv-import	[no test files]
?   	github.com/pilosa/go-pilosa/gopilosa_pbuf	[no test files]
ok  	github.com/pilosa/go-pilosa/gpexp	2.752s
?   	github.com/pilosa/go-pilosa/lru	[no test files]
FAIL
make[1]: *** [test-all] Error 1
make: *** [test-all-race] Error 2
(base) jaten@Jasons-MacBook-Pro ~/go/src/github.com/pilosa/go-pilosa (master) $ git log|head
commit 6bc638d761338d5189736e9beed8546a4bc6e5ce
Merge: d55c16e 28cb67f
Author: Travis Turner <[email protected]>
Date:   Sat Nov 30 22:00:22 2019 -0600

    Merge pull request #262 from travisturner/groupby-having
    
    add having clause support to GroupByBuilder

commit 28cb67f61c4a7db69c0907c64d8b3363b587ad9f
(base) jaten@Jasons-MacBook-Pro ~/go/src/github.com/pilosa/go-pilosa (master) $

Error: can't skip unknown wire type 7 for internal.QueryResponse

Hi,
I got the following error: proto: can't skip unknown wire type 7 for internal.QueryResponse

func BenchmarkIntersectSegments(b *testing.B) {
	q := index.Intersect(frame.Bitmap(2), frame.Bitmap(3))
	if q.Error() != nil {
		b.Error(q.Error())
		return
	}
	response, err := client.Query(q, nil)
	if err != nil {
		b.Error(err) // this is where the error happens
		return
	}
	if response.ErrorMessage != "" {
		b.Error(response.ErrorMessage)
		return
	}

	for _, result := range response.Results() {
		if len(result.Bitmap.Bits) == 0 {
			b.Error("bitmap is 0")
			return
		}
	}
	return
}

The query works in the webUI:
Intersect(Bitmap(segment_id=2,frame='segments'), Bitmap(segment_id=3,frame='segments'))

deprecate NewClient* functions and ClientOptions struct

Now that we have the NewClient function and we're using functional options, we should deprecate the other functions and the options struct to simplify the API. I think at first we can just put some deprecation warnings in there and make sure that all of our tooling is using the new interface; then we can unexport those things so that they aren't exposed anymore.
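
For context, a minimal functional-options sketch (illustrative names, not the actual go-pilosa API; assumes the "time" package): each option is a function that mutates a private options struct, so the NewClient signature never has to change as knobs are added.

type clientOptions struct {
	connectTimeout time.Duration
}

type ClientOption func(*clientOptions)

func OptConnectTimeout(d time.Duration) ClientOption {
	return func(o *clientOptions) { o.connectTimeout = d }
}

type Client struct {
	addr    string
	options clientOptions
}

func NewClient(addr string, opts ...ClientOption) *Client {
	options := clientOptions{connectTimeout: 30 * time.Second} // defaults
	for _, opt := range opts {
		opt(&options) // each option overrides one default
	}
	return &Client{addr: addr, options: options}
}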

concurrent map access in augmentHeaders

augmentHeaders is called from httpRequest, which is often called (in Query, for example) with protobufHeaders, which is a global variable. I've observed a concurrent map access because of this.

We should (a) write a test demonstrating the issue, which fails under go test -race, and (b) fix it... perhaps by augmenting the headers a single time at startup, or by making a copy each time, as sketched below.
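
A sketch of the "copy each time" option, assuming a map[string]string header set (hypothetical signature, not the actual client code):

// augmentHeaders clones the shared header map and applies per-request
// additions to the clone, so the global is never written concurrently.
func augmentHeaders(shared map[string]string) map[string]string {
	headers := make(map[string]string, len(shared)+1)
	for k, v := range shared { // copy; never mutate the global
		headers[k] = v
	}
	headers["Accept"] = "application/x-protobuf" // per-request additions go on the copy
	return headers
}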

schema compatibility hard to verify, currently not verified

schema.Index() and index.Field() are both get-or-create methods, but that means that if you specified options, you don't necessarily know that the options you specified are the same as the options already present.

In some cases, empty/default options promote to non-zero-value options, so it's not necessarily possible to trivially check whether the set of options in the schema returned by the server is incompatible with or different from user-specified options. The key weak points are the FieldTypeDefault and CacheTypeDefault values; for instance, if you explicitly specify a CacheType of "", you will end up with the default cache type of "ranked", but not until the field options are transmitted to the server, and the schema is queried again. (SyncSchema won't update the local copy to reflect this.)

Since these functions don't return errors, and haven't previously returned nil, it's unclear how to safely change the API for this. I don't have a good plan; filing this issue so it doesn't get forgotten.

[Poll] CSV format related code in the client

The code that reads data from a CSV file in the client probably doesn't belong in the go-pilosa package. What should we do about it?

  • 🎉 Move it to a sub-package
  • 👎 Delete the code
  • 👍 Don't do anything about it; keep it as is.

memory usage got large

Memory usage is unreasonably large. This is partly because of a recent change that makes the record importers accept slices of records over their channels, but also because, for quite a long time before that, the channels in question have been requesting a buffer length equal to the current batch size. I don't think that makes much sense, and it's particularly a problem when imports spend most of their time waiting on the server: if you set a reasonably large batch size (say, 65536) and you have 16 records per slice dumped into the work queue, you're up to 1M records waiting in that work queue per import worker, and suddenly you have 30GB of memory in use and the GC is chewing up a lot of time.
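
Back-of-the-envelope for that sizing (illustrative numbers and a stand-in Record type, not actual go-pilosa constants):

type Record struct{ ID, Value uint64 }

const (
	batchSize       = 65536 // requested channel buffer length, in sends
	recordsPerSlice = 16    // records per send
)

// 65536 sends * 16 records = 1,048,576 records buffered per import worker.
var recordsBufferedPerWorker = batchSize * recordsPerSlice

// A small constant buffer still decouples producer and consumer without
// holding an entire batch in flight:
var recordChan = make(chan []Record, 8) // vs. make(chan []Record, batchSize)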

add tracing support

I think it should be possible to get traces which show timings from when an (e.g.) import request is made on the client to when it's processed on the server, and back. This would allow us to see how much latency is eaten up in network and net/http processing before getting to Pilosa's handler.

Pilosa has numerous examples of how to implement tracing in Go including going between hosts.

How to use "Rows(field)"

Rows(<FIELD>, previous=<UINT|STRING>, limit=<UINT>, column=<UINT|STRING>, from=<TIMESTAMP>, to=<TIMESTAMP>)

I tried to execute this query, but an error happened; see the following:

curl "localhost:10101/index/repository/query" -XPOST -d 'Rows(stargazer)'
{"error":"parsing: parsing: \nparse error near IDENT (line 1 symbol 6 - line 1 symbol 15):\n"stargazer"\n"}

Am I using it wrong? Or does the field have special requirements?

Do not roaring import if trackExistence is true

Currently trackExistence is not handled on the client side during roaring imports, so roaring import should be disabled when the index has trackExistence=true until we support it on the client side or the server side.

Issue with error handling in import_manager.go

Hi Everyone!

I think I found an issue with the error handling in import_manager.go, line 147.

			for shard, records := range batchForShard {
				if len(records) == 0 {
					continue
				}
				err = importRecords(shard, records)
		=====>		if err != nil {
					break
				}
				batchForShard[shard] = nil
			}

Right now, when importRecords returns an error, the loop breaks, silently aborting the import of the remaining shards, and the worker continues to read from recordChan. It does not free memory and keeps appending records to batchForShard, which grows without bound and eventually causes an OOM if importRecords always returns an error.

It would be good to improve this behavior. The error returned from importRecords should not be silent: it takes a while to understand what is going on when errors returned by the Pilosa cluster are silently dropped. There should also be a way to avoid the OOM. And maybe the imports of the other shards shouldn't be aborted when the error returned by the cluster is shard-specific (in my case, the situation was caused by a Pilosa node returning a 412 code because it didn't own the shard being imported to). One possible shape for a fix is sketched below.
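
A hypothetical rewrite along those lines (not the actual import_manager.go code; Record is a stand-in type and fmt/strings are assumed): collect per-shard errors instead of silently breaking, and always release the batch so it cannot grow without bound.

// importBatch drains batchForShard, importing every shard and reporting
// all per-shard errors instead of aborting on the first one.
func importBatch(batchForShard map[uint64][]Record, importRecords func(uint64, []Record) error) error {
	var errs []string
	for shard, records := range batchForShard {
		if len(records) == 0 {
			continue
		}
		if err := importRecords(shard, records); err != nil {
			// The failure may be shard-specific (e.g. a node answering
			// 412 for a shard it doesn't own), so keep importing the rest.
			errs = append(errs, fmt.Sprintf("shard %d: %v", shard, err))
		}
		batchForShard[shard] = nil // free memory even on error
	}
	if len(errs) > 0 {
		return fmt.Errorf("importing batch: %s", strings.Join(errs, "; "))
	}
	return nil
}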

proposal: remove import strategy option

I think having import strategy be a separate option creates unnecessary API surface - the usual arguments apply here: more to document, more to test, more to support, more potential sources of confusion for users, etc. This one in particular is sticky because it's not necessarily clear how the strategy option interacts with other import options. Which other options apply to which strategies? Do some apply to all strategies, or only a subset? If you pass an option that doesn't apply to a strategy, will it be silently ignored, or will there be an error?

The batch import strategy is basically what we've been doing and allows the user to tune a pretty straightforward latency/throughput tradeoff. What's missing is having timeouts to import smaller batches in the case that data isn't coming in at a high rate.

I think we can add this functionality in conjunction with the batch import strategy without having them be mutually exclusive. Whenever a new batch starts, we can start a timer which will be the upper bound on how long to wait before importing that batch. The expected case would be that the data reaches the batch size and is imported well before that timer fires, but in the case that incoming data is slow or sporadic, the timer ensures that data is indexed in a timely fashion and not buffered indefinitely.
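
A sketch of that batch-or-timeout loop (hypothetical names and a stand-in Record type, not the actual importer; assumes the "time" package): flush when the batch fills or when the timer fires, whichever comes first.

func batchLoop(records <-chan Record, flush func([]Record), batchSize int, maxWait time.Duration) {
	batch := make([]Record, 0, batchSize)
	timer := time.NewTimer(maxWait)
	defer timer.Stop()
	resetTimer := func() {
		if !timer.Stop() {
			select { // drain the channel if the timer already fired
			case <-timer.C:
			default:
			}
		}
		timer.Reset(maxWait)
	}
	for {
		select {
		case r, ok := <-records:
			if !ok { // input exhausted: flush the remainder and stop
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			if len(batch) == 0 {
				resetTimer() // a new batch starts: arm its upper bound
			}
			batch = append(batch, r)
			if len(batch) >= batchSize { // expected case: batch filled first
				flush(batch)
				batch = batch[:0]
			}
		case <-timer.C: // data is slow or sporadic: import what we have
			if len(batch) > 0 {
				flush(batch)
				batch = batch[:0]
			}
		}
	}
}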

revisit the QueryResult interface

We may be able to simplify the QueryResult interface.

// QueryResult represents one of the results in the response.
type QueryResult interface {
	Type() uint32
	Bitmap() BitmapResult
	CountItems() []CountResultItem
	Count() int64
	Sum() int64
	Changed() bool
}

Because each function is specific to the type, it's effectively an empty interface. And every implementation is left with unused methods:

func (BitmapResult) Type() uint32                  { return QueryResultTypeBitmap }
func (b BitmapResult) Bitmap() BitmapResult        { return b }
func (BitmapResult) CountItems() []CountResultItem { return nil }
func (BitmapResult) Count() int64                  { return 0 }
func (BitmapResult) Value() int64                  { return 0 }
func (BitmapResult) Changed() bool                 { return false }

I could see it being something like this instead:

type QueryResult interface{}

type BitmapResult struct {
    QueryResult

    Attributes map[string]interface{} `json:"attrs"`
    Bits       []uint64               `json:"bits"`
    Keys       []string               `json:"keys"`
}

In this case the QueryResult inside the BitmapResult is not really necessary because everything implements the empty interface, but if for some reason QueryResult needs to have a function defined, this would ensure BitmapResult implements the interface.

This is not at all a high priority, but may be worth revisiting at some point.

FrameOptions needs to support CacheType and CacheSize

On frame creation, Pilosa supports the following frame options:

type FrameOptions struct {
	RowLabel       string      `json:"rowLabel,omitempty"`
	InverseEnabled bool        `json:"inverseEnabled,omitempty"`
	CacheType      string      `json:"cacheType,omitempty"`
	CacheSize      uint32      `json:"cacheSize,omitempty"`
	TimeQuantum    TimeQuantum `json:"timeQuantum,omitempty"`
}

But this client only supports:

type FrameOptions struct {
	RowLabel string
	TimeQuantum TimeQuantum
	InverseEnabled bool
}

We should add support for caching parameters.

make import-roaring the default

We talked about this a bit in FeatureBaseDB/pdk#117, but the more I think about it, the more I think it makes sense for import-roaring to be the default.

It's enormously faster, and avoids a lot of memory pressure on Pilosa for time fields. Since turning it on is really an implementation detail from the perspective of a user of the client library, I don't see much downside to doing it.

On the other hand, if we leave it off by default, users of go-pilosa will see bad performance by default.

Since we're probably going to move everything in the direction of roaring imports and deprecate regular imports, it would make sense to me to exercise them by default.

Need better field support.

Field support feels a bit weird right now. If you have a *Frame, you can call Field(name) to get a field as you would expect, but there is no way to set any options on that field, and an error is returned as a member of the field struct rather than as a second return value.

Based on the way everything else works, I'd expect to do something like

schema, err := client.Schema()
index, err := schema.Index(name)
frame, err := index.Frame(name)
field, err := frame.IntField(name, min, max)
err = client.SyncSchema(schema)
client.Query(field.Range(...))

Though, I think we could actually do away with the errors on everything but Schema and SyncSchema and do all the validation in those methods.

Currently, there is no easy way to "GetOrCreate" a field which is what is prompting this issue.

Client.ImportNode ignores error

 func (c *Client) importNode(request *internal.ImportRequest) error {
	data, _ := proto.Marshal(request)
	// request.Marshal never returns an error

proto.Marshal attempts to cast the request to a Marshaler, but *internal.ImportRequest doesn't implement Marshal(), so proto.Marshal goes down another path which, it appears, may return an error. We should probably handle the error anywhere proto.Marshal is called, regardless of whether the object in question is a Marshaler, in order to be consistent, suppress warnings, and protect against future changes to the generated code.
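
A sketch of handling the error unconditionally, mirroring the quoted snippet (the error-wrapping style and the elided send are illustrative):

func (c *Client) importNode(request *internal.ImportRequest) error {
	data, err := proto.Marshal(request)
	if err != nil {
		return fmt.Errorf("marshaling import request: %v", err)
	}
	// ... send data to the node as before ...
	_ = data
	return nil
}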

Examples in README need corrections.

The README needs to be reviewed for errors.
For example, there are several instances of bitmap calls like this: stargazer.Bitmap(1, 100). It is my understanding that Bitmap() takes only one argument.

Broken import with Pilosa repo master

Trying to import using the Pilosa cluster master branch returns the error "starting field import for segment: doing import: Server error 415 Unsupported Media Type body:'Unsupported media type'" - the server expects "application/x-protobuf", but the client sends "application/x-binary" to the roaring import endpoint.

retry requests when temporary errors occur

Pilosa clients need a way to distinguish between temporary and permanent errors when querying Pilosa.

If a temporary error occurs, the client should retry the request a few times, sleeping successively longer intervals between retries.

The motivating use case for this is when a Pilosa cluster goes into state STARTING briefly because it thinks a node is down due to a hiccup in memberlist. This is often a transient condition and it would be convenient if it didn't cause things like imports to fail.
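
A minimal client-side sketch of retry-with-backoff (isTemporary is a hypothetical classifier; the real work in this issue is deciding which server errors count as temporary, e.g. cluster state STARTING; assumes the "time" package):

func queryWithRetry(do func() error, attempts int) error {
	backoff := 100 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = do(); err == nil {
			return nil
		}
		if !isTemporary(err) {
			return err // permanent errors fail fast
		}
		time.Sleep(backoff)
		backoff *= 2 // sleep successively longer between retries
	}
	return err
}

func isTemporary(err error) bool {
	// Placeholder: a real implementation would inspect the error or
	// response status to distinguish transient from permanent failures.
	return true
}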

ImportField doesn't work: times out with Pilosa in a Linux container on Docker Desktop for Windows

Hello!
The Linux container and the Windows host have different IP addresses, and unfortunately Docker Desktop for Windows can't route traffic to Linux containers (see https://docs.docker.com/docker-for-windows/networking/).

At the same time, we have "(c *Client) fetchFragmentNodes(indexName string, shard uint64) ([]fragmentNode, error)", which makes an HTTP request to "/internal/fragment/nodes?shard=%d&index=%s". As a result, "fragmentNodeURIs.Host" contains the IP address of the Linux container, and during the import that address is unreachable from the Windows host where the Pilosa client is running.

import to multi-node cluster concurrently

Pilosa's /import endpoint accepts data only for slices which reside on the node being queried. The client is responsible for splitting data in a given import out to the various nodes in the cluster. Currently it does this one node at a time for a given batch of bits being imported, but it should be straightforward to make these requests concurrently.

I don't expect to see great speedups immediately as most import requests probably only contain data for a single slice (perhaps two), but if we change the way column ids are allocated to round robin from the next few slices, we could see pretty big gains here.
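
A sketch of fanning one batch out to all nodes concurrently (Bit, bitsByNode, and importNode are stand-ins for the client's existing per-node path; assumes the "sync" package):

type Bit struct{ RowID, ColumnID uint64 }

func importToNodes(bitsByNode map[string][]Bit, importNode func(host string, bits []Bit) error) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(bitsByNode)) // buffered so workers never block
	for host, bits := range bitsByNode {
		wg.Add(1)
		go func(host string, bits []Bit) {
			defer wg.Done()
			if err := importNode(host, bits); err != nil {
				errs <- err
			}
		}(host, bits)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if no worker failed; otherwise the first error received
}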

integration tests should use MustRunCluster and not rely on external Pilosa

github.com/pilosa/pilosa/test exposes a "MustRunCluster" which is useful for creating and destroying temporary clusters to test against. We no longer need to rely on an externally running Pilosa instance for client integration tests.

Furthermore, we should remove the // +build integration tags so that all tests run on every test run. The build tags sometimes break automated code analysis and refactoring tools (such as gorename which ignores them by default), so removing them will simplify things.

Option to dump imports to file to easily reproduce workloads.

We've come across a number of scenarios where it would be nice to be able to capture the import requests that go-pilosa is making during an ingest job. This would be useful for sharing things that (e.g.) trigger bugs or performance problems in Pilosa without needing to share the raw data or custom ingest code.

We would also need the ability to fairly easily replay the data.

This would also be nice for isolated performance testing, removing any possible overhead or bottlenecks in parsing and sending through channels, and just having the raw imports ready to send to Pilosa.

I'm envisioning that this option would dump the data into a file on disk while also working normally (ingesting to Pilosa). The replay would produce exactly the same set of requests that would have been made during the original import, so we would probably define some kind of Request object as protobuf or whatever that we can serialize a stream of.
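
A sketch of one way the dump/replay halves could look, using encoding/gob (ImportRequest here is a stand-in for whatever serializable request object gets defined; assumes "encoding/gob" and "io"):

type ImportRequest struct {
	Index, Field string
	Shard        uint64
	Data         []byte
}

// logRequests writes each request to w as it is made during the ingest.
func logRequests(w io.Writer, reqs <-chan ImportRequest) error {
	enc := gob.NewEncoder(w)
	for req := range reqs {
		if err := enc.Encode(req); err != nil {
			return err
		}
	}
	return nil
}

// replayRequests reads the stream back and re-sends each request.
func replayRequests(r io.Reader, send func(ImportRequest) error) error {
	dec := gob.NewDecoder(r)
	for {
		var req ImportRequest
		if err := dec.Decode(&req); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := send(req); err != nil {
			return err
		}
	}
}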

Should create only 1 net/http client per node and re-use it

Currently, go-pilosa is structured such that it creates lots of net/http.Client objects during the course of execution (especially during imports, which was partially addressed in #46). The http.Client object is goroutine-safe and meant to be re-used (it may keep TCP connections open internally, which improves latency and efficiency). I think we just need to create one http.Client per host in the cluster and re-use it in all interactions with that host.
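
A sketch of one shared http.Client per host, created lazily and reused (hypothetical structure; the point is that http.Client is safe for concurrent use and keeps connections alive; assumes "net/http", "sync", and "time"):

type hostClients struct {
	mu      sync.Mutex
	clients map[string]*http.Client
}

// get returns the client for host, creating it exactly once.
func (h *hostClients) get(host string) *http.Client {
	h.mu.Lock()
	defer h.mu.Unlock()
	if c, ok := h.clients[host]; ok {
		return c // reuse: keeps TCP connections warm
	}
	if h.clients == nil {
		h.clients = make(map[string]*http.Client)
	}
	c := &http.Client{Timeout: 30 * time.Second}
	h.clients[host] = c
	return c
}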
