featurebasedb / pdk
Pilosa Dev Kit - implementation tooling and use case examples are here!
License: BSD 3-Clause "New" or "Revised" License
Hi guys:
I'm new to Pilosa and Go. I read some docs on your website, but I am still not clear on Pilosa and its data model.
For example, take some records from a relational database like the ones below:
name | age |
---|---|
A | 18 |
B | 19 |
Field name
name | ID | 1 | 2 |
---|---|---|---|
A | 1 | 1 | 0 |
B | 2 | 0 | 1 |
Field age (BSI)
age | 18 | 19 |
---|---|---|
comp 0 | 1 | 0 |
comp 2 | 0 | 0 |
comp 3 | 1 | 1 |
comp 4 | 1 | 1 |
comp 5 | 0 | 0 |
not_null | 1 | 1 |
Is that right?
If I understand the above correctly, I have the following question:
For the field name, how can I transform A to row ID 1 during ingestion? Does the mapping from A to 1 need to be maintained in another server in the user's system, or will Pilosa do it automatically? If it's the latter, can I get the value A back when querying Row(name=1)?
Looking forward to hearing from you guys, and thanks for helping me to use and understand Pilosa better!
For the use case work, we put together a CSV import system that is specific to the two use cases, but lays some groundwork for working with more general data sources. The scope is limited to well-formatted, well-defined tabular data, so users will be responsible for providing clean data.
Since the Go client supports importing using more than one worker, it may be useful to have an option to specify the number of import workers.
Can we default usage of go-pilosa to OptImportRoaring(true)? It's definitely a significant improvement for time fields. @yuce
The current Pilosa proxy has a couple limitations.
We should probably rewrite it using something like https://golang.org/pkg/net/http/httputil/#ReverseProxy to handle most of the proxying details, and we can add in column translation on the response as well.
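A rough sketch of that idea, assuming the translation can be expressed as a function over the response body: `httputil.NewSingleHostReverseProxy` handles the proxying, and its `ModifyResponse` hook rewrites the body. The `newProxy` and `demo` helpers, the JSON shape, and the byte-replacement "translation" are all made up for illustration and do not reflect Pilosa's real response format.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strconv"
)

// newProxy returns a reverse proxy to target that passes every
// response body through rewrite before sending it to the client.
func newProxy(target *url.URL, rewrite func([]byte) []byte) *httputil.ReverseProxy {
	p := httputil.NewSingleHostReverseProxy(target)
	p.ModifyResponse = func(resp *http.Response) error {
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		resp.Body.Close()
		out := rewrite(body)
		// The body length may have changed, so update both length fields.
		resp.Body = io.NopCloser(bytes.NewReader(out))
		resp.ContentLength = int64(len(out))
		resp.Header.Set("Content-Length", strconv.Itoa(len(out)))
		return nil
	}
	return p
}

// demo starts a fake backend with a proxy in front of it and
// returns the body as seen by a client of the proxy.
func demo() (string, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"columns":[1,2]}`) // hypothetical response shape
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}
	// Toy "translation": swap column IDs for string keys.
	rewrite := func(b []byte) []byte {
		return bytes.ReplaceAll(b, []byte("[1,2]"), []byte(`["A","B"]`))
	}
	front := httptest.NewServer(newProxy(target, rewrite))
	defer front.Close()

	resp, err := http.Get(front.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	fmt.Println(demo())
}
```

A real implementation would decode the JSON and look up each column ID in the translation store rather than replacing bytes.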
pilosa#919 fixed pilosa#875 which corrects the import endpoint to accept field values that are negative as well as positive.
This breaks some stuff in the PDK like the Indexer.AddValue method.
We've seen some interest in ingesting JSON data from S3. It should be pretty easy to write a pdk.Source which returns JSON objects one by one, given an S3 bucket or a list of S3 files.
The net subcommand contains an in-memory nexter implementation and a generic string mapper. taxi has separated its mapping functionality out into a library, and I think it has some kind of nexter as well. We should get all mapping functionality into the pdk library, with as uniform an interface as possible, and make sure both commands share nexter code if possible.
Need to expose options so that PDK ingestion code can configure go-pilosa to use tracing.
Additionally, the usual configuration will probably be that go-pilosa is writing to the same tracing infra as Pilosa, so it might be worth Pilosa exposing its tracing config through an endpoint which go-pilosa can consume, thereby allowing hands-off config by default... 🤔 This second part is not too relevant directly to the PDK... I think we still need the first part anyway.
While doing some multi-cloud benchmarks, I noticed that the taxi import would use a "normal" (~10GB) amount of memory for a long time, and then suddenly over the course of about 90 seconds spike up and OOM even on boxes as large as 128GB.
It did not happen at the same point in the import each time - I observed it at 115.5GB, 159.6GB, 162.4GB, and 165.3GB among others.
I saw this happen both on AWS and Azure instances, but not OCI. The OCI instances I was using did have >200GB of RAM though - I did not measure their memory usage during operation to see if it spiked up over 128GB at any point.
With the schema updates, running an import will fail unless dbs and frames are created first. This should be done transparently.
setupFrame accesses some maps in Index, but also launches some goroutines which access the same maps. I think we can pass the accessed values to the goroutines rather than having them access the maps directly.
(This is in generic-improvements)
In order to make PDK a more focused tool, we may consider moving the educational code, like examples, use cases, and tutorials, to its own repo.
@jaffee commented on Tue Feb 07 2017
Exploring implementing a use case in pilosa
Currently the ImportClient handles setting (row, column). It would be nice to be able to additionally import (row, column, timestamp).
The generic parser currently returns an error any time it fails to parse a field's value, and stops processing for that record. In reality, there are lots of innocuous reasons why a certain field might fail to parse, and it doesn't indicate that the entire record is suspect.
We should have stat/log options for notifying and counting which fields are failing, why, and how often, but we should make a best effort to parse any record and return some data.
I don't think we need to go crazy with configurability; some count stats that use the field path and encapsulate what the error is (e.g. null value, unsupported type, etc.) should be enough. pdk/ingest.go has an example of a simple stats interface that we should probably extend throughout the codebase so that it can be configured at the top level.
It's not a test, it just reads from kafka
Currently, pdk/import.go contains import functionality which relies on importing pilosa/ctl and using that importer. We should remove this code and use the stuff in pdk/pilosa.go, which uses the Go client and has nascent support for BSI field values.
The README.md file references https://github.com/alanbernstein/pilosa-notebooks/blob/master/taxi-use-case.ipynb
We should probably move this repo to the pilosa org so we aren't referencing alan's personal account.
We should have CI enabled on this repo.
Do the taxi import, then issue the following query and observe the result (edited error message for readability):
curl 10.1.110.70:10101/index/taxi/query -d'TopN(speed_mph)'
{"error":
"executing: retrieving full counts: server error 500 Internal Server Error: 'PANIC: strconv.ParseInt: parsing \"9223372036854775808\": value out of range
goroutine 25657836 [running]:
runtime/debug.Stack(0xcc4b37c000, 0x1f4, 0xc57417be90)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/pilosa/pilosa/http.(*Handler).ServeHTTP.func1(0xe26100, 0xcc4b37c000, 0xc00020c780)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:286 +0x91
panic(0xc4ef40, 0xc57417be90)
/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/pilosa/pilosa/pql.(*Query).addNumVal(0xc84ffe8010, 0xc9bdde7cf5, 0x13)
/home/ubuntu/go/src/github.com/pilosa/pilosa/pql/ast.go:150 +0x758
github.com/pilosa/pilosa/pql.(*PQL).Execute(0xc84ffe8010)
/home/ubuntu/go/src/github.com/pilosa/pilosa/pql/pql.peg.go:474 +0x2ddc
github.com/pilosa/pilosa/pql.(*parser).Parse(0xc84ffe8000, 0xc84ffe8000, 0x0, 0x0)
/home/ubuntu/go/src/github.com/pilosa/pilosa/pql/parser.go:62 +0x19f
github.com/pilosa/pilosa.(*API).Query(0xc0002aeab0, 0xe26e40, 0xc57417b8c0, 0xc138a1c740, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/ubuntu/go/src/github.com/pilosa/pilosa/api.go:110 +0x1b7
github.com/pilosa/pilosa/http.(*Handler).handlePostQuery(0xc00020c780, 0xe26100, 0xcc4b37c000, 0xcddb1d8400)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:458 +0x24b
github.com/pilosa/pilosa/http.(*Handler).handlePostQuery-fm(0xe26100, 0xcc4b37c000, 0xcddb1d8400)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:254 +0x48
net/http.HandlerFunc.ServeHTTP(0xc0002de4d0, 0xe26100, 0xcc4b37c000, 0xcddb1d8400)
/usr/local/go/src/net/http/server.go:1964 +0x44
github.com/pilosa/pilosa/http.(*Handler).extractTracing.func1(0xe26100, 0xcc4b37c000, 0xcddb1d8300)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:231 +0x153
net/http.HandlerFunc.ServeHTTP(0xd20d491fe0, 0xe26100, 0xcc4b37c000, 0xcddb1d8300)
/usr/local/go/src/net/http/server.go:1964 +0x44
github.com/pilosa/pilosa/http.(*Handler).queryArgValidator.func1(0xe26100, 0xcc4b37c000, 0xcddb1d8300)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:222 +0xcc
net/http.HandlerFunc.ServeHTTP(0xc990d08000, 0xe26100, 0xcc4b37c000, 0xcddb1d8300)
/usr/local/go/src/net/http/server.go:1964 +0x44
github.com/pilosa/pilosa/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc000234af0, 0xe26100, 0xcc4b37c000, 0xcddb1d8300)
/home/ubuntu/go/src/github.com/pilosa/pilosa/vendor/github.com/gorilla/mux/mux.go:162 +0xf1
github.com/pilosa/pilosa/vendor/github.com/gorilla/handlers.(*cors).ServeHTTP(0xc00031e2d0, 0xe26100, 0xcc4b37c000, 0xcddb1d8100)
/home/ubuntu/go/src/github.com/pilosa/pilosa/vendor/github.com/gorilla/handlers/cors.go:51 +0xa32
github.com/pilosa/pilosa/http.(*Handler).ServeHTTP(0xc00020c780, 0xe26100, 0xcc4b37c000, 0xcddb1d8100)
/home/ubuntu/go/src/github.com/pilosa/pilosa/http/handler.go:294 +0xde
net/http.serverHandler.ServeHTTP(0xc00020b380, 0xe26100, 0xcc4b37c000, 0xcddb1d8100)
/usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc000214000, 0xe26d80, 0xc82f1b3540)
/usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2851 +0x2f5
'"}
Add in support for importing bits with time quantums to Indexer
The Go client supports importing using different strategies. It may be useful to have an option to specify the strategy.
Currently we have a number of PDK commands for doing ingest, each of which tends to be based around a Source. Each one is pretty much the same, with the exception of the Source and the options on the Source. We should try to unify them somehow so that they can share logic for setting up the parser, mapper, Pilosa, signal handling, etc.
We're seeing slightly different results in Pilosa between imports of the taxi data. Errors have been observed while downloading the CSV files from S3, e.g. 2017/07/06 02:54:19 scan error on https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2009-10.csv, err: unexpected EOF
In order to make imports repeatable and compare results between different versions of Pilosa, we should ensure that each CSV is downloaded completely before beginning to import it.
Here is the error message when I do go get github.com/pilosa/pdk. I am running go version go1.9.1 darwin/amd64.
../pilosa/holder.go:443:18: multiple-value uuid.NewV4() in single-value context
When providing a gopilosa.ImportStatusUpdate to pdk.SetupPilosa() via:
statusChan := make(chan gopilosa.ImportStatusUpdate, 1000)
indexer, err := pdk.SetupPilosa(m.PilosaHosts, m.Index, frames,
gopilosa.OptImportStatusChannel(statusChan))
and when multiple frames are passed in to frames, then two problems occur:
I'm still not sure if it's best to handle this in PDK or go-pilosa, but it seems we need a way to provide multiple channels (one per frame) for status updates. An alternative would be to modify go-pilosa such that the single status channel is aware of index/frame, and therefore supports multiple frames.
This needs more investigation.