
blast's People

Contributors

mosuka, pablocastellano, radutopala

blast's Issues

Make the API easier to work with

Have you used https://github.com/googleapis/gnostic?

It would make a lot of the code you wrote redundant and make the API much easier to extend.
It would also mean far fewer bugs.

To explain:
From gRPC it can generate all of your OpenAPI-based REST.
OpenAPI is the main way to do REST. Swagger is dead.

It can also do the opposite amazingly well: OpenAPI to gRPC.
But I think using gRPC as the source of truth is best for Blast.

Have a think about it.
I can help if you're interested...

How to search documents with id prefix?

I have two kinds of documents, A and B, which I index in bulk with different document ID prefixes, basically like this:

{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/10"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/11"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/20"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/21"}

Now I only want to search documents with the ID prefix "path/A/". How can I do this?
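One possible workaround, sketched in Go, is to copy the ID prefix into a regular field at indexing time and filter on that field. This is only a sketch under assumptions: the field name "id_prefix" is hypothetical, and it assumes the field is mapped with Bleve's keyword analyzer so that a term query such as {"term": "path/A/", "field": "id_prefix"} can match the value exactly. Nothing in this thread confirms how Blast itself treats document IDs in queries.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Doc mirrors the bulk-index lines shown in the question.
type Doc struct {
	ID     string                 `json:"id"`
	Fields map[string]interface{} `json:"fields"`
}

// AddIDPrefix parses one bulk line and copies everything up to and including
// the last "/" of the ID into a hypothetical "id_prefix" field.
func AddIDPrefix(line string) (Doc, error) {
	var d Doc
	if err := json.Unmarshal([]byte(line), &d); err != nil {
		return Doc{}, err
	}
	if d.Fields == nil {
		d.Fields = map[string]interface{}{}
	}
	if i := strings.LastIndex(d.ID, "/"); i >= 0 {
		d.Fields["id_prefix"] = d.ID[:i+1]
	}
	return d, nil
}

func main() {
	lines := []string{
		`{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/10"}`,
		`{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/20"}`,
	}
	for _, l := range lines {
		d, err := AddIDPrefix(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		out, _ := json.Marshal(d)
		fmt.Println(string(out))
	}
}
```

The documents are then indexed as before; only the extra field is new.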

Indexing content / faceted search

Hey,

Hope you are all well!

I want to extend the following Go project, https://github.com/hoop33/limo, which also uses Bleve, with Blast. I have already forked and updated limo to automatically fetch and add repo topics, in order to pre-fill limo's tags index.

I would also like to create a web UI with some facets to explore my starred repositories. After a couple of searches, I found your project, which wraps the Bleve package.

Can you provide an example, or more explanation, of how to index content?

Questions:

Thanks in advance.

Cheers,
Richard

segfault when port 8080 is already bound

Hello,

Thanks again for your last bugfix. Here is another issue:

./bin/blast-index start --node-id=index1 --data-dir=/tmp/blast/index1 --bind-addr=:6060 --grpc-addr=:5050 --http-addr=:8080 --index-mapping-file ./example/index_mapping.json
2019/03/14 16:26:27.068886 github.com/mosuka/blast/index/server.go:131 [ERR] listen tcp :8080: bind: address already in use
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xb51108]

goroutine 39 [running]:
github.com/mosuka/blast/index.(*Server).Start(0x0)
/home/rbastic/go-src/src/github.com/mosuka/blast/index/server.go:147 +0x68
created by main.execStart
/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blast-index/start.go:78 +0x802
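The trace shows Start being called on a nil *Server after the listen error was logged. A common Go pattern that avoids this class of crash is to return the error from the constructor and stop before using the server. The sketch below illustrates that pattern only; the Server type and NewServer function are hypothetical and are not Blast's actual code.

```go
package main

import (
	"fmt"
	"net"
)

// Server is a hypothetical stand-in for a server that owns a listener.
type Server struct {
	listener net.Listener
}

// NewServer returns an error, and no server, when the address cannot be bound.
func NewServer(addr string) (*Server, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("listen on %s: %w", addr, err)
	}
	return &Server{listener: ln}, nil
}

// Addr reports the address the server is bound to.
func (s *Server) Addr() string { return s.listener.Addr().String() }

// Close releases the listener.
func (s *Server) Close() error { return s.listener.Close() }

func main() {
	first, err := NewServer("127.0.0.1:0")
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	defer first.Close()

	// Binding the same address again fails. The caller sees the error and
	// aborts startup instead of calling methods on a nil *Server.
	if _, err := NewServer(first.Addr()); err != nil {
		fmt.Println("startup aborted:", err)
	}
}
```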

mac build not up to date

When doing the full Mac build, I think it complains because the framework files were updated in the latest macOS.
I had exactly the same problem using OpenGL from Go.

It does build and does run.

The source of the bug is here: golang/go#26073

x-MacBook-Pro:blast apple$ make build
## mac with all ext
cd /Users/apple/workspace/go/src/github.com/mosuka/blast &&  GOOS=darwin \
    CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd" \
    CGO_CFLAGS="-I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include" \
    CGO_ENABLED=1 \
    BUILD_TAGS="full" \
    make build
>> building binaries
   VERSION     = 0.4.0
   GOOS        = darwin
   GOARCH      = amd64
   CGO_ENABLED = 1
   CGO_CFLAGS  = -I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include
   CGO_LDFLAGS = -L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd
   BUILD_TAGS  = full
./cmd/blast
# crypto/x509
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.
./cmd/blastd
# github.com/mosuka/blast/cmd/blastd
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.

A usable web client for Blast, like Kibana for Elasticsearch?

To make Blast easier to use, a web client for accessing it seems like a good approach.

Is there a usable web client for Blast, like Kibana for Elasticsearch?

Or is there an open source project for quickly building a web client for querying?

I have also created a simple Go web app for term queries, and I want it to support more query types such as numeric range, prefix, date and so on.

Do you have any suggestions? Thanks.

REST structure

I want to load data into Blast using my own Go RESTful client code.

In your example

$ cat ./example/document_1.json | xargs -0 ./bin/blast put document --request

I see we are doing a PUT to 0.0.0.0:8000/rest/ID where ID is the ID of the document.

However, how is the JSON document sent in that PUT command? Is it sent as the body only, or is there some association that needs to take place? I have used the HTTPie package with

http -v -j PUT 0.0.0.0:8000/rest/1 @sodataset.json

but it comes back with 400 Bad Request. With HTTPie or curl, what would a PUT command look like?

I'm trying to load 50K JSON-LD documents (schema.org) from Minio into Blast.

Thanks!
Doug

Management of federated cluster

What's the suggested plan for management of the federated cluster?

I would try to be DRY and use the standard tracing, logging and metrics solutions that already exist in the Go ecosystem.
This lets a dev bring Blast up and get insight without learning new tools.

I am wondering if we should build a web-based dashboard that lets a developer (rather than an SRE) play with Blast, and that also provides a single management pane separate from the normal SRE stuff.

key issue

trying to get started with blast...

tried the command and got
"key path not found"

fils@xps:~/src/git/blast/bin$ cat ../doc1.json  | xargs -0 ./blast put document --request
Error: Key path not found
Usage:
  blast put document [flags]

Flags:
      --grpc-server-address string   Blast server to connect to using gRPC (default "0.0.0.0:5000")
      --dial-timeout int             dial timeout (default 5000)
      --request-timeout int          request timeout (default 5000)
      --id string                    document id
      --fields string                document fields
      --request string               request file
  -h, --help                         help for document

Global Flags:
      --output-format string   output format (default "json")
  -v, --version                show version number

with document

{
	"document": {
		"id": "1",
		"fields": {
			"name": "Bleve",
			"description": "Bleve is a full-text search and indexing library for Go.",
			"category": "Library",
			"popularity": 3.0,
			"release": "2014-04-18T00:00:00Z",
			"type": "document"
		}
	}
}

any ideas what I am doing wrong?

ignore fields for index

Hi,

In an attempt to reduce the index size, I preprocessed the data. However, when I use the API, I want to get the human-readable data back so I can show it to the user. Is there a way to exclude fields when building an index? I tried to use "x" for the preprocessed field and "_x" for the original text. Unfortunately, this increased the index size by a lot, so I believe the field starting with "_" was not excluded. Is there a way to do this? My only other idea is to build a wrapper API that stores the text under an ID in a dictionary and then returns it. But that seems like something that should already be supported.

guidance on generating index mapping?

Thanks for the previous help... I've come back to playing with this and have a question or two if you have time.

  1. I've generated a simple Dockerfile and docker-ized blast https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast

  2. I've not included any of the config files in this container, though I don't know yet what the defaults are in the Go code (I did go through it). I'm using Bleve as well in my code at https://github.com/earthcubearchitecture-project418/gleaner but not with the sophistication you are. :)

  3. I've been playing with loading schema.org JSON-LD (type Dataset) into Blast and trying to search (docs at https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples ) where sodataset.json is a JSON-LD doc wrapper with the

{
    "id": "1",
    "fields": {

I think they need ??

My question is this:

If one wanted to leverage Blast for other JSON documents, what are the basic steps needed?
I was curious why my test failed since I thought that Bleve instance in Blast would simply use the

indexMapping := bleve.NewIndexMapping()

as the default and give me a simple default index of the JSON structure. My plan was to build out a more focused mapping from there. However, even that doesn't seem to work: when I load the document and search for exact matches of known words in it, I get nothing. I am wondering if it is trying to force my document into a mapping that it does not fit, resulting in no search results.

In the process

cat example/sodataset.json | xargs -0 ./bin/blast put document --request

./bin/blast get document --id 1

cat search_requestv2.json | xargs -0 ~/src/git/blast/bin/blast search --request > ../searchoutput.json

The first two work fine: I can load and retrieve the document. However, I am not able to structure a valid search with either of the search request documents inside https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples

Any guidance appreciated!

Thanks
Doug

Consensus protocol implementation?

I know from the README that the cluster is built on the Raft consensus algorithm.

But when I tried cluster mode and killed the leader node, re-election didn't happen.

Does Blast support leader re-election?

Also, for consensus, write operations (indexing, PUT) should only happen on the leader node. When I sent an HTTP indexing request to a follower node after the leader had been killed, it still worked, so I am a little confused. Can write operations work on followers?

If write operations can work on followers, then when different writes happen at the same time on different nodes, consensus and ordering may not be guaranteed.

re-index / update an individual document

Sorry if this has been addressed, but I could not find it anywhere. It is not an issue but rather a feature request. Say I have a collection of documents in MongoDB, indexed using Blast. Now I update an individual document in the MongoDB database. How can I reindex this document with Blast?

Is there any way I can retrieve the indexed ID of this document, so that I can delete it and create a new index entry? Or is there a better solution?

More Query String Query Examples

Currently, a few query examples (simple and prefix) are given, but they don't seem to be enough for a quick start with more complicated query logic.

Query String Query supports all kinds of queries, such as phrases, field scoping, boolean queries, numeric ranges, fuzzy search and so on.

If more examples were given, it would be friendlier to beginners.

Error getting document by ID over REST API

I get an error when issuing a very simple query over the REST endpoint:

flaviostutz-Mac:fess flaviostutz$ curl -vv --location --request GET 'localhost:6000/v1/documents/a2b'
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 6000 (#0)
> GET /v1/documents/a2b HTTP/1.1
> Host: localhost:6000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Grpc-Metadata-Content-Type: application/grpc
< Grpc-Metadata-Content-Type: application/grpc
< Date: Fri, 17 Jan 2020 21:27:59 GMT
< Content-Length: 130
<
* Connection #0 to host localhost left intact
{"error":"unknown message type \"map[string]interface {}\"","code":2,"message":"unknown message type \"map[string]interface {}\""}* Closing connection 0

Seems like a Go type cast issue. I could not find where this is handled in the code. Could anyone help?

Custom header?

Hi,

I need to add a custom header for environment selection behind a load balancer. I noticed that I can’t append to the outgoing context of the grpc client, since it’s private.

I am using tag 0.7.1. Is adding custom headers implemented in higher versions?

Thanks

Possible to build a Blind index using blast?

We have security requirements that require the use of encryption at rest. We would like to be able to build an index of the unencrypted plain text, and then ship the index and the encrypted data files, without leaking the plain text. Is this something that is possible with blast/bleve?

Mongodb example

Looks like this project could be a pretty good alternative to Elasticsearch.
I am considering it as an alternative to Elasticsearch, which is a very memory-hungry search engine. Can it be an alternative for a medium-scale or large-scale project?
The problem is that my main DB is MongoDB. Should I extract JSON from MongoDB periodically and send it to Blast to build indexes?
What is the best option for my situation?
I need an example MongoDB connector that communicates with Blast via gRPC to build the index in real time, like Elasticsearch does.

One more question: is it a good idea to interact with the Blast server from end-user clients?
I want to let users search and filter items directly in the browser. How about grpc-web? (I know the grpc-web project is immature.) What about HTTP/2 + JSON (REST)?

Feature idea: KVS pub sub

The KVS is a generic bucket store.
I was thinking of extending it to publish changes over gRPC.

Use case?
When you build clients using this system, it's very useful to know when anything has changed and the nature of the change. Then, as a subscriber, I can update my many microservices or even a GUI client using this. It basically keeps them all in sync.

The event would be like:

  • namespace: the bucket namespace
  • eventtype: Create/Read/Update/Delete
  • data: protobuf or json.

It should also be possible to turn off the eventtype Read, because no one normally needs that, but it can be useful for dynamically knowing who is reading where.
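The event shape and the Read switch described above could be sketched in Go roughly as follows. This is an in-process illustration with channels only. The Broker type and its methods are hypothetical, and a real version would deliver events over a gRPC stream rather than local channels.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType is the kind of change that happened to a bucket entry.
type EventType int

const (
	Create EventType = iota
	Read
	Update
	Delete
)

func (t EventType) String() string {
	return [...]string{"Create", "Read", "Update", "Delete"}[t]
}

// Event carries the three attributes proposed above.
type Event struct {
	Namespace string    // the bucket namespace
	Type      EventType // Create/Read/Update/Delete
	Data      []byte    // protobuf or JSON payload
}

// Broker fans events out to subscribers. Read events are dropped unless
// publishReads is enabled.
type Broker struct {
	mu           sync.RWMutex
	subs         []chan Event
	publishReads bool
}

func NewBroker(publishReads bool) *Broker {
	return &Broker{publishReads: publishReads}
}

// Subscribe registers a buffered channel and returns its receive side.
func (b *Broker) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers the event to every subscriber without blocking; a
// subscriber whose buffer is full misses the event.
func (b *Broker) Publish(e Event) {
	if e.Type == Read && !b.publishReads {
		return
	}
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs {
		select {
		case ch <- e:
		default:
		}
	}
}

func main() {
	b := NewBroker(false) // Read events turned off
	events := b.Subscribe(8)
	b.Publish(Event{Namespace: "users", Type: Create, Data: []byte(`{"id":1}`)})
	b.Publish(Event{Namespace: "users", Type: Read, Data: []byte(`{"id":1}`)})
	b.Publish(Event{Namespace: "users", Type: Update, Data: []byte(`{"id":1}`)})
	for len(events) > 0 {
		e := <-events
		fmt.Printf("%s %s %s\n", e.Namespace, e.Type, e.Data)
	}
}
```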

Implementation:
gRPC middleware might be the perfect fit!
https://github.com/grpc-ecosystem/go-grpc-middleware
It is also great for adding other things like security.

Because it's gRPC, it will be easy to then receive the events and put them onto a NATS message broker later, as a second piece of work.

Index.
I also thought about the index, but I think it's not worth the effort. What you can do is make each index query write its output to a cache in the KVS store; then you can use pub/sub from that. It is almost like a materialized view with pub/sub on top of it. It also gives you caching for free, to a degree.

Failing docker containers with "No help topic for 'blastd'"

summary

The Docker containers seem to be broken. Instead of starting up properly, they fail and exit.

steps to reproduce

docker pull mosuka/blast:latest
docker-compose up

expected result

A cluster of blast should be running and available over the network.

actual result

The terminal output is "No help topic for 'blastd'" for every node, and the nodes are restarted. They exit with "blast_blast1_1 exited with code 3".

Elasticsearch backward compatibility

Is there any chance of Elasticsearch backward compatibility?
The major problem with Elasticsearch is its system requirements. If Blast had the same API as Elasticsearch, many developers would replace it.
32gb for one node 😢

GEO searches

Is it possible to use Blast for geo-localized searches? It is an important feature in many projects nowadays.
If yes, would it be possible to include an example at some point (mapping and search request)?

I have put a very basic example of geo search with Bleve here: https://github.com/hubyhuby/bleve-search-example/blob/444d18d810064302fef76693f86f07c533552897/main.go#L138

There are two main searches I see:

  • Geo box search around a point
  • Geo sort search by distance to a point

I understand it is a WIP, but I am really looking forward to this project.
Thanks for this nice project !

Possible to run as embedded service?

Is it possible to run blast as an embedded service into an existing application cluster? That is, if there is already a service running 2+ instances and it wants to add indexing, could it run a blast cluster using the existing service nodes? I'm thinking of this like the way Nats.io server can be embedded instead of having to run standalone gnatsd processes.

basic example

Any chance you could make a Go example that uses the gRPC API and the test data in the "examples" directory?

It would make it much easier to get going and to help fix things too.

Search with prefix

Hi,

I was wondering if I’m doing anything wrong, but other than doing a match query I can’t seem to perform a prefix query at all. According to the bleve documentation I should be able to do that by using “prefix”: “searchterm”. However it doesn’t yield any results. Do you have an example search query for prefix?

Thanks!

segfault on ubuntu

Hello,

I ran a fresh build off the latest master and then a segfault happened when I tried to startup blast.

rbastic@asgard:~/go-src/src/github.com/mosuka/blast$ ./bin/blastd data \
>     --raft-addr=127.0.0.1:10000 \
>     --grpc-addr=127.0.0.1:10001 \
>     --http-addr=127.0.0.1:10002 \
>     --raft-node-id=node1 \
>     --raft-dir=/tmp/blast/node1/raft \
>     --store-dir=/tmp/blast/node1/store \
>     --index-dir=/tmp/blast/node1/index \
>     --index-mapping-file=./etc/index_mapping.json

    ____   __              __ 
   / __ ) / /____ _ _____ / /_
  / __ \ / // __ '// ___// __/  The lightweight distributed
 / /_/ // // /_/ /(__  )/ /_    indexing and search server.
/_.___//_/ \__,_//____/ \__/    version 0.4.0

2019/03/14 02:04:44.592113 github.com/mosuka/blast/node/data/service/service.go:104 [ERR] no analyzer with name or type 'ja' registered
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9b2706]

goroutine 1 [running]:
github.com/mosuka/blast/index.(*Index).Close(0x0, 0xd71c18, 0xc0001f96c8)
	/home/rbastic/go-src/src/github.com/mosuka/blast/index/index.go:175 +0x26
github.com/mosuka/blast/node/data/service.(*Service).Stop(0xc0000a70e0, 0x1, 0xc0001c1640)
	/home/rbastic/go-src/src/github.com/mosuka/blast/node/data/service/service.go:166 +0x33
main.data(0xc0000b14a0, 0xe56c20, 0xc0003a01c0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/data.go:135 +0x1559
github.com/mosuka/blast/vendor/github.com/urfave/cli.HandleAction(0xc011c0, 0xd73868, 0xc0000b14a0, 0xc0000a6f00, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:490 +0xc8
github.com/mosuka/blast/vendor/github.com/urfave/cli.Command.Run(0xd3d58b, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd47611, 0x11, 0x0, ...)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/command.go:210 +0x92f
github.com/mosuka/blast/vendor/github.com/urfave/cli.(*App).Run(0xc0000adba0, 0xc0000aa000, 0xa, 0xa, 0x0, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:255 +0x69b
main.main()
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/main.go:51 +0x2b2

Looks like a very exciting project, curious to know if I might be building it wrong somehow (Ubuntu 18.10)
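In the trace, Close is called on a nil *Index from Service.Stop after index creation failed with the analyzer error. A defensive Go pattern for this is a nil-safe Close, so that cleanup after a failed start cannot panic. The types below are hypothetical stand-ins that illustrate the pattern, not Blast's actual code.

```go
package main

import "fmt"

// Index is a hypothetical stand-in for an index handle.
type Index struct {
	closed bool
}

// OpenIndex fails for analyzers that are not registered in this build.
func OpenIndex(analyzer string) (*Index, error) {
	if analyzer != "standard" {
		return nil, fmt.Errorf("no analyzer with name or type '%s' registered", analyzer)
	}
	return &Index{}, nil
}

// Close is safe to call on a nil receiver.
func (i *Index) Close() error {
	if i == nil {
		return nil
	}
	i.closed = true
	return nil
}

// Service owns an index that may never have been opened.
type Service struct {
	index *Index
}

// Start opens the index and leaves the service untouched on failure.
func (s *Service) Start(analyzer string) error {
	idx, err := OpenIndex(analyzer)
	if err != nil {
		return err
	}
	s.index = idx
	return nil
}

// Stop closes the index if there is one.
func (s *Service) Stop() error {
	return s.index.Close()
}

func main() {
	svc := &Service{}
	if err := svc.Start("ja"); err != nil {
		fmt.Println("start failed:", err)
	}
	// Cleanup no longer dereferences a nil index.
	fmt.Println("stop:", svc.Stop())
}
```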

docker no http beyond root

I get 404 for any query except /

curl -X GET http://localhost:10002/
{
  "status": 200,
  "version": "v0.8.1"
}

failure

$ curl -X GET http://localhost:10002/v1/liveness_check 
404 page not found
  blast:
     image: mosuka/blast:v0.8.1
     ports:
       - 10000:10000
       - 10001:10001
       - 10002:10002
     ulimits:
       nofile:
         soft: "65536"
         hard: "65536"
#     env_file:
#       - ./example.env
     environment:
       - SERVICE_PORTS=10000, 10001, 10002
     volumes:
       - "/data/volumes/blast:/data"
#     networks:
#       - web
# 0.3.0
#     command: ["start", "--bind-addr=:10000", "--grpc-addr=:10001", "--http-addr=:10002", "--node-id=node1"]
# v0.8.1
     command: ["blast", "indexer","start","--data-dir=/data", "--node-address=:10000", "--grpc-address=:10001", "--http-address=:10002", "--node-id=node1"]
