Comments (19)
Desired API
We should try hard to support the exact API our REST client has. It has two key benefits:
- Switching between two clients will be transparent
- The Bulk API is notoriously hard to get right in languages such as Clojure, that use core data structures (collections, maps) instead of designated classes.
The REST API already has a decent way of representing bulk requests in JSON, which maps ideally to Clojure
data structures. Lets just use that.
Implementation Details
This is for @dsabanin, who's expressed interest in implementing this feature.
Every operation in the native client has a few functions beyond what is exposed in the public API. Namely,
clojurewerkz.elastisch.native includes functions that mirror org.elasticsearch.client.Client
Specifically for this feature, we need a function that will delegate to Client#bulk.
Then there are conversion functions to bulk request and from bulk response in clojurewerkz.elastisch.native.conversion. Finally, there is a function (or several) that will be our public API for bulk requests. It glues everything together.
Since BulkRequest
in the ElasticSearch Java client is actually a collection, we'll have to implement conversion
from Bulk API maps to specific request classes. Largely this is what the functions in clojurewerkz.elastisch.native.conversion
already do but they don't take just a single map as inputs. One
good way around this is to add a polymorphic function that takes a map and builds the corresponding request
action from it, using the rest of the conversion functions as needed. Sounds like a job for a multimethod.
The public API should be in clojurewerkz.elastisch.native.bulk
, mirroring clojurewerkz.elastisch.rest.bulk
.
Because conversion functions are annoying to develop, I highly recommend doing it all in the REPL first, then
adding a few tests for the public API.
ElasticSearch Version
Elastisch 1.1's native client targets ElasticSearch 0.90.0-beta1. Because binary format used by the client and
ES nodes changes over time, using mismatching client/server versions will result in obscure I/O exceptions
(typically saying that a request couldn't be read fully).
from elastisch.
Thanks for a really comprehensive description of the task. I'm going to give it a try this weekend.
from elastisch.
It looks like I won't get a chance to work on it. Our initial requirements changed and we moved in another direction, ended up not using native bulk API. Sorry about wasting your time on the instruction, although I'm sure some future implementor will find them very useful.
from elastisch.
That's ok. I'd be happy to hear what you ended up doing as an alternative (if it involves Elastisch at all).
from elastisch.
The problem I was trying to solve is initial indexing of a big amount of data from the DB. This was a one time job so I decided to go with a C program fetching data from DB and generating a limited subset of JSON and then importing it to ES with curl through bulk API.
We are still using Elastisch for real-time indexing -- it's a great piece of software. Too bad I couldn't contribute back.
from elastisch.
Just wanted to let you know I started working on this, and I am making some good progress :) Will try to have something ready by the end of this weekend.
from elastisch.
@mitchelkuijpers sounds good, thank you.
from elastisch.
I have a first version working i currently give this structure back after a bulk operation:
{:took 8,
:has-failures? false,
:items
[{:index "people",
:type "person",
:id "AUuxP_sQUECUznwbZ-36",
:version 1,
:op-type "create",
:failed? false}
{:index "people",
:type "person",
:id "AUuxP_sQUECUznwbZ-37",
:version 1,
:op-type "create",
:failed? false}]}
I case of failure the failed? property will be true and a failure-message property will be added. I notice the rest api gives properties back like :_index and :_type. Do you think I should add those @michaelklishin ?
from elastisch.
@mitchelkuijpers yes, as aliases for :index and
:type`, of course. We do this in other places. Better be compatible than sorry :)
from elastisch.
@michaelklishin Added, and also :_index
and :_version
.
The only thing that I think we should not implement is: bulk-with-index
and bulk-with-index-and-type
. This is kind of hairy to implement and this would mean I cannot reuse the ->delete-request
and the index->request
functions from conversion.clj
. What are your thoughts on this?
Another idea could be to reuse the bulk-index
and the bulk-delete
functions from rest/bulk
and just parse that structure when users call bulk
or the other two functions.
from elastisch.
After thinking about it some more I am leaning towards the option of reusing the bulk-index and bulk-delete functions from rest/bulk
this way you can just switch the native and rest bulk
, bulk-with-index
and bulk-with-index-and-type
calls without changing anything :)
from elastisch.
That's fine. I find those functions a bit odd in the REST API. 100% compatibility is not the goal. If we can keep it at 95%, that's already huge for our users.
from elastisch.
@mitchelkuijpers so, native would delegate to REST? This should be made very clear in the docs because the two use different ports and some installations may be locked down to only one protocol at the firewall.
from elastisch.
@michaelklishin No Native would only re-use these functions from rest:
(def ^:private special-operation-keys
[:_index :_type :_id :_retry_on_conflict :_routing :_percolate :_parent :_timestamp :_ttl])
(defn index-operation
[doc]
{"index" (select-keys doc special-operation-keys)})
(defn delete-operation
[doc]
{"delete" (select-keys doc special-operation-keys)})
(defn bulk-index
"generates the content for a bulk insert operation"
([documents]
(let [operations (map index-operation documents)
documents (map #(apply dissoc % special-operation-keys) documents)]
(interleave operations documents))))
(defn bulk-delete
"generates the content for a bulk delete operation"
([documents]
(let [operations (map delete-operation documents)]
operations)))
And then if you call bulk or the other two bulk functions (those are easy to implement this way) this will be transformed to either and IndexRequest or an DeleteRequest and we can simply call the native api :)
from elastisch.
@mitchelkuijpers ah, perfect. Feel free to extract them into a separate internal namespace, e.g. elastisch.common.bulk
.
from elastisch.
from elastisch.
@mitchelkuijpers any other way I can help you? I'd be very excited to see this feature added :)
from elastisch.
I'll try to have a pull request ready tomorrow, had a busy week ^^
from elastisch.
Contributed by @mitchelkuijpers in #144.
from elastisch.
Related Issues (20)
- Enable the use of Elastic Shield HOT 9
- _version_type ignored in native bulk HOT 2
- Native get-mapping ignores index name in 2-arity version, returns all index mappings HOT 2
- doc link is dead HOT 1
- Node info endpoint HOT 4
- HTTP API nodes stats endpoint - unable to select specific groups of stats
- Support Elasticsearch 6.x (and 5.x to the extent it makes sense) HOT 17
- Delete by query is not working with HTTP (wrong url?) HOT 3
- Index creation failure in Elasticsearch 5.X HOT 1
- Async version of the REST API HOT 5
- Fix Delete By Query for Elasticsearch 5 HOT 4
- rest.index/refresh fails for Elasticsearch 5 HOT 5
- How can i change port number at which the queries are sent? HOT 1
- Update doc string for search HOT 1
- Cannot create a new index HOT 1
- HTTP client should be configured to use {:content-type :json} by default HOT 1
- Support clojure 1.10.0 HOT 6
- 3.0.1 release date? HOT 3
- Elasticsearch 7.x support? HOT 1
- Unable to create index
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from elastisch.