We need to support bulk operations in the native client with the same Clojure API we<b

Thanks for a really comprehensive deion of the task. I'm going to give it a try

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Support bulk operations in the native client about elastisch HOT 19 CLOSED

clojurewerkz commented on June 26, 2024

Support bulk operations in the native client

from elastisch.

Comments (19)

michaelklishin commented on June 26, 2024

Desired API

We should try hard to support the exact API our REST client has. It has two key benefits:

Switching between two clients will be transparent
The Bulk API is notoriously hard to get right in languages such as Clojure, that use core data structures (collections, maps) instead of designated classes.

The REST API already has a decent way of representing bulk requests in JSON, which maps ideally to Clojure
data structures. Lets just use that.

Implementation Details

This is for @dsabanin, who's expressed interest in implementing this feature.

Every operation in the native client has a few functions beyond what is exposed in the public API. Namely,
clojurewerkz.elastisch.native includes functions that mirror org.elasticsearch.client.Client

Specifically for this feature, we need a function that will delegate to Client#bulk.

Then there are conversion functions to bulk request and from bulk response in clojurewerkz.elastisch.native.conversion. Finally, there is a function (or several) that will be our public API for bulk requests. It glues everything together.

Since BulkRequest in the ElasticSearch Java client is actually a collection, we'll have to implement conversion
from Bulk API maps to specific request classes. Largely this is what the functions in clojurewerkz.elastisch.native.conversion already do but they don't take just a single map as inputs. One
good way around this is to add a polymorphic function that takes a map and builds the corresponding request
action from it, using the rest of the conversion functions as needed. Sounds like a job for a multimethod.

The public API should be in clojurewerkz.elastisch.native.bulk, mirroring clojurewerkz.elastisch.rest.bulk.

Because conversion functions are annoying to develop, I highly recommend doing it all in the REPL first, then
adding a few tests for the public API.

ElasticSearch Version

Elastisch 1.1's native client targets ElasticSearch 0.90.0-beta1. Because binary format used by the client and
ES nodes changes over time, using mismatching client/server versions will result in obscure I/O exceptions
(typically saying that a request couldn't be read fully).

from elastisch.

dsabanin commented on June 26, 2024

Thanks for a really comprehensive description of the task. I'm going to give it a try this weekend.

from elastisch.

dsabanin commented on June 26, 2024

It looks like I won't get a chance to work on it. Our initial requirements changed and we moved in another direction, ended up not using native bulk API. Sorry about wasting your time on the instruction, although I'm sure some future implementor will find them very useful.

from elastisch.

michaelklishin commented on June 26, 2024

That's ok. I'd be happy to hear what you ended up doing as an alternative (if it involves Elastisch at all).

from elastisch.

dsabanin commented on June 26, 2024

The problem I was trying to solve is initial indexing of a big amount of data from the DB. This was a one time job so I decided to go with a C program fetching data from DB and generating a limited subset of JSON and then importing it to ES with curl through bulk API.

We are still using Elastisch for real-time indexing -- it's a great piece of software. Too bad I couldn't contribute back.

from elastisch.

mitchelkuijpers commented on June 26, 2024

Just wanted to let you know I started working on this, and I am making some good progress :) Will try to have something ready by the end of this weekend.

from elastisch.

michaelklishin commented on June 26, 2024

@mitchelkuijpers sounds good, thank you.

from elastisch.

mitchelkuijpers commented on June 26, 2024

I have a first version working i currently give this structure back after a bulk operation:

{:took 8,
 :has-failures? false,
 :items
   [{:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-36",
     :version 1,
     :op-type "create",
     :failed? false}
    {:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-37",
     :version 1,
     :op-type "create",
     :failed? false}]}

I case of failure the failed? property will be true and a failure-message property will be added. I notice the rest api gives properties back like :_index and :_type. Do you think I should add those @michaelklishin ?

from elastisch.

michaelklishin commented on June 26, 2024

@mitchelkuijpers yes, as aliases for :index and:type`, of course. We do this in other places. Better be compatible than sorry :)

from elastisch.

mitchelkuijpers commented on June 26, 2024

@michaelklishin Added, and also :_index and :_version.

The only thing that I think we should not implement is: bulk-with-index and bulk-with-index-and-type. This is kind of hairy to implement and this would mean I cannot reuse the ->delete-request and the index->request functions from conversion.clj. What are your thoughts on this?

Another idea could be to reuse the bulk-index and the bulk-delete functions from rest/bulk and just parse that structure when users call bulk or the other two functions.

from elastisch.

mitchelkuijpers commented on June 26, 2024

After thinking about it some more I am leaning towards the option of reusing the bulk-index and bulk-delete functions from rest/bulk this way you can just switch the native and rest bulk, bulk-with-index and bulk-with-index-and-type calls without changing anything :)

from elastisch.

michaelklishin commented on June 26, 2024

That's fine. I find those functions a bit odd in the REST API. 100% compatibility is not the goal. If we can keep it at 95%, that's already huge for our users.

from elastisch.

michaelklishin commented on June 26, 2024

@mitchelkuijpers so, native would delegate to REST? This should be made very clear in the docs because the two use different ports and some installations may be locked down to only one protocol at the firewall.

from elastisch.

mitchelkuijpers commented on June 26, 2024

@michaelklishin No Native would only re-use these functions from rest:

(def ^:private special-operation-keys
  [:_index :_type :_id :_retry_on_conflict :_routing :_percolate :_parent :_timestamp :_ttl])

(defn index-operation
  [doc]
  {"index" (select-keys doc special-operation-keys)})

(defn delete-operation
  [doc]
  {"delete" (select-keys doc special-operation-keys)})

(defn bulk-index
  "generates the content for a bulk insert operation"
  ([documents]
     (let [operations (map index-operation documents)
           documents  (map #(apply dissoc % special-operation-keys) documents)]
       (interleave operations documents))))

(defn bulk-delete
  "generates the content for a bulk delete operation"
  ([documents]
     (let [operations (map delete-operation documents)]
       operations)))

And then if you call bulk or the other two bulk functions (those are easy to implement this way) this will be transformed to either and IndexRequest or an DeleteRequest and we can simply call the native api :)

from elastisch.

michaelklishin commented on June 26, 2024

@mitchelkuijpers ah, perfect. Feel free to extract them into a separate internal namespace, e.g. elastisch.common.bulk.

from elastisch.

mitchelkuijpers commented on June 26, 2024

@michaelklishin 👍

from elastisch.

michaelklishin commented on June 26, 2024

@mitchelkuijpers any other way I can help you? I'd be very excited to see this feature added :)

from elastisch.

mitchelkuijpers commented on June 26, 2024

I'll try to have a pull request ready tomorrow, had a busy week ^^

from elastisch.

michaelklishin commented on June 26, 2024

Contributed by @mitchelkuijpers in #144.

from elastisch.

Support bulk operations in the native client about elastisch HOT 19 CLOSED

Comments (19)

Desired API

Implementation Details

ElasticSearch Version

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent