michaelklishin commented on June 26, 2024

Desired API

We should try hard to support the exact API our REST client has. It has two key benefits:

  • Switching between two clients will be transparent
  • The Bulk API is notoriously hard to get right in languages such as Clojure that use core data structures (collections, maps) instead of designated classes.

The REST API already has a decent way of representing bulk requests in JSON, which maps naturally onto Clojure data structures. Let's just use that.

Implementation Details

This is for @dsabanin, who's expressed interest in implementing this feature.

Every operation in the native client has a few functions beyond what is exposed in the public API. In particular, clojurewerkz.elastisch.native includes functions that mirror the methods of org.elasticsearch.client.Client.

Specifically for this feature, we need a function that will delegate to Client#bulk.

Then there are conversion functions to bulk request and from bulk response in clojurewerkz.elastisch.native.conversion. Finally, there is a function (or several) that will be our public API for bulk requests. It glues everything together.

Since BulkRequest in the ElasticSearch Java client is actually a collection of requests, we'll have to implement conversion from Bulk API maps to specific request classes. Largely this is what the functions in clojurewerkz.elastisch.native.conversion already do, but they don't take a single map as input. One good way around this is to add a polymorphic function that takes a map and builds the corresponding request action from it, using the rest of the conversion functions as needed. Sounds like a job for a multimethod.

The public API should be in clojurewerkz.elastisch.native.bulk, mirroring clojurewerkz.elastisch.rest.bulk.

Because conversion functions are annoying to develop, I highly recommend developing them in the REPL first and then adding a few tests for the public API.

ElasticSearch Version

Elastisch 1.1's native client targets ElasticSearch 0.90.0-beta1. Because the binary format used by the client and ES nodes changes over time, mismatched client/server versions will result in obscure I/O exceptions (typically saying that a request couldn't be read fully).

dsabanin commented on June 26, 2024

Thanks for a really comprehensive description of the task. I'm going to give it a try this weekend.

dsabanin commented on June 26, 2024

It looks like I won't get a chance to work on it. Our initial requirements changed, we moved in another direction and ended up not using the native bulk API. Sorry for wasting your time on the instructions, although I'm sure some future implementor will find them very useful.

michaelklishin commented on June 26, 2024

That's ok. I'd be happy to hear what you ended up doing as an alternative (if it involves Elastisch at all).

dsabanin commented on June 26, 2024

The problem I was trying to solve was the initial indexing of a large amount of data from the DB. This was a one-time job, so I decided to go with a C program that fetched data from the DB, generated a limited subset of JSON, and then imported it into ES with curl through the bulk API.

We are still using Elastisch for real-time indexing -- it's a great piece of software. Too bad I couldn't contribute back.

mitchelkuijpers commented on June 26, 2024

Just wanted to let you know I started working on this, and I am making some good progress :) Will try to have something ready by the end of this weekend.

michaelklishin commented on June 26, 2024

@mitchelkuijpers sounds good, thank you.

mitchelkuijpers commented on June 26, 2024

I have a first version working. It currently returns this structure after a bulk operation:

{:took 8,
 :has-failures? false,
 :items
   [{:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-36",
     :version 1,
     :op-type "create",
     :failed? false}
    {:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-37",
     :version 1,
     :op-type "create",
     :failed? false}]}

In case of failure, the :failed? property will be true and a :failure-message property will be added. I notice the REST API returns properties like :_index and :_type. Do you think I should add those, @michaelklishin?
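Given that response shape, callers will often want just the failures. A minimal sketch, assuming items look like the ones above and that failed items carry :failed? true and a :failure-message key; the helper name failed-items is hypothetical and not part of Elastisch.

```clojure
;; Hypothetical helper over the response shape shown above: keeps only
;; the failed items, reduced to the keys needed to report or retry them.
(defn failed-items
  [{:keys [items]}]
  (->> items
       (filter :failed?)
       (mapv #(select-keys % [:index :type :id :failure-message]))))
```

A successful bulk response yields an empty vector, so (empty? (failed-items response)) works as a quick success check.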

michaelklishin commented on June 26, 2024

@mitchelkuijpers yes, as aliases for :index and :type, of course. We do this in other places. Better be compatible than sorry :)
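One way to add such aliases is to copy each native key to its underscore-prefixed REST name on every response item. This is a sketch under assumptions: the helper name with-rest-aliases is hypothetical, and the exact set of aliased keys (including :id) is a guess rather than what Elastisch ships.

```clojure
;; Hypothetical sketch: native key -> REST-style alias.
;; The chosen set of keys is an assumption for illustration.
(def ^:private rest-aliases
  {:index :_index, :type :_type, :id :_id, :version :_version})

(defn with-rest-aliases
  "Copies each native key present in item to its REST-style alias,
  leaving the original keys in place."
  [item]
  (reduce-kv (fn [m native-k rest-k]
               (if (contains? m native-k)
                 (assoc m rest-k (get m native-k))
                 m))
             item
             rest-aliases))
```

Keeping both spellings costs a few map entries per item but lets code written against either client read the same keys.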

mitchelkuijpers commented on June 26, 2024

@michaelklishin Added, and also :_index and :_version.

The only things I think we should not implement are bulk-with-index and bulk-with-index-and-type. They are kind of hairy to implement, and implementing them would mean I cannot reuse the ->delete-request and index->request functions from conversion.clj. What are your thoughts on this?

Another idea could be to reuse the bulk-index and the bulk-delete functions from rest/bulk and just parse that structure when users call bulk or the other two functions.

mitchelkuijpers commented on June 26, 2024

After thinking about it some more, I am leaning towards reusing the bulk-index and bulk-delete functions from rest/bulk. That way you can switch between the native and REST bulk, bulk-with-index, and bulk-with-index-and-type calls without changing anything :)
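With that approach, the with-index variants could be reduced to the plain bulk path by filling in a default index and type on each document before the operations are generated, since bulk-index copies :_index and :_type into the action line. A sketch under that assumption; with-default-index is a hypothetical name, and this is not necessarily how the merged implementation does it.

```clojure
;; Hypothetical sketch: put a default :_index (and optionally :_type) on
;; every document, keeping any value the document already carries.
(defn with-default-index
  ([index documents]
   (map #(merge {:_index index} %) documents))
  ([index type documents]
   (map #(merge {:_index index :_type type} %) documents)))
```

Something like (bulk-index (with-default-index "people" "person" docs)) would then produce the same operations for both clients. Because merge lets the document's own keys win, a per-document :_index still overrides the default, as it does for REST bulk requests sent to an index-specific endpoint.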

michaelklishin commented on June 26, 2024

That's fine. I find those functions a bit odd in the REST API. 100% compatibility is not the goal. If we can keep it at 95%, that's already huge for our users.

michaelklishin commented on June 26, 2024

@mitchelkuijpers so, native would delegate to REST? This should be made very clear in the docs because the two use different ports and some installations may be locked down to only one protocol at the firewall.

mitchelkuijpers commented on June 26, 2024

@michaelklishin No, native would only reuse these functions from rest/bulk:

;; Metadata keys that belong on the action line, not in the document source
(def ^:private special-operation-keys
  [:_index :_type :_id :_retry_on_conflict :_routing :_percolate :_parent :_timestamp :_ttl])

(defn index-operation
  "Builds the index action line for doc from its metadata keys"
  [doc]
  {"index" (select-keys doc special-operation-keys)})

(defn delete-operation
  "Builds the delete action line for doc from its metadata keys"
  [doc]
  {"delete" (select-keys doc special-operation-keys)})

(defn bulk-index
  "Generates the content for a bulk insert operation: each action line
  is followed by its document with the metadata keys removed"
  [documents]
  (let [operations (map index-operation documents)
        documents  (map #(apply dissoc % special-operation-keys) documents)]
    (interleave operations documents)))

(defn bulk-delete
  "Generates the content for a bulk delete operation
  (delete actions have no document lines)"
  [documents]
  (map delete-operation documents))

Then, when you call bulk or the other two bulk functions (which are easy to implement this way), each operation is transformed into either an IndexRequest or a DeleteRequest, and we can simply call the native API :)
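Before that transformation, the flat sequence produced by bulk-index and bulk-delete has to be split back into individual operations: an "index" action is followed by its document, while a "delete" action stands alone. A minimal sketch of that step; action-pairs is a hypothetical name, and each resulting pair could then be passed to a per-operation builder.

```clojure
;; Hypothetical sketch: turn the flat bulk sequence into [action document]
;; pairs. "index" actions consume the next element as their document;
;; any other action (e.g. "delete") gets nil as its document.
(defn action-pairs
  [operations]
  (loop [ops (seq operations)
         acc []]
    (if-let [[action & more] ops]
      (if (contains? action "index")
        (recur (next more) (conj acc [action (first more)]))
        (recur more (conj acc [action nil])))
      acc)))
```

Walking the sequence explicitly, rather than using (partition 2 ...), matters here because delete actions break the strict action/document alternation.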

michaelklishin commented on June 26, 2024

@mitchelkuijpers ah, perfect. Feel free to extract them into a separate internal namespace, e.g. elastisch.common.bulk.

mitchelkuijpers commented on June 26, 2024

@michaelklishin 👍

michaelklishin commented on June 26, 2024

@mitchelkuijpers any other way I can help you? I'd be very excited to see this feature added :)

mitchelkuijpers commented on June 26, 2024

I'll try to have a pull request ready tomorrow; I've had a busy week ^^

michaelklishin commented on June 26, 2024

Contributed by @mitchelkuijpers in #144.
