elastic_record's Introduction

ElasticRecord

ElasticRecord is an Elasticsearch 7.x ORM.

Setup

Include ElasticRecord::Model in your model:

class Product < ActiveRecord::Base
  include ElasticRecord::Model
end

Connection

There are two ways to configure which server to connect to, either in a Ruby initializer or in a YAML file:

# config/initializers/elastic_search.rb
ElasticRecord.configure do |config|
  config.servers = "es1.example.com:9200"
end
# config/elasticsearch.yml:
development:
  servers: es1.example.com:9200
  timeout: 10
  retries: 2

Search API

ElasticRecord adds the method 'elastic_search' to your models. It works similarly to ActiveRecord scoping:

search = Product.elastic_search

Filtering

If a simple hash is passed into filter, a term or terms query is created:

search.filter(color: 'red')         # Creates a 'term' filter
search.filter(color: %w(red blue))  # Creates a 'terms' filter
search.filter(color: nil)           # Creates a 'must not exist' filter
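
As a rough illustration of the mapping described above, the sketch below builds the kind of Elasticsearch clause each hash form corresponds to. The helper filter_clause is hypothetical and is not part of ElasticRecord; the clause shapes follow the standard Elasticsearch 7.x query DSL, and the exact hash ElasticRecord emits may differ (compare with search.as_elastic).

```ruby
# Hypothetical helper, not part of ElasticRecord: maps a field/value pair
# to the kind of Elasticsearch clause described above.
def filter_clause(field, value)
  case value
  when nil
    # "must not exist": match documents where the field has no value
    { bool: { must_not: { exists: { field: field.to_s } } } }
  when Array
    # several acceptable values: 'terms'
    { terms: { field => value } }
  else
    # a single exact value: 'term'
    { term: { field => value } }
  end
end

p filter_clause(:color, 'red')
p filter_clause(:color, %w(red blue))
p filter_clause(:color, nil)
```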

If a hash containing hashes is passed into filter, it is used directly as a filter DSL expression:

search.filter(prefix: { name: "Sca" }) # Creates a prefix filter

An Arelastic object can also be passed in, working similarly to Arel:

# Name starts with 'Sca'
search.filter(Product.arelastic[:name].prefix("Sca"))

# Name does not start with 'Sca'
search.filter(Product.arelastic[:name].prefix("Sca").negate)

# Size is greater than 5
search.filter(Product.arelastic[:size].gt(5))

Helpful Arelastic builders can be found at https://github.com/matthuhiggins/arelastic/blob/master/lib/arelastic/builders/queries.rb.

Querying

To create a query string, pass a string to search.query:

search.query("red AND fun*") # Creates {query_string: {query: "red AND fun*"}}

Complex queries are done using either a hash or an arelastic object:

search.query(match: {description: "amazing"})

Ordering

search.order(:price)          # sort by price
search.order(:color, :price)  # sort by color, then price
search.order(price: :desc)    # sort by price in descending order

Offsets and Limits

To change the 'size' and 'from' values of a query, use offset and limit:

search.limit(40).offset(80)   # Creates a query with {size: 40, from: 80}
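
For page-based navigation, the size and from values can be derived from a page number. The following is a minimal sketch; page_window is a hypothetical helper, not an ElasticRecord method, and it assumes 1-based page numbers.

```ruby
# Hypothetical helper: converts a 1-based page number into the
# size/from pair that limit and offset expect.
def page_window(page, per_page)
  raise ArgumentError, 'page must be >= 1' if page < 1

  { size: per_page, from: (page - 1) * per_page }
end

window = page_window(3, 40)
p window
# The values would then be passed on, for example:
#   search.limit(window[:size]).offset(window[:from])
```

Page 3 with 40 results per page yields the same {size: 40, from: 80} pair as the example above.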

Aggregations

Aggregations are added with the aggregate method:

search.aggregate('popular_colors' => {'terms' => {'field' => 'color'}})

Results are retrieved at query time within aggregations:

search = search.aggregate('popular_colors' => {'terms' => {'field' => 'color'}})
search.aggregations['popular_colors'].buckets

Getting Results

A search object behaves similarly to an ActiveRecord scope: it implements a few methods of its own and delegates the rest to Array and to your class.

search.count        # Return the number of search results
search.first        # Limit results to 1 and return the first result or nil
search.find(id)     # Add an ids filter to the existing query
search.as_elastic   # Return the JSON hash that will be sent to Elasticsearch

The search object behaves like an array when necessary:

search.each do |product|
  ...
end

Class methods can be executed within scopes:

class Product
  def self.increase_prices
    all.each { |product| product.increment(:price, 10) }
  end
end

# Increase the price of all red products by $10.
Product.filter(color: 'red').increase_prices

Percolators

ElasticRecord supports representing query documents as a model. Queries are registered and unregistered as query models are created and destroyed.

First, include ElasticRecord::PercolatorModel in your model. Specify the target model to percolate and how the model should be indexed as an Elasticsearch query.

class ProductQuery
  include ElasticRecord::PercolatorModel

  self.percolates_model = Product

  def as_search_document(**)
    Product.filter(status: status).as_elastic
  end
end

Use the percolate method to find records with queries that match.

  product = Product.new(price: 5.99)
  matching_product_queries = ProductQuery.percolate(product)

Index Configuration

To prevent Elasticsearch from dynamically mapping fields, you can directly configure elastic_index.mapping and elastic_index.settings:

class Product
  include ElasticRecord::Model

  elastic_index.mapping = {
    properties: {
      name: {type: "text"},
      status: {type: "keyword"}
    }
  }
end

Inheritance

When one model inherits from another, ElasticRecord makes some assumptions about how the child index should be configured. By default:

  • alias_name - Same as parent
  • mapping - Same as parent
  • settings - Same as parent

These can all be overridden. For instance, it might be desirable for the child documents to be in a separate index.

Join fields

Elasticsearch supports declaring a join field that specifies a parent-child relationship between documents of different types in the same index. ElasticRecord provides a shortcut (albeit a verbose one) for declaring the mapping:

class State
  include ElasticRecord::Model
end

class City
  include ElasticRecord::Model
end

class PostalCode
  include ElasticRecord::Model
end

class Country
  include ElasticRecord::Model

  has_es_children(
    join_field: 'pick_a_name_for_the_join_field',
    children:   [
      State,
      { zip: PostalCode },
      ElasticRecord::Model::Joining::JoinChild.new(klass: City, parent_id_accessor: :country_code)
    ]
  )
end

has_es_children accepts an optional name argument, with a sane default. In the above example, it would default to country. The name can later be used to construct has_parent queries. ElasticRecord will define a getter method with the same name as the value provided to join_field on both the parent and all children (and grandchildren).

The children argument can be:

  • a Class that includes ElasticRecord::Model
  • a Hash (with names as keys and classes as values)
  • an instance of ::ElasticRecord::Model::Joining::JoinChild (when you need to override any of the options below besides name)
  • or an Array containing any combination thereof.

::ElasticRecord::Model::Joining::JoinChild.new accepts additional, optional arguments:

  • name: defaults to the snake case version of the value provided to klass (e.g. state in the example above). Can be used to construct has_child queries.
  • children: Another instance of ::ElasticRecord::Model::Joining::JoinChild or an Array of instances. Defaults to an empty Array. Theoretically, an arbitrary number of layers of parent-child joins can be achieved this way.
  • parent_id_accessor: Determines how the ID of the parent is retrieved. Can be a proc, which will be executed in the context of the child object, or a symbol corresponding to the name of a method defined on the child object. In the above example, it would default to country_id.
  • parent_accessor: Determines how the parent is retrieved. Can be a proc, which will be executed in the context of the child object, or a symbol corresponding to the name of a method defined on the child object. In the above example, it would default to country. This is used to retrieve routing for multi-layered parent-child joins.

Note: Creating, deleting, and updating the mapping on the index must be handled via the top-level parent. In the above example, running rake index:create CLASS=State would have no effect.

Load Documents from Source

To fetch documents without an additional request to a backing ActiveRecord database, you can load the documents from _source.

Product.elastic_index.loading_from_source do
  Product.elastic_search.filter(name: "Pizza")
end

Call load_from_source! to configure an index without ActiveRecord. Finder methods will be delegated to the ElasticRecord module.

class Product
  include ActiveModel::Model
  include ElasticRecord::Record
  elastic_index.load_from_source!
end

Index Management

If you need to manage multiple indexes via the rake tasks, you will need to declare them explicitly:

ElasticRecord.configure do |config|
  config.model_names = %w(Product Order Location)
end

Create the index:

rake index:create CLASS=Product

Index Admin Functions

Core and Index APIs can be accessed with Product.elastic_index. Some examples include:

Product.elastic_index.create_and_deploy  # Create a new index
Product.elastic_index.reset              # Delete related indexes and deploy a new one
Product.elastic_index.refresh            # Call the refresh API
Product.elastic_index.get_mapping        # Get the index mapping defined by Elasticsearch
Product.elastic_index.update_mapping     # Update the Elasticsearch mapping of the current index

Development

# Setup the database
$ cp test/dummy/.env.example test/dummy/.env
$ bundle exec rake app:db:prepare app:index:reset
$ bundle exec rake app:db:prepare RAILS_ENV=test

# Run tests
$ bundle exec rake

elastic_record's Issues

Support :retry connection option

A failed connection does not attempt a retry on a different server. The following should be supported:

ElasticRecord::Connection.new(servers, retries: 3)

Mapping diff status

  • Deep diff mapping vs get_mapping
  • Give a clear output of adds, updates, deletes
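
One possible shape for such a diff is sketched below. mapping_diff is a hypothetical helper, not an existing ElasticRecord method. It assumes both mappings are plain nested hashes with the same key type (for example, both string-keyed, as a JSON response would be), so a symbol-keyed mapping would need its keys converted first.

```ruby
# Hypothetical sketch: recursively compares the desired mapping with the
# mapping currently on the index and reports changes by dotted path.
def mapping_diff(desired, current, path = [], out = { adds: {}, updates: {}, deletes: {} })
  (desired.keys | current.keys).each do |key|
    here = (path + [key]).join('.')
    if !current.key?(key)
      out[:adds][here] = desired[key]            # only in the desired mapping
    elsif !desired.key?(key)
      out[:deletes][here] = current[key]         # only on the index
    elsif desired[key].is_a?(Hash) && current[key].is_a?(Hash)
      mapping_diff(desired[key], current[key], path + [key], out)
    elsif desired[key] != current[key]
      out[:updates][here] = { from: current[key], to: desired[key] }
    end
  end
  out
end

# Illustrative data only.
desired = { 'properties' => { 'name' => { 'type' => 'text' }, 'status' => { 'type' => 'keyword' } } }
current = { 'properties' => { 'name' => { 'type' => 'keyword' }, 'legacy' => { 'type' => 'long' } } }
p mapping_diff(desired, current)
```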

Bulk add fails to report errors

Error format for bulk_add:

Model.elastic_index.bulk_add [Model.find(1), Model.find(2)], 'models_index'

  • where 1 is successful and 2 is a failure

{"took"=>10, "items"=>[{"index"=>{"_index"=>"models_index", "_type"=>"model", "_id"=>"1", "_version"=>2, "ok"=>true}}, {"index"=>{"_index"=>"models_index", "_type"=>"model", "_id"=>"2", "error"=>"MapperParsingException[failed to parse [poops]]; nested: ElasticSearchIllegalArgumentException[unknown property [skid_mark]]; "}}]}
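
A response of this shape could be scanned for failures as sketched below. bulk_failures is a hypothetical helper, not part of ElasticRecord, and the sample response is abbreviated illustrative data in the same format as the one above.

```ruby
# Hypothetical helper: collects the failed items from a bulk response hash.
# Each item is a one-entry hash such as {"index" => {...}}.
def bulk_failures(response)
  response.fetch('items', []).each_with_object([]) do |item, failures|
    item.each do |action, result|
      next unless result.key?('error')

      failures << { action: action, id: result['_id'], error: result['error'] }
    end
  end
end

# Illustrative response: item 1 succeeded, item 2 failed.
response = {
  'took'  => 10,
  'items' => [
    { 'index' => { '_index' => 'models_index', '_id' => '1', 'ok' => true } },
    { 'index' => { '_index' => 'models_index', '_id' => '2', 'error' => 'MapperParsingException[...]' } }
  ]
}

failures = bulk_failures(response)
warn "bulk_add failed for ids: #{failures.map { |f| f[:id] }.join(', ')}" unless failures.empty?
```

A caller could raise an exception instead of warning when the returned list is not empty.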

MySQL specific function used

https://github.com/data-axle/elastic_record/blob/master/lib/elastic_record/relation.rb#L77

INTERNAL ERROR!!! PG::UndefinedFunction: ERROR:  function field(integer, integer, integer) does not exist
LINE 1: ...icles"  WHERE "articles"."id" IN (2, 1)  ORDER BY FIELD("id"...
                            ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
: SELECT "articles".* FROM "articles"  WHERE "articles"."id" IN (2, 1)  ORDER BY FIELD("id", 2,1)

Cannot create PercolatorModel

I've been trying to wrap my head around how ElasticRecord handles Percolate queries. When working directly with the Elasticsearch API, it makes sense. But in its current state, I don't see how ElasticRecord utilizes this.

The issue I'm seeing is that a percolator query requires documents in a given index to contain a percolator type field (i.e. "query": { "type": "percolator" }). Then, one can execute a percolator query with a given document.

However, I don't see how PercolatorModel can do this unless the model it is percolating has already been indexed with the query value explicitly defined, like so:

Document.create({
    content: 'String of words',
    user: 'johndoe',
    query: { content: 'words' }
})

Only then can I see something like this working:

DocumentQuery.percolate({
    content: 'String of words'
})

The README indicates that "Queries are registered and unregistered as query models are created and destroyed". However, PercolatorModel does not respond to any methods like create.

To me, it seems like there needs to be some code added to get Percolation working correctly for this library. I was planning on making these fixes since I'm working in the library right now, but wanted to make sure I'm not missing something here with its intended usage before I started down that path. Am I using the PercolatorModel incorrectly?

Log notifications tripping string formatting

This is no longer bearable.

E, [2013-06-06T13:43:39.059383 #89009] ERROR -- : Could not log "request.elastic_record" event. ArgumentError: malformed format string - %D ["/Users/nevillek/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/bundler/gems/elastic_record-ab67817617d4/lib/elastic_record/log_subscriber.rb:26:in `%'", "/Users/nevillek/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/bundler/gems/elastic_record-ab67817617d4/lib/elastic_record/log_subscriber.rb:26:in `request'", "/Users/nevillek/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0.beta1/lib/active_support/log_subscriber.rb:113:in `finish'", 
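
The "malformed format string - %D" error is what Ruby raises when a string containing a literal percent sign is used as a format template. The sketch below reproduces it and shows one way to avoid it. escape_format and the request path are hypothetical, and the actual code at log_subscriber.rb:26 is not shown in this report, so this is only an assumption about the cause.

```ruby
# Hypothetical helper: doubles percent signs so they are treated as
# literals rather than format directives.
def escape_format(text)
  text.gsub('%', '%%')
end

path = '/products/_search?q=50%Discount' # hypothetical request path

begin
  format("GET #{path}") # "%D" is not a valid directive, so this raises
rescue ArgumentError => e
  warn "format failed: #{e.message}"
end

puts format("GET #{escape_format(path)}") # the percent sign survives as a literal
```

An alternative is to keep untrusted text out of the template entirely and pass it as an argument, for example format('GET %s', path).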

`filter` and `where` do not work together

Summary of problem

We are able to chain the filter method and the where method on elastic_relation; however, if the where method actually removes any records, the query fails.

Sample test that fails on master

def test_where_with_filter
    Widget.create! name: '747', color: 'red'
    Widget.create! name: 'A220', color: 'green'
    widget = Widget.create! name: 'A220', color: 'red'

    widgets = Widget.elastic_relation.filter(color: 'red').where(name: 'A220')
    widgets = widgets.to_a

    assert_equal 1, widgets.count
    assert_equal widget, widgets.first
end

Error from the above test:

Error:
ElasticRecord::Relation::SearchMethodsTest#test_where_with_filter:
ActiveRecord::RecordNotFound: Couldn't find all Widgets with 'id': (1, 3) [WHERE "widgets"."name" = $1] (found 1 results, but was looking for 2).
    /Users/Ben/.rvm/gems/ruby-2.7.0/gems/activerecord-5.2.4.2/lib/active_record/relation/finder_methods.rb:351:in `raise_record_not_found_exception!'

Why does this happen?

This error is raised because we apply the 'filter' by chaining the ActiveRecord find method onto our relation here:
https://github.com/data-axle/elastic_record/blob/master/lib/elastic_record/relation.rb#L57
The find method raises an error if you pass in more IDs than the relation is able to find. In other words, if the where clause ever excludes a record that the filter matched, the above error is raised.
The find docs explain the error in more detail: https://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-i-find

Reason this is important

Without being able to chain a where clause, the joins feature isn't all that useful. For example, we can't do this:
relation.filter(color: 'red').joins(:warehouse).where(warehouses: {name: 'Boeing'})

Possible solution

We can replace the klass.find search_hits.to_ids with klass.where(klass.primary_key => search_hits.to_ids).to_a so that the error isn't thrown.
However, there are a few problems with this solution.

  1. find returns the records in the order that the ids were given, and where does not.
  2. Most of the methods in Relation::FinderMethods assume that only Elasticsearch is filtering the records, and those methods would need to be adjusted.
  3. There might be a better way to add the extra filtering to the AR scope so that things like to_sql actually return the real query that will be run.
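
The ordering problem in the list above could be handled in Ruby after the where query returns, which would also avoid database-specific ordering functions such as FIELD. The following is a minimal sketch; in_hit_order and the Record struct are hypothetical stand-ins for the relation code and the loaded ActiveRecord objects.

```ruby
# Stand-in for a loaded ActiveRecord object (hypothetical).
Record = Struct.new(:id, :name)

# Hypothetical helper: returns the records in the order of the given ids,
# silently skipping ids that the where clause filtered out.
def in_hit_order(records, ids)
  by_id = records.each_with_object({}) { |record, index| index[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

# Elasticsearch returned ids 3, 1, 2 in relevance order, and the where
# clause then removed record 1.
loaded = [Record.new(2, 'A220'), Record.new(3, '747')]
p in_hit_order(loaded, [3, 1, 2]).map(&:id)
```

The trade-off is that sorting happens in memory, so this only suits result sets that are already limited to one page of hits.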
