elasticsearch-rs's Introduction

elasticsearch

Official Rust Client for Elasticsearch.

Full documentation is available at https://docs.rs/elasticsearch

The project is still very much a work in progress and in an alpha state; input and contributions welcome!

Compatibility

The Elasticsearch Rust client is forward compatible, meaning that the client supports communicating with greater minor versions of Elasticsearch. Elasticsearch language clients are also backwards compatible with lesser supported minor Elasticsearch versions.

Features

The following is a list of Cargo features that can be enabled or disabled:

  • native-tls (enabled by default): Enables TLS functionality provided by native-tls.
  • rustls-tls: Enables TLS functionality provided by rustls.
  • beta-apis: Enables beta APIs. Beta APIs are on track to become stable and permanent features. Use them with caution because it is possible that breaking changes are made to these APIs in a minor version.
  • experimental-apis: Enables experimental APIs. Experimental APIs are just that - an experiment. An experimental API might have breaking changes in any future version, or it might even be removed entirely. This feature also enables beta-apis.
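For example, to build with rustls instead of the default native-tls (the version shown is illustrative):

```toml
[dependencies]
elasticsearch = { version = "8.7.0-alpha.1", default-features = false, features = ["rustls-tls"] }
```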

Getting started

The client exposes all Elasticsearch APIs as associated functions, either on the root client, Elasticsearch, or on one of the namespaced clients, such as Cat, Indices, etc. The namespaced clients are based on the grouping of APIs within the Elasticsearch and X-Pack REST API specs from which much of the client is generated. All API functions are async only, and can be awaited.

Installing

Add the elasticsearch crate and version to Cargo.toml, choosing the version that is compatible with the version of Elasticsearch you're using:

[dependencies]
elasticsearch = "8.7.0-alpha.1"

The following optional dependencies may also be useful for creating requests and reading responses:

serde = "~1"
serde_json = "~1"

Async support with tokio

The client uses reqwest to make HTTP calls, which internally uses the tokio runtime for async support. As such, you may need to take a dependency on tokio in order to use the client. For example, in Cargo.toml, you may need the following dependency:

tokio = { version = "*", features = ["full"] }

and attribute the async main function with #[tokio::main]

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // your code ...
    Ok(())
}

and attribute test functions with #[tokio::test]

#[tokio::test]
async fn my_test() -> Result<(), Box<dyn std::error::Error>> {
    // your code ...
    Ok(())
}

Create a client

Build a transport to make API requests to Elasticsearch using the TransportBuilder, which allows setting of proxies, authentication schemes, certificate validation, and other transport related settings.

To create a client to make API calls to Elasticsearch running on http://localhost:9200

use elasticsearch::Elasticsearch;

fn main() {
    let client = Elasticsearch::default();
}

Alternatively, you can create a client to make API calls against Elasticsearch running on a specific url

use elasticsearch::{
    Elasticsearch, Error,
    http::transport::Transport
};

fn main() -> Result<(), Error> {
    let transport = Transport::single_node("https://example.com")?;
    let client = Elasticsearch::new(transport);
    Ok(())
}

If you're running against an Elasticsearch deployment in Elastic Cloud, a client can be created using a Cloud ID and credentials retrieved from the Cloud web console

use elasticsearch::{
    auth::Credentials,
    Elasticsearch, Error,
    http::transport::Transport,
};

fn main() -> Result<(), Error> {
    let cloud_id = "cluster_name:Y2xvdWQtZW5kcG9pbnQuZXhhbXBsZSQzZGFkZjgyM2YwNTM4ODQ5N2VhNjg0MjM2ZDkxOGExYQ==";
    // can use other types of Credentials too, like Bearer or ApiKey
    let credentials = Credentials::Basic("<username>".into(), "<password>".into());
    let transport = Transport::cloud(cloud_id, credentials)?;
    let client = Elasticsearch::new(transport);
    Ok(())
}

More control over how a Transport is built can be achieved by using TransportBuilder to build a transport and passing it to Elasticsearch::new() to create a new instance of Elasticsearch

use url::Url;
use elasticsearch::{
    Error, Elasticsearch,
    http::transport::{TransportBuilder,SingleNodeConnectionPool},
};

fn main() -> Result<(), Error> {
    let url = Url::parse("https://example.com")?;
    let conn_pool = SingleNodeConnectionPool::new(url);
    let transport = TransportBuilder::new(conn_pool).disable_proxy().build()?;
    let client = Elasticsearch::new(transport);
    Ok(())
}

Making API calls

The following will execute a POST request to /_search?allow_no_indices=true with a JSON body of {"query":{"match_all":{}}}

use elasticsearch::{Elasticsearch, Error, SearchParts};
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Elasticsearch::default();

    // make a search API call
    let search_response = client
        .search(SearchParts::None)
        .body(json!({
            "query": {
                "match_all": {}
            }
        }))
        .allow_no_indices(true)
        .send()
        .await?;

    // get the HTTP response status code
    let status_code = search_response.status_code();

    // read the response body. Consumes search_response
    let response_body = search_response.json::<Value>().await?;

    // read fields from the response body
    let took = response_body["took"].as_i64().unwrap();

    Ok(())
}

The client provides functions on each API builder struct for all query string parameters available for that API. APIs with multiple URI path variants, where some variants contain path parameters, are modelled as enums.

The root Elasticsearch client also has an async send function that allows sending an API call to an endpoint not represented as an API function, for example, experimental and beta APIs

use elasticsearch::{http::Method, Elasticsearch, Error, SearchParts};
use http::HeaderMap;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Elasticsearch::default();
    let body = b"{\"query\":{\"match_all\":{}}}";
    let response = client
        .send(
            Method::Post,
            SearchParts::Index(&["tweets"]).url().as_ref(),
            HeaderMap::new(),
            Option::<&Value>::None,
            Some(body.as_ref()),
            None,
        )
        .await?;
    Ok(())
}

License

This is free software, licensed under The Apache License, Version 2.0.

elasticsearch-rs's People

Contributors

bryantbiggs, esenmarti, guillaumegomez, iostat, jrodewig, kodraus, krojew, laurocaetano, lloydmeta, mwilliammyers, nathanregner, philkra, picandocodigo, reyk, russcam, sagiegurari, sethmlarson, srikwit, swallez, szabosteve


elasticsearch-rs's Issues

[ENHANCEMENT] Implement SniffingConnectionPool

Is your feature request related to a problem? Please describe.
The client can work against an Elasticsearch cluster behind a single endpoint e.g. a single node cluster, a cluster behind a proxy/load balancer. It is also useful however to be able to seed a client with a collection of endpoints, where each endpoint connects to an Elasticsearch node. In doing so, it allows for future features such as retries/failover, executing requests on a node matching a predicate e.g. data nodes only, etc. Since Elasticsearch clusters can change in size, the client should be able to reseed itself with the nodes (endpoints) available within the cluster. This is commonly referred to across the other Elasticsearch clients as sniffing.

Describe the solution you'd like
A SniffingConnectionPool that implements ConnectionPool that can

  1. be seeded with an initial collection of Url
  2. returns the next Connection from the pool when next() is called. In the initial implementation, this can simply iterate over connections, but in future, can mark connections dead/alive, weight them by aliveness, etc. to determine the next connection to return
  3. can periodically reseed the connections by sniffing the cluster
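A minimal sketch of the round-robin selection such a pool could start with (all names hypothetical; sniffing/reseeding omitted, and endpoints modelled as plain strings rather than Url):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch: a pool seeded with node endpoints, handing them
// out round-robin. A real SniffingConnectionPool would also reseed this
// list periodically by sniffing the cluster.
pub struct SniffingConnectionPool {
    nodes: Vec<String>,
    index: AtomicUsize,
}

impl SniffingConnectionPool {
    pub fn new(seed: Vec<String>) -> Self {
        SniffingConnectionPool { nodes: seed, index: AtomicUsize::new(0) }
    }

    // Point 2 above: simply iterate over the connections.
    pub fn next(&self) -> &str {
        let i = self.index.fetch_add(1, Ordering::Relaxed) % self.nodes.len();
        &self.nodes[i]
    }
}
```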

Additional context

An example implementation of SniffingConnectionPool, and its behaviour, can be found in the .NET client

[ENHANCEMENT] deserialize to specific Hit type?

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

[BUG] Failure in certificate validation

Describe the bug
Four tests related to certificate validation fail.

To Reproduce
Steps to reproduce the behavior:

  1. cargo test

Expected behavior
test result ok. 7 passed.

Environment (please complete the following information):

  • Debian GNU/Linux 10 (buster)
  • rustc 1.42.0 (b8cedc004 2020-03-09)

Additional context
Add any other context about the problem here.

running 7 tests
test full_certificate_ca_validation ... ok
test fail_certificate_certificate_validation ... FAILED
test certificate_certificate_ca_validation ... ok
test default_certificate_validation ... FAILED
test certificate_certificate_validation ... FAILED
test none_certificate_validation ... ok
test full_certificate_validation ... FAILED

failures:

---- fail_certificate_certificate_validation stdout ----
Error: ErrorMessage { msg: "Expected error but response was 200 OK" }
thread 'fail_certificate_certificate_validation' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', <::std::macros::panic macros>:5:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- default_certificate_validation stdout ----
Error: ErrorMessage { msg: "Expected error but response was 200 OK" }
thread 'default_certificate_validation' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', <::std::macros::panic macros>:5:6

---- certificate_certificate_validation stdout ----
Error: ErrorMessage { msg: "Expected error but response was 200 OK" }
thread 'certificate_certificate_validation' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', <::std::macros::panic macros>:5:6

---- full_certificate_validation stdout ----
Error: ErrorMessage { msg: "Expected error but response was 200 OK" }
thread 'full_certificate_validation' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', <::std::macros::panic macros>:5:6


failures:
    certificate_certificate_validation
    default_certificate_validation
    fail_certificate_certificate_validation
    full_certificate_validation

test result: FAILED. 3 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out

How to create a request that overwrites older values?

The only method that uses Method::Put is 'PutScript', but I don't think that is what I am looking for. I have a Rust struct that looks something like this:

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct User {
    pub user_id: Uuid,
    pub username: String
}

I want to insert it into my 'user' index in ElasticSearch (to eventually search them by username). This is what I tried:

pub fn update(user: User) -> BResult<()> {
    let elastics = // Some elastic search connection
    let user_id = user.user_id.to_string(); // I want to use the user id as id in ElasticSearch
    let mut update = elastics.create(CreateParts::IndexId("user", &user_id)).body(user);
    // At this point, somehow update.body is None, although I did set it. Strange...

    // I don't use Tokio in my project so I use the Tokio runtime to block
    let response = log_if_err_retry(db_session, Runtime::new().unwrap().block_on(update.send()))?;

    assert_ok(db_session, response.status_code())?;

    Ok(())
}

When I manually provide JSON to the body of 'update' (by doing json!(...)), I get the error 409 when executing the same request twice. That is indeed what the docs are saying in Rust, but I just want the values to overwrite. I think I want to mimic this request (that I execute in Kibana):

PUT /user/_doc/USERID
{ 
    USER JSON
}

But I don't know how to do it in Rust. So:

  1. Why is my update.body None?
  2. How can I insert/update json for a given ID?

I couldn't find a put request in the tests folder.

[DISCUSS] Fluent builder with Sender trait or fluent (closure) function

The current implementation implements APIs as structs that follow the consuming builder pattern, mutating self when functions are called to set values. For example,

#[derive(Default)]
pub struct CatAliases {
    client: Elasticsearch,
    error_trace: Option<bool>,
    // ... etc.
}
impl CatAliases {
    pub fn new(client: Elasticsearch) -> Self {
        CatAliases {
            client,
            ..Default::default()
        }
    }
    #[doc = "Include the stack trace of returned errors."]
    pub fn error_trace(mut self, error_trace: Option<bool>) -> Self {
        self.error_trace = error_trace;
        self
    }
    
    // ...etc. 
}

Each builder struct also implements the Sender trait, a terminal function for sending the actual request, which consumes (transfers ownership of) the builder

impl Sender for CatAliases {
    fn send<T>(self) -> Result<ElasticsearchResponse<T>>
    where
        T: DeserializeOwned,
    {
        // send actual request
    }
}

This allows for the following usage pattern

let cat_response = client.cat()
                         .aliases() // <-- returns new instance of CatAliases 
                         .error_trace(Some(true))
                         .send()?;

An alternative implementation considered would be for API functions to accept a function parameter where the argument is the builder struct. For example

let cat_response = client.cat()
                         .aliases(|p| { p.error_trace(Some(true)) });

This would remove the need for each builder to implement the Sender trait, and the explicit send() function to execute the request. It would however make it trickier to compose default values for a request and use it as the basis for multiple requests. For example,

let cat_aliases = client.cat()
                         .aliases() // <-- returns new instance of CatAliases 
                         .error_trace(Some(true));

let mut cat_aliases_clone = cat_aliases.clone();
// add v parameter to the clone
cat_aliases_clone = cat_aliases_clone.v(Some(true));

let cat_response = cat_aliases.send()?;
let cat_clone_response = cat_aliases_clone.send()?;

Interested to hear technical arguments for each approach.

[ENHANCEMENT] Builder fields and Parts enums accepting references

Builder structs and their associated UrlParts enum accept all arguments as owned types. For example, for CatCount and CatCountUrlParts

#[derive(Debug, Clone, PartialEq)]
#[doc = "Url parts for the Cat Count API"]
pub enum CatCountUrlParts {
    None,
    Index(Vec<String>),
}
impl CatCountUrlParts {
    pub fn build(self) -> Cow<'static, str> {
        match self {
            CatCountUrlParts::None => "/_cat/count".into(),
            CatCountUrlParts::Index(ref index) => {
                let index_str = index.join(",");
                let mut p = String::with_capacity(12usize + index_str.len());
                p.push_str("/_cat/count/");
                p.push_str(index_str.as_ref());
                p.into()
            }
        }
    }
}
#[derive(Clone, Debug)]
#[doc = "Request builder for the Cat Count API"]
pub struct CatCount {
    client: Elasticsearch,
    parts: CatCountUrlParts,
    error_trace: Option<bool>,
    filter_path: Option<Vec<String>>,
    format: Option<String>,
    h: Option<Vec<String>>,
    help: Option<bool>,
    human: Option<bool>,
    pretty: Option<bool>,
    s: Option<Vec<String>>,
    source: Option<String>,
    v: Option<bool>,
}

Arguments that accept

  • a collection of string values accept Vec<String>
  • a string value accept String

This forces a consumer of the client to create owned types when using the API

let response = client
    .cat()
    .count(CatCountUrlParts::Index(vec!["index-1".into()]))
    .filter_path(Some(vec!["some_path".into()]))
    .send()
    .await?;

A more idiomatic way would be to allow a consumer to pass references and slices, something like the following

let response = client
    .cat()
    .count(CatCountUrlParts::Index(&["index-1"]))
    .filter_path(Some(&["some_path"]))
    .send()
    .await?;

Such a change likely requires lifetimes to be specified on the builders and enums. See the example/reference-args branch for an example with the Cat Count API.

[ENHANCEMENT] Ability to add HTTP headers to a request

Every request made by the client should have the ability to allow a consumer to optionally specify additional HTTP headers to include in the request, for example, X-Opaque-Id.

Possible implementation

Generated builder structs include a HeaderMap field, with an associated function that accepts a header name and value. It may make sense to expose the same signature as the reqwest crate, the HTTP crate being used

pub fn header<K, V>(mut self, key: K, value: V) -> Self
where
    HeaderName: HttpTryFrom<K>,
    HeaderValue: HttpTryFrom<V>,
{
    //...
}

[BUG] Url serialize Option<Vec<String>>

Serde's UrlEncodedSerializer<Target> does not know how to serialize an Option<Vec<String>> with some value. This should be serialized as one value, where the vec values are joined by commas.
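A sketch of the expected behaviour (helper name hypothetical): the vec values should be joined into a single comma-separated query string value.

```rust
// Hypothetical helper illustrating the expected serialization:
// Some(vec!["a", "b"]) should become the single value "a,b", not
// repeated query string parameters.
fn join_param(values: Option<Vec<String>>) -> Option<String> {
    values.map(|v| v.join(","))
}
```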

[DISCUSS] Builder associated functions accepting Option<T>

Each builder struct models all fields as Option<T>. Fields include all query string parameters and the body of the request, if applicable. The associated functions to assign values to fields on the builder struct all accept Option<T> too. For example

let mut response = client
    .search(SearchUrlParts::None)
    .pretty(Some(true))
    .q(Some("title:Elasticsearch".into()))
    .send()?;

This issue is to discuss whether the associated functions should accept Option<T> or simply T in each case. The above example would then become

let mut response = client
    .search(SearchUrlParts::None)
    .pretty(true)
    .q("title:Elasticsearch".into())
    .send()?;

One advantage of accepting Option<T> is that it is possible to clone a builder and pass None for those values that one does not wish to send on the clone. If functions only accept T this would not be possible, although it can be argued how often are consumers likely to want to clone and None out values? One disadvantage of accepting Option<T> everywhere is that it makes using the client more verbose because each value needs to be passed as Some(T).

Interested to hear relative merits of each approach.

[ENHANCEMENT] Add BulkBodyBuilder/BulkRequestBuilder?

The current bulk API is a little raw/not very "Rusty". I understand that it matches the actual bulk REST API and that this library, in general, is meant to be a thin abstraction over the REST API.

That being said, is it within the scope of this library to add something like a BulkDocumentOperation from the elastic crate, used like this? It would abstract over update vs. index vs. delete etc. and expose options for each, like setting source: true for update to return the updated doc (which is available on elastic-rs/elastic#master).

[ENHANCEMENT] Add pass through features for reqwest

In an effort to reduce dependency duplication, as reqwest is a very commonly used crate, it would be handy to have some features we can pass through to reqwest.

Some candidates are:

  • rustls-tls = ["reqwest/rustls-tls"]
  • cookies = ["reqwest/cookies"]
  • socks = ["reqwest/socks"]

Do we need the native-tls feature specifically? It looks like it adds some more features vs. default-tls? Can rustls-tls be used instead? rustls-tls makes building a static binary via musl libc much easier...

[DOCS] How to use the Scroll API?

Please describe the documentation
Currently the Scroll API does not provide any examples. It would be great if these could be provided, such that it would be easier to use for beginners.

Describe where the documentation should live
The Scroll struct should be documented with examples of how to use it.

[BUG] BulkParts is not convenient

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when I'm not able to return a BulkParts from a function.

Often we create the index name from part of our data, but the following code won't compile

fn build_bulk<'a>(
    proximity_uuid: &'a str,
    alerts: Vec<Alert>,
) -> (BulkParts<'a>, Vec<JsonBody<Value>>) {
    let bulk_parts = BulkParts::Index(&format!("alerts.{}", proximity_uuid));
    let body: Vec<JsonBody<_>> = alerts
        .into_iter()
        .map(|alert| {
            vec![
                JsonBody::from(json!({"index": {"_id": Uuid::new_v4().to_simple().to_string() }})),
                JsonBody::from(json!(alert)),
            ]
        })
        .flatten()
        .collect();

    (bulk_parts, body)
}
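For reference, the compile failure comes from BulkParts::Index borrowing a String temporary created by format!. One workaround, sketched here with a stand-in enum (since only the shape matters), is to return the owned String and construct the parts value at the call site:

```rust
// Stand-in for the crate's BulkParts<'a>, to show the shape of the issue.
enum BulkParts<'a> {
    Index(&'a str),
}

// Workaround: return the owned index name instead of a BulkParts that
// borrows a temporary; the caller then owns the String and can borrow it.
fn bulk_index_name(proximity_uuid: &str) -> String {
    format!("alerts.{}", proximity_uuid)
}
```

The caller binds the String first, so the borrow inside BulkParts::Index lives long enough.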

Describe the solution you'd like
A clear and concise description of what you want to happen.

Make it possible to return a BulkParts from a function

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

[INFRA] Setting up pull request CI

Currently, there are a couple of basic checks in place for PRs:

  1. CLA check
  2. Clippy check

In line with other official client repositories, a Continuous Integration should be configured to:

  1. run unit tests
  2. run integration tests
  3. check for dead links in documentation

on PRs.

[ENHANCEMENT] Request and Response types

Is your feature request related to a problem? Please describe.
For some time I was using the TypeScript client for Elasticsearch. The major source of bugs and inconvenience is that the exposed client API is mostly untyped. When moving to Rust I was hoping that the strong statically-typed nature of the language will enforce creating a strictly-typed client in it. But as I see the client methods just use T: serde::Serialize which is frustrating.

The problem with this approach is that the API is so-called stringy, i.e. it is super-easy to mistype the JSON object keys, pass the JSON value of the wrong type or if the value is a string of a fixed set of possible values it is very easy to misspell the string enum. Besides that, you get poor developer experience from IDE, since you get no hints about the object shape that is expected, no completions and go-to-definitions to inspect the possible set of properties you can write, etc..

At the end of the day, such a stringy API just means that all the type-checks are moved from the compile-time to run-time, thus requiring extensive testing of your application, but it always happens that you forget to test some rare code-path where you e.g. misspelled the query object key and it gets to production and boom...

I know you understand these concerns and the decision on using the stringy JSON API in Rust was deliberate. Maybe, because it allows you to bootstrap the client with much less time and effort.
I agree that it is a good short-term decision, it did let you create the crate very rapidly, didn't it?
So maybe it's time to do more long-term design improvements?...

Describe the solution you'd like
I don't have the ideal API proposal here. I'd like to hear your thoughts on that. I saw that you were interested in rs-es crate that does provide a good strongly-typed API, but unfortunately, this crate is likely unmaintained...
Also, I'd like to note that it is important to not only define the input types to the client but also strongly type the response objects.

One thing that I'd like to warn you about is the downside of static typing, anyway. I noticed it in diesel-rs (which has a very cool strongly-typed API which I'd like this crate to aspire to).

E.g. if we implement the query builder, it is necessary to ensure that it will allow for dynamic query building, i.e. allow boxing the query builder, like is done in diesel's into_boxed()

Example of the problem:

impl BoolQueryBuilder {
    fn must(self) -> MustQueryBuilder { /* ... */ }
    fn should(self) -> ShouldQueryBuilder { /* ... */ }
}

let query = if cond {
    bool().must()
} else {
    bool().should() // this will fail because the types of the branches differ
};

Describe alternatives you've considered
As a crazy alternative, we can use macros for DSL so that we preserve the JSON-like syntax for building queries, but the macros will validate the query shape and raise a compile-time error if something invalid is encountered.

However, as to me, the API should expose something like a document object model (DOM) or abstract syntax tree (AST) which lets you safely create your requests. By now this crate already uses an AST, but this is the AST of the JSON language as a whole.

So at the high level, the goal is to narrow down the JSON language to Elasticsearch DSL JSON subset, which doesn't allow all the possible object shapes that are defined by JSON, but only the shapes that Elasticsearch server understands and works with.

Additional context
Option::<AdditionalContext>::None

[ENHANCEMENT] Implement Basic Authentication

The client can currently connect to an Elasticsearch node at a single URI without any kind of authentication. The ability to provide Basic Authentication credentials to the client to use to connect to a URI should be implemented.

Possible implementation

ConnectionSettings is intended to be the struct that collects all relevant details needed to establish a connection to Elasticsearch, so the ability to pass Basic Authentication credentials logically sits as a field on ConnectionSettings.

Connection really depends on ConnectionSettings, so should arguably be passed as a ctor argument to create one, with Elasticsearch client now depending only on Connection. In future, there may be client specific settings unrelated to the Connection, but this dependency flow can be refactored if that need arises. Similarly, when #7 is implemented, Connection would be constructed by a ConnectionPool implementation, but again, this can be refactored when implementing connection pools.

[ENHANCEMENT] Update api_generator to v7.4.0+ REST API spec structure

The api_generator crate has been written against the v7.3.1 REST API spec structure. This structure has changed in v7.4.0+ to group URL parts with paths, and to move deprecated paths into their own section. The api_generator crate should be updated to reflect these changes.

[ENHANCEMENT] Default Cat APIs to accept text/plain header

Is your feature request related to a problem? Please describe.
In implementing the yaml test runner in #19, there is a need to be able to return the response as a plain text string, to make assertions against.

Describe the solution you'd like
All Cat APIs should set text/plain accept and content-type headers by default, with the ability to read the response as plain text

crossbeam-channel v0.3.9 fails to compile

crossbeam-channel v0.3.9 fails to compile with the following error

error[E0432]: unresolved import `crossbeam_utils::atomic`
 --> /home/r/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-channel-0.3.9/src/flavors/tick.rs:8:22
  |
8 | use crossbeam_utils::atomic::AtomicCell;
  |                      ^^^^^^ could not find `atomic` in `crossbeam_utils`

It's related to crossbeam-utils 0.6.6 being compiled with a nightly feature that got changed. See crossbeam-rs/crossbeam#435 for more details.

To address this for now, force crossbeam-utils to version 0.6.5:

cargo update -p crossbeam-utils --precise 0.6.5

[DISCUSS] Exposing request details on the response

Should ElasticsearchResponse expose details of the request? If so, which ones?

  • HttpMethod?
  • The constructed url?
  • The request body? If so, JSON string representation, bytes, struct, something else?

Does it make sense to expose these on ElasticsearchResponse, or should they be exposed through some other means e.g. event hooks? Some of the discussion around this would feed into a conversation about diagnostic tracing/logging within the client.

[BREAKING] Rename read_body and read_body_as_text

With the introduction of read_body_as_text in #72 for handling plain text responses for the cat APIs, the naming of these functions is somewhat unwieldy.

This issue is a proposal to rename these functions

  • read_body would become json
  • read_body_as_text would become text

[ENHANCEMENT] Implement Single Node Connection pool

The client send function currently delegates to Connection's send function

    pub fn send<B, Q>(
        &self,
        method: HttpMethod,
        path: &str,
        query_string: Option<&Q>,
        body: Option<B>,
    ) -> Result<ElasticsearchResponse, ElasticsearchError>
    where
        B: Serialize,
        Q: Serialize + ?Sized,
    {
        self.connection.send(method, path, query_string, body)
    }

This is a seam to implement connection pooling.

A connection pool is a pool of nodes that the client knows about, to which an API request can be sent. The simplest type of connection pool is a single node connection pool, that simply contains a single node.

The client should be updated to accept a ConnectionPool, asking it for a node to which to make an API request. Initial thoughts here are that ConnectionPool would be a trait with a function that can be used to retrieve a "node" to which an API request can be made. A SingleNodeConnectionPool is part of the implementation, and other connection pools can be implemented in separate issues.
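A minimal sketch of that trait shape (names hypothetical, with nodes reduced to endpoint strings):

```rust
// Hypothetical sketch of the proposed trait: the client asks the pool
// for the node to send the next API request to.
trait ConnectionPool {
    fn next(&self) -> &str;
}

// The simplest implementation: always return the same single node.
struct SingleNodeConnectionPool {
    node: String,
}

impl ConnectionPool for SingleNodeConnectionPool {
    fn next(&self) -> &str {
        &self.node
    }
}
```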

[ENHANCEMENT] Retry API calls

All Elasticsearch clients are intended to retry API calls whose response is a 502, 503 or 504 HTTP status code when there is another node that the client knows about against which the request can be retried. The .NET client documentation summarizes the behaviour well.

Retries should also be implemented for the client, once ConnectionPool implementations that accept multiple urls have been implemented.
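A sketch of the retry decision this describes, with a closure standing in for the actual HTTP call (the function name and shape are assumptions of this sketch):

```rust
// Sketch only: `send` is a stand-in for the real request function. The call is
// retried on the next known node when the response status is 502, 503 or 504.
fn send_with_retry<F>(nodes: &[&str], mut send: F) -> Result<u16, String>
where
    F: FnMut(&str) -> u16, // returns the HTTP status code for a node
{
    let mut last_err = Err(String::from("no nodes available"));
    for node in nodes {
        match send(node) {
            status @ (502 | 503 | 504) => {
                last_err = Err(format!("{} returned {}", node, status));
            }
            status => return Ok(status),
        }
    }
    last_err
}
```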

Best practice for converting search results to Rust structs?

When using the search API, the queried objects are in an array inside "hits", which is itself nested inside another "hits" object. Furthermore, the actual objects are again nested inside a "_source" object. So to extract the data from a search, I have this code:

let value = response.read_body::<Value>().await.unwrap();

let hits = value["hits"]["hits"]
    .as_array()
    .unwrap()
    .clone();

if hits.is_empty() {
    return Ok(vec![]);
}

Ok(hits
    .into_iter()
    .map(|e| serde_json::from_value(e["_source"].clone()))
    .collect::<BResult<_>>()?)

This looks like code that can be somewhere in this library (else everyone needs to write something like this). Is there a standard way of doing what I am doing?

[ENHANCEMENT] Scroll API helper functions

Similar to #62, the scroll API can be used to retrieve a large number of documents from Elasticsearch by issuing a search request with the scroll parameter, and using the scroll_id returned in a response to fetch the next batch of documents with a scroll request, continuing until all documents are retrieved.

Many of the existing Elasticsearch clients provide a "scroll helper" for this purpose. The helper can issue a search request, and continue to issue search requests until all documents are retrieved. The scroll can be sliced, allowing concurrent scrolls to be executed.

The Rust client should provide a similar, idiomatic way of helping consumers retrieve a large collection of documents.
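A sketch of the loop such a helper would run, with a closure standing in for the search/scroll requests (names are illustrative, not a proposed API):

```rust
// Sketch only: `fetch` stands in for an initial search request (scroll_id None)
// and the subsequent scroll requests; it returns a batch of documents plus the
// scroll_id to use for the next request.
fn scroll_all<F>(mut fetch: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> (Vec<String>, Option<String>),
{
    let mut all = Vec::new();
    let mut scroll_id: Option<String> = None;
    loop {
        let (docs, next_id) = fetch(scroll_id.as_deref());
        if docs.is_empty() {
            break; // an empty batch signals all documents are retrieved
        }
        all.extend(docs);
        scroll_id = next_id;
    }
    all
}
```

Slicing would run several such loops concurrently, one per slice.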

[BUG] Scroll API fails when query hits too many shards

Describe the bug

The size of the scroll_id token is proportional to the number of shards hit by the query.

We found info on scroll_ids generating large strings in elastic/elasticsearch-py#971. Looking over the Scroll code in Rust, it always puts the scroll_id into a GET request as a query parameter, so queries that hit many shards will cause the Elasticsearch cluster to report An HTTP line is larger than 4096 bytes..

To Reproduce
Steps to reproduce the behavior:

  1. Hit a lot of shards in a query
  2. Watch ES reject the second page of your scroll logic

Expected behavior

The Scroll code in Rust should always use a POST request so that long scroll_ids can be sent in the request body.


[ENHANCEMENT] Enable passing a custom reqwest client to the TransportBuilder

Is your feature request related to a problem? Please describe.
I wanted to disable certificate validation for testing, but realized that there is no TransportBuilder method which allows me to pass in a custom reqwest client.

Describe the solution you'd like
I'd like to first generate a client like so.
(I made an edit to remove calling build on the reqwest client)

let transport_client = Client::builder()
  .danger_accept_invalid_hostnames(true)
  .danger_accept_invalid_certs(true);

and then pass it to a TransportBuilder like so:

let transport = TransportBuilder::new(conn_pool)
  .disable_proxy()
  .with_client(transport_client)
  .build()?;

Describe alternatives you've considered
My first thought was to make client_builder public, but it may not be obvious that TransportBuilder.client_builder is a reqwest client builder.

Additional context
For some weird reason, when I tried this, it compiles and runs. I don't understand how that's possible, but even so it still rejects self-signed certs.

pub use client::new;

pub mod client {
  use elasticsearch::http::transport::{TransportBuilder,SingleNodeConnectionPool};
  use elasticsearch::{Elasticsearch,Error};
  use url::Url;

  pub fn new(url: &str) -> Result<Elasticsearch, Error> {
    let url = Url::parse(url).unwrap();
    let conn_pool = SingleNodeConnectionPool::new(url);
    let transport = TransportBuilder::new(conn_pool).disable_proxy();
    transport.client_builder // I thought this was private
      .danger_accept_invalid_hostnames(true) // but this doesn't cause a compiler error
      .danger_accept_invalid_certs(true);
    transport.build()?;
    return Ok(Elasticsearch::new(transport));
  }
}

[BUG] error decoding response body: missing field `<field>` at line x column y

Describe the bug
It seems that there are some problems with deserializing structs.

To Reproduce

  1. I created an index and pushed my structs using the following code:
let response = client
    .index(IndexParts::IndexId(index, id))
    .body(&self)
    .send()
    .await?;

Ok(response.status_code().is_success())
  2. I can verify in Kibana that the entities are correctly created.
  3. When I try to retrieve them using the following code:
let mut response = client
    .get(GetParts::IndexId(index, id))
    .send()
    .await?;

response.read_body().await?

I receive the error error decoding response body: missing field at line x column y.

The interesting part however is that if I do:

let mut response = client
    .get(GetParts::IndexId(index, id))
    .send()
    .await?;

let value: Value = response.read_body().await.unwrap();
let value = value.get("_source").unwrap();
let value: Self = serde_json::from_value(value.clone()).unwrap();
Ok(value)

It can successfully decode the response.

The struct I use has the following format:

pub struct MyStruct
{
    pub a: String,
    pub b: String,
    pub c: Vec<HashMap<String, String>>,
    pub d: u64,
}

The error stated that field a was missing.

EDIT: As a bonus I printed out the value from the second (working) example, and the JSON I printed contained all the parameters of MyStruct.

Expected behavior
Expected response.read_body() to successfully deserialize the response.

Environment (please complete the following information):

  • OS: Windows 10 Pro
  • rustc 1.41.1 (f3e1a954d 2020-02-24)

[DOCS] Client example documentation

Each official client has started to implement all of the console examples within the Elasticsearch reference documentation on the master branch, with the aim to port the examples to a future "current" branch once a high number of the doc examples are implemented.

An example of the client doc examples is the match query doc page, where switching to C# in a console example shows the equivalent C# client code.

Client examples should be implemented for the Rust client

Possible implementation

The approach to implementing the YAML REST spec test runner in #19 will likely require access to the ASTs used to generate the client, in order to construct correct client calls in generated test functions. Similarly, the client doc examples would benefit from having access to the ASTs in order to generate correct client examples.

[ENHANCEMENT] Implement From<T> for UrlParts enums

Each API models the API url parts as an enum, and where an API has more than one enum variant, the API function on the root/namespace client takes the enum as an argument. For example, for search

let response = client
    .search(SearchUrlParts::Index(&["index-1"])) // <-- accepts enum
    .send()
    .await?;

Currently, the Rust compiler cannot infer the enum type based on the parameter, meaning the complete enum path needs to be specified as above, instead of simply the following pseudocode

let response = client
    .search(Index(&["index-1"]))
    .send()
    .await?;

To make the API somewhat simpler to use, it is proposed to implement From<T> traits for each enum such that a value, or tuple of values, can be used. Taking the example above

impl<'a> From<&'a [&'a str]> for SearchUrlParts<'a> {
    fn from(index: &'a [&'a str]) -> Self {
        SearchUrlParts::Index(index)
    }
}

impl<'a> From<(&'a [&'a str], &'a [&'a str])> for SearchUrlParts<'a> {
    fn from(index_type: (&'a [&'a str], &'a [&'a str])) -> Self {
        SearchUrlParts::IndexType(index_type.0, index_type.1)
    }
}

let response = client
    .search((&["index-1"][..]).into())
    .send()
    .await?;
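A compiling sketch of the proposal; note that with a From<&[&str]> impl, an array literal needs slicing (&[...][..]) before .into() applies, and the API function would likely accept impl Into<...> (an assumption here) to let the conversion be inferred:

```rust
// Sketch only: a trimmed-down SearchUrlParts plus the proposed From impl, and a
// hypothetical `search` function accepting `impl Into<SearchUrlParts>`.
#[derive(Debug, PartialEq)]
pub enum SearchUrlParts<'a> {
    None,
    Index(&'a [&'a str]),
}

impl<'a> From<&'a [&'a str]> for SearchUrlParts<'a> {
    fn from(index: &'a [&'a str]) -> Self {
        SearchUrlParts::Index(index)
    }
}

// Stand-in for the client's search fn; returns the parts so the conversion
// can be observed.
fn search<'a>(parts: impl Into<SearchUrlParts<'a>>) -> SearchUrlParts<'a> {
    parts.into()
}
```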

[Question] Stable version of client

Hi,

Do you plan to make a stable version of the client?
I would prefer to use your client in a stable version if possible.
If you don't plan to do it, maybe I'll consider using the unstable version then.

regards,

[ENHANCEMENT] Pass reference to Elasticsearch instance

Each builder struct currently takes ownership of Elasticsearch, with instantiation of a builder struct from the root or a namespace client accepting a clone of the Elasticsearch instance on which the associated function is called. For example, in the case of search

#[derive(Clone, Debug)]
#[doc = "Builder for the [Search API](https://www.elastic.co/guide/en/elasticsearch/reference/master/search-search.html). Returns results matching a query."]
pub struct Search<'a, B> {
    client: Elasticsearch,
    parts: SearchParts<'a>,
    // ....
}

impl<'a, B> Search<'a, B>
where
    B: Body,
{
    #[doc = "Creates a new instance of [Search] with the specified API parts"]
    pub fn new(client: Elasticsearch, parts: SearchParts<'a>) -> Self {
        Search {
            client,
            parts,
            // ...
         }
    }
}

impl Elasticsearch {
    #[doc = "Returns results matching a query."]
    pub fn search<'a>(&self, parts: SearchParts<'a>) -> Search<'a, ()> {
        Search::new(self.clone(), parts)
    }
}

With an instance of Elasticsearch now instantiated with a Transport that has a ConnectionPool of Connections with which to make API calls to Elasticsearch, it is desirable for builder structs to share the same ConnectionPool, such that in the future, when other ConnectionPool implementations can refresh the collection of connections, this would be reflected in all builders.

Possible implementations

Same lifetime as SearchParts<'a>

Making Elasticsearch now a reference, &Elasticsearch, requires giving it an explicit lifetime. One implementation of this would be to give it the same lifetime as SearchParts<'a>

pub struct Search<'a, B> {
    client: &'a Elasticsearch,
    parts: SearchParts<'a>,
    // ....
}

impl<'a, B> Search<'a, B>
where
    B: Body,
{
    pub fn new(client: &'a Elasticsearch, parts: SearchParts<'a>) -> Self {
        Search {
            client,
            parts,
            // ...
         }
    }
}

impl Elasticsearch {
    pub fn search<'a>(&'a self, parts: SearchParts<'a>) -> Search<'a, ()> {
        Search::new(self, parts)
    }
}

New lifetime

An alternative is introducing a new lifetime for &Elasticsearch

pub struct Search<'a, 'b, B> {
    client: &'a Elasticsearch,
    parts: SearchParts<'b>,
    // ...
}

impl<'a, 'b, B> Search<'a, 'b, B>
where
    B: Body,
{
    pub fn new(client: &'a Elasticsearch, parts: SearchParts<'b>) -> Self {
        Search {
            client,
            parts,
            // ...
        }
    }
}

impl Elasticsearch {
    pub fn search<'a, 'b>(&'a self, parts: SearchParts<'b>) -> Search<'a, 'b, ()> {
        Search::new(self, parts)
    }
}

I think that &Elasticsearch should have a different lifetime to *Parts<'a> (second implementation) because it can be alive for a different (longer) scope than both *Parts<'a> and the returned builder struct.

[ENHANCEMENT] Ability to set global HTTP headers

Is your feature request related to a problem? Please describe.
The client exposes the ability to set HTTP headers per request; it should also expose the ability to set global headers that are sent with every request.

Describe the solution you'd like
TransportBuilder should expose a function to add default HTTP headers to the underlying reqwest client
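A sketch of the merging such a function implies, with per-request headers taking precedence over the transport-level defaults (the precedence rule and the function name are assumptions of this sketch):

```rust
use std::collections::HashMap;

// Sketch only: merges transport-level default headers with per-request
// headers; a header set on the request overrides the default of the same name.
fn merge_headers(
    defaults: &HashMap<String, String>,
    request: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = defaults.clone();
    for (name, value) in request {
        merged.insert(name.clone(), value.clone());
    }
    merged
}
```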

[DOCS] Document that the 'tokio' runtime is required

Please describe the documentation

I understand this crate requires the tokio runtime. It would be nice to note that somewhere in the readme.

Describe where the documentation should go

In the readme's "getting started" section: https://github.com/elastic/elasticsearch-rs#getting-started

Additional context
I tried using this crate with the async-std runtime, but it didn't work:

thread 'main' panicked at 'not currently running on the Tokio runtime.'
[...]
  35: <hyper_tls::client::HttpsConnecting<T> as core::future::future::Future>::poll
             at /src/github.com-1ecc6299db9ec823/hyper-tls-0.4.1/src/client.rs:144
  36: core::future::poll_with_context
             at /rustc/699f83f525c985000c1f70bf85117ba383adde87/src/libcore/future/mod.rs:84
  37: reqwest::connect::Connector::connect_with_maybe_proxy::{{closure}}
             at /src/github.com-1ecc6299db9ec823/reqwest-0.10.4/src/connect.rs:346
[...]

[BUG] CA chains are not supported with native-tls

Describe the bug
When using a CA PEM cert, CertificateValidation::Full only supports a single certificate and breaks with CAs that require an intermediate CA.

To Reproduce
Steps to reproduce the behavior:

  1. Create an intermediate CA
  2. cat the root CA and the intermediate CA into one PEM file
  3. Create a server cert for this CA
  4. Try to connect an elasticsearch client to it

Expected behavior
Either accept an array of certs or split the PEM file into individual certs and call the underlying reqwest method multiple times.
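A sketch of the second option, splitting the PEM bundle into individual certificates so each one could be registered with the TLS backend separately (the splitting is naive and illustrative):

```rust
// Sketch only: naive splitting of a PEM bundle into individual certificates,
// keyed on the BEGIN/END CERTIFICATE markers.
fn split_pem(bundle: &str) -> Vec<String> {
    let mut certs = Vec::new();
    let mut current = String::new();
    let mut in_cert = false;
    for line in bundle.lines() {
        if line.starts_with("-----BEGIN CERTIFICATE-----") {
            in_cert = true;
            current.clear();
        }
        if in_cert {
            current.push_str(line);
            current.push('\n');
        }
        if line.starts_with("-----END CERTIFICATE-----") {
            in_cert = false;
            certs.push(current.clone());
        }
    }
    certs
}
```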

Environment (please complete the following information):
native-tls

[ENHANCEMENT] Model API Url part variants as enums

Relates: #2

The URL part variants within an API's URL should be modelled as an enum. For example, for the search API

pub enum SearchParts {
    None,
    Index(Vec<String>),
    IndexType(Vec<String>, Vec<String>),
}

(the Vec<String> values would be references such as &[&str], but this can be addressed later).

Modelling as enums prevents a user from being able to specify a URL part that can only be provided when another URL part is also specified. The parts enums can be generated from the REST API spec.
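A sketch of how such an enum could also drive path construction (the url helper here is illustrative, not part of the proposal):

```rust
// Sketch only: the enum from the proposal plus an illustrative helper that
// builds the request path for each variant.
pub enum SearchParts {
    None,
    Index(Vec<String>),
    IndexType(Vec<String>, Vec<String>),
}

impl SearchParts {
    pub fn url(&self) -> String {
        match self {
            SearchParts::None => String::from("/_search"),
            SearchParts::Index(index) => format!("/{}/_search", index.join(",")),
            SearchParts::IndexType(index, ty) => {
                format!("/{}/{}/_search", index.join(","), ty.join(","))
            }
        }
    }
}
```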

[BUG] There is no obvious way to support a reverse proxy

Describe the bug
When running against Elasticsearch behind a reverse proxy, where the URL is something like http://10.1.2.3/es/, I haven't found a way to use it. The reason for this appears to be the way Url::join works: when given a path that starts with /, it removes everything after the domain and adds the path to the root.

To Reproduce

    // The logging from reqwest shows how the url is not as expected
    let client = Elasticsearch::new(Transport::single_node("http://10.1.2.3/es/")?);
    let search = client.search(SearchParts::Index(&["my-index"]));
    let res = search.send().await?;
    // [2020-01-14T10:25:31Z DEBUG reqwest::async_impl::response] Response: '200 OK' for https://10.1.2.3/my-index/_search

    // Digging further shows the issue is likely in how Url::join works
    let url = url::Url::parse("http://10.1.2.3/hello/").unwrap();
    assert_eq!(url.join("world").unwrap().as_str(), "http://10.1.2.3/hello/world");
    assert_eq!(url.join("/world").unwrap().as_str(), "http://10.1.2.3/world");

    // Note: The trailing slash is important or the last segment will be considered a file name
    let url = url::Url::parse("http://10.1.2.3/hello/again").unwrap();
    assert_eq!(url.join("world").unwrap().as_str(), "http://10.1.2.3/hello/world");
    assert_eq!(url.join("/world").unwrap().as_str(), "http://10.1.2.3/world");

Expected behavior
The url used should be https://10.1.2.3/es/my-index/_search
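One way to get that behaviour is to append the API path rather than resolve it with Url::join; a sketch under that assumption:

```rust
// Sketch only: concatenates an API path onto a base URL without discarding the
// base's path segment, unlike Url::join with an absolute path.
fn join_preserving_base(base: &str, api_path: &str) -> String {
    format!(
        "{}/{}",
        base.trim_end_matches('/'),
        api_path.trim_start_matches('/')
    )
}
```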

Environment (please complete the following information):

  • OS: Probably doesn't matter. Tested on Windows and Linux
  • rustc version: rustc 1.42.0-nightly (859764425 2020-01-07)

[ENHANCEMENT] Allow str, bytes and Serialize request bodies

The current body associated fn on each builder struct expects a T that implements the Serialize trait. For example, for Search

pub fn body<T>(self, body: T) -> Search<'a, T>
    where
        T: Serialize,
{
    // ...
}

The body T is then passed to reqwest's json() fn to serialize as JSON.

Both String and Vec<u8>, and their reference/slice counterparts, have Serialize impls, but the resulting request body is unexpected and not that useful; for String, the body will be a literal string value enclosed in double quotes, whilst Vec<u8> will be a JSON array of numbers.

What is probably more useful is to allow a consumer to pass a JSON string literal or JSON bytes to the body method, and write these using reqwest's body() fn.
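One way to model this is a small body enum with From impls, sketched here (the names are assumptions; the Serialize case is omitted to keep the sketch dependency-free):

```rust
// Sketch only: a request body that can be a pre-serialized JSON string or raw
// JSON bytes; either way the bytes are handed to reqwest's body() fn as-is.
pub enum Body {
    Json(String),
    Bytes(Vec<u8>),
}

impl From<&str> for Body {
    fn from(s: &str) -> Self {
        Body::Json(s.to_string())
    }
}

impl From<Vec<u8>> for Body {
    fn from(bytes: Vec<u8>) -> Self {
        Body::Bytes(bytes)
    }
}

impl Body {
    /// The raw bytes to send as the request body.
    pub fn into_bytes(self) -> Vec<u8> {
        match self {
            Body::Json(s) => s.into_bytes(),
            Body::Bytes(b) => b,
        }
    }
}
```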

[ENHANCEMENT] Implement YAML test runner

In a similar vein to other official clients, a YAML test runner should be implemented, to run the YAML spec tests.

Possible implementation

  1. Create a tests directory in elasticsearch crate for integration tests
  2. Use the yaml-rust crate to read YML files.
  3. Use the quote crate and quote! macro to generate client calls to pass to the compiler based on calls defined within the YML test. The macro could generate a function that is called from another function containing the assertions, or quote! may be used to generate the entire test.

[ENHANCEMENT] Non-generic API functions when API accepts GET and POST

Some APIs allow sending a request either as a GET request or POST request. For those that allow sending as a GET request, a request body may not always need to be provided.

With the current implementation, an API that can send a POST request will model the inputs with a generic builder struct, where the generic type parameter is the body type. Take for example, search

#[doc = "http://www.elastic.co/guide/en/elasticsearch/reference/master/search-search.html"]
pub fn search<B>(&self) -> Search<B>
where
    B: Serialize,
{
    Search::new(self.clone())
}

With the current implementation, the generic type parameter B forms part of the search() function that returns the builder to build the request, meaning a user has to provide some value like () (unit), even if they may not be sending a request body, such as in the case of a GET request. This implementation can be changed to make search non-generic, and body on Search<B> generic

// 1. Make search() non-generic and return Search<()> by default

pub fn search(&self) -> Search<()> {
    Search::new(self.clone())
}

// 2. Make body() generic and return new Search<B> when body set

#[doc = "The body for the API call"]
pub fn body<B>(self, body: Option<B>) -> Search<B> {
    Search {
        client: self.client,
        _source: self._source,
        _source_excludes: self._source_excludes,
        //... assign all fields
        body,
    }
}

This can be generated from the REST API spec
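A trimmed-down sketch of this type-changing builder pattern (fields reduced to just the body for illustration):

```rust
// Sketch only: a minimal Search builder where `new` is non-generic and `body`
// changes the type parameter, per the proposal above.
pub struct Search<B> {
    pub body: Option<B>,
}

impl Search<()> {
    /// Non-generic constructor; callers sending a GET never name a body type.
    pub fn new() -> Self {
        Search { body: None }
    }
}

impl<B> Search<B> {
    /// Setting a body returns a Search parameterized on the body's type.
    pub fn body<T>(self, body: T) -> Search<T> {
        Search { body: Some(body) }
    }
}
```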

[DOCS] How to create a Mapping?

Please describe the documentation
I was browsing the documentation and was unable to find some documentation on how to create a mapping using the API. Maybe it would be nice to provide an example on the IndicesPutMapping or put_mapping functions?

Describe where the documentation should go
A documented example should be added that shows how to create a mapping on an index.

[ENHANCEMENT] Bulk API helper functions

The bulk API can be used to index multiple documents into Elasticsearch by constructing a bulk request containing multiple documents, and executing against Elasticsearch. When the number of documents is large however, a consumer needs to construct multiple bulk requests, each containing a slice of the documents to be indexed, and execute these against Elasticsearch.

Many of the existing Elasticsearch clients provide a "bulk helper" for this purpose. The helper can:

  1. be provided a large collection of documents: this could be a stream, lazy iterable, etc.
  2. slice the collection into "chunks": this could be by number of documents or by request byte size
  3. execute multiple concurrent requests against Elasticsearch to index documents
  4. optionally back off and retry indexing documents that fail to be indexed, signalled by a 429 Too Many Requests HTTP response.

An example helper is the BulkAllObservable from the C#/.NET client.

The Rust client should provide a similar, idiomatic way of helping consumers bulk index a large collection of documents.
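A sketch of the chunking step, here by request byte size (chunking by document count would simply be docs.chunks(n); a real helper would also account for the bulk action/metadata lines):

```rust
// Sketch only: slices a document collection into chunks whose combined size
// stays under `max_bytes`; oversized single documents still get their own chunk.
fn chunk_by_bytes(docs: &[String], max_bytes: usize) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut current_bytes = 0;
    for doc in docs {
        if !current.is_empty() && current_bytes + doc.len() > max_bytes {
            chunks.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += doc.len();
        current.push(doc.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```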

Map URL parts to builders

Each API endpoint maps to one or more URL paths, where each path may contain placeholders for URL parts that should be replaced with values supplied by the user. For example, _search has the following paths in the 7.3.1 REST API spec

/_search
/{index}/_search

The builder for the _search API, Search, should contain an index field that when set, results in the API call using the path specifying the index. It's envisaged that this will require

  1. An index field on the Search struct
  2. An index function on the Search implementation that sets the index field
  3. An implementation in the Sender trait's send() function that checks whether index has a value and, if it does, creates a path by replacing {index} in /{index}/_search with the index value, and uses this path for the API call.

This implementation can be entirely generated from the REST API spec.
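A sketch of the third step's path selection (the /{index}/_search template comes from the spec above; the helper itself is illustrative):

```rust
// Sketch only: picks the path with the placeholder replaced when `index` is
// set, and the bare path otherwise.
fn search_path(index: Option<&str>) -> String {
    match index {
        Some(index) => "/{index}/_search".replace("{index}", index),
        None => String::from("/_search"),
    }
}
```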
