Giter Club home page Giter Club logo

postgres-elasticsearch-fdw's People

Contributors

aileo avatar j0k3r avatar matthewfranglen avatar mikulas avatar opbro avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

postgres-elasticsearch-fdw's Issues

More of a Discussion than an issue

We are modernizing Multicorn2 the support python 3.9+ and Postgres 12+. Perhaps this project could do the same? Dropping support for old Postgres versions and old python versions will make it sensible to continue moving this valuable project forward to support thru current versions of Postgres and ElasticSearch

[Feature proposal] Add sort option in the query to do delta export

Hi, again me :)

I think the main goal of this extension is to synchronize an ES index to a Postgres database.
The only option for now is to filter thanks to the URI search in ES.

If you have to synchronize regularly your ES index database, you need to only load new documents since the last import (if the ES documents are immutable such as logs). Personally, to do a delta import this way, I do that:

-- it loads every new documents since the last import.
-- it uses Query String Syntax. Do note that date filter is a ">" (as expected) and not ">=" due to the character "{" in the query string
--
-- @see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
-- @see https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
--
-- Do not try to modify it with CTE or put subquery from the WHERE clause in the FROM clause.
-- Otherwise, the optimizer tries to load all data from ES and then applies the WHERE clause on the timestamp on Postgres side.
insert into access_log
select
    *
from es_access_log
where query = (
    select
        CONCAT(
            'log_timestamp:{',
            to_char(MAX(log_timestamp), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
            ' TO *}'
        )
    from access_log
);

The problem is that you can have logs in the same second interval that are not loaded. Generally, when you want to do such export, you export with a deterministic order as for a seek pagination

Whith pseudo SQL code, the filter would be:

WHERE log_timestamp >=  max_log_timestamp and id > id_for_max_log_timestamp
ORDER BY id

Adding sort capability (possible in the URI search from ES) would allow to export accurately your data for such usecase.

Could it support the 'join field'?

I create two tables with one-many relationship, and I want to use a 'join field' to handle this. But I've got a RequestError(400, 'mapper_parsing_exception', 'failed to parse') issue when I insert a child data. Could this extension support the 'join field' feature?

Could it support fields starting with an underscore, "_"?

Hi,

Are fields starting with an underscore in ElasticSearch, like _index or _source, supported? Because they showed empty whe I do a select in the foreing table of PostgreSQL, and they have information in Elasticsearch.
In the foreign table definition, I tried with index, _index, and "_index" without any luck.

Thanks in advance!

Could it support fields with capital letters?

Hi. Thanks for your amazing work!

A question: are capital letter fields in ElasticSearch, like client_source_IP and client_source_Port, supported? Because they showed empty, and they have information in Elasticsearch.
Also, te correct type for an ip datatype of Elasticsearch is INET in PostgreSQL, right? And for the port, that in Elasticsearch is a long datatype, I'm using BIGINT in PostgreSQL, and is empty too. Is this correct?

Thanks!

OpenSearch

Hello,
did this fdw also works with OpenSearch?

Regrads
Lutz

Array type field select error

When the field is defined as an array type, the insert succeeds, but an error is thrown on query.

CREATE FOREIGN TABLE "public"."_es_spt_product" ( "id" varchar(20) COLLATE "pg_catalog"."default", "product_id" varchar(20) COLLATE "pg_catalog"."default", "product_name" varchar(100) COLLATE "pg_catalog"."default", "cate_ids" text[] ) SERVER "multicorn_es" OPTIONS ( "host" '127.0.0.1', "port" '9200', "index" 'spt_product', "rowid_column" 'id', "query_dsl" 'true', "default_sort" 'product_id', "refresh" 'false', "timeout" '20', "username" 'elastic', "password" 'elastic' );

INSERT INTO "_es_spt_product" VALUES ( '194805', '194805', 'TIMBUK2', '{7JIA2_CATE_04,7JIA2_CATE_0402,7JIA2_CATE_040203}')

"hits" : [ { "_index" : "spt_product", "_type" : "_doc", "_id" : "194805", "_score" : 1.0, "_source" : { "product_id" : "194805", "product_name" : "TIMBUK2", "cate_ids" : [ "7JIA2_CATE_04", "7JIA2_CATE_0402", "7JIA2_CATE_040203" ] } }

SELECT * FROM "_es_spt_product"

ERROR: malformed array literal: "["7JIA2_CATE_04", "7JIA2_CATE_0401", "7JIA2_CATE_040101"]"
DETAIL: "[" must introduce explicitly-specified array dimensions.

ERROR: COUNT for /index1 failed: Expecting value: line 1 column 1 (char 0)

Hello,

I was trying to perform the query from Postgres to elasticsearch using "query" as shown above. But it was throwing me an error saying...

postgres=# select * from articles_es where query = '_id:1';
ERROR:  COUNT for /index1 failed: Expecting value: line 1 column 1 (char 0)

Here are the details of how I created the foreign table (providing info on a high level):

CREATE FOREIGN TABLE articles_es
    (
        id TEXT,
        query TEXT,
        ....
    )
SERVER multicorn_es
OPTIONS
    (
        host 'elasticsearch',
        port '9200',
        index 'article-index',
        type 'article',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username 'elastic',
        password 'elastic'
    )
;

Any suggestions, please.

Doesn't work with Elasticsearch v6 without a tweak

For posterity, I was able to get this to "work" with PostgreSQL 10 and Elasticsearch 6 by pinning the version of the elasticsearch package dependency (in setup.py, and requirements.txt for consistency). Without pinning the version, elasticsearch v7 is installed, which only works with Elasticsearch 7+.

The Dockerfile I used:

FROM postgres:10-alpine

RUN apk add --no-cache \
        python2 \
        python2-dev \
        py2-pip

RUN apk add --no-cache build-base

RUN pip install pgxnclient
RUN pgxn install multicorn

RUN pip install pg_es_fdw

ENTRYPOINT ["docker-entrypoint.sh"]

EXPOSE 5432
CMD ["postgres"]

I haven't run the full tests yet, but simple insert, delete, and select are working for me.

The documentation is pretty good, but I think it could be tweaked to make it even more clear up-front that quite likely you will have to accommodate your particular version Elasticsearch in addition to PostgreSQL and the most recent Multicorn that goes with your version of PostgreSQL.

Along with this, it would be better if the repo was configured to work with a particular set of versions, and those versions were clearly spelled out.

BTW, the current version of Multicorn (1.3.5 from 2018) is compatible with PostgreSQL 10 but not PostgreSQL 11.

could it support chinese?

when i try to insert into chinese,i got zhe flowing error:

postgres@linux-dtq3:/home/fdw/postgres-elasticsearch-fdw-master> 2021-03-05 05:37:13.358 EST [1705] 错误:  INDEX for /test-index/doc/1 and document {'istrue': u't', 'testjson': {}, 'createdate': datetime.date(2021, 3, 5), 'time': u'11:27:00', 'flag': u'111', 'num': 666L, 'testbytea': '\xe6\xb5\x8b\xe8\xaf\x95', 'address': u'\u5317\u4eac\u5e02\u4e30\u53f0\u533a', 'price2': u'66.666', 'message': u'\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5', 'datetime': datetime.datetime(2021, 3, 5, 12, 22, 54), 'price': 11.01} failed: ({'istrue': u't', 'testjson': {}, 'createdate': datetime.date(2021, 3, 5), 'time': u'11:27:00', 'flag': u'111', 'num': 666L, 'testbytea': '\xe6\xb5\x8b\xe8\xaf\x95', 'address': u'\u5317\u4eac\u5e02\u4e30\u53f0\u533a', 'price2': u'66.666', 'message': u'\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5\u6d4b\u8bd5\u5b57\u6bb5', 'datetime': datetime.datetime(2021, 3, 5, 12, 22, 54), 'price': 11.01}, UnicodeDecodeError('ascii', '"\xe6\xb5\x8b\xe8\xaf\x95"', 1, 2, 'ordinal not in range(128)'))
2021-03-05 05:37:13.358 EST [1705] 语句:  insert into test_es(id,num,flag,istrue,createdate,price,price2,datetime,address,message,testbytea,testjson,time) 
	values(1,666,'111'::"bit",true,'2021-03-05',11.01,66.666,'2021-03-05 12:22:54','北京市丰台区','测试字段测试字段测试字段测试字段','测试'::bytea,'{}','11:27')

failed to parse field

[created_stamp] field is Timestamptz type, an error occurred when inserting data:
failed: RequestError(400, 'mapper_parsing_exception', "failed to parse field [created_stamp] of type [date] in document with id '1111111111'. Preview of field's value: '2018-05-28 19:00:15.359+08'")

Elasticsearch SSL authentication - errors

Hi,
Want to know whether "pg_es_fdw" supports SSL authentication.

I have an Elasticsearch cluster setup with SSL authentication. When I tried to insert data into elasticsearch after the foreign table creation step, I see below authentication error:

postgres=# INSERT INTO articles_es1
    (
        id,
        title,
        body,
        metadata
    )
VALUES
    (
        1,
        'foo',
        'spike',
        '{"score": 3}'::json
    );
ERROR:  INDEX for /article-index/1 and document {'title': 'foo', 'body': 'spike', 'metadata': {'score': 3}, 'query': None, 'score': None, 'sort': None} failed: ConnectionError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))) caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

Moreover, I tried with "SSL" option too by providing the pem file location as shown below, but no luck:

CREATE FOREIGN TABLE articles_es1
    (
        id BIGINT,
        title TEXT,
        body TEXT,
        metadata JSON,
        query TEXT,
        score NUMERIC,
        sort TEXT
    )
SERVER multicorn_es
OPTIONS
    (
        host '192.168.1.194',
        port '9200',
        index 'article-index',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        score_column 'score',
        default_sort 'last_updated:desc',
        sort_column 'sort',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username 'admin',
        password 'admin',
        ssl '/opt/root-ca.pem'
)
;

Thanks in advance !!

docker pg15 install issue

Hello,

It seems something has changed with the python3 or python3-pip packages, as meanwhile they implemented "externally managed" base environments.
https://peps.python.org/pep-0668/#specification
As a result, the docker image fails Dockerfile PG15 at line PYTHON_OVERRIDE=python3 make install, more specifically when it tries to run pip3 install .
I tried fixing it by myself, I also tried to install an earlier version of python3 but it seems they are not available anymore, and I am not very good at installing packages in linux.
I would greatly appreciate it if you could offer me a solution and maybe update the dockerfile.

Thank you,
Mihai

exemple (doc)

there is an issue with doc/example ... "body" and "content" that are mixed.
replace content by body and everything will be fine
ty

Authentication by headers

Hi,

Thank you for your very helpful project!

Currently, I’m using an authentication system passing identifiers using headers in the request to ElasticSearch.

Would it be possible to customize headers in the request made to ElasticSearch please ?

Thank you for your help,
Matthiosso

The select is too slowly

I put 100,000 rows data to elasticsearch,when I select 1 row by id through foreign table,it took 222 second, I wander is it fetch all data from elasticsearch to postgres and then filter the data?

Support request for pgsql-13.x

Hi!
As PostgreSQL 13 offers significant improvements in performance and usability, want to know whether any plan to support pgsql-13 version in the future.

I cant't connect with ElasticSearch from AWS

I'm trying to test this wrapper with AWS Elastic Search
My Elastic Object

 {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "User": "olivere",
                    "Message": "Take Five",
                    "Retweets": 0
                }
            }
My foreing table
    CREATE FOREIGN TABLE articles_es
    (
        "User" TEXT,
        Message TEXT,
        Retweets NUMERIC
    )
SERVER multicorn_es
OPTIONS
    (
        host 'https://vpc-my-elastic-jxd4u4p3jwj4sl00ydrardrt7i.us-east-1.es.amazonaws.com',
        port '80',
        index 'twitter',
        type '',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        score_column 'score',
        default_sort 'last_updated:desc',
        sort_column 'sort',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username '',
        password ''
    )
;

I got this error

SQL Error [XX000]: ERROR: COUNT for /twitter failed: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f5373493750>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f5373493750>: Failed to establish a new connection: [Errno -2] Name or service not known)

Problem with current version elasticsearch==7.16.0

With elasticsearch==7.16.0 I have this error:

ERROR:  SEARCH for /index-name failed: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [] in order to sort on')

With previous elasticsearch==7.14.2 all work correctly.

Cast array or object property as json Postgres type

Hello,

Is it possible to cast an array as json easily with the extension? Same thing for json objects.
As far I know and by reading the Python code of the extension, it casts it as a Python string.

For example, the ES document:

        {
            "browser_name": "Chrome",
            "user_os_name": "Mac OS X",
            "email_domains": ["example.com", "example2.com"],
            "geoip": {
                "ip": "203.168.53.187",
                "latitude": -37.734,
                "location": {
                    "lat": -37.734,
                    "lon": 144.7378
                }
            }
        }

The foreign table definition (JSON for geoip or domain_names does not work):

CREATE FOREIGN TABLE es_access_log
(
    id TEXT,
    query TEXT,
    geoip TEXT,
    browser_name TEXT,
    user_os_name TEXT
    domain_names TEXT
)
SERVER multicorn_es
OPTIONS
(
    host 'elasticsearch',
        port '9200',
        index 'logstash-nginx',
        rowid_column 'id',
        query_column 'query',
        score_column 'score',
        timeout '20',
        scroll_size '1000',
        duration '30m'
    )
;
select email_domains from my_foreign_table;

[u'example.com', u'example2.com']
select geoip from my_foreign_table;

"ip"=>"203.168.53.187", "latitude"=>-37.734, "location"=>"lat"=>-37.734, "lon"=>144.7378

What I would like is to declare it as json or jsonb in order to be cast automatically by the wrapper.
Did I miss something? Is it supported?

Passing a JSON as query

Good morning,
I would like to query ElasticSearch using a query body defined in Postgres.
I am not able to pass the query body as a JSON using the query parameter.

Architecture Framework

  • 1 elastic search index
  • 1 postgres table

Elastic Search index

I have one index containing all my data in ElasticSearch target-index
With the mapping:

"mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      },
      "city": {
        "type": "text"
      }
}
}

containing the data

id name city
1 Paul Ogier New York, NY
2 Joe Scott Boston, MA
3 Alice Sharp Boston, MA

Postgres Data

One table

id name city q_body
1 Paul New York {JSON query: see below
2 Joe Boston JSON Query
3 Alice Boston Json query

See the example of the JSON query for row 1: It's a multi-match query

{"query" :
  {"bool" : 
    {"should":
      [
        {"match": {"name": {"query": "Paul", "fuzziness": 1}}},
        {"match": {"city": {"query": "New York", "fuzziness": 1}}}
      ]
    }
   },
"explain" : true, "size" : 20}

Foreign Data Wrapper

CREATE FOREIGN TABLE es_fdw
    (
        id BIGINT,
        name varchar,
        query JSON,
        score NUMERIC
    )

Results. actual vs Expected:

Query using:

  • With es_fdw as a foreign data wrapper to the target-index
WITH mypgquery AS (SELECT pg_id, q_body FROM mypgtable )
SELECT a.pg_id, id as es_id, name, score
    FROM es_fdw, a
    WHERE query = mypgquery.q_body LIMIT 10;

Expected result

pg_id es_id name score
1 1 Paul Ogier 10
2 2 Json Scott 10
2 3 Alice Sharp 3
3 3 Alice Sharp 10
3 2 Json Scott 3

Actual result

ERROR: operator does not exist: json = json

I cannot pass a query parameter as json

Could it support the `RETURNING` ?

First, thank you for this awesome project ! Easy to set up and works very well.

I use RETURNING * almost every time I do an INSERT in postgres to retrieve the real inserted data (e.g: with default values).
When using it on a foreign table using this foreign data wrapper, I get an empty row.
Is it the expected behavior? If not, even if I am very new to FDW and python, I could try to fix it.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.