tarantool / graphql

Set of adapters for the GraphQL query language to the Tarantool data model.
License: Other
We should correctly handle a nullable index with partial and full keys; the exact cases are TBD.
Consider #43 for the proposed semantics.
Query:

{
    services(uid: 123) {
        uid
        p1
        p2
    }
}
Error:
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:188 E> [request_id: front-01-00000] Error catched: attempt to index field 'name' (a nil value)
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:190 E> [request_id: front-01-00000] Error occured at '...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:422'
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:192 E> [request_id: front-01-00000]
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:171 E> [request_id: front-01-00000] [Lua ] function 'assert_gql_query_ast' at <...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:422>
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:171 E> [request_id: front-01-00000] [Lua ] function 'compile' at <...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:459>
Add a canonical data set to the test suite to evaluate the effectiveness of the optimizer based on the cost model (tarantool/graphql#22).
Having a canonical data set and a test set of GraphQL queries, we can cover the optimizer with functional tests.
Develop a cost model with which each query can be assigned a cost: an estimated one before execution and an actual one after it. The cost model should be used by the planner to assess alternative query plans, and by tests to assess the overall quality of the planner.
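As a sketch of what such a model could look like (all names here are hypothetical, not part of the tarantool/graphql API), an estimated cost can combine rows examined and rows transferred, with the same formula applied to measured counters after execution:

```python
# Hypothetical cost-model sketch; estimate_cost/actual_cost and the cost
# constants are illustrative, not the library's actual API.

def estimate_cost(rows_examined, rows_returned, row_cost=1.0, transfer_cost=0.1):
    """Estimated cost before execution: scan work plus transfer work."""
    return rows_examined * row_cost + rows_returned * transfer_cost

def actual_cost(stats):
    """Actual cost after execution, computed from measured counters."""
    return stats["tuples_fetched"] * 1.0 + stats["tuples_returned"] * 0.1

# The planner would pick the alternative with the lowest estimated cost:
plans = {
    "full_scan": estimate_cost(rows_examined=10_000, rows_returned=10),
    "index_lookup": estimate_cost(rows_examined=12, rows_returned=10),
}
best = min(plans, key=plans.get)
```

Tests would then compare the estimated cost against the actual one to judge the quality of the estimator itself.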
http://graphql.org/learn/schema/#union-types
Current connection format:

{
    name = 'connection_name_bar',
    destination_collection = 'collection_baz',
    type = '1:1',
    parts = {
        {
            source_field = 'field_name_source_1',
            destination_field = 'field_name_destination_1'
        },
        ...
    },
    index_name = 'index_name'
}
The proposed second connection format:

{
    name = 'connection_name_bar',
    type = '1:1',
    variants = {
        {
            filter = {foo = 1, bar = 'id_1'},
            destination_collection = 'collection_baz',
            parts = {
                {
                    source_field = 'field_name_source_1',
                    destination_field = 'field_name_destination_1'
                },
                ...
            },
            index_name = 'index_name'
        },
        ...
    }
}
We can move source_fields upward from variants, but I like the idea of maximum reusability of the current code (for now, at least). The format of 'filter' was chosen with the same idea in mind.
Change the 'from' parameter of accessor:select() to be a list of (filter, from) pairs, and expand collection_name to be a list of such collection names (in the corresponding order); match the parent with the filters from the 'from' argument one by one and choose the Nth collection_name from the collection_name list, then pass the found collection name and the corresponding 'from' variant to the unchanged select_internal.

Debatable: the avro_schema_changes.org document restricts the tag value type to number / string and utilizes type conversion, which does not seem good to me. Maybe we should specify the tag value as a value of some field, not as a key. I proposed a more powerful way that allows reusing the existing code of our library as much as possible.
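The variant-selection step described above could look roughly like this (a Python sketch; match_filter and choose_variant are hypothetical names, not the library's API):

```python
# Sketch of choosing a connection variant by matching the parent object
# against each variant's filter; names here are illustrative.

def match_filter(obj, filter_):
    """A filter matches when every filter key equals the object's field."""
    return all(obj.get(k) == v for k, v in filter_.items())

def choose_variant(parent, variants):
    """Return the first variant whose filter matches the parent object."""
    for variant in variants:
        if match_filter(parent, variant["filter"]):
            return variant
    return None  # no variant matches; the connection resolves to nothing

variants = [
    {"filter": {"foo": 1, "bar": "id_1"}, "destination_collection": "collection_baz"},
    {"filter": {"foo": 2}, "destination_collection": "collection_qux"},
]
parent = {"foo": 2, "other": "x"}
chosen = choose_variant(parent, variants)
```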
Extend the reduce step in tarantool/shard merge to use OpenMP.
Limit the result list length (the overall item count, or the item count for each list), or limit the result size in bytes.
It seems that graphql-lua supports directives already, so we need to write a test to check that everything works as expected. An example can be found here: http://graphql.org/learn/queries/#directives . I think the test can reuse the common_testdata dataset and conditionally include/skip a connection field. I think we need to check only a boolean variable as the expression of a directive and postpone more complex cases until they are requested explicitly.
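Following the linked documentation, such a test could run a query of roughly this shape (field names here are illustrative, not necessarily the ones in common_testdata), once with the variable set to true and once to false, checking that the connection field appears only in the first case:

```graphql
query ($withOrders: Boolean!) {
  user_collection {
    last_name
    order_connection @include(if: $withOrders) {
      order_id
    }
  }
}
```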
Allow exporting the execution plan in some plain-text format, to be able to cover optimizer decisions with functional tests.
offset field

Add a plan to push down the filter function to storage nodes, to avoid excessive data transfer to the execution node.
Schema:

"user": {
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "uid", "type": "long"},
        {"name": "p1", "type": "string"},
        {"name": "p2", "type": "string"},
        {
            "name": "nested",
            "type": {
                "type": "record",
                "name": "nested",
                "fields": [
                    {"name": "x", "type": "long"},
                    {"name": "y", "type": "long"}
                ]
            }
        }
    ]
}
Error:
Encountered multiple types named "nested"
Here "partial" means a prefix of the full list of index fields.
There are two approaches to handling nested objects that fail to be fetched by a 1:1 connection:
I am not sure which option should be implemented (or both?).
Use a GT iterator to implement offset. A nil start offset is used for the initial position (offset 0).
By a primary key?
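A Python sketch of the idea, assuming iteration by primary key as the question above suggests (the sorted list stands in for an index; select_after is a hypothetical name that roughly emulates a GT-iterator scan such as Tarantool's index:pairs(key, {iterator = 'GT'})):

```python
# Offset via a GT (strictly greater-than) iterator: instead of skipping N
# rows, continue after the last key seen on the previous page.

def select_after(rows, last_key, limit):
    """Return up to `limit` rows whose key is strictly greater than
    last_key; a None last_key means start from the beginning (offset 0).
    `rows` must be sorted by key, like an index scan."""
    if last_key is None:
        start = 0
    else:
        start = next(i for i, (k, _) in enumerate(rows) if k > last_key)
    return rows[start:start + limit]

rows = [(1, "a"), (2, "b"), (3, "c"), (5, "d")]
page1 = select_after(rows, None, 2)          # initial position, offset 0
page2 = select_after(rows, page1[-1][0], 2)  # continue after the last key
```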
Candidate filter operations:
like()
regexp()
<
or
and
To be able to perform cost-based query analysis, we need to maintain cluster-wide statistics about data distribution: the number of records, index fan-out (unique records vs. all records), and data set size per column/space.
Where to store these statistics is yet to be investigated (let's look at NewSQL vendors and see what they do).
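For illustration only, a per-index statistics record could have a shape like this (field names are made up, not an actual Tarantool structure):

```python
# Hypothetical per-index statistics record for cost-based analysis.
stats = {
    "collection": "user",
    "index": "uid_idx",
    "record_count": 1_000_000,       # all records in the space
    "unique_keys": 250_000,          # distinct index keys
    "data_size_bytes": 512 * 1024 * 1024,
}

def estimated_rows_per_key(s):
    """Fan-out: average number of records sharing one index key; the planner
    would use this to estimate the cardinality of an index lookup."""
    return s["record_count"] / s["unique_keys"]
```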
foo.bar: … or foo: {bar: …} syntax.

We can add indexes only by schema fields now.
Selects from spaces are sorted by the underlying index; for a secondary index the result is sorted by the secondary index, then by the primary index (after 1).
Selects from shard will use only the primary index for sorting, so items fetched by a partial index can have an order that differs from the order we would see using spaces.
With many GraphQL queries, we may overwhelm storage nodes with requests. We need a distributed planner which would allow utilizing many nodes, a kind of problem solved by Apache Mesos and its ecosystem. Each resource should be characterized with a cost vector, based on the cost model (#22).
See also: https://asterix.ics.uci.edu/pub/AsterixDBOverview.pdf
https://asterix.ics.uci.edu/pub/ICDE11_conf_full_690.pdf
Additional select filtering is needed, I guess.
Do not transfer the entire result set from storage nodes; it can be very big.
We have two versions of the Avro schema and four service fields:

0001:

"service": {
    "type": "record",
    "name": "service",
    "fields": [
        {"name": "uid", "type": "string"},
        {"name": "p1", "type": "long"},
        {"name": "p2", "type": "long"}
    ]
}

0002:

"service": {
    "type": "record",
    "name": "service",
    "fields": [
        {"name": "uid", "type": "string"},
        {"name": "p1", "type": "long"},
        {"name": "p2", "type": "long"},
        {"name": "p3", "type": "string", "default": "test avro default"}
    ]
}
If the data has been pushed into Tarantool, the tuple ['79031234566', '2451111545', '0002', 1519231048.4021, '79031234566', 2, 2, 'test avro default'] will be stored. So if I try to receive the data through GraphQL, I will receive an error in the unflatten function, because the first version of the schema does not have the 'p3' field.
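A sketch of the desired behaviour, mirroring standard Avro schema-resolution rules (this unflatten is a simplified stand-in for the real function; all names are illustrative): fields unknown to the reader schema are skipped, and fields missing from the tuple are filled from the schema default.

```python
# Simplified schema-resolving unflatten: read a tuple written under one
# schema version with another version's field list.

def unflatten(reader_fields, writer_fields, tuple_tail):
    """tuple_tail holds data fields only (service fields already stripped)."""
    written = dict(zip((f["name"] for f in writer_fields), tuple_tail))
    result = {}
    for field in reader_fields:
        if field["name"] in written:
            result[field["name"]] = written[field["name"]]
        elif "default" in field:
            result[field["name"]] = field["default"]  # fill from schema default
        else:
            raise ValueError("no value and no default for " + field["name"])
    return result

v1 = [{"name": "uid", "type": "string"},
      {"name": "p1", "type": "long"},
      {"name": "p2", "type": "long"}]
v2 = v1 + [{"name": "p3", "type": "string", "default": "test avro default"}]

# A tuple written under 0002, read back with the 0001 schema: 'p3' is
# ignored instead of raising an error.
obj = unflatten(v1, v2, ["79031234566", 2, 2, "test avro default"])
```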
Maybe it is worth implementing #13 using that. I am not sure.
The following cases can be handled with index lookup:
Allow interrupting the execution of a complex query when a timeout expires.
Change the built-in graphql execution algorithm (the one which works with "accessors") from a simple nested loop to a block-nested loop.
https://en.wikipedia.org/wiki/Block_nested_loop
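A minimal sketch of the difference (all names are hypothetical; select_by_keys stands in for a batched accessor select): instead of issuing one select per parent object, parents are grouped into blocks and one request is issued per block, cutting the number of round trips.

```python
# Block-nested loop join sketch: one batched request per block of parents
# instead of one request per parent.

def bnl_join(parents, select_by_keys, key_of, block_size=100):
    """select_by_keys(keys) stands in for one batched accessor select."""
    out = []
    for i in range(0, len(parents), block_size):
        block = parents[i:i + block_size]
        children = select_by_keys([key_of(p) for p in block])  # one request per block
        by_key = {}
        for c in children:
            by_key.setdefault(c["parent_id"], []).append(c)
        for p in block:
            out.append((p, by_key.get(key_of(p), [])))
    return out

calls = []
def fake_select(keys):
    """Fake storage call that records how many requests were made."""
    calls.append(keys)
    return [{"parent_id": k, "val": k * 10} for k in keys]

parents = [{"id": n} for n in range(5)]
joined = bnl_join(parents, fake_select, key_of=lambda p: p["id"], block_size=2)
```

With 5 parents and a block size of 2, a simple nested loop would issue 5 requests; the block-nested loop issues 3.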
The proposal was to consider:
I am not sure we want to reduce flexibility here, e.g. by using a unique secondary index as a unique one. On the other side, for space/shard accessors this affects only the result shape: getting an object, or a list of one item which is that object. We need to elaborate on how that really constrains us in possible accessor implementations.
The original decision to move the indexes description out of the graphql part was inspired by the feeling that it looks more like the accessor's part than graphql's. But there is a possible compromise: link a connection with an index in the data accessor, then fetch the connection type information in the graphql part from the accessor.