tarantool / graphql

Set of adapters for the GraphQL query language to the Tarantool data model.
License: Other
We should correctly handle a nullable index with partial and full keys; the exact cases are TBD.
Consider #43 for the proposed semantics.
Query:

{
    services(uid: 123) {
        uid
        p1
        p2
    }
}
Error:
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:188 E> [request_id: front-01-00000] Error catched: attempt to index field 'name' (a nil value)
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:190 E> [request_id: front-01-00000] Error occured at '...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:422'
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:192 E> [request_id: front-01-00000]
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:171 E> [request_id: front-01-00000] [Lua ] function 'assert_gql_query_ast' at <...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:422>
2018-02-22 14:56:04.333 [60448] main/453/main utils.lua:171 E> [request_id: front-01-00000] [Lua ] function 'compile' at <...ida/.rocks/share/tarantool/graphql/tarantool_graphql.lua:459>
Add a canonical data set to the test suite to evaluate the effectiveness of the optimizer based on the cost model (tarantool/graphql#22).
Having a canonical data set and a test set of GraphQL queries, we can cover the optimizer with functional tests.
Develop a cost model with which each query can be assigned a cost: an estimated one before execution and an actual one after it. The cost model should be used by the planner to assess alternative query plans, and by tests to assess the overall quality of the planner.
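As a sketch of what such a model could look like (all names here are hypothetical, not part of the tarantool/graphql API), an estimated cost can combine rows examined and rows transferred, with the same formula applied to measured counters after execution:

```python
# Hypothetical cost-model sketch; estimate_cost/actual_cost and the cost
# constants are illustrative, not the library's actual API.

def estimate_cost(rows_examined, rows_returned, row_cost=1.0, transfer_cost=0.1):
    """Estimated cost before execution: scan work plus transfer work."""
    return rows_examined * row_cost + rows_returned * transfer_cost

def actual_cost(stats):
    """Actual cost after execution, computed from measured counters."""
    return stats["tuples_fetched"] * 1.0 + stats["tuples_returned"] * 0.1

# The planner would pick the alternative with the lowest estimated cost:
plans = {
    "full_scan": estimate_cost(rows_examined=10_000, rows_returned=10),
    "index_lookup": estimate_cost(rows_examined=12, rows_returned=10),
}
best = min(plans, key=plans.get)
```

Tests would then compare the estimated cost against the actual one to judge the quality of the estimator itself.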
http://graphql.org/learn/schema/#union-types
Current connection format:

{
    name = 'connection_name_bar',
    destination_collection = 'collection_baz',
    type = '1:1',
    parts = {
        {
            source_field = 'field_name_source_1',
            destination_field = 'field_name_destination_1'
        },
        ...
    },
    index_name = 'index_name'
}
The proposed second connection format:

{
    name = 'connection_name_bar',
    type = '1:1',
    variants = {
        {
            filter = {foo = 1, bar = 'id_1'},
            destination_collection = 'collection_baz',
            parts = {
                {
                    source_field = 'field_name_source_1',
                    destination_field = 'field_name_destination_1'
                },
                ...
            },
            index_name = 'index_name'
        },
        ...
    }
}
We can move source_fields upward from variants, but I like the idea of maximum reusability of the current code (for now, at least). The format of 'filter' was chosen with the same idea in mind.
Change the 'from' parameter of accessor:select() to be a list of (filter, from) pairs, and expand collection_name to be a list of such collection names (in the corresponding order); match the parent with the filters from the 'from' argument one by one and choose the Nth collection_name from the collection_name list, then pass the found collection name and the corresponding 'from' variant to the unchanged select_internal.

Debatable: the avro_schema_changes.org document restricts the tag value type to number / string and utilizes type conversion, which does not seem good to me. Maybe we should specify the tag value as a value of some field, not as a key. I proposed a more powerful way that allows reusing the existing code of our library as much as possible.
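The variant-selection step described above could look roughly like this (a Python sketch; match_filter and choose_variant are hypothetical names, not the library's API):

```python
# Sketch of choosing a connection variant by matching the parent object
# against each variant's filter; names here are illustrative.

def match_filter(obj, filter_):
    """A filter matches when every filter key equals the object's field."""
    return all(obj.get(k) == v for k, v in filter_.items())

def choose_variant(parent, variants):
    """Return the first variant whose filter matches the parent object."""
    for variant in variants:
        if match_filter(parent, variant["filter"]):
            return variant
    return None  # no variant matches; the connection resolves to nothing

variants = [
    {"filter": {"foo": 1, "bar": "id_1"}, "destination_collection": "collection_baz"},
    {"filter": {"foo": 2}, "destination_collection": "collection_qux"},
]
parent = {"foo": 2, "other": "x"}
chosen = choose_variant(parent, variants)
```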
Extend the reduce step in tarantool/shard merge to use OpenMP.
Limit the result list length (the overall item count, or the item count for each list), or limit the result size in bytes.
It seems that graphql-lua supports directives already, so we need to write a test to check that everything works as expected. An example can be found here: http://graphql.org/learn/queries/#directives . I think the test can reuse the common_testdata dataset and conditionally include/skip a connection field. I think we need to check only a boolean variable as the expression of a directive and postpone more complex cases until they are requested explicitly.
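Following the linked documentation, such a test could run a query of roughly this shape (field names here are illustrative, not necessarily the ones in common_testdata), once with the variable set to true and once to false, checking that the connection field appears only in the first case:

```graphql
query ($withOrders: Boolean!) {
  user_collection {
    last_name
    order_connection @include(if: $withOrders) {
      order_id
    }
  }
}
```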
Allow exporting the execution plan in some plain-text format, to be able to cover optimizer decisions with functional tests.
offset field

Add a plan to push down the filter function to storage nodes, to avoid excessive data transfer to the execution node.
Schema:

"user": {
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "uid", "type": "long"},
        {"name": "p1", "type": "string"},
        {"name": "p2", "type": "string"},
        {
            "name": "nested",
            "type": {
                "type": "record",
                "name": "nested",
                "fields": [
                    {"name": "x", "type": "long"},
                    {"name": "y", "type": "long"}
                ]
            }
        }
    ]
}
Error:
Encountered multiple types named "nested"
Here "partial" means a prefix of the full list of index fields.
There are two approaches to handling nested objects that fail to be fetched by a 1:1 connection:
I am not sure which option should be implemented (or both?).
Use a GT iterator to implement offset. A nil start offset is used for the initial position (offset 0).
By a primary key?
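A Python sketch of the idea, assuming iteration by primary key as the question above suggests (the sorted list stands in for an index; select_after is a hypothetical name that roughly emulates a GT-iterator scan such as Tarantool's index:pairs(key, {iterator = 'GT'})):

```python
# Offset via a GT (strictly greater-than) iterator: instead of skipping N
# rows, continue after the last key seen on the previous page.

def select_after(rows, last_key, limit):
    """Return up to `limit` rows whose key is strictly greater than
    last_key; a None last_key means start from the beginning (offset 0).
    `rows` must be sorted by key, like an index scan."""
    if last_key is None:
        start = 0
    else:
        start = next(i for i, (k, _) in enumerate(rows) if k > last_key)
    return rows[start:start + limit]

rows = [(1, "a"), (2, "b"), (3, "c"), (5, "d")]
page1 = select_after(rows, None, 2)          # initial position, offset 0
page2 = select_after(rows, page1[-1][0], 2)  # continue after the last key
```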
Candidate filter operations:
like()
regexp()
<
or
and
To be able to perform cost-based query analysis, we need to maintain cluster-wide statistics about data distribution: the number of records, index fan-out (unique records vs. all records), and data set size per column/space.
Where to store these statistics is yet to be investigated (let's look at NewSQL vendors and see what they do).
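For illustration only, a per-index statistics record could have a shape like this (field names are made up, not an actual Tarantool structure):

```python
# Hypothetical per-index statistics record for cost-based analysis.
stats = {
    "collection": "user",
    "index": "uid_idx",
    "record_count": 1_000_000,       # all records in the space
    "unique_keys": 250_000,          # distinct index keys
    "data_size_bytes": 512 * 1024 * 1024,
}

def estimated_rows_per_key(s):
    """Fan-out: average number of records sharing one index key; the planner
    would use this to estimate the cardinality of an index lookup."""
    return s["record_count"] / s["unique_keys"]
```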
foo.bar: … or foo: {bar: …} syntax.

We can add indexes only by schema fields now.
Selects from spaces are sorted by the underlying index; for a secondary index the result is sorted by the secondary index, then by the primary index (after 1).
Selects from shard will use only the primary index for sorting, so items fetched by a partial index can have an order that differs from the order we would see using spaces.
With many GraphQL queries, we may overwhelm storage nodes with requests. We need a distributed planner which would allow utilizing many nodes, a kind of problem solved by Apache Mesos and its ecosystem. Each resource should be characterized with a cost vector, based on the cost model (#22).
See also: https://asterix.ics.uci.edu/pub/AsterixDBOverview.pdf
https://asterix.ics.uci.edu/pub/ICDE11_conf_full_690.pdf
Additional select filtering is needed, I guess.
Do not transfer the entire result set from storage nodes; it can be very big.
We have two versions of the Avro schema and four service fields:

0001:

"service": {
    "type": "record",
    "name": "service",
    "fields": [
        {"name": "uid", "type": "string"},
        {"name": "p1", "type": "long"},
        {"name": "p2", "type": "long"}
    ]
}

0002:

"service": {
    "type": "record",
    "name": "service",
    "fields": [
        {"name": "uid", "type": "string"},
        {"name": "p1", "type": "long"},
        {"name": "p2", "type": "long"},
        {"name": "p3", "type": "string", "default": "test avro default"}
    ]
}
If the data has been pushed into Tarantool, the tuple ['79031234566', '2451111545', '0002', 1519231048.4021, '79031234566', 2, 2, 'test avro default'] will be stored. So if I try to receive the data through GraphQL, I will receive an error in the unflatten function, because the first version of the schema does not have the 'p3' field.
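A sketch of the desired behaviour, mirroring standard Avro schema-resolution rules (this unflatten is a simplified stand-in for the real function; all names are illustrative): fields unknown to the reader schema are skipped, and fields missing from the tuple are filled from the schema default.

```python
# Simplified schema-resolving unflatten: read a tuple written under one
# schema version with another version's field list.

def unflatten(reader_fields, writer_fields, tuple_tail):
    """tuple_tail holds data fields only (service fields already stripped)."""
    written = dict(zip((f["name"] for f in writer_fields), tuple_tail))
    result = {}
    for field in reader_fields:
        if field["name"] in written:
            result[field["name"]] = written[field["name"]]
        elif "default" in field:
            result[field["name"]] = field["default"]  # fill from schema default
        else:
            raise ValueError("no value and no default for " + field["name"])
    return result

v1 = [{"name": "uid", "type": "string"},
      {"name": "p1", "type": "long"},
      {"name": "p2", "type": "long"}]
v2 = v1 + [{"name": "p3", "type": "string", "default": "test avro default"}]

# A tuple written under 0002, read back with the 0001 schema: 'p3' is
# ignored instead of raising an error.
obj = unflatten(v1, v2, ["79031234566", 2, 2, "test avro default"])
```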
Maybe it is worth implementing #13 using that. I am not sure.
The following cases can be handled with index lookup:
Allow interrupting the execution of a complex query when a timeout expires.
Change the built-in graphql execution algorithm (the one which works with "accessors") from a simple nested loop to a block-nested loop.
https://en.wikipedia.org/wiki/Block_nested_loop
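A minimal sketch of the difference (all names are hypothetical; select_by_keys stands in for a batched accessor select): instead of issuing one select per parent object, parents are grouped into blocks and one request is issued per block, cutting the number of round trips.

```python
# Block-nested loop join sketch: one batched request per block of parents
# instead of one request per parent.

def bnl_join(parents, select_by_keys, key_of, block_size=100):
    """select_by_keys(keys) stands in for one batched accessor select."""
    out = []
    for i in range(0, len(parents), block_size):
        block = parents[i:i + block_size]
        children = select_by_keys([key_of(p) for p in block])  # one request per block
        by_key = {}
        for c in children:
            by_key.setdefault(c["parent_id"], []).append(c)
        for p in block:
            out.append((p, by_key.get(key_of(p), [])))
    return out

calls = []
def fake_select(keys):
    """Fake storage call that records how many requests were made."""
    calls.append(keys)
    return [{"parent_id": k, "val": k * 10} for k in keys]

parents = [{"id": n} for n in range(5)]
joined = bnl_join(parents, fake_select, key_of=lambda p: p["id"], block_size=2)
```

With 5 parents and a block size of 2, a simple nested loop would issue 5 requests; the block-nested loop issues 3.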
The proposal was to consider:
I am not sure we want to reduce flexibility here, e.g. by using a unique secondary index as a unique one. On the other side, for space/shard accessors this affects only the result shape: getting an object, or a list of one item which is that object. We need to elaborate on how that really constrains us in possible accessor implementations.
The original decision to move the indexes description out of the graphql part was inspired by the feeling that it looks more like the accessor's part than graphql's. But there is a possible compromise: link a connection with an index in the data accessor, then fetch the connection type information in the graphql part from the accessor.