
plyql's Introduction

[ DEPRECATED ] PlyQL

A SQL-like interface to plywood

⛔️ This project has been deprecated due to SQL support in Druid.

Installation

PlyQL is built on top of Node.js, so make sure you have Node >= 4.x.x installed.

npm install -g plyql

The global install will make the plyql command available system wide.
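
A quick way to confirm the install is to print the version, using the --version flag documented below:

plyql --version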

Usage

Currently only queries to Druid are supported. More support will come in the future.

The CLI supports the following options:

Option Description
--help print this help message
--version display the version number
-v, --verbose display the queries that are being made
-h, --host the host to connect to
-s, --source use this source for the query (supersedes FROM clause)
-i, --interval add (AND) a __time filter between NOW-INTERVAL and NOW
-tz, --timezone the default timezone
-q, --query the query to run
-o, --output specify the output format. Possible values: json (default), csv, tsv, flat
-a, --allow enable a behaviour that is turned off by default; possible values: eternity (allow queries not filtered on time), select (allow SELECT queries)
-t, --timeout the time before a query is timed out in ms (default: 60000)
-r, --retry the number of tries a query should be attempted on error, 0 = unlimited, (default: 2)
-c, --concurrent the limit of concurrent queries that could be made simultaneously, 0 = unlimited, (default: 2)
--rollup use rollup mode [COUNT() -> SUM(count)]
--druid-version assume this is the Druid version and do not query for it
--skip-cache disable Druid caching
--introspection-strategy Druid introspection strategy. Use --help for possible values
--force-time force a column to be interpreted as a time column
--force-string force a column to be interpreted as a string column
--force-boolean force a column to be interpreted as a boolean column
--force-number force a column to be interpreted as a number column
--force-unique force a column to be interpreted as hyperLogLog uniques
--force-theta force a column to be interpreted as a theta sketch
--force-histogram force a column to be interpreted as an approximate histogram (for quantiles)
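
For illustration, several of these options can be combined in a single invocation; a hypothetical example (host and datasource are placeholders):

plyql -h localhost:8082 -i P1D -o csv -a select -q "SELECT * FROM wikipedia LIMIT 10"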

For information on specific operators and functions supported by PlyQL please see: PlyQL language reference.

Examples

For an introduction and examples please see: PlyQL language reference.
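
For a quick taste of the syntax, here is a representative query of the kind that appears in the issues below (hypothetical broker host; wikipedia is a sample datasource):

plyql -h localhost:8082 -q "SELECT page, SUM(count) AS Edits FROM wikipedia WHERE '2015-09-12T00:00:00' <= __time AND __time < '2015-09-13T00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 5"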

Roadmap

Here is a list of features that are not currently supported but are in the works:

  • Query simulation - preview the queries that will be run without running them
  • Sub-queries in WHERE clauses
  • JOIN support
  • Window functions

Questions & Support

For updates about new and upcoming features follow @implydata on Twitter.

Please file bugs and feature requests by opening an issue on GitHub, and direct all questions to our user groups.

plyql's People

Contributors

brettdh, davidagee, evasomething, vogievetsky


plyql's Issues

Missing information_schema tables from MySQL gateway

Hi there,

I tried pointing Tableau at the MySQL gateway (harsh, I know) and unsurprisingly it failed. It did manage to handshake and identify the databases, but then it failed when trying to get the schema information. There are at least 2 problems and I'm logging them separately. The first is that it is looking for a key_column_usage table in the information_schema that does not exist. I've included the first connection where all appears well in case it has any information that is useful, but the error happens on connection 2.

New connection 1
[1] Got SQL: SET NAMES utf8
[1] Got SQL: SET character_set_results = NULL
[1] Got SQL: SET SQL_AUTO_IS_NULL = 0
[1] Got SQL: set @@sql_select_limit=DEFAULT
[1] Got SQL: SELECT CONNECTION_ID()
[1] Got SQL: SELECT table_name, column_name
FROM information_schema.columns
WHERE data_type='enum' AND table_schema=''
[1] Got SQL: DROP TABLE #Tableau_sid_00000001_1_Connect
[1] Got SQL: SELECT CONNECTION_ID()
[1] Got SQL: SELECT table_name, column_name
FROM information_schema.columns
WHERE data_type='enum' AND table_schema=''
[1] Got SQL: SHOW DATABASES LIKE '%'
New connection 2
[2] Got SQL: SET NAMES utf8
[2] Got SQL: SET character_set_results = NULL
[2] Got SQL: SET SQL_AUTO_IS_NULL = 0
[2] Got SQL: set @@sql_select_limit=DEFAULT
[2] Got SQL: SELECT CONNECTION_ID()
[2] Got SQL: SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'plyql1' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' )
[2] Got SQL: SELECT table_name, column_name
FROM information_schema.columns
WHERE data_type='enum' AND table_schema='plyql1'
[2] Got SQL: DROP TABLE #Tableau_sid_00000002_1_Connect
[2] Got SQL: SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'plyql1' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' )
[2] Got SQL: set @@sql_select_limit=1
[2] Got SQL: SELECT *
FROM client_logs
WHERE 1=0
[2] Got SQL: set @@sql_select_limit=DEFAULT
[2] Got SQL: SELECT COLUMN_NAME,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME,
ORDINAL_POSITION,
CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME IS NOT NULL
AND TABLE_NAME = 'client_logs'
AND TABLE_SCHEMA = 'plyql1'

Failed to resolve query with Plywood.
Query:
SELECT COLUMN_NAME,
 REFERENCED_TABLE_NAME,
 REFERENCED_COLUMN_NAME,
 ORDINAL_POSITION,
 CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME IS NOT NULL
 AND TABLE_NAME = 'client_logs'
 AND TABLE_SCHEMA = 'plyql1'
If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues
Message:
could not resolve $KEY_COLUMN_USAGE
Stack:
Error: could not resolve $KEY_COLUMN_USAGE
    at RefExpression._fillRefSubstitutions (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:6683:23)
    at ChainExpression._fillRefSubstitutions (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:7111:49)
    at ChainExpression.Expression.referenceCheckInTypeContext (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:6183:18)
    at ChainExpression.Expression.referenceCheck (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:6176:25)
    at ChainExpression.Expression._initialPrepare (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:6277:18)
    at /home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:6318:45
    at _fulfilled (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/q/q.js:834:54)
    at self.promiseDispatch.done (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/q/q.js:863:30)
    at Promise.promise.promiseDispatch (/home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/q/q.js:796:13)
    at /home/ben/imply-1.2.1/imply-1.2.1/dist/pivot/node_modules/q/q.js:604:44

Basic auth not working

Hey,

is this supposed to work?

$ plyql --allow eternity --host user:psswd@hostname:80 -q "SELECT COUNT(*) FROM gwiq"
There was an error getting the data: getaddrinfo ENOTFOUND gwi gwi:8080

Or is basic auth not implemented in plyql?

I would use curl to reach the plyql-server, but I have trouble putting a request together when it contains backticks and quotes:

curl -X POST 'http://user:psswd@hostname/plyql' \
  -H 'content-type: application/json' \
  -d '{
  "sql": "SELECT COUNT(DISTINCT id) as id FROM foo WHERE (cid = 'c0069' AND \`c-geo:c3\` = 'dom')"
}'
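
For what it's worth, the quoting issue can be worked around by closing and reopening the single-quoted shell string with the '\'' idiom; a sketch, using the same hypothetical endpoint and credentials:

curl -X POST 'http://user:psswd@hostname/plyql' \
  -H 'content-type: application/json' \
  -d '{
  "sql": "SELECT COUNT(DISTINCT id) AS id FROM foo WHERE (cid = '\''c0069'\'' AND `c-geo:c3` = '\''dom'\'')"
}'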

MySQL gateway not supporting the SHOW WARNINGS command

Hi,

SHOW WARNINGS doesn't seem to be supported in the new MySQL gateway. It would be great if this could be supported in a future release.

Got SQL: SHOW WARNINGS

Failed to resolve query with Plywood.
Query:
SHOW WARNINGS
If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues
Message:
SQL parse error: Expected "COLUMNS", "DATABASES", "FULL", "GLOBAL", "SCHEMAS", "SESSION", "TABLES" or "VARIABLES" but "W" found. on 'SHOW WARNINGS'
Stack:
Error: SQL parse error: Expected "COLUMNS", "DATABASES", "FULL", "GLOBAL", "SCHEMAS", "SESSION", "TABLES" or "VARIABLES" but "W" found. on 'SHOW WARNINGS'
    at Function.Expression.parseSQL (/home/rajitha/imply-1.2.1/dist/pivot/node_modules/plywood/build/plywood.js:5482:23)
    at /home/rajitha/imply-1.2.1/dist/pivot/node_modules/plyql/build/plyql-mysql-gateway.js:43:53
    at Promise.apply (/home/rajitha/imply-1.2.1/dist/pivot/node_modules/q/q.js:1165:26)
    at Promise.promise.promiseDispatch (/home/rajitha/imply-1.2.1/dist/pivot/node_modules/q/q.js:788:41)
    at /home/rajitha/imply-1.2.1/dist/pivot/node_modules/q/q.js:1391:14
    at runSingle (/home/rajitha/imply-1.2.1/dist/pivot/node_modules/q/q.js:137:13)
    at flush (/home/rajitha/imply-1.2.1/dist/pivot/node_modules/q/q.js:125:13)
    at _combinedTickCallback (internal/process/next_tick.js:67:7)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Mysql gateway - Alias not working for tablename

Hi @vogievetsky ,

Aliases for table names don't seem to work. Can this be fixed ASAP?

[5] Got SQL: select page , sum( count ) from wikiticker AS f where '2015-09-12T00:00:00' <= __time and __time < '2015-09-13T00:00:00' group by page limit 5

Failed to resolve query with Plywood.
Query:
select page , sum( count ) from wikiticker AS f  where '2015-09-12T00:00:00' <= __time  and  __time < '2015-09-13T00:00:00' group by page limit 5
If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues
Message:
SQL parse error: Expected ".", ";", "GROUP", "HAVING", "LIMIT", "ORDER", "WHERE" or end of input but "A" found. on 'select page , sum( count ) from wikiticker AS f  where '2015-09-12T00:00:00' <= __time  and  __time < '2015-09-13T00:00:00' group by page limit 5'
Stack:
Error: SQL parse error: Expected ".", ";", "GROUP", "HAVING", "LIMIT", "ORDER", "WHERE" or end of input but "A" found. on 'select page , sum( count ) from wikiticker AS f  where '2015-09-12T00:00:00' <= __time  and  __time < '2015-09-13T00:00:00' group by page limit 5'

Regards
Rajitha

mysql gateway does not see new incoming data to druid

A query from the MySQL client only returns data that was in Druid at the time the MySQL gateway started. Any data arriving after that point is not seen.

plyql -h hostname:80 -i P2Y --experimental-mysql-gateway 3307 --skip-cache

from mysql

select * from table order by __time desc limit 10;

It only returns a different result set when plyql is restarted.

Packet for query is too large (45 > 1) error from client

I have been trying to play with the mysql gateway functionality with mixed results. Connecting from mysql's CLI has worked well, but connecting from one of our BI tools (Metric Insights) is failing. The output from the mysql gateway is:

[5] Got SQL: /* mysql-connector-java-5.1.36 ( Revision: 4fc1f969f740409a4e03750316df2c0e429f3dc8 ) */SELECT @@session.auto_increment_increment, @@character_set_client, @@character_set_connection, @@character_set_results, @@character_set_server, @@init_connect, @@interactive_timeout, @@license, @@lower_case_table_names, @@max_allowed_packet, @@net_buffer_length, @@net_write_timeout, @@query_cache_size, @@query_cache_type, @@sql_mode, @@system_time_zone, @@time_zone, @@tx_isolation, @@wait_timeout

[5] Got SQL: SELECT @@session.autocommit

[5] Got SQL: SET NAMES utf8mb4

[5] Got SQL: SET character_set_results = NULL

[5] Got SQL: SET autocommit=1

[5] Error: Connection lost: The server closed the connection.

The error received on the client is:

Packet for query is too large (45 > 1). You can change this value on the server by setting the 'max_allowed_packet' variable.

And I believe it was trying to issue a "Select now();" query, but it did not show up on the plyql console.

Make the first status native druid query optional

Hey,

While trying to improve performance, I've noticed that plyql always issues a status request first:

Sending query 1:
{
  "queryType": "status"
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 1: (in 101ms)

I don't know why, but it takes a lot of time, even within the EC2 network. Considering we need to handle 40 regs/s, this overhead alone means we can hardly keep up unless we scale considerably, both horizontally and vertically.

I deployed plyql-server on the same node as the Broker, so I'm minimizing this time, but making this query optional would be essential for remote plyql clients.

MySQL gateway does not work with MySQLWorkbench

[2] Got SQL: set autocommit=1
[2] Got SQL: SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
[2] Got SQL: SHOW SESSION VARIABLES LIKE 'lower_case_table_names'
[2] Got SQL: SELECT current_user()

Failed to resolve query with Plywood.
Query:
SELECT current_user()
If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues
Message:
SQL parse error: Expected "*", "+", ",", "-", ".", "/", ";", "AND", "AS", "BETWEEN", "CONTAINS", "FROM", "GROUP", "HAVING", "IN", "IS", "LIKE", "LIMIT", "NOT", "OR", "ORDER", "REGEXP", "WHERE", Comparison or end of input but "(" found. on 'SELECT current_user()'
Stack:
Error: SQL parse error: Expected "*", "+", ",", "-", ".", "/", ";", "AND", "AS", "BETWEEN", "CONTAINS", "FROM", "GROUP", "HAVING", "IN", "IS", "LIKE", "LIMIT", "NOT", "OR", "ORDER", "REGEXP", "WHERE", Comparison or end of input but "(" found. on 'SELECT current_user()'
    at Function.Expression.parseSQL (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:5482:23)
    at /usr/local/lib/node_modules/plyql/build/plyql-mysql-gateway.js:43:53
    at Promise.apply (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:1165:26)
    at Promise.promise.promiseDispatch (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:788:41)
    at /usr/local/lib/node_modules/plyql/node_modules/q/q.js:1391:14
    at runSingle (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:137:13)
    at flush (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:125:13)
    at doNTCallback0 (node.js:407:9)
    at process._tickCallback (node.js:336:13)

where:

mysql> SELECT current_user();
+-----------------+
| current_user()  |
+-----------------+
| druid@localhost |
+-----------------+
1 row in set (0.00 sec)

Special character escaping

Hey,

I have some dimensions containing the : character, which is a special character, but I can't find a way to escape it, so I currently cannot SELECT them or GROUP BY them. There must be a way, since Pivot can handle it via plywood. Do you know how it can be done?

Concurrently run queries when nested selects are used

In the following query:

"SELECT interface_in, sum(bps) as 'bps_total', 
(SELECT TIME_BUCKET(__time, PT1M, 'Etc/UTC') as 'Time', 
sum(bps) AS 'Value' GROUP BY 1) AS TIME_RANGE 
FROM flows WHERE '2016-04-20T20:53:00.000Z'
 <= __time AND __time < '2016-04-20T21:51:48.750Z' 
GROUP BY interface_in ORDER BY bps_total desc limit 10"

Plyql first runs a top-N query with a limit of 10 to get the interface_in values with the highest bps; it then runs a single time-series query for each result returned by the first. I propose that when this case is detected, all queries be run concurrently and the results collected on a wait group.

Ultimately more concurrent connections to the broker will occur; however, in our case above, each query returns within a single second. Because of the lack of concurrency, 10 seconds is the minimum query time regardless of how large a time window we use.

Support for range queries (Tableau)

Tableau tries to be "helpful" and rewrites some WHERE predicates as range queries:

SELECT mydatasource.country_iso_code AS country_iso_code,
SUM(mydatasource.metric1) AS sum_metric1
FROM mydatasource
WHERE ((mydatasource.__time >= TIMESTAMP('2016-02-01 00:00:00')) AND (mydatasource.__time <= TIMESTAMP('2016-03-31 23:59:59')) AND (((mydatasource.country_iso_code >= 'AD') AND (mydatasource.country_iso_code <= 'EC')) OR ((mydatasource.country_iso_code >= 'EG') AND (mydatasource.country_iso_code <= 'ZW'))))
GROUP BY 1

Message:
not supported IN rhs type SET/STRING_RANGE
Stack:
Error: not supported IN rhs type SET/STRING_RANGE

Support for LOCATE() function?

Hello,

Thanks for PlyQL, it's a really cool tool to use with Druid :)

I noticed that when Tableau (9.3.1) is used with MySQL, it generates queries of the following form for filtering:
WHERE (LOCATE('xyz', column) > 0)
which are apparently not supported in PlyQL:
ERROR 1064 (_____): SQL parse error: Expected "*", "+", "-", ".", "/", "AND", "BETWEEN", "CONTAINS", "IN", "IS", "LIKE", "NOT", "OR", "REGEXP", ) or Comparison but "(" found.

It would be nice as a feature to support such queries as it would enable filtering with Tableau.

HAVING statement with upper AND lower bounds is treated as an OR

In the following query:

SELECT max(value) AS MAX, 
(SELECT TIME_BUCKET(__time, $interval, 'Etc/UTC') as 'Time', sum(value) AS 'Value' GROUP BY 1)
 AS TIME_RANGE FROM data WHERE '2016-04-25T15:21:00.000Z' 
<= __time AND __time < '2016-04-25T21:20:49.031Z' 
having MAX >= $LOW AND MAX <= $HIGH ORDER BY MAX DESC LIMIT 25

The AND is treated as an OR rather than an AND. If I take off either side of the bound I get the expected result, but with both in place I get all results.

unsupported part in timeBucket expression error when Order By included

I've run into an issue where an arbitrary duration in timeBucket will not work when ORDER BY is included.

SELECT sum(value), pop, description FROM datasource WHERE description CONTAINS 'stuff' AND pop REGEXP '(abc01)' AND '2016-01-29T18:00:00.000Z' <= __time AND __time < '2016-01-29T19:00:00.000Z' GROUP BY TIME_BUCKET(__time, PT10M, 'Etc/UTC'), pop, description returns results, as does any other arbitrary ISO8601 duration (PT3M, PT10H, etc.).

SELECT sum(value), pop, description FROM datasource WHERE description CONTAINS 'stuff' AND pop REGEXP '(abc01)' AND '2016-01-29T18:00:00.000Z' <= __time AND __time < '2016-01-29T19:00:00.000Z' GROUP BY TIME_BUCKET(__time, PT10M, 'Etc/UTC'), pop, description ORDER BY pop DESC does not work and returns There was an error getting the data: unsupported part in timeBucket expression PT10M

Durations listed here https://github.com/implydata/plywood/blob/79ac318e96611d760d284e9e0ef3e58a930dfa6e/src/external/druidExternal.ts#L40 work though.

Is there a reason durations are checked with ORDER BY and not other queries? I'd prefer to be able to use ORDER BY and send arbitrary durations.

Thanks!

long multidimension groupBy response time

Hello,

We are trying to execute the following query as a multidimensional groupBy, which seems to result in a long processing time on the PlyQL/Plywood end versus the Druid end for approximately 70,000 results from Druid. Druid returns in about 10 seconds, but PlyQL/Plywood seems to keep processing for another 20-30 seconds.

We see good speeds from single-dimension groupBys, but I was wondering if there might be an issue causing the results to take longer for this particular multidimension groupBy query.

Thanks!

PlyQL Query:
SELECT sum(value), pop, interface, hostname, description FROM datasource WHERE pop REGEXP 'abc01' AND description REGEXP '(a|b|c):(1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16):' AND measurement = 'input' AND '2016-03-12T20:16:09.956Z' <= __time AND __time < '2016-03-14T20:16:09.958Z' GROUP BY TIME_BUCKET(__time, PT1M, 'Etc/UTC'), pop, interface, hostname, description ORDER BY interface ASC

Parsed query as the following plywood expression (as JSON):
{
  "op": "chain",
  "expression": {
    "op": "ref",
    "name": "datasource"
  },
  "actions": [
    {
      "action": "filter",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "pop"
        },
        "actions": [
          {
            "action": "match",
            "regexp": "abc01"
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "description"
              },
              "action": {
                "action": "match",
                "regexp": "(a|b|c):(1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16):"
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "measurement"
              },
              "action": {
                "action": "is",
                "expression": {
                  "op": "literal",
                  "value": "input"
                }
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "literal",
                "value": "2016-03-12T20:16:09.956Z"
              },
              "action": {
                "action": "lessThanOrEqual",
                "expression": {
                  "op": "ref",
                  "name": "__time"
                }
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "__time"
              },
              "action": {
                "action": "lessThan",
                "expression": {
                  "op": "literal",
                  "value": "2016-03-14T20:16:09.958Z"
                }
              }
            }
          }
        ]
      }
    },
    {
      "action": "split",
      "splits": {
        "description": {
          "op": "ref",
          "name": "description"
        },
        "hostname": {
          "op": "ref",
          "name": "hostname"
        },
        "interface": {
          "op": "ref",
          "name": "interface"
        },
        "pop": {
          "op": "ref",
          "name": "pop"
        },
        "split0": {
          "op": "chain",
          "expression": {
            "op": "ref",
            "name": "__time"
          },
          "action": {
            "action": "timeBucket",
            "duration": "PT1M",
            "timezone": "Etc/UTC"
          }
        }
      },
      "dataName": "data"
    },
    {
      "action": "apply",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "data"
        },
        "action": {
          "action": "sum",
          "expression": {
            "op": "ref",
            "name": "value"
          }
        }
      },
      "name": "sum_value"
    },
    {
      "action": "sort",
      "expression": {
        "op": "ref",
        "name": "interface"
      },
      "direction": "ascending"
    }
  ]
}

vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 1:
{
  "queryType": "segmentMetadata",
  "dataSource": "datasource",
  "merge": true,
  "analysisTypes": [
    "aggregators"
  ],
  "lenientAggregatorMerge": true
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got error in query 1: Instantiation of [simple type, class io.druid.query.metadata.metadata.SegmentMetadataQuery] value failed: querySegmentSpec can't be null (in 1073ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 2:
{
  "queryType": "introspect",
  "dataSource": "datasource"
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 2: (in 18ms)
{
  "dimensions": [
    "description",
    "hostname",
    "pop",
    "interface",
    "measurement"
  ],
  "metrics": [
    "value"
  ]
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 3:
{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "intervals": [
    "2016-03-12T20:16:09.956/2016-03-14T20:16:09.958"
  ],
  "granularity": "all",
  "context": {
    "timeout": 60000
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "regex",
        "dimension": "pop",
        "pattern": "abc01"
      },
      {
        "type": "regex",
        "dimension": "description",
        "pattern": "(a|b|c):(1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16):"
      },
      {
        "type": "selector",
        "dimension": "measurement",
        "value": "input"
      }
    ]
  },
  "aggregations": [
    {
      "name": "sum_value",
      "type": "doubleSum",
      "fieldName": "value"
    }
  ],
  "dimensions": [
    {
      "type": "default",
      "dimension": "description",
      "outputName": "description"
    },
    {
      "type": "default",
      "dimension": "hostname",
      "outputName": "hostname"
    },
    {
      "type": "default",
      "dimension": "interface",
      "outputName": "interface"
    },
    {
      "type": "default",
      "dimension": "pop",
      "outputName": "pop"
    },
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "split0",
      "extractionFn": {
        "type": "timeFormat",
        "format": "yyyy-MM-dd'T'HH:mm'Z",
        "timeZone": "Etc/UTC",
        "locale": "en-US"
      }
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 500000,
    "columns": [
      {
        "dimension": "interface",
        "direction": "ascending"
      }
    ]
  }
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 3: (in 9390ms)
About 70,000 results

Got the result from PlyQL about 20-30 seconds later.

Mysql gateway - connection getting closed

Hi,

I am facing two issues with the MySQL gateway:

The following query was fired twice from the mysql console:

mysql> select count(*) from supply where "2016-06-23 01:00:00" <= __time AND __time < "2016-06-23 02:00:00" group by countryId limit 10;
Query OK, 0 rows affected, 25971 warnings (0.10 sec)

mysql> select count() from supply where "2016-06-23 01:00:00" <= __time AND __time < "2016-06-23 02:00:00" group by countryId limit 10;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 21
Current database: *** NONE ***

Query OK, 0 rows affected, 25971 warnings (0.40 sec)


Issue 1: Notice that the first time the query runs fine, but the second time it says the MySQL server has gone away.
Following are the log lines from Plyql's MySQL gateway; both times I get the same error logged:
Got SQL: select count(*) from supply where "2016-06-23 01:00:00" <= __time AND __time < "2016-06-23 02:00:00" group by countryId limit 10
Got SQL: SELECT DATABASE()
Error: Connection lost: The server closed the connection.

Issue 2: I tried to run the same query directly using PLYQL as follows:
rajitha.r:~$ plyql -h mybroker:8082 -q 'select count(*) from supply where "2016-06-23 01:00:00" <= __time AND __time < "2016-06-23 02:00:00" group by countryId limit 10'
And this runs absolutely fine and gives the correct output. This is a bit strange, as the MySQL gateway should have returned the same output, but it doesn't.

I am using the latest version of Plyql. Can someone please help me with this?

Plyql gateway shutting down due to memory issues

Hi,

I brought up the MySQL gateway pointing at a running Druid instance. After firing a few queries, the gateway crashes with the following error:

1: /* anonymous */(aka /* anonymous */) [/home/rajitha.r/local/node/lib/node_modules/plyql/build/plyql-mysql-gateway.js:~98] [pc=0x6d482f081f9] (this=0x1d9f1cd04381 <undefined>,row=0x25c40070c599 <an Object with map 0x23a11e18261>)
2: arguments adaptor frame: 3->1
3: map [native array.js:~994] [pc=0x6d482ed6e82] (this=0x75b9cf56fb9 <JS 

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory


I googled this error a bit; the usual advice is to increase the "--max-old-space-size" parameter in node. I am not very sure about setting this variable. Can someone please help me fix this issue?
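
For reference, a sketch of raising the heap limit by invoking node directly against the installed plyql script (the 4096 MB value is only an example):

node --max-old-space-size=4096 $(which plyql) -h broker:8082 --experimental-mysql-gateway 3307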

Thanks in advance,
Rajitha

MySQL gateway with Tableau (9.3) issue with TOPN queries

Hi

Firstly, awesome work on providing this mysql gateway query pattern, this is very exciting.

I am running into an issue when applying a Top N filter in Tableau.

Thanks

If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues
Message:
SQL parse error: Expected ".", ";", "GROUP", "HAVING", "LIMIT", "ORDER", "WHERE" or end of input but "I" found. on 'SELECT `DRUID`.`BRANCH` AS `BRANCH`
FROM `DRUID`
  INNER JOIN (
  SELECT `DRUID`.`BRANCH` AS `BRANCH`,
    COUNT(`DRUID`.`BRANCH`) AS `$__alias__0`
  FROM `DRUID`
  GROUP BY 1
  ORDER BY `$__alias__0` DESC
  LIMIT 10
) `t0` ON (`DRUID`.`BRANCH` <=> `t0`.`BRANCH`)
WHERE ((`DRUID`.`__time` >= TIMESTAMP('2016-04-24 00:00:00')) AND (`DRUID`.`__time` < TIMESTAMP('2016-05-06 00:00:00')))
GROUP BY 1'
Stack:
Error: SQL parse error: Expected ".", ";", "GROUP", "HAVING", "LIMIT", "ORDER", "WHERE" or end of input but "I" found. on 'SELECT `DRUID`.`BRANCH` AS `BRANCH`
FROM `DRUID`
  INNER JOIN (
  SELECT `DRUID`.`BRANCH` AS `BRANCH`,
    COUNT(`DRUID`.`BRANCH`) AS `$__alias__0`
  FROM `DRUID`
  GROUP BY 1
  ORDER BY `$__alias__0` DESC
  LIMIT 10
) `t0` ON (`DRUID`.`BRANCH` <=> `t0`.`BRANCH`)
WHERE ((`DRUID`.`__time` >= TIMESTAMP('2016-04-24 00:00:00')) AND (`DRUID`.`__time` < TIMESTAMP('2016-05-06 00:00:00')))
GROUP BY 1'
    at Function.Expression.parseSQL (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:5482:23)
    at /usr/local/lib/node_modules/plyql/build/plyql-mysql-gateway.js:43:53
    at Promise.apply (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:1165:26)
    at Promise.promise.promiseDispatch (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:788:41)
    at /usr/local/lib/node_modules/plyql/node_modules/q/q.js:1391:14
    at runSingle (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:137:13)
    at flush (/usr/local/lib/node_modules/plyql/node_modules/q/q.js:125:13)
    at nextTickCallbackWith0Args (node.js:420:9)
    at process._tickCallback (node.js:349:13)

Parse reduced-precision ISO8601 dates and times

ISO8601 allows reduced-precision dates and times, which Druid does understand and which are also sometimes useful for typing out a query with fewer keystrokes. It seems that plyql supports some of these, but not all of them. Some examples:

2016
2016-01
2016-01-01T00
2016-01-01T00:00

[these all resolve to the same instant: 2016-01-01T00:00:00.000Z]

Plyql : Query case sensitive

Hi @vogievetsky ,

I noticed that field and table names in a plyql query are case sensitive; if the case doesn't match, the query fails like this:
mysql> select page , sum( count ) from wikiticker where '2015-09-12T00:00:00' <= __time and __time < '2015-09-13T00:00:00' group by page limit 1;
+-----------------+--------------+
| page            | sum( count ) |
+-----------------+--------------+
| (1510) Charlois |            1 |
+-----------------+--------------+

mysql> select PAGE , sum( count ) from wikiticker where '2015-09-12T00:00:00' <= __time and __time < '2015-09-13T00:00:00' group by page limit 1;

ERROR 1337 (_____): something broke

As you can see, PAGE in upper case doesn't work, as the schema is lower case on Druid. Can this be looked into? Most SQL dialects are case insensitive for identifiers, which is not the case here.

MySQL gateway not allowing to SHOW FULL TABLES

Hi, one of the libraries I use tried to issue the following query:

SHOW FULL TABLES FROM plyql1 LIKE '%'.

This clearly is not working for two reasons:

  1. SHOW FULL TABLES is currently unsupported.
  2. Druid tables are not really in the plyql1 database (it is acting weird), which can be verified by the following:
mysql> use plyql1;
Database changed
mysql> select DATABASE();
+------------+
| DATABASE() |
+------------+
| plyql1     |
+------------+

mysql> show tables;
+--------------------------+
| Tables_in_database       |
+--------------------------+
| COLUMNS                  |
| SCHEMATA                 |
| TABLES                   |
| cc-causecode-dev         |
| new-gi-test              |
| nicer-gi-no-imsi-staging |
| nicer-gi-staging         |
| nicer-mme-staging        |
| wikipedia                |
+--------------------------+


mysql> describe plyql1;
Empty set (0.00 sec)

plyql not supporting FROM_UNIXTIME

Hi there!

I'm trying to do a WHERE __time BETWEEN FROM_UNIXTIME() AND FROM_UNIXTIME() query, but plyql is complaining.

Are you planning to add it? Is there a workaround to work with UNIX timestamps?

Thanks!

MySQL Gateway very slow

This may be user error, but I am finding the performance exceedingly slow (orders of magnitude slower than talking to Druid directly).

Running the following SQL:
select __time, sum(count) from business_events_test where event_type = 'walkin' group by __time

Took 1 minute 55 seconds to run.

Running:
select __time, count from business_events_test where event_type = 'walkin';

Never returned (at least not within the 30 minutes that I let it run).

Running the following Druid query took 3s (note I only have a week's worth of data so the interval doesn't make any difference):

{
  "dimensions": [],
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ],
  "filter": {
    "type": "selector",
    "dimension": "event_type",
    "value": "walkin"
  },
  "intervals": "2016-06-27T21:29:15+00:00/2016-07-04T21:29:15+00:00",
  "limitSpec": {
    "limit": 50000,
    "type": "default",
    "columns": [
      {
        "direction": "descending",
        "dimension": "count"
      }
    ]
  },
  "granularity": {
    "duration": 60000.0,
    "type": "duration"
  },
  "postAggregations": [],
  "queryType": "groupBy",
  "dataSource": "business_events_test"
}

Edit:
I discovered that writing the query a different way results in fast results:

select count() from business_events_test where event_type = 'walkin' group by TIME_BUCKET(__time, PT1M, "Etc/UTC")

Returns in under 1s. I don't understand how this is different from the queries above. The data was ingested with a query granularity of 1 minute, so the time bucket function shouldn't actually be doing anything.

The issue I have with this query is that when using TIME_BUCKET the date always appears as the first column in the results. I tried re-writing as:

select count(), TIME_BUCKET(__time, PT1M, "Etc/UTC") from business_events_test where event_type = 'walkin' group by 2

But the order of the columns does not change.

Query for single millisecond does not work

I tried this query: SELECT page, SUM(value_sum) AS Value FROM metrics WHERE '2016-01-08T20:53:37.000Z' <= __time AND __time <= '2016-01-08T20:53:37.000Z' GROUP BY page ORDER BY Value DESC LIMIT 5

Verbose mode shows it generates this query:

{
  "queryType": "topN",
  "dataSource": "metrics",
  "granularity": "all",
  "context": {
    "timeout": 60000
  },
  "aggregations": [
    {
      "name": "Value",
      "type": "doubleSum",
      "fieldName": "value_sum"
    }
  ],
  "dimension": {
    "type": "default",
    "dimension": "page",
    "outputName": "page"
  },
  "metric": "Value",
  "threshold": 5
}

Which is not right: there is no "intervals" set. It should be set to 2016-01-08T20:53:37.000Z/2016-01-08T20:53:37.001Z.
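
In Druid query JSON terms, the expected fragment would be something like:

  "intervals": [
    "2016-01-08T20:53:37.000Z/2016-01-08T20:53:37.001Z"
  ]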

PlyQL through JDBC doesn't return updated data

If I'm not doing anything incorrectly, here's what happens:

I'm ingesting 80 million rows to Druid through Kafka. In the meantime, I want to start querying them.

Let's just check how many rows are already indexed.

plyql -h IPADDRESS -q 'SELECT count(*) as cnt FROM content_log'

Result: (correct).

┌─────────┐
│ cnt     │
├─────────┤
│ 3706116 │
└─────────┘

Previously I started the PlyQL MySQL gateway.

plyql -h your.druid.broker:8082 -i P2Y --experimental-mysql-gateway 3307

And if I run the same query in a Java app with JDBC, the result is 999756, and it hasn't changed since the gateway started.

Are you familiar with this problem?

Write verbose output to stderr

This makes it possible to redirect the query results to a file while still seeing verbose output on the console, by doing:

plyql -h localhost:8082 -q 'select mystuff from mytable' -v > results
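
Once verbose output goes to stderr, the two streams could also be captured separately with standard shell redirection:

plyql -h localhost:8082 -q 'select mystuff from mytable' -v > results 2> verbose.log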

Joins query

Hi Team,

We need path funnels via Druid, so we are looking for an option to run join queries on Druid data. I see that JOIN support is still on the roadmap. Could you please let me know if it is available? Looking forward to hearing from you.
Looking forward to hear from you.

Thanks,
Lovenish

Cannot read property 'isOp' of undefined when REGEXP used

I encountered an error when using REGEXP in a query for certain fields. If I am reading the verbose output right it looks like Druid is returning results correctly based on the regular expression and the field does not look to be null.

For the query SELECT description FROM processor WHERE pop LIKE 'abc01' AND description REGEXP 'ISP' AND '2016-01-20T10:02:02.107Z' <= __time AND __time < '2016-01-23T16:02:02.107Z' GROUP BY description, pop

I get the error There was an error getting the data: Cannot read property 'isOp' of undefined

vagrant@ply:/opt/ply$ plyql -h DRUIDBROKER:8082 -q "SELECT description FROM datasource WHERE pop LIKE 'abc01' AND description REGEXP 'ISP' AND '2016-01-20T10:02:02.107Z' <= __time AND __time < '2016-01-23T16:02:02.107Z' GROUP BY description, pop" -a select -a eternity -v
Parsed query as the following plywood expression (as JSON):
{
  "op": "chain",
  "expression": {
    "op": "ref",
    "name": "datasource"
  },
  "actions": [
    {
      "action": "filter",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "pop"
        },
        "actions": [
          {
            "action": "match",
            "regexp": "^abc01$"
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "description"
              },
              "action": {
                "action": "match",
                "regexp": "ISP"
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "literal",
                "value": "2016-01-20T10:02:02.107Z"
              },
              "action": {
                "action": "lessThanOrEqual",
                "expression": {
                  "op": "ref",
                  "name": "__time"
                }
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "__time"
              },
              "action": {
                "action": "lessThan",
                "expression": {
                  "op": "literal",
                  "value": "2016-01-23T16:02:02.107Z"
                }
              }
            }
          }
        ]
      }
    },
    {
      "action": "split",
      "splits": {
        "description": {
          "op": "ref",
          "name": "description"
        },
        "split1": {
          "op": "ref",
          "name": "pop"
        }
      },
      "dataName": "data"
    }
  ]
}
---------------------------
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 1:
{
  "queryType": "introspect",
  "dataSource": "datasource"
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 1: (in 37ms)
{
  "dimensions": [
    "description",
    "pop"
  ],
  "metrics": [
    "value"
  ]
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 2:
{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "intervals": [
    "2016-01-20T10:02:02.107/2016-01-23T16:02:02.107"
  ],
  "granularity": "all",
  "context": {
    "timeout": 60000
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "regex",
        "dimension": "pop",
        "pattern": "^abc01$"
      },
      {
        "type": "regex",
        "dimension": "description",
        "pattern": "ISP"
      }
    ]
  },
  "aggregations": [
    {
      "name": "!DUMMY",
      "type": "count"
    }
  ],
  "dimensions": [
    {
      "type": "default",
      "dimension": "description",
      "outputName": "description"
    },
    {
      "type": "default",
      "dimension": "pop",
      "outputName": "split1"
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 500000,
    "columns": [
      "description"
    ]
  }
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 2: (in 122ms)
[
  {
    "version": "v1",
    "timestamp": "2016-01-20T10:02:02.107Z",
    "event": {
      "!DUMMY": 9279,
      "split1": "abc01",
      "description": "ISP:STUFF"
    }
  }
]
^^^^^^^^^^^^^^^^^^^^^^^^^^
There was an error getting the data: Cannot read property 'isOp' of undefined

When I use CONTAINS instead it works. REGEXP used with the field 'pop' also works.

vagrant@ply:/opt/ply$ plyql -h DRUIDBROKER:8082 -q "SELECT description FROM datasource WHERE pop LIKE 'abc01' AND description CONTAINS 'ISP' AND '2016-01-20T10:02:02.107Z' <= __time AND __time < '2016-01-23T16:02:02.107Z' GROUP BY description, pop" -a select -a eternity -v
Parsed query as the following plywood expression (as JSON):
{
  "op": "chain",
  "expression": {
    "op": "ref",
    "name": "datasource"
  },
  "actions": [
    {
      "action": "filter",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "pop"
        },
        "actions": [
          {
            "action": "match",
            "regexp": "^abc01$"
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "description"
              },
              "action": {
                "action": "contains",
                "expression": {
                  "op": "literal",
                  "value": "ISP"
                },
                "compare": "ignoreCase"
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "literal",
                "value": "2016-01-20T10:02:02.107Z"
              },
              "action": {
                "action": "lessThanOrEqual",
                "expression": {
                  "op": "ref",
                  "name": "__time"
                }
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "__time"
              },
              "action": {
                "action": "lessThan",
                "expression": {
                  "op": "literal",
                  "value": "2016-01-23T16:02:02.107Z"
                }
              }
            }
          }
        ]
      }
    },
    {
      "action": "split",
      "splits": {
        "description": {
          "op": "ref",
          "name": "description"
        },
        "split1": {
          "op": "ref",
          "name": "pop"
        }
      },
      "dataName": "data"
    }
  ]
}
---------------------------
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 1:
{
  "queryType": "introspect",
  "dataSource": "datasource"
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 1: (in 37ms)
{
  "dimensions": [
    "description",
    "pop"
  ],
  "metrics": [
    "value"
  ]
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 2:
{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "intervals": [
    "2016-01-20T10:02:02.107/2016-01-23T16:02:02.107"
  ],
  "granularity": "all",
  "context": {
    "timeout": 60000
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "regex",
        "dimension": "pop",
        "pattern": "^abc01$"
      },
      {
        "type": "search",
        "dimension": "description",
        "query": {
          "type": "fragment",
          "values": [
            "ISP"
          ]
        }
      }
    ]
  },
  "aggregations": [
    {
      "name": "!DUMMY",
      "type": "count"
    }
  ],
  "dimensions": [
    {
      "type": "default",
      "dimension": "description",
      "outputName": "description"
    },
    {
      "type": "default",
      "dimension": "pop",
      "outputName": "split1"
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 500000,
    "columns": [
      "description"
    ]
  }
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 2: (in 144ms)
[
  {
    "version": "v1",
    "timestamp": "2016-01-20T10:02:02.107Z",
    "event": {
      "!DUMMY": 9279,
      "split1": "abc01",
      "description": "ISP:STUFF"
    }
  }
]
^^^^^^^^^^^^^^^^^^^^^^^^^^
[
  {
    "split1": "abc01",
    "description": "ISP:STUFF"
  }
]

Thanks!

select timezone

hi,
when i use http query the result :

wikipedia-query.json
{
  "queryType" : "groupBy",
  "dataSource" : "wikipedia",
  "intervals": [ "2016-07-29T00:00:00.000/2016-07-30T00:00:00.000" ],
  "granularity" : {"type": "period", "period": "PT1H", "timeZone": "Asia/Shanghai"},
  "aggregations" : [
    {
      "type" : "longSum",
      "name" : "delta",
      "fieldName" : "delta"
    }
  ]
}

curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -d @wikipedia-query.json
[ {
  "version" : "v1",
  "timestamp" : "2016-07-29T15:00:00.000+08:00",
  "event" : {
    "delta" : -499
  }
}, {
  "version" : "v1",
  "timestamp" : "2016-07-29T16:00:00.000+08:00",
  "event" : {
    "delta" : 187
  }
} ]

and when I use a plyql query:
bin/plyql -h localhost -q "select sum(delta) from wikipedia group by TIME_BUCKET(__time, PT1H, 'Asia/Shanghai')"

the result is:

│ [2016-07-29T07:00:00.000Z,2016-07-29T08:00:00.000Z) │ -499 │
│ [2016-07-29T08:00:00.000Z,2016-07-29T09:00:00.000Z) │  187 │

The delta column is OK, but the timestamps are different.
Why? Is something wrong with the timezone?

MySQL Control Flow Functions unsupported

Hi there,

I tried pointing Tableau at the MySQL gateway (harsh, I know) and unsurprisingly it failed. It did manage to handshake and identify the databases, but then it failed when trying to get the schema information. There are at least 2 problems and I'm logging them separately. This is the second one: Tableau is using some funky SQL to try and inspect the database, and it is not parsing.

[2] Got SQL: SELECT A.REFERENCED_TABLE_SCHEMA AS PKTABLE_CAT,NULL AS PKTABLE_SCHEM,A.REFERENCED_TABLE_NAME AS PKTABLE_NAME,A.REFERENCED_COLUMN_NAME AS PKCOLUMN_NAME,A.TABLE_SCHEMA AS FKTABLE_CAT, NULL AS FKTABLE_SCHEM,A.TABLE_NAME AS FKTABLE_NAME,A.COLUMN_NAME AS FKCOLUMN_NAME,A.ORDINAL_POSITION AS KEY_SEQ,CASE WHEN R.UPDATE_RULE = 'CASCADE' THEN 0 WHEN R.UPDATE_RULE = 'SET NULL' THEN 2 WHEN R.UPDATE_RULE = 'SET DEFAULT' THEN 4 WHEN R.UPDATE_RULE = 'SET RESTRICT' THEN 1 WHEN R.UPDATE_RULE = 'SET NO ACTION' THEN 3 ELSE 3 END AS UPDATE_RULE,CASE WHEN R.DELETE_RULE = 'CASCADE' THEN 0 WHEN R.DELETE_RULE = 'SET NULL' THEN 2 WHEN R.DELETE_RULE = 'SET DEFAULT' THEN 4 WHEN R.DELETE_RULE = 'SET RESTRICT' THEN 1 WHEN R.DELETE_RULE = 'SET NO ACTION' THEN 3 ELSE 3 END AS DELETE_RULE,A.CONSTRAINT_NAME AS FK_NAME,'PRIMARY' AS PK_NAME,7 AS DEFERRABILITY FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE D ON (D.TABLE_SCHEMA=A.REFERENCED_TABLE_SCHEMA AND D.TABLE_NAME=A.REFERENCED_TABLE_NAME AND D.COLUMN_NAME=A.REFERENCED_COLUMN_NAME) JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS R ON (R.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND R.TABLE_NAME = A.TABLE_NAME AND R.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA) WHERE D.CONSTRAINT_NAME='PRIMARY' AND A.TABLE_SCHEMA = 'plyql1' AND A.TABLE_NAME = 'client_logs' ORDER BY FKTABLE_CAT, FKTABLE_NAME, KEY_SEQ, PKTABLE_NAME

Failed to resolve query with Plywood.
Query:
SELECT A.REFERENCED_TABLE_SCHEMA AS PKTABLE_CAT,NULL AS PKTABLE_SCHEM,A.REFERENCED_TABLE_NAME AS PKTABLE_NAME,A.REFERENCED_COLUMN_NAME AS PKCOLUMN_NAME,A.TABLE_SCHEMA AS FKTABLE_CAT, NULL AS FKTABLE_SCHEM,A.TABLE_NAME AS FKTABLE_NAME,A.COLUMN_NAME AS FKCOLUMN_NAME,A.ORDINAL_POSITION AS KEY_SEQ,CASE WHEN R.UPDATE_RULE = 'CASCADE' THEN 0 WHEN R.UPDATE_RULE = 'SET NULL' THEN 2 WHEN R.UPDATE_RULE = 'SET DEFAULT' THEN 4 WHEN R.UPDATE_RULE = 'SET RESTRICT' THEN 1 WHEN R.UPDATE_RULE = 'SET NO ACTION' THEN 3 ELSE 3 END AS UPDATE_RULE,CASE WHEN R.DELETE_RULE = 'CASCADE' THEN 0 WHEN R.DELETE_RULE = 'SET NULL' THEN 2 WHEN R.DELETE_RULE = 'SET DEFAULT' THEN 4 WHEN R.DELETE_RULE = 'SET RESTRICT' THEN 1 WHEN R.DELETE_RULE = 'SET NO ACTION' THEN 3 ELSE 3 END AS DELETE_RULE,A.CONSTRAINT_NAME AS FK_NAME,'PRIMARY' AS PK_NAME,7 AS DEFERRABILITY FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE D ON (D.TABLE_SCHEMA=A.REFERENCED_TABLE_SCHEMA AND D.TABLE_NAME=A.REFERENCED_TABLE_NAME AND D.COLUMN_NAME=A.REFERENCED_COLUMN_NAME) JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS R ON (R.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND R.TABLE_NAME = A.TABLE_NAME AND R.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA) WHERE D.CONSTRAINT_NAME='PRIMARY' AND A.TABLE_SCHEMA = 'plyql1' AND A.TABLE_NAME = 'client_logs' ORDER BY FKTABLE_CAT, FKTABLE_NAME, KEY_SEQ, PKTABLE_NAME
If you believe this query should work please create an issue on PlyQL and include this section
Issue url: https://github.com/implydata/plyql/issues

Mysql gateway closing connection issue

Hi,

Trying to access the MySQL gateway using the JDBC driver from Java gives an error saying the connection was closed at the end.

New connection 7
[7] Got SQL: SELECT @@session.autocommit
[7] Got SQL: SET NAMES utf8mb4
[7] Got SQL: SET character_set_results = NULL
[7] Got SQL: SET autocommit=1
[7] Got SQL: select page , sum( count ) as cnt from wikiticker where (( __time >= '2015-03-12T00:00:00' ) and ( __time < '2016-09-13T00:00:00' )) group by page order by cnt asc limit 5;
[7] Error: Connection lost: The server closed the connection.

The Java client code I used is as follows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

try {
  // Connect to the PlyQL MySQL gateway on port 3307
  Class.forName("com.mysql.jdbc.Driver");
  Connection con = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3307/plyql1");
  Statement stmt = con.createStatement();
  ResultSet rs = stmt.executeQuery(
    "SELECT page, SUM(count) AS Edits FROM wikiticker WHERE '2015-09-12T00:00:00' <= __time AND __time < '2015-09-13T00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 5;"
  );

  while (rs.next()) {
    String page = rs.getString("page");
    long count = rs.getLong("Edits");
    System.out.println(String.format("page[%s] count[%d]", page, count));
  }
} catch (SQLException | ClassNotFoundException s) {
  s.printStackTrace();
}

Invalid query crashes plyql server + possible caching issues

Hey,

An invalid query was issued to a plyql server from the Imply data distribution 1.2.1, started as:

plyql -c 2 -h broker:8082 -i P2Y --json-server 8099

which crashed it:

/usr/local/lib/node_modules/plyql/node_modules/q/q.js:155
                throw e;
                      ^
Error: can not serialize an approximate unique value
    at UniqueAttributeInfo.serialize (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:715:19)
    at DruidExternal.makeSelectorFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3779:38)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3833:37)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3823:37)
    at DruidExternal.makeNativeAggregateFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4534:30)
    at DruidExternal.applyToAggregation (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4686:36)
    at DruidExternal.getAggregationsAndPostAggregations (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4701:22)
    at DruidExternal.getQueryAndPostProcess (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4978:64)
    at DruidExternal.External.queryValue (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3170:48)
    at ExternalExpression._computeResolved (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:6788:29)

It was just a count query like this:

curl -X POST 'http://host/plyql' -H 'content-type: application/json' -d '{ "sql": "SELECT COUNT(bar) FROM baz" }'

bar is an HLL field.

Isn't this a really critical bug? The server can be crashed by such a common query.

Also, what is strange about it: I was investigating why plyql-server all of a sudden returned empty results that should not be empty, and right after the plyql-server docker container restarted, it was all good again.

Column order is wrong in the result set

Hi,
When using distinct, the columns order is wrong:

MySQL [(none)]> select week_date, sum(c1) as rch, count(distinct uuid) as rc from tab where country='CA' group by week_date;
+------------+-------------------+---------+
| week_date  | rc                | rch     |
+------------+-------------------+---------+
| 2016-05-16 | 586419.3622051235 | 1540934 |
| 2016-05-23 |  803478.913143905 | 2174018 |
| 2016-05-30 | 613292.7918142952 | 2048509 |
+------------+-------------------+---------+

MySQL [(none)]> select week_date, count(distinct uuid) as rc, sum(c1) as rch
from tab where country='CA' group by week_date;
+------------+-------------------+---------+
| week_date  | rc                | rch     |
+------------+-------------------+---------+
| 2016-05-16 | 586419.3622051235 | 1540934 |
| 2016-05-23 |  803478.913143905 | 2174018 |
| 2016-05-30 | 613292.7918142952 | 2048509 |
+------------+-------------------+---------+

I haven't tested retrieving the data by column name using JDBC, but my guess is that would work.
It's just confusing, when running on mysql, to see the column data in a different order than selected.

moment-timezone module not found

I installed plyql via:

npm install -g plyql

But when I try to run it I get an error indicating that the moment-timezone module could not be found. I had to install 0.8.15 to get around the problem.

querySegmentSpec can't be null error for segmentMetadata query

When running some queries I noticed an error that always seems to occur for the first query segmentMetadata is sent: Got error in query 1: Instantiation of [simple type, class io.druid.query.metadata.metadata.SegmentMetadataQuery] value failed: querySegmentSpec can't be null (in 1092ms).

This seems to happen for every query I have tried (groupBy, timeseries, etc.). The rest of the queries in the chain finish and I get output, but I was curious whether the failure of the first query is expected, or whether it could be causing issues with the PlyQL/Plywood processing.

Also, I was under the impression that PlyQL/Plywood would try to process groupBy queries more efficiently where possible by using topNs, but in my experience so far GROUP BY queries always stay as Druid groupBy queries. Could the failure of the segmentMetadata query be impacting that, or will GROUP BY PlyQL queries always be processed as Druid groupBys?

I am running Druid 0.8.1 if that helps.

Query:

SELECT sum(value), pop, interface, description FROM datasource WHERE description REGEXP 'abc' AND pop REGEXP '(abc01)' AND '2016-02-09T17:44:20.297Z' <= __time AND __time < '2016-02-09T23:04:20.297Z' GROUP BY TIME_BUCKET(__time, PT1M, 'Etc/UTC'), pop, interface, description ORDER BY pop DESC

Verbose output:

{
  "op": "chain",
  "expression": {
    "op": "ref",
    "name": "datasource"
  },
  "actions": [
    {
      "action": "filter",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "description"
        },
        "actions": [
          {
            "action": "match",
            "regexp": "abc"
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "pop"
              },
              "action": {
                "action": "match",
                "regexp": "(abc01)"
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "literal",
                "value": "2016-02-09T22:42:18.794Z"
              },
              "action": {
                "action": "lessThanOrEqual",
                "expression": {
                  "op": "ref",
                  "name": "__time"
                }
              }
            }
          },
          {
            "action": "and",
            "expression": {
              "op": "chain",
              "expression": {
                "op": "ref",
                "name": "__time"
              },
              "action": {
                "action": "lessThan",
                "expression": {
                  "op": "literal",
                  "value": "2016-02-09T22:47:18.794Z"
                }
              }
            }
          }
        ]
      }
    },
    {
      "action": "split",
      "splits": {
        "description": {
          "op": "ref",
          "name": "description"
        },
        "interface": {
          "op": "ref",
          "name": "interface"
        },
        "pop": {
          "op": "ref",
          "name": "pop"
        },
        "split0": {
          "op": "chain",
          "expression": {
            "op": "ref",
            "name": "__time"
          },
          "action": {
            "action": "timeBucket",
            "duration": "PT1M",
            "timezone": "Etc/UTC"
          }
        }
      },
      "dataName": "data"
    },
    {
      "action": "apply",
      "expression": {
        "op": "chain",
        "expression": {
          "op": "ref",
          "name": "data"
        },
        "action": {
          "action": "sum",
          "expression": {
            "op": "ref",
            "name": "value"
          }
        }
      },
      "name": "sum_value"
    },
    {
      "action": "sort",
      "expression": {
        "op": "ref",
        "name": "pop"
      },
      "direction": "descending"
    }
  ]
}
---------------------------
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 1:
{
  "queryType": "segmentMetadata",
  "dataSource": "datasource",
  "merge": true,
  "analysisTypes": [
    "aggregators"
  ],
  "lenientAggregatorMerge": true
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got error in query 1: Instantiation of [simple type, class io.druid.query.metadata.metadata.SegmentMetadataQuery] value failed: querySegmentSpec can't be null (in 1092ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 2:
{
  "queryType": "introspect",
  "dataSource": "datasource"
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 2: (in 13ms)
{
  "dimensions": [
    "description",
    "interface",
    "pop"
  ],
  "metrics": [
    "value"
  ]
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Sending query 3:
{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "intervals": [
    "2016-02-09T22:42:18.794/2016-02-09T22:47:18.794"
  ],
  "granularity": "all",
  "context": {
    "timeout": 60000
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "regex",
        "dimension": "description",
        "pattern": "abc"
      },
      {
        "type": "regex",
        "dimension": "pop",
        "pattern": "(abc01)"
      }
    ]
  },
  "aggregations": [
    {
      "name": "sum_value",
      "type": "doubleSum",
      "fieldName": "value"
    }
  ],
  "dimensions": [
    {
      "type": "default",
      "dimension": "description",
      "outputName": "description"
    },
    {
      "type": "default",
      "dimension": "interface",
      "outputName": "interface"
    },
    {
      "type": "default",
      "dimension": "pop",
      "outputName": "pop"
    },
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "split0",
      "extractionFn": {
        "type": "timeFormat",
        "format": "yyyy-MM-dd'T'HH:mm'Z",
        "timeZone": "Etc/UTC",
        "locale": "en-US"
      }
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 500000,
    "columns": [
      {
        "dimension": "pop",
        "direction": "descending"
      }
    ]
  }
}
^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvv
Got result from query 3: (in 52ms)
... RESULTS ...

Thanks for your time!

Production readiness of Mysql gateway

Hi ,

I have been working on integrating our unified reporting system to answer Druid queries using the MySQL gateway of Plyql. However, I would like to understand whether this is something I can deploy in production without issues w.r.t. scalability and other performance-related concerns that can come up.

Any input from users who are actively using the new gateway would be really helpful.

Thanks,
Rajitha

Mysql gateway - show tables not working

Hi,

The following command doesn't work on the MySQL gateway:

[92] Got SQL: SHOW FULL TABLES FROM plyql1 LIKE '%'

Failed to resolve query with Plywood.
Query:
SHOW FULL TABLES FROM `plyql1` LIKE '%'
Message:
SQL parse error: Expected "COLUMNS" but "T" found. on 'SHOW FULL TABLES FROM `plyql1` LIKE '%''
Stack:

Can this be supported in the new MySQL gateway of Plyql anytime soon? It would really be helpful.

Regards

Plyql lacks Bound Filters on string Dimensions

We have a use case where we want plyql to transform a greater-than AND less-than-or-equal-to operation into a bound filter against a string-type dimension in Druid:

Data Example:

srcAddr = aaaa, value = 1, timestamp = 2016-04-09T21:55:45.284Z
srcAddr = ffff, value = 2, timestamp = 2016-04-09T21:55:45.284Z

Query Example:

SELECT COUNT(*) FROM flowdata WHERE '2016-04-09T21:55:45.284Z' <= __time AND __time < '2016-04-19T21:55:45.284Z' AND srcAddr >= 'aaaa' AND srcAddr <= 'ffff'

Specifically we would like the above to turn into the following filter:

{
    "type": "bound",
    "dimension": "srcAddr",
    "lower": "aaaa",
    "upper": "ffff" 
}

Time-greater-than operator does not pull in enough milliseconds

A time filter like '2016-01-08T20:53:37.000Z' < __time AND __time < '2016-01-08T20:53:38.000Z' should resolve to intervals = 2016-01-08T20:53:37.001Z/2016-01-08T20:53:38.000Z but it actually resolves to 2016-01-08T20:53:38/2016-01-08T20:53:38.
