coral's Issues

Discussion channel

Hello,
Sorry for leaving a message here, but I couldn't find a place to ask about good usage (or examples) of this project.

Is there an open discussion channel (Gitter, Discord, ...)?
There is a Slack link, but it does not seem to be open.

Thanks.

(Release 1.0.25) Hive->Presto query translation fails on struct field access

Translation of struct<> field references to Presto fails with 1.0.25 (released 4 days ago).

Sample:

    // Imports needed by this snippet:
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.calcite.rel.RelNode;

    import com.linkedin.coral.hive.hive2rel.HiveToRelConverter;
    import com.linkedin.coral.presto.rel2presto.RelToPrestoConverter;

    public static void main(String[] args) throws Exception {
        // Local metastore stub: database "foo" containing table "bar" with a single struct column
        Map<String, Map<String, List<String>>> localMetaStore = new HashMap<>();

        Map<String, List<String>> tables = new HashMap<>();
        tables.put("bar", Arrays.asList("col|struct<i:int,s:string>"));

        localMetaStore.put("foo", tables);

        HiveToRelConverter converter = HiveToRelConverter.create(localMetaStore);
        RelToPrestoConverter presto = new RelToPrestoConverter();

        // Selecting the whole struct column works
        RelNode rel = converter.convertSql("select col from foo.bar");
        System.out.println(presto.convert(rel));

        // Referencing field 'i' in struct 'col' fails
        rel = converter.convertSql("select col.i from foo.bar");
        System.out.println(presto.convert(rel));
    }

Stacktrace:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SELECT "col"
FROM "foo"."bar"
Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: Table 'col' not found
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
        at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:834)
        at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:819)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:4867)
        at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:370)
        at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5759)
        at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5744)
        at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:5351)
        at com.linkedin.coral.hive.hive2rel.HiveSqlValidator.expand(HiveSqlValidator.java:61)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:447)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList(SqlValidatorImpl.java:4104)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3392)
        at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
        at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1005)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:965)
        at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:940)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:647)
        at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:57)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toRel(HiveToRelConverter.java:126)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.convertSql(HiveToRelConverter.java:81)
        at com.vmware.cb.product.metrics.Main.main(Main.java:51)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Table 'col' not found
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
        at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
        ... 23 more

Support Presto as the input SQL

We had a few offline conversations about the n-to-one-to-n proposal, which is to support translating any engine's query into any other engine's query.

This issue is the feature request to introduce Presto SQL as the input query, translate it into the intermediate Coral representation, and from there translate it to the other engines already supported as Coral output.

There are a few stages for this feature:

  1. Parse Trino SQL to Calcite SqlNode
  2. Calcite SqlNode to Trino RelNode (most work is about the function resolver)
  3. Trino RelNode to intermediate Coral RelNode
  4. End-to-end test

Coral prepends hardcoded catalog name while translating a hive view to Presto

Coral hardcodes "hive" as the default catalog name while translating views to Presto.

i.e. a view HQL SELECT a FROM test_schema.myview translates to SELECT a FROM hive.test_schema.myview in Presto.

This is problematic for users who don't have "hive" as their default catalog name.

Presto users have reported this regression since Coral was integrated for more complex Hive view support: trinodb/trino#5785

We need to modify the translation so that the underlying tables are resolved to the correct catalog name.

Does not support string concatenation with ||

We have SQL that uses || for string concatenation, e.g. select 'a' || 'b', but Coral does not support this syntax. Can someone help support this feature?
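Until native support lands, one conceivable workaround is to rewrite || into Hive's concat() before the SQL reaches Coral. The sketch below is a naive, hypothetical string rewrite (ConcatRewrite is not a Coral class) that handles only simple two-operand cases:

```java
// Hypothetical pre-processing sketch, not part of Coral: rewrite the ANSI
// string-concatenation operator || into Hive's concat() function.
public class ConcatRewrite {
    // Naive rewrite: only handles "x || y" with whitespace-free operands.
    static String rewrite(String sql) {
        return sql.replaceAll("(\\S+)\\s*\\|\\|\\s*(\\S+)", "concat($1, $2)");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("select 'a' || 'b'")); // select concat('a', 'b')
    }
}
```

A real solution would rewrite the parse tree rather than the raw string, since || can appear inside literals or nested expressions.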

Remove catalog name from the coralRelNode TableScan

Currently, during Hive -> Spark/Trino SQL translations, Coral creates a RelNode plan. The TableScan RelNode has the representation LogicalTableScan(table=[[hive, dbName, tableName]]).
When this RelNode is translated to a SqlNode, "hive" (the catalog name) has to be removed explicitly.

Filing this issue to investigate whether introducing the catalog name in the RelNode can be avoided.

The way context is used in ParseTreeBuilder is error-prone and hard to maintain

While adding support for window functions, I noticed an issue in the current design of ParseTreeBuilder that has caused the following query:

  SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo`

to be translated into the following Calcite SqlNode tree:

  SELECT `a`, RANK() OVER (PARTITION BY `b` ORDER BY `c`) FROM `foo` ORDER BY `c`

The extra "ORDER BY c" at the end of the query is due to the following issue in the current design of ParseTreeBuilder.

com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder has the following method that uses context to pass around information while converting the Hive ASTNode tree to Calcite SqlNode tree:

  private SqlNode processAST(ASTNode node, @Nullable Table hiveView) {
    ParseContext ctx = new ParseContext(hiveView);
    return visit(node, ctx);
  }

In the following function, it picks up "order by" clause by looking it up via the context:

  @Override
  protected SqlNode visitQueryNode(ASTNode node, ParseContext ctx) {
    ArrayList<Node> children = node.getChildren();
    checkState(children != null && !children.isEmpty());
    SqlNode cte = null;
    ParseContext qc = new ParseContext(ctx.getHiveTable().orElse(null));
    for (Node child : node.getChildren()) {
      ASTNode ast = (ASTNode) child;
      if (ast.getType() == HiveParser.TOK_CTE) {
        // Child of type TOK_CTE represents the "WITH" list
        /** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for the return value */
        cte = visit(ast, new ParseContext(null));
      } else {
        // The return values are ignored since all other children of the SELECT query will be captured via ParseContext qc.
        visit(ast, qc);
      }
    }
    SqlSelect select = new SqlSelect(ZERO, qc.keywords, qc.selects, qc.from, qc.where, qc.grpBy, qc.having, null,
        qc.orderBy, null, qc.fetch);
    if (cte != null) {
      // Calcite uses "SqlWith(SqlNodeList of SqlWithItem, SqlSelect)" to represent queries with WITH
      /** See {@link #visitCTE(ASTNode, ParseContext) visitCTE} for details */
      return new SqlWith(ZERO, (SqlNodeList) cte, select);
    } else {
      return select;
    }
  }

So any "ORDER BY" encountered while visiting the query's children (including the one inside the OVER clause) is captured into qc.orderBy and attached to the top-level SELECT.

To fix the issue, I would like to get rid of the use of Context and instead pass the orderBy SqlNode directly via the return value of the visit function.
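The difference between the two designs can be sketched with a self-contained toy (these classes are illustrative only, not Coral code): a shared mutable context lets the window's ORDER BY leak into the query level, while returning results up the call chain keeps it local.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the state leak described above; not Coral code.
public class ContextLeakDemo {
    static class Ctx { List<String> orderBy = new ArrayList<>(); }

    // Context style: the window spec writes into the query's shared context.
    static List<String> queryOrderByWithContext() {
        Ctx qc = new Ctx();
        visitWindowSpec(qc);   // window visited with the query's context
        return qc.orderBy;     // now contains the window's ORDER BY "c"
    }

    static void visitWindowSpec(Ctx ctx) { ctx.orderBy.add("c"); }

    // Return-value style: the window's ORDER BY is returned to its direct
    // parent (the OVER clause) and never touches query-level state.
    static List<String> queryOrderByWithReturn() {
        List<String> windowOrderBy = visitWindowSpecReturning();
        attachToWindowNode(windowOrderBy);  // belongs to OVER(), not the query
        return new ArrayList<>();           // the query has no ORDER BY of its own
    }

    static List<String> visitWindowSpecReturning() {
        List<String> orderBy = new ArrayList<>();
        orderBy.add("c");
        return orderBy;
    }

    static void attachToWindowNode(List<String> orderBy) { /* no-op in this toy */ }

    public static void main(String[] args) {
        System.out.println(queryOrderByWithContext()); // [c]  <- leaked
        System.out.println(queryOrderByWithReturn());  // []   <- correct
    }
}
```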

Any thoughts @wmoustafa @funcheetah @antumbde ?

Coral-Schema outputs an upper-cased schema field even when I explicitly alias to lower case

To reproduce this issue, add this test in ViewToAvroSchemaConverterTests and run it:

  @Test
  public void foo() {
    String viewSql = "CREATE VIEW v AS SELECT id as id FROM basecomplex";
    TestUtils.executeCreateViewQuery("default", "v", viewSql);

    ViewToAvroSchemaConverter viewToAvroSchemaConverter = ViewToAvroSchemaConverter.create(hiveMetastoreClient);
    Schema actualSchema = viewToAvroSchemaConverter.toAvroSchema("default", "v", false, false);

    System.out.println(actualSchema);
  }

the output will be:

{"type":"record","name":"v","namespace":"default.v","fields":[{"name":"Id","type":"int"}]}

This shows the column name is Id rather than the explicit lower-cased alias id.

Translation of NULL column from Hive to Trino

The Hive view is defined as
CREATE VIEW v1 AS SELECT NULL AS c1 FROM t

When using it in Trino, it throws an error:
Unknown type 'NULL' for column 'c1' in view: hive.default.v1

It looks like Hive has the type void for NULL, while Trino doesn't and still enforces a type for every column.

A workaround is to modify the Hive view definition by adding an explicit cast (to whatever valid type in both Hive and Trino)
CREATE VIEW v1 AS SELECT CAST(NULL AS INT) AS c1 FROM t
This will be accepted by Trino and still works in Hive.

Regarding the translation from Hive to Trino, does it make sense to convert a plain NULL column into a CAST?

Trino SQL with `UNNEST(arrayOfStruct) AS t(col0, col1...)` cannot be converted to CoralIR

Currently, Coral supports unnesting an operand into a single row type.
For example, for a table default.struct defined as default.struct(a int, b array<struct<b1:string, b2:int>>):

the input query SELECT * from default.struct cross join unnest(struct.b) AS t(ccol) has a CoralIR representation.

However, OSS Trino supports expanding the unnest operand into multiple columns, and Coral doesn't accept the following as valid Trino input SQL: SELECT * from default.struct cross join unnest(struct.b) AS t(ccol1, ccol2)

A potential solution is to introduce two types of uncollect RelNodes in CoralIR, (1) Uncollect1 and (2) UncollectN, to determine the number of columns unnest() should expand to (one or more).

Inclusive language

Can you address a couple of instances of language in the project where the wording can be made more inclusive (example: whitelist -> allowlist)? We are trying to make the code in all LinkedIn projects more inclusive. Could you please examine whether or not these need to be updated, and make the changes? For suggested replacements see go/inclusivelanguage or Google. THANK YOU!

Term      | URL
Blacklist | https://github.com/linkedin/coral/blob/b436653fe94c1b6bab30fcf7c9fd1869e2461afd/coral-spark/src/main/java/com/linkedin/coral/spark/UnsupportedHiveUDFsInSpark.java

Hive -> RelNode representation for Lateral Views has LogicalJoin as a RelNode

For a given Hive SQL with a Lateral View, the generated RelNode representation has LogicalCorrelate or LogicalJoin depending on the input SQL's syntax.

For Input hive SQL - SELECT col FROM (SELECT ARRAY('a1', 'a2') as a) LATERAL VIEW EXPLODE(a) a_alias AS col
RelNode representation is -

LogicalProject(col=[$1])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalProject(a=[ARRAY('a1', 'a2')])
      LogicalValues(tuples=[[{ 0 }]])
    HiveUncollect
      LogicalProject(col=[$cor0.a])
        LogicalValues(tuples=[[{ 0 }]])

For Input SQL - SELECT arr.alias FROM foo LATERAL VIEW EXPLODE(ARRAY('a', 'b')) arr as alias
RelNode representation -

LogicalProject(alias=[$3])
  LogicalJoin(condition=[true], joinType=[inner])
    LogicalTableScan(table=[[hive, default, foo]])
    HiveUncollect
      LogicalProject(col=[ARRAY('a', 'b')])
        LogicalValues(tuples=[[{ 0 }]])

Both translations should result only in LogicalCorrelate in the RelNode representation.

GROUP BY with column ordinals (GROUP BY 1,2,3 style) is not supported

SQL:
select
countdate, -- statistics date
fstrmerchantid_name,
SUBSTRING(fstrmerchantid_name,1,instr(fstrmerchantid_name,'-')-1) as fstrmerchantid,
fuifundprojectid_name,
SUBSTRING(fuifundprojectid_name,1,instr(fuifundprojectid_name,'-')-1) as fuifundprojectid,
l0_fuiconfigquota,
l0_fuifundprojectidnames,
l0_fuiquotatypedesc,
l0_fstrfailunconfirmamounttypedesc,
l0_fuiwaitlendamount,
l0_fuilendprocessingamount,
l0_fuilendedamount,
l0_fuiremaintolendamount,
l1_fuiconfigquota,
l1_fstrtotalamounttypedesc,
l1_fstrfailunconfirmamounttypedesc,
l1_fuiwaitlendamount,
l1_fuilendprocessingamount,
l1_fuilendedamount,
l1_fuiremaintolendamount
from xy_mart.fact_loan_fund_lend_quota
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20

Code:
RelNode relNode = trinoToRelConverter.convertSql(a);
RelToTrinoConverter relToTrinoConverter = new RelToTrinoConverter();
String expandedSql = relToTrinoConverter.convert(relNode);

Error: the conversion fails with an exception (attached as a screenshot in the original report).
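As a hypothetical pre-processing workaround (this is not a Coral API), the ordinals could be expanded into the corresponding select-list expressions before the SQL reaches the converter:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: expand "GROUP BY 1,2,3" ordinals into the matching
// select-list expressions. Assumes aliases are stripped from selectItems.
public class GroupByOrdinalExpander {
    static String expand(List<String> selectItems, List<Integer> ordinals) {
        return ordinals.stream()
                .map(i -> selectItems.get(i - 1)) // SQL ordinals are 1-based
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> select = Arrays.asList("countdate", "fstrmerchantid_name", "l0_fuiconfigquota");
        // "GROUP BY 1,2,3" becomes an explicit column list
        System.out.println(expand(select, Arrays.asList(1, 2, 3)));
        // countdate, fstrmerchantid_name, l0_fuiconfigquota
    }
}
```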

Semantic inconsistency when using Coral to convert a Hive query to Presto

You can use the following test case to reproduce the issue.

  public static final TestTable TABLE_FIVE = new TestTable("dwd_ogv_app_play_click_tf_inc_d", ImmutableMap.of("r_type", SqlTypeName.INTEGER,
      "ogv_type_name", SqlTypeName.VARCHAR, "buvid", SqlTypeName.VARCHAR, "log_date", SqlTypeName.VARCHAR));

  @Test
  public void testSelect() {
    String sql = String.format("select t.ogv_type_name, count(1) as user_amt from (SELECT ogv_type_name,\n" +
        "buvid\n" +
        "FROM dwd_ogv_app_play_click_tf_inc_d\n" +
        "WHERE log_date = '20210312'\n" +
        "AND r_type = 1\n" +
        "AND ogv_type_name IS NOT NULL\n" +
        "GROUP BY ogv_type_name, buvid) t GROUP BY\n" +
        "t.ogv_type_name", tableFive);
    String expectedSql = "";
    testConversion(sql, expectedSql);
  }

After conversion, the Presto query is:

SELECT "ogv_type_name" AS "OGV_TYPE_NAME", COUNT(*) AS "USER_AMT"
FROM "dwd_ogv_app_play_click_tf_inc_d"
WHERE "log_date" = '20210312' AND "r_type" = 1 AND "ogv_type_name" IS NOT NULL
GROUP BY "ogv_type_name"

The semantics of the two queries are clearly inconsistent: the inner GROUP BY ogv_type_name, buvid (which deduplicates buvid per ogv_type_name) has been dropped, so user_amt now counts raw rows instead of distinct users.
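A toy computation over made-up data (not from the report) shows why the dropped inner GROUP BY changes the result:

```java
import java.util.Arrays;
import java.util.List;

// Toy demo: user_amt should count distinct buvids per ogv_type_name, not raw rows.
public class GroupByDedupDemo {
    // Original semantics: the inner GROUP BY deduplicates (type, buvid) pairs first.
    static long distinctUsers(List<String[]> rows) {
        return rows.stream().map(r -> r[0] + "|" + r[1]).distinct().count();
    }

    // Translated semantics: the inner GROUP BY was dropped, so raw rows are counted.
    static long rawRows(List<String[]> rows) {
        return rows.size();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"movie", "u1"},
                new String[]{"movie", "u1"},  // duplicate click by the same user
                new String[]{"movie", "u2"});
        System.out.println(distinctUsers(rows) + " vs " + rawRows(rows)); // 2 vs 3
    }
}
```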

Shade trino-parser dependency of coral-trino

When bumping the Coral dependency version in Trino, we noticed that coral-trino has a new dependency on trino-parser.

One of our maintainers brought up the fact that trino-parser is not designed to be depended on by other projects, i.e. it is not guaranteed to maintain backward compatibility. To preempt future incompatibilities, would the Coral folks consider shading the trino-parser dependency?

For reference, the conversation is here.

Incorrect schema for views with unnamed columns in the view SQL

If we create a Hive view with columns that are not explicitly named in the SQL, Hive will autogenerate column names for such columns when persisting the view to the metastore. E.g. consider the following view:

hive> CREATE VIEW v AS SELECT TRUE, lower('STR') FROM some_table;

Here is the persisted information in the metastore

hive> DESC FORMATTED v;
OK
# col_name            	data_type           	comment

_c0                 	boolean
_c1                 	string
.
.
.
# View Information
View Original Text: 	SELECT TRUE, lower('STR') FROM some_table
View Expanded Text: 	SELECT TRUE, lower('STR') FROM `some_table`

Trying to retrieve the view schema using coral-schema for this view returns EXPR_0 and EXPR_1 as column names, which I assume are auto-generated by Calcite. I think we should respect the column names provided by Hive here, i.e. _c0 and _c1, since they have been persisted as column metadata in the metastore.

Coral transforms the Hive function "regexp_extract" for Trino by introducing a new function "hive_pattern_to_trino"

Coral transforms the following Hive SQL to Trino:

select regexp_extract('1a 2b 14m', '\d+', 1); 

Result:

select regexp_extract('1a 2b 14m', "hive_pattern_to_presto"('\d+'), 1); 

But Trino does not have a function named "hive_pattern_to_presto", so the query fails with the exception:

Function 'hive_pattern_to_trino' not registered

So, why is the "hive_pattern_to_trino" function introduced? Is it implemented internally at LinkedIn?

Window functions in Hive SQL are not supported

Hive SQL that contains a window function is not supported.

Exception:
Unhandled Hive AST token TOK_WINDOWSPEC, tree:

TOK_WINDOWSPEC
  TOK_PARTITIONINGSPEC
    TOK_DISTRIBUTEBY
      TOK_TABLE_OR_COL
        c1
    TOK_ORDERBY
      TOK_TABSORTCOLNAMEDESC
        TOK_TABLE_OR_COL
          c2

For example, given:

column | type
c0     | string
c1     | string
c2     | int

select c0, c1, c2, lead(c0, 1) over ( PARTITION BY c1 ORDER BY c2 desc ) c3 FROM hive.tablex

SELECT c0, c1, ROW_NUMBER() over ( PARTITION BY c0 ORDER BY c1 desc ) rw FROM hive.tablex

Trino SQLs cannot be converted to CoralSqlNode & CoralRelNode representation

Some valid input trino SQLs like:

  1. select * from unnest(array[1, 2, 3])
  2. select x from unnest(array[1, 2, 3]) t(x)
  3. select * from unnest(array[1, 2, 3]) with ordinality
  4. SELECT * from default.struct cross join unnest(struct.b) AS t(b1col, b2col), where table default.struct is defined as default.struct(a int, b array<struct<b1:string, b2:int>>)

cannot be converted to a Coral IR representation.

Append 2 AS operators in SqlNode representation for LATERAL VIEWS

For RelNode -> SqlNode translations of SQLs with LATERAL VIEW UNNEST(..), Coral appends only one (inner) AS operator. The generated SqlNode looks like:
SELECT t0.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol)...

However, the same underlying code in Calcite appends two AS operators, in the following fashion:
SELECT t00.ccol .... LATERAL VIEW UNNEST(col) AS t0 (ccol) AS t00....

Why does Coral deviate, and is it possible to follow the original Calcite syntax in Coral?

Column '<table>.<column>' is ambiguous in Trino

I am having a problem with hive view translation to Trino that results in a query with an ambiguous column.

I narrowed down the issue to a simple view that reproduces the issue. I also identified why this happens which is a combination of how the select is rewritten and the column aliases used.

Conditions:

  • The select statement must have a join that uses a function or one of the columns in the join had a function applied to it.
  • There must be a where clause (otherwise the query is not rewritten in the same way).
  • Aliases must be used that match the original column name and are written in UPPER CASE.

Given these 2 tables:

create table hive.test.table_a (
  some_id string
);

create table hive.test.table_b (
  some_id string
);

These 2 views will have the same issue:

-- Hive view version 1
create or replace view test.view_ab as
select a.some_id
from   test.table_a a
left join 
(
  select trim(some_id) AS SOME_ID
  from   test.table_b
) b
on    a.some_id = b.some_id
where a.some_id != ''

-- Hive view version 2
create or replace view test.view_ab2 as
select a.some_id
from   test.table_a a
left join 
(
  select some_id AS SOME_ID
  from   test.table_b
) b
on    a.some_id = trim(b.some_id)
where a.some_id != ''

The translated view hive in Trino will look like this:

CREATE VIEW hive.test.view_ab SECURITY DEFINER AS
SELECT some_id
FROM
  (
   SELECT
     table_a.some_id some_id
   , t.SOME_ID SOME_ID
   FROM
     (test.table_a
   LEFT JOIN (
      SELECT
        TRIM(some_id) SOME_ID
      , CAST(TRIM(some_id) AS VARCHAR(65536)) $f1
      FROM
        test.table_b
   )  t ON (table_a.some_id = t.$f1))
)  t0
WHERE (t0.some_id <> '')

As you can see, there are 2 columns with the same alias in the first sub-query:

     table_a.some_id some_id
   , t.SOME_ID SOME_ID

The problem goes away if the alias is specified in lower case in the hive view definition. Which would seem like an easy change unless you have 50k+ views owned by many different teams across the organization.

I managed to fix the issue in my local environment by setting the node text to lower case for identifiers in the com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder class but I am not sure if this is the best place to do this or if this is the best way to resolve the problem.  Would that be an acceptable change?

  @Override
  protected SqlNode visitIdentifier(ASTNode node, ParseContext ctx) {
    // Normalize identifiers to lower case so aliases match regardless of case
    return new SqlIdentifier(node.getText().toLowerCase(Locale.ROOT), ZERO);
  }

`CONCAT(date, varchar)` is not handled well in Coral-Trino

Hive supports CONCAT(date, varchar), while Trino only supports concat(char(x), char(y)), concat(array(E), E), concat(E, array(E)), concat(array(E)), concat(varchar), and concat(varbinary).

We need to handle this corner case (and maybe other similar corner cases) for Coral-Trino.

Please clarify documentation on how to use Coral-Spark without Hive

The objective of Coral-Spark: view logical plan => Spark SQL

Unfortunately the entry point to Coral is RelNode, which is hard to construct from a view logical plan.
I've looked into the test cases and it seems we cannot live without Hive.
HiveToRelConverter needs a full-fledged Hive system, but to build a RelNode we only need Spark's catalog (which can be anything: HMS, SparkCatalog, or any 3rd-party catalog).

But I have a requirement of converting a pure (LogicalPlan -> String) to re-generate SQL.

Any ideas on that?

Escape characters are not handled correctly in regular expressions when converting Hive SQL to Trino SQL

Coral version: 2.0.86

SQL: select '2022-01-01' rlike '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$' returns true in Hive.
However, Coral's translation is SELECT "REGEXP_LIKE"('2022-01-01', '^\\d{4}-\\d{2}-\\d{2}$|^\\d{4}-\\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO"), which returns false in Trino. The expected translation is SELECT "REGEXP_LIKE"('2022-01-01', '^\d{4}-\d{2}-\d{2}$|^\d{4}-\d{2}$') FROM (VALUES (0)) AS "t" ("ZERO") (the double backslash should be translated to a single backslash).

Am I using it wrong, or is this a bug?
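A minimal sketch of the unescaping step the reporter expects, assuming the only escape sequence of interest is the doubled backslash (HivePatternUnescape is a hypothetical helper, not Coral code):

```java
// Hypothetical helper: collapse each escaped backslash pair from a Hive string
// literal into the single backslash the Trino regex engine should receive.
public class HivePatternUnescape {
    static String unescape(String hiveLiteral) {
        // Replace every two-character "\\" sequence with a single "\"
        return hiveLiteral.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        // The pattern as it appears inside the Hive literal (two source backslashes per escape)
        String hive = "^\\\\d{4}-\\\\d{2}-\\\\d{2}$";
        System.out.println(unescape(hive)); // ^\d{4}-\d{2}-\d{2}$
    }
}
```

A robust fix would handle unescaping while parsing the string literal rather than as a blanket replace, since other escape sequences need the same treatment.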

Support for LATERAL VIEW json_tuple

Currently Coral lacks support for LATERAL VIEW json_tuple, which is a Hive syntax leveraging the json_tuple UDTF.

The main difference between json_tuple and other UDTFs like explode is that json_tuple is a column-generating function (in contrast to explode, which is a row-generating function).

json_tuple has the following signature in Hive:

json_tuple(string jsonStr,string k1,...,string kn)

where jsonStr is a string or a column reference (of string type) holding a serialized JSON object, and k1, ..., kn are field names to be extracted from the jsonStr JSON object. So, given jsonStr:

{"user_id": 1000, "name": "Mateusz", "surname": "Gajewski"}

SELECT json_tuple(jsonStr, "name", "surname", "user_id")

will return a tuple (this Hive concept is weird tbh):

"1000", "Mateusz", "Gajewski"

and by default those columns have automatically generated names (c0, c1, ...).

Hive also allows naming those columns, which Coral doesn't handle either:

SELECT json_tuple(jsonStr, "name", "surname", "user_id") as (name, surname, user_id)

And finally:

Hive's:

SELECT v.col, v.col2 FROM json_test t 
  LATERAL VIEW json_tuple(t.col, 'field', 'field2') v as col, col2

should be translated to Trino's/Presto's query (or similar shape):

SELECT v.field AS col, v.field2 AS col2 from json_test t 
  CROSS JOIN LATERAL (
    SELECT cast(json_parse(t.col) as ROW(field varchar, field2 varchar)) as v
  );

Please note that due to Hive's AS clause it's possible to extract a JSON field as column col.

Add support for DATE function in Hive

The SQL below throws an exception:

SELECT date('2021-01-02') as a
com.linkedin.coral.hive.hive2rel.functions.UnknownSqlFunctionException: Unknown function name: date

	at com.linkedin.coral.hive.hive2rel.functions.HiveFunctionResolver.tryResolve(HiveFunctionResolver.java:124)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunctionInternal(ParseTreeBuilder.java:572)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:563)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitFunction(ParseTreeBuilder.java:81)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:122)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:580)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelectExpr(ParseTreeBuilder.java:81)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:98)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:605)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitSelect(ParseTreeBuilder.java:81)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:95)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitInsert(AbstractASTVisitor.java:522)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:86)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:733)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitQueryNode(ParseTreeBuilder.java:81)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:66)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.lambda$visitChildren$0(AbstractASTVisitor.java:286)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:286)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visitChildren(AbstractASTVisitor.java:282)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:739)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.visitNil(ParseTreeBuilder.java:81)
	at com.linkedin.coral.hive.hive2rel.parsetree.AbstractASTVisitor.visit(AbstractASTVisitor.java:47)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processAST(ParseTreeBuilder.java:176)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.process(ParseTreeBuilder.java:168)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilder.processSql(ParseTreeBuilder.java:161)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.convert(ParseTreeBuilderTest.java:145)
	at com.linkedin.coral.hive.hive2rel.parsetree.ParseTreeBuilderTest.testConvertAndValidate(ParseTreeBuilderTest.java:127)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
	at org.testng.TestRunner.privateRun(TestRunner.java:758)
	at org.testng.TestRunner.run(TestRunner.java:613)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
	at org.testng.SuiteRunner.run(SuiteRunner.java:240)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:87)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1170)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1095)
	at org.testng.TestNG.run(TestNG.java:1007)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)

Relates to trinodb/trino#8789

Create unit tests to test dynamic UDF resolution

Currently, dynamic UDF resolution is only testable through integration tests, since it relies on communicating with an Artifact Management Repository, typically an external service. At this point, the integration tests are not open source either. This issue tracks creating a unit test that exercises this functionality by standing up a local artifact management repository and using it from the unit tests.
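To make the idea concrete, here is a minimal sketch (plain Python, not Coral code; the class and method names are hypothetical) of what a file-system-backed stand-in for the external artifact service could look like, so that resolution can be unit-tested without any network calls:

```python
# Illustrative sketch only: LocalArtifactRepository and its methods are
# hypothetical and are NOT part of Coral's actual API.
import os
import tempfile


class LocalArtifactRepository:
    """A file-system-backed stand-in for an external artifact management service."""

    def __init__(self, root: str):
        self.root = root

    def publish(self, group: str, artifact: str, version: str, payload: bytes) -> str:
        # Lay artifacts out Maven-style:
        # <root>/<group-as-dirs>/<artifact>/<version>/<artifact>-<version>.jar
        directory = os.path.join(self.root, group.replace(".", "/"), artifact, version)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, f"{artifact}-{version}.jar")
        with open(path, "wb") as f:
            f.write(payload)
        return path

    def resolve(self, group: str, artifact: str, version: str) -> str:
        path = os.path.join(self.root, group.replace(".", "/"), artifact, version,
                            f"{artifact}-{version}.jar")
        if not os.path.exists(path):
            raise FileNotFoundError(f"artifact not found: {group}:{artifact}:{version}")
        return path


# Unit-test style usage: publish a fake UDF jar, then resolve it locally.
with tempfile.TemporaryDirectory() as root:
    repo = LocalArtifactRepository(root)
    published = repo.publish("com.example.udfs", "my-udf", "1.0.0", b"fake-jar-bytes")
    resolved = repo.resolve("com.example.udfs", "my-udf", "1.0.0")
    assert resolved == published
```

A real test would additionally load the resolved jar and register the UDF, but the repository fake above is the piece that removes the external-service dependency.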

Translating Expressions through Coral

Can Coral provide stable APIs to translate expressions from Hive to Presto?

Use case (as requested by Praveen Krishna):

Presto allows connectors to provide row filters in the form of expressions. It's possible that they are defined in HiveQL, and Presto needs to read and transform them before applying them. For this purpose, the ability to translate expressions directly is useful.

For example, Presto can be configured so that, for queries by user U on table tpch.tiny.orders, rows with orderKey >= 10 are excluded. This is defined in the configuration as the expression orderKey < 10. Currently, Presto requires this expression to be in a form that Presto can understand. If Coral allowed Hive-to-Presto translation of such expressions, they could be expressed as Hive expressions and translated at runtime.
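As a toy illustration of what expression-level translation means (this is not Coral's implementation — Coral translates through a parsed tree, not string rewriting; the function below is purely hypothetical), two surface-level Hive-to-Presto rewrites might look like:

```python
import re


def toy_hive_to_presto(expr: str) -> str:
    """Toy string-level rewrite of a Hive expression into Presto syntax.

    Illustrates only two surface differences:
      * Hive quotes identifiers with backticks, Presto with double quotes.
      * Hive's nvl(a, b) is spelled coalesce(a, b) in Presto.
    A real translator (like Coral) parses the expression first.
    """
    expr = re.sub(r"`([^`]*)`", r'"\1"', expr)  # `col` -> "col"
    expr = re.sub(r"\bnvl\s*\(", "coalesce(", expr, flags=re.IGNORECASE)
    return expr


print(toy_hive_to_presto("`orderKey` < 10"))       # "orderKey" < 10
print(toy_hive_to_presto("nvl(`fans`, 0) > 100"))  # coalesce("fans", 0) > 100
```

The value of a stable Coral API here is that the row-filter configuration could store the Hive form and translate it correctly at runtime, instead of relying on fragile rewrites like the above.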

The order of results is inconsistent when using Coral to convert a Hive query to Presto

The table structure is as follows:

public static final TestTable TABLE_NULLSCOLLATION = new TestTable("nullscollation", ImmutableMap.of("name", SqlTypeName.VARCHAR,
          "id", SqlTypeName.BIGINT, "fans", SqlTypeName.BIGINT, "log_date", SqlTypeName.VARCHAR));

The table data is as follows:

|NULL|456789|NULL|20210426|
|24784|556789|14535|20210426|
|NULL|33456789|NULL|20210426|
|96344|454888|14255|20210426|
|96344|454888|14255|20210426|
|NULL|7556789|NULL|20210426|

Raw SQL:

SELECT name, id, fans FROM nullscollation
WHERE log_date = 20210426
ORDER BY fans DESC
LIMIT 3

Converted SQL:

SELECT "name", "id", "fans" 
FROM "nullscollation"
WHERE TRY_CAST("log_date" AS INTEGER) = 20210426
ORDER BY "fans" DESC NULLS FIRST
LIMIT 3

Original SQL result, using Hive or Presto:

|24784|556789|14535|
|96344|454888|14255|
|96344|454888|14255|

Converted SQL result, using Presto:

|NULL|456789|NULL|
|NULL|33456789|NULL|
|NULL|7556789|NULL|

Hive defaults to NULLS LAST for DESC and NULLS FIRST for ASC. Calcite's defaultNullCollation is HIGH (NULLS FIRST for DESC, NULLS LAST for ASC), and Coral does not set a null collation. Setting the null collation to NullCollation.LOW when obtaining the Hive SQL validator would make the query results match.
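The behavioral difference is easy to reproduce outside SQL. The sketch below models the sample fans column (NULL as None) and sorts it DESC under the two collations; the two different LIMIT 3 prefixes are exactly the two result sets shown above:

```python
# NULL (None) handling under the two collations for ORDER BY fans DESC.
fans = [None, 14535, None, 14255, 14255, None]

# Calcite default (HIGH): NULLS FIRST for DESC -- NULLs win the LIMIT 3.
nulls_first = sorted(fans, key=lambda v: (v is None, v if v is not None else 0),
                     reverse=True)

# Hive default (LOW): NULLS LAST for DESC -- real values win the LIMIT 3.
nulls_last = sorted(fans, key=lambda v: (v is not None, v if v is not None else 0),
                    reverse=True)

print(nulls_first[:3])  # [None, None, None]    (the converted-query result)
print(nulls_last[:3])   # [14535, 14255, 14255] (the original Hive result)
```

This is why the converted query must spell out NULLS LAST (or Coral must configure NullCollation.LOW) for the DESC ordering to match Hive's semantics.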
