
marble's Introduction

Marble is a high-performance, in-memory Hive SQL engine built on Apache Calcite.
It can help you migrate Hive SQL scripts to a real-time computing system.
It also provides a convenient Table API for building custom SQL engines.

You may also be interested in a similar project: direct-spark-sql

Build and run tests

Requirements

  • Java 1.8 as the build JDK
  • Maven

1. Build marble

cd marble
mvn clean install -DskipTests

(Optional)
If you need to modify Marble's patches to Calcite, build the calcite-patch project first:

git clone https://github.com/51nb/calcite-patch.git
cd calcite-patch
mvn clean install -DskipTests

In the long term, we hope to merge these patches into Calcite.

2. Import the marble project into your IDE, but do not import calcite-patch as a submodule of the marble project.

3. Run the tests TableEnvTest and HiveTableEnvTest.

Usage

Maven dependency

        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>janino</artifactId>
            <version>3.0.11</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.0.11</version>
        </dependency>
        <dependency>
            <groupId>com.u51.marble</groupId>
            <artifactId>marble-table-hive</artifactId>
            <version>1.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.calcite</groupId>
                    <artifactId>calcite-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.calcite</groupId>
                    <artifactId>calcite-linq4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>janino</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>commons-compiler</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

API Overview

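// Optional: cache up to 200 SQL plans.
TableEnv.enableSqlPlanCacheSize(200);

// Get a TableEnv that understands Hive SQL.
TableEnv tableEnv = HiveTableEnv.getTableEnv();

// Build DataTables from a POJO list, a JDBC ResultSet,
// or a list of row maps plus a column name -> SqlTypeName map.
DataTable t1 = tableEnv.fromJavaPojoList(pojoList);
DataTable t2 = tableEnv.fromJdbcResultSet(resultSet);
DataTable t3 = tableEnv.fromRowListWithSqlTypeMap(rowList, sqlTypeMap);

// Register the tables under the sub-schema "test".
tableEnv.addSubSchema("test");
tableEnv.registerTable("test", "t1", t1);
tableEnv.registerTable("test", "t2", t2);

// Run a query and read the result as a list of column name -> value maps.
DataTable queryResult = tableEnv.sqlQuery("select * from test.t1 join test.t2 on t1.id=t2.id");
List<Map<String, Object>> resultRows = queryResult.toMapList();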

Enabling the SQL plan cache is recommended when the same SQL query is executed repeatedly:

TableEnv.enableSqlPlanCacheSize(200);
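How Marble implements its plan cache is not described here. As a hypothetical illustration only, the following plain-Java sketch shows what a bounded plan cache of this kind typically does: plans are keyed by the SQL text, a repeated query reuses its compiled plan, and the least recently used entry is evicted once the limit (200 in the call above) is exceeded. The class SqlPlanCache and its methods are invented for this example and are not part of Marble's API; LRU eviction is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a bounded SQL plan cache (not Marble's code).
 * Plans are keyed by SQL text; the least recently used entry is evicted
 * once the cache holds more than maxSize entries (LRU is an assumption).
 */
public class SqlPlanCache<P> {
  private final Map<String, P> cache;
  private int compileCount = 0;

  public SqlPlanCache(int maxSize) {
    // accessOrder = true: iteration order is least recently accessed first
    this.cache = new LinkedHashMap<String, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, P> eldest) {
        return size() > maxSize;
      }
    };
  }

  /** Returns the cached plan for sql, compiling it only on a cache miss. */
  public synchronized P getOrCompile(String sql, Function<String, P> compiler) {
    P plan = cache.get(sql);
    if (plan == null) {
      plan = compiler.apply(sql);
      compileCount++;
      cache.put(sql, plan);
    }
    return plan;
  }

  public synchronized int compileCount() {
    return compileCount;
  }

  public synchronized int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    SqlPlanCache<String> cache = new SqlPlanCache<>(2);
    Function<String, String> compiler = sql -> "plan(" + sql + ")";
    cache.getOrCompile("select 1", compiler);
    cache.getOrCompile("select 1", compiler); // cache hit: no recompilation
    cache.getOrCompile("select 2", compiler);
    cache.getOrCompile("select 3", compiler); // evicts "select 1"
    System.out.println(cache.compileCount() + " " + cache.size()); // prints "3 2"
  }
}
```

A LinkedHashMap in access order combined with removeEldestEntry is the standard library's built-in way to get LRU eviction, which keeps the sketch short; a production cache would more likely bound memory or use a concurrent cache library.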

TableEnv is the main Table API entry point for executing SQL queries on a data set.
It can be used to:

  • convert a Java POJO list, a JDBC ResultSet, or a list of row maps with a SQL type map to a DataTable
  • register a DataTable in TableEnv's catalog
  • add sub-schemas and custom functions to TableEnv's catalog
  • execute a SQL query and get the result as a DataTable

By default, TableEnv supports Calcite's SQL dialect; see Calcite's SQL reference.
HiveTableEnv aims to support as much of Hive SQL as possible. Developers can also use a TableConfig to create a new TableEnv that supports another SQL dialect (for example a MysqlTableEnv or a PostgreTableEnv).

Supported Hive SQL features

  • Hive-specific keywords and operators
  • all UDFs and UDAFs
  • some UDTFs
  • implicit type casting
  • loading custom UDFs and UDAFs by package name:
    HiveTableEnv.registerHiveFunctionPackages("com.u51.data.hive.udf");
    

Benchmark

The benchmark module contains benchmark tests that compare Flink, Spark, and Marble on some simple SQL queries.

Design

The following diagram shows how Marble customizes Calcite in the SQL processing flow: [diagram: how_marble_customized_calcite]
More details can be found in the commit history of calcite-patch. Marble currently uses Calcite 1.18.0.

The main type mapping between Calcite and Hive is:

Calcite SQL type   Java storage type   Hive ObjectInspector
BIGINT             Long                LongObjectInspector
INTEGER            Int                 IntObjectInspector
DOUBLE             Double              DoubleObjectInspector
DECIMAL            BigDecimal          HiveDecimalObjectInspector
VARCHAR            String              StringObjectInspector
DATE               Int                 DateObjectInspector
TIMESTAMP          Long                TimestampObjectInspector
ARRAY              List                StandardListObjectInspector
...                ...                 ...
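The DATE -> Int and TIMESTAMP -> Long rows reflect Calcite's internal representation, which, as far as we know, stores a DATE as the number of days since 1970-01-01 and a TIMESTAMP as milliseconds since 1970-01-01 00:00:00. The following is a minimal plain-Java sketch of those conversions using only java.time; it assumes UTC for timestamps, and the class CalciteStorageTypes is hypothetical, not part of Marble or Calcite.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/**
 * Hypothetical sketch of the storage convention assumed above:
 * DATE as an int of days since 1970-01-01, TIMESTAMP as a long of
 * milliseconds since 1970-01-01 00:00:00 (interpreted here as UTC).
 */
public final class CalciteStorageTypes {
  private CalciteStorageTypes() {}

  /** DATE -> Int: days since the epoch (throws if it does not fit in an int). */
  public static int toDateStorage(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  /** Int -> DATE. */
  public static LocalDate fromDateStorage(int days) {
    return LocalDate.ofEpochDay(days);
  }

  /** TIMESTAMP -> Long: milliseconds since the epoch, assuming UTC. */
  public static long toTimestampStorage(LocalDateTime ts) {
    return ts.toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  /** Long -> TIMESTAMP, assuming UTC. */
  public static LocalDateTime fromTimestampStorage(long millis) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    System.out.println(toDateStorage(LocalDate.of(2019, 1, 1))); // 17897
    System.out.println(toTimestampStorage(LocalDateTime.of(2019, 1, 1, 0, 0))); // 1546300800000
  }
}
```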

Roadmap

  • improve compatibility with Hive SQL (high priority)
  • submit patches to Calcite to make upgrading calcite-core easier; related issues: CALCITE-2282, CALCITE-2973, CALCITE-2992 (high priority)
  • implement UDTFs in a generic way (high priority)
  • constant folding for Hive UDFs (low priority)
  • use a custom SQL Planner to replace the default PlannerImpl (low priority)
  • run TPC-DS queries at a custom scale (low priority)
  • vectorized UDF execution (experimental)
  • distributed broadcast join (experimental)
  • cost-based optimizer (experimental)

For more, see the issues.

Contributing

Contributions are welcome. Please use Calcite-idea-code-style.xml under the marble directory to reformat code, and make sure the Maven Checkstyle plugin validation passes after building the source code.

License

This library is distributed under the terms of the Apache License 2.0.

marble's People

Contributors

laizhou


marble's Issues

The documentation says the ARRAY type is supported, but the test fails

@Test
public void testCreateTableFromRowListWithSqlType() {

List<Map<String, Object>> rowList = new ArrayList<>();
Map<String, Object> row1 = new HashMap<>();
row1.put("c1", "a");
row1.put("c2", 0.1);
row1.put("c3", new Date().getTime());
row1.put("c4", Lists.newArrayList("a", "n"));

rowList.add(row1);
Map<String, Object> row2 = new HashMap<>();
row2.put("c1", "b");
row2.put("c2", 0.2);
row2.put("c3", new Date().getTime());
row2.put("c4", Lists.newArrayList("a", "n"));

rowList.add(row2);

Map<String, SqlTypeName> sqlTypeMap = new HashMap<>();
sqlTypeMap.put("c1", SqlTypeName.VARCHAR);
sqlTypeMap.put("c2", SqlTypeName.DOUBLE);
sqlTypeMap.put("c3", SqlTypeName.TIMESTAMP);
sqlTypeMap.put("c4", SqlTypeName.ARRAY);

DataTable dataTable = tableEnv.fromRowListWithSqlTypeMap(rowList, sqlTypeMap);
tableEnv.registerTable("t", dataTable);
DataTable queryResult = tableEnv.sqlQuery("select * from t");
System.out.println(queryResult.toMapList());

}

The test fails with:

java.lang.AssertionError: use createArrayType() instead

at org.apache.calcite.sql.type.SqlTypeFactoryImpl.assertBasic(SqlTypeFactoryImpl.java:221)
at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:48)
at org.apache.calcite.rel.type.RelDataTypeFactory$Builder.add(RelDataTypeFactory.java:476)
at org.apache.calcite.rel.type.RelDataTypeFactory$FieldInfoBuilder.add(RelDataTypeFactory.java:377)
at org.apache.calcite.rel.type.RelDataTypeFactory$FieldInfoBuilder.add(RelDataTypeFactory.java:365)
at org.apache.calcite.table.TableEnv.lambda$fromRowListWithSqlTypeMap$2(TableEnv.java:557)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at org.apache.calcite.table.TableEnv.fromRowListWithSqlTypeMap(TableEnv.java:556)
at org.apache.calcite.table.TableEnvTest.testCreateTableFromRowListWithSqlType(TableEnvTest.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

Allow TableEnv to register ViewTable

Let TableEnv register a ViewTable.
Consider a specific use case. In SQL, it looks like:

create view v1 as select id, udf(content) as c1 from t1

When we execute a query like

select * from t2 join v1 on t2.id=v1.id and v1.id<10000

the predicate v1.id<10000 can be pushed down, so the final SQL becomes

select * from t2 join (select id, udf(content) as c1 from t1 where id<10000) v1 on t2.id=v1.id

This reduces the cost, because udf(content) is then evaluated only for the rows of t1 that satisfy id<10000.
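To see why the pushdown saves work, here is a toy plain-Java simulation (not Marble or Calcite code) of the view side of the query. It counts how often a stand-in for udf(content) runs when the filter id<10000 is applied after the view's projection versus before it. The join with t2 is omitted, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Toy simulation (not Marble or Calcite code) of
 * v1 = select id, udf(content) as c1 from t1, filtered by id < 10000.
 */
public class PredicatePushdownDemo {
  static final class Row {
    final int id;
    final String content;

    Row(int id, String content) {
      this.id = id;
      this.content = content;
    }
  }

  /** Hypothetical stand-in for an expensive udf(content) that counts its calls. */
  static Function<String, String> countingUdf(AtomicInteger calls) {
    return s -> {
      calls.incrementAndGet();
      return s.toUpperCase();
    };
  }

  /** No pushdown: the view's projection (and the UDF) runs on every row first. */
  static List<String> filterAfterUdf(List<Row> t1, Function<String, String> udf) {
    List<String> out = new ArrayList<>();
    for (Row r : t1) {
      String c1 = udf.apply(r.content);
      if (r.id < 10000) {
        out.add(c1);
      }
    }
    return out;
  }

  /** Pushdown: the predicate is applied to t1 before the UDF is evaluated. */
  static List<String> filterBeforeUdf(List<Row> t1, Function<String, String> udf) {
    List<String> out = new ArrayList<>();
    for (Row r : t1) {
      if (r.id < 10000) {
        out.add(udf.apply(r.content));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> t1 = new ArrayList<>();
    for (int id = 0; id < 20000; id++) {
      t1.add(new Row(id, "row" + id));
    }
    AtomicInteger callsWithoutPushdown = new AtomicInteger();
    AtomicInteger callsWithPushdown = new AtomicInteger();
    List<String> a = filterAfterUdf(t1, countingUdf(callsWithoutPushdown));
    List<String> b = filterBeforeUdf(t1, countingUdf(callsWithPushdown));
    // Same rows, half as many UDF calls: prints "true 20000 10000"
    System.out.println(a.equals(b) + " " + callsWithoutPushdown.get() + " " + callsWithPushdown.get());
  }
}
```

Both variants return the same rows because the predicate only reads id, which the projection passes through unchanged; that is what makes the rewrite safe here.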
