takvamborgen / java-vtl

This project forked from statisticsnorway/java-vtl

An open source Java implementation of the Validation and Transformation Language (VTL), based on the VTL 1.1 draft specification. The implementation follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store. VTL is a standard language for defining validation and transformation rules (a set of operators, their syntax and semantics) for any kind of statistical data.

Home Page: http://java-vtl.org

License: Apache License 2.0

Java VTL: Java implementation of VTL

The Java VTL project is an open source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.

Visit the interactive reference manual for more information.

Modules

The project is divided into modules:

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add a dependency to the Maven project:

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>[VERSION]</version>
</dependency>

Evaluate VTL expressions

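import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;

// "connector" is a Connector implementation (see the next section).
ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);

// Each fragment ends with a line break so that the statements stay on
// separate lines once the fragments are concatenated.
engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "   filter ds1.id = \"string\",\n" +
            "   total := ds1.measure + ds2.measure\n" +
            "}");

// Assigned variables are read back from the engine-scope bindings.
System.out.println(bindings.get("ds3"));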

Connect to external systems

Java VTL uses the no.ssb.vtl.connector.Connector interface to read data from and write data to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.
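
The lookup described above can be sketched as follows. This is a minimal, self-contained illustration and not the engine's actual code: Dataset, Connector and ConnectorException are simplified stand-ins declared locally (the real types live in the java-vtl modules), and ConnectorLookup, resolve and prefixConnector are hypothetical names. The sketch assumes that the first connector whose canHandle returns true is the one asked for the dataset.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the java-vtl types, declared here only so
// that the sketch compiles on its own. They are NOT the real classes.
interface Dataset {
    String describe();
}

class ConnectorException extends Exception {
    ConnectorException(String message) {
        super(message);
    }
}

interface Connector {
    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
}

// Hypothetical helper: asks each connector in turn and uses the first
// one that accepts the identifier.
class ConnectorLookup {

    static Dataset resolve(List<Connector> connectors, String identifier)
            throws ConnectorException {
        for (Connector connector : connectors) {
            if (connector.canHandle(identifier)) {
                return connector.getDataset(identifier);
            }
        }
        throw new ConnectorException("no connector can handle: " + identifier);
    }

    // Hypothetical connector that accepts identifiers starting with a prefix.
    static Connector prefixConnector(String prefix) {
        return new Connector() {
            @Override
            public boolean canHandle(String identifier) {
                return identifier.startsWith(prefix);
            }

            @Override
            public Dataset getDataset(String identifier) {
                return () -> prefix + " dataset for " + identifier;
            }

            @Override
            public Dataset putDataset(String identifier, Dataset dataset) {
                return dataset;
            }
        };
    }

    public static void main(String[] args) throws ConnectorException {
        List<Connector> connectors = Arrays.asList(
                prefixConnector("file:"), prefixConnector("http:"));
        System.out.println(resolve(connectors, "http://example.org/foo").describe());
    }
}
```

With this ordering, an identifier that no connector accepts ends in a ConnectorException instead of a null dataset, which keeps the failure close to its cause.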

The method getDataset(String identifier) is then called to get the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module, but a very crude Dataset implementation could look like this:

class StaticDataset implements Dataset {

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // Create a new map for every row; reusing a single instance
            // would leave the list with 100 references to the same row.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}
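
On the connector side, the three methods could be combined as in the sketch below. It is only an illustration under stated assumptions: Dataset, Connector and ConnectorException are again simplified local stand-ins for the java-vtl types, and InMemoryConnector together with its "mem:" identifier prefix is hypothetical. The connector stores whatever putDataset receives and hands it back from getDataset.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the java-vtl types, declared locally so that
// the sketch compiles on its own. They are NOT the real classes.
interface Dataset {
    long size();
}

class ConnectorException extends Exception {
    ConnectorException(String message) {
        super(message);
    }
}

interface Connector {
    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
}

// Hypothetical connector that keeps datasets in memory under a "mem:" prefix.
class InMemoryConnector implements Connector {

    private static final String PREFIX = "mem:";

    private final Map<String, Dataset> store = new ConcurrentHashMap<>();

    @Override
    public boolean canHandle(String identifier) {
        return identifier != null && identifier.startsWith(PREFIX);
    }

    @Override
    public Dataset getDataset(String identifier) throws ConnectorException {
        // Only look up identifiers this connector claims to handle.
        Dataset dataset = canHandle(identifier) ? store.get(identifier) : null;
        if (dataset == null) {
            throw new ConnectorException("unknown dataset: " + identifier);
        }
        return dataset;
    }

    @Override
    public Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException {
        if (!canHandle(identifier)) {
            throw new ConnectorException("unsupported identifier: " + identifier);
        }
        store.put(identifier, dataset);
        return dataset;
    }

    public static void main(String[] args) throws ConnectorException {
        Connector connector = new InMemoryConnector();
        connector.putDataset("mem:foo", () -> 100L);
        System.out.println(connector.getDataset("mem:foo").size());
    }
}
```

Guarding both methods with canHandle keeps the connector consistent with what it advertises to the engine.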

Implementation roadmap

This is an overview of the implementation progress.

| Group | Operators | Progress | Comment |
| --- | --- | --- | --- |
| General purpose | round parenthesis | done | |
| General purpose | := (assignment) | done | |
| General purpose | membership | done | |
| General purpose | get | usable | The keep, filter and aggregate are not yet reflected in the connector interface. |
| General purpose | put | usable | The Connector interface is defined but expressions are not recognized yet. |
| Join expression | []{} | done | |
| Join clause | filter | done | |
| Join clause | keep | done | |
| Join clause | drop | done | |
| Join clause | fold | done | |
| Join clause | unfold | done | |
| Join clause | rename | done | |
| Join clause | := (assignment) | done | |
| Join clause | . (membership) | done | |
| Clauses | rename | done | |
| Clauses | filter | done | |
| Clauses | keep | done | |
| Clauses | calc | todo | |
| Clauses | attrcalc | todo | |
| Clauses | aggregate | todo | |
| Conditional | if-then-else | todo | |
| Conditional | nvl | usable | Dataset as input not implemented. |
| Validation | Comparisons (>, <, >=, <=, =, <>) | usable | Only inside join expression (no lifting). |
| Validation | in, not in, between | todo | |
| Validation | isnull | done | Implemented syntaxes are isnull(value), value is null, and value is not null. |
| Validation | exist_in, not_exist_in | todo | |
| Validation | exist_in_all, not_exist_in_all | todo | |
| Validation | check | usable | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | todo | |
| Validation | match_values | todo | |
| Statistical | min, max | todo | |
| Statistical | hierarchy | usable | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
| Statistical | aggregate | todo | |
| Relational | union | done | |
| Relational | intersect | todo | |
| Relational | symdiff | todo | |
| Relational | setdiff | todo | |
| Relational | merge | todo | |
| Boolean | and | usable | Only inside join expression (no lifting). |
| Boolean | or | usable | Only inside join expression (no lifting). |
| Boolean | xor | usable | Only inside join expression (no lifting). |
| Boolean | not | usable | Only inside join expression (no lifting). |
| Mathematical | unary plus and minus | usable | Only inside join expression (no lifting). |
| Mathematical | addition, subtraction | usable | Only inside join expression (no lifting). |
| Mathematical | multiplication, division | usable | Only inside join expression (no lifting). |
| Mathematical | round | todo | |
| Mathematical | abs | todo | |
| Mathematical | trunc | todo | |
| Mathematical | power, exp, nroot | todo | |
| Mathematical | ln, log | todo | |
| Mathematical | mod | todo | |
| String | length | todo | |
| String | concatenation | todo | |
| String | trim | todo | |
| String | upper/lower case | todo | |
| String | substring | todo | |
| String | indexof | todo | |
| String | date_from_string | usable | Dataset as input not implemented. Only YYYY date format accepted. |
