
sdf-cli's Introduction

SDF

This repo contains documentation, examples, benchmarks, and schemas for SDF.

SDF is a multi-dialect SQL compiler, transformation framework, and analytical database engine. It natively compiles SQL dialects, like Snowflake, and connects to their corresponding data warehouses to materialize models.

For more, check out our official documentation at docs.sdf.com.

SDF is open core and built on Apache DataFusion.

Installation

To install the SDF CLI, run the following command:

curl -LSfs https://cdn.sdf.com/releases/download/install.sh | sh -s

For more thorough installation docs, check out the installation guide.

Featured Libraries

SDF has a rich open-source ecosystem. Here are a few libraries that we recommend you check out:

  • SQL Functions - This Rust crate powers SDF's function typing and execution for every supported dialect, which lets our local compute engine, Apache DataFusion, run SQL functions across dialects. The crate is used in the CLI, and contributions are welcome.
  • SDF Materialization - The library used to materialize SDF models. This is packaged up into the CLI and can be found in the .sdfcache.
  • SDF Tests - The library used to write data tests with SDF. This is packaged up into the CLI and can be found in the .sdfcache.
  • SDF Utils - A collection of utilities for working with SDF models. This is packaged up into the CLI and can be found in the .sdfcache. This library should act as a model for how to author your own SDF libraries.
  • SDF Workspace Evaluator - A library for evaluating SDF workspaces with SDF reports. This library contains some of the most used reports and can be used as a model for authoring your own reports. Some of the most popular are dead_column analysis (find columns that are never used and are wasting compute / storage) and column_description_coverage (find columns that are missing descriptions).
  • SDF GitHub Action - Our official GitHub action for running SDF commands in CI/CD.
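The evaluator's reports run inside SDF, and the Python below is not their implementation. It is a hypothetical, engine-free sketch of what the dead_column and column_description_coverage reports described above compute, assuming column-level references are already available as a plain mapping. The names Column, dead_columns, description_coverage, and the toy tables are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple

ColumnRef = Tuple[str, str]  # (table, column)


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    description: Optional[str] = None

    @property
    def ref(self) -> ColumnRef:
        return (self.table, self.name)


def dead_columns(
    columns: Iterable[Column],
    references: Dict[ColumnRef, Set[ColumnRef]],
    output_tables: Set[str],
) -> List[ColumnRef]:
    """Return columns that no downstream column reads.

    `references` maps each downstream column to the upstream columns it reads
    (in projections, joins, or filters). Columns in `output_tables` are assumed
    to be consumed outside the workspace, so they are never reported as dead.
    """
    used: Set[ColumnRef] = set()
    for upstream in references.values():
        used |= upstream
    return [
        c.ref for c in columns
        if c.ref not in used and c.table not in output_tables
    ]


def description_coverage(columns: Iterable[Column]) -> Tuple[float, List[ColumnRef]]:
    """Return the fraction of described columns and the columns missing a description."""
    cols = list(columns)
    missing = [c.ref for c in cols if not (c.description or "").strip()]
    coverage = (len(cols) - len(missing)) / len(cols) if cols else 1.0
    return coverage, missing


# Hypothetical toy workspace: raw.users feeds analytics.user_emails.
columns = [
    Column("raw.users", "id", "Primary key"),
    Column("raw.users", "email"),
    Column("raw.users", "legacy_flag"),
    Column("analytics.user_emails", "id", "User id"),
    Column("analytics.user_emails", "email", "Contact email"),
]
references = {
    ("analytics.user_emails", "id"): {("raw.users", "id")},
    ("analytics.user_emails", "email"): {("raw.users", "email")},
}

print(dead_columns(columns, references, output_tables={"analytics.user_emails"}))
print(description_coverage(columns))
```

Treating final output tables as "used" is a design choice of this sketch: without it, every column of a leaf model would be flagged, even though dashboards or other consumers outside the workspace may read it.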

Non-default libraries like the SDF Workspace Evaluator can be added to an SDF workspace like so:

workspace:
  ...
  dependencies:
    - name: evaluator
      git: https://github.com/sdf-labs/workspace-evaluator/workspace-evaluator.git

Note that SDF Utils, SDF Materialization, and SDF Tests are already included in the SDF CLI and do not need to be added to the workspace.

Structure

This repo is organized as follows:

  • Docs - Contains the official SDF documentation found at docs.sdf.com. All docs are open source, and contributions are welcome.
  • Examples - Contains example SDF projects that demonstrate how to use SDF in practice. These examples are open source and are packaged into the SDF binary. Run sdf new --sample <sample-name> to create a workspace locally from one of these examples.
  • Schemas - Contains the JSON schemas for the SDF configuration YML files. These schemas are auto-generated every time the binary is released. They are used to validate the configuration files in the SDF CLI and can be used to power integrations that leverage lineage or metadata from the SDF compiler artifacts.

Releases

All SDF releases are reflected in this repo as official releases. You can find the latest release on this repo's Releases page. And yes, those release notes are begrudgingly reflective of the release notes in our internal repo.

For in-depth release notes, check out the SDF changelog.

SDF is updated frequently and adheres to a strict major.minor.patch versioning system:

  • Patch versions (the last number changes) include patch fixes and additive improvements, which should not contain any breaking changes.
  • Minor versions (the middle number changes) may include breaking changes. These might be YML schema changes, information schema changes, or changes in SDF's internal logic.
  • Major versions (the first number changes).
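This versioning policy can be checked mechanically before upgrading. A minimal sketch, assuming plain numeric major.minor.patch strings with an optional leading "v" (pre-release or build suffixes are not handled). classify_upgrade and may_break are hypothetical helpers, not part of the SDF CLI, and treating major upgrades as potentially breaking is an assumption, since the policy does not describe them.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string, allowing an optional leading 'v'."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def classify_upgrade(current: str, target: str) -> str:
    """Return which component changed: 'major', 'minor', 'patch', or 'none'."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:  # NamedTuples compare field by field, like plain tuples
        raise ValueError(f"{target} is older than {current}")
    if tgt.major != cur.major:
        return "major"
    if tgt.minor != cur.minor:
        return "minor"
    if tgt.patch != cur.patch:
        return "patch"
    return "none"


def may_break(current: str, target: str) -> bool:
    """Patch upgrades should be non-breaking; minor upgrades may break.

    Treating major upgrades as potentially breaking is an assumption, since
    the policy does not describe them.
    """
    return classify_upgrade(current, target) in {"minor", "major"}


print(classify_upgrade("0.3.21", "0.3.22"), may_break("0.3.21", "0.3.22"))
print(classify_upgrade("0.3.21", "0.4.0"), may_break("0.3.21", "0.4.0"))
```

In practice this means a bump such as 0.3.21 to 0.3.22 should be safe to take directly, while 0.3.x to 0.4.0 is worth checking against the changelog first.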

Contributing

Contributions to our examples and docs are welcome. SDF is still being incubated and is not yet open source, but it is powered by an open source core: Apache DataFusion, with function execution enabled by our open source SQL Functions crate. If a function is missing, mistyped, or hitting an execution error, feel free to contribute to either of those repos. If you have a feature request or a bug report for the compiler, please open an issue on this repo; we'll track it and prioritize it as soon as possible.

sdf-cli's People

Contributors

actions-user, eliasdefaria, evabgood, schulte-lukas, deepyaman


sdf-cli's Issues

Error when selecting from a join with ambiguous column names

Describe the bug

SDF Slack thread

related: apache/datafusion#11993

Scenario

  • sdf run fails (sdf compile has no issue).
  • Two tables are joined on a key without specifying the selected columns, so the join column is ambiguous; e.g. name is ambiguous in the query below:
SELECT * FROM age LEFT JOIN job ON age.name = job.name

Error

SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed

To Reproduce

Call sdf run with the models below:

-- age.sql
SELECT 'Alice' AS name, 25 AS age
UNION ALL
SELECT 'Bob', 30
UNION ALL
SELECT 'Charlie', 28

-- job.sql
SELECT 'Alice' AS name,
    'Engineer' AS job
UNION ALL
SELECT 'Bob','Designer'
UNION ALL
SELECT 'Charlie','Manager'

-- join.sql
SELECT * FROM age LEFT JOIN job ON age.name = job.name

Expected behavior

SDF should flag that there are ambiguous columns that need to be resolved.

Additional Context

Version

sdf 0.3.21

Full stack trace

SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed
caused by
optimize_projections
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "_0" }), None, Some(Bare { table: "_2" }), None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "_0.name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "_2.name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
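Until SDF flags this case, a likely workaround (untested against SDF itself) is to avoid SELECT * across the join and project each column explicitly under a unique name. The sketch below uses Python's standard sqlite3 module purely as a stand-in engine: it is not SDF or DataFusion and does not reproduce SDF1015, it only shows the output column names each form of join.sql produces. EXPLICIT_JOIN is a hypothetical rewrite, not an official fix.

```python
import sqlite3

# Model definitions copied from the reproduction above (age.sql, job.sql).
AGE_SQL = """
SELECT 'Alice' AS name, 25 AS age
UNION ALL SELECT 'Bob', 30
UNION ALL SELECT 'Charlie', 28
"""
JOB_SQL = """
SELECT 'Alice' AS name, 'Engineer' AS job
UNION ALL SELECT 'Bob', 'Designer'
UNION ALL SELECT 'Charlie', 'Manager'
"""

# The original join.sql: SELECT * keeps both join keys, so "name" appears twice.
AMBIGUOUS_JOIN = "SELECT * FROM age LEFT JOIN job ON age.name = job.name"

# Hypothetical rewrite: project and alias each column so every name is unique.
EXPLICIT_JOIN = """
SELECT age.name AS name, age.age AS age, job.job AS job
FROM age LEFT JOIN job ON age.name = job.name
"""


def output_columns(conn: sqlite3.Connection, sql: str) -> list:
    """Return the output column names a query produces."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]


def duplicate_columns(columns: list) -> list:
    """Return the column names that occur more than once, in first-seen order."""
    seen, dups = set(), []
    for col in columns:
        if col in seen and col not in dups:
            dups.append(col)
        seen.add(col)
    return dups


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE VIEW age AS {AGE_SQL}")
conn.execute(f"CREATE VIEW job AS {JOB_SQL}")

ambiguous = output_columns(conn, AMBIGUOUS_JOIN)
explicit = output_columns(conn, EXPLICIT_JOIN)
print(ambiguous, duplicate_columns(ambiguous))
print(explicit, duplicate_columns(explicit))
```

Keeping only one copy of the join key is safe here because, for every matched row of a LEFT JOIN on equality, age.name and job.name hold the same value.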
