tconbeer / sqlfmt Goto Github PK

View Code? Open in Web Editor NEW

347.0 4.0 13.0 1.39 MB

sqlfmt formats your dbt SQL files so you don't have to

Home Page: https://sqlfmt.com

License: Apache License 2.0

Makefile 0.24% Python 98.27% PLpgSQL 1.45% Dockerfile 0.04%

sql python dbt formatter

sqlfmt's Introduction

sqlfmt

sqlfmt formats your dbt SQL files so you don't have to. It is similar in nature to black, gofmt, and rustfmt (but for SQL).

sqlfmt promotes collaboration. An auto-formatter makes it easier to collaborate with your team and solicit contributions from new people. You will never have to mention (or argue about) code style in code reviews again.
sqlfmt is fast. Forget about formatting your code, and spend your time on business logic instead. sqlfmt processes hundreds of files per second and only operates on files that have changed since the last run.
sqlfmt works with Jinja. It formats the code that users look at, and therefore doesn't need to know anything about what happens after the templates are rendered.
sqlfmt integrates with your workflow. As a CLI written in Python, it's easy to install locally on any OS and run in CI. Plays well with dbt, pre-commit, SQLFluff, VSCode, and GitHub Actions. sqlfmt powers the dbt Cloud IDE's Format button.

sqlfmt is not configurable, except for line length. It enforces a single style. sqlfmt maintains comments and some extra newlines, but largely ignores all indentation and line breaks in the input file.

sqlfmt is not a linter. It does not parse your code into an AST; it just lexes it and tracks a small subset of tokens that impact formatting. This lets us "do one thing and do it well:" sqlfmt is very fast, and easier to maintain and extend than linters that need a full SQL grammar.

For now, sqlfmt only works on select, delete, grant, revoke, and create function statements (which is all you need if you use sqlfmt with a dbt project). It is being extended to additional DDL and DML. Visit this tracking issue for more information.

Documentation

Please visit docs.sqlfmt.com for more information on Getting Started, Integrations, the sqlfmt Style, and an API Reference. Or keep reading for an excerpt from the full docs.

Installation

Try it first

Want to test out sqlfmt on a query before you install it? Go to sqlfmt.com to use the interactive, web-based version.

Install Using pipx (recommended)

sqlfmt is a pip-installable Python package listed on PyPI under the name shandy-sqlfmt. You should install it into a virtual environment, which pipx does automatically:

pipx install shandy-sqlfmt

To install with the jinjafmt extra (which will also install the Python code formatter, black):

pipx install shandy-sqlfmt[jinjafmt]

For more installation options, read the docs.

Getting Started

Other prerequisites

sqlfmt will not always produce the formatted output you might want. It might even break your SQL syntax. It is highly recommended to only run sqlfmt on files in a version control system (like git), so that it is easy for you to revert any changes made by sqlfmt. On your first run, be sure to make a commit before running sqlfmt.

There are certain situations where sqlfmt can be considered to be in Beta, or even more mature than that. Those are:

Using sqlfmt to format select statements for one of the major dialects (PostgresSQL, MySQL, Snowflake, BQ, Redshift).
Using sqlfmt to format a dbt project (which may also include jinja and some minimal DDL/DML, like grants, create function, etc.) for one of the major dialects.

However, there are other use cases where sqlfmt is very much alpha:

Formatting some dialects that deviate from ANSI or Postgres, like T-SQL (SQLServer).
Formatting other DDL (create table, insert, etc.) (sqlfmt attempts to be no-op on these statements as much as possible).

In these domains sqlfmt is nowhere near "feature complete" and caution is highly advised.

Using sqlfmt

To list commands and options:

sqlfmt --help

If you want to format all .sql and .sql.jinja files in your current working directory (and all nested directories), simply type:

$ sqlfmt .

If you don't want to format the files you have on disk, you can run sqlfmt with the --check option. sqlfmt will exit with code 1 if the files on disk are not properly formatted:

$ sqlfmt --check .

If you want to print a diff of changes that sqlfmt would make to format a file (but not update the file on disk), you can use the --diff option. --diff also exits with 1 on changes:

$ sqlfmt --diff .

For more commands, see the docs.

Configuring sqlfmt using pyproject.toml

Any command-line option for sqlfmt can also be set in a pyproject.toml file, under a [tool.sqlfmt] section header. Options passed at the command line will override the settings in the config file. See the docs for more information.

The jinjafmt extra

sqlfmt loves properly-formatted jinja, too.

See the docs for more information about using the jinjafmt extra or disabling jinja formatting.

Using sqlfmt with different SQL dialects

sqlfmt's rules are simple, which means it does not have to parse every single token in your query. This allows nearly all SQL dialects to be formatted using sqlfmt's default "polyglot" dialect, which requires no configuration.

The exception to this is ClickHouse, which is case-sensitive where other dialects are not. To prevent the lowercasing of function names, database identifiers, and aliases, use the --dialect clickhouse option when running sqlfmt. For example,

$ sqlfmt . --dialect clickhouse

This can also be configured using the pyproject.toml file:

[tool.sqlfmt]
dialect = "clickhouse"

Note that with this option, sqlfmt will not lowercase most non-reserved keywords, even common ones like sum or count. See (and please join) this discussion for more on this topic.

Integrations

sqlfmt plays nicely with other analytics engineering tools. For more information, see the docs.

dbt

sqlfmt was built for dbt, so only minimal configuration is required. We recommend excluding your target and dbt_packages directories from formatting. You can do this with the command-line --exclude option, or by setting exclude in your pyproject.toml file:

[tool.sqlfmt]
exclude=["target/**/*", "dbt_packages/**/*"]

Other Integrations

Config for other integrations is detailed in the docs linked below:

The sqlfmt style

The only thing you can configure with sqlfmt is the desired line length of the formatted file. You can do this with the --line-length or -l options. The default is 88.

sqlfmt borrows elements from well-accepted styles from other programming languages. It places opening brackets on the same line as preceding function names (like black for python and 1TBS for C). It indents closing brackets to the same depth as the opening bracket (this is extended to statements that must be closed, like case and end).

The sqlfmt style is as simple as possible, with little-to-no special-casing of formatting concerns. While at first blush, this may not create a format that is as "nice" or "expressive" as hand-crafted indentation, over time, as you grow accustomed to the style, formatting becomes transparent and the consistency will allow you to jump between files, projects, and even companies much faster.

Why lowercase?

Because SQL is code! But there are other good reasons too.

Why trailing commas?

Using trailing commas follows the convention of every other written language and programming language. But wait, there's more.

Contributing

Providing Feedback

We'd love to hear from you! Open an Issue to request new features, report bad formatting, or say hello.

Setting up Your Dev Environment and Running Tests

Install Poetry v1.2 or higher if you don't have it already. You may also need or want pyenv, make, and gcc. A complete setup from a fresh install of Ubuntu can be found here.
Clone this repo into a directory (let's call it sqlfmt), then cd sqlfmt.
Use poetry install --all-extras --sync to install the project (editable) and its dependencies (including the jinjafmt and sqlfmt_primer extras) into a new virtual env.
Use poetry shell to spawn a subshell.
Type make to run all tests and linters, or run pytest, black, flake8, isort, and mypy individually.

Updating primer repos to reflect formatting changes

Make sure all changes are committed to sqlfmt.
Check out main in the repo and make sure you pull changes locally.
Check out the unformatted tag in the repo with git checkout -b chore/apply-abc123 unformatted where abc123 is the hash of the most recent sqlfmt commit (from 1).
Run sqlfmt against the working tree, then git add . and git commit -m "chore: apply sqlfmt abc123".
We will have conflicts with main that we want to ignore, so merge main into this branch, ignoring anything on main: git merge -s ours main.
Push and open a PR; squash and merge. Grab the commit SHA.
Paste the commit SHA as a ref into primer.py.
Run sqlfmt_primer -k to clear the cache, then update the stats in primer.py to match the results.

sqlfmt's People

Contributors

Stargazers

Watchers

Forkers

ljhopkins2 mtmbutler rk4mile kxzk joseguillermoaraya smcgivern markns pgoslatara benjamin-awd leonandrade jomido aj-22 ryaminal

sqlfmt's Issues

Add support for `like any` and `like all`

see https://docs.snowflake.com/en/sql-reference/functions/like_any.html

Fix "complex_logic" issue in src/sqlfmt/splitter.py

Consider simplifying this complex logical expression.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/splitter.py#issue_61f46a9f597a090001000042

Bug: Jinjafmt adds space after dbt_utils macro call (Can't reproduce)

Describe the bug
Jinjafmt adds a space between the macro name and its opening parens. Thankfully jinja doesn't seem to mind, but it's bad style.

To Reproduce

{{ dbt_utils.surrogate_key(['users.id', "'users'"]) }} as id,

Expected behavior

{{ dbt_utils.surrogate_key(['users.id', "'users'"]) }} as id,

Actual behavior

{{ dbt_utils.surrogate_key (['users.id', "'users'"]) }} as id,

Additional context
What is the output of sqlfmt --version?
0.5.1

More unstable indentation in multiline jinja

Describe the bug
Looks like #138 should not have been closed.

I suspect this is happening when jinjafmt is enabled, but black is not able to parse the string, so we abort formatting; we don't detect this today and probably need to handle this case and not modify the indentation (same as other instances where jinjafmt no-ops)

To Reproduce

...
         select
             {{
                 dbt_utils.star(
                      from=ref('prep_crm_user'),
                      except=['CREATED_BY','UPDATED_BY','MODEL_CREATED_DATE','MODEL_UPDATED_DATE','DBT_UPDATED_AT','DBT_CREATED_AT']                           
                )
             }}
         from {{ ref("prep_crm_user") }}
...

Expected behavior
This was produced by 0.8.0, so it should no-op

Actual behavior

...
         select
             {{
                 dbt_utils.star(
                              from=ref('prep_crm_user'),
                              except=['CREATED_BY','UPDATED_BY','MODEL_CREATED_DATE','MODEL_UPDATED_DATE','DBT_UPDATED_AT','DBT_CREATED_AT']                           
                        )
             }}
         from {{ ref("prep_crm_user") }}
...

Additional context
What is the output of sqlfmt --version?
0.8.0

Feature: add a command line option to reset the cache

helpful for debugging unstable formatting

-k/--reset-cache

Bug: Comments inside of --fmt:off blocks are still formatted

Describe the bug
The --fmt: off tags should stop all formatting of the contained code, but today that is not the case, since comments may be rendered differently in those blocks

To Reproduce

-- fmt: off
where something_long = something_else_long -- and there is a long comment that should stay on this line
-- fmt: on

Expected behavior

-- fmt: off
where something_long = something_else_long -- and there is a long comment that should stay on this line
-- fmt: on

Actual behavior

-- fmt: off
-- and there is a long comment that should stay on this line
where something_long = something_else_long
-- fmt: on

Additional context
What is the output of sqlfmt --version?
0.5.1

Feature: Support additional operators

Support ?, the "existence" operator in pg
range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> https://www.postgresql.org/docs/current/rangetypes.html
EXISTS, ALL, ANY, MINUS https://docs.snowflake.com/en/sql-reference/operators.html

Maybe many others here: https://www.postgresql.org/docs/current/functions.html

Try to replace `~` (jinja string concatenation) with another operator before formatting with black

Also consider replacing ~ (jinja string concatenation) with + or similar?

Search for an unused operator (+, -, *, /) and substitute that. black doesn't care that you can't actually divide strings.

Originally posted by @tconbeer in #177 (comment)

Unstable formatting with chained jinja expressions and operators

Describe the bug
sqlfmt should always produce the same output on a given query, even when run multiple times. When this is not the case, we call this "unstable formatting".

There is an interaction between jinja expressions (and jinjafmt) and operators that is causing unstable formatting.

To Reproduce
This file in the gitlab project has already been formatted by 0.8.0.

Run it through sqlfmt --diff

Expected behavior
sqlfmt should no-op

Actual behavior

--- source_query
+++ formatted_query
@@ -29,25 +29,38 @@
         as dim_sales_qualified_source_id,
         {{ get_keyed_nulls("order_type.dim_order_type_id") }} as dim_order_type_id
     from {{ ref("sheetload_sales_funnel_targets_matrix_source") }}
-    left join date on {{
+    left join
+        date
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.month"
             )
-        }} = {{ sales_funnel_text_slugify("date.fiscal_month_name_fy") }}
-    left join sales_qualified_source on {{
+        }}
+        = {{ sales_funnel_text_slugify("date.fiscal_month_name_fy") }}
+    left join
+        sales_qualified_source
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.opportunity_source"
             )
-        }} = {{
+        }}
+        =
+        {{
             sales_funnel_text_slugify(
                 "sales_qualified_source.sales_qualified_source_name"
             )
         }}
-    left join order_type on {{
+    left join
+        order_type
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.order_type"
             )
-        }} = {{ sales_funnel_text_slugify("order_type.order_type_name") }}
+        }}
+        = {{ sales_funnel_text_slugify("order_type.order_type_name") }}

 ),
 fy22_user_hierarchy as (
@@ -94,11 +107,9 @@
     from target_matrix
     left join
         fy22_user_hierarchy
-        on {{ sales_funnel_text_slugify("target_matrix.area") }} = {{
-            sales_funnel_text_slugify(
-                "fy22_user_hierarchy.crm_opp_owner_area_stamped"
-            )
-        }}
+        on {{ sales_funnel_text_slugify("target_matrix.area") }}
+        =
+        {{ sales_funnel_text_slugify("fy22_user_hierarchy.crm_opp_owner_area_stamped") }}
     where target_matrix.fiscal_year = 2022

     union all
@@ -126,7 +137,8 @@
     left join
         fy23_and_beyond_user_hierarchy
         on {{ sales_funnel_text_slugify("target_matrix.area") }}
-        = {{
+        =
+        {{
             sales_funnel_text_slugify(
                 "fy23_and_beyond_user_hierarchy.crm_opp_owner_sales_segment_geo_region_area_stamped"
             )

Additional context
What is the output of sqlfmt --version?
0.8.0

Fix "file_lines" issue in src/sqlfmt/line.py

File line.py has 785 lines of code (exceeds 250 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003e

Bug: Unstable indentation for multiline jinja

Describe the bug
Sometimes sqlfmt increases the indentation on a mulitline jinja tag when it's run multiple times.

To Reproduce

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                datepart="day",
                start_date="'2021-01-01'::date",
                end_date="sysdate() + interval '1 year'"
            )
        }}
    ),

Expected behavior

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                datepart="day",
                start_date="'2021-01-01'::date",
                end_date="sysdate() + interval '1 year'"
            )
        }}
    ),

Actual behavior

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                                                    datepart="day",
                                                    start_date="'2021-01-01'::date",
                                                    end_date="sysdate() + interval '1 year'"
                                                )
        }}
    ),

Additional context
What is the output of sqlfmt --version?
0.5.1

Bug: semicolons should never be indented

semicolons end queries, so they always have a depth of zero

when we reach a semicolon, open_brackets should be reset to []

between's "and" should be higher priority than other "ands"

Describe the bug
Today we don't split before a between's and, see #124
But this is wrong, we should still split before it, we should just merge it more aggressively than other ands.

To Reproduce
From Gitlab

Expected behavior

            between namespace_snapshots.valid_from::date
            and namespace_snapshots.valid_to_::date

Actual behavior

            between namespace_snapshots.valid_from
            ::date and namespace_snapshots.valid_to_::date

Additional context
What is the output of sqlfmt --version?
commit 88617d2 (0.10pre)

Bug: Does not support snapshot blocks

Describe the bug
We support common jinja blocks, but we forgot about {% snapshot my_snapshot %}, so the {% endsnapshot %} tag can get put on the wrong line.

To Reproduce

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * from {{ ref('stg_my_model') }}
{% endsnapshot %}

Expected behavior

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * 
from {{ ref('stg_my_model') }}
{% endsnapshot %}

Actual behavior

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * 
from {{ ref('stg_my_model') }}{% endsnapshot %}

Additional context
What is the output of sqlfmt --version? 0.5.1

Add more granular operator priorities to capture order of operations

e.g., today this won't merge onto a single line, since it's too long, so we leave it:

foofoofoofoofoofoofoofoofoofoo
/ barbarbarbarbarbarbar
+ foofoofoofoofoofoofoofoofoofoo
/ bazbazbazbazbazbaz
+ foofoofoofoofoofoofoofoofoofoo
/ quxquxquxquxqux

but it would be better to merge the / lines:

foofoofoofoofoofoofoofoofoofoo / barbarbarbarbarbarbar
+ foofoofoofoofoofoofoofoofoofoo / bazbazbazbazbazbaz
+ foofoofoofoofoofoofoofoofoofoo / quxquxquxquxqux

Customize formatting for certain jinja expressions/macros

Some jinja expressions contain keywords, but they're all just formatted like names.

For example, {{ dbt_utils.group_by(n) }} tends to get wrapped onto the line above it. sqlfmt doesn't have any way to know that the macro compiles to a keyword, so it wants to one-line it with the end of the where statement. If sqlfmt knew this compiled to GROUP BY, it would not do that.

I've thought about this before, and I don't have a great solution except for hard-coding it in as an exception (it's one of very few macros that start with a keyword).

other than the hacky/hard-coded method, we could consider an inline comment that could tell sqlfmt how to treat a jinja expression (if the default treatment doesn't work). So something like:

{{ dbt_utils.group_by(6) }}  -- fmt: group by

If anyone else has solutions/ideas, I'm open to them, but I do not want to go down the sqlfluff road and jinja-template the sql.

For now, I'd suggest either 1) using a -- fmt: off comment on this line, or b) not using dbt_utils.group_by() and instead implement a macro called range:

{%- macro range(n) -%}
{% for i in range(1, n + 1) -%}
    {{ i }}{{ ',' if not loop.last }}   
{%- endfor -%}
{%- endmacro -%}

and then call it with:

...
where my_condition
group by {{ range(10) }}

Originally posted by @tconbeer in #165 (comment)

Psycopg placeholders

Describe the bug
sqlfmt breaks placeholders used by psycopg (python postgres driver), such as %s and %(name)s

To Reproduce
Format the following code:

INSERT INTO some_table VALUES (%s)

INSERT INTO some_table VALUES (%(foo)s)

Expected behavior
The placeholders should remain intact:

insert into some_table values(%s)

insert into some_table values(%(foo)s)

Actual behavior
sqlfmt inserts extra spaces:

insert into some_table values(% s)

insert into some_table values(% (foo) s)

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.9.0

Add support for ClickHouse dialect

Describe the bug
sqlfmt doesn't support the ClickHouse dialect which is case sensitive.

To Reproduce
input code:
SELECT toString(1) AS Test_string, toDateTime64('2022-05-25', 3) AS Test_dateTime64, ifNull(null, 'TestNull') as testIf, JSONExtractString('{"abc": "hello"}', 'abc') as testJSON

Expected behavior
select toString(1) as Test_string, toDateTime64('2022-05-25', 3) as Test_dateTime64, ifNull(null, 'TestNull') as testIf, JSONExtractString('{"abc": "hello"}', 'abc') as testJSON

Actual behavior
select tostring(1) as test_string, todatetime64('2022-05-25', 3) as test_datetime64, ifnull(NULL, 'TestNull') as testif, jsonextractstring('{"abc": "hello"}', 'abc') as testjson
this output is not a valid ClickHouse code

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.9.0

Feature: format the contents of jinja tags

Today, sqlfmt does not do any formatting inside the curlies of a jinja tag. We should enforce a single space inside each curly, and use black to format the python between.

Feature: create a proper docs site

Consider not lowercasing column names

We use bigquery, which supports column names in mixed case though doesn't allow names which differ only in case:

https://cloud.google.com/bigquery/docs/schemas#column_names

A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. Duplicate column names are not allowed even if the case differs.

When we started using bigquery, we settle on (perhaps misguided) approach of using camelCase for column names. Now it is too late to fix it as that convention is used in hundreds of column names. It works mostly fine - as bigquery persists the case of column name.

Currently sqlfmt lowercases column names. An alternative way to think is that column names are like variable names and it is responsibility of author to get it right - so sqlfmt should just leave it as is.

If it is really common for sqlfmt users to use inconsistent case, and so it is a desirable feature to convert it all to lowercase - perhaps considering an option to leave column names alone.

Line break introduced in the middle of `cross join`

Hi, wanted to lead off by saying how excited we are by this tool. Thanks for your work on it. Now, to the issue at hand.

sqlfmt 0.4.2 formats this query with a line break between "cross" and "join":

select foo.*, bar.*
from {{ ref('foo_foo_foo') }} as foo cross
join {{ ref('bar_bar_bar') }} as bar

I would prefer to have cross join together on the new line, like so:

select foo.*, bar.*
from {{ ref('foo_foo_foo') }} as foo
cross join {{ ref('bar_bar_bar') }} as bar

Feel free to close the issue if this is intended behavior, but I found it surprising and suspect it might be a bug.

Refactor: Separate dev and test dependencies

#98 should have been caught in CI, but wasn't, since our test env also installs black by default.

Poetry's next release will support dependency groups, which we can use to solve this. Just waiting for that new version to be released.

We should also use this opportunity to clean up the Lint and Test workflow so we only lint the code once.

missing space around union all when formatting dbt_utils.unpivot

Describe the bug
When using the dbt_utils.unpivot SQL Generator after sqlfmt has run the compiled SQL is missing spaces around the union all between select statements. This does not happen if the statement is not put through sqlfmt first. A working format for the unpivot generator is provided under additional context
To Reproduce
sqlfmt . && dbt build -s <model using unpivot>

Expected behavior
When complied each union all should be separated:

WITH unpivoted AS (
    

    select

      cast('ID' as 
    varchar
) as column_name,
      cast(  
           ID
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select

      cast('USERNAME' as 
    varchar
) as column_name,
      cast(  
           USERNAME
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select

      cast('LAST_NAME' as 
    varchar
) as column_name,
      cast(  
           LAST_NAME
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select ...

Actual behavior

with
    unpivoted as (
        

select

    cast('ID' as  varchar) as column_name,
    cast(
         ID
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect

    cast('USERNAME' as  varchar) as column_name,
    cast(
         USERNAME
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect

    cast('LAST_NAME' as  varchar) as column_name,
    cast(
         LAST_NAME
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect...

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.6.0

Working format

 WITH unpivoted AS (
     {{ dbt_utils.unpivot(
         relation = <model_name>,
         cast_to = 'decimal',
         field_name = 'column_name',
         value_name = 'percentage'
     ) }}
 )
 SELECT
     *
 FROM
     unpivoted

Feature: One-line short queries between semicolons or set operations

Describe the feature
When we have multiple queries in a single file, it would be nice if those queries were formatted the same as if they were in their own files. These queries may be separated by semicolons or set operations, like union

Desired behavior

select * from my_table

union all

select * from another_table
;

select * from another_query_entirely

Current behavior

select * 
from my_table

union all

select * 
from another_table
;

select *
from another_query_entirely

Additional context
What is the output of sqlfmt --version?
0.5.1

Feature: create a VS Code extension to format on save

While it's possible to run sqlfmt from the terminal inside VS Code, they support 3rd party formatters via their extensions API, and it would be great if we could support that.

The extension has to be written in Typescript.

There is a scaffold to get started: https://code.visualstudio.com/api/get-started/your-first-extension

As an example of a formatting extension, someone created this for black (before the official python extension added black support): https://github.com/joslarson/black-vscode

jinjafmt adds trailing comma to macro args, breaking it

Describe the bug
Jinja doesn't like trailing commas in macro arguments (?) but black inserts them when the args wrap to multiple lines

To Reproduce

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256
) -%}

Expected behavior

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256
) -%}

Actual behavior

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256,
) -%}

Unfortunately this doesn't compile because of the trailing comma after 256

Additional context
What is the output of sqlfmt --version?
0.6.0

Don't always stubborn merge lines that open brackets

Describe the bug
This is ugly. We force merge the lines that have function calls, since they open brackets, but we probably shouldn't, so that all the sibling operators stay at the same level.

The new rule should be: don't stubborn-merge p1 operators that open brackets if the previous segment also starts with an operator. Also, consider that p1 operators might need a different sequence of stubborn-merge strategies from p0 operators.

To Reproduce
From gitlab's project (indented a few more times, so it doesn't fit on a single line):

SELECT 
    LOWER(table_catalog)||'.'||LOWER(table_schema)||'.'||LOWER(table_name) AS fqd_name,
    ...

Expected behavior

select
    lower(table_catalog)
    || '.' 
    || lower(table_schema)
    || '.'
    || lower(table_name) as fqd_name,

Actual behavior

select
    lower(table_catalog)
    || '.' || lower(
        table_schema
    )
    || '.'
    || lower(table_name) as fqd_name,

Additional context
What is the output of sqlfmt --version?

aliases should always merge if there is space

Describe the bug
Sometimes aliases don't get merged onto the line above, even when there is space. Could we be using the token's raw prefix somewhere, instead of the computed prefix?

To Reproduce
from the gitlab project

), final AS (

    SELECT DISTINCT
      ...
      state_or_province                                                                                                  AS geozone_state_or_province,
      MIN(valid_from) OVER (PARTITION BY country,state_or_province,locality_group)::DATE                                 AS first_file_date,
      MAX(valid_to) OVER (PARTITION BY country,state_or_province,locality_group)::DATE                                   AS last_file_date,
      ...
)

Expected behavior

    ),
    final as (
        select distinct
            ...
            min(valid_from) over (
                partition by country, state_or_province, locality_group
            )::date as first_file_date,
            max(valid_to) over (
                partition by country, state_or_province, locality_group
            )::date as last_file_date,
            ...

Actual behavior

    ),
    final as (
        select distinct
            ...
            min(valid_from) over (
                partition by country, state_or_province, locality_group
            )::date
            as first_file_date,
            max(valid_to) over (
                partition by country, state_or_province, locality_group
            )::date
            as last_file_date,
            ...

Additional context
What is the output of sqlfmt --version?

between's "and" split like logical operator

Describe the bug
When using between ___ and ____, the line is always split before the and

To Reproduce
Use the between operator in a where clause, such as the following:

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from
    towershift
where
    radio != 'CDMA'
    and mcc between 200 and 799
    and net between 1 and 999
    and area between 0 and 65535
    and cell between 0 and 268435455
    and lon between -180 and 180
    and lat between -90 and 90

Expected behavior

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from towershift
where
    radio != 'CDMA'
    and mcc between 200 and 799
    and net between 1 and 999
    and area between 0 and 65535
    and cell between 0 and 268435455
    and lon between -180 and 180
    and lat between -90 and 90

Actual behavior

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from towershift
where
    radio != 'CDMA'
    and mcc between 200
    and 799
    and net between 1
    and 999
    and area between 0
    and 65535
    and cell between 0
    and 268435455
    and lon between -180
    and 180
    and lat between -90
    and 90

Additional context

% sqlfmt --version
sqlfmt, version 0.5.0

sqlfmt -l 80 test.sql
1 file formatted.
0 files left unchanged.
test.sql formatted.

refactor merger.py to use a Segment class

Segment is a List[Line]. Would really help with readability/maintenance of merger.py to have segment methods in a different class:

merger._split_into_segments becomes segment.create_segments_from_lines
merger._tail_closes_head becomes Segment.tail_closes_head
merger._get_first_nonblank_line becomes Segment.head (add a Segment.tail also)
merger._segment_continues_operator_sequence becomes Segment.continues_operator_sequence
merger._get_remainder_of_segment becomes Segment.split

Fork primer repos to make it easier to detect changes

Primer should run against already-formatted repos.

stubborn merge segment after a singleton operator

Describe the bug
If an operator ends up on a line by itself (after splitting), we don't try hard enough to merge the segment after that operator into the operator's line, which can result in ugly output (for somewhat pathological input)

To Reproduce
From gitlab's project:

{{column|lower}} || 
ENCRYPT_RAW(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'), 
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'), 
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Expected behavior

{{ column | lower }} || encrypt_raw(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'),
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'),
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Actual behavior

{{ column | lower }} ||
encrypt_raw(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'),
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'),
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Additional context
What is the output of sqlfmt --version?
Commit 88617d2 (v0.10)

Unstable formatting for jinja config block

Describe the bug
The formatter is unstable with dbt config blocks.

Likely related to #175 but not sure.

To Reproduce
Start with a simple file with a config block definition, such as config_block.txt, and make a minor change. Run the formatter and then make another minor change and run the formatter again. You should see the indentation of the config variables increasing each time (4 more spaces).

Expected behavior
The formatter should produce a stable result.

Actual behavior
The formatter is unstable because it keeps indenting the config variables.

Additional context
sqlfmt, version 0.9.0

Fix "method_complexity" issue in src/sqlfmt/merger.py

Function _maybe_merge_lines_split_by_operators has a Cognitive Complexity of 24 (exceeds 5 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/merger.py#issue_61fc0af0ef6b070001000051

Improve formatting for chained boolean operators with complex expressions

Describe the bug
Reported by @Rainymood on discourse.

We're not very smart about building a hierarchy of chained boolean expressions; mixing simple expressions with complex expressions can produce unexpected and hard-to-read formatting.

To Reproduce

select *
from historic_orders h
where 1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Expected behavior

select *
from historic_orders h
where
    1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Actual behavior

select *
from historic_orders h
where
    1 = 1 and order_type = 'clothing' and category = "pants" and dt_utc > date(
        '2020-10-01'
    ) and is_test is false

Additional context
What is the output of sqlfmt --version?
0.8.0

Fix: Support numbered field references, `$2`, `my_table.$3`, and stage references, `@my_stage`

These currently raise a parsing error, but:

they are allowed in a select statement by snowflake: https://docs.snowflake.com/en/sql-reference/sql/select.html
they reference arguments in pg udfs: https://www.postgresql.org/docs/14/sql-select.html

Feature: read config from pyproject.toml

sqlfmt does not have many configuration options, but it would be nice to be able to set the config we do have in a pyproject.toml file in the root of a project directory

Fix "complex_logic" issue in src/sqlfmt/line.py

Consider simplifying this complex logical expression.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003d

Remove extra newlines and whitespace inside of tokens

Describe the bug
It's possible for a single token to be composed of multiple words, like union all, select top 25, not in, etc. We parse those tokens even if they contain newlines or extra whitespace, like union\n\n all, but we don't then remove those extra characters from the formatted output

To Reproduce

union

 all

Expected behavior

union all

Actual behavior

union

 all

Additional context
What is the output of sqlfmt --version?
0.9.0

Feature: exclude paths

other tools support this; would be helpful to avoid formatting the target and dbt_packages directories in a dbt project

Report Test Coverage to CodeClimate

docs: https://docs.codeclimate.com/docs/github-actions-test-coverage

[bug] dbt_utils.group_by gets trailing whitespace removed, breaking queries

Describe the bug
A clear and concise description of what the bug is.

It looks like whitespace is removed from {{ dbt_utils.group_by }}, sticking it directly into the clause above (a where clause, for example), which breaks the query.

To Reproduce
Steps to reproduce the behavior. If this is a formatting issue, include the input code here.

If you sqlfmt a file that looks like this:

select
  ...
from <table_name>
where <where_clause>
{{ dbt_utils.group_by(10) }}

Expected behavior
A clear and concise description of what you expected to happen. If this is a formatting issue, include the expected formatting here, by manually formatting your code from above.

select
  ...
from <table_name>
where <where_clause>
{{ dbt_utils.group_by(10) }}

Actual behavior
Provide any output generated by sqlfmt here. If this is a formatting issue, include the code generated by sqlfmt.

select
  ...
from <table_name>
where <where_clause>{{ dbt_utils.group_by(10) }}

Additional context
What is the output of sqlfmt --version?

0.6.0

`black` dependency not installed via `pip`

Issue

When doing a first-time, fresh install of sqlfmt, I get a ModuleNotFoundError for black.

I added shandy-sqlfmt==0.4.2 to the requirements.txt in my dbt project, and—in a clean virtualenv—ran pip install -r requirements.txt.

(I know the recommended install method is pipx, but since pip is given as an option, I figured I'd log this FWIW.)

Workaround:

pip install black in the virtualenv
Add the following additional_dependencies line to .pre-commit-config.yaml:

- repo: https://github.com/tconbeer/sqlfmt
  rev: v0.4.2
  hooks:
    - id: sqlfmt
      language_version: python
      additional_dependencies: ['black']
      args: [--line-length=90]

sqlfmt works 👍

Provide workaround for jinja macros that use reserved words

Describe the bug
jinjafmt passes jinja to black, but jinja allows some things that python does not. Specifically, variables (esp. macro arguments) can have names that are reserved words. For example:

{{ dbt_utils.star(from=ref('snowplow_page_views_30'), except=fields_to_exclude|upper) }}

This is valid jinja and dbt sql (from gitlab's project), but from and except are both reserved words in python. Running black -c "from" returns error: cannot format <string>: Cannot parse: 1:4: from

Right now jinjafmt no-ops in these situations, but it wouldn't be that hard to replace these reserved words (followed by an eq, e.g. from=) with a sentinel (e.g., from__sqlfmt__=), format that, and then replace the sentinel with the keyword again. There are 33 reserved words in python

Space added when flattening columns on Snowflake

Given a column with the following sql

        _airbyte_data:"AccountId"::varchar as accountid,

sqlfmt will add additional spaces around : like so

            _airbyte_data : "AccountId"::varchar as accountid,

Union gets improperly indented

Describe the bug
I believe that union gets improperly indented. The first and second blocks are of the same "importance" but union indents it because I assume it thinks it's a function and the opening parentheses (() at the start. I think this is a bug.

To Reproduce
Paste the query in http://sqlfmt.com/

Input code

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
    (
        select *
        from "data_warehouse"."order_status"
        where var1 is not null and var2 is not null
        limit 100
    )

Expected behavior

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)

Actual behavior

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
    (
        select *
        from "data_warehouse"."order_status"
        where var1 is not null and var2 is not null
        limit 100
    )

jinjafmt strips parens from return macro, breaking it

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

{{- return(adapter.dispatch('my_macro', 'my_package')(arg1, arg2)) -}}

Expected behavior

{{- return(adapter.dispatch('my_macro', 'my_package')(arg1, arg2)) -}}

Actual behavior
This is not valid jinja, and won't compile

{{- return adapter.dispatch('my_macro', 'my_package')(arg1, arg2) -}}

Additional context
What is the output of sqlfmt --version?
0.6.0

add support for the `within group` keyword

Describe the bug
add support for the within group keyword, which seems analagous to over, e.g.,

array_agg(distinct my_field) within group (order by another_field)

This is necessary for proper formatting when this spills onto multiple lines, as below.

To Reproduce
from gitlab's project

, services_by_marketing_contact_id AS (

    SELECT
      marketing_contact_order.dim_marketing_contact_id                                           AS dim_marketing_contact_id,
      COUNT(*)                                                                                   AS pql_nbr_integrations_installed,
      ARRAY_AGG(DISTINCT services.service_type) WITHIN GROUP (ORDER BY services.service_type)    AS pql_integrations_installed
    FROM services
    LEFT JOIN project
      ON services.project_id = project.dim_project_id
    LEFT JOIN marketing_contact_order
      ON marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    GROUP BY 1

), users_role_by_marketing_contact_id AS (

    SELECT
      marketing_contact_order.dim_marketing_contact_id,
      ARRAY_AGG(DISTINCT marketing_contact.job_title) WITHIN GROUP (ORDER BY marketing_contact.job_title) AS pql_namespace_creator_job_description
    FROM marketing_contact_order
    ...

Expected behavior

services_by_marketing_contact_id as (
    select
        marketing_contact_order.dim_marketing_contact_id as dim_marketing_contact_id,
        count(*) as pql_nbr_integrations_installed,
        array_agg(distinct services.service_type) within group (
            order by services.service_type
        ) as pql_integrations_installed
    from services
    left join project on services.project_id = project.dim_project_id
    left join
        marketing_contact_order
        on marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    group by 1
),
users_role_by_marketing_contact_id as (

    select
        marketing_contact_order.dim_marketing_contact_id,
        array_agg(distinct marketing_contact.job_title) within group (
            order by marketing_contact.job_title
        ) as pql_namespace_creator_job_description
    from marketing_contact_order
    ...

Actual behavior

services_by_marketing_contact_id as (
    select
        marketing_contact_order.dim_marketing_contact_id as dim_marketing_contact_id,
        count(*) as pql_nbr_integrations_installed,
        array_agg(
            distinct services.service_type) within group(order by services.service_type
        ) as pql_integrations_installed
    from services
    left join project on services.project_id = project.dim_project_id
    left join
        marketing_contact_order
        on marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    group by 1
),
users_role_by_marketing_contact_id as (

    select
        marketing_contact_order.dim_marketing_contact_id,
        array_agg(
            distinct marketing_contact.job_title
        ) within group(order by marketing_contact.job_title
        ) as pql_namespace_creator_job_description
    from marketing_contact_order
    ...

Additional context
What is the output of sqlfmt --version?

Fix "file_lines" issue in src/sqlfmt/line.py

File line.py has 785 lines of code (exceeds 250 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003e

tconbeer / sqlfmt Goto Github PK

sqlfmt's Introduction

sqlfmt

Documentation

Installation

Try it first

Install Using pipx (recommended)

Getting Started

Other prerequisites

Using sqlfmt

Configuring sqlfmt using pyproject.toml

The jinjafmt extra

Using sqlfmt with different SQL dialects

Integrations

dbt

Other Integrations

The sqlfmt style

Why lowercase?

Why trailing commas?

Contributing

Providing Feedback

Setting up Your Dev Environment and Running Tests

Updating primer repos to reflect formatting changes

sqlfmt's People

Contributors

Stargazers

Watchers

Forkers

sqlfmt's Issues

Issue

Workaround:

Recommend Projects

Recommend Topics

Recommend Org