tconbeer / sqlfmt Goto Github PK

View Code? Open in Web Editor NEW

352.0 4.0 14.0 1.46 MB

sqlfmt formats your dbt SQL files so you don't have to

Home Page: https://sqlfmt.com

License: Apache License 2.0

Makefile 0.24% Python 98.28% PLpgSQL 1.45% Dockerfile 0.04%

sql python dbt formatter

sqlfmt's Issues

[bug] dbt_utils.group_by gets trailing whitespace removed, breaking queries

Describe the bug
A clear and concise description of what the bug is.

It looks like whitespace is removed from {{ dbt_utils.group_by }}, sticking it directly into the clause above (a where clause, for example), which breaks the query.

To Reproduce
Steps to reproduce the behavior. If this is a formatting issue, include the input code here.

If you sqlfmt a file that looks like this:

select
  ...
from <table_name>
where <where_clause>
{{ dbt_utils.group_by(10) }}

Expected behavior
A clear and concise description of what you expected to happen. If this is a formatting issue, include the expected formatting here, by manually formatting your code from above.

select
  ...
from <table_name>
where <where_clause>
{{ dbt_utils.group_by(10) }}

Actual behavior
Provide any output generated by sqlfmt here. If this is a formatting issue, include the code generated by sqlfmt.

select
  ...
from <table_name>
where <where_clause>{{ dbt_utils.group_by(10) }}

Additional context
What is the output of sqlfmt --version?

0.6.0

Unstable formatting with chained jinja expressions and operators

Describe the bug
sqlfmt should always produce the same output on a given query, even when run multiple times. When this is not the case, we call this "unstable formatting".

There is an interaction between jinja expressions (and jinjafmt) and operators that is causing unstable formatting.

To Reproduce
This file in the gitlab project has already been formatted by 0.8.0.

Run it through sqlfmt --diff

Expected behavior
sqlfmt should no-op

Actual behavior

--- source_query
+++ formatted_query
@@ -29,25 +29,38 @@
         as dim_sales_qualified_source_id,
         {{ get_keyed_nulls("order_type.dim_order_type_id") }} as dim_order_type_id
     from {{ ref("sheetload_sales_funnel_targets_matrix_source") }}
-    left join date on {{
+    left join
+        date
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.month"
             )
-        }} = {{ sales_funnel_text_slugify("date.fiscal_month_name_fy") }}
-    left join sales_qualified_source on {{
+        }}
+        = {{ sales_funnel_text_slugify("date.fiscal_month_name_fy") }}
+    left join
+        sales_qualified_source
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.opportunity_source"
             )
-        }} = {{
+        }}
+        =
+        {{
             sales_funnel_text_slugify(
                 "sales_qualified_source.sales_qualified_source_name"
             )
         }}
-    left join order_type on {{
+    left join
+        order_type
+        on
+        {{
             sales_funnel_text_slugify(
                 "sheetload_sales_funnel_targets_matrix_source.order_type"
             )
-        }} = {{ sales_funnel_text_slugify("order_type.order_type_name") }}
+        }}
+        = {{ sales_funnel_text_slugify("order_type.order_type_name") }}

 ),
 fy22_user_hierarchy as (
@@ -94,11 +107,9 @@
     from target_matrix
     left join
         fy22_user_hierarchy
-        on {{ sales_funnel_text_slugify("target_matrix.area") }} = {{
-            sales_funnel_text_slugify(
-                "fy22_user_hierarchy.crm_opp_owner_area_stamped"
-            )
-        }}
+        on {{ sales_funnel_text_slugify("target_matrix.area") }}
+        =
+        {{ sales_funnel_text_slugify("fy22_user_hierarchy.crm_opp_owner_area_stamped") }}
     where target_matrix.fiscal_year = 2022

     union all
@@ -126,7 +137,8 @@
     left join
         fy23_and_beyond_user_hierarchy
         on {{ sales_funnel_text_slugify("target_matrix.area") }}
-        = {{
+        =
+        {{
             sales_funnel_text_slugify(
                 "fy23_and_beyond_user_hierarchy.crm_opp_owner_sales_segment_geo_region_area_stamped"
             )

Additional context
What is the output of sqlfmt --version?
0.8.0

Fix: Support numbered field references, `$2`, `my_table.$3`, and stage references, `@my_stage`

These currently raise a parsing error, but:

they are allowed in a select statement by snowflake: https://docs.snowflake.com/en/sql-reference/sql/select.html
they reference arguments in pg udfs: https://www.postgresql.org/docs/14/sql-select.html

Space added when flattening columns on Snowflake

Given a column with the following sql

        _airbyte_data:"AccountId"::varchar as accountid,

sqlfmt will add additional spaces around : like so

            _airbyte_data : "AccountId"::varchar as accountid,

aliases should always merge if there is space

Describe the bug
Sometimes aliases don't get merged onto the line above, even when there is space. Could we be using the token's raw prefix somewhere, instead of the computed prefix?

To Reproduce
from the gitlab project

), final AS (

    SELECT DISTINCT
      ...
      state_or_province                                                                                                  AS geozone_state_or_province,
      MIN(valid_from) OVER (PARTITION BY country,state_or_province,locality_group)::DATE                                 AS first_file_date,
      MAX(valid_to) OVER (PARTITION BY country,state_or_province,locality_group)::DATE                                   AS last_file_date,
      ...
)

Expected behavior

    ),
    final as (
        select distinct
            ...
            min(valid_from) over (
                partition by country, state_or_province, locality_group
            )::date as first_file_date,
            max(valid_to) over (
                partition by country, state_or_province, locality_group
            )::date as last_file_date,
            ...

Actual behavior

    ),
    final as (
        select distinct
            ...
            min(valid_from) over (
                partition by country, state_or_province, locality_group
            )::date
            as first_file_date,
            max(valid_to) over (
                partition by country, state_or_province, locality_group
            )::date
            as last_file_date,
            ...

Additional context
What is the output of sqlfmt --version?

add support for the `within group` keyword

Describe the bug
add support for the within group keyword, which seems analagous to over, e.g.,

array_agg(distinct my_field) within group (order by another_field)

This is necessary for proper formatting when this spills onto multiple lines, as below.

To Reproduce
from gitlab's project

, services_by_marketing_contact_id AS (

    SELECT
      marketing_contact_order.dim_marketing_contact_id                                           AS dim_marketing_contact_id,
      COUNT(*)                                                                                   AS pql_nbr_integrations_installed,
      ARRAY_AGG(DISTINCT services.service_type) WITHIN GROUP (ORDER BY services.service_type)    AS pql_integrations_installed
    FROM services
    LEFT JOIN project
      ON services.project_id = project.dim_project_id
    LEFT JOIN marketing_contact_order
      ON marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    GROUP BY 1

), users_role_by_marketing_contact_id AS (

    SELECT
      marketing_contact_order.dim_marketing_contact_id,
      ARRAY_AGG(DISTINCT marketing_contact.job_title) WITHIN GROUP (ORDER BY marketing_contact.job_title) AS pql_namespace_creator_job_description
    FROM marketing_contact_order
    ...

Expected behavior

services_by_marketing_contact_id as (
    select
        marketing_contact_order.dim_marketing_contact_id as dim_marketing_contact_id,
        count(*) as pql_nbr_integrations_installed,
        array_agg(distinct services.service_type) within group (
            order by services.service_type
        ) as pql_integrations_installed
    from services
    left join project on services.project_id = project.dim_project_id
    left join
        marketing_contact_order
        on marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    group by 1
),
users_role_by_marketing_contact_id as (

    select
        marketing_contact_order.dim_marketing_contact_id,
        array_agg(distinct marketing_contact.job_title) within group (
            order by marketing_contact.job_title
        ) as pql_namespace_creator_job_description
    from marketing_contact_order
    ...

Actual behavior

services_by_marketing_contact_id as (
    select
        marketing_contact_order.dim_marketing_contact_id as dim_marketing_contact_id,
        count(*) as pql_nbr_integrations_installed,
        array_agg(
            distinct services.service_type) within group(order by services.service_type
        ) as pql_integrations_installed
    from services
    left join project on services.project_id = project.dim_project_id
    left join
        marketing_contact_order
        on marketing_contact_order.dim_namespace_id = project.dim_namespace_id
    group by 1
),
users_role_by_marketing_contact_id as (

    select
        marketing_contact_order.dim_marketing_contact_id,
        array_agg(
            distinct marketing_contact.job_title
        ) within group(order by marketing_contact.job_title
        ) as pql_namespace_creator_job_description
    from marketing_contact_order
    ...

Additional context
What is the output of sqlfmt --version?

Bug: Jinjafmt adds space after dbt_utils macro call (Can't reproduce)

Describe the bug
Jinjafmt adds a space between the macro name and its opening parens. Thankfully jinja doesn't seem to mind, but it's bad style.

To Reproduce

{{ dbt_utils.surrogate_key(['users.id', "'users'"]) }} as id,

Expected behavior

{{ dbt_utils.surrogate_key(['users.id', "'users'"]) }} as id,

Actual behavior

{{ dbt_utils.surrogate_key (['users.id', "'users'"]) }} as id,

Additional context
What is the output of sqlfmt --version?
0.5.1

Unstable formatting for jinja config block

Describe the bug
The formatter is unstable with dbt config blocks.

Likely related to #175 but not sure.

To Reproduce
Start with a simple file with a config block definition, such as config_block.txt, and make a minor change. Run the formatter and then make another minor change and run the formatter again. You should see the indentation of the config variables increasing each time (4 more spaces).

Expected behavior
The formatter should produce a stable result.

Actual behavior
The formatter is unstable because it keeps indenting the config variables.

Additional context
sqlfmt, version 0.9.0

Try to replace `~` (jinja string concatenation) with another operator before formatting with black

Also consider replacing ~ (jinja string concatenation) with + or similar?

Search for an unused operator (+, -, *, /) and substitute that. black doesn't care that you can't actually divide strings.

Originally posted by @tconbeer in #177 (comment)

Add support for ClickHouse dialect

Describe the bug
sqlfmt doesn't support the ClickHouse dialect which is case sensitive.

To Reproduce
input code:
SELECT toString(1) AS Test_string, toDateTime64('2022-05-25', 3) AS Test_dateTime64, ifNull(null, 'TestNull') as testIf, JSONExtractString('{"abc": "hello"}', 'abc') as testJSON

Expected behavior
select toString(1) as Test_string, toDateTime64('2022-05-25', 3) as Test_dateTime64, ifNull(null, 'TestNull') as testIf, JSONExtractString('{"abc": "hello"}', 'abc') as testJSON

Actual behavior
select tostring(1) as test_string, todatetime64('2022-05-25', 3) as test_datetime64, ifnull(NULL, 'TestNull') as testif, jsonextractstring('{"abc": "hello"}', 'abc') as testjson
this output is not a valid ClickHouse code

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.9.0

Bug: Unstable indentation for multiline jinja

Describe the bug
Sometimes sqlfmt increases the indentation on a mulitline jinja tag when it's run multiple times.

To Reproduce

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                datepart="day",
                start_date="'2021-01-01'::date",
                end_date="sysdate() + interval '1 year'"
            )
        }}
    ),

Expected behavior

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                datepart="day",
                start_date="'2021-01-01'::date",
                end_date="sysdate() + interval '1 year'"
            )
        }}
    ),

Actual behavior

with
    base_spine as (
        {{
            dbt_utils.date_spine(
                                                    datepart="day",
                                                    start_date="'2021-01-01'::date",
                                                    end_date="sysdate() + interval '1 year'"
                                                )
        }}
    ),

Additional context
What is the output of sqlfmt --version?
0.5.1

Fix "file_lines" issue in src/sqlfmt/line.py

File line.py has 785 lines of code (exceeds 250 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003e

More unstable indentation in multiline jinja

Describe the bug
Looks like #138 should not have been closed.

I suspect this is happening when jinjafmt is enabled, but black is not able to parse the string, so we abort formatting; we don't detect this today and probably need to handle this case and not modify the indentation (same as other instances where jinjafmt no-ops)

To Reproduce

...
         select
             {{
                 dbt_utils.star(
                      from=ref('prep_crm_user'),
                      except=['CREATED_BY','UPDATED_BY','MODEL_CREATED_DATE','MODEL_UPDATED_DATE','DBT_UPDATED_AT','DBT_CREATED_AT']                           
                )
             }}
         from {{ ref("prep_crm_user") }}
...

Expected behavior
This was produced by 0.8.0, so it should no-op

Actual behavior

...
         select
             {{
                 dbt_utils.star(
                              from=ref('prep_crm_user'),
                              except=['CREATED_BY','UPDATED_BY','MODEL_CREATED_DATE','MODEL_UPDATED_DATE','DBT_UPDATED_AT','DBT_CREATED_AT']                           
                        )
             }}
         from {{ ref("prep_crm_user") }}
...

Additional context
What is the output of sqlfmt --version?
0.8.0

jinjafmt adds trailing comma to macro args, breaking it

Describe the bug
Jinja doesn't like trailing commas in macro arguments (?) but black inserts them when the args wrap to multiple lines

To Reproduce

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256
) -%}

Expected behavior

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256
) -%}

Actual behavior

{%- macro my_macro(
    arg_here,
    kwarg1=true,
    kwarg2="''",
    kwarg3="model",
    kwarg4="permanent",
    kwarg5=256,
) -%}

Unfortunately this doesn't compile because of the trailing comma after 256

Additional context
What is the output of sqlfmt --version?
0.6.0

Union gets improperly indented

Describe the bug
I believe that union gets improperly indented. The first and second blocks are of the same "importance" but union indents it because I assume it thinks it's a function and the opening parentheses (() at the start. I think this is a bug.

To Reproduce
Paste the query in http://sqlfmt.com/

Input code

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
    (
        select *
        from "data_warehouse"."order_status"
        where var1 is not null and var2 is not null
        limit 100
    )

Expected behavior

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)

Actual behavior

(
    select *
    from "data_warehouse"."order_status"
    where var1 is not null and var2 is not null
    limit 100
)
union
    (
        select *
        from "data_warehouse"."order_status"
        where var1 is not null and var2 is not null
        limit 100
    )

Fix "method_complexity" issue in src/sqlfmt/merger.py

Function _maybe_merge_lines_split_by_operators has a Cognitive Complexity of 24 (exceeds 5 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/merger.py#issue_61fc0af0ef6b070001000051

Refactor: Separate dev and test dependencies

#98 should have been caught in CI, but wasn't, since our test env also installs black by default.

Poetry's next release will support dependency groups, which we can use to solve this. Just waiting for that new version to be released.

We should also use this opportunity to clean up the Lint and Test workflow so we only lint the code once.

between's "and" should be higher priority than other "ands"

Describe the bug
Today we don't split before a between's and, see #124
But this is wrong, we should still split before it, we should just merge it more aggressively than other ands.

To Reproduce
From Gitlab

Expected behavior

            between namespace_snapshots.valid_from::date
            and namespace_snapshots.valid_to_::date

Actual behavior

            between namespace_snapshots.valid_from
            ::date and namespace_snapshots.valid_to_::date

Additional context
What is the output of sqlfmt --version?
commit 88617d2 (0.10pre)

Fix "file_lines" issue in src/sqlfmt/line.py

File line.py has 785 lines of code (exceeds 250 allowed). Consider refactoring.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003e

Report Test Coverage to CodeClimate

docs: https://docs.codeclimate.com/docs/github-actions-test-coverage

Feature: create a proper docs site

Feature: create a VS Code extension to format on save

While it's possible to run sqlfmt from the terminal inside VS Code, they support 3rd party formatters via their extensions API, and it would be great if we could support that.

The extension has to be written in Typescript.

There is a scaffold to get started: https://code.visualstudio.com/api/get-started/your-first-extension

As an example of a formatting extension, someone created this for black (before the official python extension added black support): https://github.com/joslarson/black-vscode

Provide workaround for jinja macros that use reserved words

Describe the bug
jinjafmt passes jinja to black, but jinja allows some things that python does not. Specifically, variables (esp. macro arguments) can have names that are reserved words. For example:

{{ dbt_utils.star(from=ref('snowplow_page_views_30'), except=fields_to_exclude|upper) }}

This is valid jinja and dbt sql (from gitlab's project), but from and except are both reserved words in python. Running black -c "from" returns error: cannot format <string>: Cannot parse: 1:4: from

Right now jinjafmt no-ops in these situations, but it wouldn't be that hard to replace these reserved words (followed by an eq, e.g. from=) with a sentinel (e.g., from__sqlfmt__=), format that, and then replace the sentinel with the keyword again. There are 33 reserved words in python

missing space around union all when formatting dbt_utils.unpivot

Describe the bug
When using the dbt_utils.unpivot SQL Generator after sqlfmt has run the compiled SQL is missing spaces around the union all between select statements. This does not happen if the statement is not put through sqlfmt first. A working format for the unpivot generator is provided under additional context
To Reproduce
sqlfmt . && dbt build -s <model using unpivot>

Expected behavior
When complied each union all should be separated:

WITH unpivoted AS (
    

    select

      cast('ID' as 
    varchar
) as column_name,
      cast(  
           ID
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select

      cast('USERNAME' as 
    varchar
) as column_name,
      cast(  
           USERNAME
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select

      cast('LAST_NAME' as 
    varchar
) as column_name,
      cast(  
           LAST_NAME
             
           as decimal) as percentage

    from <database>.<schema>.<model>

    union all
    select ...

Actual behavior

with
    unpivoted as (
        

select

    cast('ID' as  varchar) as column_name,
    cast(
         ID
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect

    cast('USERNAME' as  varchar) as column_name,
    cast(
         USERNAME
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect

    cast('LAST_NAME' as  varchar) as column_name,
    cast(
         LAST_NAME
         as decimal
    ) as percentage

from <database>.<schema>.<model>union allselect...

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.6.0

Working format

 WITH unpivoted AS (
     {{ dbt_utils.unpivot(
         relation = <model_name>,
         cast_to = 'decimal',
         field_name = 'column_name',
         value_name = 'percentage'
     ) }}
 )
 SELECT
     *
 FROM
     unpivoted

Add more granular operator priorities to capture order of operations

e.g., today this won't merge onto a single line, since it's too long, so we leave it:

foofoofoofoofoofoofoofoofoofoo
/ barbarbarbarbarbarbar
+ foofoofoofoofoofoofoofoofoofoo
/ bazbazbazbazbazbaz
+ foofoofoofoofoofoofoofoofoofoo
/ quxquxquxquxqux

but it would be better to merge the / lines:

foofoofoofoofoofoofoofoofoofoo / barbarbarbarbarbarbar
+ foofoofoofoofoofoofoofoofoofoo / bazbazbazbazbazbaz
+ foofoofoofoofoofoofoofoofoofoo / quxquxquxquxqux

Fork primer repos to make it easier to detect changes

Primer should run against already-formatted repos.

Bug: semicolons should never be indented

semicolons end queries, so they always have a depth of zero

when we reach a semicolon, open_brackets should be reset to []

refactor merger.py to use a Segment class

Segment is a List[Line]. Would really help with readability/maintenance of merger.py to have segment methods in a different class:

merger._split_into_segments becomes segment.create_segments_from_lines
merger._tail_closes_head becomes Segment.tail_closes_head
merger._get_first_nonblank_line becomes Segment.head (add a Segment.tail also)
merger._segment_continues_operator_sequence becomes Segment.continues_operator_sequence
merger._get_remainder_of_segment becomes Segment.split

Consider not lowercasing column names

We use bigquery, which supports column names in mixed case though doesn't allow names which differ only in case:

https://cloud.google.com/bigquery/docs/schemas#column_names

A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. Duplicate column names are not allowed even if the case differs.

When we started using bigquery, we settle on (perhaps misguided) approach of using camelCase for column names. Now it is too late to fix it as that convention is used in hundreds of column names. It works mostly fine - as bigquery persists the case of column name.

Currently sqlfmt lowercases column names. An alternative way to think is that column names are like variable names and it is responsibility of author to get it right - so sqlfmt should just leave it as is.

If it is really common for sqlfmt users to use inconsistent case, and so it is a desirable feature to convert it all to lowercase - perhaps considering an option to leave column names alone.

Feature: format the contents of jinja tags

Today, sqlfmt does not do any formatting inside the curlies of a jinja tag. We should enforce a single space inside each curly, and use black to format the python between.

Line break introduced in the middle of `cross join`

Hi, wanted to lead off by saying how excited we are by this tool. Thanks for your work on it. Now, to the issue at hand.

sqlfmt 0.4.2 formats this query with a line break between "cross" and "join":

select foo.*, bar.*
from {{ ref('foo_foo_foo') }} as foo cross
join {{ ref('bar_bar_bar') }} as bar

I would prefer to have cross join together on the new line, like so:

select foo.*, bar.*
from {{ ref('foo_foo_foo') }} as foo
cross join {{ ref('bar_bar_bar') }} as bar

Feel free to close the issue if this is intended behavior, but I found it surprising and suspect it might be a bug.

between's "and" split like logical operator

Describe the bug
When using between ___ and ____, the line is always split before the and

To Reproduce
Use the between operator in a where clause, such as the following:

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from
    towershift
where
    radio != 'CDMA'
    and mcc between 200 and 799
    and net between 1 and 999
    and area between 0 and 65535
    and cell between 0 and 268435455
    and lon between -180 and 180
    and lat between -90 and 90

Expected behavior

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from towershift
where
    radio != 'CDMA'
    and mcc between 200 and 799
    and net between 1 and 999
    and area between 0 and 65535
    and cell between 0 and 268435455
    and lon between -180 and 180
    and lat between -90 and 90

Actual behavior

select
    radio,
    mcc,
    net as mnc,
    area as lac,
    cell % 65536 as cid,
    cell / 65536 as rnc,
    cell as long_cid,
    lon,
    lat
from towershift
where
    radio != 'CDMA'
    and mcc between 200
    and 799
    and net between 1
    and 999
    and area between 0
    and 65535
    and cell between 0
    and 268435455
    and lon between -180
    and 180
    and lat between -90
    and 90

Additional context

% sqlfmt --version
sqlfmt, version 0.5.0

sqlfmt -l 80 test.sql
1 file formatted.
0 files left unchanged.
test.sql formatted.

Fix "complex_logic" issue in src/sqlfmt/splitter.py

Consider simplifying this complex logical expression.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/splitter.py#issue_61f46a9f597a090001000042

Feature: read config from pyproject.toml

sqlfmt does not have many configuration options, but it would be nice to be able to set the config we do have in a pyproject.toml file in the root of a project directory

Remove extra newlines and whitespace inside of tokens

Describe the bug
It's possible for a single token to be composed of multiple words, like union all, select top 25, not in, etc. We parse those tokens even if they contain newlines or extra whitespace, like union\n\n all, but we don't then remove those extra characters from the formatted output

To Reproduce

union

 all

Expected behavior

union all

Actual behavior

union

 all

Additional context
What is the output of sqlfmt --version?
0.9.0

jinjafmt strips parens from return macro, breaking it

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

{{- return(adapter.dispatch('my_macro', 'my_package')(arg1, arg2)) -}}

Expected behavior

{{- return(adapter.dispatch('my_macro', 'my_package')(arg1, arg2)) -}}

Actual behavior
This is not valid jinja, and won't compile

{{- return adapter.dispatch('my_macro', 'my_package')(arg1, arg2) -}}

Additional context
What is the output of sqlfmt --version?
0.6.0

Bug: Does not support snapshot blocks

Describe the bug
We support common jinja blocks, but we forgot about {% snapshot my_snapshot %}, so the {% endsnapshot %} tag can get put on the wrong line.

To Reproduce

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * from {{ ref('stg_my_model') }}
{% endsnapshot %}

Expected behavior

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * 
from {{ ref('stg_my_model') }}
{% endsnapshot %}

Actual behavior

{% snapshot snp_my_snapshot %}

{{
    config(
          target_database='analytics',
          target_schema=target.schema + '_snapshots',
          unique_key='id',
          strategy='timestamp',
          updated_at='updated_at',
        )
}}

select * 
from {{ ref('stg_my_model') }}{% endsnapshot %}

Additional context
What is the output of sqlfmt --version? 0.5.1

Feature: exclude paths

other tools support this; would be helpful to avoid formatting the target and dbt_packages directories in a dbt project

stubborn merge segment after a singleton operator

Describe the bug
If an operator ends up on a line by itself (after splitting), we don't try hard enough to merge the segment after that operator into the operator's line, which can result in ugly output (for somewhat pathological input)

To Reproduce
From gitlab's project:

{{column|lower}} || 
ENCRYPT_RAW(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'), 
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'), 
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Expected behavior

{{ column | lower }} || encrypt_raw(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'),
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'),
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Actual behavior

{{ column | lower }} ||
encrypt_raw(
    to_binary('{{ get_salt(column|lower) }}', 'utf-8'),
    to_binary('{{ env_var("SALT_PASSWORD") }}', 'HEX'),
    to_binary('416C736F4E637265FFFFFFAB', 'HEX')
)

Additional context
What is the output of sqlfmt --version?
Commit 88617d2 (v0.10)

Improve formatting for chained boolean operators with complex expressions

Describe the bug
Reported by @Rainymood on discourse.

We're not very smart about building a hierarchy of chained boolean expressions; mixing simple expressions with complex expressions can produce unexpected and hard-to-read formatting.

To Reproduce

select *
from historic_orders h
where 1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Expected behavior

select *
from historic_orders h
where
    1 = 1
    and order_type = 'clothing'
    and category = "pants"
    and dt_utc > date('2019-10-01')
    and is_test is false

Actual behavior

select *
from historic_orders h
where
    1 = 1 and order_type = 'clothing' and category = "pants" and dt_utc > date(
        '2020-10-01'
    ) and is_test is false

Additional context
What is the output of sqlfmt --version?
0.8.0

Fix "complex_logic" issue in src/sqlfmt/line.py

Consider simplifying this complex logical expression.

https://codeclimate.com/github/tconbeer/sqlfmt/src/sqlfmt/line.py#issue_61f46a9f597a09000100003d

Bug: Comments inside of --fmt:off blocks are still formatted

Describe the bug
The --fmt: off tags should stop all formatting of the contained code, but today that is not the case, since comments may be rendered differently in those blocks

To Reproduce

-- fmt: off
where something_long = something_else_long -- and there is a long comment that should stay on this line
-- fmt: on

Expected behavior

-- fmt: off
where something_long = something_else_long -- and there is a long comment that should stay on this line
-- fmt: on

Actual behavior

-- fmt: off
-- and there is a long comment that should stay on this line
where something_long = something_else_long
-- fmt: on

Additional context
What is the output of sqlfmt --version?
0.5.1

Feature: Support additional operators

Support ?, the "existence" operator in pg
range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &> https://www.postgresql.org/docs/current/rangetypes.html
EXISTS, ALL, ANY, MINUS https://docs.snowflake.com/en/sql-reference/operators.html

Maybe many others here: https://www.postgresql.org/docs/current/functions.html

Feature: One-line short queries between semicolons or set operations

Describe the feature
When we have multiple queries in a single file, it would be nice if those queries were formatted the same as if they were in their own files. These queries may be separated by semicolons or set operations, like union

Desired behavior

select * from my_table

union all

select * from another_table
;

select * from another_query_entirely

Current behavior

select * 
from my_table

union all

select * 
from another_table
;

select *
from another_query_entirely

Additional context
What is the output of sqlfmt --version?
0.5.1

Psycopg placeholders

Describe the bug
sqlfmt breaks placeholders used by psycopg (python postgres driver), such as %s and %(name)s

To Reproduce
Format the following code:

INSERT INTO some_table VALUES (%s)

INSERT INTO some_table VALUES (%(foo)s)

Expected behavior
The placeholders should remain intact:

insert into some_table values(%s)

insert into some_table values(%(foo)s)

Actual behavior
sqlfmt inserts extra spaces:

insert into some_table values(% s)

insert into some_table values(% (foo) s)

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.9.0

Feature: add a command line option to reset the cache

helpful for debugging unstable formatting

-k/--reset-cache

`black` dependency not installed via `pip`

Issue

When doing a first-time, fresh install of sqlfmt, I get a ModuleNotFoundError for black.

I added shandy-sqlfmt==0.4.2 to the requirements.txt in my dbt project, and—in a clean virtualenv—ran pip install -r requirements.txt.

(I know the recommended install method is pipx, but since pip is given as an option, I figured I'd log this FWIW.)

Workaround:

pip install black in the virtualenv
Add the following additional_dependencies line to .pre-commit-config.yaml:

- repo: https://github.com/tconbeer/sqlfmt
  rev: v0.4.2
  hooks:
    - id: sqlfmt
      language_version: python
      additional_dependencies: ['black']
      args: [--line-length=90]

sqlfmt works 👍

Customize formatting for certain jinja expressions/macros

Some jinja expressions contain keywords, but they're all just formatted like names.

For example, {{ dbt_utils.group_by(n) }} tends to get wrapped onto the line above it. sqlfmt doesn't have any way to know that the macro compiles to a keyword, so it wants to one-line it with the end of the where statement. If sqlfmt knew this compiled to GROUP BY, it would not do that.

I've thought about this before, and I don't have a great solution except for hard-coding it in as an exception (it's one of very few macros that start with a keyword).

other than the hacky/hard-coded method, we could consider an inline comment that could tell sqlfmt how to treat a jinja expression (if the default treatment doesn't work). So something like:

{{ dbt_utils.group_by(6) }}  -- fmt: group by

If anyone else has solutions/ideas, I'm open to them, but I do not want to go down the sqlfluff road and jinja-template the sql.

For now, I'd suggest either 1) using a -- fmt: off comment on this line, or b) not using dbt_utils.group_by() and instead implement a macro called range:

{%- macro range(n) -%}
{% for i in range(1, n + 1) -%}
    {{ i }}{{ ',' if not loop.last }}   
{%- endfor -%}
{%- endmacro -%}

and then call it with:

...
where my_condition
group by {{ range(10) }}

Originally posted by @tconbeer in #165 (comment)

Add support for `like any` and `like all`

see https://docs.snowflake.com/en/sql-reference/functions/like_any.html

Don't always stubborn merge lines that open brackets

Describe the bug
This is ugly. We force merge the lines that have function calls, since they open brackets, but we probably shouldn't, so that all the sibling operators stay at the same level.

The new rule should be: don't stubborn-merge p1 operators that open brackets if the previous segment also starts with an operator. Also, consider that p1 operators might need a different sequence of stubborn-merge strategies from p0 operators.

To Reproduce
From gitlab's project (indented a few more times, so it doesn't fit on a single line):

SELECT 
    LOWER(table_catalog)||'.'||LOWER(table_schema)||'.'||LOWER(table_name) AS fqd_name,
    ...

Expected behavior

select
    lower(table_catalog)
    || '.' 
    || lower(table_schema)
    || '.'
    || lower(table_name) as fqd_name,

Actual behavior

select
    lower(table_catalog)
    || '.' || lower(
        table_schema
    )
    || '.'
    || lower(table_name) as fqd_name,

Additional context
What is the output of sqlfmt --version?

tconbeer / sqlfmt Goto Github PK

sqlfmt's Issues

Issue

Workaround:

Recommend Projects

Recommend Topics

Recommend Org