quantco / pytsql
Run mssql scripts from Python.
License: BSD 3-Clause "New" or "Revised" License
Analogous to what we have with glum:
https://glum.readthedocs.io/en/latest/
This is due to a bug in sqlalchemy's Connection.execution_options(), or something deeper such as pyodbc. Here is a "minimal" example (also available in a gist):
import random
import string

import pytest
import pytsql
import sqlalchemy
from sqlalchemy.pool import NullPool

TEMPDB_CONN_STR = "mssql+pyodbc://sa:QuantCo123@localhost:1433/tempdb?driver=libtdsodbc.so"


def show_bug(
    tempdb_connection_string: str,
    break_with_root_cause: bool,
    fix_by_disabling_pooling: bool,
    db: str = "tempdb",
):
    eng = sqlalchemy.create_engine(
        tempdb_connection_string,
        connect_args={"autocommit": True},
        echo=False,
        **({"poolclass": NullPool} if fix_by_disabling_pooling else {}),
    )
    source_table = random_name()
    table_1 = random_name()
    table_2 = random_name()
    # print("Source table:", source_table)
    # print("Table 1:", table_1)
    # print("Table 2:", table_2)
    with eng.connect() as conn:
        conn.execute(f"CREATE TABLE {db}.dbo.{source_table} (ID INT)")
        conn.execute(f"INSERT INTO {db}.dbo.{source_table} VALUES (123)")
    with eng.connect() as conn:
        conn.execute(f"SELECT * INTO {db}.dbo.{table_1} FROM {db}.dbo.{source_table}")
    with eng.connect() as conn:
        print("Found in table:", conn.execute(f"SELECT * FROM {db}.dbo.{table_1}").all())
    if not break_with_root_cause:
        pytsql.executes("SELECT 12", eng)
    else:
        eng.connect().execution_options(isolation_level="AUTOCOMMIT")
    with eng.connect() as conn:
        # Won't be committed if no fixes are applied.
        conn.execute(f"SELECT * INTO {db}.dbo.{table_2} FROM {db}.dbo.{source_table}")
    with eng.connect() as conn:
        # Fails with "invalid object name" if no fixes are applied, because the above didn't get committed.
        print("Found in table:", conn.execute(f"SELECT * FROM {db}.dbo.{table_2}").all())


def random_name() -> str:
    return "".join(random.choices(string.ascii_letters, k=10))


print("Should work")
for break_with_root_cause in [False, True]:
    show_bug(
        tempdb_connection_string=TEMPDB_CONN_STR,
        break_with_root_cause=break_with_root_cause,
        fix_by_disabling_pooling=True,  # Disabling pooling fixes it because connections are not reused.
    )
print("Should NOT work")
for break_with_root_cause in [False, True]:
    with pytest.raises(sqlalchemy.exc.ProgrammingError, match="Invalid object name"):
        show_bug(
            tempdb_connection_string=TEMPDB_CONN_STR,
            break_with_root_cause=break_with_root_cause,
            fix_by_disabling_pooling=False,
        )
print("Told ya")
The following statement to create a user-defined table type is not executable via the latest pytsql (1.1.4):
CREATE TYPE x AS TABLE (
id INT
)
This statement is valid in MSSQL since 2008 - see the documentation.
File "D:[...]\pytsql\tsql.py", line 252, in execute
executes(_code(path, encoding), engine, parameters, isolate_top_level_statements)
File "[...]\lib\site-packages\pytsql\tsql.py", line 226, in executes
for batch in _split(parametrized_code, isolate_top_level_statements):
File "[...]\lib\site-packages\pytsql\tsql.py", line 163, in _split
tree = parse(InputStream(data=code), "tsql_file", error_listener)
File "[...]\lib\site-packages\pytsql\grammar\sa_tsql.py", line 82, in parse
return _cpp_parse(stream, entry_rule_name, sa_err_listener)
File "[...]\lib\site-packages\pytsql\grammar\sa_tsql.py", line 105, in _cpp_parse
return sa_tsql_cpp_parser.do_parse(tsqlParser, stream, entry_rule_name, sa_err_listener)
File "[...]\lib\site-packages\pytsql\tsql.py", line 146, in syntaxError
raise ValueError(f"Error parsing SQL script: {error_message}")
ValueError: Error parsing SQL script: Line 1:49 mismatched input 'AS' expecting FROM. Problem near token 'AS'.
There is a workflow that tries to upload the package to TestPyPI, and it started failing with:
WARNING Error during upload. Retry with the --verbose option for more details.
ERROR HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
Project size too large. Limit for project 'pytsql' total size is 10 GB.
See https://test.pypi.org/help/#project-size-limit
This should not be critical.
We could consider:
I've received requests for pytsql to support the @@ROWCOUNT variable, so that PRINT statements of the number of affected rows can be used. An example usage would be:
<select/update statement>
PRINT (@@ROWCOUNT)
The reason this doesn't work at the moment seems to be that if we don't send both statements to the SQL server in a single "batch", the variable gets reset. The current implementation of pytsql separates all top-level statements and sends them individually.
My suggestion would be to introduce an option along the lines of "isolate-top-level-statements", true by default, which when enabled would retain the current behavior of the tool. When disabled, the batching would instead be done in whole groups in between GO commands. This way, users who want to use such "context-sensitive" commands can use the latter mode. The only downside of the latter mode should be that PRINT statement values would only be shown when a full batch executes, but that can easily be tuned by the user with an appropriate number of GO statements. In my opinion such a change gives more control to the user, and the default of the option would keep us backwards compatible.
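To make the proposal concrete, GO-based batch splitting could look roughly like the following sketch (a simplified line-based illustration; pytsql actually parses scripts with an ANTLR grammar, and the function name here is hypothetical):

```python
from typing import List


def split_into_go_batches(script: str) -> List[str]:
    """Split a T-SQL script into batches at lines consisting of GO.

    Simplified sketch of the proposed non-isolating mode: everything
    between two GO separators is sent to the server as one batch, so
    context-sensitive values such as @@ROWCOUNT survive within a batch.
    """
    batches: List[str] = []
    current: List[str] = []
    for line in script.splitlines():
        if line.strip().upper() == "GO":
            if current:
                batches.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        batches.append("\n".join(current))
    return batches
```

With this splitting, `UPDATE t SET x = 1` and `PRINT (@@ROWCOUNT)` placed before the same GO would travel to the server together, so @@ROWCOUNT still reflects the preceding update.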
Declarations in stored procedures can be defined using stored procedure arguments. With isolate_top_level_statements=True, pytsql currently still copies these declarations into the prefixes of subsequent batches, which causes errors.
TSQL:
CREATE PROCEDURE CREATEALLDATES
(
@StartDate AS DATE, @EndDate AS DATE
) AS
DECLARE @Current AS DATE = DATEADD(DD, 0, @StartDate); DROP TABLE IF EXISTS ##alldates CREATE TABLE ##alldates (
dt DATE PRIMARY KEY
) WHILE @Current <= @EndDate BEGIN
INSERT INTO ##alldates
VALUES (@Current);
SET @Current = DATEADD(DD, 1, @Current) -- add 1 to current day
END
GO
IF OBJECT_ID ( N'dbo.get_db_sampling_factor' , N'FN' ) IS NOT NULL DROP FUNCTION get_db_sampling_factor ;
Error:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Must declare the scalar variable "@StartDate". (137) (SQLExecDirectW)')
[SQL: DECLARE @Current AS DATE = DATEADD ( DD , 0 , @StartDate ) ; IF OBJECT_ID ( N'dbo.get_db_sampling_factor' , N'FN' ) IS NOT NULL DROP FUNCTION get_db_sampling_factor ;]
All tests that run after pytsql/tests/unit/test_py_vs_cpp.py lines 141 to 144 (at commit 7699639) will not use the C++ implementation, and print a misleading warning from lines 150 to 155 (at commit 7699639), as in the current CI.
For my project, I'm currently trying to use pytsql for use cases beyond replacing DB or table names. In particular, I want to make use of conditionals. However, I run into some issues there:
IF 1 = 1
BEGIN
    SET NOEXEC ON
END
GO
CREATE VIEW v AS ...
SET NOEXEC OFF
... however, pytsql indirectly throws an unexpected error via sqlalchemy:
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.
(If it's helpful I can also post the full error, but that requires some anonymization first.)
pytsql's behavior is unexpected.
The ANTLR4 runtime package states that it supports Python 3.7 [0].
However, inspecting the source code of the package itself, there is no version specification, so it seems highly likely that any version above 3.7 might be safe to use too.
This error appeared while I was debugging pytsql, then it vanished, and now it has resurfaced:
Traceback (most recent call last):
File "/home/martin/progs/pycharm-2022.2.1/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
AttributeError: 'Namespace' object has no attribute 'backend'
It could be a problem with my PyCharm installation, but this is the first repo where it happens.
I checked both Python 3.11 and 3.10, and tested PyCharm 2022.2.1 and 2022.2.4.
With the current design of the library, SET statements have rather weird behavior. In short, before every top-level statement, all local variables are redeclared with their initial values. This behavior is hidden from the user and is potentially dangerous.
To illustrate the current behavior, here are some examples:
Top-level SET
DECLARE @A INT = 10
SET @A = 5
PRINT @A
Will print 10. The SET is actually executed but has no effect since the variable is lost after batch execution and then redeclared with its initial value before the PRINT.
SET in control flow
DECLARE @A INT = 10
IF 1 = 1
BEGIN
    SET @A = 5
    PRINT @A
END
Will print 5 since the whole IF is executed as a single statement.
And a mixed example that shows the inconsistent behavior
DECLARE @A INT = 10
IF 1 = 1
BEGIN
    SET @A = 5
    PRINT @A
END
PRINT @A
Will produce 5 followed by a 10, combining the behavior of the previous two examples.
It's unfortunately not easy to fix this in general, but it would be nice to at least add a warning log whenever SET statements are parsed in a script.
Note that simply banning SET statements at top level is not sufficient, because of cases like the third example - hence it is not trivial to statically determine which scripts would work as expected.
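The proposed warning could be approximated with a simple check like the one below (a rough regex-based sketch; pytsql's real implementation would hook into the ANTLR parse tree, and the helper name is hypothetical):

```python
import re
import warnings

# Matches a line that assigns a local variable, e.g. "SET @A = 5".
SET_LOCAL_VAR = re.compile(r"^\s*SET\s+@\w+", re.IGNORECASE | re.MULTILINE)


def warn_on_set_statements(script: str) -> bool:
    """Warn if the script assigns local variables via SET.

    Hypothetical helper: a conservative, purely textual check that flags
    any SET on a local variable, since it is not trivial to statically
    determine which scripts would behave as the user expects.
    """
    if SET_LOCAL_VAR.search(script):
        warnings.warn(
            "Script contains SET statements on local variables; with "
            "isolate_top_level_statements=True their effects may be lost "
            "between batches."
        )
        return True
    return False
```

Because the check is conservative, it also fires on SET statements inside control flow (the second example above), which work correctly; that is the price of not parsing the script properly.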
I wonder if there is something we can implement in pytsql to optionally disable printing of the #pytsql_prints-related lines to the SQLAlchemy logger.
To handle the freetds driver, a separate job is needed that installs the relevant driver; simply adding it to the existing drivers list fails with:
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'libtdsodbc.so' : file not found (0) (SQLDriverConnect)")
Relates to PR #66
Issue: SQLAlchemy's documentation states:
For SQL statements where a colon is required verbatim, as within an inline string, use a backslash to escape:
t = text(r"SELECT * FROM users WHERE name='\:username'")
However, pytsql doesn't incorporate this, so the example above fails.
(Same for INSERT INTO dbo.tbl VALUES (':foo').)
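As a stopgap, one could pre-process scripts to apply SQLAlchemy's backslash escaping to colons inside string literals. A minimal sketch, assuming only simple single-quoted literals without embedded quotes (the helper name is hypothetical and not part of pytsql):

```python
import re


def escape_literal_colons(sql: str) -> str:
    """Backslash-escape colons inside single-quoted string literals,
    following the escaping convention of SQLAlchemy's text().

    Hypothetical workaround: only handles simple literals of the form
    '...' without embedded quotes; a real fix belongs in pytsql's parser.
    """
    def _escape(match: re.Match) -> str:
        return match.group(0).replace(":", r"\:")

    # '[^']*' matches simple single-quoted literals (no embedded quotes).
    return re.sub(r"'[^']*'", _escape, sql)
```

For example, `escape_literal_colons("INSERT INTO dbo.tbl VALUES (':foo')")` turns the literal into `'\:foo'`, which SQLAlchemy then passes through verbatim instead of treating `:foo` as a bound parameter.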
Consider using https://pypi.org/project/flit-scm/ to get the version from git into the flit packaging workflow.
As far as I can see, there is no option for returning the resulting table of a query when executing a file or SQL string. This feature, if even possible, would be super nice, as it makes ongoing work with dataframes in Python much easier! Basically, something similar to what pandas supports with df = pd.read_sql(string, con).
We are currently using ANTLR version 4.9.2 (both for generating and running the parsers, of course), whereas speedy-antlr-tool started requiring ANTLR >= 4.10.
Note that if you are not careful, installing the latest speedy-antlr-tool will silently bring in the latest (4.11) ANTLR runtime, and this can additionally confuse things (e.g. you will get errors when you try to run parsers that were generated by the 4.9 version in the new runtime).
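Until the parsers are regenerated, one mitigation (assuming a pip-based setup) is to pin the runtime to the version the parsers were generated with, e.g. in requirements.txt; antlr4-python3-runtime is the ANTLR Python runtime package on PyPI:

```
# Pin the ANTLR runtime to match the 4.9.2-generated parsers
antlr4-python3-runtime==4.9.2
```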