capitalone / giraffez
User-friendly Teradata client for Python
Home Page: https://capitalone.github.io/giraffez
License: Apache License 2.0
Hi - When I export bigint fields the exported value does not match the table value.
The steps to recreate the problem are below.
Teradata Code:
create table nztemp.contains_a_bigint( bigint_field bigint not null) no primary index;
insert into nztemp.contains_a_bigint (bigint_field) values (9111200007015236);
Python Code:
import giraffez
with giraffez.Export('nztemp.contains_a_bigint',
                     host='my_host',
                     username='my_user',
                     password='my_password',
                     delimiter='\t',
                     null='',
                     encoding='dict') as export:
    for row in export.results():
        print row
>>> {'bigint_field': -1880798396L}
Of course I was expecting 9111200007015236.
Any help would be appreciated.
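For what it's worth, the reported value looks like a 32-bit truncation of the real one; this sketch reproduces it exactly (the diagnosis is an assumption based only on the numbers above, not confirmed against the giraffez source):

```python
import struct

# The exported value is exactly what you get if the 8-byte BIGINT is
# decoded as a 4-byte signed integer.
true_value = 9111200007015236
raw = struct.pack("<q", true_value)          # little-endian 64-bit value
truncated, = struct.unpack("<i", raw[:4])    # read back only the low 4 bytes
print(truncated)  # -1880798396
```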
The docstring for the encoding attribute in the Export class documentation only covers 'text', 'json', and 'archive'. However, looking at example code and at the export._initiate function, it appears that 'dict' is also supported, returning rows as Python dict objects.
The 'dict' encoding is apparently needed for exporting smoothly to DataFrame objects, and so could see wide usage.
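Since the 'dict' encoding yields one dict per row, a DataFrame can be built directly from the exported rows. A minimal sketch (the rows and column names here are made up, not taken from a real export):

```python
import pandas as pd

# Hypothetical rows as produced by encoding='dict': one dict per row.
rows = [
    {"bigint_field": 1, "name": "a"},
    {"bigint_field": 2, "name": "b"},
]
df = pd.DataFrame(rows)
print(df.shape)  # (2, 2)
```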
I seem to be having a problem using the substring command. Simplified code as follows:
import giraffez
query1 = ''' select SUBSTRING('world' ,1,3) as hello'''
with giraffez.Cmd() as cmd:
results = cmd.execute_one(query1)
Any help or insight would be greatly appreciated!
Hi,
This is a good project; currently in my job I'm working with Teradata and Python. It would be great if we could use giraffez bteq interactive and giraffez bteq <bteq file> <querybands>.
Using the giraffez CLI load command to put data into a table with a JSON column, I get this error.
# giraffez -vvv load mycsv.txt "mydb.mytable"
/usr/local/lib/python3.6/dist-packages/giraffez/utils.py:129: UserWarning: USING MLOAD TO INSERT LESS THAN 100000 ROWS IS NOT RECOMMENDED - USE LOAD INSTEAD!
Connection: Connecting to data source 'other' ...
Connection: Connection to 'other' established successfully.
BulkLoad: Executing ...
source => "mycsv.txt"
output => "mydb.mytable"
BulkLoad: Initiating Teradata PT request (awaiting server) ...
Connection: Closing Teradata connection ...
Connection: Connection to 'other' closed.
Traceback (most recent call last):
File "/usr/local/bin/giraffez", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/giraffez/__main__.py", line 25, in main
MainCommand().run()
File "/usr/local/lib/python3.6/dist-packages/giraffez/core.py", line 80, in run
args.run(args)
File "/usr/local/lib/python3.6/dist-packages/giraffez/commandline.py", line 431, in run
exit_code = load.from_file(args.input_file, delimiter=args.delimiter, null=args.null, quotechar=args.quote_char, parse_dates=args.parse_dates)
File "/usr/local/lib/python3.6/dist-packages/giraffez/load.py", line 254, in from_file
self._initiate()
File "/usr/local/lib/python3.6/dist-packages/giraffez/load.py", line 430, in _initiate
self.mload.initiate(self.table, self.columns)
giraffez.TeradataError: 5674: LOBs are not allowed to be selected in Record or Indicator modes.
Looking online it seems like if I were using ODBC I could make this go away by adding this:
EnableLobSupport=Yes
Unfortunately I can't see how to pass connection string options to Giraffez. Is there a way to do this? Or is the JSON column not supported?
Thanks!
I think this may be related to an internal issue previously closed (144). This is on 1.1.7
from decimal import Decimal
import giraffez
test_cases = [
{'n': -0.08, 'd1': 32, 'd2': 2},
{'n': -0.08, 'd1': 30, 'd2': 2},
{'n': -0.08, 'd1': 32, 'd2': 4},
{'n': -0.1, 'd1': 28, 'd2': 1},
{'n': -0.1, 'd1': 20, 'd2': 0},
{'n': -999, 'd1': 32, 'd2': 2},
{'n': -999, 'd1': 30, 'd2': 2},
{'n': -99.9, 'd1': 30, 'd2': 2},
{'n': -0.08, 'd1': 2, 'd2': 2},
{'n': -0.08, 'd1': 3, 'd2': 2},
{'n': -0.08, 'd1': 4, 'd2': 2},
{'n': -0.08, 'd1': 16, 'd2': 2},
{'n': -0.08, 'd1': 17, 'd2': 2},
{'n': -0.08, 'd1': 18, 'd2': 2},
{'n': -0.08, 'd1': 19, 'd2': 2},
{'n': -0.08, 'd1': 20, 'd2': 2},
{'n': -0.08, 'd1': 21, 'd2': 2},
{'n': -0.08, 'd1': 22, 'd2': 2},
{'n': -0.08, 'd1': 23, 'd2': 2},
{'n': -0.08, 'd1': 24, 'd2': 2},
{'n': -0.08, 'd1': 31, 'd2': 2},
{'n': -0.08, 'd1': 32, 'd2': 2},
{'n': -0.08, 'd1': 33, 'd2': 2},
{'n': 0, 'd1': 33, 'd2': 2},
{'n': -0.0001, 'd1': 33, 'd2': 4},
{'n': -1000000000, 'd1': 33, 'd2': 4},
{'n': -100, 'd1': 20, 'd2': 2},
{'n': -99, 'd1': 20, 'd2': 2},
{'n': -99.1, 'd1': 20, 'd2': 2},
{'n': -99.1, 'd1': 19, 'd2': 2},
{'n': -99.1, 'd1': 18, 'd2': 2}
]
with giraffez.Cmd() as cmd:
    for test_case in test_cases:
        sql = 'SELECT CAST({n} AS DECIMAL({d1},{d2})) AS a'.format(**test_case)
        res = cmd.execute(sql).one().first()
        print(
            'SQL:',
            sql,
            'Results:',
            '{} == {}'.format(test_case['n'], res['a']),
            res['a'] == Decimal(str(test_case['n'])))
Gives:
SQL: SELECT CAST(-0.08 AS DECIMAL(32,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(30,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(32,4)) AS a Results: -0.08 == -1.9200 False
SQL: SELECT CAST(-0.1 AS DECIMAL(28,1)) AS a Results: -0.1 == -1.9 False
SQL: SELECT CAST(-0.1 AS DECIMAL(20,0)) AS a Results: -0.1 == 0 False
SQL: SELECT CAST(-999 AS DECIMAL(32,2)) AS a Results: -999 == -999.00 True
SQL: SELECT CAST(-999 AS DECIMAL(30,2)) AS a Results: -999 == -999.00 True
SQL: SELECT CAST(-99.9 AS DECIMAL(30,2)) AS a Results: -99.9 == -100.10 False
SQL: SELECT CAST(-0.08 AS DECIMAL(2,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(3,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(4,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(16,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(17,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(18,2)) AS a Results: -0.08 == -0.08 True
SQL: SELECT CAST(-0.08 AS DECIMAL(19,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(20,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(21,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(22,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(23,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(24,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(31,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(32,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(-0.08 AS DECIMAL(33,2)) AS a Results: -0.08 == -1.92 False
SQL: SELECT CAST(0 AS DECIMAL(33,2)) AS a Results: 0 == 0.00 True
SQL: SELECT CAST(-0.0001 AS DECIMAL(33,4)) AS a Results: -0.0001 == -1.9999 False
SQL: SELECT CAST(-1000000000 AS DECIMAL(33,4)) AS a Results: -1000000000 == -1000000000.0000 True
SQL: SELECT CAST(-100 AS DECIMAL(20,2)) AS a Results: -100 == -100.00 True
SQL: SELECT CAST(-99 AS DECIMAL(20,2)) AS a Results: -99 == -99.00 True
SQL: SELECT CAST(-99.1 AS DECIMAL(20,2)) AS a Results: -99.1 == -100.90 False
SQL: SELECT CAST(-99.1 AS DECIMAL(19,2)) AS a Results: -99.1 == -100.90 False
SQL: SELECT CAST(-99.1 AS DECIMAL(18,2)) AS a Results: -99.1 == -99.10 True
Not sure what the issue is here, but these seem to return as expected through odbc/jdbc.
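One pattern worth noting: failures occur only at precision 19 and above (some values still round-trip correctly there), and precision 19 is exactly where Teradata switches from an 8-byte to a 16-byte internal DECIMAL representation, which points at the 128-bit decode path. The width table below follows the documented Teradata storage sizes; the diagnosis itself is an assumption:

```python
def teradata_decimal_width(precision):
    # Storage width in bytes of a Teradata DECIMAL, by total precision.
    if precision <= 2:
        return 1
    if precision <= 4:
        return 2
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16

# Every passing case above fits in <= 8 bytes; every failing one uses 16.
print(teradata_decimal_width(18), teradata_decimal_width(19))  # 8 16
```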
I want LDAP authentication to connect to Teradata with giraffez.
Is such functionality available in giraffez?
What parameters should be specified to use LDAP with the giraffez config --set command?
Thank you
Is this a stable release? If so why isn't there a release version in the releases section?
The documentation for config.Config says that Config() "Creates an encryption key at ~/.giraffepg if one does not exist".
But it doesn't; instead it raises a KeyNotFound error:
In [1]: import giraffez.config
In [2]: cfg = giraffez.config.Config(mode='w')
---------------------------------------------------------------------------
KeyNotFound Traceback (most recent call last)
<ipython-input-2-27c6bd581e19> in <module>()
----> 1 cfg = giraffez.config.Config(mode='w')
/opt/anaconda/lib/python3.5/site-packages/giraffez/config.py in __init__(self, conf, mode, key_file)
127 if key_file is None:
128 self._key_file = home_file(".giraffepg")
--> 129 self._key_file = _get_key_file(key_file)
130 self.mode = mode
131 _check_file_permissions(self._config_file, 0o600)
/opt/anaconda/lib/python3.5/site-packages/giraffez/config.py in _get_key_file(key_file)
62 key_file = os.path.abspath(key_file)
63 if not file_exists(key_file):
---> 64 raise KeyNotFound("Key file is required to use giraffez config.")
65 return key_file
66
KeyNotFound: Key file is required to use giraffez config.
This is either a bug in the code or a bug in the docs, and I'd advocate for it being a bug in the code. It would be great if it were possible to use this config API to create an initial configuration for a new user, which would require the ability to construct a Config object without a pre-existing .giraffepg, perhaps with the option to create one as part of the constructor.
load detects the delimiter in the source file, but fails on rows that contain the delimiter in a quoted field.
For example, for the following source file:
one|two|three
four|five|six
seven|"eight|nine"|ten
$ giraffe load testfile.txt test_container.test_table
ERROR:root:Mismatch length for row
['seven', '"eight', 'nine"', 'ten']
['one', 'two', 'three']
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/giraffe/command.py", line 376, in insert
raise FormatError("Mismatch length for row\n\t{0}\n\t{1}".format(row, fields))
FormatError: Mismatch length for row
['seven', '"eight', 'nine"', 'ten']
['one', 'two', 'three']
Results
---> Rows successful: 1
---> Rows unsuccessful: 1
---> Total: 2
==========================
run took 4.9091s to finish
Expected results: the delimiter is ignored within quotes, and loading seven|"eight|nine"|ten succeeds with "eight|nine" as a single field.
Actual results: loading seven|"eight|nine"|ten fails.
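Until load grows quote handling, Python's standard csv module parses this kind of input correctly; a small sketch of the expected behaviour using the sample file above:

```python
import csv
import io

data = 'one|two|three\nfour|five|six\nseven|"eight|nine"|ten\n'
rows = list(csv.reader(io.StringIO(data), delimiter="|", quotechar='"'))
print(rows[2])  # ['seven', 'eight|nine', 'ten']
```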
Would it be possible to expose or document the add_attribute method of MLoad and Export objects?
For context, I'm working with a Teradata instance that severely limits the number of active sessions. After some research I found this was possible by using the add_attribute method of the mload object belonging to the BulkLoad operation and then looking up the appropriate constants:
with giraffez.BulkLoad(table=table_name, **giraffez_opts) as load:
load.mload.add_attribute(giraffez.constants.TD_MIN_SESSIONS, 1)
load.mload.add_attribute(giraffez.constants.TD_MAX_SESSIONS, 2)
[...]
While the TPT Export driver works as expected on macOS, the Update driver fails to complete the Initiate method with Segmentation Fault: 11. It has something to do with the lifetime of objects for the TPT, mostly teradata::client::API::Connection, and how the compiler and/or platform handles the code differently.
I am experiencing a problem loading a Pandas DataFrame to Teradata via MLOAD. First I got a NULL values error, as len() of NULL raises an error. I dropped all rows with null values but still no luck. Now I get Teradata PT request failed with code '2583'.
This is my code:
with giraffez.MLoad("DBC.TestTable") as mload:
    mload.columns = df.columns.tolist()
    rows = list(zip(*[df[i].values.tolist() for i in df]))
    for row in rows:
        mload.load_row(row)
    exit_code = mload.finish()
    print("finished load with code {0}".format(exit_code))
    if exit_code == 0:
        mload.cleanup()
Is there a better and working way to mload Pandas DataFrame to Teradata?
Tools:
Python Version: 3.6
Teradata CLIv2 16.10.0.2
Windows 7
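I don't know giraffez's preferred path here, but a common way to hand DataFrame rows to a row-based loader is itertuples with NaN mapped to None, so NULLs reach the loader explicitly rather than as float('nan'); a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", None]})
# One tuple per row, with missing values normalized to None.
rows = [
    tuple(None if pd.isna(v) else v for v in rec)
    for rec in df.itertuples(index=False, name=None)
]
print(rows)  # [(1, 'x'), (2, None)]
```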
This is a corner case and I'm open to a "won't fix", but I came across this today and didn't expect the behaviour: two examples where selecting columns that render out with the same name will silently squash one of them in the output.
import giraffez
table_name = 'schema.tester'
drop_sql = f"DROP TABLE {table_name}"
create_sql = f"""CREATE MULTISET TABLE {table_name} (
field_1 DECIMAL(18,0) TITLE 'Number',
field_2 DECIMAL(12,0) TITLE 'Number')"""
insert_sql = f"INSERT INTO {table_name} SELECT 1 AS field_1, 2 AS field_2"
read_sql = f"SELECT field_1, field_2 FROM {table_name}"
with giraffez.Cmd() as cmd:
    if cmd.exists(table_name):
        cmd.execute(drop_sql)
    cmd.execute(create_sql)
    cmd.execute(insert_sql)
    res = list(cmd.execute(read_sql).to_dict())
    print(res)
    cmd.execute(drop_sql)
This code results in [{'number': 2.0}], because Column.name uses self.title over self.original_name (here). Note that if you BulkExport the data, export.to_str() still works fine, but to_dict() will overwrite one of the values (because it's a dict).
create_a = "CREATE MULTISET VOLATILE TABLE tbl_a AS (SELECT 1 AS fld_a) WITH DATA ON COMMIT PRESERVE ROWS"
create_b = "CREATE MULTISET VOLATILE TABLE tbl_b AS (SELECT 1 AS fld_a) WITH DATA ON COMMIT PRESERVE ROWS"
read_sql = "SELECT tbl_a.fld_a, tbl_b.fld_a FROM tbl_a INNER JOIN tbl_b ON tbl_a.fld_a = tbl_b.fld_a"
with giraffez.Cmd() as cmd:
    cmd.execute(create_a)
    cmd.execute(create_b)
    res = cmd.execute(read_sql)
    for row in res:
        print(row)
This code results in {'fld_a': 1}.
If you select the same field name within one query/table (e.g. SELECT 1 AS field_1, 2 AS field_1), Teradata will error out with 3863: Duplicate definition of field.
I'm wondering whether giraffez should raise an exception when there are duplicates in the list of column names? Or maybe only when calling to_dict() or to_list() on Cmd or Export, which would produce unexpected results (especially when exporting: the fields will still be in export.columns.names but there would be fewer entries in the to_dict() items). Both of these could be fixed by the user providing an alias, but as a user I'd like giraffez to be "noisier" if there is going to be an issue.
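A check along these lines (the function name is hypothetical) would be enough to make to_dict() noisy about collisions before any rows are silently squashed:

```python
from collections import Counter

def duplicate_names(names):
    # Column names that would collide when rows are rendered as dicts.
    return sorted(n for n, c in Counter(names).items() if c > 1)

print(duplicate_names(["number", "number", "id"]))  # ['number']
```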
Is there a way to parameterize queries with Giraffez? I want to avoid any possible SQL injection when using the library.
Hi
I faced a problem trying to load data from a previously exported file. It is reported as "giraffez.errors.GiraffeTypeError: Column 'cif' does not exist", which is quite weird (as stated before, the file was created via giraffez export database.table_name output_file.txt -d '\t').
Here is the definition of the table:
CREATE MULTISET TABLE DDHDP01P.AG_KHD_4463_2_BIK_NOWA ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
cif CHAR(19) CHARACTER SET LATIN NOT CASESPECIFIC,
nr_rachunku_bik INTEGER,
typ_finansowania CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,
data_otwarcia DATE FORMAT 'YYYY-MM-DD',
przewid_data_zakonczenia DATE FORMAT 'YYYY-MM-DD',
okres INTEGER,
waluta CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
kurs CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
status CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,
data_zamkniecia DATE FORMAT 'YYYY-MM-DD',
data_salda_rachunku DATE FORMAT 'YYYY-MM-DD',
kwota_do_splaty DECIMAL(18,2),
kwota_kredytu DECIMAL(18,2),
kwota_raty DECIMAL(18,2),
data_raportu DATE FORMAT 'YYYY-MM-DD',
kwota_limitu DECIMAL(18,2),
kwota_gwarancji DECIMAL(18,2),
liczba_rat_kapitalowych INTEGER,
czestotliwosc_splat_kapitalu VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( cif );
Peter
giraffez version: 1.1.8
Python version: 3.5.2
OS: Windows 7 Enterprise
Teradata Tools and Utilities version: 15.10.05.00
Hi!
I'm testing Giraffez on Python 3.7 and have the following error:
Traceback (most recent call last):
File "/Users/flolas/anaconda3/bin/giraffez", line 11, in <module>
sys.exit(main())
File "/Users/flolas/anaconda3/lib/python3.7/site-packages/giraffez/__main__.py", line 25, in main
MainCommand().run()
File "/Users/flolas/anaconda3/lib/python3.7/site-packages/giraffez/core.py", line 80, in run
args.run(args)
File "/Users/flolas/anaconda3/lib/python3.7/site-packages/giraffez/commandline.py", line 363, in run
if FileReader.check_length(args.input_file, MLOAD_THRESHOLD):
File "/Users/flolas/anaconda3/lib/python3.7/site-packages/giraffez/io.py", line 122, in check_length
for i, line in enumerate(f, 1):
RuntimeError: generator raised StopIteration
This is when executing:
giraffez insert com.csv PROD.CODES
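This looks like a PEP 479 casualty: from Python 3.7 on, a StopIteration escaping a generator body is converted to RuntimeError, which matches the traceback above (the suggestion that giraffez's io.py relies on the old idiom is an assumption). A minimal reproduction of the language change:

```python
def gen():
    yield 1
    raise StopIteration  # pre-3.7 idiom for ending a generator early

# On Python 3.7+, consuming the generator raises RuntimeError instead
# of silently stopping.
try:
    list(gen())
except RuntimeError as e:
    print(type(e).__name__, "-", e)
```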
When loading data from Pandas that are not string objects, this error is raised:
Connection: Connecting to data source 'db1' ...
Connection: Connection to 'db1' established successfully.
MLoad: Requesting column info for table 'DBC.TestTable' ...
MLoad: Initiating Teradata PT request (awaiting server) ...
MLoad: Teradata PT request accepted.
Connection: Closing Teradata connection ...
MLoad: Closing Teradata PT connection ...
MLoad: Acquisition phase was not called before closing.
MLoad: Ending acquisition phase ...
MLoad: Acquisition phase ended.
MLoad: Teradata PT request complete.
Connection: Connection to 'db1' closed.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-4282e9006bed> in <module>()
3 rows = list(zip(*[df[i].values.tolist() for i in df]))
4 for row in rows:
----> 5 mload.load_row(row)
6 exit_code = mload.finish()
7 print("finished load with code {0}".format(exit_code))
C:\Users\MYUSERNAME\AppData\Local\Continuum\Anaconda3\lib\site-packages\giraffez\mload.py in load_row(self, items, panic)
372 self._initiate()
373 try:
--> 374 data = self.processor(items)
375 row_status = self.load.put_row(data)
376 if not self._handle_error(row_status):
C:\Users\MYUSERNAME\AppData\Local\Continuum\Anaconda3\lib\site-packages\giraffez\utils.py in _pipeline(x)
66 def pipeline(funcs):
67 def _pipeline(x):
---> 68 return reduce(lambda a, f: f(a), funcs, x)
69 return _pipeline
70
C:\Users\MYUSERNAME\AppData\Local\Continuum\Anaconda3\lib\site-packages\giraffez\utils.py in <lambda>(a, f)
66 def pipeline(funcs):
67 def _pipeline(x):
---> 68 return reduce(lambda a, f: f(a), funcs, x)
69 return _pipeline
70
C:\Users\MYUSERNAME\AppData\Local\Continuum\Anaconda3\lib\site-packages\giraffez\encoders.py in _encoder(items)
278 data.append(value)
279 elif column.type in VAR_TYPES:
--> 280 data.append(struct.pack("H", len(item)) + ensure_bytes(item))
281 elif column.type in DATE_TYPES:
282 date_value = Date.from_string(item)
TypeError: object of type 'float' has no len()
Non-string objects have no len(); I had to transform my data to str to bypass this issue:
for i in df.columns.tolist():
    df[i] = df[i].values.astype(str)
before passing it to mload:
with giraffez.MLoad("DBC.TestTable") as mload:
    mload.columns = df.columns.tolist()
    rows = list(zip(*[df[i].values.tolist() for i in df]))
    for row in rows:
        mload.load_row(row)
    exit_code = mload.finish()
    print("finished load with code {0}".format(exit_code))
    if exit_code == 0:
        mload.cleanup()
Is this a known issue?
In using the giraffez.Export module, I've found that CHAR(x) types get padded on the right with a number of spaces equal to the length of the field (in addition to any blank characters at the end of the value that don't fill the full x characters). I believe the expected behavior here would be to trim the value, as happens with VARCHAR types; the additional spaces equal to the length of the field are clearly unwelcome.
Here's an example that I think should be reproducible:
>>> with giraffez.Load(host=host, username=username, password=password) as load:
... load.execute("""create multiset table giraffez_issue (
... id_var int,
... variable_field varchar(10),
... fixed_field char(4),
... floaty decimal(6, 2)
... ) primary index (id_var)""")
...
... load.insert('giraffez_issue', [
... (1, 'alpha', 'A1', 1.1),
... (2, 'beta', 'B', 2.2),
... (3, 'gamma', 'C3PO', 3.0)
... ])
...
defaultdict(<class 'int'>, {'count': 3})
>>> with giraffez.Export('giraffez_issue', host=host, username=username,
... password=password, delimiter='|', null='') as export:
... for row in export.results():
... print('<<{row}>>'.format(row=row))
...
<<1|alpha|A1 |1.10>>
<<3|gamma|C3PO |3.00>>
<<2|beta|B |2.20>>
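As a stopgap on the consumer side, the trailing pad can be stripped after export; this assumes no CHAR values intentionally end in spaces:

```python
# One exported row from the example above, with the padded CHAR(4) field.
row = "1|alpha|A1      |1.10"
fields = [f.rstrip() for f in row.split("|")]
print(fields)  # ['1', 'alpha', 'A1', '1.10']
```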
I need a way to pass credentials into the cmd.execute statement without writing them out to .girafferc. The documentation indicates that it should be possible to do this, but it doesn't seem to be working (e.g. giraffez cmd --help doesn't list these parameters as options, despite what the documentation says).
Attempting to run import giraffez produces the following error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
~/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/__init__.py in <module>()
38 try:
---> 39 from . import _teradata
40 from . import _teradatapt
ImportError: dlopen(/Users/****/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/_teradata.cpython-36m-darwin.so, 2): Library not loaded: libcliv2.dylib
Referenced from: /Users/****/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/_teradata.cpython-36m-darwin.so
Reason: image not found
During handling of the above exception, another exception occurred:
~/github/rock/rock/upload_conn.py in <module>()
8 camelCase indicates function
9 """
---> 10 import giraffez
11
12 class upload_connector:
~/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/__init__.py in <module>()
52 documentation:
53 http://www.capitalone.io/giraffez/intro.html#environment.
---> 54 """.format(error))
55
56 from ._teradata import TeradataError
Exception: dlopen(/Users/****/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/_teradata.cpython-36m-darwin.so, 2): Library not loaded: libcliv2.dylib
Referenced from: /Users/****/miniconda3/envs/saul/lib/python3.6/site-packages/giraffez/_teradata.cpython-36m-darwin.so
Reason: image not found.
This indicates that either the giraffez C extensions did not compile
correctly, or more likely, there is an issue with the environment or
installation of the Teradata dependencies. Both the Teradata Call-Level
Interface Version 2 and Teradata Parallel Transporter API require
several environment variables to be set to find the shared library
files and error message catalog.
For more information, refer to this section in the giraffez
documentation:
http://www.capitalone.io/giraffez/intro.html#environment.
Using Teradata 15.10 with the following environment variables set in my .bash_profile:
export TERADATA_HOME=/Library/Application\ Support/teradata/client/15.10
export LD_LIBRARY_PATH=$TERADATA_HOME/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$TERADATA_HOME/tbuild/lib:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$LD_LIBRARY_PATH
export NLSPATH=$TERADATA_HOME/msg/%N:$NLSPATH
Looking at the lib folder in the teradata folder I see libcliv2.dylib, so I'm having a hard time interpreting the above error.
Version Info:
giraffez: 2.0.17
OS: macOS Sierra Version 10.12.6
Python Version: 3.6.1
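As a quick diagnostic (nothing giraffez-specific), it can help to confirm the variables actually reached the Python process: on recent macOS versions, System Integrity Protection strips DYLD_LIBRARY_PATH from processes launched by protected binaries, so a value exported in .bash_profile may never be visible to the interpreter:

```python
import os

# Print the loader-related variables exactly as this process sees them.
for var in ("TERADATA_HOME", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "NLSPATH"):
    print(var, "=", os.environ.get(var, "<unset>"))
```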
I have data in a table that is float format, but when I use .to_str(), the value gets rounded.
Specifically the value is 9 digits after the decimal like:
0.#########
and gets rounded to 6 after the decimal like:
0.######
Is this intended behavior?
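Six digits is the default precision of C's %f conversion, so the observed rounding is consistent with the value being rendered through a plain %f somewhere in the string encoder (an assumption about the cause, not confirmed in the source):

```python
x = 0.123456789
print("%f" % x)  # 0.123457 -- C-style %f defaults to six decimal places
print(repr(x))   # 0.123456789 -- full precision is preserved in the float
```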
Hi
I am still facing an exception saying: libpxicu.so: cannot open shared object file [...] even though the required libs are present and the environment variables have been set up as follows:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/teradata/client/15.00/lib64/libcliv2.so"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/teradata/client/15.00/tbuild/lib64/libtelapi.so"
export TERADATA_HOME="/opt/teradata/client"
Where should I look for to solve my issue? Any hints?
Peter
bash-3.1$ pip install giraffez
Collecting giraffez
Using cached giraffez-1.0.0.dev0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\qze841\AppData\Local\Temp\pip-build-agnajbb9\giraffez\setup.py", line 371, in
long_description=open("README.rst").read(),
File "c:\Users\qze841\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2590: character maps to
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\qze841\AppData\Local\Temp\pip-build-agnajbb9\giraffez
bash-3.1$ pwd
/c/temp/giraffez
bash-3.1$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 371, in
long_description=open("README.rst").read(),
File "c:\Users\qze841\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2636: character maps to
bash-3.1$ python --version
Python 3.5.2 :: Anaconda 4.1.1 (64-bit)
Reading a decimal into a pandas DataFrame has issues when the 1/10th digit is a zero.
import pandas as pd
import giraffe
slq1='select 10.01;'
sql_results=giraffe.Cmd().execute(slq1)
r1=sql_results.one()
df1 = pd.DataFrame(r1.rows, columns=r1.columns.safe_names)
print df1
0 10.1
slq2='select 10.08;'
sql_results=giraffe.Cmd().execute(slq2)
r1=sql_results.one()
df1 = pd.DataFrame(r1.rows, columns=r1.columns.safe_names)
print df1
0 10.8
my environment version info:
giraffez 2.0.19
macOS 10.13.2
python 3.5
Teradata ODBC / TTU 16.10
Teradata Server 15.10
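The observed values (10.01 becoming 10.1, 10.08 becoming 10.8) are consistent with the fractional part of the scaled decimal being formatted without zero-padding to the column scale; that diagnosis is an assumption, but this sketch reproduces the symptom exactly:

```python
# 10.01 as a DECIMAL(4,2) is stored as the scaled integer 1001.
value, scale = 1001, 2
whole, frac = divmod(value, 10 ** scale)
print(f"{whole}.{frac}")            # 10.1  -- the buggy rendering
print(f"{whole}.{frac:0{scale}d}")  # 10.01 -- zero-padded, correct
```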
When I run this for any number of tables I get the same result. I am able to get a legitimate response when I use giraffez cmd "select top 1 * from mydb.my_table" — the printout to stdout works properly. But giraffez export seems to be causing issues; it only spits out the column headers:
myComp:~|⇒ giraffez export "select top 1 * from mydb.my_table"
Connection: Connecting to data source 'data-server' ...
Connection: Connection to 'data-server' established successfully.
Export: Initiating Teradata PT request (awaiting server) ...
Export: Teradata PT request accepted.
Export: Executing ...
delimiter => "|"
null => "NULL"
output => "stdout"
encoding => "str"
query => "select top 1 * from mydb.my_table"
--------------------------------
version|id|resid|tid|parentid|off|type|prod|status|dtl|flagb|flagd
Connection: Closing Teradata connection ...
Export: Closing Teradata PT connection ...
Export: Teradata PT request complete.
Connection: Connection to 'data-server' closed.
Traceback (most recent call last):
File "/Users/ash/miniconda3/envs/py35/bin/giraffez", line 11, in <module>
sys.exit(main())
File "/Users/ash/miniconda3/envs/py35/lib/python3.5/site-packages/giraffez/__main__.py", line 25, in main
MainCommand().run()
File "/Users/ash/miniconda3/envs/py35/lib/python3.5/site-packages/giraffez/core.py", line 80, in run
args.run(args)
File "/Users/ash/miniconda3/envs/py35/lib/python3.5/site-packages/giraffez/commandline.py", line 243, in run
for i, row in enumerate(exportfn(), 1):
File "/Users/ash/miniconda3/envs/py35/lib/python3.5/site-packages/giraffez/export.py", line 256, in _fetchall
data = self.export.get_buffer()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 376: invalid start byte
The typical way I connect to our Teradata instance is using a SQL editor like DBeaver; the JDBC connection string has charset=UTF8, and my ODBC settings are also set that way in the odbc.ini file. When I do sys.getdefaultencoding() I also get 'utf-8'.
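0x8a can never be the first byte of a valid UTF-8 sequence, which suggests the session is actually returning data in a single-byte charset (e.g. Teradata's LATIN) while giraffez decodes it as UTF-8; that diagnosis is an assumption based only on the byte value in the traceback:

```python
b = b"\x8a"
try:
    b.decode("utf-8")
except UnicodeDecodeError as e:
    print("utf-8 failed:", e.reason)   # invalid start byte
# Every byte decodes in latin-1 (0x8a maps to a C1 control character).
print(repr(b.decode("latin-1")))
```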
Hello,
First off, thanks for the great tool! My company has a definite interest in utilizing giraffez for our interactions with Teradata. It has been much more pleasant to work with than Teradata's native tooling.
One feature that we desire (and are willing to write a PR for) is to be able to know how many rows were affected by a given operation. We would use this for logging / analytics purposes. Ideally, this would be accomplished by reading a rowcount
attribute of the Cursor object(s) returned by the TeradataCmd.execute function. The DB API 2.0 mentions such a feature though I know giraffez isn't designed to be PEP-249 compliant.
In browsing Teradata's CLIv2 documentation, I found that ActivityCount gets returned as a field in the SUCCESS parcel. With some basic testing, I found that this parcel does come through the results processed in the teradata_handle_record function. As for how to propagate this value up to a Cursor object, I'm still looking for the best way.
After that bit of background info, the question I hope to explore with this issue is whether teradata_handle_record would even be a good location for this, given the "purpose" of that function.
Thanks again for your time and the amazing tool!
Python 3.7 updated PyUnicode_AsUTF8 to return a const char* instead of a plain char*. This breaks the current version of Giraffez due to teradatapt.hpp#L114, since TConn::AddAttribute expects a plain char*.
The TPT definition for that method is shown here:
TELAPI_EXPORT void AddAttribute(TD_Attribute, char* value);
I believe the simplest fix would be to cast away the const-ness of the result returned from PyUnicode_AsUTF8, though a better (and potentially safer, depending on CPython's implementation) approach might be to copy the string into a non-const location before passing it to AddAttribute. I suppose that ideally the AddAttribute method would already expect a const char*, but that's not really worth fussing about.
Regardless of the approach that is decided upon, I'm willing to draft up and test a PR.
Thanks,
Andrew D.
giraffez config --init
throws a NameError in 1.1.2:
Traceback (most recent call last):
File "/home/ec2-user/.local/bin/giraffez", line 9, in <module>
load_entry_point('giraffez==1.1.2', 'console_scripts', 'giraffez')()
File "/home/ec2-user/.local/lib/python2.7/site-packages/giraffez/__main__.py", line 25, in main
MainCommand().run()
File "/home/ec2-user/.local/lib/python2.7/site-packages/giraffez/core.py", line 91, in run
args.run(args)
File "/home/ec2-user/.local/lib/python2.7/site-packages/giraffez/commandline.py", line 147, in run
log.write(MESSAGE_WRITE_DEFAULT.format(args.conf))
NameError: global name 'MESSAGE_WRITE_DEFAULT' is not defined
commandline.py (line 147) just needs to reference the new variable name from this commit b37054e:
From:
log.write(MESSAGE_WRITE_DEFAULT.format(args.conf))
To:
log.write(message_write_default.format(args.conf))
This might be a silly question since I'm unfamiliar with Teradata, but I didn't see any mention of transactions in the docs or in the repo.
Does giraffez support transactions that span multiple statements? Is the only way to do that by placing all the statements within a single TeradataCmd.execute() execution, i.e. is there no way to define transactions at the connection level?
giraffez uses CLIv2, which I have no knowledge of. I'm curious: what is the advantage of CLIv2 vs JDBC and ODBC?
When I want to upgrade giraffez via pip:
pip install --upgrade giraffez
I get an error:
c:\temp\pip-build-sjcz246m\giraffez\giraffez\src\common.h(34): fatal error C1083: Couldn't open file: '_strptime.c': No such file or directory.
It does not happen when I clone the repo from git.
Hi,
It would be good if we could export data to SFrame and Pandas directly from giraffez.Export. I really want to collaborate on this project!
When executing back-to-back requests, particularly fairly quickly, giraffez commands can fail with the error message "TeradataError: CLIv2: initiate request failed".
I have noticed that sometimes executing requests back-to-back can lead to the first one succeeding and the second failing; further, if the cmd calls were split between two Jupyter notebook cells, both would succeed.
---------------------------------------------------------------------------
error Traceback (most recent call last)
.../python/conda_envs/py3k/lib/python3.5/site-packages/giraffez/cmd.py in _execute(self, command)
105 try:
--> 106 results, error = self.cmd.execute(command)
107 if error is not None:
error: CLIv2: initiate request failed
During handling of the above exception, another exception occurred:
TeradataError Traceback (most recent call last)
<ipython-input-23-60624f138025> in <module>()
2 # a simple giraffez command execution
3 with gr.Cmd() as cmd:
----> 4 results = cmd.execute_one('select * sample 5 from padw.pl_appn_root')
5 samp_data = pd.DataFrame(results.items())
6 samp_data
.../python/conda_envs/py3k/lib/python3.5/site-packages/giraffez/cmd.py in execute_one(self, command, sanitize, silent)
210 :raises `giraffez.errors.GiraffeError`: if the return data could not be decoded
211 """
--> 212 return self.execute(command, sanitize, silent).one()
213
214 def execute_many(self, command, sanitize=True, parallel=False, silent=False):
.../python/conda_envs/py3k/lib/python3.5/site-packages/giraffez/cmd.py in execute(self, command, sanitize, silent)
196 command = prepare_statement(command) # accounts for comments and newlines
197 log.debug("Debug[2]", "Command (sanitized): {!r}".format(command))
--> 198 return Results(self._execute(command))
199
200 def execute_one(self, command, sanitize=True, silent=False):
.../python/conda_envs/py3k/lib/python3.5/site-packages/giraffez/cmd.py in _execute(self, command)
109 log.debug("Debug[2]", repr(results))
110 except _cli.error as error:
--> 111 raise TeradataError(error)
112 for result in results:
113 stmt_info = result.get('stmtinfo', None)
TeradataError: CLIv2: initiate request failed
Is there any plan to have a SQLAlchemy dialect for giraffez?
Currently, when exporting data from a table with a NUMBER column, the export fails with the error: Cannot determine data type. Are there any plans to add support for it in the export as a Decimal type?
Here is a sample test table:
CREATE MULTISET TABLE dwp_sandbox.DW_PO_TUX_HDR_test2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
Q_PO_HDR_ID VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
PO_TOTAL_AMT NUMBER(22,4)
)
PRIMARY INDEX ( Q_PO_HDR_ID )
In file included from giraffez/cli/cmdobject.c:18:
giraffez/cli/cmdobject.h:37:10: fatal error: 'dbcarea.h' file not found
#include <dbcarea.h>
^
1 error generated.
error: command 'clang' failed with exit status 1
I've been running into an issue in an environment where I have pycryptodome installed (a dependency of another package) when I try to encrypt passwords in .girafferc:
± giraffez config --init
colorful: true
connections:
db1:
host: null
password: null
username: null
default: db1
Successfully created file '/Users/username/.girafferc'
Set the username and password using the config module:
giraffez config --set connections.db1.host <host>
giraffez config --set connections.db1.username <username>
giraffez config --set connections.db1.password <password>
Key file '/Users/username/.giraffepg' created successfully.
± giraffez config --set connections.db1.password some_pass
Traceback (most recent call last):
File "/Users/username/miniconda3/envs/env/bin/giraffez", line 11, in <module>
sys.exit(main())
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/__main__.py", line 25, in main
MainCommand().run()
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/core.py", line 80, in run
args.run(args)
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/commandline.py", line 177, in run
c.set_value(key, value)
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/config.py", line 298, in set_value
value = self.encrypt(value, path)
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/config.py", line 171, in encrypt
value = "ENCRYPTED:{}".format(ensure_str(self.cipher.encrypt(value)))
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/giraffez/encrypt.py", line 64, in encrypt
return base64.b64encode(iv + cipher.encrypt(raw))
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/Crypto/Cipher/_mode_cbc.py", line 160, in encrypt
expect_byte_string(plaintext)
File "/Users/username/miniconda3/envs/env/lib/python3.6/site-packages/Crypto/Util/_raw_api.py", line 194, in expect_byte_string
raise TypeError("Only byte strings can be passed to C code")
TypeError: Only byte strings can be passed to C code
I think that if we change the line here from raw = self.pad(raw) to raw = self.pad(raw).encode("utf-8"), that would fix the issue. In the decrypt function it's already assumed that the string is utf-8 encoded, so I don't think this would introduce any downstream issues. That being said, I've only done a little bit of testing.
From some reading it sounds like pycryptodome is more supported than pycrypto going forward, but I'm not familiar with the arguments there.
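The proposed fix boils down to handing pycryptodome bytes instead of str. A standalone sketch of the idea, assuming giraffez pads to the AES block size with a PKCS#7-style scheme (the pad implementation here is illustrative, not giraffez's actual code):

```python
BLOCK_SIZE = 16  # AES block size

def pad(raw):
    # Illustrative PKCS#7-style padding that returns a str,
    # mirroring the assumed behavior of giraffez's pad()
    n = BLOCK_SIZE - len(raw) % BLOCK_SIZE
    return raw + chr(n) * n

# The one-line fix: encode after padding, so the cipher sees bytes
plaintext = pad("some_pass").encode("utf-8")
```

pycryptodome's CBC-mode encrypt() rejects str with exactly the "Only byte strings can be passed to C code" TypeError in the traceback, which is why the encode step resolves it.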
If I have a table with a pipe (|) in a character field and try to export and write to a string, there's no checking for whether the delimiter exists in any of the fields:
import giraffez
sql = """SELECT * FROM
(
SELECT
CAST('first|field' AS VARCHAR(20)) AS first_char,
CAST('second field' AS VARCHAR(20)) AS second_char
) AS tbl
"""
with giraffez.BulkExport(sql) as export:
r = list(export.to_str(delimiter='|'))
print(r)
['first|field|second field']
I could bring it out with a different delimiter, or roll my own to_str that takes in a dict from to_dict/to_list, but I'm wondering if there should be functionality pushed up into to_str to error out, warn, or remove the field if the delimiter exists in a column?
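Until something like that lands in to_str itself, a small wrapper can bolt the check on from the outside (safe_join is a made-up helper, not part of giraffez):

```python
import warnings

def safe_join(row, delimiter="|", on_conflict="error"):
    """Join a list of field strings, guarding against the delimiter
    appearing inside a field. on_conflict: 'error', 'warn', or 'strip'."""
    clean = []
    for field in row:
        if delimiter in field:
            if on_conflict == "error":
                raise ValueError("delimiter %r found in field %r" % (delimiter, field))
            elif on_conflict == "warn":
                warnings.warn("delimiter %r found in field %r" % (delimiter, field))
            elif on_conflict == "strip":
                field = field.replace(delimiter, "")
        clean.append(field)
    return delimiter.join(clean)

print(safe_join(["first|field", "second field"], on_conflict="strip"))
# firstfield|second field
```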
I believe this is the same or a similar issue as #47, as I'm having the same problem.
giraffez 2.0.24
macOS 10.14.2 / Ubuntu 16
python 3.5 (on macOS) / python 3.6 (on Ubuntu box)
Teradata ODBC / TTU 16.10
Teradata Server 15.10
SQL client: dbeaver / Aqua Data Studio (tried both)
JDBC for SQL client: TeraJDBC_16.20.00.02
import giraffez as g
import pandas as pd
with g.Cmd() as cmd:
cmd.execute("""
CREATE SET TABLE mydb.mytable ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
location VARCHAR(250) CHARACTER SET UNICODE NOT CASESPECIFIC
)
PRIMARY INDEX ( location );""")
df = pd.DataFrame({'location': ['Düsseldorf','Marché Saint-Germain','İstanbul']})
df.to_csv('location.csv', index=False)
with g.BulkLoad(table='mydb.mytable') as load:
load.from_file('location.csv', table='mydb.mytable', delimiter=",", null='')
select *
from mydb.mytable
location
---------------------
Düsseldorf
Marché Saint-Germain
Ä°stanbul
I have the same result if I use the dataframe's to_list and iterate over the rows as the OP did (from #47). The character set of my connection to TD when I do the select * is utf-8 and I'm using a GUI client. I am able to see these specific unicode characters and non-latin characters in other tables where the variable's character set is UNICODE in the create table statement on the same database. So I believe this is isolated and not related to either my SQL client or the TD server itself.
Originally posted by @ausiddiqui in #47 (comment)
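One plausible explanation for the 'Ä°stanbul' result in particular is a UTF-8-bytes-read-as-Latin-1 round trip: 'İ' (U+0130) encodes to the two UTF-8 bytes 0xC4 0xB0, which decode under Latin-1 as 'Ä' and '°'. A quick self-contained check of that theory:

```python
original = "İstanbul"
# Encode as UTF-8, then misinterpret the bytes as Latin-1
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # Ä°stanbul — matches the value seen in the table
```

This doesn't pin down where in the load path the misinterpretation happens, but it matches the symptom exactly.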
When I try to insert ~1 million rows using the following syntax:
with giraffez.Cmd(panic=True) as cmd:
cmd.insert('db.table_name', 'data.csv')
it runs for about a minute and inserts ~8,000 rows of data, and then stops with the following error:
giraffez.TeradataError: 3130: Response limit exceeded.
I believe this is caused by not closing the responses from Teradata (max allowed to keep open is 16):
https://stackoverflow.com/questions/18903284/teradata-database-3130-response-limit-exceeded
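Until the response handling is fixed, one workaround is to feed the data in chunks, opening a fresh session per chunk so responses get released. The chunking itself is plain Python; the per-chunk giraffez call is only sketched in a comment, since its exact shape depends on your version:

```python
import itertools

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage: write each batch to a temp file and insert it
# inside its own Cmd context so the session (and its responses) closes:
#   for batch in chunks(open("data.csv"), 5000):
#       with giraffez.Cmd(panic=True) as cmd:
#           ...  # write batch to a temp csv, then cmd.insert(table, path)
```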
thank you
Is there a way to maintain capitalization after querying and storing the results of a select statement? This becomes an issue because Python is case sensitive, so the response from the query ends up in a dataframe where something expected to be fiscalYear is stored as fiscalyear.
Would there be a way to add this as an overall option to set for the package or for specific modules, whichever is easier?
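As a stopgap, the keys coming back can be mapped onto the casing your code expects with a small rename pass (rename_keys and the column names are made up for illustration, not part of giraffez):

```python
def rename_keys(rows, mapping):
    """Return rows with keys renamed per mapping; unmapped keys pass through."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

rows = [{"fiscalyear": 2020, "amount": 10.0}]
fixed = rename_keys(rows, {"fiscalyear": "fiscalYear"})
# fixed == [{"fiscalYear": 2020, "amount": 10.0}]
```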
I have a DataFrame with columns holding non-ASCII characters. I would like to change the session encoding, similar to udaExec.connect(..., charset='utf8') in teradata and cnxn.setencoding(encoding='utf-8') in pyodbc.
I am able to load the data, but I get garbled characters, with:
with giraffez.BulkLoad("BDA.Table", print_error_table=True) as load:
load.columns = df.columns.tolist()
for row in df.values.tolist():
load.put(row)
How do I set encoding to UTF-8? Is it done by giraffez.encoders? Is there how-to code example?
When running the following in 2.0.7:
sql = """
SELECT
field_1 AS alias_1,
field_2 AS alias_2
FROM
db.table
"""
with giraffez.Cmd() as cmd:
res = cmd.execute(sql)
all_recs = list(res.to_dict())
all_recs
looks like [{'FIELD_1': 123, 'FIELD_2': None}, {'FIELD_1': 456, 'FIELD_2': 123}, ...]. Contrast with the behaviour in version 1.1.7 on the same SQL:
with giraffez.Cmd() as cmd:
res = cmd.execute_one(sql)
all_recs = list(res.to_json())
and all_recs had the "safe" column names, lower-cased to boot, like [{'alias_1': 123, 'alias_2': None}, {'alias_1': 456, 'alias_2': 123}, ...]
Include the following lines in the license headers within the scripts. This is to ensure that the automated license scanning tools can identify the license.
SPDX-Copyright: Copyright (c) Capital One Services, LLC
SPDX-License-Identifier: Apache-2.0
In the docs: http://www.capitalone.io/giraffez/api.html#module-giraffez.load
There is:
with giraffez.Load() as load:
load.from_file('myfile.txt', 'database.my_table')
but I think it should be:
with giraffez.Load() as load:
load.from_file('database.my_table', 'myfile.txt')
This is the line: https://github.com/capitalone/giraffez/blob/master/giraffez/load.py#L54
I'm able to install 1.0.0.dev without issue, but I get a traceback related to requirements.txt when upgrading:
$ pip install giraffez --upgrade --user
Collecting giraffez
Downloading giraffez-1.0.1.tar.gz (78kB)
100% || 81kB 1.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-m5rtWa/giraffez/setup.py", line 366, in <module>
with open('requirements.txt') as f:
IOError: [Errno 2] No such file or directory: 'requirements.txt'
From what I can tell, requirements.txt was not included in the 1.0.1 tarball, but was in 1.0.0.dev:
$ wget ...giraffez-1.0.1...
$ tar -zxvf giraffez-1.0.1.tar.gz
$ cd giraffez-1.0.1 && ls
I have Mac OS X, Python 3.5
I cloned the repo and checked out 2.0.0. After running python setup.py install I got this error:
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/frw169/anaconda/include -arch x86_64 -I/Users/frw169/CODE/giraffez/giraffez -I/Users/frw169/anaconda/include/python3.5m -c giraffez/src/columns.c -o /Users/frw169/CODE/giraffez/build/giraffez/src/columns.o -Wfatal-errors
In file included from giraffez/src/columns.c:18:
In file included from giraffez/src/giraffez.h:30:
giraffez/src/teradata.h:44:10: fatal error: 'coperr.h' file not found
#include <coperr.h>
^
1 error generated.
error: command 'gcc' failed with exit status 1
For me, coperr.h is at /opt/teradata/client/16.00/include/coperr.h. Are there any environment variables that need to be set? I currently have no trouble running the master branch (v1.1.8).
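If setup.py can't see the TTU headers, pointing the compiler's search paths at the client install before building often helps. A sketch assuming the 16.00 install prefix mentioned above (adjust the version and paths to match your system):

```shell
# Assumed Teradata Tools and Utilities install prefix
export TTU_HOME=/opt/teradata/client/16.00
# Make the headers (coperr.h, dbcarea.h) and libraries visible to the compiler
export C_INCLUDE_PATH="$TTU_HOME/include${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}"
export LIBRARY_PATH="$TTU_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
# then re-run: python setup.py install
```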
Can anyone please let me know what the underlying Teradata utility used by giraffez is?