larsyencken / csvdiff
Generate a diff between two tabular datasets expressed in CSV files.
License: BSD 3-Clause "New" or "Revised" License
In the diff_files API example, indexing on column 'name' works, but indexing on column 'id' fails:
Traceback (most recent call last):
File "differ.py", line 3, in <module>
patch = csvdiff.diff_files('Skill.csv', 'Skill_1.csv', ['id'])
File "/usr/local/lib/python3.6/dist-packages/csvdiff/__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "/usr/local/lib/python3.6/dist-packages/csvdiff/patch.py", line 204, in create
from_indexed = records.index(from_records, index_columns)
File "/usr/local/lib/python3.6/dist-packages/csvdiff/records.py", line 58, in index
raise InvalidKeyError('invalid column name {k} as key'.format(k=k))
csvdiff.records.InvalidKeyError: invalid column name 'id' as key
Both columns, 'id' and 'name', are present in my test files.
Hi
I am trying to set up a .spec file to include csvdiff in Fedora, but I got this error when running the tests:
running build_ext
Traceback (most recent call last):
File "setup.py", line 54, in <module>
test_suite='tests',
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 138, in run
self.with_project_on_sys_path(self.run_tests)
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 118, in with_project_on_sys_path
func()
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 164, in run_tests
testLoader = cks
File "/usr/lib64/python2.7/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/usr/lib64/python2.7/unittest/main.py", line 149, in parseArgs
self.createTests()
File "/usr/lib64/python2.7/unittest/main.py", line 158, in createTests
self.module)
File "/usr/lib64/python2.7/unittest/loader.py", line 128, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib64/python2.7/unittest/loader.py", line 103, in loadTestsFromName
return self.loadTestsFromModule(obj)
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 35, in loadTestsFromModule
tests.append(self.loadTestsFromName(submodule))
File "/usr/lib64/python2.7/unittest/loader.py", line 100, in loadTestsFromName
parent, obj = obj, getattr(obj, part)
AttributeError: 'module' object has no attribute 'test_core'
error: Bad exit status from /var/tmp/rpm-tmp.LrMw5o (%check)
I wonder if you could point me to a starting point, how to implement this and if there are any caveats to think of.
Problem: I generate csv files from a database view and they don't have a unique identifier which could be used as index.
Idea: use the line number of the current row - 1 as the index (like adding a virtual column to the CSV).
With the current implementation this use-case will fail silently, as no changes are reported:
from csvdiff import *
diff_files("e.txt", "f.txt", [], ";")
I would like to implement this functionality and provide a pull request for this feature if you think that is a good idea.
e.txt
f.txt
I had to rename the files to .txt as github doesn't support .csv
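The row-number idea above can be sketched with the standard library alone, before any records reach csvdiff. This is only an illustration of the approach, not csvdiff's implementation; the helper name add_row_number and the column name "_row" are made up for the example.

```python
import csv
import io


def add_row_number(csv_text, sep=",", column="_row"):
    """Return records with a synthetic 1-based row-number column prepended.

    The column name "_row" is a hypothetical choice for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=sep)
    records = []
    for number, row in enumerate(reader, start=1):
        record = {column: str(number)}
        record.update(row)
        records.append(record)
    return records


old = add_row_number("name;amount\nbob;20\neva;63\n", sep=";")
new = add_row_number("name;amount\nbob;20\neva;64\n", sep=";")

# Rows are paired purely by position, so one inserted row shifts every row after it.
changed = [(a["_row"], a, b) for a, b in zip(old, new) if a != b]
```

The main caveat is visible in the last comment: with a positional index, a single inserted or deleted row makes every following row look changed.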
Hi,
I was using this tool and could not get it to work with tab-separated files. It works fine for comma-separated files, but not for tab-separated ones.
Is it viable to support this based on the current implementation?
PS: If so, and if it is not a priority for you, I can try to put together a PR for this.
Thanks in advance
Is it possible to change the default separator from a comma to, for example, a semicolon?
Just an FYI - it installs through PIP, so I expected it to work. If you're not supporting old versions that's fair.
Traceback (most recent call last):
File "/usr/bin/csvdiff", line 7, in <module>
from csvdiff import csvdiff_cmd
File "/usr/lib/python2.6/site-packages/csvdiff/__init__.py", line 14, in <module>
from . import records, patch, error
File "/usr/lib/python2.6/site-packages/csvdiff/records.py", line 53
for r in record_seq
^
SyntaxError: invalid syntax
When running csvdiff with style pretty and compact it works as expected but with --style=summary I get the following error:
Traceback (most recent call last):
File "/bin/csvdiff", line 11, in <module>
sys.exit(csvdiff_cmd())
File "/usr/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/usr/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/csvdiff/__init__.py", line 136, in csvdiff_cmd
_diff_and_summarize(from_csv, to_csv, index_columns, ostream)
File "/usr/lib/python2.7/site-packages/csvdiff/__init__.py", line 166, in _diff_and_summarize
_summarize_diff(diff, len(from_records), stream=stream)
AttributeError: DictReader instance has no attribute '__len__'
This is on Fedora 23 with Python 2.7. When I try to run it with Python 3.4 instead, I get the following error (for any style):
Traceback (most recent call last):
File "/bin/csvdiff", line 7, in <module>
from csvdiff import csvdiff_cmd
If more information is required please let me know.
I have the following 2 csv files,
view-1.csv
As of Date,Business Title,Email,Employee Type,Employee_ID
view-2.csv
'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID
Running
$ csvdiff Employee_ID view-1.csv view-2.csv
Throws the following error,
Traceback (most recent call last):
File "/Users/.pyenv/versions/3.7.5/bin/csvdiff", line 8, in <module>
sys.exit(csvdiff_cmd())
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 160, in csvdiff_cmd
significance=significance)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 172, in _diff_files_to_stream
diff = diff_files(from_csv, to_csv, index_columns, sep=sep, ignored_columns=ignored_columns)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 211, in create
return create_indexed(from_indexed, to_indexed, index_columns)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 222, in create_indexed
index_columns)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 252, in _assemble
key=_change_key)
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 251, in <genexpr>
for k in changed),
File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 264, in record_diff
from_ = lhs[k]
KeyError: "'As of Date'"
I was expecting the output to be something like columns removed/added: 1, 'As of Date'.
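Until something like that exists, the header rows can be compared before diffing. The sketch below uses only the standard csv module and the two header lines from this report; header_changes is a hypothetical helper, not part of csvdiff.

```python
import csv
import io


def header_changes(from_text, to_text):
    """Compare the header rows of two CSV texts; return (added, removed) column names."""
    from_cols = next(csv.reader(io.StringIO(from_text)))
    to_cols = next(csv.reader(io.StringIO(to_text)))
    added = [c for c in to_cols if c not in from_cols]
    removed = [c for c in from_cols if c not in to_cols]
    return added, removed


view_1 = "As of Date,Business Title,Email,Employee Type,Employee_ID\n"
view_2 = "'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID\n"

added, removed = header_changes(view_1, view_2)
```

Here added holds the single-quoted column name (the csv module treats single quotes as ordinary characters by default) and removed is empty.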
How do I deal with whitespace in a column name?
For example grocerylist.csv:
Product Part Number,Product Name,Amount In Stock
1,Banana,7
2,Apple,12
3,Raspberry,7
4,Mango,19
5,Potato,10
csvdiff stops after Product because it cannot find Part.
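One likely cause, assuming the column name was typed unquoted on the command line, is that the shell splits it into three separate arguments. The standard shlex module applies POSIX shell splitting rules, so it can show the difference; other.csv is a placeholder file name for the second file.

```python
import shlex

# Unquoted: the column name is split on whitespace into three arguments.
unquoted = shlex.split("csvdiff Product Part Number grocerylist.csv other.csv")

# Quoted: the column name reaches the program as a single argument.
quoted = shlex.split('csvdiff "Product Part Number" grocerylist.csv other.csv')
```

If this is the cause, quoting the index column in the shell should be enough.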
Thank you for contributing such a nice utility for the csv file comparison.
It would be great to have an option to ignore commented lines/rows or skip n lines from the beginning. In the following example if the user specifies '#' as a comment character the program should skip those lines from the comparison.
e.g. The commented header lines describing the file content.
#Author name
#Description
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
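As a stopgap, comment lines can be filtered out before the CSV parser sees them. This is a minimal sketch using the standard csv module; read_skipping_comments is a hypothetical helper and says nothing about how csvdiff itself would implement the option.

```python
import csv
import io


def read_skipping_comments(lines, comment="#"):
    """Parse CSV records from an iterable of lines, dropping lines that start with `comment`."""
    data_lines = (line for line in lines if not line.startswith(comment))
    return list(csv.DictReader(data_lines))


raw = io.StringIO("#Author name\n#Description\nid,name,amount\n1,bob,20\n2,eva,63\n")
records = read_skipping_comments(raw)
```

Skipping a fixed number of leading lines would work the same way, for example with itertools.islice on the line iterator.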
Hi -- I was just wondering if there is a way to specify multiple indices for comparing.
For example:
csvdiff --style=pretty --output=diff.json -k element_name,parent_category a.csv b.csv
where -k specifies keys. Thanks in advance.
Currently the documentation only references command line usage. Would it be possible to get some documentation/examples on how to use csvdiff within Python code?
It'd be really nice if there was some way to say "Consider numeric values equivalent if they're equal to a certain number of decimal places."
Here's a simple function for doing so from Stack Overflow:
def nearly_equal(a, b, sig_fig=5):
    # Equal outright, or equal once both are scaled by 10**sig_fig and truncated.
    return (a == b or
            int(a * 10**sig_fig) == int(b * 10**sig_fig))
Duplicated rows appear to be ignored, and the user is told that the files are identical.
$ head a.csv duplicate.csv
==> a.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
==> duplicate.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
4,jeff,19
6,fred,10
$ csvdiff --style=summary id a.csv duplicate.csv
files are identical
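A pre-check for repeated index values can catch this case before diffing. The sketch below counts key tuples with the standard library; duplicate_keys is a hypothetical helper, and the data is a shortened version of duplicate.csv from this report.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, index_columns):
    """Return {key_tuple: count} for every index value that appears more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[c] for c in index_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


duplicate_csv = "id,name,amount\n1,bob,20\n4,jeff,19\n4,jeff,19\n6,fred,10\n"
dupes = duplicate_keys(duplicate_csv, ["id"])
```

A non-empty result means the chosen index columns do not uniquely identify rows, which is what the tool's help text expects of them.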
running test
running egg_info
writing requirements to csvdiff.egg-info\requires.txt
writing csvdiff.egg-info\PKG-INFO
writing top-level names to csvdiff.egg-info\top_level.txt
writing dependency_links to csvdiff.egg-info\dependency_links.txt
writing entry points to csvdiff.egg-info\entry_points.txt
reading manifest file 'csvdiff.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'csvdiff.egg-info\SOURCES.txt'
running build_ext
test_csvdiff_fails_without_enough_arguments (tests.test_csvdiff.TestCsvdiff) ... ok
test_csvdiff_fails_without_valid_key (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_command_valid_usage_with_difference (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_diff_command_valid_usage_with_separator (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_diff_records_multikey (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_records_nonstr_values (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_records_str_values (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_with_index_as_ignore_field (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_diff_with_valid_ignore (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_patch_add (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_change (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_cmd_fails_when_json_doesnt_match_schema (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_cmd_fails_when_json_is_invalid (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_cmd_valid_args (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_remove (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_schema_is_valid (tests.test_csvdiff.TestCsvdiff) ... ok
test_summarize (tests.test_csvdiff.TestCsvdiff) ... ok
test_summarize_cmd (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_summarize_identical (tests.test_csvdiff.TestCsvdiff) ... ok
======================================================================
ERROR: test_diff_with_index_as_ignore_field (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 321, in test_diff_with_index_as_ignore_field
result = self.csvdiff_summary_cmd('id', self.a_file, self.b_file, ignore_columns='id')
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 67, in csvdiff_summary_cmd
with open(t.name, 'r') as istream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmp9f2e3k'
======================================================================
ERROR: test_diff_with_valid_ignore (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 312, in test_diff_with_valid_ignore
with tmp_csv_files(lhs, rhs) as (lhs_file, rhs_file):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 413, in tmp_csv_files
save_as_csv(arg, t.name)
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 420, in save_as_csv
with open(filename, 'w') as ostream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmpu841l8'
======================================================================
ERROR: test_summarize_cmd (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 118, in test_summarize_cmd
with tmp_csv_files(lhs, rhs) as (lhs_file, rhs_file):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 413, in tmp_csv_files
save_as_csv(arg, t.name)
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 420, in save_as_csv
with open(filename, 'w') as ostream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmprp8oqu'
======================================================================
FAIL: test_diff_command_valid_usage_with_difference (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 157, in test_diff_command_valid_usage_with_difference
self.assertEqual(result.exit_code, 1)
AssertionError: -1 != 1
======================================================================
FAIL: test_diff_command_valid_usage_with_separator (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 179, in test_diff_command_valid_usage_with_separator
self.assertEqual(result.exit_code, 1)
AssertionError: -1 != 1
======================================================================
FAIL: test_patch_cmd_fails_when_json_doesnt_match_schema (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 343, in test_patch_cmd_fails_when_json_doesnt_match_schema
self.assertEqual(result.exit_code, 2)
AssertionError: -1 != 2
======================================================================
FAIL: test_patch_cmd_fails_when_json_is_invalid (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 338, in test_patch_cmd_fails_when_json_is_invalid
self.assertEqual(result.exit_code, 2)
AssertionError: -1 != 2
======================================================================
FAIL: test_patch_cmd_valid_args (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 329, in test_patch_cmd_valid_args
self.assertEqual(result.exit_code, 0)
AssertionError: -1 != 0
----------------------------------------------------------------------
Ran 19 tests in 0.030s
FAILED (failures=5, errors=3)
Windows 10, Version 1709
Python 2.4.17
Hey Lars - great tool ! very elegant ... !
Is there some way we can execute the --style summary from a python project?
Large File Error:
File "i:\anaconda3\lib\site-packages\csvdiff\records.py", line 53, in
for r in record_seq
MemoryError
Unicode Decode Error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5148: character maps to <undefined>
I am getting these errors while running a CSV diff, and the output does not show the actual problem.
Can you help me narrow down the issue? I only get this for one set of files.
Lars - I have 2 csv files and have initialised 2 variables to hold the paths respectively :
csv_file1 = '/home/notex/extractors/ccar/output/CCARFRY1420171231.csv'
csv_file2 = '/home/notex/extractors/ccar/output/CCAR-vFRY14-20180331.csv'
I also ran:
import os.path
os.path.exists(csv_file1 )
which returns True.
But here :
!csvdiff --style=pretty --output=diff.json element_type,element_name,parent_category csv_file1 csv_file2
I am getting this :
Error: Invalid value for "FROM_CSV": Path "csv_file1" does not exist.
Is this a known bug? Thanks in advance.
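This looks less like a csvdiff bug than like the shell receiving the literal text csv_file1 rather than the value of the Python variable. IPython expands $csv_file1 or {csv_file1} in a ! command. Another option, sketched below, is to build the argument list in Python and hand it to subprocess. The helper build_csvdiff_command is hypothetical; the flags are the ones used in the command above.

```python
import subprocess


def build_csvdiff_command(index_columns, from_csv, to_csv, style="pretty", output=None):
    """Build the csvdiff argument list from Python values (hypothetical helper)."""
    cmd = ["csvdiff", "--style=" + style]
    if output is not None:
        cmd.append("--output=" + output)
    cmd += [",".join(index_columns), from_csv, to_csv]
    return cmd


csv_file1 = '/home/notex/extractors/ccar/output/CCARFRY1420171231.csv'
csv_file2 = '/home/notex/extractors/ccar/output/CCAR-vFRY14-20180331.csv'

cmd = build_csvdiff_command(
    ["element_type", "element_name", "parent_category"],
    csv_file1,
    csv_file2,
    output="diff.json",
)

# Uncomment where the csvdiff executable is on PATH:
# subprocess.run(cmd)
```

Passing a list avoids shell quoting problems entirely, because each path reaches csvdiff as one argument.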
I found this package really useful for anyone who works with files; I love it. The difficulty I am facing is how to use this package inside my project instead of running it from the terminal.
csvdiff --style=summary id a.csv b.csv -- I don't want to use it like this
It would be very helpful if you could show me how to run the package in the way I describe below:
import csvdiff
csvdiff.csvdiff_cmd(id,a.csv,b.csv,--style="summary")
I need to run this package from another .py file.
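The tracebacks elsewhere on this page show csvdiff.diff_files(from_csv, to_csv, index_columns) being called from Python. Assuming it returns a patch shaped like the JSON that csvdiff writes (keys "_index", "added", "changed", "removed"), a summary can be computed from that dictionary. The return shape is an assumption, and summarize_patch and the sample records are made up for this sketch.

```python
def summarize_patch(patch):
    """Count the entries of each change type in a patch dictionary."""
    return {kind: len(patch.get(kind, [])) for kind in ("added", "changed", "removed")}


# Hypothetical patch, shaped like the JSON that csvdiff writes.
example_patch = {
    "_index": ["id"],
    "added": [{"id": "5", "name": "mira", "amount": "81"}],
    "changed": [],
    "removed": [{"id": "6", "name": "fred", "amount": "10"}],
}

counts = summarize_patch(example_patch)

# With csvdiff installed, the patch would come from the call seen in the tracebacks:
#     import csvdiff
#     patch = csvdiff.diff_files('a.csv', 'b.csv', ['id'])
```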
Our diffs have enough information to be reusable. The same diff format could be used to patch a dataset, and this might be an elegant way of recording a set of custom changes against an official version of a dataset.
Ideally, you wouldn't need to specify the set of columns used in the patch. Perhaps the diff format could include the column names in a key property.
+ python tests/test_csvdiff.py
tests/test_csvdiff.py:146: DeprecationWarning: Please use assertEqual instead.
self.assertEquals(result.exit_code, 2)
../tmp/build/80754af9/csvdiff_1527844944335/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/jsonschema/exceptions.py:30: ResourceWarning: unclosed file <_io.FileIO name=4 mode='rb+' closefd=True>
super(_Error, self).__init__(
/tmp/build/80754af9/csvdiff_1527844944335/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/jsonschema/exceptions.py:30: ResourceWarning: unclosed file <_io.FileIO name=6 mode='rb+' closefd=True>
super(_Error, self).__init__(
...........F.....
======================================================================
FAIL: test_patch_cmd_valid_args (__main__.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_csvdiff.py", line 334, in test_patch_cmd_valid_args
self.assertRecordsEqual(result.records, expected)
File "tests/test_csvdiff.py", line 398, in assertRecordsEqual
self.assertEqual(records.sort(lhs), records.sort(rhs))
AssertionError: Lists differ: [Orde[19 chars]), ('amount', '13'), ('name', 'fred')]), Order[244 chars]')])] != [Orde[19 chars]), ('name', 'fred'), ('amount', '13')]), Order[244 chars]')])]
First differing element 0:
OrderedDict([('id', '6'), ('amount', '13'), ('name', 'fred')])
OrderedDict([('id', '6'), ('name', 'fred'), ('amount', '13')])
error message:
Traceback (most recent call last):
File "C:\Python27\Scripts\csvdiff-script.py", line 9, in <module>
load_entry_point('csvdiff==0.3.1', 'console_scripts', 'csvdiff')()
File "C:\Python27\lib\site-packages\click\core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "C:\Python27\lib\site-packages\click\core.py", line 696, in main
rv = self.invoke(ctx)
File "C:\Python27\lib\site-packages\click\core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Python27\lib\site-packages\click\core.py", line 534, in invoke
return callback(*args, **kwargs)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\__init__.py", line 151, in csvdiff_cmd
sep=sep, ignored_columns=ignore_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\__init__.py", line 181, in _diff_and_summarize
diff = patch.create(from_records, to_records, index_columns, ignored_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\patch.py", line 208, in create
from_indexed = records.filter_ignored(from_indexed, ignore_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\records.py", line 52, in filter_ignored
sequence[key].pop(i)
KeyError: u'aa'
The error comes from sequence[key].pop(i). Comparing CSV files that start with a BOM needs some extra handling, such as the following:
sequence[key].pop(i.encode('utf-8-sig'))
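The effect of a byte order mark on the first column name can be reproduced with the standard library. The sketch below decodes the same bytes twice: plain "utf-8" leaves the BOM attached to the first header, while "utf-8-sig" strips it. The header helper is hypothetical and does not show how csvdiff opens files.

```python
import csv
import io

# Bytes for a small CSV that starts with a UTF-8 byte order mark.
raw = "aa,bb\n1,2\n".encode("utf-8-sig")


def header(raw_bytes, encoding):
    """Decode CSV bytes with the given encoding and return the header row."""
    text = raw_bytes.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


with_bom = header(raw, "utf-8")         # BOM stays attached to the first column name
without_bom = header(raw, "utf-8-sig")  # BOM is stripped while decoding
```

With the BOM attached, a lookup for the column 'aa' fails, which matches the KeyError above.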
For files that don't have a column header, it would help to be able to specify the key and ignored columns by column number, e.g. 0,3,6. This can be simulated by a wrapper script that adds a top line 0,1,2,3,... to the two files. It would be helpful if csvdiff supported this directly, to avoid having to insert header lines into both files.
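The wrapper-script idea can also be done in memory, without touching the files. The sketch below reads a headerless CSV with the standard csv module and names the columns "0", "1", "2", and so on; read_headerless is a hypothetical helper, not a csvdiff feature.

```python
import csv
import io


def read_headerless(csv_text, sep=","):
    """Parse a headerless CSV into records keyed by column position ("0", "1", ...)."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=sep))
    width = max((len(row) for row in rows), default=0)
    names = [str(i) for i in range(width)]
    return [dict(zip(names, row)) for row in rows]


records = read_headerless("1,bob,20\n2,eva,63\n")
```

Key and ignored columns can then be given as position strings such as "0".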
Details here. I am sure this is just some setup problem with my environment but just don't know where to start looking
It doesn't look like you can execute diffs when you've added or removed a column from a csv unless I am missing something. Perhaps this would be useful to implement? (Happy to help!)
Hello,
First, thank you for writing this tool. Very useful.
I got this error:
Traceback (most recent call last):
File "/home/user/.local/bin/csvpatch", line 11, in <module>
sys.exit(csvpatch_cmd())
File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/user/.local/lib/python3.4/site-packages/csvdiff/__init__.py", line 218, in csvpatch_cmd
patch_file(patch_stream, fromcsv_stream, tocsv_stream, strict=strict)
File "/home/user/.local/lib/python3.4/site-packages/csvdiff/__init__.py", line 69, in patch_file
fieldnames = from_records.fieldnames
AttributeError: 'generator' object has no attribute 'fieldnames'
when applying an empty patch file
{
"_index": [
"W",
"B",
"B2"
],
"added": [],
"changed": [],
"removed": []
}
to an almost empty document:
"UE","W","B","B2","B3","UB",
Traceback (most recent call last):
File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 64, in <module>
compare_common(db_list1, db_list2)
File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 54, in compare_common
diff = csvdiff.diff_files(file1, file2, [(index.split(',')[0])])
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\patch.py", line 204, in create
from_indexed = records.index(from_records, index_columns)
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 53, in index
for r in record_seq
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 51, in
obj = {
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 38, in __iter__
for lineno, r in enumerate(self.reader, 2):
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 111, in __next__
self.fieldnames
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 98, in fieldnames
self._fieldnames = next(self.reader)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 85: illegal multibyte sequence
I get this error in PyCharm when I diff two files that I wrote as UTF-8.
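The 'gbk' codec in the traceback suggests the files were opened with the Windows locale's default encoding rather than UTF-8. Whether csvdiff accepts an encoding option is not something this page shows, so the sketch below only illustrates the general remedy of passing an explicit encoding to open(). The read_text helper and the file name are made up for the example.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file with an explicit encoding instead of the platform default."""
    with open(path, encoding=encoding, newline="") as stream:
        return stream.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.csv")
    # Write a small UTF-8 file containing non-ASCII text.
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write("id,name\n1,李雷\n")
    text = read_text(path)
```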
This tool looks like just what we need, so I thought I would kick the tires and read through the issues to understand any limitations. I noticed that people are reporting potential bugs without sample code. It is possible to configure the GitHub repo with an issue template in which you ask reporters to supply a minimal example of the files to compare and the command line. For example, you could show them this sort of output as a valid bug report:
$ head a.csv b.csv
==> a.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
==> b.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
$ csvdiff --style=summary id a.csv b.csv
files are identical
In a recent separator change, tox didn't fail, but running python2.7 setup.py test correctly found breaking tests.
What's the difference between the two, and how can we make sure that tox fails when it's supposed to?
Reference version where tox should fail: 0a31d59
Hi
Can you please add a manpage?
I am trying to package csvdiff in Fedora, and executable files under /usr/bin need to have a man page.
There is an example that may be useful at
https://bugzilla.redhat.com/attachment.cgi?id=967892
Regards
Hi,
I like your tool!
How can I diff two files and export the difference in a CSV file (without patching any file)?
Thanks for this library! However, I think there is currently no way to just output the diff as CSV? Something like csvpatch --input=diff.json --output=diff.csv. Let me know if there is a way! :)
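One workaround is to flatten the JSON patch into CSV rows outside csvdiff. The sketch below assumes that "added" and "removed" each hold a list of whole records, which the JSON on this page does not confirm. It leaves out "changed" entries because their structure is not shown here. patch_to_csv and the sample patch are hypothetical.

```python
import csv
import io


def patch_to_csv(patch, columns):
    """Flatten the added/removed records of a patch into CSV text with a 'change' column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["change"] + columns)
    for kind in ("added", "removed"):
        for record in patch.get(kind, []):
            writer.writerow([kind] + [record.get(c, "") for c in columns])
    return out.getvalue()


# Hypothetical patch, with the top-level keys of the JSON that csvdiff writes.
example_patch = {
    "_index": ["id"],
    "added": [{"id": "5", "name": "mira"}],
    "changed": [],
    "removed": [{"id": "6", "name": "fred"}],
}

csv_text = patch_to_csv(example_patch, ["id", "name"])
```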
I am not seeing an option to use multiple keys with '-k':
Usage: csvdiff [OPTIONS] INDEX_COLUMNS FROM_CSV TO_CSV
Compare two csv files to see what rows differ between them. The files are
each expected to have a header row, and for each row to be uniquely
identified by one or more indexing columns.
Options:
--style [compact|pretty|summary]
Instead of the default compact output,
pretty-print or give a summary instead
-o, --output PATH Output to a file instead of stdout
-q, --quiet Don't output anything, just use exit codes
--sep TEXT Separator to use between fields [default:
comma]
-i, --ignore-columns CSV a comma seperated list of columns to ignore
from the comparison
--significance INTEGER Ignore numeric changes less than this
number of significant figures
--help Show this message and exit.
I have been using this tool a lot for reconciliation work in accounting. When I want to share the results with non-technical people, YAML is a really nice format that can easily be converted into something readable and presentable. It would be nice to have support for this; I'll try to contribute it myself.
Something like this:
I have been trying to make the command csvdiff work with multiple ignore columns but it looks like it is not working. Can you please provide an example of syntax? I have tried multiple styles and it seems to be not working. Please advise. Thanks.
Hi
The included test cases are failing on Fedora 24 with Python 3.5. The failed build is:
http://koji.fedoraproject.org/koji/taskinfo?taskID=13128669
Traceback is:
I am trying to run this module in its most basic form.
import csvdiff
diff = csvdiff.diff_files('output.txt', 'input.txt',[])
print(diff)
I am a little bit confused. I am getting the following errors:
"/python/test.py"
Traceback (most recent call last):
File "XXXX/python/test.py", line 4, in <module>
diff = csvdiff.diff_files('output.txt', 'input.txt',[])
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 211, in create
return create_indexed(from_indexed, to_indexed, index_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 222, in create_indexed
index_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 252, in _assemble
key=_change_key)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 251, in <genexpr>
for k in changed),
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 264, in record_diff
from_ = lhs[k]
KeyError: 'CAFEE01'
Process finished with exit code 1
This is more of an enhancement request: provide the ability to output only one type of change in the JSON, for example only deleted, added, or changed records.
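Until such an option exists, the patch can be filtered after the fact. The sketch below keeps the chosen change types and empties the others, assuming the patch is a dictionary with the keys "_index", "added", "changed" and "removed" as in the JSON shown earlier on this page. The only helper and the sample patch are hypothetical.

```python
def only(patch, kinds):
    """Return a copy of the patch that keeps only the change types named in `kinds`."""
    filtered = {"_index": list(patch.get("_index", []))}
    for kind in ("added", "changed", "removed"):
        filtered[kind] = list(patch.get(kind, [])) if kind in kinds else []
    return filtered


# Hypothetical patch used for illustration.
example_patch = {
    "_index": ["id"],
    "added": [{"id": "5", "name": "mira"}],
    "changed": [],
    "removed": [{"id": "6", "name": "fred"}],
}

removed_only = only(example_patch, {"removed"})
```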