larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.

License: BSD 3-Clause "New" or "Revised" License

Makefile 4.04% Python 95.96%

csvdiff's Issues

no such options --ignore-columns

I installed csvdiff with "pip install csvdiff".

When I run the command "csvdiff --style=summary --ignore_columns=amount id a.csv b.csv", I get the error "Error: no such option: --ignore_columns".

invalid column name 'id' as key

In the API example for diff_files, it works successfully with column 'name' as the key but fails with column 'id':

Traceback (most recent call last):
File "differ.py", line 3, in <module>
patch = csvdiff.diff_files('Skill.csv', 'Skill_1.csv', ['id'])
File "/usr/local/lib/python3.6/dist-packages/csvdiff/__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "/usr/local/lib/python3.6/dist-packages/csvdiff/patch.py", line 204, in create
from_indexed = records.index(from_records, index_columns)
File "/usr/local/lib/python3.6/dist-packages/csvdiff/records.py", line 58, in index
raise InvalidKeyError('invalid column name {k} as key'.format(k=k))
csvdiff.records.InvalidKeyError: invalid column name 'id' as key

The columns 'id' and 'name' are both present in my test files.
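
One possible way to narrow this down, as a sketch only: a key column that is visibly in the file can still be rejected if its header name carries invisible characters, such as a byte-order mark or stray spaces. The helper name `suspicious_headers` is hypothetical and is not part of csvdiff; whether this is the cause here is not confirmed by the report.

```python
import csv
import io


def suspicious_headers(stream):
    """Return header names that contain a BOM or surrounding whitespace."""
    header = next(csv.reader(stream), [])
    return [name for name in header
            if name != name.strip() or "\ufeff" in name]


# A header that looks like "id,name" but starts with a byte-order mark.
sample = io.StringIO("\ufeffid,name\n1,bob\n")
print([repr(name) for name in suspicious_headers(sample)])
```

Printing the names with `repr` makes the hidden character visible, which a plain look at the file in an editor usually does not.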

test failed

I am trying to set up a .spec file to include csvdiff in Fedora, but I get this error when running the tests:

running build_ext
Traceback (most recent call last):
File "setup.py", line 54, in <module>
test_suite='tests',
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 138, in run
self.with_project_on_sys_path(self.run_tests)
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 118, in with_project_on_sys_path
func()
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 164, in run_tests
testLoader = cks
File "/usr/lib64/python2.7/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/usr/lib64/python2.7/unittest/main.py", line 149, in parseArgs
self.createTests()
File "/usr/lib64/python2.7/unittest/main.py", line 158, in createTests
self.module)
File "/usr/lib64/python2.7/unittest/loader.py", line 128, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib64/python2.7/unittest/loader.py", line 103, in loadTestsFromName
return self.loadTestsFromModule(obj)
File "/usr/lib/python2.7/site-packages/setuptools/command/test.py", line 35, in loadTestsFromModule
tests.append(self.loadTestsFromName(submodule))
File "/usr/lib64/python2.7/unittest/loader.py", line 100, in loadTestsFromName
parent, obj = obj, getattr(obj, part)
AttributeError: 'module' object has no attribute 'test_core'
error: Bad exit status from /var/tmp/rpm-tmp.LrMw5o (%check)

Create on-the-fly index column if you don't explicitly specify one

I wonder if you could point me to a starting point, how to implement this and if there are any caveats to think of.

Problem: I generate csv files from a database view and they don't have a unique identifier which could be used as index.
Idea: use the line number of the current row - 1 as the index (like adding a virtual column to the CSV).

With the current implementation this use-case will fail silently, as no changes are reported:
from csvdiff import *
diff_files("e.txt", "f.txt", [], ";")
I would like to implement this functionality and provide a pull request for this feature if you think that is a good idea.

e.txt
f.txt
I had to rename the files to .txt as GitHub doesn't support .csv
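
A minimal sketch of the idea, using only the standard library: each record gets an extra column holding its position, which could then serve as the index. The function name `with_row_index` and the column name `_row` are assumptions for illustration, not csvdiff names.

```python
import csv
import io


def with_row_index(stream, sep=",", index_name="_row"):
    """Yield each record as a dict with an extra positional index column."""
    reader = csv.DictReader(stream, delimiter=sep)
    for row_number, record in enumerate(reader):
        indexed = dict(record)
        indexed[index_name] = str(row_number)
        yield indexed


rows = list(with_row_index(io.StringIO("a;b\n1;2\n3;4\n"), sep=";"))
print(rows)
```

One caveat with a positional index: inserting or deleting a row shifts every later position, so all following rows would be reported as changed.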

Add support to tab separated csv files

Hi,

I was using this plugin and could not get it to work with tab-separated CSV files. Comma-separated files work fine, but tab-separated ones do not.

Is it viable to add this support on top of the current implementation?

PS: If so, and if it is not a priority, I can try to submit a PR for this.

Thanks in advance

Separator choice

Is it possible to change the default separator from a comma to, for example, a semicolon?
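
For reference, the csvdiff help output quoted in another issue on this page lists a `--sep TEXT` option, and another report calls `diff_files("e.txt", "f.txt", [], ";")` with the separator as the fourth argument. Independently of csvdiff, this is roughly how a configurable separator looks with the standard `csv` module; `read_records` is a hypothetical helper name.

```python
import csv
import io


def read_records(stream, sep=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(stream, delimiter=sep)]


semicolon_rows = read_records(io.StringIO("id;name\n1;bob\n"), sep=";")
tab_rows = read_records(io.StringIO("id\tname\n1\tbob\n"), sep="\t")
print(semicolon_rows == tab_rows)
```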

Syntax error line 53 for python 2.6.6

Just an FYI: it installs through pip, so I expected it to work. If you're not supporting old versions, that's fair.

Traceback (most recent call last):
File "/usr/bin/csvdiff", line 7, in <module>
from csvdiff import csvdiff_cmd
File "/usr/lib/python2.6/site-packages/csvdiff/__init__.py", line 14, in <module>
from . import records, patch, error
File "/usr/lib/python2.6/site-packages/csvdiff/records.py", line 53
for r in record_seq
^
SyntaxError: invalid syntax

--style=summary gives an error output

When running csvdiff with style pretty and compact it works as expected but with --style=summary I get the following error:

Traceback (most recent call last):
File "/bin/csvdiff", line 11, in <module>
sys.exit(csvdiff_cmd())
File "/usr/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/usr/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/csvdiff/__init__.py", line 136, in csvdiff_cmd
_diff_and_summarize(from_csv, to_csv, index_columns, ostream)
File "/usr/lib/python2.7/site-packages/csvdiff/__init__.py", line 166, in _diff_and_summarize
_summarize_diff(diff, len(from_records), stream=stream)
AttributeError: DictReader instance has no attribute '__len__'
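
The last line of the traceback says that `len()` was called on a `csv.DictReader`, which is an iterator and has no length. A small sketch of the usual workaround, reading the rows into a list first; `load_records` is a hypothetical name and this is not necessarily how csvdiff fixed it.

```python
import csv
import io


def load_records(stream):
    """Read all rows into a list so they can be counted and re-iterated."""
    return list(csv.DictReader(stream))


records = load_records(io.StringIO("id,name\n1,bob\n2,eva\n"))
print(len(records))  # 2
```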

I am running Fedora 23 with Python 2.7. When I try to run it with Python 3.4, I get the following error (for any style):

Traceback (most recent call last):
File "/bin/csvdiff", line 7, in <module>
from csvdiff import csvdiff_cmd

If more information is required please let me know.

KeyError when comparing csv files

I have the following 2 csv files,

view-1.csv

As of Date,Business Title,Email,Employee Type,Employee_ID

view-2.csv

'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID

Running

$ csvdiff Employee_ID view-1.csv view-2.csv

Throws the following error,

Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.7.5/bin/csvdiff", line 8, in <module>
    sys.exit(csvdiff_cmd())
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 160, in csvdiff_cmd
    significance=significance)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 172, in _diff_files_to_stream
    diff = diff_files(from_csv, to_csv, index_columns, sep=sep, ignored_columns=ignored_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 211, in create
    return create_indexed(from_indexed, to_indexed, index_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 222, in create_indexed
    index_columns)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 252, in _assemble
    key=_change_key)
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 251, in <genexpr>
    for k in changed),
  File "/Users/.pyenv/versions/3.7.5/lib/python3.7/site-packages/csvdiff/patch.py", line 264, in record_diff
    from_ = lhs[k]
KeyError: "'As of Date'"

I was expecting the output to be something like columns removed/added: 1, 'As of Date'.
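
A sketch of the kind of check that would produce such a report, comparing only the two header rows before any row-level diff is attempted. `header_changes` is a hypothetical helper, not an existing csvdiff function.

```python
import csv
import io


def header_changes(from_stream, to_stream):
    """Return (removed, added) column names between two CSV headers."""
    from_cols = next(csv.reader(from_stream), [])
    to_cols = next(csv.reader(to_stream), [])
    removed = [c for c in from_cols if c not in to_cols]
    added = [c for c in to_cols if c not in from_cols]
    return removed, added


view_1 = io.StringIO(
    "As of Date,Business Title,Email,Employee Type,Employee_ID\n")
view_2 = io.StringIO(
    "'As of Date',As of Date,Business Title,Email,Employee Type,Employee_ID\n")
removed, added = header_changes(view_1, view_2)
print("columns removed:", removed)
print("columns added:", added)
```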

(white)space in column name

How to deal with a white space in the column name?

For example grocerylist.csv:

Product Part Number,Product Name,Amount In Stock
1,Banana,7
2,Apple,12
3,Raspberry,7
4,Mango,19
5,Potato,10

csvdiff now stops after "Product" because it cannot find a column named "Part".
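
If the column name was passed on the command line without quotes, the shell itself splits it into three separate arguments before csvdiff sees it. The sketch below uses `shlex.split`, which follows POSIX shell quoting rules, to show the difference; `other.csv` is a made-up second file name, and whether csvdiff then accepts the quoted name is not confirmed by this report.

```python
import shlex

# Unquoted: the column name is split on whitespace into three arguments.
unquoted = shlex.split("csvdiff Product Part Number grocerylist.csv other.csv")

# Quoted: the column name arrives as a single argument.
quoted = shlex.split('csvdiff "Product Part Number" grocerylist.csv other.csv')

print(unquoted)
print(quoted)
```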

Option(s) to ignore commented lines/rows or skip n lines from the beginning

Thank you for contributing such a nice utility for the csv file comparison.

It would be great to have an option to ignore commented lines/rows or skip n lines from the beginning. In the following example if the user specifies '#' as a comment character the program should skip those lines from the comparison.

e.g. The commented header lines describing the file content.

#Author name
#Description
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
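
Until such an option exists, the input can be pre-filtered before parsing. A minimal sketch with the standard library; the function name and the `comment` and `skip` parameters are assumptions that mirror the two options requested above.

```python
import csv
import io


def read_skipping_comments(stream, comment="#", skip=0):
    """Parse CSV after dropping `skip` leading lines and comment lines."""
    lines = list(stream)[skip:]
    data = (line for line in lines if not line.startswith(comment))
    return [dict(row) for row in csv.DictReader(data)]


sample = io.StringIO(
    "#Author name\n#Description\nid,name,amount\n1,bob,20\n2,eva,63\n")
print(read_skipping_comments(sample))
```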

Create multiple indices

Hi -- I was just wondering if there is a way to specify multiple indices for comparing.

For example:
csvdiff --style=pretty --output=diff.json -k element_name,parent_category a.csv b.csv

where -k specifies keys. Thanks in advance.
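
For what it is worth, the usage text quoted in another issue on this page shows INDEX_COLUMNS as the first positional argument, and another report passes it as a comma-separated list (element_type,element_name,parent_category). Conceptually, a multi-column key just means indexing each row by a tuple of values. A sketch, with `index_records` as a hypothetical helper:

```python
def index_records(records, index_columns):
    """Map a tuple of key-column values to each record."""
    return {tuple(r[c] for c in index_columns): r for r in records}


records = [
    {"element_name": "a", "parent_category": "x", "value": "1"},
    {"element_name": "a", "parent_category": "y", "value": "2"},
]
indexed = index_records(records, ["element_name", "parent_category"])
print(sorted(indexed))
```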

Documentation for non-command line usage

Currently the documentation only references command line usage. Would it be possible to get some documentation/examples on how to use csvdiff within Python code?

Approximately equal numeric fields

It'd be really nice if there was some way to say "Consider numeric values equivalent if they're equal to a certain number of decimal places."

Here's a simple function for doing so from Stack Overflow:

def nearly_equal(a, b, sig_fig=5):
    return (a == b or 
            int(a * 10**sig_fig) == int(b * 10**sig_fig))
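
An alternative sketch using `math.isclose` with an absolute tolerance. Unlike the truncating version above, it does not treat two values as different merely because they fall on opposite sides of a rounding boundary (for example 0.999999 and 1.000001 at 5 places). The name `within_tolerance` and the `places` parameter are illustrative assumptions.

```python
import math


def within_tolerance(a, b, places=5):
    """True when a and b differ by no more than 10**-places."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=10.0 ** -places)


print(within_tolerance(0.999999, 1.000001))  # True
print(within_tolerance(1.0, 1.1))            # False
```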

does not detect duplicated lines

Summary

Duplicated rows appear to be ignored and the user is told that files are identical.

Minimal Reproducible Example

 2019-05-22 06:03:45 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → head a.csv duplicate.csv 
==> a.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10

==> duplicate.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
4,jeff,19
6,fred,10

 2019-05-22 06:04:10 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → csvdiff --style=summary id a.csv duplicate.csv 
files are identical

 2019-05-22 06:04:25 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| →
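
A sketch of a pre-check that would catch this case: count how often each index key occurs and report the ones that are not unique. `duplicate_keys` is a hypothetical helper, shown here on the duplicate.csv content above.

```python
import csv
import io
from collections import Counter


def duplicate_keys(stream, index_columns):
    """Return {key: count} for index keys that occur more than once."""
    reader = csv.DictReader(stream)
    counts = Counter(tuple(row[c] for c in index_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


duplicate_csv = io.StringIO(
    "id,name,amount\n1,bob,20\n2,eva,63\n3,sarah,7\n"
    "4,jeff,19\n4,jeff,19\n6,fred,10\n")
print(duplicate_keys(duplicate_csv, ["id"]))
```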

Tests fail (5 failures, 3 errors) on Windows 10 with Python 2.7.14

running test
running egg_info
writing requirements to csvdiff.egg-info\requires.txt
writing csvdiff.egg-info\PKG-INFO
writing top-level names to csvdiff.egg-info\top_level.txt
writing dependency_links to csvdiff.egg-info\dependency_links.txt
writing entry points to csvdiff.egg-info\entry_points.txt
reading manifest file 'csvdiff.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'csvdiff.egg-info\SOURCES.txt'
running build_ext
test_csvdiff_fails_without_enough_arguments (tests.test_csvdiff.TestCsvdiff) ... ok
test_csvdiff_fails_without_valid_key (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_command_valid_usage_with_difference (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_diff_command_valid_usage_with_separator (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_diff_records_multikey (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_records_nonstr_values (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_records_str_values (tests.test_csvdiff.TestCsvdiff) ... ok
test_diff_with_index_as_ignore_field (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_diff_with_valid_ignore (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_patch_add (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_change (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_cmd_fails_when_json_doesnt_match_schema (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_cmd_fails_when_json_is_invalid (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_cmd_valid_args (tests.test_csvdiff.TestCsvdiff) ... FAIL
test_patch_remove (tests.test_csvdiff.TestCsvdiff) ... ok
test_patch_schema_is_valid (tests.test_csvdiff.TestCsvdiff) ... ok
test_summarize (tests.test_csvdiff.TestCsvdiff) ... ok
test_summarize_cmd (tests.test_csvdiff.TestCsvdiff) ... ERROR
test_summarize_identical (tests.test_csvdiff.TestCsvdiff) ... ok

======================================================================
ERROR: test_diff_with_index_as_ignore_field (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 321, in test_diff_with_index_as_ignore_field
    result = self.csvdiff_summary_cmd('id', self.a_file, self.b_file, ignore_columns='id')
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 67, in csvdiff_summary_cmd
    with open(t.name, 'r') as istream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmp9f2e3k'

======================================================================
ERROR: test_diff_with_valid_ignore (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 312, in test_diff_with_valid_ignore
    with tmp_csv_files(lhs, rhs) as (lhs_file, rhs_file):
  File "C:\Python27\lib\contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 413, in tmp_csv_files
    save_as_csv(arg, t.name)
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 420, in save_as_csv
    with open(filename, 'w') as ostream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmpu841l8'

======================================================================
ERROR: test_summarize_cmd (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 118, in test_summarize_cmd
    with tmp_csv_files(lhs, rhs) as (lhs_file, rhs_file):
  File "C:\Python27\lib\contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 413, in tmp_csv_files
    save_as_csv(arg, t.name)
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 420, in save_as_csv
    with open(filename, 'w') as ostream:
IOError: [Errno 13] Permission denied: 'c:\\users\\fsc\\appdata\\local\\temp\\tmprp8oqu'

======================================================================
FAIL: test_diff_command_valid_usage_with_difference (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 157, in test_diff_command_valid_usage_with_difference
    self.assertEqual(result.exit_code, 1)
AssertionError: -1 != 1

======================================================================
FAIL: test_diff_command_valid_usage_with_separator (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 179, in test_diff_command_valid_usage_with_separator
    self.assertEqual(result.exit_code, 1)
AssertionError: -1 != 1

======================================================================
FAIL: test_patch_cmd_fails_when_json_doesnt_match_schema (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 343, in test_patch_cmd_fails_when_json_doesnt_match_schema
    self.assertEqual(result.exit_code, 2)
AssertionError: -1 != 2

======================================================================
FAIL: test_patch_cmd_fails_when_json_is_invalid (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 338, in test_patch_cmd_fails_when_json_is_invalid
    self.assertEqual(result.exit_code, 2)
AssertionError: -1 != 2

======================================================================
FAIL: test_patch_cmd_valid_args (tests.test_csvdiff.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\workspace\csvdiff\tests\test_csvdiff.py", line 329, in test_patch_cmd_valid_args
    self.assertEqual(result.exit_code, 0)
AssertionError: -1 != 0

----------------------------------------------------------------------
Ran 19 tests in 0.030s

FAILED (failures=5, errors=3)

Windows 10, Version 1709
Python 2.7.14
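
The "Permission denied" errors all come from opening a temporary file by name. On Windows, a file created with `tempfile.NamedTemporaryFile` generally cannot be opened a second time while the first handle is still open, which would be consistent with these failures, although the report does not confirm it. A sketch of a portable pattern, with `write_temp_csv` as a hypothetical helper: create the file with `delete=False`, close it, reopen it by name, and remove it explicitly.

```python
import os
import tempfile


def write_temp_csv(text):
    """Write text to a temp file, close it, and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as t:
        t.write(text)
    return t.name


path = write_temp_csv("id,name\n1,bob\n")
try:
    # The file is closed by now, so reopening by name works on any platform.
    with open(path) as istream:
        content = istream.read()
finally:
    os.remove(path)
```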

Calling csvdiff --style=summary from python script

Hey Lars, great tool! Very elegant!

Is there some way we can get the --style=summary output from a Python project?
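
One possible approach, sketched under an assumption: that the diff obtained in Python is a mapping with the keys seen in the JSON patch quoted in another issue on this page ("_index", "added", "changed", "removed"). The `summarize` function below is hypothetical, not csvdiff's own summary code, and its output format is invented for illustration.

```python
def summarize(patch):
    """Count rows per change type in a csvdiff-style patch mapping."""
    return {kind: len(patch.get(kind, []))
            for kind in ("added", "changed", "removed")}


example_patch = {"_index": ["id"], "added": [{}], "changed": [], "removed": [{}, {}]}
print(summarize(example_patch))
```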

Not working with Large files or Unicode Chars

Large File Error:

File "i:\anaconda3\lib\site-packages\csvdiff\records.py", line 53, in
for r in record_seq
MemoryError

Unicode Decode Error:

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5148: character maps to <undefined>

ERROR: CSV parse error on line 90

I am getting an error while running a CSV diff, and the output does not show the actual issue.
Can you help me narrow it down? I only get this for one set of files.

Error: Invalid value for "FROM_CSV":

Lars, I have two CSV files and have initialised two variables to hold their paths:
csv_file1 = '/home/notex/extractors/ccar/output/CCARFRY1420171231.csv'
csv_file2 = '/home/notex/extractors/ccar/output/CCAR-vFRY14-20180331.csv'

I also ran:
import os.path
os.path.exists(csv_file1 )
which returns True.

But when I run:
!csvdiff --style=pretty --output=diff.json element_type,element_name,parent_category csv_file1 csv_file2
I get this:
Error: Invalid value for "FROM_CSV": Path "csv_file1" does not exist.

Is this a known bug? Thanks in advance.
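
This looks less like a csvdiff bug than a case of the literal text "csv_file1" reaching the command line instead of the variable's value. In IPython and Jupyter, a `!` command only substitutes Python variables written as `$csv_file1` or `{csv_file1}`. From plain Python, a sketch of building the argument list explicitly follows; `build_command` is a hypothetical helper, and the resulting list could be passed to `subprocess.run`.

```python
def build_command(index_columns, from_csv, to_csv):
    """Build an argument list in which the paths are already substituted."""
    return [
        "csvdiff",
        "--style=pretty",
        "--output=diff.json",
        ",".join(index_columns),
        from_csv,
        to_csv,
    ]


csv_file1 = "/home/notex/extractors/ccar/output/CCARFRY1420171231.csv"
csv_file2 = "/home/notex/extractors/ccar/output/CCAR-vFRY14-20180331.csv"
command = build_command(
    ["element_type", "element_name", "parent_category"], csv_file1, csv_file2
)
print(command)
```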

How to Use Package in Project?

I found this package very good for people who deal with files, and I love it. The difficulty I am facing is how to use this package in my project instead of running it from the terminal.
csvdiff --style=summary id a.csv b.csv -- I don't want to use it like this

It would be very helpful if you could help me run the package in the way shown below:
import csvdiff
csvdiff.csvdiff_cmd(id,a.csv,b.csv,--style="summary")

I want to call this package from another .py file.

add a parameter for ignored_columns to the method diff_records

Add matching csvpatch utility

Our diffs have enough information to be reusable. The same diff format could be used to patch a dataset, and this might be an elegant way of recording a set of custom changes against an official version of a dataset.

Ideally, you wouldn't need to specify the set of columns used in the patch. Perhaps the diff format could include the column names in a key property.

Test failure with py3.6

+ python tests/test_csvdiff.py
tests/test_csvdiff.py:146: DeprecationWarning: Please use assertEqual instead.
  self.assertEquals(result.exit_code, 2)
../tmp/build/80754af9/csvdiff_1527844944335/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/jsonschema/exceptions.py:30: ResourceWarning: unclosed file <_io.FileIO name=4 mode='rb+' closefd=True>
  super(_Error, self).__init__(
/tmp/build/80754af9/csvdiff_1527844944335/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/jsonschema/exceptions.py:30: ResourceWarning: unclosed file <_io.FileIO name=6 mode='rb+' closefd=True>
  super(_Error, self).__init__(
...........F.....
======================================================================
FAIL: test_patch_cmd_valid_args (__main__.TestCsvdiff)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_csvdiff.py", line 334, in test_patch_cmd_valid_args
    self.assertRecordsEqual(result.records, expected)
  File "tests/test_csvdiff.py", line 398, in assertRecordsEqual
    self.assertEqual(records.sort(lhs), records.sort(rhs))
AssertionError: Lists differ: [Orde[19 chars]), ('amount', '13'), ('name', 'fred')]), Order[244 chars]')])] != [Orde[19 chars]), ('name', 'fred'), ('amount', '13')]), Order[244 chars]')])]

First differing element 0:
OrderedDict([('id', '6'), ('amount', '13'), ('name', 'fred')])
OrderedDict([('id', '6'), ('name', 'fred'), ('amount', '13')])
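
The two records in the failure hold the same fields in a different order. `OrderedDict` equality is order-sensitive, while plain `dict` equality is not, which a short check makes visible. Whether converting to plain dicts is the right fix for the test suite is a judgment call for the maintainers.

```python
from collections import OrderedDict

a = OrderedDict([("id", "6"), ("amount", "13"), ("name", "fred")])
b = OrderedDict([("id", "6"), ("name", "fred"), ("amount", "13")])

print(a == b)              # False: same items, different order
print(dict(a) == dict(b))  # True: plain dicts ignore order
```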

Comparing CSV files with a BOM while using ignore_columns on the first column raises an error at sequence[key].pop(i)

Error message:
Traceback (most recent call last):
File "C:\Python27\Scripts\csvdiff-script.py", line 9, in <module>
load_entry_point('csvdiff==0.3.1', 'console_scripts', 'csvdiff')()
File "C:\Python27\lib\site-packages\click\core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "C:\Python27\lib\site-packages\click\core.py", line 696, in main
rv = self.invoke(ctx)
File "C:\Python27\lib\site-packages\click\core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Python27\lib\site-packages\click\core.py", line 534, in invoke
return callback(*args, **kwargs)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\__init__.py", line 151, in csvdiff_cmd
sep=sep, ignored_columns=ignore_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\__init__.py", line 181, in _diff_and_summarize
diff = patch.create(from_records, to_records, index_columns, ignored_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\patch.py", line 208, in create
from_indexed = records.filter_ignored(from_indexed, ignore_columns)
File "C:\Python27\lib\site-packages\csvdiff-0.3.1-py2.7.egg\csvdiff\records.py", line 52, in filter_ignored
sequence[key].pop(i)
KeyError: u'aa'

The error comes from sequence[key].pop(i). Comparing CSV files with a BOM needs some extra handling, such as the following:
sequence[key].pop(i.encode('utf-8-sig'))
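
An alternative sketch, for Python 3 rather than the Python 2.7 shown in the traceback: decoding the input with the `utf-8-sig` codec drops a leading BOM at read time, so the first header name comes out clean and no per-key encoding is needed later. `read_header` is a hypothetical helper.

```python
import csv
import io


def read_header(raw_bytes):
    """Decode CSV bytes, dropping a leading UTF-8 BOM if present."""
    text = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding="utf-8-sig", newline="")
    return next(csv.reader(text), [])


with_bom = b"\xef\xbb\xbfaa,bb\n1,2\n"
without_bom = b"aa,bb\n1,2\n"
print(read_header(with_bom) == read_header(without_bom))
```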

enhancement to use column numbers rather than column names

If files don't have a column header, it would help to be able to specify the key and ignored columns by column number, e.g. 0,3,6. This can be simulated by a wrapper script that adds a top line 0,1,2,3,... to the two files. It would be helpful if csvdiff supported this directly, to avoid having to insert header lines into both files.
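
A sketch of the wrapper idea done in memory instead of by editing the files: rows are read without a header and each value is keyed by its column position as a string. `read_headerless` is a hypothetical helper, not a csvdiff function.

```python
import csv
import io


def read_headerless(stream, sep=","):
    """Parse header-less CSV, naming the columns '0', '1', '2', and so on."""
    records = []
    for row in csv.reader(stream, delimiter=sep):
        records.append({str(i): value for i, value in enumerate(row)})
    return records


rows = read_headerless(io.StringIO("1,bob,20\n2,eva,63\n"))
print(rows)
```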

Export to xlsx using XlsxWriter

It would be nice to support exporting to XLS, CSV, or a plain DF like this. I'll try to contribute that myself:

AttributeError: module 'click' has no attribute 'ParamType'

Details here. I am sure this is just some setup problem with my environment but just don't know where to start looking

accounting for columns

It doesn't look like you can execute diffs when you've added or removed a column from a csv unless I am missing something. Perhaps this would be useful to implement? (Happy to help!)

Error when patch and source files are almost empty

Hello,

First, thank you for writing this tool. Very useful.

I got this error:

Traceback (most recent call last):
  File "/home/user/.local/bin/csvpatch", line 11, in <module>
    sys.exit(csvpatch_cmd())
  File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.4/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.4/site-packages/csvdiff/__init__.py", line 218, in csvpatch_cmd
    patch_file(patch_stream, fromcsv_stream, tocsv_stream, strict=strict)
  File "/home/user/.local/lib/python3.4/site-packages/csvdiff/__init__.py", line 69, in patch_file
    fieldnames = from_records.fieldnames
AttributeError: 'generator' object has no attribute 'fieldnames'

when applying an empty patch file

{
  "_index": [
    "W",
    "B",
    "B2"
  ],
  "added": [],
  "changed": [],
  "removed": []
}

to an almost empty document:

"UE","W","B","B2","B3","UB",

diff csv files encode with utf8

Traceback (most recent call last):
File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 64, in <module>
compare_common(db_list1, db_list2)
File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 54, in compare_common
diff = csvdiff.diff_files(file1, file2, [(index.split(',')[0])])
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\patch.py", line 204, in create
from_indexed = records.index(from_records, index_columns)
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 53, in index
for r in record_seq
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 51, in
obj = {
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 38, in __iter__
for lineno, r in enumerate(self.reader, 2):
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 111, in __next__
self.fieldnames
File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 98, in fieldnames
self._fieldnames = next(self.reader)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 85: illegal multibyte sequence

When I diff two files that I wrote as UTF-8, I get this error when running from PyCharm.
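
The 'gbk' in the error suggests the files were opened with the Windows locale's default encoding rather than UTF-8. A sketch of reading a CSV file with an explicit encoding follows; `read_utf8_records` is a hypothetical helper, and nothing in this report shows whether `diff_files` itself accepts an encoding argument. Python 3.7 and later also provide a UTF-8 mode (`-X utf8` or the `PYTHONUTF8=1` environment variable) that changes the default.

```python
import csv
import os
import tempfile


def read_utf8_records(path):
    """Read a CSV file as UTF-8 regardless of the platform default."""
    with open(path, encoding="utf-8", newline="") as istream:
        return [dict(row) for row in csv.DictReader(istream)]


# Demonstration with a throwaway file containing non-ASCII text.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as ostream:
    ostream.write("id,name\n1,名前\n")
demo_records = read_utf8_records(demo_path)
os.remove(demo_path)
print(demo_records)
```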

provide a template to issues so that people supply examples

This tool looks like just what we need, so I thought I would kick the tires and read through the issues to understand any limitations. I noticed that people are reporting potential bugs without sample code. It is possible to configure the GitHub repo with an issue template, in which you can ask reporters to supply a minimal example of the files to compare and the command line. For example, you can show them this sort of output as a valid bug report:

± |master ?:27 ✗| → head a.csv b.csv 
==> a.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10

==> b.csv <==
id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10

 2019-05-22 05:49:29 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → csvdiff --style=summary id a.csv b.csv 
files are identical

Test fails in console but not in tox

In a recent separator change, tox didn't fail, but running python2.7 setup.py test correctly found breaking tests.

What's the difference between the two, and how can we make sure that tox fails when it's supposed to?

Reference version where tox should fail: 0a31d59

Can you add a manpage?

Can you please add a manpage?

I am trying to package csvdiff for Fedora, and executable files under /usr/bin need to have a man page.

There is an example that may be useful at

https://bugzilla.redhat.com/attachment.cgi?id=967892

Regards

Proper way to get the diff in a CSV file

Hi,

I like your tool!

How can I diff two files and export the difference in a CSV file (without patching any file)?

Create diff.csv from diff.json

Thanks for this library! However I think there is currently no way to just output the diff as csv? Something like csvpatch --input=diff.json --output=diff.csv. Let me know if there is a way! :)
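
A rough sketch of such a conversion, under a loudly stated assumption: that each entry under "added" and "removed" in the diff is a flat mapping of column name to value. The layout of "changed" entries is not shown anywhere on this page, so they are left out. `patch_to_csv` and the extra "change" column are hypothetical.

```python
import csv
import io


def patch_to_csv(patch, fieldnames, ostream):
    """Write added and removed rows to CSV with a leading 'change' column."""
    writer = csv.DictWriter(ostream, fieldnames=["change"] + list(fieldnames))
    writer.writeheader()
    for kind in ("added", "removed"):
        for row in patch.get(kind, []):
            writer.writerow(dict(row, change=kind))


example = {"added": [{"id": "5", "name": "mira"}], "changed": [], "removed": []}
out = io.StringIO()
patch_to_csv(example, ["id", "name"], out)
print(out.getvalue())
```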

Not able to use multiple keys

I am not getting an option to use multiple keys ('-k'):

Usage: csvdiff [OPTIONS] INDEX_COLUMNS FROM_CSV TO_CSV

Compare two csv files to see what rows differ between them. The files are
each expected to have a header row, and for each row to be uniquely
identified by one or more indexing columns.

Options:
  --style [compact|pretty|summary]
                                  Instead of the default compact output,
                                  pretty-print or give a summary instead
  -o, --output PATH               Output to a file instead of stdout
  -q, --quiet                     Don't output anything, just use exit codes
  --sep TEXT                      Separator to use between fields  [default:
                                  comma]
  -i, --ignore-columns CSV        a comma seperated list of columns to ignore
                                  from the comparison
  --significance INTEGER          Ignore numeric changes less than this
                                  number of significant figures
  --help                          Show this message and exit.

Export to YML

I have been using this tool a lot for reconciliation work in accounting. When I want to share results with non-technical people, YML is a really nice format that can easily be converted into something readable and presentable. It would be nice to have support for this; I'll try to contribute it myself.

Something like this:

Can you provide an example to use multiple ignore columns?

I have been trying to make the csvdiff command work with multiple ignored columns, but nothing I have tried works. Can you please provide an example of the syntax? Thanks.

test fails

The included test cases are failing in Fedora 24 with python3.5. The failed build is:

http://koji.fedoraproject.org/koji/taskinfo?taskID=13128669

Traceback is:

/usr/bin/python3 setup.py test
running test
running egg_info
writing dependency_links to csvdiff.egg-info/dependency_links.txt
writing csvdiff.egg-info/PKG-INFO
writing requirements to csvdiff.egg-info/requires.txt
writing entry points to csvdiff.egg-info/entry_points.txt
writing top-level names to csvdiff.egg-info/top_level.txt
reading manifest file 'csvdiff.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'csvdiff.egg-info/SOURCES.txt'
running build_ext
tests (unittest.loader._FailedTest) ... ERROR

ERROR: tests (unittest.loader._FailedTest)

ImportError: Failed to import test module: tests
Traceback (most recent call last):
File "/usr/lib64/python3.5/unittest/loader.py", line 153, in loadTestsFromName
module = __import__(module_name)
ImportError: No module named 'tests'
Ran 1 test in 0.000s
FAILED (errors=1)
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.oCPkDm (%check)
Bad exit status from /var/tmp/rpm-tmp.oCPkDm (%check)
Child return code was: 1
EXCEPTION: Command failed. See logs for output.
bash --login -c /usr/bin/rpmbuild -bb --target noarch --nodeps /builddir/build/SPECS/csvdiff.spec
Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/mockbuild/trace_decorator.py", line 84, in trace
result = func(*args, **kw)
File "/usr/lib/python3.4/site-packages/mockbuild/util.py", line 526, in do
raise exception.Error("Command failed. See logs for output.\n # %s" % (command,), child.returncode)
mockbuild.exception.Error: Command failed. See logs for output.
bash --login -c /usr/bin/rpmbuild -bb --target noarch --nodeps /builddir/build/SPECS/csvdiff.spec

diff = csvdiff.diff_files('output.txt', 'input.txt',[])

I am trying to run this module in its most basic form.

import csvdiff


diff = csvdiff.diff_files('output.txt', 'input.txt',[])


print(diff)

I am a little bit confused. I am getting the following errors:

"/python/test.py"
Traceback (most recent call last):
File "XXXX/python/test.py", line 4, in <module>
diff = csvdiff.diff_files('output.txt', 'input.txt',[])
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\__init__.py", line 44, in diff_files
ignore_columns=ignored_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 211, in create
return create_indexed(from_indexed, to_indexed, index_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 222, in create_indexed
index_columns)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 252, in _assemble
key=_change_key)
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 251, in <genexpr>
for k in changed),
File "C:\Users\DDDDD\AppData\Roaming\Python\Python37\site-packages\csvdiff\patch.py", line 264, in record_diff
from_ = lhs[k]
KeyError: 'CAFEE01'

Process finished with exit code 1

Json Report

This is more of an enhancement request: provide the ability to output only one type of change in the JSON, for example only removed, added, or changed.
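
In the meantime this can be done after the fact on the JSON output. A sketch, assuming the patch has the top-level keys seen in the JSON quoted in another issue on this page ("_index", "added", "changed", "removed"); the function name `only` is hypothetical.

```python
import json


def only(patch, kind):
    """Return a copy of the patch keeping just one change type."""
    kept = {"_index": patch.get("_index", [])}
    kept[kind] = patch.get(kind, [])
    return kept


full = {"_index": ["id"], "added": [{"id": "5"}], "changed": [], "removed": []}
print(json.dumps(only(full, "added"), indent=2, sort_keys=True))
```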
