
python-libarchive-c's Introduction

A Python interface to libarchive. It uses the standard ctypes module to dynamically load and access the C library.

Installation

pip install libarchive-c

Compatibility

python

python-libarchive-c is currently tested with python 3.8, 3.9, 3.10 and 3.11.

If you find an incompatibility with older versions you can send us a small patch, but we won't accept big changes.

libarchive

python-libarchive-c may not work properly with obsolete versions of libarchive, such as the one included in macOS. In that case you can install a recent version of libarchive (e.g. with brew install libarchive on macOS) and use the LIBARCHIVE environment variable to point python-libarchive-c to it:

export LIBARCHIVE=/usr/local/Cellar/libarchive/3.3.3/lib/libarchive.13.dylib

Usage

Import:

import libarchive

Extracting archives

To extract an archive, use the extract_file function:

os.chdir('/path/to/target/directory')
libarchive.extract_file('test.zip')

Alternatively, the extract_memory function can be used to extract from a buffer, and extract_fd from a file descriptor.

The extract_* functions all have an integer flags argument which is passed directly to the C function archive_write_disk_set_options(). You can import the EXTRACT_* constants from the libarchive.extract module and see the official description of each flag in the archive_write_disk(3) man page.

By default, when the flags argument is None, the SECURE_NODOTDOT, SECURE_NOABSOLUTEPATHS and SECURE_SYMLINKS flags are passed to libarchive, unless the current directory is the root (/).
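As a sketch of how the flags can be combined and passed (the archive and target directory here are created on the fly so the example is self-contained, and the particular flag choice is only illustrative):

```python
import os
import tempfile
import zipfile

import libarchive
from libarchive.extract import (
    EXTRACT_TIME, EXTRACT_SECURE_NODOTDOT, EXTRACT_SECURE_SYMLINKS,
)

workdir = tempfile.mkdtemp()

# Create a small zip archive with the stdlib so the example is runnable.
archive_path = os.path.join(workdir, 'test.zip')
with zipfile.ZipFile(archive_path, 'w') as z:
    z.writestr('hello.txt', 'hello world')

# Extract it, restoring timestamps in addition to two of the default
# security flags.
target = os.path.join(workdir, 'out')
os.makedirs(target)
os.chdir(target)
flags = EXTRACT_TIME | EXTRACT_SECURE_NODOTDOT | EXTRACT_SECURE_SYMLINKS
libarchive.extract_file(archive_path, flags)
```

Note that passing an explicit flags value replaces the secure defaults mentioned above, so include the SECURE_* flags you need.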

Reading archives

To read an archive, use the file_reader function:

with libarchive.file_reader('test.7z') as archive:
    for entry in archive:
        for block in entry.get_blocks():
            ...

Alternatively, the memory_reader function can be used to read from a buffer, fd_reader from a file descriptor, stream_reader from a stream object (which must support the standard readinto method), and custom_reader from anywhere using callbacks.
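For instance, a round trip through a buffer might look like this (the file name, contents and the 64 KiB buffer size are arbitrary):

```python
import libarchive

# Write a one-file ustar archive into a pre-allocated buffer, then read
# it back with memory_reader. The buffer must be large enough to hold
# the whole archive.
buf = bytes(65536)
data = b'hello world'
with libarchive.memory_writer(buf, 'ustar') as archive:
    archive.add_file_from_memory('hello.txt', len(data), data)

with libarchive.memory_reader(buf) as archive:
    for entry in archive:
        name = entry.pathname
        content = b''.join(entry.get_blocks())
```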

To learn about the attributes of the entry object, see the libarchive/entry.py source code or run help(libarchive.entry.ArchiveEntry) in a Python shell.

Displaying progress

If your program processes large archives, you can keep track of its progress with the bytes_read attribute. Here's an example of a progress bar using tqdm:

with tqdm(total=os.stat(archive_path).st_size, unit='bytes') as pbar, \
     libarchive.file_reader(archive_path) as archive:
    for entry in archive:
        ...
        pbar.update(archive.bytes_read - pbar.n)

Creating archives

To create an archive, use the file_writer function:

from libarchive.entry import FileType

with libarchive.file_writer('test.tar.gz', 'ustar', 'gzip') as archive:
    # Add the `libarchive/` directory and everything in it (recursively),
    # then the `README.rst` file.
    archive.add_files('libarchive/', 'README.rst')
    # Add a regular file defined from scratch.
    data = b'foobar'
    archive.add_file_from_memory('../escape-test', len(data), data)
    # Add a directory defined from scratch.
    early_epoch = (42, 42)  # 1970-01-01 00:00:42.000000042
    archive.add_file_from_memory(
        'metadata-test', 0, b'',
        filetype=FileType.DIRECTORY, permission=0o755, uid=4242, gid=4242,
        atime=early_epoch, mtime=early_epoch, ctime=early_epoch, birthtime=early_epoch,
    )

Alternatively, the memory_writer function can be used to write to a memory buffer, fd_writer to a file descriptor, and custom_writer to a callback function.

For each of those functions, the mandatory second argument is the archive format, and the optional third argument is the compression format (called “filter” in libarchive). The acceptable values are listed in libarchive.ffi.WRITE_FORMATS and libarchive.ffi.WRITE_FILTERS.
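A quick way to see what your installation supports (the exact contents of these sets depend on the libarchive version the library was loaded against):

```python
from libarchive import ffi

# Print the accepted archive format names and compression filter names.
print(sorted(ffi.WRITE_FORMATS))
print(sorted(ffi.WRITE_FILTERS))
```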

File metadata codecs

By default, UTF-8 is used to read and write file attributes from and to archives. A different codec can be specified through the header_codec arguments of the *_reader and *_writer functions. Example:

with libarchive.file_writer('test.tar', 'ustar', header_codec='cp037') as archive:
    ...
with libarchive.file_reader('test.tar', header_codec='cp037') as archive:
    ...

In addition to file paths (pathname and linkpath), the specified codec is used to encode and decode user and group names (uname and gname).

License

CC0 Public Domain Dedication


python-libarchive-c's Issues

Support reading CRC32 from header

Would it be possible to support reading the CRC32 hash for the contained files from headers that contain them, such as those for 7-zip? Currently the entries produced by file_reader() do not seem to do so.

Thank you.

Failing to import on Windows (using a mingw build of libarchive)

Here is the trace:

>>> import libarchive.ffi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\tst\lib\site-packages\libarchive\__init__.py", line 6, in <module>
    from .entry import ArchiveEntry
  File "c:\tst\lib\site-packages\libarchive\entry.py", line 11, in <module>
    from . import ffi
  File "c:\tst\lib\site-packages\libarchive\ffi.py", line 21, in <module>
    page_size = pythonapi.getpagesize()
  File "c:\Python27\Lib\ctypes\__init__.py", line 378, in __getattr__
    func = self.__getitem__(name)
  File "c:\Python27\Lib\ctypes\__init__.py", line 383, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: function 'getpagesize' not found

This fails here:
https://github.com/Changaco/python-libarchive-c/blob/master/libarchive/ffi.py#L21
getpagesize() is not a CPython function on non-posix builds.

I think the portable alternative for getting the page size is to use mmap:

import mmap
mmap.PAGESIZE
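A minimal sketch of the suggested fix, assuming mmap.PAGESIZE is an acceptable replacement:

```python
import mmap

# mmap.PAGESIZE is available on all platforms, unlike the POSIX-only
# getpagesize symbol that ctypes was looking up in the CPython binary.
page_size = mmap.PAGESIZE
print(page_size)
```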

consider adding LICENSE or COPYING file to the repository

Hi, I'm a package maintainer for a distribution of your great library. I would like to ask if its possible to add a LICENSE or COPYING file to the root of this project (which is very very common). This would be great so me and other maintainers may put the license besides the released version.

cheers and thanks for consideration

Add text of the LGPL to the dist

The license requires users to distribute it together with the code. It would make it easier to include the text (in the sdist and the wheel) so that everyone can comply with zero effort.

Modifying the path of an entry before adding it to an archive

How do I write a file to a specific folder inside an archive?

For example:

myfile = r"c:\temp\amazing\location\data.txt"

I want this file path to be in my archive as:

PSEUDO CODE

archive_location = "/data/data.txt"

sd_output = r'c:\temp\amazing\archive.7z'
if os.path.isfile(sd_output):
    os.remove(sd_output)
with libarchive.file_writer(sd_output, '7zip') as archive:
    archive.add_files(myfile, archive_location) # doesn't work

Can I specify that? The documentation seems a bit vague.

Thank you
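One possible workaround with the current API, sketched below: read the file yourself and add it with add_file_from_memory, which takes the archive-internal path explicitly (all paths here are examples, and the approach is not specific to 7z):

```python
import os
import tempfile

import libarchive

workdir = tempfile.mkdtemp()

# A stand-in for c:\temp\amazing\location\data.txt
myfile = os.path.join(workdir, 'data.txt')
with open(myfile, 'wb') as f:
    f.write(b'payload')

# add_files() stores entries under their on-disk paths, but
# add_file_from_memory() lets you choose the path inside the archive.
sd_output = os.path.join(workdir, 'archive.7z')
with open(myfile, 'rb') as f:
    data = f.read()
with libarchive.file_writer(sd_output, '7zip') as archive:
    archive.add_file_from_memory('data/data.txt', len(data), data)
```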

Tests with unicode path entries are failing

These tests are failing for me:

  • test_check_archiveentry_using_python_testtar
  • test_check_archiveentry_with_unicode_and_binary_entries_zip
  • test_check_archiveentry_with_unicode_entries_and_name_zip

A few details about my setup:

  • Mac OSX 10.13
  • Python 3.7.2
  • libarchive 3.3.3

I included the full log at the bottom of this issue.

I did a little digging, and while the paths render equivalently and are unicode-equivalent, the tests are actually looking for a slightly different byte sequence than the one present in the file. Below is a test I wrote that demonstrates the problem. (To run test_load, add it to one of the python files in tests/ and run py.test.)

def test_load():
    """Exhibits why the tests are busted."""

    # These byte sequences are unicode-equivalent, but not byte-for-byte
    # equivalent.
    good_sequence = b"u\xcc\x88" # present in unicode.zip
    wrong_sequence = b"\xc3\xbc" # present in unicode.zip.json

    good_str = good_sequence.decode("UTF-8")
    wrong_str = wrong_sequence.decode("UTF-8")

    assert good_str != wrong_str
    for mode in ("NFC", "NFD"):
        assert unicodedata.normalize(mode, good_str) == unicodedata.normalize(mode, wrong_str)

    # This file has the good sequence, not the bad one
    with open(join(data_dir, "unicode.zip"), "rb") as f:
        zipfile_bytes = f.read()
    assert good_sequence in zipfile_bytes
    assert wrong_sequence not in zipfile_bytes

    # Oops! This fails. The JSON has the bad sequence (not the good one)
    with open(join(data_dir, "unicode.zip.json"), encoding='UTF-8') as ex:
        x = json.load(ex)
    encoded_json_bytes = repr(x).encode("UTF-8")
    assert good_sequence in encoded_json_bytes
    assert wrong_sequence not in encoded_json_bytes
=================================== FAILURES ===================================
__________________________________ test_load ___________________________________

    def test_load():
        """Exhibits why the tests are busted."""
    
        # These byte sequences are unicode-equivalent, but not byte-for-byte
        # equivalent.
        good_sequence = b"u\xcc\x88" # present in unicode.zip
        wrong_sequence = b"\xc3\xbc" # present in unicode.zip.json
    
        good_str = good_sequence.decode("UTF-8")
        wrong_str = wrong_sequence.decode("UTF-8")
    
        assert good_str != wrong_str
        for mode in ("NFC", "NFD"):
            assert unicodedata.normalize(mode, good_str) == unicodedata.normalize(mode, wrong_str)
    
        # This file has the good sequence, not the bad one
        with open(join(data_dir, "unicode.zip"), "rb") as f:
            zipfile_bytes = f.read()
        assert good_sequence in zipfile_bytes
        assert wrong_sequence not in zipfile_bytes
    
        # Oops! This fails. The JSON has the bad sequence (not the good one)
        with open(join(data_dir, "unicode.zip.json"), encoding='UTF-8') as ex:
            x = json.load(ex)
        encoded_json_bytes = repr(x).encode("UTF-8")
>       assert good_sequence in encoded_json_bytes
E       assert b'u\xcc\x88' in b"[{'gid': 1000, 'isblk': False, 'ischr': False, 'isdev': False, 'isdir': True, 'isfifo': False, 'islnk': False, 'isre...e, 'linkpath': None, 'mode': 'rw-r--r--', 'mtime': 1268678259, 'path': 'a/gr\xc3\xbcn.png', 'size': 362, 'uid': 1000}]"

Log:

============================= test session starts ==============================
platform darwin -- Python 3.7.2, pytest-4.2.1, py-1.7.0, pluggy-0.8.1
rootdir: opensource/python-libarchive-c, inifile:
collected 29 items

tests/test_atime_mtime_ctime.py ........                                 [ 27%]
tests/test_convert.py .                                                  [ 31%]
tests/test_entry.py ..F.F.F                                              [ 55%]
tests/test_errors.py ....                                                [ 68%]
tests/test_rwx.py .......                                                [ 93%]
tests/test_security_flags.py ..                                          [100%]

=================================== FAILURES ===================================
_________________ test_check_archiveentry_using_python_testtar _________________

    def test_check_archiveentry_using_python_testtar():
>       check_entries(join(data_dir, 'testtar.tar'))

tests/test_entry.py:63: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

test_file = 'opensource/python-libarchive-c/tests/data/testtar.tar'
regen = False, ignore = []

    def check_entries(test_file, regen=False, ignore=''):
        ignore = ignore.split()
        fixture_file = test_file + '.json'
        if regen:
            entries = list(get_entries(test_file))
            with open(fixture_file, 'w', encoding='UTF-8') as ex:
                json.dump(entries, ex, indent=2, sort_keys=True)
        with open(fixture_file, encoding='UTF-8') as ex:
            expected = json.load(ex)
        actual = list(get_entries(test_file))
        for e1, e2 in zip(actual, expected):
            for key in ignore:
                e1.pop(key)
                e2.pop(key)
>           assert e1 == e2
E           AssertionError: assert {'gid': 100, ...': False, ...} == {'gid': 100, '...': False, ...}
E             Omitting 14 identical items, use -vv to show
E             Differing items:
E             {'path': 'pax/umlauts-ÄÖÜäöüß'} != {'path': 'pax/umlauts-ÄÖÜäöüß'}
E             Use -v to get the full diff

tests/test_entry.py:96: AssertionError
------------------------------ Captured log call -------------------------------
ffi.py                      88 WARNING  Pathname can't be converted from UTF-8 to current locale.
_________ test_check_archiveentry_with_unicode_and_binary_entries_zip __________

    def test_check_archiveentry_with_unicode_and_binary_entries_zip():
>       check_entries(join(data_dir, 'unicode.zip'))

tests/test_entry.py:71: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

test_file = 'opensource/python-libarchive-c/tests/data/unicode.zip'
regen = False, ignore = []

    def check_entries(test_file, regen=False, ignore=''):
        ignore = ignore.split()
        fixture_file = test_file + '.json'
        if regen:
            entries = list(get_entries(test_file))
            with open(fixture_file, 'w', encoding='UTF-8') as ex:
                json.dump(entries, ex, indent=2, sort_keys=True)
        with open(fixture_file, encoding='UTF-8') as ex:
            expected = json.load(ex)
        actual = list(get_entries(test_file))
        for e1, e2 in zip(actual, expected):
            for key in ignore:
                e1.pop(key)
                e2.pop(key)
>           assert e1 == e2
E           AssertionError: assert {'gid': 1000,...': False, ...} == {'gid': 1000, ...': False, ...}
E             Omitting 14 identical items, use -vv to show
E             Differing items:
E             {'path': 'a/grün.png'} != {'path': 'a/grün.png'}
E             Use -v to get the full diff

tests/test_entry.py:96: AssertionError
__________ test_check_archiveentry_with_unicode_entries_and_name_zip ___________

    def test_check_archiveentry_with_unicode_entries_and_name_zip():
>       check_entries(join(data_dir, '\ud504\ub85c\uadf8\ub7a8.zip'))

tests/test_entry.py:79: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

test_file = 'opensource/python-libarchive-c/tests/data/프로그램.zip'
regen = False, ignore = []

    def check_entries(test_file, regen=False, ignore=''):
        ignore = ignore.split()
        fixture_file = test_file + '.json'
        if regen:
            entries = list(get_entries(test_file))
            with open(fixture_file, 'w', encoding='UTF-8') as ex:
                json.dump(entries, ex, indent=2, sort_keys=True)
        with open(fixture_file, encoding='UTF-8') as ex:
            expected = json.load(ex)
        actual = list(get_entries(test_file))
        for e1, e2 in zip(actual, expected):
            for key in ignore:
                e1.pop(key)
                e2.pop(key)
>           assert e1 == e2
E           AssertionError: assert {'gid': 502, ...': False, ...} == {'gid': 502, '...': False, ...}
E             Omitting 14 identical items, use -vv to show
E             Differing items:
E             {'path': '프로그램.txt'} != {'path': '프로그램.txt'}
E             Use -v to get the full diff

tests/test_entry.py:96: AssertionError
=============================== warnings summary ===============================
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo
  opensource/python-libarchive-c/tests/__init__.py:86: DeprecationWarning: deprecated in favor of stat.filemode
    mode = tarfile.filemode(entry.mode)[1:]

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=============== 3 failed, 26 passed, 13 warnings in 0.54 seconds ===============

Unable to read 7zip file

Using the same syntax as the example in your README,

with libarchive.file_reader('lpr_000b16a_e.7z') as archive:

I get:

ArchiveError: LZMA codec is unsupported (errno=-1, retcode=-30, archive_p=107949936)

I don't think this gets coverage in the tests, either. If the situation is indeed not platform-specific, updating the README.md would at least be needed.


  • libarchive version: 3.3.2
  • python-libarchive-c version: 2.7
  • OS: Windows 7
  • Python: 3.6
  • 7-zip version: 16.04 (version used to create the test file)

Inconsistent behavior for encrypted rar/zip/7z

With a zip file you can only encrypt the entries' data - so you can still see all file names.
With rar and 7z (and maybe others) you can encrypt the data and - optionally - the file name.

Behavior when getting blocks for an entry via entry.get_blocks():

  • Encrypted zip:

    • ERROR: Passphrase required for this entry (errno=-1, retcode=-25, archive_p=1880656)
  • Encrypted rar / file names not encrypted:

    • NO ERROR. Returned encrypted bytes (I presume it's the encrypted data at least, did not verify)
  • Encrypted 7z / file names not encrypted:

    • ERROR: The file content is encrypted, but currently not supported (errno=-1, retcode=-30, archive_p=6593280)

  • Encrypted rar / file names encrypted:

    • ERROR. Fails when opening the archive with Encryption is not supported (errno=42, retcode=-30, archive_p=4559952)
  • Encrypted 7z / file names encrypted:

    • ERROR. Fails when opening the archive but with different error than for rar: The archive header is encrypted, but currently not supported (errno=-1, retcode=-30, archive_p=5805136)

EDIT: It's probably some libarchive issue, yes?

ArchiveWrite.add_files() chokes on broken symlinks

Hello,

First, thanks for this wonderful project :)

I am trying to create a gzipped cpio_newc archive (Android ramdisk). The contents contain some root owned files so I am using sudo python3 to open the interpreter. The contents also include broken symlinks which need to remain intact as they are not broken once Android boots.

I am probably doing something wrong, but when I run the below command I get errors on the broken symlinks. It seems they are being followed but the destination does not exist.

Is there a way to say something like "follow_symlinks = False" so the broken symlinks are compressed with the archive but not followed?

Thanks for your time,
SuperR

sudo python3
[sudo] password for superr: 
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libarchive
>>> with libarchive.file_writer("ramdisk.gz", "cpio_newc", "gzip") as archive:
...     archive.add_files("ramdisk/")
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/superr/.local/lib/python3.5/site-packages/libarchive/write.py", line 64, in add_files
    with open(entry_sourcepath(entry_p), 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: b'ramdisk/sdcard'

Out of memory error when using libarchive.extract.extract_files with a list of multiple items

It appears that python-libarchive-c crashes with an out of memory error when you use libarchive.extract.extract_files with a list containing more than one libarchive file entry object.

I've got a proof of concept python script here: https://gitlab.com/Conan_Kudo/pylibarchive-c-segfaultcase

I have logs to prove the error as well, using Fedora 24 packages and using the PyPI wheel: https://gitlab.com/Conan_Kudo/pylibarchive-c-segfaultcase/pipelines/4847862

How to make it work on Windows?

Hi,
I want to use this in my code but I'm having problems making it work on Windows with Python 3.4.
I've installed it with pip without a problem but whenever I use import libarchive it fails with:
OSError: [WinError 126] The specified module could not be found
This is coming from ffi.py from the code below:

libarchive_path = os.environ.get('LIBARCHIVE') or find_library('archive') 
libarchive = ctypes.cdll.LoadLibrary(libarchive_path)

I've never used ctypes before, but if I understand correctly it is looking for an external DLL. I found and installed http://gnuwin32.sourceforge.net/packages/libarchive.htm and added C:\Program Files (x86)\GnuWin32\bin to my %PATH% environment variable, but it still cannot load the module. As it does not give me the name of the module, I'm not sure what module it is looking for. What am I missing?
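One thing worth trying: the LIBARCHIVE environment variable described in the README also works on Windows, and bypasses the find_library('archive') search entirely. The DLL path below is a placeholder for wherever your libarchive DLL actually lives:

```python
import os

# Point python-libarchive-c at an explicit DLL before importing it.
# This path is hypothetical; adjust it to your installation.
os.environ['LIBARCHIVE'] = r'C:\Program Files (x86)\GnuWin32\bin\libarchive.dll'

# import libarchive  # would now load the DLL named above
```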

How to set the keys that are written in the mtree format?

by default, when we use the mtree output format as:

import libarchive

with libarchive.fd_writer(1, 'mtree') as archive:
    archive.add_files('/home/vagrant/.config')

it produces entries like:

#mtree
./home/vagrant/.config time=1476796957.126414070 mode=700 gid=1000 uid=1000 type=dir
./home/vagrant/.config/user-dirs.dirs time=1476782120.828416069 mode=600 gid=1000 uid=1000 type=file size=632

but I need to change which keys are actually written. That is, instead of the time, mode, gid and uid keys I need the mode, gname, uname and type keys. How can I do that from Python?

As a reference, the libarchive C library has these default keys that I need to change somehow:

#define DEFAULT_KEYS    (F_DEV | F_FLAGS | F_GID | F_GNAME | F_SLINK | F_MODE\
             | F_NLINK | F_SIZE | F_TIME | F_TYPE | F_UID\
             | F_UNAME)

Ability to decrypt zip and 7z files

Is it possible to decrypt and decompress zip and 7z files? There is a libarchive function archive_read_add_passphrase(struct archive *_a, const char *passphrase) that allows a caller to provide a password. Is that something you would consider adding/supporting?

Adding support for atime and mtime

Hey guys,

I have tried to add ctime and atime field support to the library. Here you may take a look at my efforts: https://github.com/zeroos/python-libarchive-c/tree/atime-and-ctime

Often the solution seems to work, but, among others, the test_convert test fails. I have tried to understand the issue and it seems that the memory_writer(buf2, 'zip') call is losing the information about the created/modified date. It looks a little bit like an upstream issue. Could you please confirm that, and maybe provide some broader picture of why those fields were skipped in the first place?

Best wishes,
Michał

Add tox support

I noticed that in travis.yml the tests are run 'manually'. I propose adding tox. Locally I use the following tox.ini:

[tox]
envlist=py27,py34

[testenv]
install_command=pip install {opts} {packages}
commands=
    py.test -v --color=yes --cov=libarchive --cov-config .coveragerc --cov-report term-missing {toxinidir}/tests {posargs}
deps=
    pytest
    pytest-cov
    six

new_archive_write should probably call write_fail when an exception is raised

From #50 (comment)

Something like that:

diff --git a/libarchive/write.py b/libarchive/write.py
index 35eb8f5..062c384 100644
--- a/libarchive/write.py
+++ b/libarchive/write.py
@@ -116,9 +116,12 @@ def new_archive_write(format_name, filter_name=None):
         getattr(ffi, 'write_add_filter_'+filter_name)(archive_p)
     try:
         yield archive_p
-    finally:
         ffi.write_close(archive_p)
         ffi.write_free(archive_p)
+    except:
+        ffi.write_fail(archive_p)
+        ffi.write_free(archive_p)
+        raise
 
 
 @contextmanager

We need a test to confirm that it's the right thing to do. I don't have time to take care of it right now.

License audit - GPL2->LGPL2->CC0

While auditing the license changes that have occurred, we have come across some inconsistencies that need to be resolved.

  • python-libarchive-c began life as https://github.com/dsoprea/PyEasyArchive, which was licensed GPL2 by the author of PyEasyArchive, dsoprea. The GPL2 license was added to python-libarchive-c by dsoprea here: 55235ba

  • The GPL2 license was removed from python-libarchive-c in commit 7913bba by Changaco, who was not the original author of the code, and was not authorised by the GPL2 to do so. At this point, python-libarchive-c was being distributed in violation of the GPL.

  • This problem was brought to the attention of the python-libarchive-c project in the issue #7, and an attempt was made to solve the issue with #24, where Changaco switched licenses from the GPL2 to the LGPL2, and then immediately changed the license to CC0.

  • At the time of the license change, Changaco was not the copyright holder of the code; that was dsoprea. Section 2(b) of GPL2 states: "You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License." Given that Changaco's rights to the code were limited to those granted by the GPL2, Changaco was not allowed to change the license to LGPL2, or to CC0. After 7913bba, any users of this library are no longer license compliant.

To fix this, the copyright holder https://github.com/dsoprea needs to authorise the license change made in 7913bba and then in 2a6207d.

Test failing with libarchive 3.2

Since libarchive 3.2, the test test_check_archiveentry_with_unicode_and_binary_entries_zip2 is failing:

collected 16 items 

tests/test_convert.py::test_convert PASSED
tests/test_entry.py::test_entry_properties PASSED
tests/test_entry.py::test_check_ArchiveEntry_against_TarInfo PASSED
tests/test_entry.py::test_check_archiveentry_using_python_testtar PASSED
tests/test_entry.py::test_check_archiveentry_with_unicode_and_binary_entries_tar PASSED
tests/test_entry.py::test_check_archiveentry_with_unicode_and_binary_entries_zip PASSED
tests/test_entry.py::test_check_archiveentry_with_unicode_and_binary_entries_zip2 FAILED
tests/test_entry.py::test_check_archiveentry_with_unicode_entries_and_name_zip PASSED
tests/test_errors.py::test_add_files_nonexistent PASSED
tests/test_errors.py::test_check_int_logs_warnings PASSED
tests/test_errors.py::test_check_null PASSED
tests/test_errors.py::test_error_string_decoding PASSED
tests/test_rwx.py::test_buffers PASSED
tests/test_rwx.py::test_fd PASSED
tests/test_rwx.py::test_files PASSED
tests/test_rwx.py::test_custom_writer PASSED

================================================================================================================== FAILURES ===================================================================================================================
________________________________________________________________________________________ test_check_archiveentry_with_unicode_and_binary_entries_zip2 _________________________________________________________________________________________

    def test_check_archiveentry_with_unicode_and_binary_entries_zip2():
>       check_entries(join(data_dir, 'unicode2.zip'))

tests/test_entry.py:76: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

test_file = '/tmp/debian/archivec/python-libarchive-c-2.1/tests/data/unicode2.zip', regen = False

    def check_entries(test_file, regen=False):
        fixture_file = test_file + '.json'
        if regen:
            entries = list(get_entries(test_file))
            with open(fixture_file, 'w', encoding='UTF-8') as ex:
                json.dump(entries, ex, indent=2)
        with open(fixture_file, encoding='UTF-8') as ex:
            expected = json.load(ex)
        actual = list(get_entries(test_file))
        for e1, e2 in zip(actual, expected):
>           assert e1 == e2
E           assert {'isblk': Fal...r': True, ...} == {'isblk': Fals...r': True, ...}
E             Common items:
E             {'isblk': False,
E              'ischr': False,
E              'isdev': False,
E              'isdir': True,
E              'isfifo': False,
E              'islnk': False,
E              'isreg': False,
E              'issym': False,
E              'linkpath': None,
E              'mtime': 1381752672,
E              'path': 'a/',
E              'size': 0}
E             Differing items:
E             {'mode': 'rwxrwxr-x'} != {'mode': 'rwxrwxrwx'}
E             Full diff:
E             {'isblk': False,
E             'ischr': False,
E             'isdev': False,
E             'isdir': True,
E             'isfifo': False,
E             'islnk': False,
E             'isreg': False,
E             'issym': False,
E             'linkpath': None,
E             -  'mode': 'rwxrwxr-x',
E             ?                  ^
E             +  'mode': 'rwxrwxrwx',
E             ?                  ^
E             'mtime': 1381752672,
E             'path': 'a/',
E             'size': 0}

tests/test_entry.py:93: AssertionError
===================================================================================================== 1 failed, 15 passed in 0.22 seconds =====================================================================================================

file permissions issue

With the 2.3 tarball from pypi, the file permissions of the files in site-packages/libarchive_c-2.3-py3.5.egg-info/* are incorrect (600), which breaks installation of packages which want to use this but are not run by root.
I'm not sure where these permissions come from.
It was the same with 2.2.

Detect valid archives

I'm in the process of migrating from python2 using python-libarchive to python3 using python-libarchive-c. In all, the process has been quite easy (and many thanks to you folks for keeping a similar interface). However, I have run in to a bit of a snag.

I've been detecting whether or not a file is a legitimate archive based on trying to open it and detect errors. This works well for most files, but has been failing when I feed it something that looks like a /etc/passwd file. Previously, when libarchive returns Missing type keyword in mtree specification, the old python-libarchive would raise a ValueError. Now, this simply produces a Warning to the logger and the process continues.

Is there an easy way to make this produce an error that I can catch or otherwise detect invalid archives?
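A sketch of one way to do this with the current API: try to read the first entry header and catch ArchiveError. Note that this would not catch the mtree case described above, since that misdetection only logs a warning:

```python
import libarchive
from libarchive.exception import ArchiveError

def is_archive(path):
    """Return True if libarchive can open the file and read at least
    one entry header from it. A sketch, not a complete validity check:
    format misdetections that only log a warning slip through."""
    try:
        with libarchive.file_reader(path) as archive:
            for entry in archive:
                break
    except ArchiveError:
        return False
    return True
```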

Keeping an archive open without using a context generator?

Hi,

First, thanks for all of your work on this project.

I have an existing codebase in which I want to use python-libarchive-c. The code isn't structured to allow using generators, having originally been based around the tarfile module.

I managed to hack up a way to get what I want (criticism very welcome!)

It requires exporting ArchiveWrite from libarchive/__init__.py. Then in your own code you can do the following:

class NewArchiveWrite(libarchive.ArchiveWrite):
    def __init__(self, filename, format_name, filter_name=None, options=''):
        from libarchive import ffi
        self.filename = filename
        self.archive_p = ffi.write_new()
        libarchive.ArchiveWrite.__init__(self, self.archive_p)
        getattr(ffi, 'write_set_format_' + format_name)(self.archive_p)
        if filter_name:
            getattr(ffi, 'write_add_filter_' + filter_name)(self.archive_p)
        if options:
            if not isinstance(options, bytes):
                options = options.encode('utf-8')
            ffi.write_set_options(self.archive_p, options)
        ffi.write_open_filename_w(self.archive_p, self.filename)

    def __del__(self):
        from libarchive import ffi
        ffi.write_close(self.archive_p)
        ffi.write_free(self.archive_p)

.. then use this class. What do you think? Would a PR to export ArchiveWrite be acceptable? Is there a better way to achieve what I want?
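For comparison, here is a sketch that stays on the public API: contextlib.ExitStack can hold the existing file_writer context manager open across method calls, so nothing needs to be exported from the package. The wrapper class name is made up for illustration.

```python
from contextlib import ExitStack

class ArchiveWriterHandle:
    """Hold libarchive.file_writer open without a with-block (sketch)."""
    def __init__(self, filename, format_name, filter_name=None):
        import libarchive
        self._stack = ExitStack()
        # enter_context runs file_writer's setup and remembers its cleanup
        self.archive = self._stack.enter_context(
            libarchive.file_writer(filename, format_name, filter_name))

    def add_files(self, *paths):
        self.archive.add_files(*paths)

    def close(self):
        # runs the context manager's normal teardown
        self._stack.close()
```

This avoids the `__del__`-based cleanup in the snippet above, which is not guaranteed to run promptly (or at all, during interpreter shutdown).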

Next release?

Hi, thanks for this really useful wrapper.

I was wondering when the next release is planned. There's been at least one important security-related patch that I would love to see in an official release: 4784e43

Unclear error message on Windows when libarchive can't be located

OS: windows 10
python version: 3.5.2

I installed it via pip install libarchive-c; here is the error traceback:

(C:\Users\**\Miniconda3) C:\Users\**>python -c "import libarchive"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\**\Miniconda3\lib\site-packages\libarchive\__init__.py", line 1, in <module>
    from .entry import ArchiveEntry
  File "C:\Users\**\Miniconda3\lib\site-packages\libarchive\entry.py", line 6, in <module>
    from . import ffi
  File "C:\Users\**\Miniconda3\lib\site-packages\libarchive\ffi.py", line 21, in <module>
    libarchive = ctypes.cdll.LoadLibrary(libarchive_path)
  File "C:\Users\**\Miniconda3\lib\ctypes\__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\**\Miniconda3\lib\ctypes\__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
TypeError: bad argument type for built-in operation

Support a configurable way to load libarchive

The built-in ctypes lookup can fail to find libarchive consistently when it is not in a canonical location.
It would be great to have a way to pass well-known paths to installed libarchives, rather than having _LIB_FILEPATH and ctypes.cdll.LoadLibrary created/called at import time, which makes them rather hard to monkey-patch.

Add test slaves for Mac and Windows and possibly other OSes

  • Travis supports Mac slaves.
  • AppVeyor offers Windows slaves.
  • The openSUSE Build Service supports most major Linux distros and arches.
  • CDash (also used for the upstream libarchive CI) seems to offer Mac/Gentoo/OpenBSD and Windows.

Getting at least Mac and Windows slaves would help a lot.

missing git release tag for 2.3

Hi, I have noticed that PyPI contains a 2.3 version but no git tag for 2.3 was published. I'm the maintainer of a distribution package and keep watch on git tags (which is a lot more convenient). It would also be nice to always have the git tags (and therefore the git source tarballs on GitHub) in sync with all released versions 😄

hope to see a tag fast,
cheers

How to construct an archive entry from scratch?

For some purposes it is useful to create an archive entry from scratch - e.g. device files for a linux root filesystem archive.

It seems that the necessary functions are not mapped by python-libarchive-c: from the examples and from reading the source, it is unclear to me how to e.g. set the filename or the data for an entry not read from an existing archive.

If I simply failed to spot it, then this becomes a request for better documentation :)

Thanks,
Simon

Unrecognized archive format on 7z archive

I've used this in the past to extract TBT updates, but it has stopped working on at least this recent one.

Here is the (simple) script I'm using just to test the parsing:

#!/usr/bin/python3
import libarchive

filename='Intel_TBT3_FW_UPDATE_NVM28_3H3DP_A01_4.28.06.001.exe'

with libarchive.file_reader(filename) as e:
    for entry in e:
        print(str(entry))

However, I just noticed it failing on some recent ones:

Traceback (most recent call last):
  File "./test.py", line 6, in <module>
    with libarchive.file_reader(filename) as e:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/home/supermario/.local/lib/python3.5/site-packages/libarchive/read.py", line 68, in file_reader
    ffi.read_open_filename_w(archive_p, path, block_size)
  File "/home/supermario/.local/lib/python3.5/site-packages/libarchive/ffi.py", line 88, in check_int
    raise archive_error(args[0], retcode)
  File "/home/supermario/.local/lib/python3.5/site-packages/libarchive/ffi.py", line 72, in archive_error
    raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
libarchive.exception.ArchiveError: Unrecognized archive format (errno=84, retcode=-30, archive_p=45795904)

Whereas it works with 7z:

# 7z x Intel_TBT3_FW_UPDATE_NVM28_3H3DP_A01_4.28.06.001.exe 

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,16 CPUs)

Processing archive: Intel_TBT3_FW_UPDATE_NVM28_3H3DP_A01_4.28.06.001.exe

Extracting  Intel/0x07E6_secure.bin
Extracting  Intel/FwUpdateCmd.exe
Extracting  Intel/FwUpdateApi.dll
Extracting  Intel

Everything is Ok

Folders: 1
Files: 3
Size:       342528
Compressed: 1102184

The file is available here: http://www.dell.com/support/home/us/en/04/drivers/driversdetails?driverId=3H3DP

I have libarchive 3.2.2-3.1 from here
I have libarchive-c 2.7 from pip3 install libarchive-c

Adding file to archive with absolute path

Hello,
I'm in a situation in which I have to add files to an archive with absolute paths.
Example:

I've tmpDir which contains usr/local/bin/a.bin, usr/share/doc/etc.txt, so the full path is tmpDir/usr/local/bin/a.bin, etc etc.

My goal is to add the file to the archive with the path /usr/local/bin/a.bin, etc.

I'm able to add the files as ./usr/local/bin/a.bin (note the dot at the beginning of the string), but when I extract the files it extracts them to the current folder, not to the absolute path.

Here's the actual code:

with libarchive.file_writer(tmpArchive.name, format, filter) as archive:
            for dirname, dirnames, filenames in os.walk(tmpDir):
                dirz = dirname.replace(tmpDir, ".")
                for f in filenames:
                    fn = os.path.join(dirz, f)
                    with in_dir(tmpDir):
                        archive.add_files(fn)

I've tried modifying the entry.path inside the archive, and of course it doesn't work.
Thanks
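For what it's worth, the library's add_files strips leading slashes from entry pathnames (entry.pathname.lstrip('/') in write.py), and the default secure extraction flags refuse absolute paths, so the usual convention is to store slash-free paths and extract with the current directory set to /. Below is a sketch of building such paths through the public add_file_from_memory API; archive_path and add_tree are hypothetical helpers, and it assumes a libarchive-c version whose add_file_from_memory accepts bytes content.

```python
import os

def archive_path(tmp_dir, full_path):
    """In-archive pathname for a file under tmp_dir: relative, with
    forward slashes, no leading './' or '/'. Hypothetical helper."""
    return os.path.relpath(full_path, tmp_dir).replace(os.sep, '/')

def add_tree(archive, tmp_dir):
    # Sketch: walk tmp_dir and add each file with a pathname that is
    # fully under our control, instead of rewriting dirname strings.
    for dirpath, _dirnames, filenames in os.walk(tmp_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, 'rb') as f:
                data = f.read()
            archive.add_file_from_memory(
                archive_path(tmp_dir, full), len(data), data)
```

Extracting such an archive with the current directory set to / then places the files at /usr/local/bin/a.bin and so on.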

SECURE_NODOTDOT not having effect?

I'm trying to understand how the security flags work. I've created several dummy files, and made an archive (I'll attach or link it here if I can). With this test file, I see no effect from the EXTRACT_SECURE_NODOTDOT flag - symlinks with .. within the package are happily extracted, as are symlinks that point outside the archive. Perhaps I am just misunderstanding what this flag is for?

Note that the .zip was necessary for uploading the file to GitHub here - unzip first, then this example code should run.

uncommon.tar.bz2.zip

import os
import libarchive
import subprocess

# original folder contents:
# =========================

# drwxr-xr-x  4 msarahan  staff  128 May 19 17:03 a_folder
# drwxr-xr-x  2 msarahan  staff   64 May 19 17:09 empty_folder
# drwxr-xr-x  8 msarahan  staff  256 May 19 17:49 symlink_stuff

# ./a_folder:
# total 8
# -rw-r--r--  1 msarahan  staff  0 May 19 17:03 empty_file
# -rw-r--r--  1 msarahan  staff  5 May 19 17:03 text_file

# ./empty_folder:

# ./symlink_stuff:
# total 0
# lrwxr-xr-x  1 msarahan  staff  11 May 19 17:06 a_folder -> ../a_folder
# lrwxr-xr-x  1 msarahan  staff  21 May 19 17:26 dot_dot_out_of_archive -> ../../../../something
# lrwxr-xr-x  1 msarahan  staff  12 May 19 17:26 symlink_to_abs_path -> /usr/bin/env
# lrwxr-xr-x  1 msarahan  staff  33 May 19 17:25 symlink_to_empty_file -> ../a_folder/symlink_to_empty_file
# lrwxr-xr-x  1 msarahan  staff  30 May 19 17:25 symlink_to_symlink_to_empty_file -> a_folder/symlink_to_empty_file
# lrwxr-xr-x  1 msarahan  staff  18 May 19 17:05 symlink_to_text_file -> a_folder/text_file

tarball = 'uncommon.tar.bz2'

# extract stuff with security flags

flags = libarchive.extract.EXTRACT_TIME | \
        libarchive.extract.EXTRACT_PERM | \
        libarchive.extract.EXTRACT_SECURE_NODOTDOT | \
        libarchive.extract.EXTRACT_SECURE_SYMLINKS | \
        libarchive.extract.EXTRACT_SECURE_NOABSOLUTEPATHS | \
        libarchive.extract.EXTRACT_SPARSE | \
        libarchive.extract.EXTRACT_UNLINK
if not os.path.isabs(tarball):
    tarball = os.path.join(os.getcwd(), tarball)
try:
    os.makedirs("contents_with_security_flags")
except:
    pass
os.chdir("contents_with_security_flags")
libarchive.extract_file(tarball, flags)
os.chdir("..")
output = subprocess.check_output(["ls", "-lR", "contents_with_security_flags"])
print("Output with security flags")
print(output.decode())
print('\n\n')


# extract stuff without security flags
flags = libarchive.extract.EXTRACT_TIME | \
        libarchive.extract.EXTRACT_PERM
if not os.path.isabs(tarball):
    tarball = os.path.join(os.getcwd(), tarball)
try:
    os.makedirs("contents_without_security_flags")
except:
    pass
os.chdir("contents_without_security_flags")
libarchive.extract_file(tarball, flags)
os.chdir("..")
print("Output without security flags")
output = subprocess.check_output(["ls", "-lR", "contents_without_security_flags"])
print(output.decode())

Test failures on Windows

These tests fail for mysterious reasons.
They are all somehow related to writing archives.

I wonder why the new_archive_entry is reused in the paths loop when adding files, rather than creating a new entry each time. Nevertheless, the failure might be because the file being added is somehow already open for writing? Not sure; to be investigated. This could all be due to a single problem.
Possibly a build problem, though the built lib otherwise works rather well.

FWIW errno 22 is EINVAL

================================== FAILURES ===================================
________________________________ test_convert _________________________________

    def test_convert():

        # Collect information on what should be in the archive
        tree = treestat('libarchive')

        # Create an archive of our libarchive/ directory
        buf = bytes(bytearray(1000000))
        with memory_writer(buf, 'gnutar', 'xz') as archive1:
>           archive1.add_files('libarchive/')

tests\test_convert.py:21:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
libarchive\write.py:61: in add_files
    r = read_next_header2(read_p, entry_p)
libarchive\ffi.py:85: in check_int
    raise archive_error(args[0], retcode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

archive_p = 3609976, retcode = -25

    def archive_error(archive_p, retcode):
        msg = _error_string(archive_p)
>       raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
E       ArchiveError: <unprintable ArchiveError object>

libarchive\ffi.py:69: ArchiveError
____________________________ test_entry_properties ____________________________

    def test_entry_properties():

        buf = bytes(bytearray(1000000))
        with memory_writer(buf, 'gnutar') as archive:
>           archive.add_files('README.rst')

tests\test_entry.py:17:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <libarchive.write.ArchiveWrite object at 0x03460150>
paths = ('README.rst',), write_p = 3609224, block_size = 10240
entry_p = 47076136

    def add_files(self, *paths):
        """Read the given paths from disk and add them to the archive.
            """
        write_p = self._pointer

        block_size = ffi.write_get_bytes_per_block(write_p)
        if block_size <= 0:
            block_size = 10240  # pragma: no cover

        with new_archive_entry() as entry_p:
            entry = ArchiveEntry(None, entry_p)
            for path in paths:
                with new_archive_read_disk(path) as read_p:
                    while 1:
                        r = read_next_header2(read_p, entry_p)
                        if r == ARCHIVE_EOF:
                            break
                        entry.pathname = entry.pathname.lstrip('/')
                        read_disk_descend(read_p)
                        write_header(write_p, entry_p)
                        try:
>                           with open(entry_sourcepath(entry_p), 'rb') as f:
                                while 1:
E                               IOError: [Errno 13] Permission denied: '\\\\?\\c:\\w421\\python-libarchive-c\\README.rst'

libarchive\write.py:68: IOError
________________________________ test_buffers _________________________________

tmpdir = local('c:\\tmp\\pytest-49\\test_buffers0')

    def test_buffers(tmpdir):

        # Collect information on what should be in the archive
        tree = treestat('libarchive')

        # Create an archive of our libarchive/ directory
        buf = bytes(bytearray(1000000))
        with libarchive.memory_writer(buf, 'gnutar', 'xz') as archive:
>           archive.add_files('libarchive/')

tests\test_rwx.py:23:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
libarchive\write.py:61: in add_files
    r = read_next_header2(read_p, entry_p)
libarchive\ffi.py:85: in check_int
    raise archive_error(args[0], retcode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

archive_p = 3668736, retcode = -25

    def archive_error(archive_p, retcode):
        msg = _error_string(archive_p)
>       raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
E       ArchiveError: H??????: Couldn't visit directory (errno=22, retcode=-25, archive_p=3668736)

libarchive\ffi.py:69: ArchiveError
___________________________________ test_fd ___________________________________

tmpdir = local('c:\\tmp\\pytest-49\\test_fd0')

    def test_fd(tmpdir):
        archive_file = open(tmpdir.strpath+'/test.tar.bz2', 'w+b')
        fd = archive_file.fileno()

        # Collect information on what should be in the archive
        tree = treestat('libarchive')

        # Create an archive of our libarchive/ directory
>       with libarchive.fd_writer(fd, 'gnutar', 'bzip2') as archive:
            archive.add_files('libarchive/')

tests\test_rwx.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
c:\Python27\Lib\contextlib.py:17: in __enter__
    return self.gen.next()
libarchive\write.py:116: in fd_writer
    ffi.write_open_fd(archive_p, fd)
libarchive\ffi.py:85: in check_int
    raise archive_error(args[0], retcode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

archive_p = 3669416, retcode = -30

    def archive_error(archive_p, retcode):
        msg = _error_string(archive_p)
>       raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
E       ArchiveError: Failed to clean up compressor (errno=22, retcode=-30, archive_p=3669416)

libarchive\ffi.py:69: ArchiveError
_________________________________ test_files __________________________________

tmpdir = local('c:\\tmp\\pytest-49\\test_files0')

    def test_files(tmpdir):
        archive_path = tmpdir.strpath+'/test.tar.gz'

        # Collect information on what should be in the archive
        tree = treestat('libarchive')

        # Create an archive of our libarchive/ directory
        with libarchive.file_writer(archive_path, 'ustar', 'gzip') as archive:
>           archive.add_files('libarchive/')

tests\test_rwx.py:70:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
libarchive\write.py:61: in add_files
    r = read_next_header2(read_p, entry_p)
libarchive\ffi.py:85: in check_int
    raise archive_error(args[0], retcode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

archive_p = 3609400, retcode = -25

    def archive_error(archive_p, retcode):
        msg = _error_string(archive_p)
>       raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
E       ArchiveError: Hibarchive/: Couldn't visit directory (errno=22, retcode=-25, archive_p=3609400)

libarchive\ffi.py:69: ArchiveError
_____________________________ test_custom_writer ______________________________

    def test_custom_writer():

        # Collect information on what should be in the archive
        tree = treestat('libarchive')

        # Create an archive of our libarchive/ directory
        blocks = []

        def write_cb(data):
            blocks.append(data[:])
            return len(data)

        with libarchive.custom_writer(write_cb, 'zip') as archive:
>           archive.add_files('libarchive/')

tests\test_rwx.py:97:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
libarchive\write.py:61: in add_files
    r = read_next_header2(read_p, entry_p)
libarchive\ffi.py:85: in check_int
    raise archive_error(args[0], retcode)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

archive_p = 3666496, retcode = -25

    def archive_error(archive_p, retcode):
        msg = _error_string(archive_p)
>       raise ArchiveError(msg, errno(archive_p), retcode, archive_p)
E       ArchiveError: 9: Couldn't visit directory (errno=22, retcode=-25, archive_p=3666496)

libarchive\ffi.py:69: ArchiveError
===================== 6 failed, 5 passed in 1.53 seconds ======================

Append to a new archive

Hello,
I'm trying to append files to a newly created archive.
I have a list of files to append to a new archive, created with tempfile.NamedTemporaryFile().
Using libarchive.file_writer, the archive ends up with only the last file in the list.

Using libarchive.fd_writer, the archive has only the first file in the list.

Here are the PoCs:

files = ['a', 'b', 'c', 'd']
tmpArchive = tempfile.NamedTemporaryFile()

for f in files:
    if someCheks(f):
        with libarchive.file_writer(tmpArchive, 'gnutar', 'xz') as archive:
            archive.add_files(f)

and

files = ['a', 'b', 'c', 'd']
tmpArchive = tempfile.NamedTemporaryFile()
tmpArchiveFd = tmpArchive.fileno()

for f in files:
    if someCheks(f):
        with libarchive.fd_writer(tmpArchiveFd, 'gnutar', 'xz') as archive:
            archive.add_files(f)

Not sure if it's related, but at the end of the loop, I'm doing something like:

with open(tmpArchive.name, 'r+b') as f:
    ret = f.read()

return ret

since I need the buffer/data/content.
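For reference, each file_writer context writes one complete archive, so re-entering the context once per file rewrites the target each time; also, file_writer expects a path such as tmpArchive.name, not the file object itself. A sketch that opens the writer once and reads the bytes back afterwards (build_archive and its check callback are hypothetical names):

```python
import tempfile

def build_archive(files, check):
    """Add every accepted file inside a single writer context, then
    return the archive bytes. Sketch using the public file_writer API."""
    import libarchive
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    # one context for the whole archive, not one per file
    with libarchive.file_writer(tmp.name, 'gnutar', 'xz') as archive:
        for f in files:
            if check(f):
                archive.add_files(f)
    # read back only after the writer has closed and flushed
    with open(tmp.name, 'rb') as f:
        return f.read()
```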

Provide sane defaults for file_writer

It seems as though the line:

with libarchive.file_writer('test.tar.gz', 'ustar', 'gzip') as archive:

could be boiled down to:

with libarchive.file_writer('test.tar.gz') as archive:

as format_name and filter_name can be inferred from the filepath's extension.


Currently, users have to:

  1. Dig through the ffi.py source code to see the list of valid format_names and filter_names.
    • Providing a docstring for file_writer would be invaluable here.
  2. Already know which formats go with which filters.
    • I'm sure a list of sane defaults could be chosen for most formats, which could still be overridden if specified in the function call.

I'm sure this is a jarring experience for most users.
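A sketch of what such inference might look like; the extension map below is my guess at sensible pairings, using names that mirror libarchive's write_set_format_* and write_add_filter_* suffixes:

```python
# hypothetical mapping from file extension to (format_name, filter_name)
_DEFAULTS = {
    '.tar': ('gnutar', None),
    '.tar.gz': ('gnutar', 'gzip'),
    '.tgz': ('gnutar', 'gzip'),
    '.tar.bz2': ('gnutar', 'bzip2'),
    '.tar.xz': ('gnutar', 'xz'),
    '.zip': ('zip', None),
    '.7z': ('7zip', None),
}

def infer_format(filepath):
    """Return (format_name, filter_name) for a path, or raise ValueError."""
    name = filepath.lower()
    # check compound extensions like '.tar.gz' before plain '.tar'
    for ext in sorted(_DEFAULTS, key=len, reverse=True):
        if name.endswith(ext):
            return _DEFAULTS[ext]
    raise ValueError('cannot infer archive format from %r' % filepath)
```

file_writer could then default format_name and filter_name to the inferred pair while still accepting explicit arguments.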

Installing python-libarchive using pip via wheel

Summary
Installing python-libarchive via pip with the command "pip3 install python-libarchive" builds the wheel from source on every platform.

Problem description
python-libarchive doesn't have a wheel on the PyPI repository, so when installing it via pip, the wheel is built from the tar file provided on PyPI, which takes more time. Making a wheel available will benefit users by minimizing installation time.

Expected Output
Pip should be able to download a python-libarchive wheel from the PyPI repository rather than building it from source.

@Changaco, I was able to build a wheel with the command "python setup.py bdist_wheel", and libarchive_c-2.9.post3-py2.py3-none-any.whl was created in the dist folder. I have tried installing it on both x86 and aarch64 platforms with the command "pip install libarchive_c-2.9.post3-py2.py3-none-any.whl", and it installed successfully. Please let me know if any help is required in building the wheel and uploading it to the PyPI repository.

unable to build from github source tarballs

Hey,

It is not possible to build this project from the GitHub tarballs provided through git tags.

This happens because there is no PKG-INFO file in the repo (and the tarball of course contains no .git directory, since it's not a checkout), which makes it impossible to build from the source tarballs provided through GitHub. I would highly appreciate being able to build from these tarballs.

Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    version=get_version(),
  File "/build/python-libarchive-c/src/python-libarchive-c-2.1/version.py", line 29, in get_version
    with open(join(d, 'PKG-INFO')) as f:
IOError: [Errno 2] No such file or directory: '/build/python-libarchive-c/src/python-libarchive-c-2.1/PKG-INFO'

have a nice day,
anthraxx

Writing 7zip file

Hello,

What is the syntax to write a 7z file?

Reading is simple:

with libarchive.file_reader('test.7z') as archive:
    for entry in archive:
        for block in entry.get_blocks():

The help just shows the following:

file_writer(filepath, format_name, filter_name=None, archive_write_class=<class 'libarchive.write.ArchiveWrite'>, options='')

So what are the format_name options? And what do I put for filter_name?

Also, if I have a compressed file, is there a way to get all this information from that file, so as to copy it exactly and just add new contents?

add_file_from_memory() can't add binary files

add_file_from_memory() fails if given a bytestring of nonzero length (regardless of content):

import libarchive

with libarchive.file_writer('bytes.tar', 'pax') as ar:
	content = b'bytes'
	ar.add_file_from_memory('bytes.bin', len(content), content)
Traceback (most recent call last):
  File "bytes.py", line 5, in <module>
    ar.add_file_from_memory('bytes.bin', len(content), content)
  File "/usr/lib/python3.6/site-packages/libarchive/write.py", line 105, in add_file_from_memory
    write_data(archive_pointer, chunk, len(chunk))
TypeError: object of type 'int' has no len()

Of course, converting the bytestring to string as a workaround is not always an option.

A better test case would include some actual invalid UTF-8 (like b'\x80') to assert binary cleanliness, but as demonstrated, that was not the problem here.

Python 3.6.4
python3-libarchive-c 2.7
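A possible interim workaround, inferred from the traceback above: write.py iterates entry_data chunk-wise (`for chunk in entry_data`), and iterating a bytes object in Python 3 yields ints, hence the "object of type 'int' has no len()" error. Wrapping the payload in a one-element list restores bytes-typed chunks. This is a guess at the affected version's behavior, not an official API:

```python
def as_chunks(content):
    """Wrap a bytes payload so chunk-wise iteration yields bytes
    objects rather than ints (Python 3 iterates bytes as integers)."""
    return [content]

def write_binary_member(archive_file, name, content):
    # Hedged sketch against a libarchive-c version whose
    # add_file_from_memory loops over entry_data as chunks.
    import libarchive
    with libarchive.file_writer(archive_file, 'pax') as ar:
        ar.add_file_from_memory(name, len(content), as_chunks(content))
```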

Extract empty files?

I have a .tar.bz2 file that has empty (zero-length) files in it, like you'd get from just using touch to create a file. These are not getting extracted when I use this code to extract the tarball:

def _tar_xf(tarball, dir_path):
    flags = libarchive.extract.EXTRACT_TIME | \
            libarchive.extract.EXTRACT_PERM | \
            libarchive.extract.EXTRACT_SECURE_NODOTDOT | \
            libarchive.extract.EXTRACT_SECURE_SYMLINKS | \
            libarchive.extract.EXTRACT_SECURE_NOABSOLUTEPATHS | \
            libarchive.extract.EXTRACT_SPARSE | \
            libarchive.extract.EXTRACT_UNLINK
    if not os.path.isabs(tarball):
        tarball = os.path.join(os.getcwd(), tarball)
    with utils.tmp_chdir(dir_path):
        libarchive.extract_file(tarball, flags)

I think it might be because the loop to stream data ends early at https://github.com/Changaco/python-libarchive-c/blob/master/libarchive/extract.py#L55, but in changing that to always call the write function at least once, it still doesn't work. I've tried to inspect the entry object, but I don't know how to make sense of it - all the interesting fields, like name and path are just empty. Do you have any tips for how to debug/improve this issue? I'd be happy to submit a PR if you point me in the right direction.
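One way to narrow this down from the Python side is to read the archive directly and dump each entry header before involving extraction at all: if the zero-length entries show up with isreg True and size 0, the problem is in the extract loop; if they don't appear, it is upstream. A sketch using the existing reader API and entry attributes (str(entry), size, isreg) seen elsewhere in this tracker:

```python
def list_entry_headers(archive_file):
    """Collect (path, size, isreg) for every entry libarchive reports."""
    import libarchive  # deferred import; debugging sketch only
    headers = []
    with libarchive.file_reader(archive_file) as archive:
        for entry in archive:
            headers.append((str(entry), entry.size, entry.isreg))
    return headers
```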

Autodetection of libarchive version installed by Brew

The version of libarchive included in Mac OS X is obsolete; consequently, many people use Homebrew to install another version, but then they have to set the LIBARCHIVE environment variable, otherwise the obsolete version is still used by the Python module.

It could be a good idea to automatically detect and use by default the libarchive version installed by Brew, by doing something similar to the shell one-liner find "$(brew --cellar libarchive)" -name libarchive.13.dylib | sort | tail -1 and the Python script brew_find_libarchive.
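A Python sketch of that detection, mirroring the shell one-liner (the function names are hypothetical, and like the one-liner it picks the lexicographically last version directory):

```python
import glob
import os
import subprocess

def newest_dylib(cellar):
    """Pick the lexicographically last libarchive.13.dylib under a cellar dir."""
    candidates = sorted(glob.glob(
        os.path.join(cellar, '*', 'lib', 'libarchive.13.dylib')))
    return candidates[-1] if candidates else None

def find_brew_libarchive():
    # Returns None when brew is unavailable or libarchive is not installed,
    # so callers can fall back to the normal lookup.
    try:
        cellar = subprocess.check_output(
            ['brew', '--cellar', 'libarchive']).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        return None
    return newest_dylib(cellar)
```

The result could feed the same code path the LIBARCHIVE environment variable uses today.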

Provide actual error details

Most ffi functions do not have an error check.
As a result, it is hard to investigate errors. There should be an error check for every function, and appropriate exceptions raised as needed, using the actual message provided by libarchive.

To understand why read:
libarchive/libarchive#459 (comment)
and libarchive/libarchive#538 (comment)

I should not have needed to actually write C code to find a bug as part of #9. If an exception had been raised with the error message from libarchive, I would have found the solution to the empty-pathname problem in a snap.

add_file_from_memory() fails on NUL character

import libarchive

with libarchive.file_writer('nul.tar', 'pax') as ar:
	content = 'ascii\0NUL'
	ar.add_file_from_memory('nul.bin', len(content), content)
Traceback (most recent call last):
  File "embedded_nul.py", line 5, in <module>
    ar.add_file_from_memory('nul.bin', len(content), content)
  File "/usr/lib/python3.6/site-packages/libarchive/write.py", line 105, in add_file_from_memory
    write_data(archive_pointer, chunk, len(chunk))
ctypes.ArgumentError: argument 2: <class 'ValueError'>: embedded null character

I can't treat the content as binary (the proper solution) either, because of #68.
