python-idzip's People

Contributors

bauman, codito, davidnemeskey, mobiusklein, mozbugbox

python-idzip's Issues

TypeError: `output` must be a file-like object supporting write, tell, flush, and close!

python-idzip version 0.3

Sample code to reproduce the issue

import idzip

with open("/tmp/r.txt", mode="wb") as f:
    zipfile = idzip.IdzipFile(fileobj=f, mode="wb")
    zipfile.write(b"\x00ed")
    zipfile.close()

Stacktrace

Traceback (most recent call last):
  File "/tmp/trial.py", line 4, in <module>
    zipfile = idzip.IdzipFile(fileobj=f, mode="wb")
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/api.py", line 39, in __init__
    self._impl = self._make_writer(fileobj, sync_size=sync_size, mtime=mtime)
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/api.py", line 53, in _make_writer
    return IdzipWriter(filespec, sync_size=sync_size, mtime=mtime)
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/compressor.py", line 266, in __init__
    "`output` must be a file-like object supporting "
TypeError: `output` must be a file-like object supporting write, tell, flush, and close!
Exception ignored in: <bound method IOStreamWrapperMixin.__del__ of <idzip.compressor.IdzipWriter object at 0x7f6039c2d4a8>>
Traceback (most recent call last):
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/_stream.py", line 22, in __del__
    if not self.closed:
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/_stream.py", line 7, in closed
    return self.stream.closed
  File "/home/arun/src/gh/stardict/.venv/lib/python3.6/site-packages/idzip/compressor.py", line 293, in stream
    return self.output
AttributeError: 'IdzipWriter' object has no attribute 'output'
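
The second traceback suggests that `__del__` ran on an instance whose `__init__` raised before `output` was assigned. Below is a minimal sketch of that pattern and one way to make the cleanup path tolerate it. The `Writer` class is hypothetical and is not idzip's actual code; it only mirrors the shape visible in the traceback.

```python
import io


class Writer:
    """Hypothetical writer that validates its argument before storing it."""

    def __init__(self, output):
        if not hasattr(output, "write"):
            # Raising here leaves the instance without an `output` attribute.
            raise TypeError("`output` must be a file-like object supporting write")
        self.output = output

    @property
    def closed(self):
        # Tolerate a half-constructed instance: `output` may never have been set.
        output = getattr(self, "output", None)
        return output is None or output.closed

    def __del__(self):
        # Runs even when __init__ raised, so it must not assume `output` exists.
        if not self.closed:
            self.output.close()


writer = Writer(io.BytesIO())  # valid file-like object: construction succeeds
```

With the `getattr` guard, a failed construction raises only the original `TypeError` and no follow-up `AttributeError` from the finalizer.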

_select_member optimization

Profiling random access to a large number of positions in a large file shows that more time is spent linearly searching for _Member objects than performing actual computation on them. I propose optimizing IdzipReader._select_member to detect when the requested position falls within the set of already-parsed members and to use a binary search to select the correct member in O(log n) time rather than O(n) time.

Is it at any point reasonable to just load all _Members in a single pass?
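
A minimal sketch of the proposed lookup, using the standard-library `bisect` module. The `Member` tuple, the `starts` list, and the `select_member` function are hypothetical stand-ins; idzip's real `_Member` and `IdzipReader._select_member` may be laid out differently.

```python
import bisect
from typing import List, NamedTuple, Optional


class Member(NamedTuple):
    """Hypothetical stand-in for _Member: a span of uncompressed data."""
    start: int   # uncompressed offset where this member begins
    length: int  # number of uncompressed bytes it covers


def select_member(members: List[Member], starts: List[int], pos: int) -> Optional[Member]:
    """Return the parsed member containing pos, or None if pos is not covered.

    `starts` is the sorted list of member.start values, kept in step with
    `members`, so the search runs in O(log n).
    """
    # Index of the last member whose start is <= pos.
    idx = bisect.bisect_right(starts, pos) - 1
    if idx < 0:
        return None
    member = members[idx]
    if pos < member.start + member.length:
        return member
    # pos lies beyond the parsed members; a reader would parse further here.
    return None


members = [Member(0, 100), Member(100, 100), Member(200, 50)]
starts = [m.start for m in members]
hit = select_member(members, starts, 150)
```

Returning `None` for positions past the last parsed member leaves room for the existing linear parse to continue from there, so the binary search only replaces the lookup among members that are already known.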

IdzipWriter does not properly handle write sizes that are not an integral fraction of MAX_MEMBER_SIZE

I noticed while compressing a large file with a sloppily written loop that if I did not use a chunk size of MAX_MEMBER_SIZE or an integral fraction thereof, the content of the file may be corrupted.

import idzip
src = '...'
dest = '...'
with open(src, 'rb') as infh, idzip.open(dest, 'wb') as outfh:
    # chunk_size = idzip.MAX_MEMBER_SIZE
    chunk_size = 2 ** 28
    chunk = infh.read(chunk_size)
    while chunk:
        outfh.write(chunk)
        chunk = infh.read(chunk_size)

I think the issue is in how IdzipWriter.write tries to handle large buffers gracefully, but I need to investigate further.
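
One way a writer could accept buffers of any size is to split each incoming buffer at member boundaries before compressing it. The sketch below illustrates that idea only; `split_at_member_boundaries`, `used`, and `max_member_size` are hypothetical names, and this is not a description of how `IdzipWriter.write` is implemented or of the actual cause of the corruption.

```python
from typing import Iterator, Tuple


def split_at_member_boundaries(
    buf: bytes, used: int, max_member_size: int
) -> Iterator[Tuple[bytes, bool]]:
    """Yield (piece, member_full) pairs for a buffer of any size.

    `used` is how many bytes the current member already holds. Each piece
    fits in the space left in a member; member_full tells the caller to
    finish that member and start a new one before writing the next piece.
    """
    if not 0 <= used < max_member_size:
        raise ValueError("used must satisfy 0 <= used < max_member_size")
    view = memoryview(buf)
    offset = 0
    while offset < len(view):
        room = max_member_size - used
        piece = view[offset:offset + room]
        offset += len(piece)
        used += len(piece)
        full = used == max_member_size
        yield bytes(piece), full
        if full:
            used = 0  # the next piece goes into a fresh member


pieces = list(split_at_member_boundaries(b"abcdefghij", used=2, max_member_size=4))
```

Concatenating the yielded pieces always reproduces the input, which gives a simple invariant to check in a regression test for chunk sizes that do not divide the member size evenly.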
