

miurahr commented on August 22, 2024

SevenZipFile.extract() calls the Worker class to extract the targets. The core logic inside py7zr is as follows:

  1. Walk through the file list, detect directories and symbolic links, and then register each target output file path with the worker class.

for f in self.files:
    self.worker.register_filelike(f.id, outfilename)

  2. Making directories

From py7zr/py7zr/py7zr.py, lines 765 to 767 at commit 98ac851:

for target_dir in sorted(target_dirs):
    try:
        target_dir.mkdir()
        # ... (excerpt ends at line 767; the rest of the try statement follows)

  3. Calling the worker extractor

self.worker.extract(self.fp, parallel=(not self.password_protected))

  4. Making symbolic links

From py7zr/py7zr/py7zr.py, lines 778 to 785 at commit 98ac851:

# create symbolic links on target path as a working directory.
# if path is None, work on current working directory.
for t in target_sym:
    sym_dst = t.resolve()
    with sym_dst.open('rb') as b:
        sym_src = b.read().decode(encoding='utf-8')  # symlink target name stored in utf-8
    sym_dst.unlink()  # unlink after close().
    sym_dst.symlink_to(pathlib.Path(sym_src))
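Step 4 relies on a two-phase approach: extraction first leaves each symbolic link as a regular file whose content is the UTF-8 link target, and only afterwards is that file replaced with a real link. Below is a minimal self-contained sketch of the second phase using only pathlib and a temporary directory. The helper name `materialize_symlink` is hypothetical, and creating symlinks may need extra privileges on Windows.

```python
import pathlib
import tempfile


def materialize_symlink(placeholder: pathlib.Path) -> None:
    """Replace a placeholder file that holds a link target with a real symlink."""
    with placeholder.open("rb") as b:
        target = b.read().decode("utf-8")  # link target stored as UTF-8 text
    placeholder.unlink()  # remove the placeholder only after it is closed
    placeholder.symlink_to(pathlib.Path(target))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "real.txt").write_text("hello")
    link = root / "link.txt"
    link.write_bytes(b"real.txt")  # what the extraction phase would leave behind
    materialize_symlink(link)
    print(link.is_symlink(), link.read_text())  # True hello
```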

We should add a callback to the worker extractor in step 3.
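A minimal sketch of how a callback could be threaded through the steps above, with the hook placed in step 3. Everything here is hypothetical: `Entry`, `extract_all`, and the `on_file_done(name, nbytes)` signature only mirror the structure described in this comment, "decompression" is faked by writing bytes already in memory, and step 4 is left out.

```python
import pathlib
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    """Hypothetical archive entry; `data` stands in for the compressed stream."""
    name: str
    is_dir: bool = False
    data: bytes = b""


def extract_all(entries: List[Entry], dest: pathlib.Path,
                on_file_done: Optional[Callable[[str, int], None]] = None) -> None:
    # Step 1: walk the list and register an output path for every file entry.
    targets = {e.name: dest / e.name for e in entries if not e.is_dir}
    # Step 2: create directories first; sorting puts parents before children.
    for d in sorted(dest / e.name for e in entries if e.is_dir):
        d.mkdir(parents=True, exist_ok=True)
    # Step 3: "extract" each file, then report it through the callback.
    for e in entries:
        if e.is_dir:
            continue
        written = targets[e.name].write_bytes(e.data)
        if on_file_done is not None:
            on_file_done(e.name, written)
    # Step 4 (turning placeholders into symbolic links) is omitted here.


with tempfile.TemporaryDirectory() as tmp:
    extract_all(
        [Entry("scripts", is_dir=True), Entry("scripts/a.py", data=b"abc"),
         Entry("setup.py", data=b"x")],
        pathlib.Path(tmp),
        on_file_done=lambda name, n: print(f"{name}: {n} bytes"),
    )
```

Making the callback optional keeps the no-progress path identical to plain extraction.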

Here is the main part of the extractor:

From py7zr/py7zr/compression.py, lines 201 to 207 at commit 98ac851:

for i in range(numfolders):
    r, w = multiprocessing.Pipe(duplex=False)
    p = ctx.Process(target=self.extract_single,
                    args=(filename, folders[i].files,
                          self.src_start + positions[i], self.src_start + positions[i + 1], w))
    p.start()
    extract_processes.append((p, r))

Each folder is extracted concurrently in a separately spawned process, so we need to add some shared memory or a pipe to communicate between the extraction worker processes and the main process.
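A minimal sketch of the pipe approach: each worker sends `(kind, name, nbytes)` messages over its write end, and the main side multiplexes all read ends with `multiprocessing.connection.wait()`. The message format and the helper names are assumptions for illustration. To keep the demo self-contained the workers run as threads. In the code above the worker would instead be the target of `ctx.Process`; the `Connection` calls stay the same, though the parent should then also close its own copy of each write end after starting the child.

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import wait
from typing import Callable, List, Tuple


def extract_worker(files: List[Tuple[str, int]], conn) -> None:
    """Hypothetical worker: 'extracts' each (name, size) and reports it."""
    for name, size in files:
        # ... real decompression of `name` would happen here ...
        conn.send(("progress", name, size))
    conn.send(("done", "", 0))  # sentinel: this worker has finished
    conn.close()


def extract_with_progress(folders: List[List[Tuple[str, int]]],
                          on_progress: Callable[[str, int], None]) -> None:
    readers, workers = [], []
    for files in folders:
        r, w = Pipe(duplex=False)  # r: receive-only end, w: send-only end
        t = threading.Thread(target=extract_worker, args=(files, w))
        t.start()
        workers.append(t)
        readers.append(r)
    # Main side: block until any worker has a message, then handle it.
    while readers:
        for r in wait(readers):
            kind, name, size = r.recv()
            if kind == "done":
                readers.remove(r)
                r.close()
            else:
                on_progress(name, size)
    for t in workers:
        t.join()


extract_with_progress(
    [[("C/7z.h", 5263), ("C/7zAlloc.c", 1548)], [("DOC/7zC.txt", 5522)]],
    on_progress=lambda name, size: print(f"{name}: {size} bytes"),
)
```

An explicit "done" sentinel is used instead of relying on EOF, so the main loop knows exactly when each worker has finished.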

Here is the core part of the worker extractor:

From py7zr/py7zr/compression.py, lines 229 to 235 at commit 98ac851:

fp.seek(src_start)
for f in files:
    fileish = self.target_filepath.get(f.id, None)
    if fileish is not None:
        with fileish.open(mode='wb') as ofp:
            if not f.emptystream:
                self.decompress(fp, f.folder, ofp, f.uncompressed[-1], f.compressed, src_end)

Line 231 here gets the output file path for the target compressed chunk, which was registered in step 1.

We can add code here to report progress to the main process.
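For progress within a single large file, one option is to wrap `ofp` so that every chunk the decompressor writes is counted and reported. This is a sketch under the assumption that `self.decompress()` writes its output through several `write()` calls. `ProgressWriter`, `fake_decompress`, and the `report(file_id, nbytes)` signature are hypothetical and not part of py7zr.

```python
import io
from typing import BinaryIO, Callable


class ProgressWriter:
    """File-like wrapper that reports every chunk written through it."""

    def __init__(self, raw: BinaryIO, file_id: int,
                 report: Callable[[int, int], None]):
        self.raw = raw
        self.file_id = file_id
        self.report = report
        self.written = 0

    def write(self, chunk: bytes) -> int:
        n = self.raw.write(chunk)
        self.written += n
        self.report(self.file_id, n)  # incremental byte count for this file
        return n


def fake_decompress(src: BinaryIO, ofp, chunk_size: int = 4) -> None:
    # Stand-in for self.decompress(): copies src to ofp in small chunks.
    while chunk := src.read(chunk_size):
        ofp.write(chunk)


events = []
out = io.BytesIO()
ofp = ProgressWriter(out, file_id=7, report=lambda fid, n: events.append((fid, n)))
fake_decompress(io.BytesIO(b"0123456789"), ofp)
print(ofp.written, events)  # 10 [(7, 4), (7, 4), (7, 2)]
```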

from py7zr.

MiyamuraMiyako commented on August 22, 2024

Yes, I think reporting just IsDirectory, FileName, and PercentDone is enough.


miurahr commented on August 22, 2024

py7zr extracts files concurrently using threading, so a progress report may end up as a list with one entry per thread, such as
[(IsDirectory, FileName, DoneItems, SubTotalItems), (IsDirectory, FileName, DoneItems, SubTotalItems), ...]

For example, the test data file 'tests/data/mblock_1.7z' has 3 blocks; running '7z l mblock_1.7z' shows:

Listing archive: ./mblock_1.7z

--
Path = ./mblock_1.7z
Type = 7z
Physical Size = 631690
Headers Size = 2305
Method = LZMA2:1536k BCJ
Solid = +
Blocks = 3

So py7zr will run with 4 threads: the first extracts the directories, and the second extracts the following files:

2015-11-18 18:04:24 .R..A         5263        94014  C/7z.h
2015-11-09 18:41:08 .R..A         1548               C/7zAlloc.c
2015-03-26 01:07:58 .R..A          403               C/7zAlloc.h
...

the third, running concurrently with the second, extracts these files:

2014-12-03 23:35:44 ....A         5522        25204  DOC/7zC.txt
2010-09-16 21:57:16 ....A         7573               DOC/7zFormat.txt
2016-09-27 19:42:33 ....A         3031               DOC/Methods.txt
...

and the last one concurrently extracts:

2016-10-05 00:13:34 ....A        35328       510167  bin/7zS2.sfx
2016-10-05 00:13:32 ....A        35328               bin/7zS2con.sfx
2016-10-05 00:12:31 ....A       113152               bin/7zSD.sfx
...
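With per-thread tuples like the ones described above, a caller could still derive a single overall figure by summing the counters. This is a minimal sketch that hypothetically assumes DoneItems and SubTotalItems are per-thread counts that can simply be added together. The snapshot values are illustrative only and are not real output for mblock_1.7z.

```python
from typing import List, Tuple

# (IsDirectory, FileName, DoneItems, SubTotalItems), one tuple per worker thread
Report = Tuple[bool, str, int, int]


def overall_progress(reports: List[Report]) -> float:
    """Combine per-thread counters into a single fraction in [0, 1]."""
    done = sum(r[2] for r in reports)
    total = sum(r[3] for r in reports)
    return done / total if total else 1.0


snapshot = [
    (False, "C/7z.h", 3, 10),
    (False, "DOC/7zC.txt", 5, 10),
    (False, "bin/7zS2.sfx", 0, 20),
]
print(f"{overall_progress(snapshot):.0%}")  # 20%
```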


MiyamuraMiyako commented on August 22, 2024

Having all extraction threads submit progress concurrently may cause issues. I think we need to add a synchronization lock, but that may reduce extraction efficiency.
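To make the locking trade-off concrete, here is a minimal sketch of a lock-protected progress counter that several extraction threads could share. `ProgressCounter` is a hypothetical helper, not part of py7zr. The lock is held only for an integer update and never during decompression, so in this design its cost would typically be small next to the decompression work itself.

```python
import threading


class ProgressCounter:
    """Thread-safe running total of extracted bytes (hypothetical helper)."""

    def __init__(self, total: int):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def add(self, nbytes: int) -> float:
        # Hold the lock only for the counter update, never during decompression.
        with self._lock:
            self.done += nbytes
            return self.done / self.total if self.total else 1.0


def worker(counter: ProgressCounter, chunks: int) -> None:
    for _ in range(chunks):
        # ... decompress one chunk here (outside the lock) ...
        counter.add(1)


counter = ProgressCounter(total=4 * 1000)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{counter.add(0):.0%}")  # 100%
```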


github-actions commented on August 22, 2024

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.


miurahr commented on August 22, 2024

@MiyamuraMiyako #130 now implements a primitive progress callback mechanism.
Could you try it? You can see how to use it in the test_extract_callback test case at test_basic.py:446.


miurahr commented on August 22, 2024

The current API does not provide a percentage, but it does provide the file name and the number of extracted bytes.
Users can get the list of file names, their properties, and file sizes from the list() method, and from those calculate a percentage and show whether each entry is a directory.
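Here is a sketch of that calculation. The listing is modelled with a hypothetical `Entry` named tuple rather than py7zr's own return type, and `on_extracted` stands in for whatever callback receives the file name and the extracted byte count. Directories contribute nothing to the total; they are simply labelled as directories.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    filename: str
    is_directory: bool
    size: int  # uncompressed size in bytes


class PercentTracker:
    def __init__(self, listing: List[Entry]):
        self.entries: Dict[str, Entry] = {e.filename: e for e in listing}
        self.total = sum(e.size for e in listing if not e.is_directory)
        self.done = 0

    def on_extracted(self, filename: str, nbytes: int) -> str:
        entry = self.entries[filename]
        self.done += nbytes
        pct = self.done / self.total if self.total else 1.0
        kind = "dir " if entry.is_directory else "file"
        return f"{kind} {filename} ({pct:.0%})"


tracker = PercentTracker([
    Entry("scripts", True, 0),
    Entry("setup.cfg", False, 300),
    Entry("setup.py", False, 700),
])
print(tracker.on_extracted("scripts", 0))      # dir  scripts (0%)
print(tracker.on_extracted("setup.cfg", 300))  # file setup.cfg (30%)
print(tracker.on_extracted("setup.py", 700))   # file setup.py (100%)
```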


MiyamuraMiyako commented on August 22, 2024

I tested this version, and it does what I need.


miurahr commented on August 22, 2024

#130 also adds a --verbose CLI option, which produces output such as:

$ python -m py7zr x --verbose tests/data/test_1.7z 
- scripts                                                                  (0%)
- scripts/py7zr                                                           (15%)
- setup.cfg                                                               (23%)
- setup.py                                                               (100%)

You can find sample code in py7zr/cli.py:


import shutil
import sys

import py7zr
import py7zr.callbacks


class CliExtractCallback(py7zr.callbacks.ExtractCallback):
    """Print each extracted path, then the cumulative percentage on the same line."""

    def __init__(self, total_bytes, ofd=sys.stdout):
        self.ofd = ofd
        self.archive_total = total_bytes  # uncompressed size of the whole archive
        self.total_bytes = 0  # bytes written so far
        self.columns, _ = shutil.get_terminal_size(fallback=(80, 24))
        self.pwidth = 0  # characters already printed on the current line

    def report_start(self, processing_file_path, processing_bytes):
        self.ofd.write('- {}'.format(processing_file_path))
        self.pwidth += len(processing_file_path) + 2  # +2 for the '- ' prefix

    def report_end(self, processing_file_path, wrote_bytes):
        self.total_bytes += int(wrote_bytes)
        plest = self.columns - self.pwidth  # space left on the current line
        progress = self.total_bytes / self.archive_total
        msg = '({:.0%})\n'.format(progress)
        # Right-align the percentage if it fits, otherwise just append it.
        if plest - len(msg) > 0:
            self.ofd.write(msg.rjust(plest))
        else:
            self.ofd.write(msg)
        self.pwidth = 0


if __name__ == "__main__":
    with py7zr.SevenZipFile('target.7z', 'r') as a:
        archive_info = a.archiveinfo()
        cb = CliExtractCallback(total_bytes=archive_info.uncompressed)
        a.extractall(callback=cb)


miurahr commented on August 22, 2024

v0.7.0b3 has been released with the callback feature.


gmankab commented on August 22, 2024

Hello,

please help me add a file to an archive while printing the callback progress in percent.


miurahr commented on August 22, 2024

@gmankab please open another ticket to request the callback feature for the archive/write functions.

