Comments (16)

danpovey avatar danpovey commented on August 24, 2024 1

Likely a system problem, like full disk, OOM killer, something like that.
If you rerun that command it should hopefully continue from the stage that failed.
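One quick way to rule out the full-disk possibility is to check the free space on the filesystem that holds the download directory. The following is a minimal sketch using only the standard library; the function name `has_free_space` and the 60 GiB threshold are illustrative assumptions, not part of icefall or Lhotse.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    return usage.free >= required_bytes


if __name__ == "__main__":
    # Hypothetical threshold; adjust it to the amount of data you expect to download.
    required = 60 * 1024**3
    print(has_free_space(".", required))
```

This only covers the disk; an OOM kill would typically have to be looked for in the system logs instead.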

from icefall.

csukuangfj avatar csukuangfj commented on August 24, 2024

What's the output of the following command?

head -n 5 /home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse

If the first line of the output is #!python, you would probably need to manually modify it to #!/usr/bin/env python3
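The manual fix described above can also be scripted. This is a minimal sketch, assuming the only problem is a bare `#!python` first line; the helper name `fix_shebang` is hypothetical and is not something Lhotse provides.

```python
from pathlib import Path

BROKEN = "#!python"
FIXED = "#!/usr/bin/env python3"


def fix_shebang(script: Path) -> bool:
    """Rewrite a bare '#!python' first line; return True if the file was changed."""
    lines = script.read_text().splitlines(keepends=True)
    if not lines or lines[0].strip() != BROKEN:
        return False  # nothing to fix
    # Keep whatever line ending the first line originally had.
    ending = lines[0][len(lines[0].rstrip("\r\n")):]
    lines[0] = FIXED + ending
    script.write_text("".join(lines))
    return True


# Example call, using the path from this thread:
# fix_shebang(Path("/home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse"))
```

The function leaves the file untouched when the first line is anything other than `#!python`, so running it twice is harmless.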

shanguanma avatar shanguanma commented on August 24, 2024

Thanks for your reply. You are right:

(base) maduo@pd:~$ head -n 5 /home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse
#!python
"""
Use this script like:

$ lhotse --help

Yes, I manually modified it.

csukuangfj avatar csukuangfj commented on August 24, 2024

@pzelasko
Looks like the shebang problem is reproducible.

shanguanma avatar shanguanma commented on August 24, 2024

I ran into another error with the command lhotse download librispeech. The error is as follows:

(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage -0 --stop_stage 3
2021-08-23 17:42:44 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-23 17:42:44 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|                                                                                                                                                       | 0/7 [00:00<?, ?it/s]
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: dev-other
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: test-clean
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: test-other
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:57<00:23, 11.54s/it]

Aborted!

csukuangfj avatar csukuangfj commented on August 24, 2024

"If you rerun that command it should hopefully continue from the stage that failed."

Yes, rerunning

./prepare.sh --stage -0 --stop_stage 3

will continue downloading train-clean-360.

shanguanma avatar shanguanma commented on August 24, 2024

Thanks for your reply. I followed your suggestion, but it still fails. The output is as follows:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        3.6T  1.4T  2.1T  41% /mnt/4T
(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ rm -rf /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360
(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage -0 --stop_stage 3
2021-08-23 18:21:09 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-23 18:21:09 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|                                                                                                                                                       | 0/7 [00:00<?, ?it/s]
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: dev-other
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: test-clean
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: test-other
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:54<00:21, 10.89s/it]

Aborted!

danpovey avatar danpovey commented on August 24, 2024

@pzelasko do you have any ideas?
This is really a Lhotse issue.
Perhaps we need a better way to either continue partial downloads, or better debug why downloads are killed.

pzelasko avatar pzelasko commented on August 24, 2024

OK, I will look into it. I am not sure what the reason is; maybe there is a timeout somewhere. Will check.

I'll try to take care of the shebang issue first though.

pzelasko avatar pzelasko commented on August 24, 2024

Was there some exception stack trace?

It looks to me like it didn't even start downloading train-clean-360, otherwise there would have been a partial download message like this:

image

@shanguanma could you try wrapping this whole loop:

https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/librispeech.py#L55

with try/except like the following:

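try:
    for ...  # the existing loop in librispeech.py, unchanged
except Exception as e:
    # Print the exception type, its message, and the local variables, then re-raise.
    print(type(e))
    print(e)
    print(locals())
    raise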

and see what comes out? I am also not 100% sure that it is a Lhotse error, your job might be getting killed etc., but we can try to debug it.
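For reference, the same debugging pattern can be tried out in isolation. This is a self-contained sketch only: `process_part` is a hypothetical stand-in for the real per-split work in Lhotse, and the simulated `EOFError` is made up so that the wrapper has something to catch.

```python
def process_part(part: str) -> None:
    """Hypothetical stand-in for the per-split work done inside the real loop."""
    if part == "train-clean-360":
        raise EOFError("simulated failure")


def run_with_diagnostics(parts):
    try:
        for part in parts:
            process_part(part)
    except Exception as e:
        # Report the exception type, its message, and the local variables at
        # the point of failure, then re-raise so the original error propagates.
        print(type(e))
        print(e)
        print(locals())
        raise


if __name__ == "__main__":
    run_with_diagnostics(["dev-clean", "train-clean-360"])
```

Note that `except Exception` does not catch `KeyboardInterrupt`, and nothing in Python can report a process that is killed from outside (for example by the OOM killer), so no output from this wrapper would itself be a hint.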

shanguanma avatar shanguanma commented on August 24, 2024

@pzelasko, I followed your suggestion. The error is as follows:

(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage 0 --stop_stage 0
2021-08-24 19:20:54 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-24 19:20:54 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|                                                                                                                                                       | 0/7 [00:00<?, ?it/s]
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: dev-other
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: test-clean
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: test-other
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:57<00:23, 11.59s/it]
<class 'EOFError'>
Compressed file ended before the end-of-stream marker was reached
{'target_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download'), 'dataset_parts': ('dev-clean', 'dev-other', 'test-clean', 'test-other', 'train-clean-100', 'train-clean-360', 'train-other-500'), 'force_download': False, 'alignments': False, 'base_url': 'http://www.openslr.org/resources', 'alignments_url': 'https://drive.google.com/uc?id=1WYfgr31T-PPwMcxuAq09XZfHQO5Mw8fE', 'part': 'train-clean-360', 'url': 'http://www.openslr.org/resources/12', 'part_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360'), 'completed_detector': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360/.completed'), 'tar_name': 'train-clean-360.tar.gz', 'tar_path': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/train-clean-360.tar.gz'), 'tar': <tarfile.TarFile object at 0x7f5fb3d76f70>, 'e': EOFError('Compressed file ended before the end-of-stream marker was reached')}

Aborted!

pzelasko avatar pzelasko commented on August 24, 2024

You have a partially downloaded archive. If you remove train-clean-360.tar.gz and re-run, it should work. I will change the code to handle this correctly.
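A truncated `.tar.gz` can be detected before extraction by reading the gzip stream to its end. The sketch below uses only the standard library; the helper names `is_complete_gzip` and `remove_if_truncated` are hypothetical, and this is not how Lhotse itself handles the case.

```python
import gzip
from pathlib import Path


def is_complete_gzip(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip stream can be decompressed all the way to its end."""
    try:
        with gzip.open(path, "rb") as f:
            # Reading to the end raises EOFError if the file was cut short.
            while f.read(chunk_size):
                pass
        return True
    except (EOFError, OSError):
        return False


def remove_if_truncated(path: Path) -> bool:
    """Delete an incomplete archive; return True if a file was removed."""
    if path.exists() and not is_complete_gzip(path):
        path.unlink()
        return True
    return False
```

The check decompresses the whole file, so for an archive of tens of gigabytes it takes a while; it is a one-off sanity check rather than something to run on every invocation.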

shanguanma avatar shanguanma commented on August 24, 2024

@pzelasko, I followed your suggestion (I removed train-clean-360.tar.gz and re-ran it). The error is as follows:

Downloading LibriSpeech parts:   0%|                                                                                                                                                       | 0/7 [00:00<?, ?it/s]
2021-08-24 19:31:09,098 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-24 19:31:09,098 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-24 19:31:09,098 INFO [librispeech.py:56] Processing split: dev-other
2021-08-24 19:31:09,098 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: test-clean
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: test-other
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading train-clean-360.tar.gz: 21.5GB [1:32:18, 4.16MB/s]
Downloading LibriSpeech parts:  86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 6/7 [1:37:25<16:14, 974.31s/it]
2021-08-24 21:08:34,933 INFO [librispeech.py:56] Processing split: train-other-500
Downloading train-other-500.tar.gz:  80%|███████████████████████████████████████████████████████████████████████████████████████████████████████▊                         | 22.9G/28.5G [1:50:48<26:51, 3.70MB/s]
Downloading LibriSpeech parts:  86%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                   | 6/7 [3:28:14<34:42, 2082.37s/it]
<class 'urllib.error.ContentTooShortError'>
<urlopen error retrieval incomplete: got only 24626188545 out of 30593501606 bytes>
{'target_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download'), 'dataset_parts': ('dev-clean', 'dev-other', 'test-clean', 'test-other', 'train-clean-100', 'train-clean-360', 'train-other-500'), 'force_download': False, 'alignments': False, 'base_url': 'http://www.openslr.org/resources', 'alignments_url': 'https://drive.google.com/uc?id=1WYfgr31T-PPwMcxuAq09XZfHQO5Mw8fE', 'part': 'train-other-500', 'url': 'http://www.openslr.org/resources/12', 'part_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-other-500'), 'completed_detector': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-other-500/.completed'), 'tar_name': 'train-other-500.tar.gz', 'tar_path': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/train-other-500.tar.gz'), 'tar': <tarfile.TarFile object at 0x7f7bdff7edf0>, 'e': ContentTooShortError('retrieval incomplete: got only 24626188545 out of 30593501606 bytes')}
Traceback (most recent call last):
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse", line 24, in <module>
    cli()
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/bin/modes/recipes/librispeech.py", line 32, in librispeech
    download_librispeech(target_dir, dataset_parts='librispeech' if full else 'mini_librispeech')
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/recipes/librispeech.py", line 75, in download_librispeech
    urlretrieve_progress(f'{url}/{tar_name}', filename=tar_path, desc=f'Downloading {tar_name}')
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/utils.py", line 335, in urlretrieve_progress
    return urlretrieve(url=url, filename=filename, reporthook=reporthook, data=data)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/urllib/request.py", line 286, in urlretrieve
    raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 24626188545 out of 30593501606 bytes>
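The byte counts in the error message can be checked against the progress bar with a little arithmetic. The numbers below are copied from the `ContentTooShortError` message; the conversion assumes the progress bar reports sizes in units of 1024.

```python
got = 24_626_188_545       # bytes received, from the error message
expected = 30_593_501_606  # bytes expected, from the error message

missing = expected - got
fraction = got / expected

print(f"missing bytes: {missing}")                                          # 5967313061
print(f"received: {fraction:.1%}")                                          # 80.5%
print(f"received: {got / 2**30:.1f} GiB of {expected / 2**30:.1f} GiB")     # 22.9 GiB of 28.5 GiB
```

The result is consistent with the 80% and 22.9G/28.5G shown for train-other-500.tar.gz, so the connection was cut roughly four fifths of the way through the file.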

pkufool avatar pkufool commented on August 24, 2024

The download error happened to me too. My download of train-clean-360.tar.gz was killed twice at the same position (71%).
image

And this is the error for train-other-500.tar.gz.
image

I guess the problem is a weak network connection: downloading this data takes a long time, and the connection may have been unstable at some point during it.

danpovey avatar danpovey commented on August 24, 2024

Perhaps it would be better to use another tool for downloading, one that allows continuing? E.g. wget?
Maybe urlretrieve is only good for small files.
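Continuing a partial download is also possible from Python with a standard HTTP range request. The following is a rough sketch under stated assumptions: the function names are hypothetical, it assumes the server supports the `Range` header, and it is not what Lhotse does.

```python
import urllib.request
from pathlib import Path


def build_resume_request(url: str, dest: Path) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    offset = dest.stat().st_size if dest.exists() else 0
    if offset > 0:
        # "bytes=N-" means: send everything from byte N to the end of the file.
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url: str, dest: Path, chunk_size: int = 1 << 20) -> None:
    """Download `url` to `dest`, appending to a partial file when possible."""
    request = build_resume_request(url, dest)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the Range header;
        # a plain 200 means it is sending the whole file, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the file on disk is already complete, a server will usually answer such a request with 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`; a fuller version would need to handle that case and retry on dropped connections.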

pzelasko avatar pzelasko commented on August 24, 2024

I think it’s a server side timeout… for now please use wget for these two files like Dan suggested, I might not have enough time to improve the downloading in Lhotse right away, but I definitely want it to “just work” and will work on it.
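As a sketch of the wget workaround, the two archives could be fetched as follows. The base URL and file names are taken from the logs in this thread; the helper names are hypothetical, and it is an assumption (suggested by the earlier `EOFError` on the partial file) that Lhotse reuses an archive already present in the download directory.

```python
import subprocess
from pathlib import Path

BASE_URL = "http://www.openslr.org/resources/12"  # as printed in the logs in this thread
ARCHIVES = ["train-clean-360.tar.gz", "train-other-500.tar.gz"]


def wget_command(archive: str, download_dir: Path) -> list:
    """Build a wget command line for one archive."""
    # -c continues a partial download; -P sets the directory to save into.
    return ["wget", "-c", "-P", str(download_dir), f"{BASE_URL}/{archive}"]


def download_all(download_dir: Path) -> None:
    """Run wget for each archive; raises CalledProcessError if wget fails."""
    for archive in ARCHIVES:
        subprocess.run(wget_command(archive, download_dir), check=True)


# Example call, using the download directory from this thread:
# download_all(Path("/mnt/4T/md/icefall_recipes/librispeech/download"))
```

Because of `-c`, re-running the same command after an interruption picks up where the previous attempt stopped instead of starting from zero.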
