auto_avsr's People

Contributors

maai001, mpc001, orena1

auto_avsr's Issues

Re-implementation error

Does the problem previously raised as a question (#20) affect performance?

I'm re-training with the newly updated code.

As an additional question: I'm training on 4 A100 GPUs, so I'm wondering whether it is reasonable to train with 8 times fewer GPUs than the 32 A100s you used.

Also, we have trained your code many times without modifying it, but we do not get the 96.6% reported for [vsr_trlrs3_23h_base.pth]; we only get 99.4%, and I need some advice.

Running Demo gets ModuleNotFoundError: No module named 'six'

I've tried to run the demo with a video for VSR by executing the following command:

(TT) PS D:\auto_avsr> python demo.py data.modality='audio' pretrained_model_path='.\asr_trlrs3vox2_base.pth' file_path='.\avsr_english_1.mp4'

but I got the following error:

Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    from lightning import ModelModule
  File "D:\auto_avsr\lightning.py", line 7, in <module>
    from espnet.nets.batch_beam_search import BatchBeamSearch
  File "D:\auto_avsr\espnet\nets\batch_beam_search.py", line 8, in <module>
    from espnet.nets.beam_search import BeamSearch, Hypothesis
  File "D:\auto_avsr\espnet\nets\beam_search.py", line 9, in <module>
    from espnet.nets.e2e_asr_common import end_detect
  File "D:\auto_avsr\espnet\nets\e2e_asr_common.py", line 16, in <module>
    import six
ModuleNotFoundError: No module named 'six'

I already followed the environment setup step by step and installed the C++ requirements, but I still get the same error.

Conda environment
Python 3.8.18
Windows 11 with PowerShell
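
For what it's worth, the missing module is the standalone six package, which espnet imports (see the last frame of the traceback) but which evidently was not pulled into this environment. Installing it directly should clear this particular error, though further missing dependencies may surface afterwards:

(TT) PS D:\auto_avsr> pip install six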

`cut_or_pad` function is wrong

The current version is:

def cut_or_pad(data, size, dim=0):
    if data.size(dim) < size:
        padding = size - data.size(dim)
        data = torch.nn.functional.pad(data, (0, padding), "constant")
    elif data.size(dim) > size:
        data = data[:size]
    assert data.size(dim) == size
    return data

The right version should be:

def cut_or_pad(data, size, dim=0):
    if data.size(dim) < size:
        padding = size - data.size(dim)
        data = torch.nn.functional.pad(data, (0, 0, 0, padding), "constant")           # modified
        size = data.size(dim)                                                          # added
    elif data.size(dim) > size:
        data = data[:size]
    assert data.size(dim) == size
    return data
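
The difference matters because torch.nn.functional.pad consumes the pad tuple from the last dimension backwards. The audio tensor here has shape T×1 (see the "A potential bug" issue below), so (0, padding) grows the channel dimension and yields T×(1+padding), while (0, 0, 0, padding) grows the time dimension and yields (T+padding)×1. A minimal check, purely illustrative:

import torch

data = torch.zeros(5, 1)                                        # stand-in for a (T, 1) audio tensor
bad = torch.nn.functional.pad(data, (0, 3), "constant")         # pads the last dim -> (5, 4)
good = torch.nn.functional.pad(data, (0, 0, 0, 3), "constant")  # pads dim 0 -> (8, 1)
print(bad.shape, good.shape)                                    # torch.Size([5, 4]) torch.Size([8, 1])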

Issue with hydra - Error merging data/dataset=cstm Key 'defaults' not in 'FairseqConfig'

Hello,

I left the below comment under issue #3 but since it's closed I am not sure the comment will be seen.

I am trying to run the training on a custom dataset and also experiencing this issue.
The file cstm.yaml is placed in auto_avsr/conf/data/dataset and looks like this:

defaults:
  - _self_

root: "/content/drive/MyDrive/sepedi/data/preprocess_datasets"
label_dir: "labels"
train_file: "train_labels.csv"
val_file: "val_labels.csv"
test_file: "test_labels.csv"

As suggested above, I tried renaming the conf/config.yaml file. However, when I run:

!python main.py exp_dir=exp \
    exp_name=trainaudio \
    data.modality=audio \
    ckpt_path='content/drive/MyDrive/LRS3_A_WER1.0/model.pth' \
    +data/dataset=cstm \
    trainer.num_nodes=1

I get:

Error merging data/dataset=cstm
Key 'defaults' not in 'FairseqConfig'
        full_key: defaults
        reference_type=Optional[FairseqConfig]
        object_type=FairseqConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

When running:

%env HYDRA_FULL_ERROR=1
!python main.py exp_dir=exp \
    exp_name=trainaudio \
    data.modality=audio \
    ckpt_path='content/drive/MyDrive/LRS3_A_WER1.0/model.pth' \
    +data/dataset=cstm \
    trainer.num_nodes=1

I get:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 720, in _merge_config
    ret = OmegaConf.merge(cfg, loaded_cfg)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/omegaconf.py", line 321, in merge
    target.merge_with(*others[1:])
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 331, in merge_with
    self._format_and_raise(key=None, value=None, cause=e)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 329, in merge_with
    self._merge_with(*others)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 347, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 314, in _map_merge
    dest[key] = src._get_node(key)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'defaults' not in 'FairseqConfig'
        full_key: defaults
        reference_type=Optional[FairseqConfig]
        object_type=FairseqConfig

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/auto_avsr/main.py", line 74, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 100, in run
    cfg = self.compose_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
    cfg = self._merge_defaults_into_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 805, in _merge_defaults_into_config
    hydra_cfg = merge_defaults_list_into_config(hydra_cfg, user_list)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 777, in merge_defaults_list_into_config
    merged_cfg = self._merge_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 724, in _merge_config
    raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging data/dataset=cstm

I am running this on colab due to issues installing fairseq editable locally.

Thank you in advance!
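
A guess at the root cause, not a confirmed diagnosis: the error names FairseqConfig, which suggests that the fairseq package installed in the same environment has registered a structured config called 'config' with Hydra's ConfigStore, so Hydra ends up validating auto_avsr's primary config (and any group added onto it, such as data/dataset=cstm) against fairseq's schema. If that is what is happening, one workaround is to rename the primary config to a name that cannot collide and point the @hydra.main decorator in main.py at it. The sketch below uses a hypothetical file name, conf/autoavsr.yaml:

# Hypothetical workaround: rename conf/config.yaml to conf/autoavsr.yaml and
# update the entry point so Hydra no longer resolves the name 'config'
# against fairseq's registered FairseqConfig schema.
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="autoavsr")
def main(cfg: DictConfig) -> None:
    ...  # training / inference entry point, unchanged

Alternatively, removing fairseq from the training environment (it is what defines FairseqConfig) may avoid the collision entirely, provided nothing in the training path depends on it.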

How to get audio from mp4 using torchaudio

Hi, when I run preprocess_lrs2lrs3.py, I get an error at 'audio_data = aud_dataloader.load_data(data_filename)'.
It seems that sox does not support mp4 files. How can I solve this?

Thank you very much.
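
Not the repository's official fix, but a common workaround sketch: extract the audio track to a wav file with ffmpeg (assumed to be installed on the system) and load that with torchaudio, whose backends handle wav without sox's container limitations. The file names and the 16 kHz mono target below are assumptions, not values taken from the preprocessing script:

import subprocess

import torchaudio


def load_audio_from_mp4(mp4_path, wav_path="tmp_audio.wav", sample_rate=16000):
    # Extract the audio track to a mono 16 kHz wav with ffmpeg,
    # then load the wav with torchaudio as usual.
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp4_path, "-vn", "-ac", "1", "-ar", str(sample_rate), wav_path],
        check=True,
    )
    waveform, sample_rate = torchaudio.load(wav_path)
    return waveform, sample_rate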

Hydra Conflict Problem

Hi
Thank you for sharing the code.

When installing the additional packages in step 3.4, I got this error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
arviz 0.15.1 requires setuptools>=60.0.0, but you have setuptools 59.5.0 which is incompatible.
cvxpy 1.3.2 requires setuptools>65.5.1, but you have setuptools 59.5.0 which is incompatible.
fairseq 0.12.2 requires hydra-core<1.1,>=1.0.7, but you have hydra-core 1.3.0 which is incompatible.
fairseq 0.12.2 requires omegaconf<2.1, but you have omegaconf 2.3.0 which is incompatible.
Successfully installed GitPython-3.1.32 antlr4-python3-runtime-4.9.3 av-10.0.0 docker-pycreds-0.4.0 gitdb-4.0.10 hydra-core-1.3.0 lightning-utilities-0.9.0 omegaconf-2.3.0 pathtools-0.1.2 pyDeprecate-0.3.1 pytorch-lightning-1.5.10 sentencepiece-0.1.99 sentry-sdk-1.29.2 setproctitle-1.3.2 setuptools-59.5.0 smmap-5.0.0 torchmetrics-1.0.3 wandb-0.15.8

Can you help me with this?

A potential bug

Hi

I used part of your code in my work and found a potential bug (I have not run your original code, though). Could you give it a check? Specifically, this line pads the audio data if its length is smaller than 640 times the corresponding video length. Since the variable data has shape T×1, the torch.nn.functional.pad call on this line produces an output of shape T×(1+padding). That seems incorrect to me: the padded result should be (T+padding)×1, so the line may need to be changed to something like torch.nn.functional.pad(data, (0, 0, 0, padding), "constant"). I may be wrong, as I have not run your original code, but could you check it anyway?

Thanks!

Running demo.py in Colab results in AttributeError

Hi!

Thanks a lot for the model.
I'm trying to run the AVSR model in Colab using demo.py. I'm using asr_trlrwlrs2lrs3vox2avsp_base.pth and have specified the modality as 'audiovisual'. I'm getting this error:

Error executing job with overrides: ['data.modality=[audiovisual]', 'pretrained_model_path=[/content/asr_trlrwlrs2lrs3vox2avsp_base.pth]', 'file_path=[/content/de0fe3b3380fcc9575a8193b43226e51.mp4]']
Traceback (most recent call last):
  File "/content/auto_avsr/demo.py", line 77, in main
    pipeline = InferencePipeline(cfg)
  File "/content/auto_avsr/demo.py", line 30, in __init__
    self.modelmodule = ModelModule(cfg)
  File "/content/auto_avsr/lightning.py", line 29, in __init__
    self.model = E2E(len(self.token_list), self.backbone_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ModelModule' object has no attribute 'backbone_args'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I also tried specifying ['audio', 'video'], but that doesn't seem right.
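
One thing that stands out in the override list is that every value is wrapped in square brackets (data.modality=[audiovisual] and so on), which Hydra parses as lists rather than plain strings. If ModelModule selects its backbone_args based on the modality string, a list value would never match any modality and backbone_args would never be set, which is consistent with the AttributeError above. This is a guess rather than a confirmed diagnosis, but it may be worth retrying with plain values:

python demo.py data.modality=audiovisual \
    pretrained_model_path=/content/asr_trlrwlrs2lrs3vox2avsp_base.pth \
    file_path=/content/de0fe3b3380fcc9575a8193b43226e51.mp4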

AVSpeech Dataset

I noticed that there isn't a location to download the videos of the AVSpeech dataset. I understand that you likely took the time to download, trim, and label the videos. Would it be possible to share the video dataset itself, similar to how the other datasets are available, or at least to share the code you used to extract the AVSpeech dataset so that we can reproduce your results?

Can I know the input time dimension?

Hello, thank you for sharing a great model.

I leave an issue with a question.

  1. What is the time dimension of the model's input? (The paper gives the frame rate as 25 fps, but not the time dimension.)

  2. What happens if I feed in a video longer than the input time dimension?

Thank you.

Number of GPUs for training

Hi,

Thanks for releasing the training code for Auto-AVSR. I was curious about the number of GPUs used for training the model with different amounts of data, e.g. 23/438/3448 hours of LRS3.

Thanks

How to use AUTO-AVSR to train a Chinese AVSR model

How can I use Auto-AVSR to train a Chinese AVSR model, e.g. using the CMLR dataset? The Visual_Speech_Recognition_for_Multiple_Languages project also came to my attention. What is the relationship between that project and the Auto-AVSR project?

Unicode Decode Error when running the LRS2 data preparation

Thank you for providing the training code for the Auto AVSR.

I am facing an issue when trying to run the preprocess_lrs2lrs3.py file on the LRS2 dataset. I am seeing the error below:

Traceback (most recent call last):
  File "preprocess_lrs2lrs3.py", line 77, in <module>
    text_transform = TextTransform()
  File "A:\Projects\auto_avsr\preparation\transforms.py", line 152, in __init__
    units = open(dict_path).read().splitlines()
  File "C:\Users\Girish\anaconda3\envs\autoavsr\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4416: character maps to <undefined>

Any help to resolve this would be greatly appreciated!
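
A hedged suggestion rather than a confirmed fix: on Windows, open() defaults to the cp1252 codec, so reading a UTF-8 unit dictionary can fail exactly like this. Assuming the dictionary file is UTF-8 encoded, passing the encoding explicitly at the line the traceback points to (preparation/transforms.py, line 152) should avoid the error:

units = open(dict_path, encoding="utf-8").read().splitlines()  # explicit encoding instead of the cp1252 default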

VSR Model Training Issues

We are training the VSR model as-is, without any modifications.

The command we use for training is as follows.

python train.py exp_dir=[exp_dir] \
    exp_name=[exp_name] \
    data.modality="video" \
    data.dataset.root_dir=[root_dir] \
    data.dataset.train_file="lrs3_train_transcript_lengths_seg24s.csv" \
    data.dataset.val_file="lrs3_test_transcript_lengths_seg24s.csv" \
    trainer.num_nodes="1" \
    trainer.gpus="5" \
    data.max_frames="1800" \
    optimizer.lr="0.0002"

However, even after running training several times, the values of decoder_acc_step and decoder_acc_val stop changing once training passes epoch 30.

This means that the loss value does not drop.

Is there anything else important to set up when training in particular?

Thank you for your response in advance.

Asking for how to build the corpus for LRS2

Thank you for sharing your excellent work. I want to train the model on the LRS2 dataset, and I wonder whether the corpus built for LRS3 is also applicable to LRS2. If not, can you provide a recipe to build a new corpus?
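
In case it helps while waiting for an official recipe: the repository installs sentencepiece as part of step 3.4 above and reads a subword unit dictionary in preparation/transforms.py, so a generic way to build a comparable unit list from LRS2 transcripts might look like the sketch below. The file names, vocabulary size, unigram model type, and output prefix are illustrative assumptions, not the repository's actual settings:

import sentencepiece as spm

# Train a unigram subword model on a plain-text file containing one LRS2
# transcript per line, producing lrs2_unigram.model / lrs2_unigram.vocab.
spm.SentencePieceTrainer.train(
    input="lrs2_transcripts.txt",   # hypothetical path to the collected transcripts
    model_prefix="lrs2_unigram",    # hypothetical output prefix
    vocab_size=5000,                # assumed size, not the repo's official value
    model_type="unigram",
    character_coverage=1.0,
)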

Something went wrong with hydra and omegaconf

When I run

python main.py exp_dir=exp \
               exp_name=train_24_scratch \
               data.modality=vsr \
               optimizer.lr=3e-4 \

for training, Hydra raises the following error:

Error merging 'config' with schema
Key 'exp_name' not in 'FairseqConfig'
        full_key: exp_name
        reference_type=Optional[Dict[Union[str, Enum], Any]]
        object_type=FairseqConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So I run

HYDRA_FULL_ERROR=1 python main.py exp_dir=exp \
               exp_name=train_24_scratch \
               data.modality=vsr \
               optimizer.lr=3e-4 \

and it shows:

Traceback (most recent call last):
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 618, in _load_config_impl
    merged = OmegaConf.merge(schema.config, ret.config)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 321, in merge
    target.merge_with(*others[1:])
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 331, in merge_with
    self._format_and_raise(key=None, value=None, cause=e)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 329, in merge_with
    self._merge_with(*others)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 347, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 314, in _map_merge
    dest[key] = src._get_node(key)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'exp_name' not in 'FairseqConfig'
        full_key: exp_name
        reference_type=Optional[Dict[Union[str, Enum], Any]]
        object_type=FairseqConfig
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 74, in <module>
    main()
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 100, in run
    cfg = self.compose_config(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 224, in _load_configuration
    job_cfg, job_cfg_load_trace = self._load_primary_config(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 819, in _load_primary_config
    ret, load_trace = self._load_config_impl(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 628, in _load_config_impl
    raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging 'config' with schema

Any help to resolve this would be greatly appreciated!

How to train an auto-avsr model from scratch through curriculum learning

Thank you for sharing the code.

I am interested in training a visual-only model from scratch on the LRS2 dataset, using curriculum learning.
I want to know the optimal learning rate and the number of epochs for training the model using a subset of LRS2 that includes only short utterances lasting no more than 4 seconds (100 frames).
Could you provide details on how you trained the visual-only model available in the model zoo using only the LRS3 dataset (438 hours)?

Number of gpus / total batch size to reproduce the results in the paper

Hi,

Thanks for sharing your code. If my understanding of the code is correct, the effective batch size depends on the number of GPUs used for training. If I want to get a good result (i.e. reproduce the results in the paper), how many GPUs do I need? How many GPUs were used to obtain the results in the paper?

Thank you!

cannot import name 'eval_env' from 'torchaudio._internal.module_utils'

When I run:
python train.py exp_dir=D:/pycharmProject/auto_avsr-main/auto_avsr-main/checkpoints exp_name=exp1 data.modality=video data.dataset.root_dir=D:/BaiduNetdiskDownload/pre

I got this error:
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from datamodule.data_module import DataModule
  File "D:\pycharmProject\auto_avsr-main\auto_avsr-main\datamodule\data_module.py", line 6, in <module>
    from .av_dataset import AVDataset
  File "D:\pycharmProject\auto_avsr-main\auto_avsr-main\datamodule\av_dataset.py", line 4, in <module>
    import torchaudio
  File "D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\_extension\__init__.py", line 5, in <module>
    from torchaudio._internal.module_utils import eval_env, fail_with_message, is_module_available, no_op
ImportError: cannot import name 'eval_env' from 'torchaudio._internal.module_utils' (D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\_internal\module_utils.py)
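
This kind of ImportError usually means the installed torch and torchaudio builds do not come from matching releases, or that a torchaudio upgrade was only partially applied (torchaudio's _extension module expects a helper that the installed torchaudio._internal.module_utils does not provide). A quick, hedged first check is to confirm which versions are actually installed, without importing the broken package:

from importlib.metadata import version

# Report the installed torch / torchaudio distributions; the two should come
# from the same release pairing (e.g. as pinned in the repo's requirements).
print("torch:", version("torch"))
print("torchaudio:", version("torchaudio"))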

The Audiovisual pretrained model.

Thanks for sharing your work.
And thanks for releasing the pretrained VSR and ASR models.
Is it possible to release the pretrained AVSR models?
