speechbrain / speechbrain
A PyTorch-based Speech Toolkit
Home Page: http://speechbrain.github.io
License: Apache License 2.0
I noticed that the GPU memory consumption gradually increases to a maximum, drops back to normal, and then increases again (3 GB -> 12 GB -> 3 GB on my machine). This is probably due to data_loader caching, so it might not be a problem.
Hey!
The file save_opt isn't created by the current TIMIT preparer. Is this intentional? If so, we have to remove it from the skip check; otherwise the preparation will never be skipped.
While discussing with @jjery2243542, we likely traced a training-convergence problem of GRU models to bad weight initialization. Indeed, PyTorch RNNs (LSTM, GRU, RNN) are initialized without respecting the Glorot / He criterion (uniform with 1/n). LiGRU does not need this thanks to ReLU + BatchNorm (which alleviate saturation), but standard tanh-based RNNs might saturate as the number of neurons increases (@jjery2243542 might be experiencing this on LibriSpeech).
We could add an init function that applies Glorot initialization to the input-to-hidden weights and orthogonal initialization to the hidden-to-hidden weights (and sets all biases to zero), as sketched below.
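A minimal sketch of such an init function, assuming the parameter naming conventions of torch.nn.GRU/LSTM (weight_ih*, weight_hh*, bias*):

import torch

def rnn_init(module):
    # Glorot (Xavier) on input-to-hidden, orthogonal on hidden-to-hidden,
    # zeros for all biases; relies on PyTorch's RNN parameter names.
    for name, param in module.named_parameters():
        if "weight_ih" in name:
            torch.nn.init.xavier_uniform_(param)
        elif "weight_hh" in name:
            torch.nn.init.orthogonal_(param)
        elif "bias" in name:
            torch.nn.init.zeros_(param)

rnn_init(torch.nn.GRU(40, 512, num_layers=2))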
Augmentations can be made deterministic, so they should have unit tests.
Currently, the RNN modules have no such mechanism.
More than one person has pointed out the need for a file like res.res to better monitor training. This file should report the following information for each epoch:
epoch01 tr_loss valid_loss valid_err learning_rate
I think this can be easily implemented within the Brain class; see the sketch below.
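A minimal sketch of such a per-epoch logger (the fields follow the format above; the function name is illustrative):

def log_epoch(path, epoch, tr_loss, valid_loss, valid_err, lr):
    # Append one line per epoch in the format proposed above.
    with open(path, "a") as f:
        f.write(
            f"epoch{epoch:02d} tr_loss={tr_loss:.4f} "
            f"valid_loss={valid_loss:.4f} valid_err={valid_err:.4f} "
            f"lr={lr:.2e}\n"
        )

log_epoch("res.res", 1, 1.234, 1.456, 0.231, 1e-3)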
As the title says. It looks like there is an unnecessary transpose. It also reshapes back to a 4D tensor when the input is 4D, which seems unnecessary as well; the reshape could be changed.
Maybe I am missing something, but right now it seems that we have to manually add every new loss function to the ComputeCost init with an if statement. This way it could grow excessively large and become cumbersome to maintain.
Maybe it could be turned into a wrapper: for most loss functions, we basically only need the zero-padding masking functionality. Otherwise, one would have to keep adding BCE, BCEWithLogits, dice_loss, etc.
I think this module is a bit non-modular and not very transparent to use. Maybe we could define the loss in the recipe instead? A sketch of the wrapper idea follows.
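A minimal sketch of the wrapper, assuming an elementwise loss created with reduction="none" (all names here are illustrative):

import torch

def masked_loss(loss_fn, predictions, targets, lengths):
    # Build a [batch, time] mask from the true lengths, apply the wrapped
    # elementwise loss, and average over the unpadded positions only.
    max_len = targets.shape[1]
    mask = (torch.arange(max_len, device=lengths.device)[None, :]
            < lengths[:, None]).float()
    loss = loss_fn(predictions, targets)
    return torch.sum(loss * mask) / torch.sum(mask)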
Perhaps a better name could be used for this file, to indicate that it configures the logger rather than doing the actual logging.
List of things to convert (instructions from proposal document):
- Move xxx.cfg files to the corresponding directory in recipes.
- Move the [global] section to a yaml file (e.g. params.yaml), renamed to constants:.
- Move the [functions] section to the yaml file, changing = to : and splitting functions: into saveables: and functions: (replicate parts).
- Move all model code to a model.py file. Define a new subclass of torch.nn.Module that takes all key model parameters (e.g. number of layers, etc.) and uses these parameters to build the model.
- In experiment.py, instantiate an Experiment object and pass it in where execute_computations would be called instead.
There are some things that are currently not possible to override, because the command-line overrides are loaded using yaml.safe_load() rather than load_extended_yaml(). For example:
activation: !torch.nn.LeakyReLU
It may be nice to override this with a different activation. This can be overridden in Python just fine, but not from the command line, e.g.
python experiment.py params.yaml --yaml_overrides="{activation: !torch.nn.ReLU}"
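The limitation can be seen directly with PyYAML, since safe_load has no constructor for custom tags such as !torch.nn.ReLU:

import yaml

try:
    yaml.safe_load("activation: !torch.nn.ReLU")
except yaml.YAMLError as e:
    # safe_load rejects the unknown tag, so such overrides never parse.
    print("safe_load failed:", e)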
A first very basic test would make sure they don't crash.
Another small issue: when running pytest, a new exp folder appears. The exp folder contains speech samples from the data_augmentation part.
I think it is important to also report a running-average loss or the current mini-batch loss on the progress bar. For instance, I'm dealing with 1000+ hours of training data, and one epoch takes more than 4 hours. I definitely don't want to wait 4 hours to find out that the loss went to NaN :P What do you think?
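A minimal sketch of the idea using tqdm's set_postfix (the loader and training step are placeholders):

from tqdm import tqdm

train_loader = range(100)  # placeholder for the real DataLoader

def train_step(batch):
    return 0.5  # placeholder for the real training step

avg_loss = 0.0
pbar = tqdm(train_loader)
for i, batch in enumerate(pbar):
    loss = train_step(batch)
    # Running average of the loss seen so far, shown on the progress bar.
    avg_loss = (avg_loss * i + loss) / (i + 1)
    pbar.set_postfix(loss=f"{avg_loss:.4f}")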
I noticed that the file lobes/augment/spec_augment.yaml still contains the features. Is that done on purpose? In our recent pull request, @pplantinga separated the data augmentation from the feature extraction...
In /lobes/models/cnn_block.yaml we use the following hack to specify the number of channels in the CNN layers:
out_channels: !ref <block_index> * <channels>
This is not very general, and people might want a more flexible way to specify the number of channels (think of hyperparameter tuning, for instance). I'm wondering if it is possible to be more explicit and allow the new replication function to digest parameter lists like this:
n_channels = [128, 256]
This way, users can select different parameters for each replica. @pplantinga, @Gastron any thoughts on that?
Use native GitHub support for CI
Currently, if an experiment has been run, the yaml linter chokes on the generated yaml files. This can be fixed by adding # yamllint disable at the top of these files.
This will also help with making sure SpeechBrain can be used via pip install, i.e. in toolkit fashion rather than framework fashion.
Necessary steps to make this happen:
- log_config.yaml is not expected to be in a certain place
Every time I run the same experiment (e.g., minimal_examples/neural_networks/autoencoder) on the same machine, I get slightly different results. Since we set the seed, this shouldn't happen.
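One possible cause (an assumption on my side, not confirmed here) is cuDNN's nondeterministic kernels, which seeding alone does not disable. A minimal sketch of a fuller setup:

import random
import numpy as np
import torch

def make_deterministic(seed):
    # Seed every RNG in play and force cuDNN into deterministic mode.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False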
When I run the minimal examples from the experiment folder, it now fails for ASR_CTC, ASR_DNN_HMM, and spk_ID (it only works for the autoencoder one). It complains about:
File "/home/mirco/speechbrain_github/speechbrain/speechbrain/yaml.py", line 258, in deref
raise ValueError('The reference "%s" is not valid' % ref)
ValueError: The reference "<output_folder>" is not valid
!ref <output_folder>
Hey,
What is the motivation for having a .py file for data preparation both in each recipe's directory and in the data_prepare.py lib?
SpeechBrain should have a version number defined
Checklist:
Steps listed in proposal:
>>> assert func(tensor([1.])) == 7.
__init__ changes:
def init_params(self, first_input):
Use raise statements, and pick a built-in error that seems appropriate (ValueError is common). These statements will automatically be logged.
Make sure logging is imported, and at the end of the imports, add the following line to define the logger for the module:
logger = logging.getLogger(__name__)
Then log messages with logger.<level>(message).
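Putting these conventions together, a minimal sketch of a module that follows them (the function is illustrative):

import logging

logger = logging.getLogger(__name__)

def check_positive(x):
    # Errors are raised with a built-in exception and logged automatically.
    if x <= 0:
        raise ValueError(f"Expected a positive value, got {x}")
    logger.debug("check_positive passed with x=%s", x)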
As we discussed during the call, I think the current system is not very transparent in how it manages the additional labels that might occur in CTC (blank) or attention-based models (e.g., EOS). I agree with Aku that this is connected to the dataloader and the creation of the label dictionary. We can thus revise this part as well if we want to revise the dataloader part.
When the doctest for pooling is run, it fails with the following message:
__________________ [doctest] speechbrain.nnet.pooling.Pooling __________________
035 ceil_mode : int
036 When True, will use ceil instead of floor to compute the output shape.
037
038 Example
039 -------
040 >>> pool = Pooling('max',3)
041 >>> inputs = torch.rand(10, 50, 40)
042 >>> pool.init_params(inputs)
043 >>> output=pool(inputs)
044 >>> output.shape
Expected:
torch.Size([10, 50, 38])
Got:
torch.Size([10, 50, 13])
Currently it will only print 0.00. Maybe we should print it in scientific notation.
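For illustration, with a small value such as a typical learning rate (assuming that is what is being printed here):

lr = 1e-4
print(f"{lr:.2f}")  # prints 0.00, the value is lost
print(f"{lr:.2e}")  # prints 1.00e-04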
The README still describes the old CFG file architecture and should be updated.
I think the first step would be to remove out-of-date info and add the latest development guidelines.
TIMIT preparation has already been moved to recipes/TIMIT, so the same needs to happen for LibriSpeech and VoxCeleb.
Note that in order to use the data preparation script inside experiment.py, the path to the preparation script has to be added. Example from TIMIT:
# This hack needed to import data preparation script from ..
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.dirname(current_dir))
from timit_prepare import TIMITPreparer # noqa E402
1- Check if the whole project is now jitable.
2- Make the small changes to features and architecture.py needed to make everything jitable.
I will focus on this once the performance issue is fixed.
The current config file format can be a little difficult to understand. A more flexible and readable option might be a new format using YAML for hyperparameters and Python for scripts.
Training time for each epoch is an important thing that we might want to save. As far as I can see, in the current version this information is lost (but it could be really useful if users want to compare the training time of different models). We could add a field for training time (e.g., tr_time=207 sec) to the training logger.
In all the current minimal examples, we check the final PER on the test set rather than on the training set. To be a meaningful test, it should be done on the training loss (the test loss can be arbitrarily high, but the training loss should be very small because the network can "memorize" the data).
While this may be too difficult to implement in a transparent way, it would be really cool to have a training/test loop abstraction, similar to Keras' fit and predict methods. One way this could be done is to add train_step, fit, and predict methods to the sequential class (or a similar class such as lobe or brain). These could be overridden by sub-classes, or by just replacing the function using self.function = newfunction.
Some advantages: we can include just one dataset preparation script, as well as a single experiment.py file per task.
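A minimal sketch of what such an abstraction could look like (class and method names are illustrative, not the actual SpeechBrain API):

import torch

class Brain:
    def __init__(self, model, loss_fn):
        self.model = model
        self.loss_fn = loss_fn

    def train_step(self, batch):
        # Overridable by sub-classes or via self.train_step = new_fn.
        inputs, targets = batch
        return self.loss_fn(self.model(inputs), targets)

    def fit(self, train_loader, optimizer, epochs=1):
        for _ in range(epochs):
            for batch in train_loader:
                loss = self.train_step(batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    def predict(self, inputs):
        with torch.no_grad():
            return self.model(inputs)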
We could integrate support for TensorBoard so that it's easy to review training progress.
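PyTorch already ships a TensorBoard writer, so a minimal sketch could be as simple as:

from torch.utils.tensorboard import SummaryWriter

# Log one scalar per step; `tensorboard --logdir exp/tensorboard` shows it.
writer = SummaryWriter(log_dir="exp/tensorboard")
for step, loss in enumerate([1.2, 0.9, 0.7]):
    writer.add_scalar("train/loss", loss, global_step=step)
writer.close()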
Change README to reflect new configuration format.
It could be nice to ensure that poor quality YAML does not get added by adding a yaml linter to the CI. Some software we could consider using: https://github.com/adrienverge/yamllint
The current implementation iterates through each sentence to find its actual length. Using a mask to do this would be more elegant and efficient. Example:
mask = length_to_mask(lengths, max_len=target.shape[1])
loss = cost(prob, lab)  # elementwise loss, computed without reduction
loss = torch.sum(loss * mask) / torch.sum(mask)  # average over unpadded steps
The padding mode of "reflect" is used regardless of the value of the padding_mode argument for Conv (which is especially surprising given the default is "zeros").
Hi Peter,
Is there any specific reason for changing the directory structure from recipes/<task>/<dataset>/ to recipes/<dataset>/<task>/?
Usually it looks good (sorted) when you see the different tasks as soon as you enter recipes/. If the reason is one unique data_prep.py file, then one can import it from some standard location. Also, data prep for one dataset may vary depending on the task.
I did some first experiments, and apparently the issue could be connected to jit. In fact, the significant slowdown happens when using our custom RNN called LiGRU (the only module we compile):

model       | pytorch 1.5  | pytorch 1.4
ligru-jit   | 2 min 51 sec | 1 min 52 sec
ligru-nojit | 3 min 32 sec | 3 min 45 sec

In practice, for some reason, jit is much more effective in PyTorch 1.4 than in PyTorch 1.5 (at least on our LiGRU). I tried simplifying the model (e.g., removing batch norm, dropout, bidirectionality, etc.) and the issue still appears.
On line 127:
self.kernel_size = tuple(self.kernel_size,)
This will not make it a tuple if self.kernel_size is an int: tuple() expects an iterable, so an int raises a TypeError. The intent was probably (self.kernel_size,).
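A minimal sketch of a fix (the helper name is illustrative):

def as_tuple(kernel_size):
    # Wrap a bare int in a tuple; pass iterables through unchanged.
    if isinstance(kernel_size, int):
        return (kernel_size,)
    return tuple(kernel_size)

assert as_tuple(3) == (3,)
assert as_tuple([3, 5]) == (3, 5)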
It could be nice to ensure the documentation of argument lists matches the actual arguments.
One option is the HDF5 data format.
Another is the super-fast data loading solution from one of our collaborators (I've forgotten which one).
Currently, it is hard to know whether overrides work correctly, because they fail silently when the override does not match anything in the yaml file. We could raise an error in this case, and potentially also add a flag that turns off this error or converts it to a warning.
Some things that could happen:
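A minimal sketch of the error-plus-flag idea (all names here are hypothetical):

import warnings

def apply_overrides(params, overrides, strict=True):
    # Complain about unknown override keys instead of failing silently.
    for key, value in overrides.items():
        if key not in params:
            msg = f"Override '{key}' does not match any key in the yaml file"
            if strict:
                raise KeyError(msg)
            warnings.warn(msg)
            continue
        params[key] = value
    return params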