
speechbrain's Introduction

SpeechBrain Logo


| 📘 Tutorials | 🌐 Website | 📚 Documentation | 🤝 Contributing | 🤗 HuggingFace | ▶️ YouTube | 🐦 X |

Please help our community project. Star SpeechBrain on GitHub!

Exciting News (January 2024): Discover what is new in SpeechBrain 1.0 here!

🗣️💬 What SpeechBrain Offers

  • SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models.

  • It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.

๐ŸŒ Vision

  • With the rise of deep learning, once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.

  • We think it is now time for a holistic toolkit that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.

  • This spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond.

📚 Training Recipes

  • We share over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks (see below).

  • We support both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.

  • For any task, you train the model with a single command:

python train.py hparams/train.yaml
  • The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script (see the sketch at the end of this list).

  • We maintain a consistent code structure across different tasks.

  • For better replicability, training logs and checkpoints are hosted on Dropbox.
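
As a taste of this design, here is a minimal sketch (the hyperparameter names below are made up for illustration) of how a script can load such a YAML file with hyperpyyaml, the extended YAML loader SpeechBrain relies on:

```python
from hyperpyyaml import load_hyperpyyaml

yaml_string = """
seed: 1234
output_folder: !ref results/<seed>  # !ref re-uses other values
lr: 0.001
model: !new:torch.nn.Linear  # !new: instantiates an object
    in_features: 40
    out_features: 10
"""
hparams = load_hyperpyyaml(yaml_string)
print(hparams["output_folder"])  # results/1234
print(hparams["model"])          # an instantiated torch.nn.Linear
```

In a real recipe, train.py reads hparams/train.yaml in the same way and passes the resulting objects to the training loop.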

Pretrained Models and Inference

  • Access over 100 pretrained models hosted on HuggingFace.
  • Each model comes with a user-friendly interface for seamless inference. For example, transcribing speech using a pretrained model requires just three lines of code:
from speechbrain.inference import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-conformer-transformerlm-librispeech", savedir="pretrained_models/asr-transformer-transformerlm-librispeech")
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")
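
Other tasks follow the same pattern. For instance, speaker verification with a pretrained ECAPA-TDNN looks roughly like the sketch below (based on SpeechBrain's inference interfaces; check the model card on HuggingFace for the exact usage):

```python
from speechbrain.inference import SpeakerRecognition

verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
# Returns a similarity score and a same-speaker decision.
score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")
```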

Documentation

  • We are deeply dedicated to promoting inclusivity and education.
  • We have authored over 30 tutorials on Google Colab that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.
  • Every class or function has clear explanations and examples that you can run. Check out the documentation for more details 📚.

🎯 Use Cases

  • 🚀 Research Acceleration: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.

  • ⚡️ Rapid Prototyping: Ideal for quick prototyping in time-sensitive projects.

  • 🎓 Educational Tool: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like Mila, Concordia University, Avignon University, and many others for student training.

🚀 Quick Start

To get started with SpeechBrain, follow these simple steps:

🛠️ Installation

Install via PyPI

  1. Install SpeechBrain using PyPI:

    pip install speechbrain
  2. Access SpeechBrain in your Python code:

    import speechbrain as sb

Install from GitHub

This installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.

  1. Clone the GitHub repository and install the requirements:

    git clone https://github.com/speechbrain/speechbrain.git
    cd speechbrain
    pip install -r requirements.txt
    pip install --editable .
  2. Access SpeechBrain in your Python code:

    import speechbrain as sb

Any modifications made to the speechbrain package will be automatically reflected, thanks to the --editable flag.

✔️ Test Installation

Ensure your installation is correct by running the following commands:

pytest tests
pytest --doctest-modules speechbrain

๐Ÿƒโ€โ™‚๏ธ Running an Experiment

In SpeechBrain, you can train a model for any task with the following commands:

cd recipes/<dataset>/<task>/
python train.py hparams/train.yaml

The results will be saved in the output_folder specified in the YAML file.

📘 Learning SpeechBrain

  • Website: Explore general information on the official website.

  • Tutorials: Start with basic tutorials covering fundamental functionalities. Find advanced tutorials and topics in the Tutorials menu on the SpeechBrain website.

  • Documentation: Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the documentation.

🔧 Supported Technologies

  • SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.
  • It excels not only in individual task implementations but also in combining various technologies into complex pipelines.

🎙️ Speech/Audio Processing

| Tasks | Datasets | Technologies/Models |
| --- | --- | --- |
| Speech Recognition | AISHELL-1, CommonVoice, DVoice, KsponSpeech, LibriSpeech, MEDIA, RescueSpeech, Switchboard, TIMIT, Tedlium2, Voicebank | CTC, Transducers, Transformers, Seq2Seq, Beamsearch techniques (for CTC, seq2seq, transducers), Rescoring, Conformer, Branchformer, Hyperconformer, Kaldi2-FST |
| Speaker Recognition | VoxCeleb | ECAPA-TDNN, ResNet, Xvectors, PLDA, Score Normalization |
| Speech Separation | WSJ0Mix, LibriMix, WHAM!, WHAMR!, Aishell1Mix, BinauralWSJ0Mix | SepFormer, RESepFormer, SkiM, DualPath RNN, ConvTasNet |
| Speech Enhancement | DNS, Voicebank | SepFormer, MetricGAN, MetricGAN-U, SEGAN, spectral masking, time masking |
| Text-to-Speech | LJSpeech, LibriTTS | Tacotron2, Zero-Shot Multi-Speaker Tacotron2, FastSpeech2 |
| Vocoding | LJSpeech, LibriTTS | HiFiGAN, DiffWave |
| Spoken Language Understanding | MEDIA, SLURP, Fluent Speech Commands, Timers-and-Such | Direct SLU, Decoupled SLU, Multistage SLU |
| Speech-to-Speech Translation | CVSS | Discrete Hubert, HiFiGAN, wav2vec2 |
| Speech Translation | Fisher CallHome (Spanish), IWSLT22 (low-resource) | wav2vec2 |
| Emotion Classification | IEMOCAP, ZaionEmotionDataset | ECAPA-TDNN, wav2vec2, Emotion Diarization |
| Language Identification | VoxLingua107, CommonLanguage | ECAPA-TDNN |
| Voice Activity Detection | LibriParty | CRDNN |
| Sound Classification | ESC50, UrbanSound | CNN14, ECAPA-TDNN |
| Self-Supervised Learning | CommonVoice, LibriSpeech | wav2vec2 |
| Interpretability | ESC50 | Learning-to-Interpret (L2I), Non-Negative Matrix Factorization (NMF), PIQ |
| Speech Generation | AudioMNIST | Diffusion, Latent Diffusion |
| Metric Learning | REAL-M, Voicebank | Blind SNR Estimation, PESQ Learning |
| Alignment | TIMIT | CTC, Viterbi, Forward-Forward |
| Diarization | AMI | ECAPA-TDNN, X-vectors, Spectral Clustering |

๐Ÿ“ Text Processing

| Tasks | Datasets | Technologies/Models |
| --- | --- | --- |
| Language Modeling | CommonVoice, LibriSpeech | n-grams, RNNLM, TransformerLM |
| Response Generation | MultiWOZ | GPT2, Llama2 |
| Grapheme-to-Phoneme | LibriSpeech | RNN, Transformer, Curriculum Learning, Homograph loss |

๐Ÿ” Additional Features

SpeechBrain includes a range of native functionalities that enhance the development of Conversational AI technologies. Here are some examples:

  • Training Orchestration: The Brain class serves as a fully customizable tool for managing training and evaluation loops over data. It simplifies training loops while providing the flexibility to override any part of the process (a minimal sketch follows this list).

  • Hyperparameter Management: A YAML-based hyperparameter file specifies all hyperparameters, from individual numbers (e.g., learning rate) to complete objects (e.g., custom models). This elegant solution drastically simplifies the training script.

  • Dynamic Dataloader: Enables flexible and efficient data reading.

  • GPU Training: Supports single and multi-GPU training, including distributed training.

  • Dynamic Batching: On-the-fly dynamic batching enhances the efficient processing of variable-length signals.

  • Mixed-Precision Training: Accelerates training through mixed-precision techniques.

  • Efficient Data Reading: Reads large datasets efficiently from a shared Network File System (NFS) via WebDataset.

  • Hugging Face Integration: Interfaces seamlessly with HuggingFace for popular models such as wav2vec2 and Hubert.

  • Orion Integration: Interfaces with Orion for hyperparameter tuning.

  • Speech Augmentation Techniques: Includes SpecAugment, Noise, Reverberation, and more.

  • Data Preparation Scripts: Includes scripts for preparing data for supported datasets.
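
As a taste of the Brain class, here is a minimal training loop (the toy model and data are made up for illustration; the Brain, compute_forward, compute_objectives, and fit APIs are SpeechBrain's own):

```python
import torch
import speechbrain as sb

class SimpleBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Map the input features to predictions.
        return self.modules.model(batch[0])

    def compute_objectives(self, predictions, batch, stage):
        # Mean squared error against the targets.
        return torch.nn.functional.mse_loss(predictions, batch[1])

model = torch.nn.Linear(in_features=10, out_features=10)
brain = SimpleBrain(
    modules={"model": model},
    opt_class=lambda params: torch.optim.SGD(params, lr=0.1),
)
data = [(torch.rand(8, 10), torch.rand(8, 10))]  # one toy (input, target) batch
brain.fit(epoch_counter=range(5), train_set=data)
```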

SpeechBrain is rapidly evolving, with ongoing efforts to support a growing array of technologies in the future.

📊 Performance

  • SpeechBrain integrates a variety of technologies, including those that achieve competitive or state-of-the-art performance.

  • For a comprehensive overview of the achieved performance across different tasks, datasets, and technologies, please visit here.

📜 License

  • SpeechBrain is released under the Apache License, version 2.0, a popular BSD-like license.
  • You are free to redistribute SpeechBrain for both free and commercial purposes, with the condition of retaining license headers. Unlike the GPL, the Apache License is not viral, meaning you are not obligated to release modifications to the source code.

🔮 Future Plans

We have ambitious plans for the future, with a focus on the following priorities:

  • Scale Up: Our aim is to provide comprehensive recipes and technologies for training massive models on extensive datasets.

  • Scale Down: While scaling up delivers unprecedented performance, we recognize the challenges of deploying large models in production scenarios. We are focusing on real-time, streamable, and small-footprint Conversational AI.

๐Ÿค Contributing

  • SpeechBrain is a community-driven project, led by a core team with the support of numerous international collaborators.
  • We welcome contributions and ideas from the community. For more information, check here.

๐Ÿ™ Sponsors

  • SpeechBrain is an academically driven project and relies on the passion and enthusiasm of its contributors.
  • As we cannot rely on the resources of a large company, we deeply appreciate any form of support, including donations or collaboration with the core team.
  • If you're interested in sponsoring SpeechBrain, please reach out to us at [email protected].
  • A heartfelt thank you to all our sponsors, including the current ones.


📖 Citing SpeechBrain

If you use SpeechBrain in your research or business, please cite it using the following BibTeX entry:

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}


speechbrain's Issues

Cannot run minimal examples from recipe/minimal_example/*

When I run the minimal examples from the experiment folder, they now fail for ASR_CTC, ASR_DNN_HMM, and spk_ID (only the autoencoder one works). They complain about:

  File "/home/mirco/speechbrain_github/speechbrain/speechbrain/yaml.py", line 258, in deref
    raise ValueError('The reference "%s" is not valid' % ref)
ValueError: The reference "<output_folder>" is not valid

!ref <output_folder>

Update README

The README still describes the old CFG file architecture and should be updated.

I think the first step would be to remove the out-of-date info and add the latest development guidelines.

PyTorch 1.5 much slower than 1.4

I did some first experiments, and apparently the issue could be connected to jit. In fact, the significant slowdown happens when using our custom RNN called LiGRU (the only module we compile):

| model | pytorch 1.5 | pytorch 1.4 |
| --- | --- | --- |
| ligru-jit | 2 min 51 sec | 1 min 52 sec |
| ligru-nojit | 3 min 32 sec | 3 min 45 sec |

In practice, for some reason, jit is much more effective in pytorch 1.4 than in pytorch 1.5 (at least on our LiGRU). I tried simplifying the model (e.g., removing batch norm, dropout, bidirectionality, etc.) and the issue still appears.

Pooling Doctest fails

When the doctest for pooling is run, it fails with the following message:

 __________________ [doctest] speechbrain.nnet.pooling.Pooling __________________
035     ceil_mode : int
036         When True, will use ceil instead of floor to compute the output shape.
037 
038     Example
039     -------
040     >>> pool = Pooling('max',3)
041     >>> inputs = torch.rand(10, 50, 40)
042     >>> pool.init_params(inputs)
043     >>> output=pool(inputs)
044     >>> output.shape
Expected:
    torch.Size([10, 50, 38])
Got:
    torch.Size([10, 50, 13])

Tensorboard support

We could integrate support for tensorboard so that it's easy to review training progress.
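
For reference, a bare-bones version with PyTorch's built-in SummaryWriter could look like the sketch below (where SpeechBrain should hook this in is exactly what this issue is about):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="results/tensorboard")
# Toy loss values standing in for real training statistics.
for epoch, (train_loss, valid_loss) in enumerate([(1.2, 1.4), (0.9, 1.1)]):
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/valid", valid_loss, epoch)
writer.close()  # inspect with: tensorboard --logdir results/tensorboard
```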

Easier access to training results (e.g., res.res)

More than one person pointed out the need for a file like res.res to better monitor training. This file should report the following information for each epoch:

epoch01 tr_loss valid_loss valid_err learning rate

I think that this can be easily implemented within the Brain class.

padding_mode not respected in Conv

The padding mode of "reflect" is used regardless of the value of the padding_mode argument for Conv (which is especially surprising given that the default is "zeros").
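
To illustrate what the argument should control, plain PyTorch padding behaves as follows (a sketch of the two modes, not of the SpeechBrain Conv internals):

```python
import torch
import torch.nn.functional as F

x = torch.arange(5.0).view(1, 1, 5)       # [[[0., 1., 2., 3., 4.]]]
print(F.pad(x, (1, 1), mode="constant"))  # zeros:   0, 0, 1, 2, 3, 4, 0
print(F.pad(x, (1, 1), mode="reflect"))   # reflect: 1, 0, 1, 2, 3, 4, 3
```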

Versioning

SpeechBrain should have a version number defined

More transparent additional symbols

As we discussed during the call, I think the current system is not very transparent when it manages the additional labels that might occur in CTC (blank) or attention-based models (e.g., EOS). I agree with Aku that this is connected to the dataloader and the creation of the label dictionary. We can thus revise this part as well if we want to revise the dataloader part.

Do not lint generated YAML files

Currently, if an experiment has been run, the YAML linter chokes on the generated YAML files:

  • Fix generated params.yaml
  • Fix generated ckpt.yaml

This can be fixed by adding # yamllint disable at the top of these files.

More transparent parameter specification (e.g., n_channels)

In /lobes/models/cnn_block.yaml we use the following hack to specify the number of channels in the cnn layers:
out_channels: !ref <block_index> * <channels>

This is not very general, and people might want a more flexible way to specify the number of channels (think of hyperparameter tuning, for instance). I'm wondering whether it is possible to be more explicit and allow the new replication function to digest parameter lists like this:
n_channels = [128, 256]
This way users could select different parameters for each replica. @pplantinga, @Gastron any thoughts on that?

Making features and neural networks jitable

1. Check whether the whole project is now jitable.
2. Make the small changes to features and architecture.py needed to make everything jitable.
I will focus on that once the performance issue is fixed.
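
A quick way to check whether a single module is jitable (a generic TorchScript sketch, not SpeechBrain-specific code):

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2.0

scripted = torch.jit.script(Toy())  # raises an error if the module is not jitable
print(scripted(torch.randn(3)))
```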

Recipes directory structure

Hi Peter,
Is there any specific reason for changing the directory structure from recipes/<task>/<dataset>/ to recipes/<dataset>/<task>/?

It usually looks good (sorted) when you see the different tasks as soon as you enter recipes/. If the reason is the single shared data_prep.py file, then one could import it from some standard location. Also, data preparation for one dataset may vary depending on the task.

training/test loop abstraction

While this may be too difficult to implement in a transparent way, it would be really cool to have a training/test loop abstraction, similar to Keras' fit and predict methods. One way this could be done is to add train_step, fit, and predict methods to the sequential class (or a similar class such as lobe or brain). These could be overridden by sub-classes or by simply replacing the function using self.function = newfunction.

save_pkl() missing on TIMIT data preparer

Hey!

The file save_opt isn't created by the current TIMIT preparer. Is this intentional? If so, we have to remove it from the skip check, otherwise it won't skip.

README update

Change README to reflect new configuration format.

The minimal example for CTC

  1. Right now the param.yaml uses spk_id as the label, which is weird.
  2. In my opinion, the minimal example could use the training data as the validation and test sets, so that we can tell whether it is overfitting or not.

transparency of CRDNN

I think this module is not very modular or transparent. Maybe we could define it directly in the recipe?

Load YAML overrides using `load_extended_yaml`

There are some things that are currently impossible to override because the command-line overrides are loaded using yaml.safe_load() rather than load_extended_yaml(). For example:

activation: !torch.nn.LeakyReLU

may be nice to override with a different activation. This can be overridden in Python just fine, but not from the command line, e.g.

python experiment.py params.yaml --yaml_overrides="{activation: !torch.nn.ReLU}"

Move LibriSpeech and VoxCeleb preparation to recipes dir

TIMIT preparation has already been moved to recipes/TIMIT, so the same needs to happen for LibriSpeech and VoxCeleb.

Note that in order to use the data preparation script inside the experiment.py, the path to the preparation has to be added. Example from TIMIT:

import os
import sys

# This hack is needed to import the data preparation script from ..
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.dirname(current_dir))
from timit_prepare import TIMITPreparer  # noqa E402

Ensure experiments can be run from anywhere

This will also help with making sure SpeechBrain can be used via pip install, i.e., in toolkit fashion rather than in framework fashion.

Necessary steps to make this happen:

  • make sure log_config.yaml is not expected to be in a certain place
  • add the params file as an argument, so we don't depend on cwd.

Wrong Overfitting Tests

In all the current minimal examples, we check the final PER on the test set rather than on the training set. To be a meaningful test, it should be done on the training loss (the test loss can be arbitrarily high, but the training loss should be very small because we can "memorize" the data within the neural network).

Convert Speechbrain classes to new format

Checklist:

  • Features
  • Augmentation
  • Architectures
  • Losses
  • Optimizers
  • Data io
  • Data processing
  • Utils

Steps listed in the proposal:

  1. Class name change: uppercase the name of the class (CapWords for multi-word names).
  2. Documentation changes:
  • Remove parameters: config (but not sub-parameters), funct_name, global_config, functions, logger, first_input, and move arguments to the init doc.
  • Match the documentation format to "numpy style": https://www.sphinx-doc.org/en/master/usage/extensions/example_numpy.html
  • The docstring should have the following sections: Arguments, Example, Returns or Yields (if it just returns None or the docstring starts with "Returns", this section can be omitted). The docstring should start with a one-line description. An additional section that may be added: Hyperparameters (for lobes, with an include statement so the yaml parameters are visible).
  • Convert the example to a doctest-type example and ensure it is runnable with:
    python -m doctest speechbrain/path/to/file.py
    Doctest checks that the output of the example is the same as what you write, so you may need to write out the output of the example. You can also use e.g. an assertion:
    >>> assert func(tensor([1.])) == 7.
    which can get around tricky output formats from PyTorch, but still shows the behaviour; if the assert fails, doctest complains.
    If you need data or directories, you can use the sample data in the samples directory, or you can make temporary directories with the standard library tempfile module.
  • Run the automatic API documentation and make sure your docstring is parsed correctly. The Args section in particular may easily get interpreted wrong. To test, run:
    pdoc --html --template_dir pdoc_templates speechbrain.<module-you're-working-on>
  3. Parameter changes:
  • Replace the 'config' parameter with actual parameters + defaults.
  • Remove parameters: funct_name, global_config, functions, logger, first_input.
  4. __init__ changes:
  • Remove type checking (i.e. expected_options and expected_inputs).
  • Move code depending on first_input (excluding the shape check) to a method:
    def init_params(self, first_input):
  5. Forward changes:
  • Convert the input list to separate parameters.
  • Add a docstring with Parameters and Returns sections (and NO DESCRIPTION).
  6. Logger changes:
  • Logger calls at the level of "error" or above (this is the default) should be converted to raise statements. Pick a built-in error that seems appropriate (ValueError is common). These statements will automatically be logged.
  • If any logging statements remain in the file (at the level of "warn", "info", or "debug"), converting them involves two steps:
    1. At the top of the file, ensure logging is imported, and at the end of the imports, add the following line to define the logger for the module:
       logger = logging.getLogger(__name__)
    2. Every time logger_write() is called, convert it to:
       logger.<level>(message)
       logger.info() should be used for output to the console (rare);
       logger.debug() should be used for output to the log file (common).

Adding Training Time in training logger

Training time for each epoch is an important thing that we might want to save. As far as I can see, in the current version this information is lost (but it could be really useful if users want to compare the training time of different models). We could add a field called training time (e.g., tr_time=207 sec) to the training logger.

Adding averaged loss or current loss on tzip/tqdm.

I think it is important to also report a running average loss or the current mini-batch loss on the progress bar. For instance, I'm dealing with 1000+ hours of training data, and one epoch takes more than 4 hours. I definitely don't want to wait 4 hours to discover that the loss went to NaN :P What do you think?
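
A rough sketch of what this could look like with tqdm's set_postfix (the loop and loss values below are made up; where this hooks into the Brain class is open):

```python
from tqdm import tqdm

losses = [1.2, 1.1, 0.9, 0.8]  # stand-in for per-minibatch losses
running_avg = 0.0
with tqdm(losses) as pbar:
    for i, loss in enumerate(pbar):
        running_avg = (running_avg * i + loss) / (i + 1)
        pbar.set_postfix(avg_loss=f"{running_avg:.3f}")
```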

Proper CUDNN RNN initialisation

While discussing with @jjery2243542, we probably linked a problem with the training convergence of GRU models to a bad initialisation of the weights. Indeed, PyTorch RNNs (LSTM, GRU, RNN) are initialised without respect to the Glorot / He criterion (uniform with 1/n). LiGRU does not need that thanks to ReLU + BatchNorm (which alleviate saturation), but standard tanh-based RNNs might saturate with an increasing number of neurons (@jjery2243542 might be experiencing this on LibriSpeech).

We could add an init function that applies Glorot init on the input-to-hidden weights and orthogonal init on the hidden-to-hidden weights (+ sets all the biases to zero), as sketched below.
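
A sketch of such an init function for standard PyTorch RNNs (relying on PyTorch's weight_ih_*/weight_hh_* parameter-naming convention):

```python
import torch

def init_rnn(rnn: torch.nn.Module) -> None:
    """Glorot on input-to-hidden, orthogonal on hidden-to-hidden, zero biases."""
    for name, param in rnn.named_parameters():
        if "weight_ih" in name:
            torch.nn.init.xavier_uniform_(param)  # Glorot
        elif "weight_hh" in name:
            torch.nn.init.orthogonal_(param)
        elif "bias" in name:
            torch.nn.init.zeros_(param)

gru = torch.nn.GRU(input_size=40, hidden_size=256, num_layers=2)
init_rnn(gru)
```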

Rename `logging.yaml`

Perhaps a better name could be used for this file to indicate that it configures the logger rather than doing actual logging.

linear.py doesn't allow 3D tensor input

As the title says. It looks like there is an unnecessary transpose. Also, it reshapes back to a 4D tensor if the input is 4D. That also seems unnecessary, and the reshape could be changed.
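
For reference, PyTorch's own torch.nn.Linear already broadcasts over any number of leading dimensions, so neither the transpose nor the reshape should be needed:

```python
import torch

lin = torch.nn.Linear(40, 60)
print(lin(torch.rand(8, 50, 40)).shape)      # 3D input: torch.Size([8, 50, 60])
print(lin(torch.rand(8, 10, 50, 40)).shape)  # 4D input: torch.Size([8, 10, 50, 60])
```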

Add support for fast data types

One option is the hdf5 data type.
Another is super fast data loading from one of our collaborators (I've forgotten which one).

Improve config file format

The current config file format can be a little difficult to understand. More flexible and readable might be a new format using YAML for hyperparameters and Python for scripts.

Data_preparation in two places?

Hey,

What are the motivations for having a .py file for data preparation both in each recipe's directory and in the data_prepare.py lib?

ComputeCost could not scale well as more losses are added

Maybe I am missing something, but right now it seems to me that we have to manually add every new loss function to the ComputeCost init with an if statement.
This way it could grow excessively large and become cumbersome to maintain.

Maybe it could be turned into a wrapper. For most loss functions we basically only need the zero-padding masking functionality. Otherwise one will have to add BCE, BCEWithLogits, dice_loss, etc. (see the sketch below).
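
A sketch of the wrapper idea (the function name and signature are hypothetical): take any PyTorch loss constructed with reduction='none' and apply the zero-padding mask generically:

```python
import torch

def masked_loss(loss_fn, predictions, targets, lengths):
    """Average `loss_fn` only over the unpadded positions of each sequence."""
    max_len = targets.shape[1]
    mask = (torch.arange(max_len)[None, :] < lengths[:, None]).float()
    loss = loss_fn(predictions, targets)  # per-element loss, shape (batch, time)
    return (loss * mask).sum() / mask.sum()

bce = torch.nn.BCEWithLogitsLoss(reduction="none")
pred = torch.randn(4, 50)
tgt = torch.randint(0, 2, (4, 50)).float()
lengths = torch.tensor([50, 42, 30, 17])  # true lengths before padding
print(masked_loss(bce, pred, tgt, lengths))
```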

Convert all recipes to new format

List of things to convert:

  • neural nets
  • augmentation
  • data prep
  • multichannel
  • features?
  • data_reading?
  • scoring?

Instructions from the proposal document:

  1. Copy the experiment xxx.cfg files to the corresponding directory in recipes.
  2. Move the [global] section to a yaml file (e.g. params.yaml), renamed to constants:
  3. Move each element of the [functions] section to the yaml file:
  • Convert all = to :
  • Remove the final [\endtag]
  4. Split functions: into saveables: and functions:
  5. For most models (especially ones with replicated parts), move all model code to a model.py file. Define a new subclass of torch.nn.Module that takes all key model parameters (e.g. number of layers) and use these parameters to build the model.
  6. Move all code in the cfg hierarchy computation sections to an 'experiment.py' python file.
  7. At the top of experiment.py, instantiate an Experiment object and pass it:
  • the params file object
  • the command-line parameters (i.e. sys.argv[1:])
  8. Where execute_computations would have been called, instead:
  • create a dataloader if necessary
  • add a loop to the code if necessary

Suggestion: using a mask to do avoid_pad in losses.py

Right now the implementation iterates through each sentence and finds the actual length of each one.
Using a mask instead would be more elegant and efficient.

example:
mask = length_to_mask(lengths, max_len=target.shape[1])
loss = cost(prob, lab)  # per-element loss, without reduction
loss = torch.sum(loss * mask) / torch.sum(mask)  # average over unpadded positions

Replicability Issue

Every time I run the same experiment (e.g., minimal_examples/neural_networks/autoencoder) on the same machine, I get slightly different results. Since we set the seed, this shouldn't happen.

N-gram Language modeling

  • Load ARPA models
  • Ngram probability interface
  • Compute perplexity (and validate with other existing tools)
  • Small scale text data loading (RAM)
  • Large text data loading
  • Modified Kneser-Ney estimation
  • LM saving (ARPA format)

GPU memory increases and then goes back to a low level

I noticed that GPU memory consumption gradually increases to the maximum, then drops back to normal, and then increases again (3G → 12G → 3G on my machine). It is probably due to data_loader caching, so it might not be a problem.
