
conv-tasnet's Introduction

ConvTasNet

A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" (see the reference below).

Requirements

see requirements.txt

Usage

  • inference
./nnet/separate.py /path/to/checkpoint --input /path/to/mix.scp --gpu 0 > separate.log 2>&1 &
  • evaluate
./nnet/compute_si_snr.py /path/to/ref_spk1.scp,/path/to/ref_spk2.scp /path/to/inf_spk1.scp,/path/to/inf_spk2.scp
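
Both scripts take .scp lists. These appear to be Kaldi-style script files, i.e. one "utterance-id /path/to/wav" pair per line, with the key used to match mixtures against references (this is an assumption based on the naming; check nnet/libs/audio.py for the exact format). For example:

    mix_utt001 /path/to/wav/mix_utt001.wav
    mix_utt002 /path/to/wav/mix_utt002.wav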

Results (on the best configurations in the paper)

| ID | Settings | Causal | Norm/Non-linear | Param | Loss | Si-SDR (dB) |
|----|----------|--------|-----------------|-------|------|-------------|
| 0 | adam/lr:1e-3/wd:1e-5/32-batch/2gpu | N | BN/relu | 8.75M | -17.59/-15.45 | 14.63 |
| 1 | adam/lr:1e-2/wd:1e-5/20-batch/2gpu | N | gLN/relu | - | -16.09/-15.21 | 14.58 |
| 2 | adam/lr:1e-3/wd:1e-5/20-batch/2gpu | N | gLN/relu | - | -17.91/-16.54 | 15.87 |
| 3 | adam/lr:1e-2/wd:1e-5/32-batch/2gpu | N | BN/sigmoid | - | -14.51/-13.40 | 12.62 |
| 4 | adam/lr:1e-2/wd:1e-5/32-batch/2gpu | N | BN/relu | - | -17.20/-15.38 | 14.58 |
| 5 | adam/lr:1e-3/wd:1e-5/20-batch/2gpu | N | gLN/sigmoid | - | -17.20/-16.11 | 15.55 |
| 6 | adam/lr:1e-3/wd:1e-5/32-batch/2gpu | Y | BN/relu | - | -15.25/-12.47 | 11.42 |
| 7 | adam/lr:1e-3/wd:1e-5/24-batch/2gpu | N | cLN/relu | - | -18.72/-16.17 | 15.25 |
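
For reference, both the Loss and Si-SDR columns are on the scale-invariant SNR scale used in the paper (the training objective in the paper is the negative SI-SNR with utterance-level permutation-invariant training). Below is a minimal NumPy sketch of SI-SNR for a single estimate/reference pair; the repo's own implementation lives in ./nnet/compute_si_snr.py and the trainer, and may differ in detail.

    import numpy as np

    def si_snr(estimate, reference, eps=1e-8):
        """Scale-invariant SNR in dB for a single 1-D estimate/reference pair."""
        estimate = estimate - np.mean(estimate)
        reference = reference - np.mean(reference)
        # project the estimate onto the reference to get the scaled target
        s_target = np.dot(estimate, reference) * reference / (np.dot(reference, reference) + eps)
        e_noise = estimate - s_target
        return 10 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))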

Reference

Luo Y., Mesgarani N. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation. arXiv preprint arXiv:1809.07454, 2018.


conv-tasnet's Issues

GPU Specs

Hi, thanks for the repo! What are your GPU specs? I am not able to fit batches of 32 or 20 on 2 GPUs; I use a Titan X (12 GB) and a Tesla K40 (12 GB). Also, could you share your loss curve?

Specify which split each loss in the "Result" table refers to (README.md)

Hi,
Thanks for sharing your implementation. To understand the performance of your system, could you specify on which sets the "Loss" and "SI-SDR" columns of the "Result" table in your README are computed? Is it train/valid/test? For example, my guess for row 0 of the table would be train_si_sdr = 17.59, valid_si_sdr = 15.45 and test_si_sdr = 14.63; is that right?

Pretrained model

Hey,
Thank you for the implementation!
Can you please share a pre-trained model?

raise KeyError("Missing utterance {}!".format(index)) KeyError: 'Missing utterance clnsp1!'

Traceback (most recent call last):
  File "train.py", line 86, in <module>
    run(args)
  File "train.py", line 50, in run
    trainer.run(train_loader, dev_loader, num_epochs=args.epochs)
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/trainer.py", line 215, in run
    cv = self.eval(dev_loader)
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/trainer.py", line 203, in eval
    for egs in data_loader:
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/dataset.py", line 143, in __iter__
    for chunks in self.eg_loader:
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/lib64/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/dataset.py", line 42, in __getitem__
    ref = [reader[key] for reader in self.ref]
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/dataset.py", line 42, in <listcomp>
    ref = [reader[key] for reader in self.ref]
  File "/hardmnt/moissan0/home/mnabih/piccadilly0/home/mnabih/PycharmProjects/Co/nnet/libs/audio.py", line 119, in __getitem__
    raise KeyError("Missing utterance {}!".format(index))
KeyError: 'Missing utterance clnsp10!'
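
This error usually means that an utterance key listed in one .scp file (e.g. the mixture list) has no matching entry in one of the reference .scp files read by nnet/libs/audio.py. A quick, hypothetical check for mismatched keys (assuming Kaldi-style "key path" lines; the file names below are placeholders):

    def load_keys(scp_path):
        # the first token of each non-empty line is the utterance key
        with open(scp_path) as f:
            return {line.split()[0] for line in f if line.strip()}

    mix_keys = load_keys("mix.scp")
    for ref_scp in ("ref_spk1.scp", "ref_spk2.scp"):
        missing = mix_keys - load_keys(ref_scp)
        if missing:
            print("{} is missing {} keys, e.g. {}".format(
                ref_scp, len(missing), sorted(missing)[:5]))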

How to do multiprocessing in dataloader?

This code really helps. However, when the batch size is large, data loading becomes a bottleneck for training, and there is no multi-worker support in the dataloader. I hope this can be addressed. Any suggestions on this issue?
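
For a standard torch.utils.data.Dataset, parallel loading is enabled simply by passing num_workers to DataLoader; how this maps onto the repo's chunk-based dataset.py is a separate question. A generic, hypothetical sketch (not the repo's code):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class ChunkDataset(Dataset):
        """Hypothetical dataset returning fixed-length waveform chunks."""
        def __init__(self, chunks):
            self.chunks = chunks

        def __len__(self):
            return len(self.chunks)

        def __getitem__(self, idx):
            return self.chunks[idx]

    loader = DataLoader(
        ChunkDataset([torch.randn(32000) for _ in range(64)]),
        batch_size=16,
        shuffle=True,
        num_workers=4,     # 4 worker processes load batches in parallel
        pin_memory=True)   # faster host-to-GPU transfers

    for batch in loader:
        pass  # training step would go here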

Softmax

Hi, I see you have tabulated all results with ReLU or sigmoid as the non-linear layer. Did you try softmax? If yes, how were the results? The architecture in the paper has a softmax layer after the masks are estimated.
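
For context, the three options differ only in the function applied to the mask logits; softmax is taken across the speaker dimension, so the per-bin masks sum to one. A minimal sketch with assumed tensor shapes (not the repo's code):

    import torch

    # hypothetical mask logits: (batch, num_spks, channels, frames)
    logits = torch.randn(4, 2, 256, 100)

    relu_mask = torch.relu(logits)                # non-negative, unbounded
    sigmoid_mask = torch.sigmoid(logits)          # each entry in (0, 1), per speaker independently
    softmax_mask = torch.softmax(logits, dim=1)   # masks sum to 1 across the speaker dimension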

The scale of the output (the predicted speech) is not consistent with the input scale (the input mixture).

Hi, thanks for sharing your code. I have a question about separate.py, specifically about the code below:
def run(args):
    mix_input = WaveReader(args.input, sample_rate=args.fs)
    computer = NnetComputer(args.checkpoint, args.gpu)
    for key, mix_samps in mix_input:
        logger.info("Compute on utterance {}...".format(key))
        spks = computer.compute(mix_samps)
        norm = np.linalg.norm(mix_samps, np.inf)
        for idx, samps in enumerate(spks):
            samps = samps[:mix_samps.size]
            samps = samps * norm / np.max(np.abs(samps))
            write_wav(
                os.path.join(args.dump_dir, "spk{}/{}.wav".format(idx + 1, key)),
                samps,
                fs=args.fs)

I found that the separated speech is not at the same energy level as the mixture, even though the separation quality is really good. I debugged the code and, surprisingly, found that max(mix_samps) is around 1, while max(samps) is around 200 before normalization.

I don't think we should normalize here, since it changes the energy level of the separated speech:
samps = samps * norm / np.max(np.abs(samps))
However, without normalization, as I said, max(samps) is around 200. Do you know why, and how to fix it?
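
If the goal is simply to keep the outputs on the same scale as the mixture, one possible workaround (an option, not the author's recommendation) is to replace the per-source peak normalization with a single gain shared by all sources, so their relative energies are preserved. A hedged sketch reusing the names from the snippet above:

    import numpy as np

    def rescale_to_mixture(spks, mix_samps, eps=1e-8):
        """Apply one common gain to all separated sources so that their sum
        has the same peak level as the input mixture."""
        spks = [s[:mix_samps.size] for s in spks]
        est_mix = np.sum(np.stack(spks), axis=0)
        gain = np.max(np.abs(mix_samps)) / (np.max(np.abs(est_mix)) + eps)
        return [s * gain for s in spks]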
