funcwj / deep-clustering

deep clustering method for single-channel speech separation

Python 91.37% Shell 5.06% MATLAB 3.57%

speech-separation pytorch

deep-clustering's Introduction

Deep clustering for single-channel speech separation

Implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation"

Requirements

See requirements.txt.

Usage

Configure experiments in .yaml files, for example conf/train.yaml.

Training:

python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 > train.log 2>&1 &

Inference:

python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp

Experiments

Configure	Epoch	FM	FF	MM	FF/MM	AVG
config-1	25	11.42	6.85	7.88	7.36	9.54

Q & A

The format of the .scp file?

The wav.scp file follows the format defined by the Kaldi toolkit. Each line contains a key-value pair, where the key is a unique string that identifies an audio file and the value is the path to that file. For example:
```
mix-utt-00001 /home/data/train/mix-utt-00001.wav
...
mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav
```
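The key-value layout above can be read with a few lines of Python. This is a minimal sketch rather than the loader used in this repository; the function name `parse_scp` and its error handling are assumptions made for illustration.

```python
def parse_scp(lines):
    """Map each utterance key to its audio path (hypothetical helper)."""
    index = {}
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tokens = line.split(maxsplit=1)
        if len(tokens) != 2:
            raise ValueError(f"line {line_number}: expected '<key> <path>'")
        key, path = tokens
        if key in index:
            raise ValueError(f"line {line_number}: duplicate key {key!r}")
        index[key] = path
    return index


# Example with two entries in the format shown above.
scp_index = parse_scp([
    "mix-utt-00001 /home/data/train/mix-utt-00001.wav",
    "",
    "mix-utt-00002 /home/data/train/mix-utt-00002.wav",
])
```

To read a real file, pass the open file object: `parse_scp(open("wav.scp"))`.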
How to prepare the training dataset?

The original paper uses the MATLAB scripts from create-speaker-mixtures.zip to simulate two- and three-speaker datasets. You can use your own data source (e.g. Librispeech, TIMIT) to create mixtures, keeping the clean sources alongside them.
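If you create mixtures from your own data, the core step is summing two equal-length signals at a chosen level difference and keeping the scaled sources as training targets. The following is a minimal sketch under that assumption. It is not the procedure of the MATLAB scripts, and `mix_at_snr` and its parameters are hypothetical names.

```python
import numpy as np


def mix_at_snr(source_a, source_b, snr_db):
    """Mix two signals so that source_a is snr_db dB stronger than source_b.

    Both signals are cropped to the shorter length, so the mixture and
    the returned sources always have the same number of samples.
    """
    num_samples = min(len(source_a), len(source_b))
    a = np.asarray(source_a[:num_samples], dtype=np.float64)
    b = np.asarray(source_b[:num_samples], dtype=np.float64)
    power_a = np.mean(a ** 2)
    power_b = np.mean(b ** 2)
    # Gain that brings the power ratio a/b to the requested value.
    gain = np.sqrt(power_a / (power_b * 10 ** (snr_db / 10)))
    b_scaled = gain * b
    return a + b_scaled, a, b_scaled


# Example with random signals standing in for two speakers.
rng = np.random.default_rng(0)
mixture, clean_a, clean_b = mix_at_snr(
    rng.standard_normal(1000), rng.standard_normal(800), snr_db=5.0
)
```

Writing the mixture and both sources to disk under the same key gives one entry for each .scp file.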

Reference

Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 31-35.
Isik Y, Le Roux J, Chen Z, et al. Single-channel multi-speaker separation using deep clustering[J]. arXiv preprint arXiv:1607.02173, 2016.

deep-clustering's Issues

TypeError: '<=' not supported between instances of 'float' and 'str'

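Hi, could I trouble you to look at two more questions?

Should "cmvn.dict" be computed only from the mixtures in the training set "tr", or from the mixtures in all three folders "tr/cv/tt" together?
The second is the bug in the title. My PyTorch version matches yours (0.4.0), and upgrading to the latest version gives the same error. The full error output is: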

2019-04-27 22:48:06,634 [/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py:43 - INFO ] DCNet:
DCNet(
  (rnn): LSTM(129, 600, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (drops): Dropout(p=0.5)
  (embed): Linear(in_features=1200, out_features=2580, bias=True)
)
Traceback (most recent call last):
  File "./train_dcnet.py", line 80, in <module>
    train(args)
  File "./train_dcnet.py", line 54, in train
    trainer = Trainer(dcnet, **config_dict["trainer"])
  File "/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py", line 49, in __init__
    weight_decay=weight_decay)
  File "/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py", line 27, in create_optimizer
    opt = supported_optimizer[optimizer](params, **kwargs)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/optim/rmsprop.py", line 29, in __init__
    if not 0.0 <= lr:
TypeError: '<=' not supported between instances of 'float' and 'str'
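The traceback shows that `lr` reached `torch.optim.RMSprop` as a string. One possible cause (an assumption, since the YAML file is not shown) is a value written as `1e-3`: PyYAML follows YAML 1.1 and reads that form as a string, while `1.0e-3` is read as a float. A minimal sketch of a defensive conversion applied before the optimizer is built; `coerce_floats` and the key names are hypothetical and not part of the repository.

```python
def coerce_floats(kwargs, keys=("lr", "weight_decay", "momentum")):
    """Return a copy of kwargs with the named string values converted to float."""
    fixed = dict(kwargs)
    for key in keys:
        if isinstance(fixed.get(key), str):
            fixed[key] = float(fixed[key])  # raises ValueError if not numeric
    return fixed


# Values as they might arrive from a YAML file that uses "1e-3" notation.
optimizer_kwargs = coerce_floats({"lr": "1e-3", "weight_decay": "1e-5"})
```

Writing the values with a decimal point in the configuration file (for example `1.0e-3`) avoids the conversion entirely.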

KeyError: 'Missing targets or mixture'

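Hello, I have prepared the .scp files and created the mixtures, but running the code raises the error shown in the title. Could you help? The two images below show the format of my own .scp files.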

Array size mismatch during validation (dataset.py line 197)

Hello and thank you for providing the source code!

For me, it always crashes during validation:

2020-03-12 14:55:30,998 [/home/me/deep-clustering/trainer.py:85 - INFO ] Evaluating...
Traceback (most recent call last):
  File "./train_dcnet.py", line 90, in <module>
    train(args)
  File "./train_dcnet.py", line 65, in train
    trainer.run(train_loader, valid_loader, num_epoches=args.num_epoches)
  File "/home/me/deep-clustering/trainer.py", line 102, in run
    init_loss, _ = self.validate(dev_set)
  File "/home/me/deep-clustering/trainer.py", line 90, in validate
    for mix_spect, tgt_index, vad_masks in dataloader:
  File "/home/me/deep-clustering/dataset.py", line 239, in __iter__
    yield self._process(index)
  File "/home/me/deep-clustering/dataset.py", line 219, in _process
    data_dict = self._transform(s, t)
  File "/home/me/deep-clustering/dataset.py", line 197, in _transform
    target_attr = np.argmax(np.array(targets_specs_list), 0)
  File "/home/me/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 963, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/home/me/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: operands could not be broadcast together with shapes (680,129) (356,129)

The exact sizes in the mismatch (here 680 and 356) depend on the sample set used, but there is always a mismatch that causes the crash.

What could be the reason?

Thank you!
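The shapes in the error suggest that the target spectrograms of one utterance have different numbers of frames (680 and 356), so they cannot be stacked into a single array for `np.argmax`. A likely cause, stated here as an assumption, is that the clean sources were not cut to the same length as each other when the mixtures were created. A minimal sketch of a workaround that crops all targets to the shortest one; `crop_to_common_length` is a hypothetical helper, not a function of this repository.

```python
import numpy as np


def crop_to_common_length(specs):
    """Truncate each (frames, bins) array to the shortest frame count."""
    num_frames = min(spec.shape[0] for spec in specs)
    return [spec[:num_frames] for spec in specs]


# Two stand-in target spectrograms with the frame counts from the traceback.
rng = np.random.default_rng(0)
targets = [rng.random((680, 129)), rng.random((356, 129))]

cropped = crop_to_common_length(targets)
# After cropping, the list stacks cleanly and argmax picks the dominant source.
target_attr = np.argmax(np.array(cropped), 0)
```

Cropping discards audio, so fixing the lengths when the dataset is generated is usually the cleaner option.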

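Hello funcwj,
There is a piece of code that I do not fully understand, and I hope you can explain it:
def form_mask(classes, spkid, vad_mask):
    mask = ~vad_mask
    # mask = np.zeros_like(vad_mask)
    mask[vad_mask] = (classes == spkid)
    return mask
My question: can a matrix be indexed by another matrix, as in mask[vad_mask]?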
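Indexing an array with a boolean array of the same shape is standard NumPy behaviour: `mask[vad_mask] = values` writes `values`, in row-major order, into the positions where `vad_mask` is True. The small example below reuses the function from the question. The input arrays are made up for illustration, and it assumes `classes` holds one cluster label per True entry of `vad_mask`.

```python
import numpy as np


def form_mask(classes, spkid, vad_mask):
    mask = ~vad_mask                      # bins outside the VAD mask start as True
    mask[vad_mask] = (classes == spkid)   # bins inside it are True only for spkid
    return mask


# 2 frames x 3 bins; four bins are marked active by the VAD mask.
vad_mask = np.array([[True, False, True],
                     [False, True, True]])
# One cluster label per active bin, in row-major order.
classes = np.array([0, 1, 1, 0])

mask_spk0 = form_mask(classes, 0, vad_mask)
mask_spk1 = form_mask(classes, 1, vad_mask)
```

Here `mask_spk0` is `[[True, True, False], [True, False, True]]`: the two inactive bins stay True, and the active bins are True only where the label equals 0.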

Dataset

Hello, if I use the TIMIT dataset, is there any reference material for mixing the audio and generating the .scp files?
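For the .scp part of the question, a file in the key-path format described in the Q & A can be generated by walking a directory of audio files. This is a minimal sketch with hypothetical names (`make_scp_lines`, the `*.wav` pattern, the demo directory layout). Keys are built from the relative path so that files with the same name under different speaker folders do not collide.

```python
import tempfile
from pathlib import Path


def make_scp_lines(wav_dir, pattern="*.wav"):
    """Return '<key> <path>' lines for every matching file under wav_dir."""
    root = Path(wav_dir)
    lines = []
    for wav_path in sorted(root.rglob(pattern)):
        relative = wav_path.relative_to(root).with_suffix("")
        key = "-".join(relative.parts)  # e.g. spk1/utt1.wav -> spk1-utt1
        lines.append(f"{key} {wav_path}")
    return lines


# Demo on a temporary directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("spk1/utt1.wav", "spk2/utt1.wav"):
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_lines = make_scp_lines(tmp)
    demo_keys = [line.split()[0] for line in demo_lines]
```

Writing the result with `Path("wav.scp").write_text("\n".join(lines) + "\n")` produces the file. The file pattern may need adjusting to match how your copy of the corpus names its audio files.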

Have you implemented the deep attractor network?

Hi, Wu Jian, have you implemented the Deep Attractor Network (DANet)?

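How to prepare the training dataset?

Hello, thank you very much for sharing the code. I have a question: in the Inference command, what do "$mdl_dir" and "egs.scp" stand for? Also, I could not find a path for the test data in the "train.yaml" file.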