deep-clustering's Introduction

Deep clustering for single-channel speech separation

An implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation"

Requirements

See requirements.txt.

Usage

  1. Configure experiments in .yaml files, for example: train.yaml

  2. Training:

    python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 > train.log 2>&1 &

  3. Inference:

    python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp
    

Experiments

Configuration  Epoch  FM     FF    MM    FF/MM  AVG
config-1       25     11.42  6.85  7.88  7.36   9.54

Q & A

  1. The format of the .scp file?

    The wav.scp file follows the format defined by the Kaldi toolkit. Each line contains a key-value pair, where the key is a unique string that identifies an audio file and the value is the path to that file. For example:

    mix-utt-00001 /home/data/train/mix-utt-00001.wav
    ...
    mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav
    
  2. How to prepare training dataset?

    The original paper uses the MATLAB scripts in create-speaker-mixtures.zip to simulate two- and three-speaker datasets. You can also use your own data source (e.g., Librispeech, TIMIT) to create mixtures, as long as you keep the clean sources alongside them.
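
    As a rough illustration of creating a two-speaker mixture while keeping the clean sources, the sketch below scales the second source to a chosen signal-to-noise ratio and sums the two. It is not the MATLAB procedure from the original paper; the function name `mix_at_snr`, the SNR value, and the random test signals are assumptions made for this example, and both inputs are assumed to be non-silent 1-D waveforms at the same sample rate.

```python
import numpy as np


def mix_at_snr(src1, src2, snr_db):
    """Mix two waveforms so that src1 is snr_db decibels louder than src2.

    Returns (mixture, source1, source2), all truncated to the shorter input,
    so the clean sources stay aligned with the mixture.
    """
    n = min(len(src1), len(src2))
    s1 = np.asarray(src1[:n], dtype=np.float64)
    s2 = np.asarray(src2[:n], dtype=np.float64)
    power1 = np.mean(s1 ** 2)
    power2 = np.mean(s2 ** 2)
    # Gain that brings s2 to the requested power relative to s1.
    gain = np.sqrt(power1 / (power2 * 10.0 ** (snr_db / 10.0)))
    s2 = s2 * gain
    return s1 + s2, s1, s2


# Hypothetical stand-ins for two utterances loaded from disk.
rng = np.random.default_rng(0)
utt_a = rng.standard_normal(16000)
utt_b = rng.standard_normal(12000)
mixture, clean_a, clean_b = mix_at_snr(utt_a, utt_b, snr_db=5.0)
print(mixture.shape)
```

    Truncating to the shorter utterance is only one possible choice; zero-padding the shorter source is another. Either way, the mixture and every clean source end up with the same number of samples.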

Reference

  1. Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 31-35.
  2. Isik Y, Le Roux J, Chen Z, et al. Single-channel multi-speaker separation using deep clustering[J]. arXiv preprint arXiv:1607.02173, 2016.

deep-clustering's People

Contributors

dependabot[bot], funcwj

deep-clustering's Issues

TypeError: '<=' not supported between instances of 'float' and 'str'

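Hi, could you please help me with two more questions?

  1. Should "cmvn.dict" be computed only from the mixtures in the training set "tr", or from the mixtures in all three folders "tr/cv/tt" together?
  2. The bug shown in the title. My PyTorch version matches yours (0.4.0), and upgrading to the latest version gives the same error. The full error output is: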
2019-04-27 22:48:06,634 [/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py:43 - INFO ] DCNet:
DCNet(
  (rnn): LSTM(129, 600, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (drops): Dropout(p=0.5)
  (embed): Linear(in_features=1200, out_features=2580, bias=True)
)
Traceback (most recent call last):
  File "./train_dcnet.py", line 80, in <module>
    train(args)
  File "./train_dcnet.py", line 54, in train
    trainer = Trainer(dcnet, **config_dict["trainer"])
  File "/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py", line 49, in __init__
    weight_decay=weight_decay)
  File "/home/MyCode/DeepClustering/funcwj_deep-clustering/trainer.py", line 27, in create_optimizer
    opt = supported_optimizer[optimizer](params, **kwargs)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/optim/rmsprop.py", line 29, in __init__
    if not 0.0 <= lr:
TypeError: '<=' not supported between instances of 'float' and 'str'
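
The traceback shows that `lr` reached the optimizer as a string rather than a number. The configuration file is not shown in this thread, so the cause is an assumption, but one common source of this error is a YAML value written as `1e-3`: PyYAML loads that form as a string, while `1.0e-3` is loaded as a float. Besides editing the config, a defensive workaround is to convert such values before building the optimizer. The helper `coerce_floats` and the example dictionary below are hypothetical and not part of the repository.

```python
def coerce_floats(kwargs, keys=("lr", "weight_decay", "momentum")):
    """Return a copy of kwargs with the named entries converted to float.

    Entries that are already numeric are left untouched; a string that
    does not represent a number raises ValueError, which surfaces a bad
    config early instead of deep inside the optimizer.
    """
    fixed = dict(kwargs)
    for key in keys:
        if isinstance(fixed.get(key), str):
            fixed[key] = float(fixed[key])
    return fixed


# Hypothetical optimizer settings as they might come out of a config file.
optimizer_kwargs = {"lr": "1e-5", "weight_decay": 0.0, "momentum": "0.9"}
fixed_kwargs = coerce_floats(optimizer_kwargs)
print(fixed_kwargs)
```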

KeyError: 'Missing targets or mixture'

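Hello, my .scp files are ready and the mixtures have been created, but when I run the code I get the error shown in the title. Could you help? The two screenshots below show the format of my own .scp files.
[Two screenshots of .scp files]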
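
The error text suggests that an utterance key was found in one .scp file but not in another. The loader's actual logic is not shown in this thread, so the following is only a diagnostic sketch under that assumption. The names `scp_keys` and `find_mismatched_keys`, along with the example keys and paths, are hypothetical.

```python
def scp_keys(text):
    """Collect the utterance keys (first field of each non-empty line)."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def find_mismatched_keys(mix_text, target_texts):
    """Report keys that appear in the mixture list or a target list, but not both."""
    mix_keys = scp_keys(mix_text)
    report = {}
    for name, text in target_texts.items():
        diff = sorted(mix_keys ^ scp_keys(text))
        if diff:
            report[name] = diff
    return report


# Hypothetical file contents; in practice, read each .scp file from disk.
mix_scp = (
    "utt-00001 /data/mix/utt-00001.wav\n"
    "utt-00002 /data/mix/utt-00002.wav\n"
)
spk1_scp = (
    "utt-00001 /data/spk1/utt-00001.wav\n"
    "utt-00002 /data/spk1/utt-00002.wav\n"
)
spk2_scp = "utt-00001 /data/spk2/utt-00001.wav\n"

report = find_mismatched_keys(mix_scp, {"spk1": spk1_scp, "spk2": spk2_scp})
print(report)  # {'spk2': ['utt-00002']}
```

An empty report means every list uses the same set of keys; any entry points to the file whose keys need fixing.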

Array size mismatch during validation (dataset.py line 197)

Hello and thank you for providing the source code!

For me, it always crashes during validation:

2020-03-12 14:55:30,998 [/home/me/deep-clustering/trainer.py:85 - INFO ] Evaluating...
Traceback (most recent call last):
  File "./train_dcnet.py", line 90, in <module>
    train(args)
  File "./train_dcnet.py", line 65, in train
    trainer.run(train_loader, valid_loader, num_epoches=args.num_epoches)
  File "/home/me/deep-clustering/trainer.py", line 102, in run
    init_loss, _ = self.validate(dev_set)
  File "/home/me/deep-clustering/trainer.py", line 90, in validate
    for mix_spect, tgt_index, vad_masks in dataloader:
  File "/home/me/deep-clustering/dataset.py", line 239, in __iter__
    yield self._process(index)
  File "/home/me/deep-clustering/dataset.py", line 219, in _process
    data_dict = self._transform(s, t)
  File "/home/me/deep-clustering/dataset.py", line 197, in _transform
    target_attr = np.argmax(np.array(targets_specs_list), 0)
  File "/home/me/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 963, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/home/me/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: operands could not be broadcast together with shapes (680,129) (356,129)

The exact sizes (here: 680 and 356) depend on the sample set used, but there is always a mismatch that causes the crash.

What could be the reason?

Thank you!
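
The shapes in the error, (680, 129) and (356, 129), suggest that the per-speaker spectrograms of one utterance have different numbers of frames, so they cannot be stacked into a single array before `np.argmax`. The thread does not confirm why the lengths differ, so this is an assumption. A more robust fix is usually to regenerate the data so that each mixture and its sources have equal length; the sketch below shows only a truncation workaround. The helper `dominant_source` and the random arrays are hypothetical.

```python
import numpy as np


def dominant_source(target_specs):
    """Return, for each time-frequency bin, the index of the loudest source.

    Spectrograms are truncated to the shortest one along the frame axis
    so that they can be stacked.
    """
    num_frames = min(spec.shape[0] for spec in target_specs)
    stacked = np.stack([spec[:num_frames] for spec in target_specs])
    return np.argmax(stacked, axis=0)


# Hypothetical magnitude spectrograms with the frame counts from the error.
rng = np.random.default_rng(0)
spk1_spec = rng.random((680, 129))
spk2_spec = rng.random((356, 129))

target_attr = dominant_source([spk1_spec, spk2_spec])
print(target_attr.shape)  # (356, 129)
```

Note that the mixture spectrogram would need the same truncation, otherwise the labels no longer line up with the input frames.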

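How is the clustering implemented?

Hello funcwj,
There is a piece of code I don't quite understand, and I hope you can help explain it:
def form_mask(classes, spkid, vad_mask):
    mask = ~vad_mask
    # mask = np.zeros_like(vad_mask)
    mask[vad_mask] = (classes == spkid)
    return mask
My question is about mask[vad_mask]: can a matrix be indexed by another matrix like this?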
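
Indexing an array with a boolean array of the same shape is standard NumPy behaviour: `mask[vad_mask] = values` writes `values` into the positions where `vad_mask` is True, in row-major order. The small example below reproduces the function from the question on toy data. It assumes that `classes` holds one cluster label per True entry of `vad_mask`; the arrays are made up for illustration.

```python
import numpy as np


def form_mask(classes, spkid, vad_mask):
    # Bins outside the VAD mask start as True, bins inside start as False.
    mask = ~vad_mask
    # Inside the VAD mask, a bin is True only if it was clustered to spkid.
    mask[vad_mask] = (classes == spkid)
    return mask


# Toy VAD mask: True marks the active time-frequency bins.
vad_mask = np.array([[True, False, True],
                     [False, True, True]])
# One cluster label per active bin, listed in row-major order.
classes = np.array([0, 1, 1, 0])

mask_spk0 = form_mask(classes, 0, vad_mask)
mask_spk1 = form_mask(classes, 1, vad_mask)
print(mask_spk0)
print(mask_spk1)
```

In this toy case the two speaker masks agree only on the inactive bins, where both are True, and split the active bins between them.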

Dataset

Hello, if I use the TIMIT dataset, is there any reference material for mixing the audio and generating the .scp files?
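
For the .scp part of this question, a list in the Kaldi-style "key path" format described in the README can be generated from a list of audio paths. The sketch below uses each file's base name without extension as the key, which is an assumption and not a requirement of the toolkit. The function `make_scp_lines` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths):
    """Build sorted '<key> <path>' lines, using the file stem as the key."""
    entries = {}
    for path in wav_paths:
        key = PurePosixPath(path).stem
        if key in entries:
            raise ValueError(f"duplicate key {key!r} for {path!r}")
        entries[key] = path
    return [f"{key} {entries[key]}" for key in sorted(entries)]


# Hypothetical mixture files; in practice, collect them with glob or os.walk.
paths = [
    "/home/data/train/mix-utt-00002.wav",
    "/home/data/train/mix-utt-00001.wav",
]
scp_lines = make_scp_lines(paths)
print("\n".join(scp_lines))
```

Writing one such list for the mixtures and one per speaker, all with identical keys, keeps each mixture paired with its clean sources.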

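How to prepare the training dataset?

Hello, thank you very much for sharing the code. I have a question: in the Inference command, what do "$mdl_dir" and "egs.scp" stand for? Also, I did not see any path for the test data in the "train.yaml" file.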
