snowdar / asv-subtools
An Open Source Tools for Speaker Recognition
License: Apache License 2.0
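Some tasks only need the embeddings and do not care about the details. Could the authors provide some pretrained models for direct use?
Also, is there a WeChat group where everyone can discuss and give feedback more promptly?
Thanks to the authors. This is excellent work.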
Hi, thanks again for developing such an amazing toolkit. I am now looking at the PLDA backend work, including the report from Jiafeng.
I found very useful information about CORAL, which originated from here. But that is feature-based CORAL. I believe your implementation is model-based CORAL on PLDA. Is there any external reference for model-based CORAL?
Thanks in advance.
Hi,
I'm trying to train my model based on recipe/voxcelebSRC, but I had a problem at the scoring stage.
In gather_results_from_epochs.sh, $enroll_cohort_name.score equals:
cosine_voxceleb1_O_enroll_voxceleb2_dev_submean_norm_voxceleb2_dev.score
But the generated score file appears to be:
cosine_voxceleb1_O_enroll_voxceleb2_devspk_xvector_submean_norm_mean_voxceleb2_dev.score
It looks like get_params_for_score() in score.sh failed to generate the correct $suffix:
final_file=spk_xvector_submean_norm_mean.ark
input_name=xvector
suffix=$(echo ${final_file%.*} | sed 's/^'"$inputname"'//g;'s/spk_xvector_mean//g'')
It only removes the contiguous string spk_xvector_mean, and does not work for names of the form spk_xvector***_mean.
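The behaviour the report seems to expect can be sketched in Python, purely for illustration. This assumes the desired suffix is what remains after dropping the extension, an optional leading spk_, the input name, and a trailing _mean. The helper name derive_suffix is hypothetical, and this is not the fix applied in score.sh.

```python
import re


def derive_suffix(final_file: str, input_name: str = "xvector") -> str:
    """Return the processing suffix embedded in a vector file name.

    Hypothetical helper: assumes names look like
    [spk_]<input_name><suffix>[_mean].<ext>.
    """
    stem = final_file.rsplit(".", 1)[0]  # drop the extension
    # Drop the optional "spk_" marker together with the input name.
    stem = re.sub(r"^(spk_)?" + re.escape(input_name), "", stem)
    # Drop the trailing "_mean" even when other tags sit in between.
    return re.sub(r"_mean$", "", stem)


print(derive_suffix("spk_xvector_submean_norm_mean.ark"))  # _submean_norm
print(repr(derive_suffix("spk_xvector_mean.ark")))         # ''
```

Handling the prefix and the trailing _mean separately is what lets the middle part (_submean_norm here) survive.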
Best regards,
Ya-Qi Yu
Hi,
I tried to run the CNCeleb recipe, but a RuntimeError appeared:
#### Training will run for 6 epochs.
Traceback (most recent call last):
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 283, in run
loss, acc = self.train_one_batch(batch)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/training/trainer.py", line 182, in train_one_batch
loss = model.get_loss(model_forward(inputs), targets)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/support/utils.py", line 157, in wrapper
return function(self, *transformed)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/exp/SEResnet34_am_train_fbank40/config/resnet-se-xvector.py", line 559, in get_loss
return self.loss(inputs, targets)
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/kaldi/egs/xmuspeech/sre/subtools/pytorch/libs/nnet/loss.py", line 360, in forward
return self.loss_function(outputs/self.t, targets) + self.ring_loss * ring_loss
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1150, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/ubuntu/miniconda3/envs/subtools/lib/python3.8/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/conda/conda-bld/pytorch_1634272172048/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [0,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
That means a label is larger than the number of outputs (num_speakers) of the FC classifier.
I find that num_targets in exp/egs/train_sequential/info is 2687, while the max label in train.egs.csv is 2711.
So could you please tell me which script generates exp/egs/train_sequential/info/num_targets?
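A quick way to confirm this kind of mismatch before training is to compare the largest label in the egs CSV with num_targets. The sketch below is only an illustration: the column name class-label and the space delimiter are assumptions about the CSV layout, and the two-row CSV is a stand-in for the real train.egs.csv.

```python
import csv
import io


def max_label(csv_text: str, label_column: str = "class-label",
              delimiter: str = " ") -> int:
    """Return the largest integer label found in an egs CSV (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return max(int(row[label_column]) for row in reader)


# Tiny stand-in for train.egs.csv; the real file is much larger.
example_csv = (
    "utt-id class-label\n"
    "utt_a 0\n"
    "utt_b 2711\n"
)
num_targets = 2687  # value reported for exp/egs/train_sequential/info/num_targets

largest = max_label(example_csv)
# Cross entropy needs every label to lie in [0, num_targets - 1].
print(largest, largest < num_targets)  # 2711 False
```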
Hi,
I want to validate the recipe "ap-olr2020-baseline". I have some issues at scoring.
I want to know which file holds the language labels: utt2lang or utt2spk?
Regards,
Luke
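I am using the dataloader with multiple workers, but the workers are stuck in the D state and the CPU reads data slowly, which causes GPU utilization to drop to 0. How can this be solved?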
Hi @Snowdar ,
The recipe asv-subtools/recipe/ap-olr2020-baseline is designed for language recognition tasks.
So at the data preparation stage, should I put the language label in the utt2spk file or in the utt2lang file?
I am new to language recognition, so I am a little confused about the above code.
Which file should I use as input for subtools/getTrials.sh to generate the trials file?
Thanks
subtools/pytorch/libs/supports/utils.py line 319
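The method on this line compares the passed-in dictionary with the default dictionary, assigns the entries that differ from the defaults to the default dictionary, and returns the default dictionary. However, if the method's last two boolean parameters are both false and the passed-in dictionary contains keys that do not exist in the default dictionary, it neither raises an error nor adds those keys to the default dictionary. You then get a dictionary containing only the default fields, and the new fields you defined (perhaps because you modified a model or method and added new variables) fall back to the defaults set inside that method or model. The parameters you defined in the main program, such as run****.py, are never passed into those methods or models.
At least one of this function's last two arguments, 'force_check' and 'support_unkown', should be true, so that the function either raises an error or adds the newly defined parameters to the default dictionary that is returned.
Otherwise, the params you define in the main program, such as run****.py, will not be passed into the methods or classes.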
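The behaviour described above can be reproduced with a small stand-in. This is a sketch under assumptions: merge_params and its argument names are hypothetical and only mirror the two flags mentioned in the report, not the actual code in utils.py.

```python
def merge_params(default_params, params, force_check=False, support_unknown=False):
    """Overlay `params` on a copy of `default_params` (hypothetical stand-in)."""
    merged = dict(default_params)
    for key, value in params.items():
        if key in merged:
            merged[key] = value          # known key: override the default
        elif force_check:
            raise KeyError(f"unknown parameter: {key}")
        elif support_unknown:
            merged[key] = value          # unknown key: accept it
        # With both flags False the unknown key is dropped without warning.
    return merged


defaults = {"lr": 0.01}
user = {"lr": 0.001, "margin": 0.2}
print(merge_params(defaults, user))                        # {'lr': 0.001}
print(merge_params(defaults, user, support_unknown=True))  # {'lr': 0.001, 'margin': 0.2}
```

With both flags off, margin silently disappears, which matches the symptom in the report.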
I hope the author sees this issue and updates the code accordingly.
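Hello, does your baseline include a script that takes a single audio file as input and directly decodes a recognition result? If so, could you tell me which one it is? If not, I hope you can outline the general procedure and approach. Thank you very much!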
Xuran
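Hello!
I see that olr2021-baseline uses x-vector embedding + LDA + LR for language classification, but the x-vector network is trained with a softmax that outputs the probability of each language and a CE loss. Why not use the softmax output of the x-vector network directly at inference?
Thanks!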
Hi @Snowdar
Since the inputs are of shape [batch, frequency, time], this line
asv-subtools/pytorch/libs/nnet/dropout.py
Line 217 in 1ea9894
inputs[:,f_0:f_0+f,:].fill_(0.)
Is that correct?
Thanks
Junjie
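Hello,
Thank you for taking the time to read my email.
In Kaldi, before the neural network is trained, the features are converted into egs for training, and the chunk size defaults to a random number in [200, 400].
My understanding is that cutting the original training features into chunks of random length makes the network more robust when extracting x-vectors from audio of different lengths.
In your asv-subtools, min-chunk seems to be set directly to a fixed value, so the egs used for training have a fixed size.
My questions:
1. How does the choice of egs (including the random range, or whether the size is fixed) affect network performance?
2. Why does asv-subtools use a fixed min-chunk size? Could you release a version with variable egs sizes like Kaldi's?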
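The difference between the two chunking schemes in the question can be illustrated with a small sketch. It is a simplification under assumptions: sample_chunks is a hypothetical helper, it draws one length per chunk, and it does not reproduce how Kaldi or asv-subtools actually build egs.

```python
import random


def sample_chunks(num_frames, min_len=200, max_len=400, fixed_len=None, seed=0):
    """Split [0, num_frames) into consecutive (start, end) chunks.

    With fixed_len set, every chunk has that length; otherwise each length
    is drawn uniformly from [min_len, max_len]. Cutting stops as soon as the
    next chunk no longer fits.
    """
    rng = random.Random(seed)
    chunks, start = [], 0
    while True:
        length = fixed_len if fixed_len is not None else rng.randint(min_len, max_len)
        if start + length > num_frames:
            break
        chunks.append((start, start + length))
        start += length
    return chunks


# Fixed-size chunks, similar in spirit to a constant min-chunk setting.
print(sample_chunks(1000, fixed_len=200))
# Variable-size chunks with lengths in [200, 400].
print([end - start for start, end in sample_chunks(1000)])
```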
Looking forward to your reply.
Best wishes!
Hello XMU Speech Lab, thank you so much for the great work you shared.
I wonder how to prepare our own data. Should we prepare wav.scp, utt2spk and spk2utt in the Kaldi format?
I couldn't find information about data preparation in the README. Looking forward to your answer.
Best wishes
I tried to train the standard x-vector model on the VoxCeleb1 training set using the script runVoxceleb.sh with 4 GPUs. I used the default parameters in runStandardXvector-voxceleb1.py, except that the weight decay was changed to 5e-1 (I also tried 3e-1), but the resulting EER is only 3.531% for the far embedding at epoch 21 with the PLDA backend. I am unable to reach the 3.028% reported at the bottom of runStandardXvector-voxceleb1.py. Is there something I overlooked or need to modify?
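As in the title.
Hello, I recently saw the documentation on SDK packaging but could not find the code for the scoring module. Could you open-source it? Thanks!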
When running the VoxCeleb recipe [Speaker Recognition], I got the error shown below. I am not sure what in "runSnowdarXvector-extended-spec-am.py" is wrong and causes this TypeError. Thank you for your help!
(xmuspeech) tcao7@c06:~/kaldi/egs/xmuspeech/voxceleb1$ subtools/runPytorchLauncher.sh runSnowdarXvector-extended-spec-am.py --stage=0
Traceback (most recent call last):
File "runSnowdarXvector-extended-spec-am.py", line 282, in <module>
utils.init_multi_gpu_training(args.gpu_id, args.multi_gpu_solution, args.port)
TypeError: init_multi_gpu_training() takes from 0 to 2 positional arguments but 3 were given
First of all, thank you for XMUSpeech's subtools.
When I run subtools/runPytorchLauncher.sh run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --stage=3 --endstage=3 , that is, python3 -m torch.distributed.launch --nproc_per_node=2 run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --port 2345 --stage=3 --endstage=3 , the following warning and error appear. Could the multi-GPU initialization be failing because of the environment or something else?
When training the baseline system, we are wondering what to use as the speaker information.
When you trained the i-vector system, did you change spkid to langid in the spk2utt and utt2spk files, or did you use the original speaker info to train the UBM and i-vector extractor? Either way, when we train the classifier, I think we should use languages as the labels. Is there any problem if we use speaker info to train the i-vector extractor and then classify the vectors into languages?
Hello,
When I run the recipe/voxceleb/runVoxceleb.sh example, these two files are required. Where can I download them?
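Hello,
Thank you for open-sourcing this project. I am a beginner who has just started with speaker recognition. If I want to build a speaker verification system, where should I start, and is there a corresponding recipe here?
Thanks!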
This issue is opened to notify you of PR #54.
The PR changes all has_key function calls to the in operator accordingly.
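For readers unfamiliar with the change, dict.has_key existed only in Python 2 and was removed in Python 3, while the in operator works in both. A minimal illustration (the config dictionary is made up for the example):

```python
config = {"gpu_id": "0,1"}

# Python 2 only: config.has_key("gpu_id")
# Python 3 removed dict.has_key; the membership operator works in both.
has_gpu = "gpu_id" in config
has_port = "port" in config
print(has_gpu, has_port)  # True False
```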
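Hello snowdar! Thank you for developing this research framework.
I am reading the resnet and snowdar parts of the PyTorch network framework and would like to ask whether there are accurate references for these two x-vector architectures.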
Welcome to discuss the training strategy here.
There are two typical training strategies, "SGD + Reduce Learning Rate on Plateau" and "Adam + Warm Restarts".
SGD + Reduce Learning Rate on Plateau:
(1) Training is slow but can generalize well.
(2) The parameters of ReduceLROnPlateau, such as patience and the learning rate scale, should be set carefully.
......
Adam + Warm Restarts:
(1) It is not clear how to set T for Warm Restarts.
(2) It is hard to decide how many restarts there should be.
......
In fact, I am still not sure how the value of weight decay influences the results when training with these two strategies. Are there any other factors that decide the final performance when comparing the two strategies?
Welcome to comment and share your experiments.
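To make the role of T in Warm Restarts concrete, here is a sketch of the usual cosine annealing with warm restarts schedule, where the cycle length starts at t_0 epochs and is multiplied by t_mult after each restart. The function name and default values are illustrative assumptions, not settings taken from the toolkit.

```python
import math


def warm_restart_lr(epoch, base_lr=0.001, min_lr=1e-6, t_0=5, t_mult=2):
    """Learning rate at `epoch` for cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:  # walk past the cycles that are already finished
        t_cur -= t_i
        t_i *= t_mult
    cos_term = 1 + math.cos(math.pi * t_cur / t_i)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


# Epochs at which the learning rate jumps back to base_lr.
restarts = [e for e in range(40) if math.isclose(warm_restart_lr(e), 0.001)]
print(restarts)  # [0, 5, 15, 35]
```

With t_0=5 and t_mult=2 the cycles last 5, 10 and 20 epochs, so the choice of T directly fixes how many restarts fit into a given training budget.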
I noticed that the performance improves significantly with AS-norm for ECAPAXvector (EER 1.506 -> 1.140). Could you provide the cohort file you chose?
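Hello Zhao Miao,
I started from your runPhoneticXvector.sh training script and modified the network part myself.
I have 89 iters in total, but an error occurred when training reached iter 28.
The error is similar to the one in this forum thread: https://groups.google.com/g/kaldi-help/c/F7cud3lbDMo/m/VuNDG-qRBgAJ
My error message is as follows: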
WARNING (nnet3-train[5.5]): ConstrainOrthonormalInternal():nnet-utils.cc:1055) Ratio is nan (should be >=1.0);component is tdnnf10.liner
ASSERTION_FAILED (net-trian [5.5]: ConstrainOrthonormalInternal():nnet-utils.cc:1057) Assertion failed: (ratio > 0.9)
In train.py, the "# Recover checkpoint" step does not save the weights of the loss and only loads the model. Is that a problem?
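First of all, thank you very much for such an excellent open-source project.
When training with the online training scripts subtools/pytorch/lanucher/*_online.py, a label out-of-range error is reported.
After investigation, the cause is as follows:
In subtools/pytorch/pipeline/preprocess_wav_egs.sh, the get_chunk_egs function in subtools/pytorch/pipeline/onestep/get_raw_wav_chunk.py first generates the utt2spk_int file for the whole dataset (dataset.generate("utt2spk_int")) and then splits it into the training and validation sets (trainset, valid = dataset.split(args.valid_num_utts, args.valid_split_type)). When a speaker has only one utterance and limit_utts=1 in runEcapaXvector_online.py, that speaker may be placed entirely in the validation set, so the actual number of speakers in the training set decreases while the maximum label is still that of the whole dataset.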
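The effect can be reproduced with a toy utt2spk mapping. The sketch below shows one possible remedy, building the integer labels from the training split only; the helper names are hypothetical and this is not necessarily how the toolkit resolves it (keeping single-utterance speakers out of the validation set would be another option).

```python
def global_labels(utt2spk):
    """Integer labels built from the whole dataset (the order described above)."""
    spk2int = {s: i for i, s in enumerate(sorted(set(utt2spk.values())))}
    return {u: spk2int[s] for u, s in utt2spk.items()}


def relabel_train(utt2spk, valid_utts):
    """Integer labels built only from utterances that stay in the training set."""
    train = {u: s for u, s in utt2spk.items() if u not in valid_utts}
    spk2int = {s: i for i, s in enumerate(sorted(set(train.values())))}
    return {u: spk2int[s] for u, s in train.items()}, len(spk2int)


utt2spk = {"u1": "spkA", "u2": "spkA", "u3": "spkB", "u4": "spkC", "u5": "spkC"}
valid_utts = {"u3"}  # spkB has one utterance and lands entirely in validation

old = {u: l for u, l in global_labels(utt2spk).items() if u not in valid_utts}
print(max(old.values()), len(set(old.values())))   # 2 2 -> label 2 is out of range

labels, num_targets = relabel_train(utt2spk, valid_utts)
print(max(labels.values()), num_targets)           # 1 2 -> labels fit again
```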
Hi.
We want to ask two questions about the evaluation results of the OLR2020 Challenge baseline system.
We noticed that in Table 2 the official result for task 1 using i-vector is Cavg 0.2965, EER 19.40%, but we get Cavg 0.2997, EER 29.91%. We suspect that a wrong EER number may have been entered in Table 2 by mistake. As in the pictures below, the EER of 19.40% appears not only in Table 2 but also in Table 3 for the same task, and the Cavg and EER in Table 3 are not far apart, unlike in Table 2. In other literature that uses Cavg and EER as evaluation criteria, we also rarely see such a large difference between the two numbers. We hope the organizers can check the results, thanks!
We are using the formula in the picture above to calculate Cavg in each task. But there is something we don't understand in the Python code computeCavg.py, which gives a total prior probability greater than 1 in task 2, an open-set task.
In task 1, we have 6 languages in both the enrollment set and the test set, and the code gives a prior probability of 0.5 for the target language and 0.1 for each non-target language. There is no problem here. It computes Cavg like
In task 2, we have 6 languages in the test set but 3 in the enrollment set. The program sets lang_num to 3 and gets a prior probability of 0.25 for each non-target language. What is odd is that the program treats the other 3 languages, which are not in the enrollment set, as one additional non-target language and also gives it a prior probability of 0.25. I have followed each step of this code and obtained some of the parameters shown below. The code is like
line 113 p_nontarget = (1 - p_target) / (lang_num - 1) # lang_num=3, p_nontarget=0.25
line 114 target_cavg[lang] = p_target * p_miss + p_nontarget*sum(p_fa) # p_fa is a list, length is 3, p_fa[2] represents false alarm probability of the overall of three languages not in enrollset
Finally it computes Cavg like
where Ln (n=3) represents all the languages not in the enrollment set taken together. So from the formula above, we get a total prior probability of 0.5+0.25*3=1.25
which is greater than 1. I don't know whether I have misunderstood this formula or how the program works. Could you please give us some hints on this?
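The arithmetic in the question can be checked with a few lines. This sketch only reproduces the numbers as described in the report (p_target of 0.5, the quoted p_nontarget formula, and the number of false-alarm terms that get summed); total_prior is a hypothetical helper, not part of computeCavg.py.

```python
def total_prior(p_target, lang_num, num_fa_terms):
    """Sum of the priors that weight the miss and false-alarm terms."""
    p_nontarget = (1 - p_target) / (lang_num - 1)  # same formula as the quoted line 113
    return p_target + p_nontarget * num_fa_terms


# Task 1: 6 enrolled languages, 5 non-target terms per target language.
print(total_prior(0.5, 6, 5))  # 1.0
# Task 2: lang_num is 3, but p_fa has 3 entries (2 enrolled non-targets
# plus one entry for the pooled out-of-set languages).
print(total_prior(0.5, 3, 3))  # 1.25
```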
Sincerely
Yizhou Peng
Hello, when I replaced the TDNN model with resnet-xvector.py in your model for transfer learning, the following errors occurred during scoring. All layers except the loss layer were transferred. I hope to get your answer and look forward to your reply. Thank you.
ERROR (ivector-compute-plda[5.5]:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite.
[ Stack-Trace: ]
/home/yqc/kaldi-master/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7f68e936c732]
ivector-compute-plda(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x564450bdc4e9]
/home/yqc/kaldi-master/src/lib/libkaldi-matrix.so(kaldi::TpMatrix::Cholesky(kaldi::SpMatrix const&)+0x1b1) [0x7f68e95d73d1]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(+0x1b99a) [0x7f68e9a7399a]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(kaldi::PldaEstimator::GetOutput(kaldi::Plda*)+0x1c6) [0x7f68e9a75e00]
/home/yqc/kaldi-master/src/lib/libkaldi-ivector.so(kaldi::PldaEstimator::Estimate(kaldi::PldaEstimationConfig const&, kaldi::Plda*)+0x195) [0x7f68e9a76617]
ivector-compute-plda(main+0xd13) [0x564450bdb86d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f68e894bc87]
ivector-compute-plda(_start+0x2a) [0x564450bdaa7a]
kaldi::KaldiFatalError(subtools)
I get an error when exporting a trained model with export_jit_model. Has anyone else run into this?
RuntimeError:
Module 'Xvector' has no attribute 'embd_dim' :
File "/work/kaldi/egs/xmuspeech/sre/exp/standard_voxceleb1/config/snowdar_xvector.py", line 337
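When running the sh files under recipe, there are many problems with resolving relative paths, most often failures to find . subtools/*.sh, because there is no subtools directory under recipe. I could only run them by adding the subtools path to PATH and changing the code to . *.sh. I am not sure whether I set something up incorrectly or whether the relative paths in your code are the problem.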
Hi, and big kudos to your asv-subtools for both its academic and practical contributions!
I found that the ResNet34 setting in the toolkit does not have a clear reference. For the other x-vector networks the references are quite clear; could I have one for this model please? Maybe from your group?
By ResNet34, I mean this implemented class.
Hi, thank you for this great tool. When I run the voxcelebSRC recipe, I get some errors when doing AS-norm:
Hi, Snowdar!
In ap-olr2020-baseline, I generated x-vectors with run_pytorch_xvector.py. Now I want to score my enroll sets and test sets with subtools/scoreSets.sh (in my datasets each speaker has only one utterance), but the following error appears:
[Auto find] Your vectortype is xvector
[Notice] It will set the default config task3_enroll[task3_enroll task3_enroll task3_test] for lda, submean and whiten, if used.
allsets:task3_enroll task3_test task3_enroll task3_enroll task3_enroll task3_test task3_enroll task3_enroll task3_enroll task3_test
[ lr ]
ivector-normalize-length --scaleup=false scp:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector.scp ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_norm.ark
LOG (ivector-normalize-length[5.5.804~1-a8c6]:main():ivector-normalize-length.cc:90) Processed 21580 iVectors.
LOG (ivector-normalize-length[5.5.804~1-a8c6]:main():ivector-normalize-length.cc:94) Average ratio of iVector to expected length was 44.3168, standard deviation was 3.67065
ivector-compute-lda --dim=100 --total-covariance-factor=0.1 ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_norm.ark ark:data/mfcc_20_5.0/task3_enroll/utt2spk exp/pytorch_xvector/far_epoch_21/task3_enroll/transform_100.mat
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:288) Read 21580 utterances, 0 with errors.
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:294) Computing within-class covariance.
LOG (ivector-compute-lda[5.5.804~1-a8c6]:main():ivector-compute-lda.cc:299) 2-norm of iVector mean is 0.771824
LOG (ivector-compute-lda[5.5.804~1-a8c6]:ComputeLdaTransform():ivector-compute-lda.cc:136) Stats have 21580 speakers, 21580 utterances.
ASSERTION_FAILED (ivector-compute-lda[5.5.804~1-a8c6]:ComputeLdaTransform():ivector-compute-lda.cc:137) Assertion failed: (!stats.Empty())
[ Stack-Trace: ]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x808) [0x7f4b1168e35c]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x59) [0x7f4b1168eda3]
ivector-compute-lda(kaldi::ComputeLdaTransform(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, kaldi::Vector, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, kaldi::Vector> > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > > const&, float, float, kaldi::MatrixBase*)+0x705) [0x40d38f]
ivector-compute-lda(main+0xd6e) [0x40e6eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f4b09e2a840]
ivector-compute-lda(_start+0x29) [0x40ca89]
ivector-transform exp/pytorch_xvector/far_epoch_21/task3_enroll/ scp:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector.scp ark:exp/pytorch_xvector/far_epoch_21/task3_enroll/xvector_lda100.ark
ERROR (ivector-transform[5.5.804~1-a8c6]:Read():kaldi-matrix.cc:1617) Failed to read matrix from stream. : Expected "[", got EOF File position at start is -1, currently -1
[ Stack-Trace: ]
/home/lanhaile/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x808) [0x7fbf6bdd735c]
ivector-transform(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a709]
/home/lanhaile/kaldi/src/lib/libkaldi-matrix.so(kaldi::Matrix::Read(std::istream&, bool, bool)+0x1a82) [0x7fbf6c020f76]
/home/lanhaile/kaldi/src/lib/libkaldi-util.so(void kaldi::ReadKaldiObject<kaldi::Matrix >(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, kaldi::Matrix*)+0x239) [0x7fbf6c298bdc]
ivector-transform(main+0xeb) [0x409681]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbf64573840]
ivector-transform(_start+0x29) [0x4094c9]
kaldi::KaldiFatalError
awk: cmd. line:1: fatal: cannot open file `exp/pytorch_xvector/far_epoch_21/task3_test/lr_task3_enroll_task3_test_lda100_submean_norm.score' for reading (No such file or directory)
Tansforming score to table done.
Looking forward to your answer. Many thanks!
The current runtime is based on libtorch:
https://github.com/Snowdar/asv-subtools/tree/master/runtime
Is there any plan to provide a script for exporting to ONNX?
I noticed the warning in the source code for torch.utils.data.distributed.DistributedSampler:
.. warning::
In distributed mode, calling the :math`set_epoch(epoch) <set_epoch>` method at
the beginning of each epoch **before** creating the :class:`DataLoader` iterator
is necessary to make shuffling work properly across multiple epochs. Otherwise,
the same ordering will be always used.
So should we add data.train_sampler.set_epoch(this_epoch) at the beginning of every epoch? zhihu
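Why the call matters can be shown with a toy sampler that, like the warning describes, derives its shuffle from a seed plus the current epoch. This is a simplified stand-in written with the standard library; ToyEpochSampler is hypothetical and is not the PyTorch implementation.

```python
import random


class ToyEpochSampler:
    """Toy stand-in for a shuffling sampler seeded with seed + epoch."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.num_samples))
        # The permutation depends only on seed + epoch.
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


sampler = ToyEpochSampler(100)

# Without set_epoch the same ordering is produced in every epoch.
without_set_epoch = [list(sampler) for _ in range(2)]
print(without_set_epoch[0] == without_set_epoch[1])  # True

# Calling set_epoch before iterating changes the seed for each epoch.
with_set_epoch = []
for epoch in range(2):
    sampler.set_epoch(epoch)
    with_set_epoch.append(list(sampler))
```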
Hi, thanks for the great job. I have tried to use the Mixup learning strategy,
but I got an error that says:
Mixup object has no attribute 'lam'
I think this may be an implementation bug.
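The amount of data to download is too large and reproducing the pipeline takes too long. I work on NLP and have recently started on speaker recognition. For a newcomer, reproducing the pipeline is not very friendly and takes some effort. It would help if a small subset of VoxCeleb data were provided, so that the whole pipeline could be reproduced quickly without waiting for the full download.
Hello, thank you for providing such an excellent tool. There is one thing I have never understood: why is the feature dimension configured by sre-fbank-81.conf 81 rather than 80? I see that num-mel-bins is set to 80, while in the very similar sre-fbank-40.conf num-mel-bins is set to 40 and the feature dimension is 40.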
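Hello Zhao Miao!
Thank you for taking the time to read my email.
I have recently been doing some speaker recognition research with the open-source toolkit ASV-subtools and ran into a small problem I would like to ask about.
I am currently training a ResNet model with the runResnetXvector.py script. With the default parameters the model has finished training, and the loss and acc curves both look normal. I then replaced the softmax loss with the am-softmax loss, set the hyperparameter m to 0.3, and used the annealing schedule, which gradually transitions from the softmax loss to the am-softmax loss. With this change, the training acc dropped to 70%, and the loss first decreased and then increased. If m is set to 0.1, the acc curve shows the model converging much faster. Based on this, I have three questions:
(1) From the results, model performance seems very sensitive to the value of m. Is this normal?
(2) In theory, am-softmax separates classes better and should therefore give a higher acc than the softmax loss, but in the attached figure the accuracy is much lower and the loss rises sharply after decreasing. Is this normal?
(3) Is there a good way to speed up convergence under the am-softmax loss?
Looking forward to your reply.
Best wishes!
--------------------------------------------------------------------------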
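Hello Lou Yijie,
In general, a few parameters need attention when using AM-softmax. Besides enabling this loss, consider whether the last layer keeps bn and relu, and whether the use_step parameter section should enable the strategy of gradually increasing the margin. With that in mind, my understanding of your questions is as follows:
(1) As a penalty, m is fairly sensitive for model training. Too large a value can cause convergence problems, and poor training hurts performance. Also, if you have not removed the relu of the last layer, the classification space is much smaller (non-negative values mean only the first orthant), in which case a large m is even less suitable. We usually use 0.2, for reference only.
(2) Regarding the acc comparison, there is no absolute proportional relationship; acc should be judged mainly with overfitting in mind. Also, the comparison should use the valid set acc, since comparing trainset acc is much less meaningful. The loss dropping and then rising sharply may be because you plotted the trainset loss. The trainset loss includes the penalty term (obtaining the true loss requires computing it a second time, which is usually skipped because it costs time), and the penalty keeps increasing, so this loss is unreliable. Perhaps you can look at the validset instead.
(3) The AM-softmax loss itself can speed up training to some extent, but training with it directly may give poor results, so the more robust gradual training strategy is chosen by default. In fixed-epoch training, if you find that AM converges worse than Softmax in the later stage, this usually means your penalty is too large to converge well. In addition, training speed also depends on the optimizer.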
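The point about the training loss growing together with the penalty can be illustrated numerically. The sketch below uses the standard AM-softmax form (scaled cosine logits with the margin subtracted from the target class) and a linear margin ramp. The ramp, the scale of 30 and the function names are assumptions for illustration, not the schedule implemented in the toolkit.

```python
import math


def ramped_margin(step, final_margin=0.2, ramp_steps=1000):
    """Linearly increase the margin from 0 to final_margin (hypothetical schedule)."""
    return final_margin * min(1.0, step / ramp_steps)


def am_softmax_loss(cosines, target, margin, scale=30.0):
    """Cross entropy over scaled cosines with the margin subtracted from the target."""
    logits = [scale * (c - margin if i == target else c)
              for i, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


cosines = [0.7, 0.3, 0.1]  # cosine similarity of one sample to each class weight
for step in (0, 500, 1000):
    m = ramped_margin(step)
    print(step, round(m, 2), am_softmax_loss(cosines, 0, m))
```

For identical cosines the reported loss rises as the margin ramps up, so a rising training loss during the ramp does not by itself mean the model is getting worse.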
Best wishes!