Comments (30)

padeoe commented on August 13, 2024

Hmm, that looks about right. epoch=1 is too few.

from cail2019.

zhouyang-bigdata commented on August 13, 2024

The result came from judge.py. Your judge.py isn't the real scoring tool, is it? The real scorer is the one provided by the official CAIL (法研杯) site, but they haven't released it publicly, right?

padeoe commented on August 13, 2024

No, the organizers did release judger.py. This project's judger.py matches it, with minor modifications.

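zhouyang-bigdata commented on August 13, 2024

I'm not asking about the code. I know the judge.py code comes from the official site. What I mean is: in your code, the labels in ground_truth.txt appear to be generated from random numbers. So does that mean the real label answers are held by the official site?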

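padeoe commented on August 13, 2024

No. ground_truth.txt is generated from the labeled dataset. In the data the organizers provide, B is always more similar to A than C is, so the randomness is there only to keep every label from falling into the same one of the two classes. The order of B and C is also swapped according to the label (here), so the labels still match the data.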
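
The B/C shuffling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout (a dict with fields `A`, `B`, `C`), the function name `shuffle_candidates`, and the label strings `"B"` and `"C"` are all assumptions made for the example.

```python
import random


def shuffle_candidates(records, seed=0):
    """Randomly swap B and C so the gold label is not always the same class.

    Assumes every input record has B more similar to A than C is.
    Returns (shuffled_records, labels), where each label names the field
    that now holds the more similar document.
    """
    rng = random.Random(seed)
    shuffled, labels = [], []
    for rec in records:
        if rng.random() < 0.5:
            # Keep the original order: B is still the similar one.
            shuffled.append({"A": rec["A"], "B": rec["B"], "C": rec["C"]})
            labels.append("B")
        else:
            # Swap B and C, and flip the label so it still matches the data.
            shuffled.append({"A": rec["A"], "B": rec["C"], "C": rec["B"]})
            labels.append("C")
    return shuffled, labels
```

Because the label is flipped in the same branch that swaps the fields, the label for each record always points at the document that was originally B.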

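zhouyang-bigdata commented on August 13, 2024

Oh, I see. Thanks for the reply. But scoring still has to be done on the organizers' side, doesn't it? After all, I don't have the answers for the predictions, so I can't score them myself. That's why my result is only 0.52.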

zhouyang-bigdata commented on August 13, 2024

May I ask, is it possible to score it myself?

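padeoe commented on August 13, 2024

Yes. I split off part of the dataset as a test set for scoring. The dataset itself is labeled, so the resulting test set is labeled too and can be scored correctly. Your 0.52 should be that test score. It does seem a bit low to me; maybe one of my recent commits introduced a bug.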
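
A hold-out evaluation of the kind described above might look like the sketch below. The function names and the `test_ratio=0.2` default are assumptions for illustration and are not taken from the project.

```python
import random


def split_holdout(examples, test_ratio=0.2, seed=42):
    """Shuffle labeled examples and split off a held-out test set.

    Returns (train_examples, test_examples).
    """
    rng = random.Random(seed)
    items = list(examples)
    rng.shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels differ in length")
    if not gold_labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

Since the held-out examples keep their labels, `accuracy` can be computed locally without access to the organizers' hidden test set.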

zhouyang-bigdata commented on August 13, 2024

Yes, I find it odd too. I even used the pretrained model provided by Prof. Liu Zhiyuan: https://github.com/thunlp/OpenCLaP. In principle the score should be better.

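padeoe commented on August 13, 2024

> Yes, I find it odd too. I even used the pretrained model provided by Prof. Liu Zhiyuan: https://github.com/thunlp/OpenCLaP. In principle the score should be better.

Haha, then I know the cause. The OpenCLaP pretrained model has a "bug"; see my explanation in issue #8. I filed an issue on their project and it still hasn't been fixed. Solution: copy vocab.txt into the model directory produced by training.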
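
The workaround can be scripted along these lines. This is only a sketch: it assumes that vocab.txt lives in the pretrained model directory, and the helper name `copy_vocab` and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_vocab(pretrained_dir, output_dir):
    """Copy vocab.txt from the pretrained model directory (assumed location)
    into the directory where the fine-tuned model was saved."""
    src = Path(pretrained_dir) / "vocab.txt"
    if not src.is_file():
        raise FileNotFoundError(f"vocab.txt not found in {pretrained_dir}")
    dst_dir = Path(output_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(src, dst_dir / "vocab.txt"))
```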

zhouyang-bigdata commented on August 13, 2024

I tried it, and the accuracy actually dropped.

Is eval acc closely tied to the number of epochs?
Looking at your result:

2019-11-05 16:06:53 - train model - INFO - Epoch 2, train Loss: 475.4495771, eval acc: 0.8539215686274509, eval loss: 286.7390137

I reran it, and my result after the first epoch is:

Epoch 1/2, Loss 0.4610962: 100%|█████████▉| 2290/2291 [28:48:50<00:48, 48.40s/it]
Epoch 1/2, Loss 0.4610962: 100%|██████████| 2291/2291 [28:48:50<00:00, 49.52s/it]
2020-07-01 23:16:38 - train model - INFO - Epoch 1, train Loss: 1155.5741252, eval acc: 0.5309446254071661, eval loss: 28.0904787

0%| | 0/2291 [00:00<?, ?it/s]
Epoch 2/2, Loss 0.4629875: 0%| | 0/2291 [00:33<?, ?it/s]
Epoch 2/2, Loss 0.4629875: 0%| | 1/2291 [00:33<21:12:08, 33.33s/it]

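zhouyang-bigdata commented on August 13, 2024

Regarding

> Is eval acc closely tied to the number of epochs?

I don't think so. My eval acc doesn't reach the level you reported earlier. I trained on CPU; could that be related? I've thought about it for a long time and can't figure it out. Here is an excerpt of the log from my latest training run: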

2020-06-30 18:24:48 - train model - INFO - 算法:BertForSimMatchModel
2020-06-30 18:24:48 - train model - INFO - ***** Running training *****
2020-06-30 18:24:48 - train model - INFO - dataset: data/raw/CAIL2019-SCM-big/SCM_5k.json
2020-06-30 18:24:48 - train model - INFO - k-fold number: 1
2020-06-30 18:24:48 - train model - INFO - device: cpu n_gpu: 0
2020-06-30 18:24:48 - train model - INFO - config: {
"batch_size": 12,
"epochs": 2,
"fp16": false,
"fp16_opt_level": "O1",
"learning_rate": 2e-05,
"max_grad_norm": 1.0,
"max_length": 512,
"warmup_steps": 0.1
}
2020-06-30 18:25:00 - train model - INFO - ***** fold 1/1 *****
2020-06-30 18:25:00 - train model - INFO - Num examples = 27499
2020-06-30 18:25:00 - train model - INFO - Batch size = 12
2020-06-30 18:25:00 - train model - INFO - Num steps = 4582

0%| | 0/2291 [00:00<?, ?it/s]
Epoch 1/2, Loss 0.6622486: 0%| | 0/2291 [01:09<?, ?it/s]
Epoch 1/2, Loss 0.6622486: 0%| | 1/2291 [01:09<44:00:40, 69.19s/it]
Epoch 1/2, Loss 0.7014956: 0%| | 1/2291 [02:16<44:00:40, 69.19s/it]

Epoch 1/2, Loss 0.4610962: 100%|█████████▉| 2290/2291 [28:48:50<00:48, 48.40s/it]
Epoch 1/2, Loss 0.4610962: 100%|██████████| 2291/2291 [28:48:50<00:00, 49.52s/it]
2020-07-01 23:16:38 - train model - INFO - Epoch 1, train Loss: 1155.5741252, eval acc: 0.5309446254071661, eval loss: 28.0904787

0%| | 0/2291 [00:00<?, ?it/s]
Epoch 2/2, Loss 0.4629875: 0%| | 0/2291 [00:33<?, ?it/s]
Epoch 2/2, Loss 0.4629875: 0%| | 1/2291 [00:33<21:12:08, 33.33s/it]

zhouyang-bigdata commented on August 13, 2024

What could the cause be? I've tested with 10, 50, 100, 500, and 5k records, and eval acc always hovers around 0.5, in the range 0.44 to 0.53.

zhouyang-bigdata commented on August 13, 2024

This is the latest run, trained on 5k records on CPU. The log output is as follows:

2020-06-30 18:24:48 - train model - INFO - 算法:BertForSimMatchModel
2020-06-30 18:24:48 - train model - INFO - ***** Running training *****
2020-06-30 18:24:48 - train model - INFO - dataset: data/raw/CAIL2019-SCM-big/SCM_5k.json
2020-06-30 18:24:48 - train model - INFO - k-fold number: 1
2020-06-30 18:24:48 - train model - INFO - device: cpu n_gpu: 0
2020-06-30 18:24:48 - train model - INFO - config: {
"batch_size": 12,
"epochs": 2,
"fp16": false,
"fp16_opt_level": "O1",
"learning_rate": 2e-05,
"max_grad_norm": 1.0,
"max_length": 512,
"warmup_steps": 0.1
}
2020-06-30 18:25:00 - train model - INFO - ***** fold 1/1 *****
2020-06-30 18:25:00 - train model - INFO - Num examples = 27499
2020-06-30 18:25:00 - train model - INFO - Batch size = 12
2020-06-30 18:25:00 - train model - INFO - Num steps = 4582

0%| | 0/2291 [00:00<?, ?it/s]
Epoch 1/2, Loss 0.6622486: 0%| | 0/2291 [01:09<?, ?it/s]
Epoch 1/2, Loss 0.6622486: 0%| | 1/2291 [01:09<44:00:40, 69.19s/it]
Epoch 1/2, Loss 0.7014956: 0%| | 1/2291 [02:16<44:00:40, 69.19s/it]

Epoch 2/2, Loss 0.3055271: 100%|█████████▉| 2290/2291 [29:44:14<00:45, 45.23s/it]
Epoch 2/2, Loss 0.3055271: 100%|██████████| 2291/2291 [29:44:14<00:00, 45.21s/it]
2020-07-03 05:03:30 - train model - INFO - Epoch 2, train Loss: 862.6146737, eval acc: 0.5244299674267101, eval loss: 30.6137773
2020-07-03 05:03:38 - train model - INFO - ***** Stats *****
2020-07-03 05:03:38 - train model - INFO - acc for each epoch:
2020-07-03 05:03:38 - train model - INFO - epoch 1, mean: 0.53094, std: 0.00000
2020-07-03 05:03:38 - train model - INFO - epoch 2, mean: 0.52443, std: 0.00000
2020-07-03 05:03:38 - train model - INFO - ***** Training complete *****

padeoe commented on August 13, 2024

Are you using OpenCLaP or the original BERT?

zhouyang-bigdata commented on August 13, 2024

The latest 5k-record run used OpenCLaP. I also switched to other pretrained models (bert-base-chinese and chinese_wwm_ext_pytorch), and on small data eval acc still hovers around 0.5.

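padeoe commented on August 13, 2024

> The latest 5k-record run used OpenCLaP. I also switched to other pretrained models (bert-base-chinese and chinese_wwm_ext_pytorch), and on small data eval acc still hovers around 0.5.

I'll find time to test it today. The older version of the code was tested and worked fine, but I didn't retest after the later small changes. You could also try a version from a few commits back in this project; those should be fine.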

zhouyang-bigdata commented on August 13, 2024

OK, I'll take a look.

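zhouyang-bigdata commented on August 13, 2024

I trained your two versions from October 24 and October 21 on 10 records each, and the result is still around 0.5. Do you think it could be because I'm not using a GPU? It happens on both of my servers, one with 2 cores and one with 20 cores. Also, training takes about the same time on 2 cores as on 20.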

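padeoe commented on August 13, 2024

> I tested your two versions from October 24 and October 21 on 10 records each, and the result is still around 0.5. Do you think it could be because I'm not using a GPU? It happens on both of my servers, one with 2 cores and one with 20 cores. Also, training takes about the same time on 2 cores as on 20.

In theory there should be no difference, but judging from your tests maybe it really does matter... I've never tried training on CPU, so I can't say for sure.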

zhouyang-bigdata commented on August 13, 2024

I fixed the errors in the small training dataset, and after training eval acc came out at exactly 0.5. Very strange. Is yours 0.5 too?

zhouyang-bigdata commented on August 13, 2024

Here is a log excerpt for the 50-record training data:

2020-07-03 17:08:09 - train model - INFO - 算法:BertForSimMatchModel
2020-07-03 17:08:09 - train model - INFO - ***** Running training *****
2020-07-03 17:08:09 - train model - INFO - dataset: train_data/50/SCM_50.json
2020-07-03 17:08:09 - train model - INFO - k-fold number: 1
2020-07-03 17:08:09 - train model - INFO - device: cpu n_gpu: 0
2020-07-03 17:08:09 - train model - INFO - config: {
"batch_size": 12,
"epochs": 2,
"fp16": false,
"fp16_opt_level": "O1",
"learning_rate": 2e-05,
"max_grad_norm": 1.0,
"max_length": 512,
"warmup_steps": 0.1
}
2020-07-03 17:08:12 - train model - INFO - ***** fold 1/1 *****
2020-07-03 17:08:12 - train model - INFO - Num examples = 300
2020-07-03 17:08:12 - train model - INFO - Batch size = 12
2020-07-03 17:08:12 - train model - INFO - Num steps = 50

0%| | 0/25 [00:00<?, ?it/s]
Epoch 1/2, Loss 0.7200491: 0%| | 0/25 [00:28<?, ?it/s]
Epoch 1/2, Loss 0.7200491: 4%|▍ | 1/25 [00:28<11:13, 28.05s/it]

Epoch 2/2, Loss 0.4598025: 96%|█████████▌| 24/25 [14:33<00:34, 34.01s/it]
Epoch 2/2, Loss 0.4598025: 100%|██████████| 25/25 [14:33<00:00, 34.04s/it]
2020-07-03 17:37:43 - train model - INFO - Epoch 2, train Loss: 11.4871774, eval acc: 0.5, eval loss: 1.5408574
2020-07-03 17:37:44 - train model - INFO - ***** Stats *****
2020-07-03 17:37:44 - train model - INFO - acc for each epoch:
2020-07-03 17:37:44 - train model - INFO - epoch 1, mean: 0.50000, std: 0.00000
2020-07-03 17:37:44 - train model - INFO - epoch 2, mean: 0.50000, std: 0.00000
2020-07-03 17:37:44 - train model - INFO - ***** Training complete *****

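zhouyang-bigdata commented on August 13, 2024

Hi, I'm an individual who wants to use this code to build a news recommendation demo. I borrowed a friend's Titan GPU and trained on the 50-record data. The eval acc really is different from the CPU run. I'm quite confused.
The log is as follows: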

/content/my_cailmodel
2020-07-04 13:07:32.343522: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-04 13:07:33.932846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-04 13:07:33.948853: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:33.949845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-07-04 13:07:33.949898: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-04 13:07:33.952220: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-04 13:07:33.954218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-04 13:07:33.954604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-04 13:07:33.957184: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-04 13:07:33.958370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-04 13:07:33.962967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-04 13:07:33.963113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:33.964209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:33.965041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-04 13:07:33.971443: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2300000000 Hz
2020-07-04 13:07:33.971680: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6975180 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-04 13:07:33.971757: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-04 13:07:34.068340: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:34.069464: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6975340 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-04 13:07:34.069501: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-07-04 13:07:34.069837: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:34.070772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-07-04 13:07:34.070856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-04 13:07:34.070914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-04 13:07:34.070958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-04 13:07:34.070999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-04 13:07:34.071039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-04 13:07:34.071079: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-04 13:07:34.071119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-04 13:07:34.071253: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:34.072197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:34.073058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-04 13:07:38.054491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-04 13:07:38.054556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-04 13:07:38.054592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-04 13:07:38.054988: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:38.056061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-04 13:07:38.056913: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-07-04 13:07:38.056971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14598 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-07-04 13:07:38 - train model - INFO - 算法:BertForSimMatchModel
2020-07-04 13:07:38 - train model - INFO - ***** Running training *****
2020-07-04 13:07:38 - train model - INFO - dataset: train_data/50/SCM_50.json
2020-07-04 13:07:38 - train model - INFO - k-fold number: 1
2020-07-04 13:07:38 - train model - INFO - device: cuda n_gpu: 1
2020-07-04 13:07:38 - train model - INFO - config: {
"batch_size": 7,
"epochs": 2,
"fp16": false,
"fp16_opt_level": "O1",
"learning_rate": 2e-05,
"max_grad_norm": 1.0,
"max_length": 512,
"warmup_steps": 0.1
}
2020-07-04 13:07:44 - train model - INFO - ***** fold 1/1 *****
2020-07-04 13:07:44 - train model - INFO - Num examples = 300
2020-07-04 13:07:44 - train model - INFO - Batch size = 7
2020-07-04 13:07:44 - train model - INFO - Num steps = 84
Epoch 1/2, Loss 0.5268430: 100% 42/42 [00:41<00:00, 1.01it/s]
2020-07-04 13:08:27 - train model - INFO - Epoch 1, train Loss: 23.1619640, eval acc: 0.5, eval loss: 1.1643409
Epoch 2/2, Loss 0.3859729: 100% 42/42 [00:41<00:00, 1.01it/s]
2020-07-04 13:09:09 - train model - INFO - Epoch 2, train Loss: 17.1225259, eval acc: 0.8, eval loss: 1.1731486
2020-07-04 13:09:10 - train model - INFO - ***** Stats *****
2020-07-04 13:09:10 - train model - INFO - acc for each epoch:
2020-07-04 13:09:10 - train model - INFO - epoch 1, mean: 0.50000, std: 0.00000
2020-07-04 13:09:10 - train model - INFO - epoch 2, mean: 0.80000, std: 0.00000
2020-07-04 13:09:10 - train model - INFO - ***** Training complete *****

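zhouyang-bigdata commented on August 13, 2024

Hi. I've tested many times on GPU, using the organizers' Python environment, the bert-base-chinese model, the 5k data, and batch_size=6 or 7, and I can't reproduce your results. I also tried the October 21 code. eval acc is always around 0.55, in the range 0.5 to 0.58. What could the cause be?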

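padeoe commented on August 13, 2024

> Hi. I've tested many times on GPU, using the organizers' Python environment, the bert-base-chinese model, the 5k data, and batch_size=6 or 7, and I can't reproduce your results. I also tried the October 21 code. eval acc is always around 0.55, in the range 0.5 to 0.58. What could the cause be?

Hmm, wasn't the result in your previous reply pretty good? Is it that the 5k data still doesn't work?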

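zhouyang-bigdata commented on August 13, 2024

Yes. I trained on 500 and on 5k records, and both give around 0.55 (0.5 to 0.58). I really can't tell what's going on. The training data is 5k records, 26.7M; the test data is a 20% split, 5.34M.

> Hmm, wasn't the result in your previous reply pretty good? Is it that the 5k data still doesn't work?

That was the 50-record data; you can't see a trend from it.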
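
Why a single score on a tiny evaluation set shows no trend can be made concrete with a rough confidence interval. The sketch below uses the normal approximation to the binomial, which is itself crude for very small samples; the evaluation-set sizes in the loop are hypothetical and not taken from the logs.

```python
import math


def accuracy_interval(acc, n_eval, z=1.96):
    """Approximate 95% confidence interval for an accuracy measured on
    n_eval examples, using the normal approximation to the binomial."""
    if n_eval <= 0:
        raise ValueError("n_eval must be positive")
    half_width = z * math.sqrt(acc * (1.0 - acc) / n_eval)
    # Clamp to the valid range for an accuracy.
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical evaluation-set sizes, chosen only for illustration.
for n in (10, 100, 1000):
    low, high = accuracy_interval(0.8, n)
    print(f"n={n}: an observed 0.8 is compatible with [{low:.2f}, {high:.2f}]")
```

The interval narrows only with the square root of the number of evaluation examples, so a jump from 0.5 to 0.8 on a handful of examples says little about the model.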

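zhouyang-bigdata commented on August 13, 2024

I just used 2 Tesla V100s, which count as high-end cards. But the training time for one epoch is much longer than what you described, about 1 hour 15 minutes (possibly because I'm on a shared GPU). apex is false. Is my data different from yours? The training data is 5k records, 26.7M; the test data is a 20% split, 5.34M, generated by data.py.
This used the October 21 code.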

zhouyang-bigdata commented on August 13, 2024

I used 2 Tesla V100 cards again.
Here is an excerpt of the training log:

2020-07-05 17:34:46 - train model - INFO - 算法:BertForSimMatchModel
2020-07-05 17:34:46 - train model - INFO - ***** Running training *****
2020-07-05 17:34:46 - train model - INFO - dataset: train_data/5k/SCM_5k.json
2020-07-05 17:34:46 - train model - INFO - k-fold number: 1
2020-07-05 17:34:46 - train model - INFO - device: cuda n_gpu: 2
2020-07-05 17:34:46 - train model - INFO - config: {
"batch_size": 12,
"epochs": 2,
"fp16": false,
"learning_rate": 2e-05,
"max_length": 512
}
2020-07-05 17:35:03 - train model - INFO - ***** fold 1/1 *****
2020-07-05 17:35:03 - train model - INFO - Num examples = 27499
2020-07-05 17:35:03 - train model - INFO - Batch size = 12
2020-07-05 17:35:03 - train model - INFO - Num steps = 4582

0%| | 0/2291 [00:00<?, ?it/s]/usr/local/python3/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '

Epoch 1/2, Loss 0.6857975: 0%| | 0/2291 [00:05<?, ?it/s]
Epoch 1/2, Loss 0.6857975: 0%| | 1/2291 [00:05<3:32:41, 5.57s/it]
Epoch 1/2, Loss 0.7025878: 0%| | 1/2291 [00:07<3:32:4

Epoch 1/2, Loss 0.3436253: 100%|█████████▉| 2289/2291 [1:10:57<00:03, 1.85s/it]
Epoch 1/2, Loss 0.3436253: 100%|█████████▉| 2290/2291 [1:10:57<00:01, 1.85s/it]
Epoch 1/2, Loss 0.3884579: 100%|█████████▉| 2290/2291 [1:10:59<00:01, 1.85s/it]
Epoch 1/2, Loss 0.3884579: 100%|██████████| 2291/2291 [1:10:59<00:00, 1.85s/it]
2020-07-05 18:47:06 - train model - INFO - Epoch 1, train Loss: 976.8484841, eval acc: 0.5745098039215686, eval loss: 89.2770401

0%| | 0/2291 [00:00<?, ?it/s]

zhouyang-bigdata commented on August 13, 2024

Hi, do you have a moment?

Could you send me a copy of your training data? QQ 2648759823

zhouyang-bigdata commented on August 13, 2024

It turned out the dataset was different. Also, it's best not to train on CPU.
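
A dataset mismatch like this can be caught early by comparing a simple fingerprint of the data files on both sides. The sketch below is one possible check, not part of the project; it assumes the dataset is a text file with one record per line, and the function name `fingerprint` is hypothetical.

```python
import hashlib


def fingerprint(path):
    """Return (number of non-empty lines, SHA-256 hex digest) for a file."""
    digest = hashlib.sha256()
    n_lines = 0
    with open(path, "rb") as f:
        for line in f:
            digest.update(line)
            if line.strip():
                n_lines += 1
    return n_lines, digest.hexdigest()
```

If two people report different line counts or digests for what should be the same training file, their results are not directly comparable.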
