Strange behavior of attr in the top_word struct (about scws, 7 comments, CLOSED)

1371030 commented on June 30, 2024

Strange behavior of attr in the top_word struct

from scws.

Comments (7)

1371030 commented on June 30, 2024

The number under each line is the length of attr. With get_tops the result matches the command-line tool, and the attr length is normal:
电影 n 25.0200004577637(6)
1
创作 vn 14.8199996948242(3)
2
版权 n 14.7600002288818(3)
1
专有 vn 13.9399995803833(2)
2
陈凯歌 nr 11.8699998855591(1)
2

`[root@localhost ~]# scws -r /etc/scws/rules.utf8.ini -d /usr/share/scws/dict.utf8.xdb -c utf8 -I -A -t 5 -a~v -i 110.txt`
No. WordString Attr Weight(times)

电影 n 25.02(6)
创作 vn 14.82(3)
版权 n 14.76(3)
专有 vn 13.94(2)
陈凯歌 nr 11.87(1)

1371030 commented on June 30, 2024

I modified get_words in scws, changing

    top = (scws_top_t) malloc(sizeof(struct scws_topword));

to

    top = (scws_top_t) pmalloc_z(xt->p, sizeof(struct scws_topword));

and it now fails with a segmentation fault.

1371030 commented on June 30, 2024

I modified scws_cmd in cli to print the length of attr as returned by get_words, and there is indeed a problem. The patch:
//--- scws-1.2.3/cli/scws_cmd.c.orig 2013-01-06 13:39:51
//+++ scws-1.2.3/cli/scws_cmd.c 2022-12-22 20:42:38
//@@ -286,6 +286,25 @@
// fprintf(fout, "EMPTY records!\n");
// }
//
//+ fprintf(fout, "No. WordString Attr Weight(times)\n");
//+ fprintf(fout, "-------------------------------------------------\n");
//+ if ((top = xtop = scws_get_words(s, attr)) != NULL)
//+ {
//+ tlimit = 1;
//+ while (xtop != NULL)
//+ {
//+ fprintf(fout, "%02d. %-24.24s %-4.2s %.2f(%d) %4d-\n",
//+ tlimit, xtop->word, xtop->attr, xtop->weight, xtop->times, strlen(xtop->attr));
//+ xtop = xtop->next;
//+ tlimit++;
//+ }
//+ scws_free_tops(top);
//+ }
//+ else
//+ {
//+ fprintf(fout, "EMPTY records!\n");
//+ }
//+
// if (xmode & XMODE_STAT_FILE)
// free(str);
// }

In the output, the last column is the length of attr:

研究 vn 4.45(1) 5-
生命科学 n 7.37(1) 1-
北京 ns 6.35(1) 5-
大学生 n 4.70(1) 1-
喝 vn 0.00(1) 5-
进口 vn 4.87(1) 5-
红酒 n 6.17(1) 1-

hightman commented on June 30, 2024

It is at most 2 bytes; you cannot use it directly as a string.
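One way to act on this, sketched under assumptions: copy the two bytes into a 3-byte buffer and terminate it before handing it to any string function. The helper name `attr_to_cstr` is hypothetical and not part of scws; the only thing taken from the thread is that attr holds at most 2 bytes and may have no terminator.

```c
#include <string.h>

/* Hypothetical helper (not part of scws): copy a 2-byte attr field into a
 * 3-byte buffer and terminate it, so the result is a valid C string. */
static void attr_to_cstr(const char attr[2], char out[3])
{
    memcpy(out, attr, 2);  /* copy exactly the two bytes that exist */
    out[2] = '\0';         /* guarantee termination */
}
```

After the call, `strlen(out)` is bounded by 2: a one-letter attribute such as `n` stored as `{'n', '\0'}` gives 1, and a two-letter attribute such as `vn` gives 2.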

1371030 commented on June 30, 2024

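get_tops also copies 2 bytes and does not show this behavior. I don't understand why, with get_words, the reported length is wrong whenever attr is a full 2 bytes.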

hightman commented on June 30, 2024

The structures are different.

l1t1 commented on June 30, 2024

strlen only works on strings terminated by '\0'; xtop->attr is just a two-element char array.
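A minimal sketch of a bounded alternative to `strlen` for such a field, assuming only what is stated above (a two-element char array that may lack a terminator). The names `ATTR_SIZE`, `attr_len` and `format_attr` are hypothetical and not part of scws.

```c
#include <stddef.h>
#include <stdio.h>

#define ATTR_SIZE 2  /* size of the attr array, as described in the thread */

/* Length of attr without ever reading past the array. */
static size_t attr_len(const char attr[ATTR_SIZE])
{
    size_t n = 0;
    while (n < ATTR_SIZE && attr[n] != '\0')
        n++;
    return n;
}

/* Write "<attr> <length>" into buf. The "%.2s" precision makes the printf
 * family read at most 2 bytes, so no terminator is required in attr. */
static int format_attr(char *buf, size_t size, const char attr[ATTR_SIZE])
{
    return snprintf(buf, size, "%.2s %zu", attr, attr_len(attr));
}
```

This would also explain why the attr column in the patched output prints correctly while the length column does not: the format string there already limits the read with `%-4.2s`, whereas `strlen` keeps scanning past the array until it happens to hit a zero byte.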
