Comments (7)
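The number below each line is the length of attr. With get_tops, the results match the command-line invocation and the attr lengths are correct: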
电影 n 25.0200004577637(6)
1
创作 vn 14.8199996948242(3)
2
版权 n 14.7600002288818(3)
1
专有 vn 13.9399995803833(2)
2
陈凯歌 nr 11.8699998855591(1)
2
[root@localhost ~]# scws -r /etc/scws/rules.utf8.ini -d /usr/share/scws/dict.utf8.xdb -c utf8 -I -A -t 5 -a~v -i 110.txt
No. WordString Attr Weight(times)
01. 电影 n 25.02(6)
02. 创作 vn 14.82(3)
03. 版权 n 14.76(3)
04. 专有 vn 13.94(2)
05. 陈凯歌 nr 11.87(1)
from scws.
I modified get_words in scws, replacing
top = (scws_top_t) malloc(sizeof(struct scws_topword));
with
top = (scws_top_t) pmalloc_z(xt->p, sizeof(struct scws_topword));
and the program then fails with a segmentation fault.
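I modified scws_cmd in cli so that it also prints the length of attr for each entry returned by get_words, and this does reveal a problem. The patch: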
--- scws-1.2.3/cli/scws_cmd.c.orig 2013-01-06 13:39:51
+++ scws-1.2.3/cli/scws_cmd.c 2022-12-22 20:42:38
@@ -286,6 +286,25 @@
             fprintf(fout, "EMPTY records!\n");
         }
 
+        fprintf(fout, "No. WordString Attr Weight(times)\n");
+        fprintf(fout, "-------------------------------------------------\n");
+        if ((top = xtop = scws_get_words(s, attr)) != NULL)
+        {
+            tlimit = 1;
+            while (xtop != NULL)
+            {
+                fprintf(fout, "%02d. %-24.24s %-4.2s %.2f(%d) %4d-\n",
+                        tlimit, xtop->word, xtop->attr, xtop->weight, xtop->times, (int) strlen(xtop->attr));
+                xtop = xtop->next;
+                tlimit++;
+            }
+            scws_free_tops(top);
+        }
+        else
+        {
+            fprintf(fout, "EMPTY records!\n");
+        }
+
         if (xmode & XMODE_STAT_FILE)
             free(str);
     }
The last column of the output is the length of attr:
01. 研究 vn 4.45(1) 5-
02. 生命科学 n 7.37(1) 1-
03. 北京 ns 6.35(1) 5-
04. 大学生 n 4.70(1) 1-
05. 喝 vn 0.00(1) 5-
06. 进口 vn 4.87(1) 5-
07. 红酒 n 6.17(1) 1-
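get_tops also copies 2 bytes and does not show this behavior. I don't understand why, in get_words, the reported length stops matching as soon as the attr is 2 bytes long.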
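strlen only works on strings terminated by '\0', but xtop->attr is just a two-element char array. When the attribute has two characters (for example vn), neither byte is '\0', so strlen reads past the end of the array and returns whatever length it happens to find.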
Related Issues (20)
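- Build error when compiling the PHP 7.3 extension
- Error when running commands after installing on Ubuntu 18
- How to build a DLL usable with PHP FFI
- How do I install the scws extension with pecl?
- Is the project no longer maintained? The dictionary is still from 2016 and quite outdated; when will it be updated?
- Build error with PHP 8.0.0 (1.2.3)
- Custom dictionary seems to have no effect
- Does scws-gen-dict currently support converting xdb to txt format?
- How to customize rules.ini
- Could the build be adapted for PHP 8+?
- Error during make when building the PHP extension for php-8.1.7
- Is there a bug in the string comparison in _xtree_node_search()?
- Build error: may be used uninitialized in this function
- Segmentation fails: all Chinese text is split into single characters, and every Chinese character gets the un attribute
- Can segmentation be restricted to Chinese text only?
- Code cloned directly with git fails to build the PHP 8.0 extension
- PHP 8.1: after a space or special symbols such as ++, the following words are no longer segmented
- Suggestion: add const to the xattr parameter
- Suggestion: unify the structures of xdb and xtree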