Comments (7)
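Actually, I have one more question. The n-grams produced by slicing the text directly are enormous in volume. Even with an external-file merge sort, the sort takes a long time, and computing freq from the sorted result is slow too. Why not build a dict while slicing, count the n-grams as you go, and write each n-gram with its frequency to a file at the end? Sorting after that would involve far less data. I have always wondered about this.
Oh, I just saw your fastbuilder. Nice (666).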
from dict_build.
@DBtxy It has been six years, so I may not know this code as well as you do by now. Let me take a look first~
from dict_build.
Haha, OK, no rush.
from dict_build.
It runs without problems now, right?
from dict_build.
It should run fine. The later logic masks the effect of this boundary issue, so the difference is probably only a few strings.
from dict_build.
Very thorough. I'll find time to take a look, and then we can discuss.
from dict_build.
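Actually, I have one more question. The n-grams produced by slicing the text directly are enormous in volume. Even with an external-file merge sort, the sort takes a long time, and computing freq from the sorted result is slow too. Why not build a dict while slicing, count the n-grams as you go, and write each n-gram with its frequency to a file at the end? Sorting after that would involve far less data. I have always wondered about this.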
from dict_build.
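The idea raised in the question above (count n-grams in an in-memory dict while slicing, then sort only the distinct n-grams) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names `count_ngrams` and `write_sorted`, the `max_n` parameter, and the tab-separated output format are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(lines, max_n=3):
    """Count character n-grams (lengths 1..max_n) while slicing.

    Hypothetical helper: it aggregates counts in memory instead of
    emitting one record per n-gram occurrence.
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        for n in range(1, max_n + 1):
            # Slide a window of width n across the line.
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def write_sorted(counts, path):
    """Write one 'ngram<TAB>freq' record per distinct n-gram, sorted by n-gram."""
    with open(path, "w", encoding="utf-8") as out:
        for gram, freq in sorted(counts.items()):
            out.write(f"{gram}\t{freq}\n")


if __name__ == "__main__":
    counts = count_ngrams(["abab"], max_n=2)
    for gram, freq in sorted(counts.items()):
        print(gram, freq)
```

The trade-off is memory: the sort now touches only distinct n-grams rather than every occurrence, but the whole dict has to fit in RAM. For a corpus where it does not, one option would be to flush partial counts to disk periodically and merge them afterwards.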
Related Issues (20)
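- Could you explain the idea behind your algorithm? HOT 1
- What does each field in the output mean? HOT 4
- A stop-word list should be included directly HOT 1
- How is the positional word-formation probability in the fifth column computed? HOT 11
- Question about the left/right information entropy calculation HOT 3
- windows HOT 2
- Why are the extracted results all empty? HOT 6
- words_sort.data has no results HOT 20
- Linux and Windows give different results on the same data HOT 2
- About the PMI calculation HOT 8
- words.data and words_sort.data are empty HOT 5
- Is it reasonable to sort the final results by word frequency alone? HOT 7
- Beginner question: what does step 3 mean? It closes immediately after I open it HOT 1
- No space left on device HOT 5
- The empty words.data and words_sort.data problem is solved, and tested on Windows, macOS, and Linux HOT 1
- Question about how total and freq are computed HOT 4
- About the character code range in isChinese HOT 1
- How do I build this with gradle? HOT 2
- Extraction results differ somewhat from the example HOT 4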