xmnlp's People

Contributors

amchii, seanlee97

xmnlp's Issues

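Custom dictionary for error correction

Hello, this tool has been very helpful to me; thank you for sharing it. I am interested in the error-correction feature. Is it possible to supply a custom dictionary of out-of-vocabulary words for correction? In my use I have found that some domain-specific terms get wrongly "corrected".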

Slow model loading on Python 2

Fixed: on Python 2.7 the model is now persisted with the faster cPickle. Python 3 is still recommended, though, since it performs better.
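
The fix described above is commonly written as a guarded import. The following is a minimal sketch, not xmnlp's actual code: `save_model` and `load_model` are hypothetical helper names chosen for illustration.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: C-accelerated implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically


def save_model(obj, path):
    """Serialize obj to path with the highest pickle protocol available."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a previously pickled object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Round-trip a small stand-in for a model to check the helpers.
    demo = {"uni": {"a": 1}, "bi": {("a", "b"): 2}}
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
    save_model(demo, demo_path)
    print(load_model(demo_path) == demo)
```

A file written with the highest protocol on Python 3 may not be readable on Python 2, which may be why separate files per Python version are sometimes shipped.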

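Developer documentation

Does the author have anything like developer documentation, something fairly detailed that covers the overall model architecture and the main functions? Figures such as recall and precision would be hugely appreciated too. If such material exists, would you be willing to share it?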

Original error was: PyCapsule_Import could not import module "datetime"

What is causing this? On Windows 7 I only get: UserWarning: Unsupported Windows version (7). ONNX Runtime supports Windows 10 and above, only.
warnings.warn('Unsupported Windows version (%s). ONNX Runtime supports Windows 10 and above, only.' %
Lazy load checker...

But on Windows 10 it turns into the following.

__init__.py 22
from . import multiarray

multiarray.py 12
from . import overrides

overrides.py 7
from numpy.core._multiarray_umath import (

ImportError:
PyCapsule_Import could not import module "datetime"


xmnlp测试.py 1
import xmnlp

__init__.py 15
from xmnlp import config

__init__.py 6
from xmnlp.utils import load_stopword

__init__.py 12
import numpy as np

__init__.py 140
from . import core

__init__.py 48
raise ImportError(msg)

ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  • The Python version is: Python3.7 from "D:\LLD\python3\python.exe"
  • The NumPy version is: "1.19.5"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: PyCapsule_Import could not import module "datetime"

Why does installation fail with an error?

The error is as follows:
ERROR: Could not find a version that satisfies the requirement scikit-learn (from xmnlp) (from versions: none)
ERROR: No matching distribution found for scikit-learn

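Text correction

Hello,

I would like to ask: can the text-correction feature only replace one character with another? Can it also handle missing or extra characters?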
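
The thread does not say how xmnlp generates correction candidates. As a point of comparison, here is a sketch of generic edit-distance-1 candidate generation, which covers substituted, missing, and extra characters alike. The function name `edits1` and the tiny `alphabet` are illustrative assumptions, not part of xmnlp.

```python
def edits1(text, alphabet):
    """Return every string one delete, insert, or replace away from text."""
    splits = [(text[:i], text[i:]) for i in range(len(text) + 1)]
    # Drop one character: repairs an extra character in the input.
    deletes = {left + right[1:] for left, right in splits if right}
    # Add one character at any position: repairs a missing character.
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    # Swap one character for another: repairs a wrong character.
    replaces = {left + ch + right[1:]
                for left, right in splits if right
                for ch in alphabet}
    return (deletes | inserts | replaces) - {text}


if __name__ == "__main__":
    # A real system would draw the alphabet from its vocabulary.
    for candidate in sorted(edits1("天汽", "天气汽")):
        print(candidate)
```

The candidate set grows with the alphabet size, so a practical checker would filter candidates against a vocabulary or a language model before scoring them.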

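Sentiment analysis

Thank you very much for sharing. I am studying this project and am very interested in the sentiment analysis. I have a couple of questions: what method do you use for feature selection, and how is the final sentiment score computed? Thanks again. Also, is there a QQ group or similar for discussing this project?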

Error loading userdict.txt on Python 3

My environment: Windows 10 (Chinese edition), Python 3.6

Error message from the examples:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 35: illegal multibyte sequence

My fix:
dag.py line 159: with open(fname, 'r', encoding='utf-8') as f:
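
The error most likely arises because a text-mode `open()` without an `encoding` argument falls back to the locale's preferred encoding, which is GBK on a Chinese-language Windows install. A minimal sketch of the fix follows; `read_userdict` is a hypothetical helper name, not the function in dag.py.

```python
import io
import os
import tempfile


def read_userdict(fname):
    """Read a UTF-8 dictionary file regardless of the platform default encoding."""
    # io.open accepts the `encoding` argument on both Python 2 and Python 3.
    with io.open(fname, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Write a small UTF-8 file as bytes, then read it back as text.
    demo_path = os.path.join(tempfile.mkdtemp(), "userdict.txt")
    with open(demo_path, "wb") as f:
        f.write(u"人工智能 5 nw\n机器学习 5\n".encode("utf-8"))
    for entry in read_userdict(demo_path):
        print(entry)
```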

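The training corpus shared via the cloud drive is no longer available

Could the author please fix the link when time permits?
Also, for text correction, would it be more reasonable to consider the edit distance over pinyin in addition to the edit distance over Chinese characters before scoring candidates with the bi-gram model?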
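
One way to read this suggestion is sketched below: a standard Levenshtein distance applied once to the characters and once to their pinyin syllables, then combined. Everything here is an assumption made for illustration: `TOY_PINYIN` is a hand-written table covering only the demo characters, and `combined_distance` with its `pinyin_weight` is not something xmnlp is known to provide.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


# Toy table for illustration only; a real system needs a full pinyin lexicon.
TOY_PINYIN = {"天": "tian", "气": "qi", "汽": "qi", "地": "di"}


def to_pinyin(text):
    """Map each character to its syllable, leaving unknown characters as-is."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def combined_distance(a, b, pinyin_weight=0.5):
    """Character distance plus a weighted syllable-level pinyin distance."""
    return levenshtein(a, b) + pinyin_weight * levenshtein(to_pinyin(a), to_pinyin(b))


if __name__ == "__main__":
    # The homophone candidate should come out closer than the unrelated one.
    print(combined_distance("天汽", "天气"))
    print(combined_distance("天汽", "天地"))
```

With this scoring, a homophone such as 气 for 汽 costs less than an unrelated character, so homophone candidates would be preferred before the bi-gram model is consulted.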

New version raises an error on Linux

Traceback (most recent call last):
File "normal_keywords.py", line 49, in <module>
keywords_list += xmnlp.seg(text)
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/xmnlp/lexical/__init__.py", line 53, in seg
load_lexical()
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/xmnlp/lexical/__init__.py", line 46, in load_lexical
lexical = LexicalDecoder(
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/xmnlp/lexical/lexical_model.py", line 45, in __init__
self.lexical_model = LexicalModel(os.path.join(model_dir, 'lexical.onnx'))
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/xmnlp/base_model.py", line 11, in __init__
self.sess = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 158, in __init__
self._load_model(providers or [])
File "/home/ubuser/anaconda3/envs/gpu/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 177, in _load_model
self._sess.load_model(providers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:Embedding-Token/NotEqual : No Op registered for Equal with domain_version of 13

A question about checker.py

    def calc_proba(self, gram):
        x = self.bi[tuple(gram)]
        y = self.uni[gram[0]]
        return float((x + 1)) / (y + len(self.uni.keys())**2)

This code is doing smoothing, right? Why does it use y + len(self.uni.keys())**2 rather than y + len(self.uni.keys())?
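
The thread does not record the author's reasoning, but the two denominators can be compared directly. The sketch below assumes textbook add-one (Laplace) smoothing for a bigram conditional probability, P(w2 | w1) = (c(w1, w2) + 1) / (c(w1) + V), where V is the vocabulary size; `make_counts` and `proba` are illustrative names, and the `squared` flag mimics the V**2 denominator from the snippet above.

```python
from collections import Counter


def make_counts(tokens):
    """Build unigram and bigram counters from a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return uni, bi


def proba(gram, uni, bi, squared=False):
    """Add-one smoothed P(gram[1] | gram[0]) with denominator c + V or c + V**2."""
    vocab = len(uni)
    denom = uni[gram[0]] + (vocab ** 2 if squared else vocab)
    return (bi[tuple(gram)] + 1) / float(denom)


tokens = ["a", "b", "a", "c", "a", "b"]
uni, bi = make_counts(tokens)

# Sum the conditional distribution P(. | "a") over the whole vocabulary.
total_v = sum(proba(("a", w), uni, bi) for w in uni)
total_v2 = sum(proba(("a", w), uni, bi, squared=True) for w in uni)
print(total_v, total_v2)
```

On this toy corpus the V denominator gives a distribution that sums to 1 (up to floating-point rounding), while the V**2 denominator sums to about 0.5, so the latter is not a normalized conditional probability. Whether that matters depends on how the scores are used. If candidates are only ranked against each other, normalization may be unnecessary.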

xmnlp.seg(text) results are not very good

text='7月1日,世预赛亚洲区12强赛抽签举行,**队分在B组。同组对手是日本、澳大利亚、沙特、阿曼、越南。体育博主潘伟力在个人微博上表示,国足应把目标定在小组第二,第三意义不大。'
xmnlp.seg(text)
['7月1日', ',', '世', '预赛', '亚洲区', '12', '强赛', '抽签', '举行', ',', '**队', '分', '在', 'B', '组', '。', '同', '组', '对手', '是', '日本', '、', '澳大利亚', '、', '沙特', '、', '阿曼', '、', '越南', '。', '体育博主', '潘伟力', '在', '个人', '微博', '上', '表示', ',', '国', '足', '应', '把', '目标', '定', '在', '小组', '第二', ',', '第三意义', '不大', '。']

The retrained correction model has no effect

I retrained on examples/corpus/checker.txt and replaced xmnlp/checker/checker.pickle.3 with the generated models/checker.pickle.3, but running examples/checker.py still does not correct anything:
error: """这理风景绣丽,而且天汽不错,我的心情各外舒畅!"""
correct: """这理风景绣丽,而且天汽不错,我的心情各外舒畅!"""

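Reducing correction time

Hello, your code has been a great help with the pinyin correction work I have been doing recently, and the pinyin correction quality is very good; thank you for sharing. However, I would like to bring the running time down to 200 ms. Do you have any suggestions?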

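Where can the Java package be downloaded?

I have seen Java projects that use version 1.4 of the xmnlp package, but I cannot find it in the public Maven repository or through Baidu or Google. Where can I get it?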

Code erratum

The onnxruntime call in base_model.py is wrong:

    # -*- coding: utf-8 -*-

    from abc import ABCMeta, abstractmethod

    import onnxruntime as ort


    class BaseModel(metaclass=ABCMeta):

        def __init__(self, model_path: str):
            self.sess = ort.InferenceSession(model_path, providors=['CPUExecutionProvider'])

        @abstractmethod
        def predict(self):
            raise NotImplementedError

The keyword should be providers, not providors.
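
With the keyword corrected, the class would read roughly as follows. This is a sketch of the reported fix rather than the project's committed code. The only deliberate deviation is that `onnxruntime` is imported inside `__init__`, purely so the example can be loaded on a machine without the package.

```python
from abc import ABCMeta, abstractmethod


class BaseModel(metaclass=ABCMeta):

    def __init__(self, model_path: str):
        # Imported here only so this sketch loads without onnxruntime installed;
        # the original module imports it at the top of the file.
        import onnxruntime as ort
        # `providers` is the keyword that selects execution providers.
        self.sess = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    @abstractmethod
    def predict(self):
        raise NotImplementedError
```

Because `predict` is abstract, `BaseModel` itself cannot be instantiated; concrete models subclass it and implement `predict`.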

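About special nouns

Terms such as bond abbreviations (for example "02进出04") and special nouns such as "5G" get split apart during word segmentation.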

About radical extraction

Are radicals extracted according to the annotations in the Xinhua Dictionary?

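Word segmentation and the userdict format

Segmentation results:
记住钥匙放在厨房餐桌上 -> 记住 / 钥匙 / 放在 / 厨房 / 餐桌上
"记住钥匙放在厨房桌子上" -> 记住 / 钥匙 / 放在 / 厨房 / 桌子 / 上

Presumably 餐桌 is not in the dictionary. I added 餐桌 to the userdict in examples, but it made no difference. How do I add dictionary entries? What does the "5 nw" after a word mean, and what values are allowed?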

人工智能 5 nw
机器学习 5
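
The thread does not answer what the columns mean. The sketch below only shows how such lines could be parsed, under the unconfirmed assumption that they follow the common "word, frequency, optional part-of-speech tag" layout used by several Chinese segmenters. `parse_userdict_line` is a hypothetical helper, and reading "5" as a frequency and "nw" as a tag is a guess.

```python
def parse_userdict_line(line):
    """Split a non-empty dictionary line into (word, freq, tag).

    Assumes whitespace-separated columns; freq and tag may be absent.
    """
    parts = line.split()
    word = parts[0]
    freq = int(parts[1]) if len(parts) > 1 else None
    tag = parts[2] if len(parts) > 2 else None
    return word, freq, tag


if __name__ == "__main__":
    for raw in ["人工智能 5 nw", "机器学习 5"]:
        print(parse_userdict_line(raw))
```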
