
redshift's People

Contributors

aclark4life, antoniomo, syllog1sm, yoavg


redshift's Issues

Non-English usage

Hi,

I'm trying to use redshift with non-English languages. Is there any way to regenerate the case file (index/english.case) and the vocabulary cluster file (index/bllip-clusters)?

Thanks.
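The cluster file is the kind of output a Brown clustering tool (for example, Percy Liang's brown-cluster) produces from a large tokenised corpus of the target language. For the case file, the sketch below shows the general idea of collecting per-word capitalisation statistics from a corpus; the output format and file names here are assumptions for illustration, not necessarily what index/english.case actually uses.

    # -*- coding: utf-8 -*-
    # Hedged sketch: collect per-word capitalisation counts from a tokenised
    # corpus. The word<TAB>upper<TAB>title<TAB>lower output format is an
    # assumption for illustration, not redshift's actual english.case format.
    import codecs
    from collections import defaultdict

    counts = defaultdict(lambda: [0, 0, 0])  # upper, title, lower

    with codecs.open('corpus.tok.txt', encoding='utf-8') as f:
        for line in f:
            for tok in line.split():
                key = tok.lower()
                if tok.isupper():
                    counts[key][0] += 1
                elif tok.istitle():
                    counts[key][1] += 1
                else:
                    counts[key][2] += 1

    with codecs.open('my.case', 'w', encoding='utf-8') as out:
        for word, (upper, title, lower) in sorted(counts.items()):
            out.write(u'%s\t%d\t%d\t%d\n' % (word, upper, title, lower))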

Non-ASCII chars in train file

Hi
I was training redshift on input containing some non-ASCII characters and ran into errors.
I worked around them by replacing the offending characters, but my goal is to train on Persian data, which will certainly contain non-ASCII text.
I have heard about solutions such as transliteration, but I know nothing about them.
I want to know whether that is the best solution, or whether you can suggest a better one.
Thanks
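One generic preprocessing approach, sketched below under the assumption that the trainer only needs to see ASCII bytes: rewrite each non-ASCII token as a reversible ASCII form (punycode here) before training, and map predictions back afterwards. The file names and the marker prefix are illustrative; this is not redshift's own mechanism.

    # -*- coding: utf-8 -*-
    # Hedged sketch: rewrite non-ASCII tokens as a reversible ASCII form
    # before training. This is a generic workaround, not redshift's own
    # mechanism; the file names below are placeholders.
    import codecs
    import re

    def to_ascii(token):
        if all(ord(ch) < 128 for ch in token):
            return token
        # Punycode gives a reversible, pure-ASCII encoding of the token; the
        # 'xq--' prefix is an arbitrary marker so encoded tokens cannot
        # collide with real ASCII words.
        return 'xq--' + token.encode('punycode').decode('ascii')

    with codecs.open('train.fa.txt', encoding='utf-8') as src:
        with codecs.open('train.ascii.txt', 'w', encoding='utf-8') as dst:
            for line in src:
                # Split on whitespace but keep the separators so tabs survive.
                parts = re.split(r'(\s+)', line)
                dst.write(''.join(p if p.strip() == '' else to_ascii(p)
                                  for p in parts))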

Compilation error

Has anyone seen this compilation error before? I'm getting a handful of these with Cython 0.21.1.

redshift/parser.pyx:263:40: Cannot assign type 'void *(Pool, int, void *)' to 'init_funct_t'
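For context: newer Cython releases are stricter about assigning functions to ctypedef'd function pointers, and a common cause of this "Cannot assign type ... to ..." message is a mismatch in modifiers such as nogil or an except clause. A small, hypothetical Cython illustration of the error class (not redshift's actual code):

    # Hypothetical illustration only, not redshift's source.
    ctypedef int (*score_func_t)(int a, int b) nogil

    cdef int add(int a, int b) nogil:
        return a + b

    cdef int sub(int a, int b):        # same C signature, but no 'nogil'
        return a - b

    cdef score_func_t ok = add         # accepted: exact match with the typedef
    # cdef score_func_t bad = sub      # newer Cython rejects this with
    #                                  # "Cannot assign type 'int (int, int)'
    #                                  # to 'score_func_t'"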

Unclear running instructions

Hi, I was trying to run the code, but it relies on some (model?) files. What does /tmp/stanford_beam8 mean?
Also, the script parser.py asks for a parser location directory. Which one should I specify? Apparently, the loading module looks for parser.cfg, but I couldn't find it in the distribution.

Could you improve the README? Thanks!

Error using POS tagger

Hi there. I'm trying to use your POS tagger and I'm getting the following error when I attempt to train on a very small sample (10 sentences) from the Penn Treebank WSJ dataset. Any thoughts as to what I'm doing wrong?

In [2]: from redshift.tagger import train

In [3]: train(open('wsj.10.txt', 'r').read(), 'redshift_model')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-16d6fd520844> in <module>()
----> 1 train(open('wsj.10.txt', 'r').read(), 'redshift_model')

/Library/Python/2.7/site-packages/redshift/tagger.so in redshift.tagger.train (redshift/tagger.cpp:2391)()

/Library/Python/2.7/site-packages/redshift/tagger.so in redshift.tagger.Tagger.train_sent (redshift/tagger.cpp:4013)()

/Library/Python/2.7/site-packages/thinc/learner.so in thinc.learner.LinearModel.update (thinc/learner.cpp:2395)()

AssertionError: 

Can't train CoNLL formatted file

I have been struggling to find freely available CoNLL training data for Redshift. I have finally found that using http://www.anc.org:8080/ANC2Go/ you can export the Treebank in CoNLL format. However, the trainer fails with the following error:
Traceback (most recent call last):
File "./scripts/train.py", line 54, in
plac.call(main)
File "/home/3TOP/fscharf/virt_env/3top_dev/lib/python2.6/site-packages/plac_core.py", line 309, in call
cmd, result = parser_from(obj).consume(arglist)
File "/home/3TOP/fscharf/virt_env/3top_dev/lib/python2.6/site-packages/plac_core.py", line 195, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "./scripts/train.py", line 48, in main
train_data = redshift.io_parse.read_conll(train_str, unlabelled=unlabelled)
File "io_parse.pyx", line 129, in redshift.io_parse.read_conll (redshift/io_parse.cpp:2860)
ValueError: too many values to unpack (expected 4)

It looks like a format problem...
Also, is there a way to pass a folder as an argument to the trainer so that all the files in it are used?
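The ValueError suggests that read_conll expects exactly four fields per token line, so the exporter's column count probably differs. Below is a hedged diagnostic sketch; the four-field expectation and the whitespace splitting are assumptions drawn only from the error message, and the folder and file names are placeholders. It also shows one way to use a whole folder: concatenate its files and pass the single result to the trainer.

    # Hedged diagnostic sketch: flag lines whose field count differs from the
    # four implied by the ValueError, and concatenate every file in a folder
    # so the result can be passed to the trainer as one input. The field
    # count, the whitespace splitting, and the paths are assumptions.
    import glob
    import os

    EXPECTED_FIELDS = 4  # implied by "too many values to unpack (expected 4)"

    def check_file(path):
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                fields = line.split()
                if fields and len(fields) != EXPECTED_FIELDS:
                    print('%s:%d has %d fields: %r'
                          % (path, lineno, len(fields), line.rstrip()))

    def concat_folder(folder, out_path):
        with open(out_path, 'w') as out:
            for path in sorted(glob.glob(os.path.join(folder, '*'))):
                check_file(path)
                with open(path) as f:
                    out.write(f.read().rstrip('\n') + '\n\n')

    concat_folder('conll_export', 'all_train.conll')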

Runtime problem on OSX

So I decided to try redshift on OS X and had a problem running it. It works fine on Linux.

>>> import lexicon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named lexicon
>>> import index.lexicon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "index/__init__.py", line 1, in <module>
    import index.lexicon
ImportError: dlopen(index/lexicon.so, 2): Symbol not found: __ZNSs4_Rep20_S_empty_rep_storageE
  Referenced from: index/lexicon.so
  Expected in: dynamic lookup

Some quick googling showed that I need to link against libstdc++. Apply the following patch and rebuild:

diff --git a/setup.py b/setup.py
index 3660b56..cf91b5a 100755
--- a/setup.py
+++ b/setup.py
@@ -61,6 +61,7 @@ exts = [
               include_dirs=includes),
     Extension("index.lexicon", ["index/lexicon.pyx", "ext/MurmurHash2.cpp",
                                "ext/MurmurHash3.cpp"], language="c++",
+              extra_link_args=['-lstdc++'],
               include_dirs=includes),

     Extension("features.extractor", ["features/extractor.pyx", "ext/MurmurHash2.cpp",

Cannot install on OS X

Greetings!

I'm really excited to start using this; however, I'm not able to install it on my Mac. Here is a link to the message I'm getting: http://pastebin.com/j9cNzdv7. I have followed your installation instructions, but I'm still not able to get it to work. Thanks so much for developing this. I can't wait to use it!

Documentation for redshift

Hello! I am using redshift for the first time, and I am unable to find any documentation on the library's functionality. Can you please tell me where to start with redshift?

Also, can you tell me what /tmp/stanford_beam8 is?

Running disfluency parser

Hello, I am interested in your joint dependency parser with disfluency detection.

I want to try running it, but it fails, even though I read the previous issue and
paid attention to the version pinning while installing.

I am trying to install it on a machine running Ubuntu 12.04.
What I did follows almost the same recipe as the installation section of README.rst,
except that I ran "git checkout develop" not right after cloning but near the end of the installation,
just before "fab make test", because doing "git checkout develop" right after cloning strips away the version pinning information.
But this ends with the following error:

index/lexicon.cpp:249:36: fatal error: murmurhash/MurmurHash3.h: No such file or directory
 #include "murmurhash/MurmurHash3.h"
                                    ^
compilation terminated.
error: command 'gcc' failed with exit status 1

Fatal error: local() encountered an error (return code 1) while executing 'python setup.py build_ext --inplace'

Aborting.

I also tried moving the "git checkout" step as explained and replacing requirements.txt with
the version-pinned one before "pip install -r requirements.txt", but I got the same error.
How should I solve this problem?
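One possible workaround, sketched below as an assumption rather than a confirmed fix: the missing murmurhash/MurmurHash3.h is probably the header that ships alongside the ext/MurmurHash*.cpp sources listed in setup.py, so exposing those headers under a murmurhash/ directory on the include path may let the build find them.

    # Hedged workaround sketch: expose the MurmurHash headers under the
    # "murmurhash/" path that index/lexicon.cpp includes, then rebuild.
    # The source and destination paths here are assumptions.
    import os
    import shutil

    dest = os.path.join('ext', 'murmurhash')
    if not os.path.isdir(dest):
        os.makedirs(dest)
    for header in ('MurmurHash2.h', 'MurmurHash3.h'):
        src = os.path.join('ext', header)
        if os.path.isfile(src):
            shutil.copy(src, dest)
    # Afterwards, make sure 'ext' is listed in include_dirs in setup.py and
    # rerun: python setup.py build_ext --inplace

If the headers are not under ext/, installing a system murmurhash package and adding its include directory to include_dirs would be the equivalent fix.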

Still can't install on Ubuntu 14.04

I have tried every possible way to install redshift, but no luck.
I get this error every time:
"error: command 'gcc' failed with exit status 1

Fatal error: local() encountered an error (return code 1) while executing 'python setup.py build_ext --inplace'
"

I guess the problem is with Cython.

Cannot install on Ubuntu

Hello, thank you for your amazing work on redshift. It's a glass of water in NLP hell. I'm planning on using it in WSD research, but I keep failing to install it on my machine, with the errors logged in the attached pastebin dump here. I hope you will find the time to help me.
Thanks in advance,
Amine

set/replace sentence.Input's token label

Hi mate,

I couldn't find a way to set/replace a sentence.Input token label, so I made a workaround in sentence.pyx:

    def set_label(self, i, label):
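        # Encode the label string to its internal id (via index.hashes)
        # before writing it onto the underlying C token struct.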
        self.c_sent.tokens[i].label = index.hashes.encode_label(label)

I wonder, is there a better way to do that?
