
redshift's People

Contributors

aclark4life, antoniomo, syllog1sm, yoavg


redshift's Issues

Non-English usage

Hi,

I'm trying to use redshift with non-English languages. Is there any way to regenerate the case file (index/english.case) and the vocabulary cluster file (index/bllip-clusters)?

Thanks.
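The cluster file is the kind of output a Brown clustering tool (for example, Percy Liang's brown-cluster) produces from a large tokenised corpus of the target language. For the case file, the sketch below shows the general idea of collecting per-word capitalisation statistics from a corpus; the output format and file names here are assumptions for illustration, not necessarily what index/english.case actually uses.

    # -*- coding: utf-8 -*-
    # Hedged sketch: collect per-word capitalisation counts from a tokenised
    # corpus. The word<TAB>upper<TAB>title<TAB>lower output format is an
    # assumption for illustration, not redshift's actual english.case format.
    import codecs
    from collections import defaultdict

    counts = defaultdict(lambda: [0, 0, 0])  # upper, title, lower

    with codecs.open('corpus.tok.txt', encoding='utf-8') as f:
        for line in f:
            for tok in line.split():
                key = tok.lower()
                if tok.isupper():
                    counts[key][0] += 1
                elif tok.istitle():
                    counts[key][1] += 1
                else:
                    counts[key][2] += 1

    with codecs.open('my.case', 'w', encoding='utf-8') as out:
        for word, (upper, title, lower) in sorted(counts.items()):
            out.write(u'%s\t%d\t%d\t%d\n' % (word, upper, title, lower))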

Non-ASCII chars in train file

Hi
I was training redshift on input containing some non-ASCII characters and ran into errors.
I worked around them by replacing the offending characters, but my goal is to train on Persian data, which will certainly contain non-ASCII text.
I have heard about solutions such as transliteration, but I know nothing about them.
I want to know whether that is the best solution, or whether you can suggest a better one.
Thanks
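One generic preprocessing approach, sketched below under the assumption that the trainer only needs to see ASCII bytes: rewrite each non-ASCII token as a reversible ASCII form (punycode here) before training, and map predictions back afterwards. The file names and the marker prefix are illustrative; this is not redshift's own mechanism.

    # -*- coding: utf-8 -*-
    # Hedged sketch: rewrite non-ASCII tokens as a reversible ASCII form
    # before training. This is a generic workaround, not redshift's own
    # mechanism; the file names below are placeholders.
    import codecs
    import re

    def to_ascii(token):
        if all(ord(ch) < 128 for ch in token):
            return token
        # Punycode gives a reversible, pure-ASCII encoding of the token; the
        # 'xq--' prefix is an arbitrary marker so encoded tokens cannot
        # collide with real ASCII words.
        return 'xq--' + token.encode('punycode').decode('ascii')

    with codecs.open('train.fa.txt', encoding='utf-8') as src:
        with codecs.open('train.ascii.txt', 'w', encoding='utf-8') as dst:
            for line in src:
                # Split on whitespace but keep the separators so tabs survive.
                parts = re.split(r'(\s+)', line)
                dst.write(''.join(p if p.strip() == '' else to_ascii(p)
                                  for p in parts))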

Compilation error

Has anyone seen this compilation error before? I'm getting a handful of these with Cython 0.21.1.

redshift/parser.pyx:263:40: Cannot assign type 'void *(Pool, int, void *)' to 'init_funct_t'
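For context: newer Cython releases are stricter about assigning functions to ctypedef'd function pointers, and a common cause of this "Cannot assign type ... to ..." message is a mismatch in modifiers such as nogil or an except clause. A small, hypothetical Cython illustration of the error class (not redshift's actual code):

    # Hypothetical illustration only, not redshift's source.
    ctypedef int (*score_func_t)(int a, int b) nogil

    cdef int add(int a, int b) nogil:
        return a + b

    cdef int sub(int a, int b):        # same C signature, but no 'nogil'
        return a - b

    cdef score_func_t ok = add         # accepted: exact match with the typedef
    # cdef score_func_t bad = sub      # newer Cython rejects this with
    #                                  # "Cannot assign type 'int (int, int)'
    #                                  # to 'score_func_t'"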

Unclear running instructions

Hi, I was trying to run the code, but it relies on some (model?) files. What does /tmp/stanford_beam8 mean?
Also, the script parser.py asks for a parser location directory. Which one should I specify? Apparently, the loading module looks for parser.cfg, but I couldn't find it in the distribution.

Could you improve the README? Thanks!

Error using POS tagger

Hi there. I'm trying to use your POS tagger and I'm getting the following error when I attempt to train on a very small sample (10 sentences) from the Penn Treebank WSJ dataset. Any thoughts as to what I'm doing wrong?

In [2]: from redshift.tagger import train

In [3]: train(open('wsj.10.txt', 'r').read(), 'redshift_model')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-16d6fd520844> in <module>()
----> 1 train(open('wsj.10.txt', 'r').read(), 'redshift_model')

/Library/Python/2.7/site-packages/redshift/tagger.so in redshift.tagger.train (redshift/tagger.cpp:2391)()

/Library/Python/2.7/site-packages/redshift/tagger.so in redshift.tagger.Tagger.train_sent (redshift/tagger.cpp:4013)()

/Library/Python/2.7/site-packages/thinc/learner.so in thinc.learner.LinearModel.update (thinc/learner.cpp:2395)()

AssertionError: 

Can't train CoNLL formatted file

I have been struggling to find freely available CoNLL training data for Redshift. I have finally found that using http://www.anc.org:8080/ANC2Go/ you can export the Treebank in CoNLL format. However, the trainer fails with the following error:
Traceback (most recent call last):
File "./scripts/train.py", line 54, in
plac.call(main)
File "/home/3TOP/fscharf/virt_env/3top_dev/lib/python2.6/site-packages/plac_core.py", line 309, in call
cmd, result = parser_from(obj).consume(arglist)
File "/home/3TOP/fscharf/virt_env/3top_dev/lib/python2.6/site-packages/plac_core.py", line 195, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "./scripts/train.py", line 48, in main
train_data = redshift.io_parse.read_conll(train_str, unlabelled=unlabelled)
File "io_parse.pyx", line 129, in redshift.io_parse.read_conll (redshift/io_parse.cpp:2860)
ValueError: too many values to unpack (expected 4)

It looks like a format problem...
Also, is there a way to pass a folder as an argument to the trainer so that all the files in it are used?
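The ValueError suggests that read_conll expects exactly four fields per token line, so the exporter's column count probably differs. Below is a hedged diagnostic sketch; the four-field expectation and the whitespace splitting are assumptions drawn only from the error message, and the folder and file names are placeholders. It also shows one way to use a whole folder: concatenate its files and pass the single result to the trainer.

    # Hedged diagnostic sketch: flag lines whose field count differs from the
    # four implied by the ValueError, and concatenate every file in a folder
    # so the result can be passed to the trainer as one input. The field
    # count, the whitespace splitting, and the paths are assumptions.
    import glob
    import os

    EXPECTED_FIELDS = 4  # implied by "too many values to unpack (expected 4)"

    def check_file(path):
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                fields = line.split()
                if fields and len(fields) != EXPECTED_FIELDS:
                    print('%s:%d has %d fields: %r'
                          % (path, lineno, len(fields), line.rstrip()))

    def concat_folder(folder, out_path):
        with open(out_path, 'w') as out:
            for path in sorted(glob.glob(os.path.join(folder, '*'))):
                check_file(path)
                with open(path) as f:
                    out.write(f.read().rstrip('\n') + '\n\n')

    concat_folder('conll_export', 'all_train.conll')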

Runtime problem on OSX

So I decided to try redshift on OS X and had a problem running it. It works fine on Linux.

>>> import lexicon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named lexicon
>>> import index.lexicon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "index/__init__.py", line 1, in <module>
    import index.lexicon
ImportError: dlopen(index/lexicon.so, 2): Symbol not found: __ZNSs4_Rep20_S_empty_rep_storageE
  Referenced from: index/lexicon.so
  Expected in: dynamic lookup

Some quick googling showed that I need to link against libstdc++. Apply the following patch and rebuild:

diff --git a/setup.py b/setup.py
index 3660b56..cf91b5a 100755
--- a/setup.py
+++ b/setup.py
@@ -61,6 +61,7 @@ exts = [
               include_dirs=includes),
     Extension("index.lexicon", ["index/lexicon.pyx", "ext/MurmurHash2.cpp",
                                "ext/MurmurHash3.cpp"], language="c++",
+              extra_link_args=['-lstdc++'],
               include_dirs=includes),

     Extension("features.extractor", ["features/extractor.pyx", "ext/MurmurHash2.cpp",

Cannot install on OS X

Greetings!

I'm really excited to start using this; however, I'm not able to install it on my Mac. Here is a link to the message I'm getting: http://pastebin.com/j9cNzdv7. I have followed your installation instructions, but I'm still not able to get it to work. Thanks so much for developing this. I can't wait to use it!

Documentation for redshift

Hello! I am using redshift for the first time, and I am unable to find any documentation on the library's functionality. Can you please tell me where to start with redshift?

Also, can you tell me what /tmp/stanford_beam8 is?

Running disfluency parser

Hello, I am interested in your joint dependency parser with disfluency detection.

I want to try running it, but it fails, even though I read the previous issue and
paid attention to the version pinning while installing.

I am trying to install it on a machine running Ubuntu 12.04.
What I did follows almost the same recipe as the installation section of README.rst,
except that I ran "git checkout develop" not right after cloning but near the end of the installation,
just before "fab make test", because doing "git checkout develop" right after cloning strips away the version pinning information.
But this ends with the following error:

index/lexicon.cpp:249:36: fatal error: murmurhash/MurmurHash3.h: No such file or directory
 #include "murmurhash/MurmurHash3.h"
                                    ^
compilation terminated.
error: command 'gcc' failed with exit status 1

Fatal error: local() encountered an error (return code 1) while executing 'python setup.py build_ext --inplace'

Aborting.

I also tried moving the "git checkout" step as explained and replacing requirements.txt with
the version-pinned one before "pip install -r requirements.txt", but I got the same error.
How should I solve this problem?
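One possible workaround, sketched below as an assumption rather than a confirmed fix: the missing murmurhash/MurmurHash3.h is probably the header that ships alongside the ext/MurmurHash*.cpp sources listed in setup.py, so exposing those headers under a murmurhash/ directory on the include path may let the build find them.

    # Hedged workaround sketch: expose the MurmurHash headers under the
    # "murmurhash/" path that index/lexicon.cpp includes, then rebuild.
    # The source and destination paths here are assumptions.
    import os
    import shutil

    dest = os.path.join('ext', 'murmurhash')
    if not os.path.isdir(dest):
        os.makedirs(dest)
    for header in ('MurmurHash2.h', 'MurmurHash3.h'):
        src = os.path.join('ext', header)
        if os.path.isfile(src):
            shutil.copy(src, dest)
    # Afterwards, make sure 'ext' is listed in include_dirs in setup.py and
    # rerun: python setup.py build_ext --inplace

If the headers are not under ext/, installing a system murmurhash package and adding its include directory to include_dirs would be the equivalent fix.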

Still can't install on Ubuntu 14.04

I have tried every possible way to install redshift, but no luck.
I get this error every time:
"error: command 'gcc' failed with exit status 1

Fatal error: local() encountered an error (return code 1) while executing 'python setup.py build_ext --inplace'
"

I guess the problem is with Cython.

Cannot install on Ubuntu

Hello, thank you for your amazing work on redshift. It's a glass of water in NLP hell. I'm planning on using it in WSD research, but I keep failing to install it on my machine, with the errors logged in the attached pastebin dump here. I hope you will find the time to help me.
Thanks in advance,
Amine

set/replace sentence.Input's token label

Hi mate,

I couldn't find a way to set/replace a sentence.Input token label, so I made a workaround in sentence.pyx:

    def set_label(self, i, label):
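        # Encode the label string to its internal id (via index.hashes)
        # before writing it onto the underlying C token struct.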
        self.c_sent.tokens[i].label = index.hashes.encode_label(label)

I wonder, is there a better way to do that?
