remine's Issues

Cannot reproduce the results of the paper

The output of the kdd branch still has the same issue that was addressed in #25.

The raw data file raw_train.json is missing from the repository, so the retraining process cannot be run.

output files:

results_remine/remine_result.txt

1	man | have , | medical
1	man | have , | neglect
1	man | have , | month
1	health | | Home
1	health | | operation
1	risk of Minnesota | | state
2	people | have , | Conakry
2	people | have , | capital
2	people | have , | Hospital
3	Democrat | | Nebraska
4	something | | for
4	New | by , , , | something
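To put a number on how many extracted tuples lack a usable relation phrase, the result lines can be parsed and counted. This is a minimal sketch, assuming each line has the form "sentence id, whitespace, subject | relation | object" as in the listing above; the function names `parse_tuple_line` and `summarize` are hypothetical and are not part of ReMine.

```python
from collections import Counter


def parse_tuple_line(line):
    """Split one result line into (sentence_id, subject, relation, object).

    Assumes the layout seen in remine_result.txt: an integer sentence id,
    whitespace (tab or spaces), then three fields separated by '|'.
    """
    sent_id, rest = line.strip().split(None, 1)
    fields = [field.strip() for field in rest.split("|")]
    if len(fields) != 3:
        raise ValueError("expected 3 '|'-separated fields, got %d" % len(fields))
    subject, relation, obj = fields
    return int(sent_id), subject, relation, obj


def summarize(lines):
    """Count tuples whose relation field contains at least one word."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _, _, relation, _ = parse_tuple_line(line)
        # A relation made only of commas or whitespace carries no word.
        has_word = any(ch.isalnum() for ch in relation)
        counts["with_relation_word" if has_word else "no_relation_word"] += 1
    return counts


sample = [
    "1\tman | have , | medical",
    "1\thealth | | Home",
    "4\tNew | by , , , | something",
]
print(summarize(sample))
```

Running `summarize` over the full result file gives a quick ratio to compare between branches or model versions.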

results_remine/remine_segmentation.txt

Gov. :RP]_[Tim :EP]_[Pawlenty of :BP]_[Minnesota :BP]_[order :EP]_[the :RP]_[state :EP]_[health :EP]_[department :EP]_[this :EP]_[month :EP]_[to :EP]_[monitor]_[day-to-day :BP]_[operation :EP]_[at :EP]_[the :RP]_[Minneapolis :BP]_[Veterans :EP]_[Home :EP]_[after :EP]_[state :EP]_[inspector]_[find :RP]_[that :RP]_[three :EP]_[man :EP]_[have :RP]_[die :EP]_[there :EP]_[in :EP]_[the :RP]_[previous :EP]_[month :EP]_[because :BP]_[of :EP]_[neglect :BP]_[or :EP]_[medical :BP]_[error :RP]_[. :RP]_[
the :RP]_[aid :RP]_[group]_[doctor :EP]_[without :EP]_[border :EP]_[say :BP]_[that :RP]_[since :RP]_[Saturday :BP]_[, :RP]_[more :RP]_[than :EP]_[275 :EP]_[wounded :BP]_[people :EP]_[have :RP]_[be]_[admit :BP]_[and :EP]_[treat :EP]_[at :EP]_[Donka]_[Hospital :EP]_[in :EP]_[the :RP]_[capital :EP]_[of :EP]_[Guinea :BP]_[, :RP]_[Conakry :BP]_[. :RP]_[
the :RP]_[american :BP]_[people :EP]_[can :EP]_[see :EP]_[what :EP]_[be]_[happen :BP]_[here :RP]_[, :RP]_[say :BP]_[Senator :BP]_[Ben :BP]_[Nelson]_[, :RP]_[Democrat :BP]_[of :EP]_[Nebraska :BP]_[. :RP]_[
for :BP]_[million :RP]_[, :RP]_[it :EP]_[be]_[a :EP]_[tough :BP]_[day :EP]_[of :EP]_[coping :BP]_[-- :BP]_[of :EP]_[watch :EP]_[floodwater :BP]_[pour]_[into :BP]_[home :EP]_[on :EP]_[the :RP]_[Raritan :BP]_[River :BP]_[in :EP]_[New :EP]_[Jersey :BP]_[from :RP]_[New :EP]_[Brunswick]_[to :EP]_[bind :EP]_[Brook :BP]_[and :EP]_[in :EP]_[the :RP]_[Westchester]_[suburb]_[of :EP]_[Mamaroneck :EP]_[and :EP]_[New :EP]_[Rochelle]_[, :RP]_[of :EP]_[sleep :EP]_[in :EP]_[a :EP]_[shelter :BP]_[or :EP]_[a :EP]_[airport :EP]_[, :RP]_[of :EP]_[tow :BP]_[a :EP]_[car :BP]_[and :EP]_[watch :EP]_[a :EP]_[refrigerator]_[float]_[by :RP]_[, :RP]_[of :EP]_[get :EP]_[to :EP]_[work]_[despite :EP]_[flood :BP]_[road :BP]_[and :EP]_[erratic :BP]_[train :BP]_[, :RP]_[of :EP]_[wait :BP]_[for :BP]_[power :BP]_[or :EP]_[a :EP]_[water :RP]_[pump :EP]_[or :EP]_[just :BP]_[something :BP]_[to :EP]_[hope :BP]_[for :BP]_[. :RP]_[

tmp_remine/remine_tokenized_segmented_sentences.txt

1	5663| 141 , | 983
1	5663| 141 , | 44243
1	5663| 141 , | 2668
1	1931| | 18561
1	1931| | 5053
1	2 3 72917| | 1519
2	3245| 141 , | 127954
2	3245| 141 , | 7358
2	3245| 141 , | 3303
3	60075| | 51787
4	18944| | 70
4	1269| 319 , 24 , | 18944
4	1269| | 21073
4	1269| | 89926

No such file or directory

mldl@mldlUB1604:~/ub16_prj/gh-shanzhenren/ReMine$ bash train.sh
===Entity Linking===
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 28, in <module>
utils.getEntity(args.in1, args.out, args.opt)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 29, in getEntity
with open(file_path) as IN, open(output, 'w') as OUT:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((NN.{0,2}|JJ.{0,1}|RB.{0,1}|PRP.{0,1}|DT ))+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+|(NN.{0,2})+((IN|RP) +)
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 24, in <module>
utils.relationLinker(args.in1, args.in2)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 72, in relationLinker
with open(file_path,'r') as IN:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
===Tokenizaztion===
Traceback (most recent call last):
File "src_py/preprocessing.py", line 396, in <module>
tmp.tokenized_train(args.in1, args.in2, args.in3)
File "src_py/preprocessing.py", line 125, in tokenized_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open(depIn, encoding='utf-8') as dep:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 404, in <module>
tmp.chunk_train(args.in1, args.in2)
File "src_py/preprocessing.py", line 51, in chunk_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open('tmp_remine/boost_patterns.txt', 'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/stopwords.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.entities'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.relations'
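Every failure in the log above is a missing input file, and train.sh keeps going after each one. A small pre-flight check run from the repository root would list all missing paths at once. This is only a sketch: the path list is copied from the tracebacks above, the helper name `find_missing` is hypothetical, and the two tmp/ files are likely produced by the earlier steps, so they may be missing simply because those steps failed first.

```python
import os

# Paths taken verbatim from the FileNotFoundError messages in the log.
REQUIRED_INPUTS = [
    "data/nyt/train_nyt.json",
    "data/nyt/total.lemmas.txt",
    "data/stopwords.txt",
    "tmp/nyt.entities",   # assumption: generated by an earlier pipeline step
    "tmp/nyt.relations",  # assumption: generated by an earlier pipeline step
]


def find_missing(paths, root="."):
    """Return the subset of paths that are not regular files under root."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_INPUTS)
    if missing:
        print("Missing input files:")
        for path in missing:
            print("  " + path)
    else:
        print("All listed input files are present.")
```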

Multilingual

Great job! Does this tool support Chinese?

Is the output of the default model very poor?

1       man | have , | medical
1       man | have , | neglect
1       risk of Minnesota | | state
1       man | have , | month
1       health | | operation
1       health | | Home
2       people | have , | Conakry
2       people | have , | capital
2       people | have , | Hospital
3       Democrat | | Nebraska
4       New | by , , , | something
4       New | | shelter
4       day | | coping
4       New | | airport
4       day | | floodwater
4       New | | car
4       day | | home
4       something | | for

compile error

g++ -std=c++11 -Wall -O3 -msse2 -fopenmp -I..  -pthread -lm -Wno-unused-result -Wno-sign-compare -Wno-unused-variable -Wno-parentheses -Wno-format -o bin/remine_train src/main.cpp
In file included from src/utils/parameters.h:4:0,
                 from src/main.cpp:1:
src/utils/../utils/utils.h:29:0: warning: ignoring #pragma omp declare [-Wunknown-pragmas]
 #pragma omp declare reduction(vec_double_plus : std::vector<double> : \
 ^
In file included from src/classification/../classification/predict_quality.h:7:0,
                 from src/classification/label_generation.h:8,
                 from src/classification/feature_extraction.h:6,
                 from src/main.cpp:6:
src/classification/../classification/random_forest.h: In member function ‘std::pair<int, double> RandomForestRelated::RandomForest::estimate(std::vector<double>&)’:
src/classification/../classification/random_forest.h:365:46: error: expected ‘+’, ‘*’, ‘-’, ‘&’, ‘^’, ‘|’, ‘&&’, ‘||’, ‘min’ or ‘max’ before ‘vec_double_plus’
         # pragma omp parallel for reduction (vec_double_plus : sum)
                                              ^
make: *** [bin/remine_train] Error 1

My g++ version is 4.8. Maybe the OpenMP version on my machine is too old. Could you please tell me which version is required? Thanks.
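For context, `#pragma omp declare reduction` (user-defined reductions) was introduced in OpenMP 4.0, and GCC added OpenMP 4.0 support for C and C++ in release 4.9, which matches the "ignoring #pragma omp declare" warning from g++ 4.8 above. The following sketch checks a compiler version string against that threshold before building; the helper names are hypothetical, and the input is assumed to come from `g++ -dumpversion`.

```python
import re

# GCC 4.9 is the first release with OpenMP 4.0 support for C/C++.
MIN_GCC_FOR_OPENMP_4 = (4, 9)


def parse_gcc_version(text):
    """Extract (major, minor) from a string such as '4.8', '4.8.4' or '9'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return major, minor


def supports_declare_reduction(version_text):
    """True if the given GCC version should accept 'omp declare reduction'."""
    return parse_gcc_version(version_text) >= MIN_GCC_FOR_OPENMP_4


for version in ("4.8", "4.9.0", "9"):
    print(version, supports_declare_reduction(version))
```

The version string could be obtained with `subprocess.check_output(["g++", "-dumpversion"])` and decoded before being passed in.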

Multi-class random forest classification

Hi Jingbo,

Can you take a look at the random forest classifiers? I tried DP Learn, but it doesn't work as I expected. Is it possible to output a class label from {entity, relation, background} together with a probability in [0, 1]? @shangjingbo1226
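One common way to get both a label and a probability from a forest is to let every tree vote for a class and report the winning class with its vote share. The sketch below illustrates that idea only; it is not the logic in ReMine's random_forest.h, and the function name `aggregate_votes` and the way per-tree votes are obtained are assumptions.

```python
from collections import Counter

CLASSES = ("entity", "relation", "background")


def aggregate_votes(tree_votes):
    """Turn per-tree class votes into (label, probability, distribution).

    tree_votes: one class name per tree, each drawn from CLASSES.
    The probability is the fraction of trees voting for the winning class.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    counts = Counter(tree_votes)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError("unknown class labels: %s" % sorted(unknown))
    total = len(tree_votes)
    distribution = {label: counts[label] / total for label in CLASSES}
    # Ties are broken by the order of CLASSES.
    best = max(CLASSES, key=lambda label: distribution[label])
    return best, distribution[best], distribution


votes = ["entity"] * 6 + ["relation"] * 3 + ["background"]
label, probability, distribution = aggregate_votes(votes)
print(label, probability, distribution)
```

Averaging per-tree class probabilities instead of counting hard votes is a frequent alternative and usually gives smoother scores.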
