gentlezhu / remine
Integrating Local Context and Global Cohesiveness for Open Information Extraction(WSDM'19)
The output of the kdd branch still has the same issue that was addressed in #25
The raw data file raw_train.json is missing from the code, so the re-train process cannot be run
results_remine/remine_result.txt
1 man | have , | medical
1 man | have , | neglect
1 man | have , | month
1 health | | Home
1 health | | operation
1 risk of Minnesota | | state
2 people | have , | Conakry
2 people | have , | capital
2 people | have , | Hospital
3 Democrat | | Nebraska
4 something | | for
4 New | by , , , | something
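To inspect dumps like the one above programmatically, a small parser helps. The following is a minimal sketch, assuming each line has the layout `sentence_id subject | relation | object`; that layout is inferred from the dump, not from ReMine documentation, and the names `Extraction` and `parse_result_line` are hypothetical.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    """One row of the result dump (field meanings are an assumption)."""
    sentence_id: int
    subject: str
    relation: str
    obj: str


def parse_result_line(line: str) -> Extraction:
    """Split 'id subject | relation | object' into its four fields."""
    # Exactly three pipe-separated parts are expected; anything else raises ValueError.
    left, relation, obj = (part.strip() for part in line.split("|"))
    # The first whitespace-separated token of the left part is the sentence id.
    sent_id, _, subject = left.partition(" ")
    return Extraction(int(sent_id), subject.strip(), relation, obj)


# Rows copied from the dump above.
sample = [
    "1 man | have , | medical",
    "1 risk of Minnesota | | state",
    "4 New | by , , , | something",
]

for row in map(parse_result_line, sample):
    note = "" if row.relation else "  (empty relation field)"
    print(f"{row}{note}")
```

Because the parser only strips whitespace around the `|` separators, it should also accept the id-encoded rows in tmp_remine/remine_tokenized_segmented_sentences.txt, where the fields hold token ids instead of words.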
results_remine/remine_segmentation.txt
Gov. :RP]_[Tim :EP]_[Pawlenty of :BP]_[Minnesota :BP]_[order :EP]_[the :RP]_[state :EP]_[health :EP]_[department :EP]_[this :EP]_[month :EP]_[to :EP]_[monitor]_[day-to-day :BP]_[operation :EP]_[at :EP]_[the :RP]_[Minneapolis :BP]_[Veterans :EP]_[Home :EP]_[after :EP]_[state :EP]_[inspector]_[find :RP]_[that :RP]_[three :EP]_[man :EP]_[have :RP]_[die :EP]_[there :EP]_[in :EP]_[the :RP]_[previous :EP]_[month :EP]_[because :BP]_[of :EP]_[neglect :BP]_[or :EP]_[medical :BP]_[error :RP]_[. :RP]_[
the :RP]_[aid :RP]_[group]_[doctor :EP]_[without :EP]_[border :EP]_[say :BP]_[that :RP]_[since :RP]_[Saturday :BP]_[, :RP]_[more :RP]_[than :EP]_[275 :EP]_[wounded :BP]_[people :EP]_[have :RP]_[be]_[admit :BP]_[and :EP]_[treat :EP]_[at :EP]_[Donka]_[Hospital :EP]_[in :EP]_[the :RP]_[capital :EP]_[of :EP]_[Guinea :BP]_[, :RP]_[Conakry :BP]_[. :RP]_[
the :RP]_[american :BP]_[people :EP]_[can :EP]_[see :EP]_[what :EP]_[be]_[happen :BP]_[here :RP]_[, :RP]_[say :BP]_[Senator :BP]_[Ben :BP]_[Nelson]_[, :RP]_[Democrat :BP]_[of :EP]_[Nebraska :BP]_[. :RP]_[
for :BP]_[million :RP]_[, :RP]_[it :EP]_[be]_[a :EP]_[tough :BP]_[day :EP]_[of :EP]_[coping :BP]_[-- :BP]_[of :EP]_[watch :EP]_[floodwater :BP]_[pour]_[into :BP]_[home :EP]_[on :EP]_[the :RP]_[Raritan :BP]_[River :BP]_[in :EP]_[New :EP]_[Jersey :BP]_[from :RP]_[New :EP]_[Brunswick]_[to :EP]_[bind :EP]_[Brook :BP]_[and :EP]_[in :EP]_[the :RP]_[Westchester]_[suburb]_[of :EP]_[Mamaroneck :EP]_[and :EP]_[New :EP]_[Rochelle]_[, :RP]_[of :EP]_[sleep :EP]_[in :EP]_[a :EP]_[shelter :BP]_[or :EP]_[a :EP]_[airport :EP]_[, :RP]_[of :EP]_[tow :BP]_[a :EP]_[car :BP]_[and :EP]_[watch :EP]_[a :EP]_[refrigerator]_[float]_[by :RP]_[, :RP]_[of :EP]_[get :EP]_[to :EP]_[work]_[despite :EP]_[flood :BP]_[road :BP]_[and :EP]_[erratic :BP]_[train :BP]_[, :RP]_[of :EP]_[wait :BP]_[for :BP]_[power :BP]_[or :EP]_[a :EP]_[water :RP]_[pump :EP]_[or :EP]_[just :BP]_[something :BP]_[to :EP]_[hope :BP]_[for :BP]_[. :RP]_[
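The segmentation dump can be split into phrases in a similar way. This is a minimal sketch, assuming that `]_[` separates segments and that an optional trailing ` :TAG` labels each one; the tag names (RP, EP, BP) are taken verbatim from the dump and their meaning is not stated here. The function name `parse_segmentation` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# A segment is either "text :TAG" or bare "text" (e.g. "monitor" in the dump).
TAGGED = re.compile(r"^(.*) :([A-Z]+)$")


def parse_segmentation(line: str) -> List[Tuple[str, Optional[str]]]:
    """Return (phrase, tag) pairs; tag is None for untagged segments."""
    segments = []
    for chunk in line.strip().split("]_["):
        if not chunk:
            # Each dumped line ends with a trailing separator; skip the empty tail.
            continue
        match = TAGGED.match(chunk)
        if match:
            segments.append((match.group(1), match.group(2)))
        else:
            segments.append((chunk, None))
    return segments


# A shortened excerpt in the same format as the first dumped sentence.
excerpt = "Gov. :RP]_[Tim :EP]_[Pawlenty of :BP]_[monitor]_[. :RP]_["
for phrase, tag in parse_segmentation(excerpt):
    print(f"{tag or '--'}\t{phrase}")
```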
tmp_remine/remine_tokenized_segmented_sentences.txt
1 5663| 141 , | 983
1 5663| 141 , | 44243
1 5663| 141 , | 2668
1 1931| | 18561
1 1931| | 5053
1 2 3 72917| | 1519
2 3245| 141 , | 127954
2 3245| 141 , | 7358
2 3245| 141 , | 3303
3 60075| | 51787
4 18944| | 70
4 1269| 319 , 24 , | 18944
4 1269| | 21073
4 1269| | 89926
Jingbo, please add this function into src/data/documents.h, thanks!
g++ -std=c++11 -Wall -O3 -msse2 -fopenmp -I.. -pthread -lm -Wno-unused-result -Wno-sign-compare -Wno-unused-variable -Wno-parentheses -Wno-format -o bin/remine_train src/main.cpp
In file included from src/utils/parameters.h:4:0,
from src/main.cpp:1:
src/utils/../utils/utils.h:29:0: warning: ignoring #pragma omp declare [-Wunknown-pragmas]
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
^
In file included from src/classification/../classification/predict_quality.h:7:0,
from src/classification/label_generation.h:8,
from src/classification/feature_extraction.h:6,
from src/main.cpp:6:
src/classification/../classification/random_forest.h: In member function ‘std::pair<int, double> RandomForestRelated::RandomForest::estimate(std::vector<double>&)’:
src/classification/../classification/random_forest.h:365:46: error: expected ‘+’, ‘*’, ‘-’, ‘&’, ‘^’, ‘|’, ‘&&’, ‘||’, ‘min’ or ‘max’ before ‘vec_double_plus’
# pragma omp parallel for reduction (vec_double_plus : sum)
^
make: *** [bin/remine_train] Error 1
My g++ version is 4.8. Maybe the OpenMP version on my computer is too old; could you please tell me which version is required? Thanks.
mldl@mldlUB1604:~/ub16_prj/gh-shanzhenren/ReMine$ bash train.sh
===Entity Linking===
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 28, in <module>
utils.getEntity(args.in1, args.out, args.opt)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 29, in getEntity
with open(file_path) as IN, open(output, 'w') as OUT:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((NN.{0,2}|JJ.{0,1}|RB.{0,1}|PRP.{0,1}|DT ))+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+|(NN.{0,2})+((IN|RP) +)
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 24, in <module>
utils.relationLinker(args.in1, args.in2)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 72, in relationLinker
with open(file_path,'r') as IN:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
===Tokenizaztion===
Traceback (most recent call last):
File "src_py/preprocessing.py", line 396, in <module>
tmp.tokenized_train(args.in1, args.in2, args.in3)
File "src_py/preprocessing.py", line 125, in tokenized_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open(depIn, encoding='utf-8') as dep:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 404, in <module>
tmp.chunk_train(args.in1, args.in2)
File "src_py/preprocessing.py", line 51, in chunk_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open('tmp_remine/boost_patterns.txt', 'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/stopwords.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.entities'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in <module>
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.relations'
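Every traceback in the log above is a FileNotFoundError, so a pre-flight check before running train.sh would surface all missing inputs at once instead of one per step. The following is a minimal sketch: the paths are copied from the error messages, while the helper `missing_inputs` is hypothetical and not part of ReMine. It assumes that tmp/nyt.entities and tmp/nyt.relations are intermediate files produced by earlier steps, so they are left out of the list.

```python
from pathlib import Path
from typing import Iterable, List

# Input paths copied from the FileNotFoundError messages in the log.
# Assumption: files under tmp/ are generated by earlier steps and are not checked here.
REQUIRED_INPUTS = [
    "data/nyt/train_nyt.json",
    "data/nyt/total.lemmas.txt",
    "data/stopwords.txt",
]


def missing_inputs(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the subset of paths that do not exist as files under root."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]


missing = missing_inputs(REQUIRED_INPUTS)
if missing:
    print("Missing input files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All listed input files are present.")
```

Run from the repository root, the script lists whichever of the three files are absent in the local checkout.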
Hi Jingbo,
Can you take a look at the random forest classifiers? I tried DP Learn, but it doesn't work as I expected. Is it possible to output a class label in {entity, relation, background} together with a probability in [0,1]? @shangjingbo1226
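To make the request concrete, here is a minimal sketch of the kind of output being asked for. It is not the repository's RandomForest code. It assumes each tree casts one vote for one of the three classes, and reports the majority label with the fraction of trees that voted for it as the probability. The function `aggregate_votes` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

CLASSES = ("entity", "relation", "background")


def aggregate_votes(votes: List[str]) -> Tuple[str, float, Dict[str, float]]:
    """Turn per-tree votes into (label, probability, full distribution)."""
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown class labels: {sorted(unknown)}")
    counts = Counter(votes)
    # Probability of a class = share of trees that voted for it, so values lie in [0, 1].
    dist = {c: counts.get(c, 0) / len(votes) for c in CLASSES}
    # Ties are broken by the order of CLASSES.
    label = max(CLASSES, key=lambda c: dist[c])
    return label, dist[label], dist


# Ten hypothetical trees voting on one candidate phrase.
votes = ["entity"] * 6 + ["relation"] * 3 + ["background"]
label, prob, dist = aggregate_votes(votes)
print(label, prob, dist)
```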
Great job! Does this tool support Chinese?
1 man | have , | medical
1 man | have , | neglect
1 risk of Minnesota | | state
1 man | have , | month
1 health | | operation
1 health | | Home
2 people | have , | Conakry
2 people | have , | capital
2 people | have , | Hospital
3 Democrat | | Nebraska
4 New | by , , , | something
4 New | | shelter
4 day | | coping
4 New | | airport
4 day | | floodwater
4 New | | car
4 day | | home
4 something | | for