Comments (12)
Do you have AMiner-Author2Paper.tsv? @macks22 @shashankg7
from dblp.
@JerryTom121 you don't actually need the .tsv extension. There was a confusing typo in the config-example.py file that I've now corrected. There was another confusing typo in the run instructions in the readme which I've also fixed. When you download the data, the AMiner-Author2Paper file has a ".txt" extension even though the file is a tab-separated ".tsv" file in reality. This is unimportant for the correct functioning of the program; you can use either extension to name the file.
The real issue was that the config-example.py file was using the wrong path to look for the data files if you followed the setup instructions exactly. I have updated the example, so please pull the most recent changes and start from step 2 in the instructions again: https://github.com/macks22/dblp#how-to-run-the-complete-pipeline.
Please let me know if you are still having trouble after trying this.
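To check the point about extensions on your own machine, here is a minimal sketch that looks for the authorship file under either name. It is not part of the repository: the helper name `find_data_file` and the directory used in the example are assumptions for illustration, so substitute the original-data directory that your own config.py resolves to.

```python
import os


def find_data_file(directory, stem, extensions=(".txt", ".tsv")):
    """Return the first existing path for `stem` with one of `extensions`, else None."""
    for ext in extensions:
        candidate = os.path.join(directory, stem + ext)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical location; replace with the originals directory from your config.py.
    originals_dir = "/data/aminer-network/data/original-data"
    path = find_data_file(originals_dir, "AMiner-Author2Paper")
    print(path if path else "AMiner-Author2Paper not found in " + originals_dir)
```

If this prints a "not found" message, the problem is most likely the path rather than the extension.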
jerry@jerry-virtual-machine:~/citation network/dblp-master/dblp-master/dblp-master/pipeline$ python pipeline.py BuildDataset --start 2000 --end 2016 --local-scheduler
DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/usr/local/lib/python2.7/dist-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildDataset_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTfidf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorRepdocVectors(start=2000, end=2016) is complete
DEBUG: Checking if BuildPaperRepdocDictionary(start=2000, end=2016) is complete
DEBUG: Checking if AuthorCitationGraphLCCIdmap(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if WriteLCCAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task AuthorCitationGraphLCCIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task WriteLCCAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterAuthorshipsToYearRange(start=2000, end=2016) is complete
DEBUG: Checking if PaperCitationGraphIdmap(start=2000, end=2016) is complete
DEBUG: Checking if PickledPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task PickledPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilteredCSVPapers(start=2000, end=2016) is complete
DEBUG: Checking if FilteredCSVRefs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterPapersToYearRange(start=2000, end=2016) is complete
INFO: Informed scheduler that task FilteredCSVRefs_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if RemoveUniqueVenues() is complete
DEBUG: Checking if CSVRefsRecords() is complete
INFO: Informed scheduler that task FilterPapersToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParsePapersToCSV() is complete
INFO: Informed scheduler that task CSVRefsRecords__99914b932b has status PENDING
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Checking if RemovePapersNoVenueOrYear() is complete
INFO: Informed scheduler that task RemoveUniqueVenues__99914b932b has status PENDING
DEBUG: Checking if CSVPaperRecords() is complete
INFO: Informed scheduler that task RemovePapersNoVenueOrYear__99914b932b has status PENDING
INFO: Informed scheduler that task CSVPaperRecords__99914b932b has status PENDING
INFO: Informed scheduler that task FilteredCSVPapers_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task PaperCitationGraphIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
INFO: Informed scheduler that task FilterAuthorshipsToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Checking if BuildPaperRepdocVectors(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocDictionary_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperRepdocs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildPaperRepdocs_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildAuthorRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkPapers()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParsePapersToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Checking if ParsePapersToCSV() is complete
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkPapers()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParsePapersToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkAuthorships()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParseAuthorshipsToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkAuthorships()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParseAuthorshipsToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 23 pending tasks possibly being run by other workers
DEBUG: There are 23 pending tasks unique to this worker
DEBUG: There are 23 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 25 tasks of which:
- 2 ran successfully:
- 1 AminerNetworkAuthorships()
- 1 AminerNetworkPapers()
- 2 failed:
- 1 ParseAuthorshipsToCSV()
- 1 ParsePapersToCSV()
- 21 were left pending, among these:
- 21 had failed dependencies:
- 1 AuthorCitationGraphLCCIdmap(start=2000, end=2016)
- 1 BuildAuthorCitationGraph(start=2000, end=2016)
- 1 BuildAuthorRepdocVectors(start=2000, end=2016)
- 1 BuildDataset(start=2000, end=2016)
- 1 BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016)
...
- 21 had failed dependencies:
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
Without context surrounding this output, I'm afraid I don't know exactly what you've done to produce it. Even so, it is clear the same file is not being found. I have made the process simpler; please delete the repository, re-clone it, and then simply run make. This will now install all dependencies using pip, write an appropriate config.py file, and then download and extract the data. It will also run two new verification scripts to ensure your setup is correct. If make succeeds, you should be good to go.
If you prefer not to install using pip, you can also just run make config && make dl && make extract.
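The log pattern, where AminerNetworkPapers() is reported as done and ParsePapersToCSV() then fails with "Unfulfilled dependency at run time", fits a missing input file. The sketch below is a toy model of that check. It assumes, as in Luigi's default behaviour, that a task counts as complete only when its output exists. `FileTask` and `run_task` are hypothetical names and are not Luigi's API, and the paths are placeholders.

```python
import os


class FileTask(object):
    """Toy stand-in for a pipeline task; NOT one of Luigi's classes."""

    def __init__(self, name, output_path, requires=()):
        self.name = name
        self.output_path = output_path
        self.requires = tuple(requires)

    def complete(self):
        # A task counts as complete only if its output file exists on disk.
        return os.path.exists(self.output_path)


def run_task(task):
    """Refuse to run a task while any of its dependencies is incomplete."""
    missing = [dep.name for dep in task.requires if not dep.complete()]
    if missing:
        raise RuntimeError("Unfulfilled dependency at run time: " + ", ".join(missing))
    return "ran " + task.name


if __name__ == "__main__":
    # Placeholder paths that do not exist, to reproduce the failure mode.
    papers = FileTask("AminerNetworkPapers", "/no/such/dir/AMiner-Paper.txt")
    parse = FileTask("ParsePapersToCSV", "/no/such/dir/paper.csv", requires=[papers])
    try:
        print(run_task(parse))
    except RuntimeError as err:
        print(err)
```

In this model the upstream task has nothing to compute, so "running" it cannot create the file; only placing the data at the configured path makes the dependency complete.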
I run python pipeline.py BuildDataset --start 2000 --end 2016 --local-scheduler
My config file:
import os
pjoin = os.path.join
base_dir = '/data/aminer-network'
data_dir = pjoin(base_dir, 'data') # <base_dir>/data
originals_dir = pjoin(data_dir, 'original-data')
base_csv_dir = pjoin(data_dir, 'base-csv')
filtered_dir = pjoin(data_dir, 'filtered-csv')
repdoc_dir = pjoin(data_dir, 'repdocs')
graph_dir = pjoin(data_dir, 'graphs')
However, I still cannot get the CSV files:
DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/usr/local/lib/python2.7/dist-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildDataset_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTfidf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorRepdocVectors(start=2000, end=2016) is complete
DEBUG: Checking if BuildPaperRepdocDictionary(start=2000, end=2016) is complete
DEBUG: Checking if AuthorCitationGraphLCCIdmap(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if WriteLCCAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task AuthorCitationGraphLCCIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task WriteLCCAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterAuthorshipsToYearRange(start=2000, end=2016) is complete
DEBUG: Checking if PaperCitationGraphIdmap(start=2000, end=2016) is complete
DEBUG: Checking if PickledPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task PickledPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilteredCSVPapers(start=2000, end=2016) is complete
DEBUG: Checking if FilteredCSVRefs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterPapersToYearRange(start=2000, end=2016) is complete
INFO: Informed scheduler that task FilteredCSVRefs_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if RemoveUniqueVenues() is complete
DEBUG: Checking if CSVRefsRecords() is complete
INFO: Informed scheduler that task FilterPapersToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParsePapersToCSV() is complete
INFO: Informed scheduler that task CSVRefsRecords__99914b932b has status PENDING
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Checking if RemovePapersNoVenueOrYear() is complete
INFO: Informed scheduler that task RemoveUniqueVenues__99914b932b has status PENDING
DEBUG: Checking if CSVPaperRecords() is complete
INFO: Informed scheduler that task RemovePapersNoVenueOrYear__99914b932b has status PENDING
INFO: Informed scheduler that task CSVPaperRecords__99914b932b has status PENDING
INFO: Informed scheduler that task FilteredCSVPapers_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task PaperCitationGraphIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
INFO: Informed scheduler that task FilterAuthorshipsToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Checking if BuildPaperRepdocVectors(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocDictionary_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperRepdocs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildPaperRepdocs_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildAuthorRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkPapers()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParsePapersToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Checking if ParsePapersToCSV() is complete
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkPapers()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParsePapersToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkAuthorships()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParseAuthorshipsToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkAuthorships()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParseAuthorshipsToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 23 pending tasks possibly being run by other workers
DEBUG: There are 23 pending tasks unique to this worker
DEBUG: There are 23 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 25 tasks of which:
- 2 ran successfully:
- 1 AminerNetworkAuthorships()
- 1 AminerNetworkPapers()
- 2 failed:
- 1 ParseAuthorshipsToCSV()
- 1 ParsePapersToCSV()
- 21 were left pending, among these:
- 21 had failed dependencies:
- 1 AuthorCitationGraphLCCIdmap(start=2000, end=2016)
- 1 BuildAuthorCitationGraph(start=2000, end=2016)
- 1 BuildAuthorRepdocVectors(start=2000, end=2016)
- 1 BuildDataset(start=2000, end=2016)
- 1 BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016)
...
- 21 had failed dependencies:
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
I added two scripts to verify your config and downloads. If you pull you will get these. You can run python verify_config.py and python verify_download.py. I suspect one of these might inform you of the issue.
The images in your comment do not show up. You should have the following directory structure:
├── /data/aminer-network/data
│ └── original-data
│ ├── AMiner-Author.txt
│ ├── AMiner-Author.zip
│ ├── AMiner-Author2Paper.txt
│ ├── AMiner-Author2Paper.zip
│ ├── AMiner-Paper.rar
│ └── AMiner-Paper.txt
Please run the two verify scripts and paste the output here.
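As a standalone cross-check of the listing above, the following sketch reports which of the extracted files are absent. It is not the repository's verify_download.py. The function name `missing_files` is hypothetical, and checking only the three .txt files rests on the assumption that the archives are not needed once extracted.

```python
import os

# File names taken from the directory listing above (extracted files only).
EXPECTED_FILES = ("AMiner-Author.txt", "AMiner-Author2Paper.txt", "AMiner-Paper.txt")


def missing_files(originals_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in originals_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(originals_dir, name))]


if __name__ == "__main__":
    # Path from the listing above; adjust if your config.py points elsewhere.
    originals_dir = "/data/aminer-network/data/original-data"
    missing = missing_files(originals_dir)
    if missing:
        print("Missing from %s: %s" % (originals_dir, ", ".join(missing)))
    else:
        print("All expected files are present in " + originals_dir)
```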
DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/home/jerry/.local/lib/python2.7/site-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()
Were you able to run it successfully? If not, please run the two verify scripts and paste the output here.
Closing due to inactivity.