Comments (12)

JerryTom121 commented on July 28, 2024

Do you have AMiner-Author2Paper.tsv? @macks22 @shashankg7

from dblp.

macks22 commented on July 28, 2024

@JerryTom121 you don't actually need the .tsv extension. There was a confusing typo in the config-example.py file that I've now corrected. There was another confusing typo in the run instructions in the readme which I've also fixed. When you download the data, the AMiner-Author2Paper file has a ".txt" extension even though the file is a tab-separated ".tsv" file in reality. This is unimportant for the correct functioning of the program; you can use either extension to name the file.
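
To illustrate the point about the extension, here is a minimal sketch (written for Python 3), assuming the file really is tab-separated as described above. The sample rows and their column layout are made up for the demonstration and are not taken from the AMiner data; `read_tab_separated` is a hypothetical helper, not part of this repository.

```python
import csv
import os
import tempfile


def read_tab_separated(path):
    """Return all rows of a tab-separated file, whatever its extension."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


# Hypothetical sample content; the real column layout may differ.
sample = "1\t100\t200\t1\n2\t101\t200\t2\n"

tmp_dir = tempfile.mkdtemp()
rows_by_ext = {}
for ext in (".txt", ".tsv"):
    path = os.path.join(tmp_dir, "AMiner-Author2Paper" + ext)
    with open(path, "w") as f:
        f.write(sample)
    rows_by_ext[ext] = read_tab_separated(path)

# The parser is driven by the delimiter, not the file name,
# so the parsed rows are the same under either extension.
print(rows_by_ext[".txt"] == rows_by_ext[".tsv"])
```

What matters is that the name in your config matches the name of the file on disk.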

The real issue was that the config-example.py file was using the wrong path to look for the data files if you followed the setup instructions exactly. I have updated the example, so please pull the most recent changes and start from step 2 in the instructions again: https://github.com/macks22/dblp#how-to-run-the-complete-pipeline.

Please let me know if you are still having trouble after trying this.

JerryTom121 commented on July 28, 2024

JerryTom121 commented on July 28, 2024

jerry@jerry-virtual-machine:~/citation network/dblp-master/dblp-master/dblp-master/pipeline$ python pipeline.py BuildDataset --start 2000 --end 2016 --local-scheduler
DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/usr/local/lib/python2.7/dist-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildDataset_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTfidf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorRepdocVectors(start=2000, end=2016) is complete
DEBUG: Checking if BuildPaperRepdocDictionary(start=2000, end=2016) is complete
DEBUG: Checking if AuthorCitationGraphLCCIdmap(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if WriteLCCAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task AuthorCitationGraphLCCIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task WriteLCCAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterAuthorshipsToYearRange(start=2000, end=2016) is complete
DEBUG: Checking if PaperCitationGraphIdmap(start=2000, end=2016) is complete
DEBUG: Checking if PickledPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task PickledPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilteredCSVPapers(start=2000, end=2016) is complete
DEBUG: Checking if FilteredCSVRefs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterPapersToYearRange(start=2000, end=2016) is complete
INFO: Informed scheduler that task FilteredCSVRefs_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if RemoveUniqueVenues() is complete
DEBUG: Checking if CSVRefsRecords() is complete
INFO: Informed scheduler that task FilterPapersToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParsePapersToCSV() is complete
INFO: Informed scheduler that task CSVRefsRecords__99914b932b has status PENDING
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Checking if RemovePapersNoVenueOrYear() is complete
INFO: Informed scheduler that task RemoveUniqueVenues__99914b932b has status PENDING
DEBUG: Checking if CSVPaperRecords() is complete
INFO: Informed scheduler that task RemovePapersNoVenueOrYear__99914b932b has status PENDING
INFO: Informed scheduler that task CSVPaperRecords__99914b932b has status PENDING
INFO: Informed scheduler that task FilteredCSVPapers_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task PaperCitationGraphIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
INFO: Informed scheduler that task FilterAuthorshipsToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Checking if BuildPaperRepdocVectors(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocDictionary_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperRepdocs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildPaperRepdocs_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildAuthorRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkPapers()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParsePapersToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Checking if ParsePapersToCSV() is complete
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkPapers()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParsePapersToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkAuthorships()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParseAuthorshipsToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running AminerNetworkAuthorships()
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) running ParseAuthorshipsToCSV()
ERROR: [pid 49786] Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 23 pending tasks possibly being run by other workers
DEBUG: There are 23 pending tasks unique to this worker
DEBUG: There are 23 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=376560916, workers=1, host=jerry-virtual-machine, username=jerry, pid=49786) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 25 tasks of which:

  • 2 ran successfully:
    • 1 AminerNetworkAuthorships()
    • 1 AminerNetworkPapers()
  • 2 failed:
    • 1 ParseAuthorshipsToCSV()
    • 1 ParsePapersToCSV()
  • 21 were left pending, among these:
    • 21 had failed dependencies:
      • 1 AuthorCitationGraphLCCIdmap(start=2000, end=2016)
      • 1 BuildAuthorCitationGraph(start=2000, end=2016)
      • 1 BuildAuthorRepdocVectors(start=2000, end=2016)
      • 1 BuildDataset(start=2000, end=2016)
      • 1 BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

macks22 commented on July 28, 2024

Without context surrounding this output, I'm afraid I don't know exactly what you've done to produce it. Even so, it is clear the same file is not being found. I have made the process simpler; please delete the repository, re-clone it, and then simply run make. This will now install all dependencies using pip, write an appropriate config.py file, and then download and extract the data. It will also run two new verification scripts to ensure your setup is correct. If make succeeds, you should be good to go.

If you prefer not to install using pip, you can also just run make config && make dl && make extract.
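
For anyone puzzled by the "Unfulfilled dependency at run time" errors in the log: as far as I understand Luigi's default behaviour, a task that declares outputs counts as complete only when all of those outputs exist, so a missing input file leaves the dependent task unfulfilled even though the upstream task is reported as done. The sketch below is a simplified model of that rule in plain Python, not Luigi's code; `FileTask`, `missing_dependencies`, and the `paper.csv` file name are hypothetical names chosen for illustration.

```python
import os
import tempfile


class FileTask(object):
    """Hypothetical stand-in for a task whose single output is a local file."""

    def __init__(self, path, requires=()):
        self.path = path
        self.deps = list(requires)

    def complete(self):
        # Simplified rule: the task is complete only if its output exists.
        return os.path.exists(self.path)


def missing_dependencies(task):
    """Return the output paths of dependencies that are not on disk."""
    return [dep.path for dep in task.deps if not dep.complete()]


work_dir = tempfile.mkdtemp()
raw = FileTask(os.path.join(work_dir, "AMiner-Paper.txt"))
parsed = FileTask(os.path.join(work_dir, "paper.csv"), requires=[raw])

before = missing_dependencies(parsed)  # raw file is absent
open(raw.path, "w").close()            # simulate a successful download
after = missing_dependencies(parsed)   # raw file is now present

print(before)
print(after)
```

In this model the dependency stops being reported as missing as soon as the file appears at the expected path, which is why the path in config.py has to match where the data was extracted.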

JerryTom121 commented on July 28, 2024

I ran python pipeline.py BuildDataset --start 2000 --end 2016 --local-scheduler
My config file:
import os
pjoin = os.path.join

base_dir = '/data/aminer-network'
data_dir = pjoin(base_dir, 'data')  # <base_dir>/data
originals_dir = pjoin(data_dir, 'original-data')
base_csv_dir = pjoin(data_dir, 'base-csv')
filtered_dir = pjoin(data_dir, 'filtered-csv')
repdoc_dir = pjoin(data_dir, 'repdocs')
graph_dir = pjoin(data_dir, 'graphs')

However, I cannot get the CSV files to be generated:
DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/usr/local/lib/python2.7/dist-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildDataset_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTfidf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorRepdocVectors(start=2000, end=2016) is complete
DEBUG: Checking if BuildPaperRepdocDictionary(start=2000, end=2016) is complete
DEBUG: Checking if AuthorCitationGraphLCCIdmap(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildLCCAuthorRepdocCorpusTf_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if WriteLCCAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task AuthorCitationGraphLCCIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildAuthorCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task WriteLCCAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterAuthorshipsToYearRange(start=2000, end=2016) is complete
DEBUG: Checking if PaperCitationGraphIdmap(start=2000, end=2016) is complete
DEBUG: Checking if PickledPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildAuthorCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperCitationGraph(start=2000, end=2016) is complete
INFO: Informed scheduler that task PickledPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilteredCSVPapers(start=2000, end=2016) is complete
DEBUG: Checking if FilteredCSVRefs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperCitationGraph_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if FilterPapersToYearRange(start=2000, end=2016) is complete
INFO: Informed scheduler that task FilteredCSVRefs_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if RemoveUniqueVenues() is complete
DEBUG: Checking if CSVRefsRecords() is complete
INFO: Informed scheduler that task FilterPapersToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParsePapersToCSV() is complete
INFO: Informed scheduler that task CSVRefsRecords__99914b932b has status PENDING
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Checking if RemovePapersNoVenueOrYear() is complete
INFO: Informed scheduler that task RemoveUniqueVenues__99914b932b has status PENDING
DEBUG: Checking if CSVPaperRecords() is complete
INFO: Informed scheduler that task RemovePapersNoVenueOrYear__99914b932b has status PENDING
INFO: Informed scheduler that task CSVPaperRecords__99914b932b has status PENDING
INFO: Informed scheduler that task FilteredCSVPapers_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task PaperCitationGraphIdmap_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
INFO: Informed scheduler that task FilterAuthorshipsToYearRange_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Checking if BuildPaperRepdocVectors(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocDictionary_2016_2000_27b674c14c has status PENDING
DEBUG: Checking if BuildPaperRepdocs(start=2000, end=2016) is complete
INFO: Informed scheduler that task BuildPaperRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildPaperRepdocs_2016_2000_27b674c14c has status PENDING
INFO: Informed scheduler that task BuildAuthorRepdocVectors_2016_2000_27b674c14c has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkPapers()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParsePapersToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Checking if ParsePapersToCSV() is complete
DEBUG: Checking if AminerNetworkPapers() is complete
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 25
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkPapers()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkPapers()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkPapers__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParsePapersToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParsePapersToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkPapers__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParsePapersToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkAuthorships()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParseAuthorshipsToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Checking if ParseAuthorshipsToCSV() is complete
DEBUG: Checking if AminerNetworkAuthorships() is complete
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status PENDING
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status PENDING
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 24
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running AminerNetworkAuthorships()
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) done AminerNetworkAuthorships()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AminerNetworkAuthorships__99914b932b has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 23
INFO: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) running ParseAuthorshipsToCSV()
ERROR: [pid 52678] Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) failed ParseAuthorshipsToCSV()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 175, in run
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: AminerNetworkAuthorships__99914b932b
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ParseAuthorshipsToCSV__99914b932b has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 23 pending tasks possibly being run by other workers
DEBUG: There are 23 pending tasks unique to this worker
DEBUG: There are 23 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=796904446, workers=1, host=jerry-virtual-machine, username=jerry, pid=52678) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 25 tasks of which:

  • 2 ran successfully:
    • 1 AminerNetworkAuthorships()
    • 1 AminerNetworkPapers()
  • 2 failed:
    • 1 ParseAuthorshipsToCSV()
    • 1 ParsePapersToCSV()
  • 21 were left pending, among these:
    • 21 had failed dependencies:
      • 1 AuthorCitationGraphLCCIdmap(start=2000, end=2016)
      • 1 BuildAuthorCitationGraph(start=2000, end=2016)
      • 1 BuildAuthorRepdocVectors(start=2000, end=2016)
      • 1 BuildDataset(start=2000, end=2016)
      • 1 BuildLCCAuthorRepdocCorpusTf(start=2000, end=2016)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

macks22 commented on July 28, 2024

I added two scripts to verify your config and downloads. If you pull you will get these. You can run python verify_config.py and python verify_download.py. I suspect one of these might inform you of the issue.

JerryTom121 commented on July 28, 2024

macks22 commented on July 28, 2024

The images in your comment do not show up. You should have the following directory structure:

├── /data/aminer-network/data
│   └── original-data
│       ├── AMiner-Author.txt
│       ├── AMiner-Author.zip
│       ├── AMiner-Author2Paper.txt
│       ├── AMiner-Author2Paper.zip
│       ├── AMiner-Paper.rar
│       └── AMiner-Paper.txt

Please run the two verify scripts and paste the output here.
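
If it helps in the meantime, the directory structure above can also be checked by hand. This is only a rough sketch and not the contents of verify_download.py; it assumes the base_dir value from the config posted earlier in this thread, and `find_missing` is a hypothetical helper name.

```python
import os

# Path built from the base_dir in the config posted earlier in this thread.
originals_dir = os.path.join("/data/aminer-network", "data", "original-data")

# Extracted files listed in the directory tree above.
expected_files = [
    "AMiner-Author.txt",
    "AMiner-Author2Paper.txt",
    "AMiner-Paper.txt",
]


def find_missing(directory, filenames):
    """Return the names from `filenames` that are not files in `directory`."""
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(directory, name))
    ]


missing = find_missing(originals_dir, expected_files)
if missing:
    print("Missing from %s: %s" % (originals_dir, ", ".join(missing)))
else:
    print("All expected files are present.")
```

If any file is reported missing, either the download or extraction did not finish, or the paths in config.py point somewhere other than where the data was placed.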

JerryTom121 commented on July 28, 2024

DEBUG: Checking if BuildDataset(start=2000, end=2016) is complete
/home/jerry/.local/lib/python2.7/site-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2016) without outputs has no custom complete() method
is_complete = task.complete()

macks22 commented on July 28, 2024

Were you able to run it successfully? If not, please run the two verify scripts and paste the output here.

macks22 commented on July 28, 2024

Closing due to inactivity.
