evalne's Issues

Network reconstruction question

Hi Alexandru,

I don't fully understand how network reconstruction is done.
For example, what is the difference between LPEvaluator and NREvaluator when train_frac is set to the same value, for example 0.7?

Best regards.

Train/test procedure for link prediction on NE

Dear authors,

I am trying to understand the train/test splitting procedure that your package implements for evaluating network embeddings on link prediction. I posted a similar question yesterday on Data Science Stack Exchange.

If I understand your paper correctly, the node embeddings are learned only from the positive examples (edges) in the training set, and the negative examples are used only to train the classifier. Is this correct?

Thanks.

Best, Andrej

RuntimeError when running simple-example.py

(EvalNE) C:\Users\13688\Downloads\EvalNE-master\examples\api_examples>python simple-example.py
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\13688\Downloads\EvalNE-master\examples\api_examples\simple-example.py", line 33, in <module>
result = nee.evaluate_baseline(method=method)
File "D:\Software\MiniConda\envs\EvalNE\lib\site-packages\evalne\evaluation\evaluator.py", line 301, in evaluate_baseline
train_pred, test_pred = util.run_function(timeout, _eval_sim,
File "D:\Software\MiniConda\envs\EvalNE\lib\site-packages\evalne\utils\util.py", line 189, in run_function
p.start()
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
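
The traceback shows evalne's util.run_function calling p.start() while the script is being re-imported by a spawned child process on Windows. A minimal sketch of the idiom the error message asks for is below. The names work and main are hypothetical stand-ins; in simple-example.py, the calls such as nee.evaluate_baseline(method=method) would presumably move inside main in the same way.

```python
import multiprocessing as mp


def work(x):
    """Hypothetical stand-in for a function that runs in a child process."""
    return x * x


def main():
    # Everything that starts processes lives here, not at module top level.
    p = mp.Process(target=work, args=(3,))
    p.start()
    p.join(timeout=30)
    print("child exit code:", p.exitcode)


if __name__ == "__main__":
    # With the "spawn" start method (the only one on Windows), each child
    # re-imports this file. The guard keeps that re-import from starting
    # new processes again.
    main()
```

On Linux the same script usually runs without the guard because processes are forked rather than spawned, which would explain why the example fails only on Windows.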

Confused about the train/test split for link prediction

Is this correct?

Every positive training edge exists in the training graph, but none of the positive testing edges exist in the training graph.

If so, I'm not sure I understand the reasoning. Isn't the goal to predict missing edges? If the training edges exist in the training graph then they're not missing.

In particular, methods which return very high scores for edges which already exist will bias the training. For example, I'm using quasi-local methods like superposed random walks (just running a few random walks of length 3 and then adding the results). If an edge already exists between two nodes then this superposed score will tend to be very very high. As a result, my test scores have very high precision but very low recall. This makes sense to me if the classifier is learning "very very high SRW score => edge".

Am I misinterpreting something here?
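
The split described in the question can be sketched in a few lines. This is only an illustration of the stated property (training edges are in the training graph, test edges are not), not EvalNE's implementation; split_edges and the toy edge list are hypothetical.

```python
import random


def split_edges(edges, train_frac=0.7, seed=0):
    """Randomly split an edge list into train and test edges."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (2, 4), (3, 4), (3, 5), (4, 5), (0, 5)]
train_edges, test_edges = split_edges(edges, train_frac=0.7)

# The training graph is built from the training edges only.
train_graph = set(train_edges)

# Every positive training edge is in the training graph ...
assert all(e in train_graph for e in train_edges)
# ... and no positive test edge is.
assert not any(e in train_graph for e in test_edges)
```

Under this setup, a similarity score computed on the training graph sees the training positives as existing edges, which is the source of the bias the question describes.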

[FEATURE] limiting thread usage

Hi, while running a benchmark experiment I found that at a certain point all available threads were in use, which killed a colleague's process. The culprit turned out to be the construction of the edge embeddings: it is written efficiently with NumPy, but by default it uses all available threads.

I tried to change this behavior through the joblib backend, but that did not work.

The solution I found was to add the following two lines before importing numpy, in the script where I call the evalne evaluator:
import os
os.environ["OPENBLAS_NUM_THREADS"] = "24"

Note that the number of threads needs to be given as a str, not an int.

Hope this helps others with the same problem.
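
A slightly more general sketch of the same workaround follows. The helper limit_blas_threads is hypothetical, and OMP_NUM_THREADS and MKL_NUM_THREADS are additions that are not in the post above; which variable actually takes effect depends on the BLAS library that NumPy was built against.

```python
import os


def limit_blas_threads(n_threads):
    """Cap the thread pools of common BLAS/OpenMP backends (hypothetical helper)."""
    value = str(n_threads)  # os.environ only accepts strings
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = value


limit_blas_threads(4)

# Import numpy (and anything that imports it, such as evalne) only after
# this point, because the variables are read when the library is loaded.
```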

[BUG] 1. TypeError: 'Results' object is not iterable. 2. TypeError: a bytes-like object is required, not 'str'

Describe the bug
I have installed EvalNE, OpenNE library, PRUNE and Metapath2Vec following the instructions.
When I run evaluator_example.py, I encounter several errors and warnings.

  1. TypeError: 'Results' object is not iterable.
  2. TypeError: a bytes-like object is required, not 'str'
  3. ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
  4. WARNING:root:Output of method metapath2vec++ contains 2 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 705.

To Reproduce
Steps to reproduce the error:

  1. OS used: Ubuntu 18.04.1 LTS
  2. EvalNE Version: 0.3.1
  3. Snippet of code executed (for API) or conf file run (for CLI)
cd examples/
python3 evaluator_example.py
  4. Full error output
  • Error 1
    Traceback (most recent call last):
    File "evaluator_example.py", line 185, in <module>
    main()
    File "evaluator_example.py", line 67, in main
    eval_other(nee, scoresheet)
    File "evaluator_example.py", line 153, in eval_other
    for res in results:
    TypeError: 'Results' object is not iterable

  • Error 2
    Traceback (most recent call last):
    File "/home/huangxk/workspace_python/embedding/EvalNE/examples/evaluator_example.py", line 185, in <module>
    main()
    File "/home/huangxk/workspace_python/embedding/EvalNE/examples/evaluator_example.py", line 70, in main
    scoresheet.write_tabular(filename=os.path.join(outpath, 'eval_output.txt'), metric='auroc')
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/evalne/evaluation/score.py", line 204, in write_tabular
    df.to_csv(f, sep='\t', na_rep='NA')
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/core/generic.py", line 3228, in to_csv
    formatter.save()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 202, in save
    self._save()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 310, in _save
    self._save_header()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 278, in _save_header
    writer.writerow(encoded_labels)
    TypeError: a bytes-like object is required, not 'str'

  • Error 3
    Preprocessing graph...
    Repetition 0 of experiment
    Evaluating baselines...
    Evaluating Embedding methods...
    ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
    Running command...

  • Warning 4
    WARNING:root:Output of method metapath2vec++ contains 2 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 705.
    WARNING:root:Output provided by method metapath2vec++ contains 129 columns, 128 expected! Taking first column as nodeID...
    WARNING:root:Output of method node2vec contains 1 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 704.
    WARNING:root:Output provided by method node2vec contains 129 columns, 128 expected! Taking first column as nodeID...
    WARNING:root:Output of method deepwalk contains 1 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 704.
    WARNING:root:Output provided by method deepwalk contains 129 columns, 128 expected! Taking first column as nodeID...

My solutions
I tried to fix Error 1 and Error 2, and my changes work (but I am not sure whether they are the right solution). Error 3 and Warning 4 do not stop the run, but I want to know what causes them and whether I should ignore them.

  • My solution to TypeError: 'Results' object is not iterable.
    In file evaluator_example.py, lines 149-150 and 173-174:

    for res in results:
        scoresheet.log_results(res)
    

    change them to

    scoresheet.log_results(results)
    
  • My solution to TypeError: a bytes-like object is required, not 'str'
    This error seems to be caused by mixing text and binary file modes. This question on Stack Overflow may be helpful. I changed the evalne source code in evalne/evaluation/score.py, lines 201-202:

    f = open(filename, 'a+b')
    f.write(header.encode())
    

    change them to

    f = open(filename, 'a')
    f.write(header)
    
  • ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
    Why does this error message appear? Should I ignore it?

  • Warning 4
    It seems that they are related to OpenNE?
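
For Error 2, the underlying rule is that Python 3's csv writer passes str objects to the file, so the file has to be opened in text mode; a file opened with 'a+b' raises the TypeError shown above. A minimal standalone sketch using only the standard library (the file name and rows are illustrative, and this is not the code in score.py):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "eval_output.txt")

# Text mode ('a') with newline='' is what the csv module expects in Python 3.
with open(path, "a", newline="") as f:
    f.write("Evaluation results (auroc):\n")  # plain str header, no .encode()
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["method", "network"])
    writer.writerow(["common_neighbours", 0.8458])

with open(path, newline="") as f:
    content = f.read()
print(content)
```

The same reasoning applies to pandas' to_csv when it is handed an already opened file object, which is the call that fails in the traceback.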

Result
Even with Error 3 and Warning 4, I still get a result file in example/output/eval_output.txt

Evaluation results (auroc):
-----------------------
	network
random_prediction	0.4942
common_neighbours	0.8458
jaccard_coefficient	0.7255
adamic_adar_index	0.8551
preferential_attachment	0.9376
resource_allocation_index	0.853
PRUNE	0.8299
metapath2vec++	0.8218
node2vec	0.8796
deepwalk	0.8603
line	0.8997

Is this correct?

Desktop (please complete the following information):

  • OS: Ubuntu 18.04.1 LTS
  • EvalNE Version : 0.3.1
  • Python: 3.6.7

Thanks for sharing this great library. I am learning to use it.
Best,
Xikun

Errors when testing TADW from the OpenNE library with simple-example.py

Hello, I am testing TADW from the OpenNE library using the simple-example.py file, and I get the following errors. I hope you can find the time to help me solve them.

D:\software\Python3.5\python3.exe "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 5345 --file C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py
pydev debugger: process 1788 is connecting

Connected to pydev debugger (build 183.5912.18)
Running command...
python3 -m openne --method tadw --input C:/Users/liujinxin/Desktop/xiugai/OpenNE-master/dwata/cora/cora_edgelist.txt --graph-format edgelist --output vec_all.txt --q 0.25 --p 0.25 --input ./edgelist.tmp --output ./emb.tmp --representation-size 128
Reading...
Traceback (most recent call last):
File "D:\software\Python3.5\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\software\Python3.5\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 182, in <module>
main(parse_args())
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 137, in main
g.read_node_label(args.label_file)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\graph.py", line 89, in read_node_label
self.G.nodes[vec[0]]['label'] = vec[1:]
File "D:\software\Python3.5\lib\site-packages\networkx\classes\reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: '703'
I/O error(2): No such file or directory while evaluating method tadw
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1741, in <module>
main()
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py", line 43, in <module>
edge_embedding_methods=edge_emb, input_delim=' ', output_delim=' ')
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 695, in evaluate_cmd
input_delim, output_delim, write_weights, write_dir, verbose)
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 744, in _evaluate_ne_cmd
num_vectors = sum(1 for _ in open(tmpemb))
FileNotFoundError: [Errno 2] No such file or directory: './emb.tmp'
Backend TkAgg is interactive backend. Turning interactive mode on.
Failed to enable GUI event loop integration for 'tk'
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\matplotlibtools.py", line 31, in do_enable_gui
enable_gui(guiname)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 536, in enable_gui
return gui_hook(app)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 285, in enable_tk
app = TK.Tk()
File "D:\software\Python3.5\lib\tkinter\__init__.py", line 1877, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: Can't find a usable init.tcl in the following directories:
D:/software/Python3.5/lib/tcl8.6 D:/software/lib/tcl8.6 D:/lib/tcl8.6 D:/software/library D:/library D:/tcl8.6.4/library D:/tcl8.6.4/library

This probably means that Tcl wasn't installed properly.

NO directory found
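
In the log above, the OpenNE command exits with KeyError: '703' before anything is written to ./emb.tmp, so the later FileNotFoundError looks like a consequence of that first failure rather than a separate problem. A minimal sketch of checking the external command before reading its output is shown below; run_and_count_lines and the failing command are hypothetical and do not reproduce evalne's evaluate_cmd.

```python
import os
import subprocess
import sys
import tempfile


def run_and_count_lines(cmd, out_path):
    """Run cmd, then count lines in out_path only if the run succeeded."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        raise RuntimeError("command failed:\n" + proc.stderr.decode(errors="replace"))
    if not os.path.exists(out_path):
        raise RuntimeError("command succeeded but wrote no file at " + out_path)
    with open(out_path) as f:
        return sum(1 for _ in f)


out_path = os.path.join(tempfile.mkdtemp(), "emb.tmp")

# Hypothetical stand-in for an embedding method that crashes before writing output.
failing_cmd = [sys.executable, "-c", "raise KeyError('703')"]

try:
    run_and_count_lines(failing_cmd, out_path)
except RuntimeError as err:
    message = str(err)
    print(message)
```

Surfacing the child's stderr this way points at the first error (here the KeyError raised while reading node labels), which is usually the one worth debugging.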

IOError: [Errno 2] No such file or directory: './emb.tmp' while running python evalne ./examples/conf_parTest.ini

Installing EvalNE v0.3.2

I am trying to install EvalNE v0.3.2 with Python 3.6.12 on macOS and run simple_example.py.

While executing
$conda install --file requirements.txt

the output is

error: numpy 1.15.1 is installed but numpy>=1.15.4 is required by {'pandas'}

Basically, it can be reproduced by following the installation guide:

$git clone https://github.com/Dru-Mara/EvalNE.git
$cd EvalNE
$pip3 install -r requirements.txt
$sudo python3 setup.py install

or

$git clone https://github.com/Dru-Mara/EvalNE.git
$cd EvalNE
$conda install --file requirements.txt
$sudo python3 setup.py install

If we ignore the error and then run

$cd examples/
$python3 simple_example.py

we obtain

evalne/utils/preprocess.py", line 523, in prep_graph
G.remove_edges_from(G.selfloop_edges())
AttributeError: 'Graph' object has no attribute 'selfloop_edges'

I think the issue is that the requirements need to specify versions more precisely, i.e. the versions of scipy, pandas, tqdm and matplotlib.
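
The AttributeError itself comes from NetworkX: newer releases no longer provide selfloop_edges as a Graph method and offer the module-level function nx.selfloop_edges(G) instead. A minimal sketch of the equivalent call on a toy graph (this illustrates the API change only and is not a patch to preprocess.py):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 2), (3, 3)])

# nx.selfloop_edges(G) replaces the removed G.selfloop_edges() method.
# list() materializes the edges first so G is not mutated while iterating.
G.remove_edges_from(list(nx.selfloop_edges(G)))

print(sorted(G.edges()))
```

Pinning NetworkX to a release that still has the method would be the alternative if the installed EvalNE version cannot be changed.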

[BUG] precisionatk (evaluation/score.py)

Hi Alex,

I noticed this bug when I tried to use the Score class on its own, separately from the rest of the evalne package, because it makes it easy to calculate and plot performance metrics. However, when I called the precisionatk function in evalne/evaluation/score.py at line 598, I got the following error:

  • TypeError: 'zip' object is not subscriptable

Current solution: I solved it by first wrapping the zip object in a list call, i.e. list(zip(*self._sorted))[0].
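
The behavior can be reproduced outside evalne. In Python 3, zip returns an iterator, so it has to be materialized before indexing. The sketch below uses hypothetical (label, score) pairs and a simple precision-at-k calculation (the fraction of the top k entries that are true positives); it does not reproduce the code in score.py.

```python
# Hypothetical (label, score) pairs, sorted by decreasing score.
sorted_pairs = [(1, 0.9), (0, 0.8), (1, 0.7), (1, 0.4), (0, 0.1)]

try:
    zip(*sorted_pairs)[0]  # Python 3: zip returns an iterator, not a list
except TypeError as err:
    error_message = str(err)

# Materialize the zip object first, then index it.
labels = list(zip(*sorted_pairs))[0]

k = 3
precision_at_k = sum(labels[:k]) / k
print(labels, precision_at_k)
```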

Best,
Pieter-Paul
