
EvalNE: A Python library for evaluating Network Embedding methods


Table of Contents
  1. About EvalNE
  2. Installation
  3. Usage
  4. Contributing
  5. License
  6. Citation

About EvalNE

This repository provides the source code for EvalNE, an open-source Python library designed for assessing and comparing the performance of Network Embedding (NE) methods on Link Prediction (LP), Sign Prediction (SP), Network Reconstruction (NR) and Node Classification (NC) tasks. The library aims to simplify these complex and time-consuming evaluation processes by automating and abstracting tasks such as hyper-parameter tuning and model validation, node and edge sampling, node-pair embedding computation, results reporting and data visualization.

The library can be used both as a command-line tool and as an API. In its current version, EvalNE can evaluate unweighted directed and undirected simple networks.

A Graphical User Interface based on Plotly Dash has been recently added to EvalNE. The interface allows users to set up and execute EvalNE evaluations in an intuitive and interactive way, monitor system resources and browse previous evaluations. Check out the project here -> EvalNE-gui.

Interested in robustness evaluation? That can also be done using EvalNE! Check out the following project (we will port it into the main library very soon): EvalNE-robustness.

The library is maintained by Alexandru Mara (alexandru.mara(at)ugent.be). The full documentation of EvalNE is hosted by Read the Docs and can be found here.

For Methodologists

A command-line interface, in combination with a configuration file describing the datasets, methods and evaluation setup, allows the user to evaluate any embedding method and compare it to the state of the art, or to replicate the experimental setup of existing papers, without writing additional code. EvalNE does not provide implementations of any NE methods, but it offers the necessary environment to evaluate any off-the-shelf algorithm. Implementations of NE methods can be obtained from libraries such as OpenNE or GEM, as well as directly from the web pages of the authors, e.g. Deepwalk, Node2vec, LINE, PRUNE, Metapath2vec, CNE.

EvalNE does, however, include the following LP heuristics for both directed and undirected networks (computed over in- or out-node neighbourhoods), which can be used as baselines for different downstream tasks:

  • Random Prediction
  • Common Neighbours
  • Jaccard Coefficient
  • Adamic Adar Index
  • Preferential Attachment
  • Resource Allocation Index
  • Cosine Similarity
  • Leicht-Holme-Newman index
  • Topological Overlap
  • Katz similarity
  • All baselines (a combination of the first 5 heuristics in a 5-dim embedding)
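For readers unfamiliar with these heuristics, the following sketch (plain Python, not EvalNE's implementation) shows what three of them compute for a candidate node pair:

```python
# Illustrative sketch (not EvalNE's implementation) of what three of the
# listed LP heuristics compute for a candidate node pair, using adjacency sets.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}

u, v = 0, 3  # candidate (non-)edge to score

cn = len(adj[u] & adj[v])       # Common Neighbours
jc = cn / len(adj[u] | adj[v])  # Jaccard Coefficient
pa = len(adj[u]) * len(adj[v])  # Preferential Attachment

print(cn, jc, pa)  # 2 1.0 4
```

Higher scores indicate node pairs that are more likely to be (or become) connected, which is why these heuristics serve as natural LP baselines.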

For practitioners

When used as an API, EvalNE provides functions to:

  • Load and preprocess graphs
  • Obtain general graph statistics
  • Conveniently read node/edge embeddings from files
  • Sample nodes/edges to form train/test/validation sets
  • Different approaches for edge sampling:
    • Timestamp-based sampling: the most recent edges are used for testing
    • Random sampling: random split of edges into train and test sets
    • Spanning-tree sampling: the train set contains a spanning tree of the graph
    • Fast depth-first-search sampling: similar to spanning-tree sampling but based on DFS
  • Negative sampling or generation of non-edge pairs using:
    • Open world assumption: train non-edges do not overlap with train edges
    • Closed world assumption: train non-edges do not overlap with either train or test edges
  • Evaluate LP, SP and NR for methods that output:
    • Node Embeddings
    • Node-pair Embeddings
    • Similarity scores (e.g. the ones given by LP heuristics)
  • Visualize embeddings and graphs via simple plotting routines
  • Evaluate NC for node embedding methods
  • Compute edge embeddings from node feature vectors using the binary operators:
    • Average
    • Hadamard
    • Weighted L1
    • Weighted L2
  • Use any scikit-learn classifier for LP/SP/NR/NC tasks
  • Run command-line commands or functions with a given timeout
  • Tune hyperparameters via grid search
  • Compute over 10 different evaluation metrics, such as AUC, F-score, etc.
  • Output AUC and PR curves
  • Generate tabular outputs and parse them directly into LaTeX tables
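As a sketch of the four binary operators (assuming the element-wise definitions popularized by the node2vec paper; EvalNE's own implementations may differ in detail):

```python
# Element-wise binary operators for building an edge embedding from two node
# embeddings x and y. Definitions follow the node2vec paper; this is only a
# sketch, not EvalNE's internal code.
import numpy as np

def average(x, y):     return (x + y) / 2.0
def hadamard(x, y):    return x * y
def weighted_l1(x, y): return np.abs(x - y)
def weighted_l2(x, y): return (x - y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 6.0])
print(average(x, y))      # [2. 4.]
print(hadamard(x, y))     # [ 3. 12.]
print(weighted_l1(x, y))  # [2. 4.]
print(weighted_l2(x, y))  # [ 4. 16.]
```

The resulting edge vectors can then be fed to any scikit-learn classifier, as described above.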

(back to top)

Installation

The latest version of the library (v0.4.0) has been tested on Python 3.8.

EvalNE depends on the following packages:

  • Numpy
  • Scipy
  • Scikit-learn
  • Matplotlib
  • NetworkX
  • Pandas
  • tqdm
  • kiwisolver

Before installing EvalNE, make sure that the pip and python-tk packages are installed on your system. On Debian/Ubuntu systems this can be done by running:

sudo apt-get install python3-pip
sudo apt-get install python3-tk

Option 1: Install the library using pip:

pip install evalne

Option 2: Cloning the code and installing:

  • Clone the EvalNE repository:

    git clone https://github.com/Dru-Mara/EvalNE.git
    cd EvalNE
  • Download dependencies and install the library:

    # System-wide install
    sudo python setup.py install
    
    # Alternative install for a single user
    python setup.py install --user

Check the installation by running simple_example.py or functions_example.py as shown below. If you have installed the package using pip, you will need to download the examples folder from the GitHub repository first.

cd examples/
python simple_example.py

NOTE: In order to run the evaluator_example.py script, the OpenNE library, PRUNE and Metapath2Vec are required. The instructions for installing them are available here, here, and here, respectively. The instructions on how to run evaluations using .ini files are provided in the next section.

(back to top)

Usage

As a command line tool

The library takes as input an .ini configuration file. This file allows the user to specify the evaluation settings, from the task to perform to the networks to use, data preprocessing, methods and baselines to evaluate, and types of output to provide.

An example conf.ini file is provided describing the available options for each parameter. This file can be either modified to simulate different evaluation settings or used as a template to generate other .ini files.
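Purely for illustration, a hypothetical fragment of such a file is shown below. The option names are taken from this README (INPATHS, METHODS_OPNE, SCORES, CURVES), but the section names and exact syntax here are guesses; consult the bundled conf.ini for the authoritative format.

```ini
; Hypothetical fragment for illustration only; see the bundled conf.ini
; for the real section names and the full option list.
[NETWORKS]
INPATHS = ./data/network.edgelist

[OPENNE METHODS]
METHODS_OPNE = python -m openne --method node2vec --graph-format edgelist

[REPORT]
SCORES = all
CURVES = roc
```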

Additional configuration (.ini) files are provided replicating the experimental sections of different papers in the NE literature. These can be found in different folders under examples/replicated_setups. One such configuration file is examples/replicated_setups/node2vec/conf_node2vec.ini. This file simulates the link prediction experiments of the paper "node2vec: Scalable Feature Learning for Networks" by A. Grover and J. Leskovec.

Once the configuration is set, the evaluation can be run as indicated in the next subsection.

Running the conf examples

In order to run the evaluations using the provided conf.ini or any other .ini file, the following steps are necessary:

  1. Download/Install the methods you want to test:

  2. Download the datasets used in the examples:

  3. Set the correct dataset paths in the INPATHS option of the corresponding .ini file, and the correct method paths under the METHODS_OPNE and/or METHODS_OTHER options.

  4. Run the evaluation:

    # For conf.ini run:
    python -m evalne ./examples/conf.ini
    
    # For conf_node2vec.ini run:
    python -m evalne ./examples/node2vec/conf_node2vec.ini

Note: The input networks for EvalNE are required to be in edgelist format.
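The edgelist format is simply one whitespace-separated node pair per line. A minimal sketch of writing and reading back such a file:

```python
# Minimal sketch of the expected edgelist input: one "src dst" pair per line.
edges = [(0, 1), (0, 2), (1, 2)]

with open("network.edgelist", "w") as f:
    for u, v in edges:
        f.write(f"{u} {v}\n")

# Reading the file back recovers the original node pairs.
with open("network.edgelist") as f:
    read_back = [tuple(map(int, line.split())) for line in f]

print(read_back)  # [(0, 1), (0, 2), (1, 2)]
```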

As an API

The library can be imported and used like any other Python module. Next, we present a very basic LP example; for more complete ones we refer the user to the examples folder and the docstring documentation of the evaluator and split submodules.

from evalne.evaluation.evaluator import LPEvaluator
from evalne.evaluation.split import LPEvalSplit
from evalne.evaluation.score import Scoresheet
from evalne.utils import preprocess as pp

# Load and preprocess the network
G = pp.load_graph('../evalne/tests/data/network.edgelist')
G, _ = pp.prep_graph(G)

# Create an evaluator and generate train/test edge split
traintest_split = LPEvalSplit()
traintest_split.compute_splits(G)
nee = LPEvaluator(traintest_split)

# Create a Scoresheet to store the results
scoresheet = Scoresheet()

# Set the baselines
methods = ['random_prediction', 'common_neighbours', 'jaccard_coefficient']

# Evaluate baselines
for method in methods:
    result = nee.evaluate_baseline(method=method)
    scoresheet.log_results(result)

try:
    # Check if OpenNE is installed
    import openne

    # Set embedding methods from OpenNE
    methods = ['node2vec', 'deepwalk', 'GraRep']
    commands = [
        'python -m openne --method node2vec --graph-format edgelist --p 1 --q 1',
        'python -m openne --method deepWalk --graph-format edgelist --number-walks 40',
        'python -m openne --method grarep --graph-format edgelist --epochs 10']
    edge_emb = ['average', 'hadamard']

    # Evaluate embedding methods
    for i in range(len(methods)):
        command = commands[i] + " --input {} --output {} --representation-size {}"
        results = nee.evaluate_cmd(method_name=methods[i], method_type='ne', command=command,
                                   edge_embedding_methods=edge_emb, input_delim=' ', output_delim=' ')
        scoresheet.log_results(results)

except ImportError:
    print("The OpenNE library is not installed. Reporting results only for the baselines...")
    pass

# Get output
scoresheet.print_tabular()

Output

The library stores all the output generated in a single folder per execution. The name of this folder is {task}_eval_{month}{day}_{hour}{min}, where {task} is one of: lp, sp, nr or nc.

The library can provide two types of output, depending on the value of the SCORES option of the configuration file. If the keyword all is specified, the library will generate a file named eval_output.txt containing, for each method and network analysed, all the available metrics (auroc, precision, f-score, etc.). If more than one experiment repeat is requested, the values reported will be averages over all repeats.

Setting the SCORES option to %(maximize) will generate a similar output file. Its content, however, will be a table (methods x networks) containing exclusively the score specified in the MAXIMIZE option for each combination of method and network, averaged over all experiment repeats. In addition, a second table indicating the average execution time per method and dataset will be generated.

If the CURVES option is set to a valid value, a PR or ROC curve will be generated for each method, dataset and experiment repeat. If the SAVE_PREP_NW option is set to True, each evaluated network will be stored, in edgelist format, in a folder with the same name as the network.

Finally, the library also generates an eval.log file and an eval.pkl file. The former contains important information regarding the evaluation process, such as methods whose execution has failed or validation scores. The latter encapsulates all the evaluation results as a pickle file, which can be conveniently loaded and transformed into, e.g., pandas dataframes or LaTeX tables.
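The exact structure of eval.pkl is EvalNE-specific and not documented here, so the following is only a generic sketch of the pickle-to-pandas workflow just described, using a stand-in results dictionary:

```python
# Generic sketch of the pickle -> pandas workflow. The real eval.pkl layout
# is EvalNE-specific; "eval_demo.pkl" and the results dict are stand-ins.
import pickle
import pandas as pd

results = {"method": ["node2vec", "deepwalk"], "auroc": [0.88, 0.86]}

with open("eval_demo.pkl", "wb") as f:
    pickle.dump(results, f)

with open("eval_demo.pkl", "rb") as f:
    loaded = pickle.load(f)

df = pd.DataFrame(loaded)  # results as a pandas dataframe
print(df)
```

From the dataframe, exporting to other formats (CSV, LaTeX tables, etc.) is a one-liner with the usual pandas methods.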

Parallelization

EvalNE makes extensive use of numpy for most operations. Numpy, in turn, relies on libraries such as OpenMP, MKL, etc., for parallelization. To allow some control over the maximum number of threads used during execution, we include a simple bash script (set_numpy_threads.sh). The script, located inside the scripts folder, can be given execution permissions and run as follows:

# Give execution permissions:
chmod +x set_numpy_threads.sh

# Run the script:
source set_numpy_threads.sh
# The script will then ask for the maximum number of threads to use.
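A programmatic alternative to the script is to cap the thread pools via environment variables. This is a sketch; the variable that actually takes effect depends on the BLAS backend numpy was built against, and the variables must be set before numpy is first imported:

```python
import os

# Cap BLAS/OpenMP thread pools; this must happen BEFORE numpy is first
# imported, otherwise the limits are ignored.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "4"  # the value must be a string, not an int

import numpy as np  # imported only now, so the limits can take effect

print(np.array([1, 2]).sum())  # numpy works normally under the thread cap
```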

(back to top)

Contributing

Contributions are greatly appreciated. If you want to help us improve EvalNE, please fork the repo and create a new pull request. Don't forget to give the project a star! Thanks!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Alternatively, you can make suggestions or report bugs by opening a new issue with the appropriate tag ("feature" or "bug") and following our Contributing template.

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Citation

If you have found EvalNE useful in your research, please consider giving the repo a star and citing our paper:

    @article{MARA2022evalne,
      title = {EvalNE: A Framework for Network Embedding Evaluation},
      author = {Alexandru Mara and Jefrey Lijffijt and Tijl {De Bie}},
      journal = {SoftwareX},
      volume = {17},
      pages = {100997},
      year = {2022},
      issn = {2352-7110},
      doi = {10.1016/j.softx.2022.100997},
      url = {https://www.sciencedirect.com/science/article/pii/S2352711022000139}
    }

(back to top)

evalne's People

Contributors

dru-mara, sonictl

evalne's Issues

[BUG] precisionatk (evaluation/score.py)

Hi Alex,

I noticed this bug when I wanted to use the Score class separately from any other class in the evalne package, simply because it allows you to easily calculate and plot performance metrics. However, when I wanted to use the precisionatk function in evalne/evaluation/score.py at line 598 I got the following error:

  • TypeError: 'zip' object is not subscriptable

Current solution: I solved it by first encapsulating the zip object in a list call, as in list(zip(*self._sorted))[0].

Best,
Pieter-Paul
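For context, the reported fix works because in Python 3 zip() returns a lazy iterator, which does not support indexing; a minimal reproduction (with a stand-in for self._sorted):

```python
# Stand-in for self._sorted: (score, label) pairs sorted by score.
sorted_pairs = [(0.9, 1), (0.7, 0), (0.4, 1)]

# zip(*sorted_pairs)[0]  # in Python 3 this raises:
#                        # TypeError: 'zip' object is not subscriptable

scores = list(zip(*sorted_pairs))[0]  # list() makes the result indexable
print(scores)  # (0.9, 0.7, 0.4)
```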

Errors when testing TADW from the OpenNE library using simple-example.py

Hello, I'm testing TADW from the OpenNE library with the simple-example.py file and get the errors below. I hope you can find time to help me resolve them.

D:\software\Python3.5\python3.exe "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 5345 --file C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py
pydev debugger: process 1788 is connecting

Connected to pydev debugger (build 183.5912.18)
Running command...
python3 -m openne --method tadw --input C:/Users/liujinxin/Desktop/xiugai/OpenNE-master/dwata/cora/cora_edgelist.txt --graph-format edgelist --output vec_all.txt --q 0.25 --p 0.25 --input ./edgelist.tmp --output ./emb.tmp --representation-size 128
Reading...
Traceback (most recent call last):
File "D:\software\Python3.5\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\software\Python3.5\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 182, in <module>
main(parse_args())
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 137, in main
g.read_node_label(args.label_file)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\graph.py", line 89, in read_node_label
self.G.nodes[vec[0]]['label'] = vec[1:]
File "D:\software\Python3.5\lib\site-packages\networkx\classes\reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: '703'
I/O error(2): No such file or directory while evaluating method tadw
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1741, in <module>
main()
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals)  # execute the script
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py", line 43, in <module>
edge_embedding_methods=edge_emb, input_delim=' ', output_delim=' ')
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 695, in evaluate_cmd
input_delim, output_delim, write_weights, write_dir, verbose)
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 744, in _evaluate_ne_cmd
num_vectors = sum(1 for _ in open(tmpemb))
FileNotFoundError: [Errno 2] No such file or directory: './emb.tmp'
Backend TkAgg is interactive backend. Turning interactive mode on.
Failed to enable GUI event loop integration for 'tk'
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\matplotlibtools.py", line 31, in do_enable_gui
enable_gui(guiname)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 536, in enable_gui
return gui_hook(app)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 285, in enable_tk
app = TK.Tk()
File "D:\software\Python3.5\lib\tkinter\__init__.py", line 1877, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: Can't find a usable init.tcl in the following directories:
D:/software/Python3.5/lib/tcl8.6 D:/software/lib/tcl8.6 D:/lib/tcl8.6 D:/software/library D:/library D:/tcl8.6.4/library D:/tcl8.6.4/library

This probably means that Tcl wasn't installed properly.

[BUG] 1. TypeError: 'Results' object is not iterable. 2. TypeError: a bytes-like object is required, not 'str'

Describe the bug
I have installed EvalNE, OpenNE library, PRUNE and Metapath2Vec following the instructions.
When I run evaluator_example.py, I encounter several errors and warnings.

  1. TypeError: 'Results' object is not iterable.
  2. TypeError: a bytes-like object is required, not 'str'
  3. ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
  4. WARNING:root:Output of method metapath2vec++ contains 2 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 705.

To Reproduce
Steps to reproduce the error:

  1. OS used: Ubuntu 18.04.1 LTS
  2. EvalNE Version: 0.3.1
  3. Snippet of code executed (for API) or conf file run (for CLI)
cd examples/
python3 evaluator_example.py
  1. Full error output
  • Error 1
    Traceback (most recent call last):
    File "evaluator_example.py", line 185, in <module>
    main()
    File "evaluator_example.py", line 67, in main
    eval_other(nee, scoresheet)
    File "evaluator_example.py", line 153, in eval_other
    for res in results:
    TypeError: 'Results' object is not iterable

  • Error 2
    Traceback (most recent call last):
    File "/home/huangxk/workspace_python/embedding/EvalNE/examples/evaluator_example.py", line 185, in <module>
    main()
    File "/home/huangxk/workspace_python/embedding/EvalNE/examples/evaluator_example.py", line 70, in main
    scoresheet.write_tabular(filename=os.path.join(outpath, 'eval_output.txt'), metric='auroc')
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/evalne/evaluation/score.py", line 204, in write_tabular
    df.to_csv(f, sep='\t', na_rep='NA')
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/core/generic.py", line 3228, in to_csv
    formatter.save()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 202, in save
    self._save()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 310, in _save
    self._save_header()
    File "/home/huangxk/workspace_python/embedding/EvalNE/venv_for_evlne/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 278, in _save_header
    writer.writerow(encoded_labels)
    TypeError: a bytes-like object is required, not 'str'

  • Error 3
    Preprocessing graph...
    Repetition 0 of experiment
    Evaluating baselines...
    Evaluating Embedding methods...
    ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
    Running command...

  • Warning 4
    WARNING:root:Output of method metapath2vec++ contains 2 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 705.
    WARNING:root:Output provided by method metapath2vec++ contains 129 columns, 128 expected! Taking first column as nodeID...
    WARNING:root:Output of method node2vec contains 1 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 704.
    WARNING:root:Output provided by method node2vec contains 129 columns, 128 expected! Taking first column as nodeID...
    WARNING:root:Output of method deepwalk contains 1 more lines than expected. Will consider them part of the header and ignore them... Expected num_lines 703, obtained lines 704.
    WARNING:root:Output provided by method deepwalk contains 129 columns, 128 expected! Taking first column as nodeID...

My solutions
I have tried to solve Error 1 and Error 2, and my fixes work (but I am not sure whether they are the right solutions). Although items 3 and 4 are only an error message and warnings, I would like to know their cause and whether I should ignore them or not.

  • My solution to TypeError: 'Results' object is not iterable.
    In file evaluator_example.py, line 149-150, line 173-174 :

    for res in results:
        scoresheet.log_results(res)
    

    change them to

    scoresheet.log_results(results)
    
  • My solution to TypeError: a bytes-like object is required, not 'str'
    It seems that this error is caused by text/binary mode. This question in stackoverflow may be helpful. So I tried to change the source code of evalne: in file evalne/evaluation/score.py, line 201-202

    f = open(filename, 'a+b')
    f.write(header.encode())
    

    change them to

    f = open(filename, 'a')
    f.write(header)
    
  • ERROR:root:No test edges in trainvalid_split. Recomputing correct split...
    Why this error message? Should I ignore it?

  • Warning 4
    It seems that they are related to OpenNE?

Result
Even with Error 3 and Warning 4, I still get result file in example/output/eval_output.txt

Evaluation results (auroc):
-----------------------
	network
random_prediction	0.4942
common_neighbours	0.8458
jaccard_coefficient	0.7255
adamic_adar_index	0.8551
preferential_attachment	0.9376
resource_allocation_index	0.853
PRUNE	0.8299
metapath2vec++	0.8218
node2vec	0.8796
deepwalk	0.8603
line	0.8997

Is this correct?

Desktop (please complete the following information):

  • OS: Ubuntu 18.04.1 LTS
  • EvalNE Version : 0.3.1
  • Python: 3.6.7

Thanks for sharing this great library. I am learning to use it.
Best,
Xikun

NO directory found

IOError: [Errno 2] No such file or directory: './emb.tmp' while running python evalne ./examples/conf_parTest.ini

RuntimeError when running simple-example.py

(EvalNE) C:\Users\13688\Downloads\EvalNE-master\examples\api_examples>python simple-example.py
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "D:\Software\MiniConda\envs\EvalNE\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\13688\Downloads\EvalNE-master\examples\api_examples\simple-example.py", line 33, in <module>
result = nee.evaluate_baseline(method=method)
File "D:\Software\MiniConda\envs\EvalNE\lib\site-packages\evalne\evaluation\evaluator.py", line 301, in evaluate_baseline
train_pred, test_pred = util.run_function(timeout, _eval_sim,
File "D:\Software\MiniConda\envs\EvalNE\lib\site-packages\evalne\utils\util.py", line 189, in run_function
p.start()
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "D:\Software\MiniConda\envs\EvalNE\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Confused about the train/test split for link prediction

Is this correct?

Every positive training edge exists in the training graph, but none of the positive testing edges exist in the training graph.

If so, I'm not sure I understand the reasoning. Isn't the goal to predict missing edges? If the training edges exist in the training graph then they're not missing.

In particular, methods which return very high scores for edges which already exist will bias the training. For example, I'm using quasi-local methods like superposed random walks (just running a few random walks of length 3 and then adding the results). If an edge already exists between two nodes then this superposed score will tend to be very very high. As a result, my test scores have very high precision but very low recall. This makes sense to me if the classifier is learning "very very high SRW score => edge".

Am I misinterpreting something here?

Network reconstruction question

Hi Alexandru,

I don't fully understand how network reconstruction is done.
For example, what is the difference between LPEvaluator and NREvaluator when train_frac is the same number, for example 0.7 ?

Best regards.

Installing EvalNE v0.3.2

I am trying to install EvalNE v0.3.2 on python 3.6.12 on macOS and execute simple_example.py.

While executing
$conda install --file requirements.txt

the output is

error: numpy 1.15.1 is installed but numpy>=1.15.4 is required by {'pandas'}

Basically, it can be reproduced by following the installation guide:

$git clone https://github.com/Dru-Mara/EvalNE.git
$cd EvalNE
$pip3 install -r requirements.txt
$sudo python3 setup.py install

or

$git clone https://github.com/Dru-Mara/EvalNE.git
$cd EvalNE
$conda install --file requirements.txt
$sudo python3 setup.py install

By ignoring the error and later running

$cd examples/
$python3 simple_example.py

we obtain

evalne/utils/preprocess.py", line 523, in prep_graph
    G.remove_edges_from(G.selfloop_edges())
AttributeError: 'Graph' object has no attribute 'selfloop_edges'

I think the issue is that we need to specify the versions in the requirements more clearly, i.e. the versions of scipy, pandas, tqdm and matplotlib.

Train/test procedure for link prediction on NE

Dear authors,

I try to understand a procedure around train/test splitting for evaluating network embeddings on link prediction you implemented in your package. I posted a similar question yesterday on Data Science Stack Exchange.

If I understand your paper correctly, the node embedding is performed only on positive examples (edges) on the training set; false examples are used only to train a classifier? Is this correct?

Thanks.

Best, Andrej

[FEATURE] limiting thread usage

Hi, while running a benchmark experiment I found that at a certain point all available threads were in use, which killed a colleague's process. The culprit turned out to be the construction of the edge embeddings: it is written efficiently with Numpy, but by default Numpy uses all available threads.

I tried to alter this behavior via the joblib backend, but that did not work.

The solution I found was to include the following two lines before importing numpy, in the script where I call the evalne evaluator:
import os
os.environ["OPENBLAS_NUM_THREADS"] = "24"

Note that the number of threads needs to be given as a str, not an int.

Hope this helps others with the same problem.
