
grape's People

Contributors

caufieldjh, cthoyt, elenacasiraghi, gvalentini58, lucacappelletti94, mughetto, pnrobinson, zommiommy

grape's Issues

Grape edge prediction # edges reported

Hi!

I am using edge prediction evaluation and LogisticRegressionCVEdgePrediction for my graph, which has the following:

print(f"Number of edges in graph: {len(graph.get_edge_node_ids(directed=False))}")
Number of edges in graph: 4541
print(f"Number of nodes in graph: {len(graph.get_node_ids())}")
Number of nodes in graph: 1875.

However, when I run edge prediction evaluation:

results = edge_prediction_evaluation(
    holdouts_kwargs=dict(train_size=0.8),
    graphs=graph,
    models=LogisticRegressionCVEdgePrediction(max_iter=500),
    number_of_holdouts=1,
    node_features=model,
)

where model is defined as:

model = Node2VecSkipGramEnsmallen(
    embedding_size=EMBEDDING_SIZE,
    walk_length=WALK_LENGTH,
    return_weight=RETURN_WEIGHT,
    explore_weight=EXPLORE_WEIGHT,
    iterations=NUM_WALKS,
).fit_transform(graph)

I find that the results report:
nodes_number: 1875
edges_number: 9082

I was wondering why the evaluation reports double the number of edges actually present in the graph. Thanks!
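One plausible reading of these numbers, offered as an assumption rather than a confirmed explanation: 9082 is exactly 2 × 4541, which is what you would see if an undirected graph is stored as two directed edges per undirected edge and the report counts the directed ones. The toy sketch below uses plain Python (no grape calls) and a made-up edge list to illustrate that counting difference.

```python
# Toy undirected edge list (hypothetical data, not the graph from this issue).
undirected_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Assumed internal representation: each undirected edge is stored in both directions.
directed_edges = set()
for src, dst in undirected_edges:
    directed_edges.add((src, dst))
    directed_edges.add((dst, src))

number_of_undirected_edges = len(undirected_edges)
number_of_directed_edges = len(directed_edges)

print(number_of_undirected_edges)  # 4
print(number_of_directed_edges)    # 8
```

Under that assumption, the 4541 returned with directed=False and the 9082 in the report would describe the same graph.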

Question about custom PyG and PyKeen models

First of all, congratulations on a great library. I just happen to have some questions surrounding the use of custom models alongside your preprocessing module.
I have a use case that might benefit from the flexibility of implementing custom models in PyG or PyKeen, or from using already implemented models that grape might not support yet. I've seen that embiggen has some wrappers implemented, and I wonder if you could direct me to an example of how to use those PyG and PyKeen wrappers to implement custom models.
I also think this would be a really cool tutorial to have in your documentation, and I would be happy to contribute support for more PyKeen models if desired :)
Thanks

load graph from a Pandas Dataframe

Big data engineering processes using Apache Spark produce triple sets. To avoid tedious I/O serialisation and coalescing to and from CSV files, PySpark provides the toPandas() method. This method collects the partitioned, distributed dataset into the local memory of the driver node and makes it accessible as a Pandas data frame.
Thus, a graph constructor that works directly from already produced data frames would be really convenient.

pedges = edges.toPandas()
pnodes = nodes.toPandas() 

g = (Graph.from_pd(directed=True, 
                    node_path=pnodes,
                    nodes_column_number=0,
                    node_list_node_types_column_number=1,
                    edge_path=pedges,
                    sources_column_number=0,
                    destinations_column_number=2,
                    edge_list_edge_types_column_number=4,
                    weights_column_number=11)
       .remove_components(top_k_components=1)
    )

embiggen package error under Windoze

The joy of installation on Windoze...

Collecting embiggen>=0.11.9
  Downloading embiggen-0.11.38.tar.gz (154 kB)
     ---------------------------------------- 154.2/154.2 kB ? eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 54, in <module>
          long_description=readme(),
        File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 12, in readme
          return f.read()
        File "C:\Users\richa\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 2: character maps to <undefined>
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
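The traceback shows `readme()` in `setup.py` decoding the README with the Windows default codec (cp1252). A common remedy, sketched here under the assumption that the README is UTF-8 encoded, is to pass the encoding explicitly. The helper and file below are illustrative and are not the actual embiggen `setup.py`; the demo character is simply one whose UTF-8 encoding contains the byte 0x8d at position 2.

```python
import os
import tempfile


def readme(path):
    """Read a README as UTF-8 regardless of the platform's default codec."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Throwaway file starting with U+1F347, encoded in UTF-8 as F0 9F 8D 87.
# cp1252 has no mapping for 0x8d, so decoding with it would fail.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "README.rst")
    with open(demo_path, "wb") as f:
        f.write("\U0001F347 GRAPE".encode("utf-8"))
    text = readme(demo_path)

print(len(text))
```

With the explicit encoding, the read no longer depends on the locale of the machine running pip.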

update grape.datasets.monarchinitiative to point to latest Monarch nodes/edges

Currently this is not working:

from grape.datasets.monarchinitiative import Monarch
g = Monarch()
Downloading files: 0%
0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/ensmallen/datasets/graph_retrieval.py](https://localhost:8080/#) in download(self)
    349                 # Download the necessary data
--> 350                 self._downloader.download(
    351                     self._graph["urls"],

9 frames
ValueError: Request to url https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_nodes.tsv finished with status code 404.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/ensmallen/datasets/graph_retrieval.py](https://localhost:8080/#) in download(self)
    353                 )
    354             except Exception as e:
--> 355                 raise RuntimeError(
    356                     f"Something went wrong while downloading the graph {self._name}, "
    357                     f"version {self._version}, "

RuntimeError: Something went wrong while downloading the graph Monarch, version 202103, retrieved from the monarchinitiative repository. In this step, we are trying to download data provided from third parties, and such data may now be offline or moved. Please do investigate what has happened at the URLs reported below in this error message and do open up an issue in the Ensmallen's GitHub repository reporting also the completeexception of this error to help us keep the automatic graph retrieval in good shape. Thank you!Specifically, we were trying to download the following urls: ['https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_nodes.tsv', 'https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_edges.tsv']

New builds can be retrieved from here:
https://kg-hub.berkeleybop.io/kg-monarch/index.html

Hopefully this follows the pattern that Grape expects? @LucaCappelletti94 @putmantime

Problem running on Apple M1

I was pairing with Luca and we came across this problem when running on a Mac M1:

Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Connected to pydev debugger (build 231.9011.38)
Traceback (most recent call last):
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "/Users/benchamberlain/workspace/subgraph-sketching/src/data.py", line 22, in <module>
from src.datasets.elph import get_hashed_train_val_test_datasets, make_train_eval_data
File "/Users/benchamberlain/workspace/subgraph-sketching/src/datasets/elph.py", line 17, in <module>
from embiggen.embedders.ensmallen_embedders.hyper_sketching import HyperSketching
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/__init__.py", line 2, in <module>
from embiggen.visualizations import GraphVisualizer
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/visualizations/__init__.py", line 2, in <module>
from embiggen.visualizations.graph_visualizer import GraphVisualizer
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py", line 16, in <module>
from ensmallen import Graph # pylint: disable=no-name-in-module
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/ensmallen/__init__.py", line 14, in <module>
unavailable_flags = set(HASWELL_FLAGS) - set(cpuinfo.get_cpu_info()["flags"])
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/cpuinfo/cpuinfo.py", line 2762, in get_cpu_info
output = json.loads(output, object_hook = _utf_to_str)
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/__init__.py", line 359, in loads
return cls(**kw).decode(s)
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
python-BaseException

NameError: name 'HASWELL_FLAGS' is not defined

Another hardware-related oddity, courtesy of our aging build server.
When I import grape, it raises NameError: name 'HASWELL_FLAGS' is not defined - this is on our build server, where we previously had issues with missing AVX flags.
In this case, I'm also in a Docker container and in a virtualenv, but it happens outside that env, too.

Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import grape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/grape/__init__.py", line 9, in <module>
    from embiggen import *
  File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/__init__.py", line 2, in <module>
    from embiggen.visualizations import GraphVisualizer
  File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/visualizations/__init__.py", line 2, in <module>
    from embiggen.visualizations.graph_visualizer import GraphVisualizer
  File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py", line 17, in <module>
    from ensmallen import Graph  # pylint: disable=no-name-in-module
  File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/ensmallen/__init__.py", line 27, in <module>
    ).format(HASWELL_FLAGS, unavailable_flags)
NameError: name 'HASWELL_FLAGS' is not defined

Looks like this could fail more gracefully, at least.
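As a rough illustration of what "failing more gracefully" could look like, here is a minimal sketch of a CPU-flag check that degrades to a warning when detection itself breaks (as in the JSONDecodeError raised inside cpuinfo in the Apple M1 report above). The function name and the flag list are hypothetical; the real list lives in ensmallen and is not reproduced here.

```python
import warnings

# Illustrative flag names only; not the actual HASWELL_FLAGS from ensmallen.
REQUIRED_FLAGS = ("avx", "avx2", "bmi1", "bmi2")


def find_unavailable_flags(get_cpu_info, required_flags=REQUIRED_FLAGS):
    """Return the required flags the CPU lacks, or None if detection failed."""
    try:
        available = set(get_cpu_info().get("flags", []))
    except Exception as error:  # e.g. a JSONDecodeError raised inside cpuinfo
        warnings.warn(f"Could not detect CPU flags: {error}")
        return None
    return set(required_flags) - available


def broken_probe():
    """Stand-in for a CPU probe that fails on unsupported hardware."""
    raise ValueError("Expecting value: line 1 column 1 (char 0)")


print(find_unavailable_flags(lambda: {"flags": ["avx", "sse4_2"]}))
print(find_unavailable_flags(broken_probe))
```

In real use, `get_cpu_info` would be `cpuinfo.get_cpu_info`, the call visible in the traceback; passing it in as an argument keeps the sketch testable without that dependency.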

Error for visualizing embeddings

Embeddings were successfully generated with Node2VecSkipGramEnsmallen from GRAPE.
However, when trying to visualize the graph's dimensionality reduction, only the node degree plot showed up in the top left; the other plots were blank, and this error message appeared:

0 points=points,
5511 nrows=nrows,
5512 ncols=ncols,
5513 plotting_callbacks=plotting_callbacks,
5514 show_letters=show_letters,
5515 )

File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:5155, in GraphVisualizer._fit_and_plot_all(self, points, nrows, ncols, plotting_callbacks, show_letters)
5150 evaluation_letters = []
5152 for ax, plot_callback, letter in zip(
5153 flat_axes, itertools.chain(plotting_callbacks), "abcdefghjkilmnopqrstuvwxyz"
5154 ):
-> 5155 figure, axes, caption = plot_callback(
5156 figure=figure,
5157 axes=ax,
5158 **(
5159 dict(loc="lower center")
5160 if "loc" in inspect.signature(plot_callback).parameters
5161 else dict()
5162 ),
5163 apply_tight_layout=False,
5164 )
5166 if "heatmap" in caption.lower():
5167 heatmaps_letters.append(letter)

File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:3720, in GraphVisualizer.plot_node_degrees(self, figure, axes, scatter_kwargs, train_indices, test_indices, train_marker, test_marker, use_log_scale, show_title, show_legend, return_caption, loc, annotate_nodes, show_edges, edge_scatter_kwargs, **kwargs)
3649 def plot_node_degrees(
3650 self,
3651 figure: Optional[Figure] = None,
(...)
3666 **kwargs: Dict,
3667 ):
3668 """Plot node degrees heatmap.
3669
3670 Parameters
(...)
3718 Figure and Axis of the plot.
3719 """
-> 3720 return self._plot_node_metric(
3721 metric=np.fromiter(
3722 (
3723 self._support.get_node_degree_from_node_id(node_id)
3724 for node_id in self.iterate_subsampled_node_ids()
3725 ),
3726 dtype=np.uint32,
3727 ),
3728 metric_name="Node degrees",
3729 figure=figure,
3730 axes=axes,
3731 scatter_kwargs=scatter_kwargs,
3732 train_indices=train_indices,
3733 test_indices=test_indices,
3734 train_marker=train_marker,
3735 test_marker=test_marker,
3736 use_log_scale=use_log_scale,
3737 show_title=show_title,
3738 show_legend=show_legend,
3739 return_caption=return_caption,
3740 loc=loc,
3741 annotate_nodes=annotate_nodes,
3742 show_edges=show_edges,
3743 edge_scatter_kwargs=edge_scatter_kwargs,
3744 **kwargs,
3745 )

File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:3630, in GraphVisualizer._plot_node_metric(self, metric, metric_name, figure, axes, scatter_kwargs, train_indices, test_indices, train_marker, test_marker, use_log_scale, show_title, show_legend, return_caption, loc, annotate_nodes, show_edges, edge_scatter_kwargs, **kwargs)
3628 color_bar = figure.colorbar(scatter[0], ax=axes)
3629 color_bar.set_alpha(1)
-> 3630 color_bar.draw_all()
3632 if annotate_nodes:
3633 figure, axes = self.annotate_nodes(
3634 figure=figure,
3635 axes=axes,
3636 points=self._node_decomposition,
3637 )

AttributeError: 'Colorbar' object has no attribute 'draw_all'
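For context, `Colorbar.draw_all` was deprecated in Matplotlib 3.6 and removed in 3.8, so this error is likely tied to a newer Matplotlib release. The helper below is a hypothetical, version-tolerant sketch (not embiggen's code) that calls `draw_all` when it exists and otherwise falls back to `Figure.draw_without_rendering()`, the replacement suggested in Matplotlib's deprecation note.

```python
def refresh_colorbar(color_bar, figure):
    """Redraw a colorbar on both older and newer Matplotlib versions."""
    if hasattr(color_bar, "draw_all"):
        # Older Matplotlib: the method used by the code in the traceback.
        color_bar.draw_all()
    else:
        # Newer Matplotlib: draw_all is gone, so update the figure instead.
        figure.draw_without_rendering()
```

Until the library is updated, pinning Matplotlib to a release below 3.8 is another possible workaround.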

How to create a graph in grape when both nodes and edges have multiple features?

Hello, in my graph data each node has many features, all with float values. Similarly, each edge has many features, also with float values. Could you tell me how I should create such a graph in grape through Graph.from_pd? I read the related tutorials, but they do not cover the case where nodes or edges have multiple features. I also did not find any relevant description through help(Graph.from_pd).

Best wishes!

get_available_models_for_node_embedding() returns NotImplementedError

I am trying to follow this tutorial to try out some embedding methods. However, running the following raises a NotImplementedError:

>>> from grape import get_available_models_for_node_embedding
>>> 
>>> all_embedding_methods = get_available_models_for_node_embedding()
Traceback (most recent call last):
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 730, in get_model_metadata
    "requires_edge_type_features": model_class.requires_edge_type_features(),
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 380, in requires_edge_type_features
    raise NotImplementedError(
NotImplementedError: The `requires_edge_type_features` method must be implemented in the child classes of abstract model. It was not implemented in the class StubClass.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 763, in get_available_models_for_node_embedding
    df = get_models_dataframe()
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 752, in get_models_dataframe
    [
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 753, in <listcomp>
    get_model_metadata(model_class)
  File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 742, in get_model_metadata
    raise NotImplementedError(
NotImplementedError: Some of the mandatory static methods were not implemented in model class StubClass. The previous exception was: The `requires_edge_type_features` method must be implemented in the child classes of abstract model. It was not implemented in the class StubClass.

Versions:

>>> grape.print_version()
{'GRAPE Version': '0.2.2', 'Python version': '3.10.6', 'Platform': 'Linux-5.4.0-150-generic-x86_64-with-glibc2.31', 'Threads number': 48, 'PyTorch version': '1.13.0', 'PyKEEN version': '1.9.0'}
embiggen==0.11.71
ensmallen==0.8.65

Is there a workaround for this? Thank you for a great package!

Edge prediction models not working in edge prediction evaluation pipeline

When using the edge prediction evaluation pipeline, I tried running a few different edge prediction models. For some of them (e.g., KipfGCNEdgePrediction, but not the MLP, Decision Tree, or Random Forest), I got this error:

Traceback (most recent call last):
File [script I ran]
<module> KipfGCNEdgePrediction(),
File "/home/dylansteinecke/anaconda3/lib/python3.9/site-packages/embiggen/utils/abstract_models/model_stub.py", line 101, in __init__
super().__init__(**parent_class.smoke_test_parameters())
File "/home/dylansteinecke/anaconda3/lib/python3.9/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 124, in smoke_test_parameters
raise NotImplementedError((
NotImplementedError: The smoke_test_parameters method must be implemented in the child classes of abstract model.

Is there a way I can address it or are these models not currently permitted in the edge prediction evaluation pipeline? Thank you.

ImportError: libgfortran-ed201abd.so.3.0.0: cannot open shared object file: No such file or directory

In a fresh notebook, attempting to import grape yields an ImportError about a missing libgfortran-ed201abd.so.3.0.0.

>>> !pip install grape -U
>>> import grape
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/home/harry/kg-bioportal/data/merged/KG-Bioportal analysis.ipynb Cell 2' in <cell line: 1>()
----> [1](vscode-notebook-cell://wsl%2Bubuntu-20.04/home/harry/kg-bioportal/data/merged/KG-Bioportal%20analysis.ipynb#ch0000001vscode-remote?line=0) import grape

File ~/.local/lib/python3.8/site-packages/grape/__init__.py:9, in <module>
      1 """GraPE main module.
      2 
      3 For now, this is a simple wrapper of GraPE main two sub-modules that for
   (...)
      6 These packages are mimed here by the two sub-directories, ensmallen and embiggen.
      7 """
----> 9 from embiggen import *
     10 from ensmallen import Graph
     13 def import_all(module_locals):

File ~/.local/lib/python3.8/site-packages/embiggen/__init__.py:2, in <module>
      1 """Module with models for graph machine learning and visualization."""
----> 2 from embiggen.visualizations import GraphVisualizer
      3 from embiggen.utils import (
      4     EmbeddingResult,
      5     get_models_dataframe,
   (...)
      9     get_available_models_for_node_embedding,
     10 )
...
    691     'spherical_kn',
    692 ]
    694 from scipy._lib._testutils import PytestTester

ImportError: libgfortran-ed201abd.so.3.0.0: cannot open shared object file: No such file or directory

I've seen that this may be related to libraries packaged with numpy, as seen in the following:
ContinuumIO/anaconda-issues#445
numpy/numpy#14348

This may be environment-specific, of course.

typo: GraphVisualizer

The image produced by GraphVisualizer has this

Degrees distribution of graph <graphname>

In English, one always uses the singular in phrases like this. Also, 'graph' seems superfluous.
Suggested label:

Degree distribution of <graphname>

GRAPE on Heterogeneous graphs

Hi!

I am a beginner in GNNs and saw your repo. It seems that it could work for my problem, but I just need to be sure.
My goal is to try to predict the chemical composition of organisms across the tree of life. I have a CSV file that is similar to this example:

molecules  species       papers  mol_pathway  mol_sub_pathway  species_domain  species_family
H20        Homo sapiens  14      Terpenoids   Monoterpenoids   Eukaryotes      Hominidae

So each row holds a unique molecule-species pair (I'm thinking that would be the edge between two nodes of different types, hence the heterogeneous graph), the number of papers that have actually found the molecule in that species (the edge weight?), and then some information about the molecule and the species.

In this database there are two things we know: how species are related (classic phylogenetic tree) and how molecules are related (the group-subgroup structure seen above).

One fair assumption is that closely related species may share a similar set of molecules, and that molecules related in their synthesis may share a similar distribution across species. What I would like to have as a result is an s (species) by m (molecules) matrix of probabilities that tells me whether the edge between that molecule and that species could exist.

My questions are:

  • Does GRAPE accept heterogeneous graphs?
  • If yes, I didn't quite understand how to build a graph in GRAPE. The documentation shows how to download a graph from the different databases, but how to load my own dataset wasn't very clear to me.
  • Can the nodes have certain features that would then be used for the embedding (not sure about the proper terminology)? For example, species would have the phylogenetic tree as features.
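To make the data layout in this question concrete, here is a minimal sketch that turns rows like the example into a typed node table and a weighted, typed edge table using only the standard library. The node type names, the edge type names ("found_in", "member_of") and the choice to weight hierarchy edges with 1.0 are all assumptions made for illustration; nothing here is a grape API.

```python
import csv
import io

# One made-up row in the column layout described above (tab-separated here).
raw = (
    "molecules\tspecies\tpapers\tmol_pathway\tmol_sub_pathway\t"
    "species_domain\tspecies_family\n"
    "H20\tHomo sapiens\t14\tTerpenoids\tMonoterpenoids\tEukaryotes\tHominidae\n"
)

nodes = {}     # node name -> node type
edges = set()  # (source, destination, edge type, weight)

for row in csv.DictReader(io.StringIO(raw), delimiter="\t"):
    # Register every entity in the row as a typed node.
    for column, node_type in [
        ("molecules", "molecule"),
        ("species", "species"),
        ("mol_sub_pathway", "mol_sub_pathway"),
        ("mol_pathway", "mol_pathway"),
        ("species_family", "species_family"),
        ("species_domain", "species_domain"),
    ]:
        nodes[row[column]] = node_type

    # Molecule-species edge, weighted by the number of supporting papers.
    edges.add((row["molecules"], row["species"], "found_in", float(row["papers"])))

    # Hierarchy edges encode how molecules, and how species, relate to each other.
    for child, parent in [
        ("molecules", "mol_sub_pathway"),
        ("mol_sub_pathway", "mol_pathway"),
        ("species", "species_family"),
        ("species_family", "species_domain"),
    ]:
        edges.add((row[child], row[parent], "member_of", 1.0))

print(len(nodes), len(edges))
```

Written out as two TSV files, tables of this shape match the node-list and edge-list inputs that the Graph.from_csv call in the Resnik issue on this page takes (node_path, edge_path, nodes_column, node_list_node_types_column, sources_column, destinations_column, edge_list_edge_types_column).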

Sorry if those are very rookie questions, and thanks in advance for the reply! :)

Embedding model names not recognized; alternate suggestions are unexpected

As of grape 0.1.9, node embedding model names have changed, such that a call to embiggen's AbstractModel.get_task_data(model_name, task_name) with one of the frequently used model names like CBOW or SkipGram throws a ValueError.

I see from grape.get_available_models_for_node_embedding() that these now have more specific names like Node2Vec CBOW.
No problem with being specific, but we'd still like to be able to specify CBOW, SkipGram, or GloVe in config definitions without first having to verify the exact model names embiggen expects. Could the short names be used as aliases for a default model, so that CBOW is understood as Node2Vec CBOW, etc.?

The naming convention also appears to confuse the alternative suggestions provided in the ValueError text, so we get suggestions like this:

ValueError: The provided model name `CBOW` is not available. Did you mean BoxE?
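The alias idea requested here could look roughly like the sketch below. It is a hypothetical helper, not embiggen's implementation: only "Node2Vec CBOW" and "BoxE" are names taken from this issue, while the other full names are assumed to follow the same pattern.

```python
import difflib

# "Node2Vec SkipGram" and "Node2Vec GloVe" are assumed names for illustration.
AVAILABLE_MODELS = ["Node2Vec CBOW", "Node2Vec SkipGram", "Node2Vec GloVe", "BoxE"]
ALIASES = {
    "cbow": "Node2Vec CBOW",
    "skipgram": "Node2Vec SkipGram",
    "glove": "Node2Vec GloVe",
}


def resolve_model_name(name):
    """Map a short alias to a full model name, or raise with close matches."""
    if name in AVAILABLE_MODELS:
        return name
    alias = ALIASES.get(name.lower())
    if alias is not None:
        return alias
    suggestions = difflib.get_close_matches(name, AVAILABLE_MODELS, n=3, cutoff=0.4)
    raise ValueError(
        f"The provided model name `{name}` is not available. "
        f"Close matches: {suggestions}"
    )


print(resolve_model_name("CBOW"))  # Node2Vec CBOW
```

Resolving aliases before computing suggestions means a short name like CBOW never reaches the fuzzy-matching step that produced the BoxE suggestion.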

ValueError when trying to use external embedder like in pykeen and karateClub

Hello,
Thank you for your amazing work. I'm a PhD student working on embeddings of biomedical data, particularly in immunogenetics, and I'm currently comparing tools to embed data. I found your work very interesting. I ran into some issues when I tried to use external models from PyKEEN and Karate Club. I got this message:
ValueError: We have found an useless method in the class StubClass, implementing method HolE from library PyKEEN and task Node Embedding. It does not make sense to implement the `requires_positive_edge_weights` method when the `can_use_edge_weights` always returns False, as it is already handled in the root abstract model class.

Also, for the visualization, when I ran:

from grape import GraphVisualizer
visualizer = GraphVisualizer(kg.remove_disconnected_nodes())
visualizer.fit_and_plot_all(embedding)

I got this warning and no visualization: FutureWarning: The parameter `square_distances` has not effect and will be removed in version 1.3.
Thank you in advance for your answer.
Gaoussou

pip install grape failure on support_luca>=1.0.2

I am attempting to install grape using pip on Ubuntu 20.04.4 LTS with python 3.8.3.

Most of the build/install appears to work just fine until I hit the error below, shown with a little additional context. I have also tried to install ensmallen directly with pip install ensmallen, and I get the same error. Any advice you have would be appreciated.

Requirement already satisfied: idna<3,>=2.5 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (3.0.4)
Collecting typing-extensions>=3.7.4.3
  Using cached typing_extensions-4.3.0-py3-none-any.whl (25 kB)
ERROR: Could not find a version that satisfies the requirement support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape) (from versions: none)
ERROR: No matching distribution found for support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape)

Errors from `compute_pairwise_resnik`

Hello!

We're trying to put together some unit tests for computing pairwise Resnik similarity, but are encountering an error with a description that doesn't seem correct.

With the input here and a test like this:

    def setUp(self) -> None:
        """Set up."""
        self.test_graph_path_nodes = "tests/resources/test_hpo_nodes.tsv"
        self.test_graph_path_edges = "tests/resources/test_hpo_edges.tsv"
        self.resnik_outpath = "tests/output/resnik_out"
        self.test_graph = Graph.from_csv(
            directed=True,
            node_path=self.test_graph_path_nodes,
            edge_path=self.test_graph_path_edges,
            nodes_column="id",
            node_list_node_types_column="category",
            sources_column="subject",
            destinations_column="object",
            edge_list_edge_types_column="predicate",
        )
        self.test_counts = {
            "HP:0000118": 23,
            "HP:0000001": 24,
            "HP:0001507": 1,
            "HP:0001574": 1,
            "HP:0001871": 1,
            "HP:0033127": 1,
            "HP:0025354": 1,
            "HP:0001608": 1,
            "HP:0001197": 1,
            "HP:0000119": 1,
            "HP:0001939": 1,
            "HP:0000707": 1,
            "HP:0025031": 1,
            "HP:0001626": 1,
            "HP:0000818": 1,
            "HP:0025142": 1,
            "HP:0002086": 1,
            "HP:0002715": 1,
            "HP:0000478": 1,
            "HP:0040064": 1,
            "HP:0002664": 1,
            "HP:0000598": 1,
            "HP:0000769": 1,
            "HP:0045027": 1,
            "HP:0000152": 1,
        }

...

    def test_compute_pairwise_resnik(self) -> None:
        """Test pairwise Resnik computation."""

        compute_pairwise_resnik(
            dag=self.test_graph,
            counts=self.test_counts,
            path=self.resnik_outpath,
        )
        self.assertTrue(os.path.exists(self.resnik_outpath))

we consistently get an error like this:

ValueError: The provided two nodes 3 and 1 do not have a shared parent node. Perhaps, the provided DAG has multiple root nodes and these two nodes are in different root portions of the DAG. Another analogous explanation is that the two nodes may be in different connected components.

Any idea what's going on here? The two nodes definitely have shared ancestry and are in the same DAG.
Does this have something to do with the directionality of the graph? A full version of HPO subclass_of nodes/edges works without issue.
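Whether edge direction is the cause here is not confirmed, but the question can be explored on a toy example. The sketch below (plain Python, made-up node names, no grape calls) shows how the same hierarchy yields a shared ancestor when edges point child to parent, and none when they point parent to child.

```python
def adjacency(edges):
    """Build a source -> [destinations] mapping from (source, destination) pairs."""
    mapping = {}
    for src, dst in edges:
        mapping.setdefault(src, []).append(dst)
    return mapping


def ancestors(node, successors):
    """Collect every node reachable from `node` by following outgoing edges."""
    seen, stack = set(), [node]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Toy hierarchy (hypothetical): B and C are both children of the root A.
child_to_parent = [("B", "A"), ("C", "A")]
parent_to_child = [(dst, src) for src, dst in child_to_parent]

up = adjacency(child_to_parent)
down = adjacency(parent_to_child)

shared_up = ancestors("B", up) & ancestors("C", up)        # {"A"}
shared_down = ancestors("B", down) & ancestors("C", down)  # set()

print(shared_up, shared_down)
```

Running the same kind of check on the test TSV, in both directions, would show whether the subject/object orientation matches what the similarity computation expects.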

@hrshdhgd @justaddcoffee

Parallelized Embedding

Hey,
I'm trying to process a directed graph with about 5 million nodes and 100 million edges.
I've managed to load the graph from a CSV file and get a very nice Graph object (within 5 minutes).
I'm now trying to embed the graph with grape.embedders.Node2VecSkipGramEnsmallen, but it doesn't seem to finish; I've let it run for over 10 hours.
To make it faster, I enabled the Graph's vector_source, vector_cumulative_node_degree and vector_reciprocal_sqrt_degrees.
Reading your paper, it seems that the embedding process can be parallelized, but I can't find a way to do that.
I'd appreciate it if you could describe which parts of the embedding process are parallelized, and how I can make it run in parallel.
Thank you,
Bruria.

Bring your own data?

I'm emulating a lazy reviewer by only reading the README: I'd much rather bring my own data than use something built in. It would be great to have an example in the README that shows how the data should look (i.e., the file format) and what code is needed to embed that network.

Node2VecSkipGramEnsmallen lacks num_walks parameter

Hi,

I am using Node2VecSkipGramEnsmallen and noticed that it does not take a num_walks parameter for the number of walks per node. The available parameters I see are:
mappingproxy({'embedding_size': <Parameter "embedding_size: int = 100">,
'epochs': <Parameter "epochs: int = 30">,
'clipping_value': <Parameter "clipping_value: float = 6.0">,
'number_of_negative_samples': <Parameter "number_of_negative_samples: int = 10">,
'walk_length': <Parameter "walk_length: int = 128">,
'iterations': <Parameter "iterations: int = 10">,
'window_size': <Parameter "window_size: int = 5">,
'return_weight': <Parameter "return_weight: float = 0.25">,
'explore_weight': <Parameter "explore_weight: float = 4.0">,
'max_neighbours': <Parameter "max_neighbours: Optional[int] = 100">,
'learning_rate': <Parameter "learning_rate: float = 0.01">,
'learning_rate_decay': <Parameter "learning_rate_decay: float = 0.9">,
'central_nodes_embedding_path': <Parameter "central_nodes_embedding_path: Optional[str] = None">,
'contextual_nodes_embedding_path': <Parameter "contextual_nodes_embedding_path: Optional[str] = None">,
'normalize_by_degree': <Parameter "normalize_by_degree: bool = False">,
'stochastic_downsample_by_degree': <Parameter "stochastic_downsample_by_degree: Optional[bool] = False">,
'normalize_learning_rate_by_degree': <Parameter "normalize_learning_rate_by_degree: Optional[bool] = False">,
'use_scale_free_distribution': <Parameter "use_scale_free_distribution: Optional[bool] = True">,
'random_state': <Parameter "random_state: int = 42">,
'dtype': <Parameter "dtype: str = 'f32'">,
'ring_bell': <Parameter "ring_bell: bool = False">,
'enable_cache': <Parameter "enable_cache: bool = False">,
'verbose': <Parameter "verbose: bool = True">})
I was wondering whether iterations is equivalent to num_walks. Thanks!
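
The parameter listing above has the shape of what `inspect.signature(...).parameters` returns. As a minimal sketch of how such a listing can be produced, the snippet below inspects a hypothetical stand-in class (`StandInEmbedder` is invented for illustration and is not the real embedder). It only shows how to list constructor parameters and their defaults; it does not settle whether `iterations` plays the role of `num_walks`.

```python
import inspect


class StandInEmbedder:
    """Hypothetical stand-in; not the real Node2VecSkipGramEnsmallen."""

    def __init__(self, embedding_size: int = 100, walk_length: int = 128, iterations: int = 10):
        self.embedding_size = embedding_size
        self.walk_length = walk_length
        self.iterations = iterations


def constructor_defaults(cls) -> dict:
    """Map each constructor parameter name (except self) to its default value."""
    parameters = inspect.signature(cls.__init__).parameters
    return {
        name: parameter.default
        for name, parameter in parameters.items()
        if name != "self"
    }


defaults = constructor_defaults(StandInEmbedder)
print(defaults)
```

Applying the same helper to the real class should list whatever parameters the installed version actually accepts.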

Grape documentation

Hi, I've been exploring this repository to work on graphs. The text report of a graph is very useful, but I find it difficult to find the graph's metadata, apart from the weights. The documentation does not make this clear either: it mostly describes the function names and the data types, but not what these data types represent or the intuition behind them.

One example is when a Graph is created with the parameter additional_graph_kwargs. Where do I find which kwargs I can set?

There are other examples that are not only regarding the dataset, but graph processing in general. It is not clear what the intuition behind certain functions is.

Is there any plan on improving the documentation so that it is more likely that users use your framework?

Error in "compute_node_embedding"

I ran the default pipeline that uses CBOW, SkipGram and GloVe to embed Cora (Cora.ipynb). The pipeline ran without error until it reached "compute_node_embedding", where the following error appeared:
IndexError Traceback (most recent call last)
in ()
6 first_order_rw_node_embedding, training_history = compute_node_embedding(
7 graph,
----> 8 node_embedding_method_name=node_embedding_method_name,
9 )
...
IndexError: pop from empty list

TransE error: "ValueError: One of the provided node embedding computed with the TransE method contains NaN values."

When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:

ValueError Traceback (most recent call last)
in
----> 1 embedding = model.fit_transform(kg)

~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
164 graph=graph,
165 return_dataframe=return_dataframe,
--> 166 verbose=verbose
167 )
168

~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
--> 114 edge_type_embeddings= edge_type_embedding,
115 )
116

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in init(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
76 if np.isnan(numpy_embedding).any():
77 raise ValueError(
---> 78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

I am attaching a jupyter notebook to reproduce the problem.
load_graph_and.ipynb.zip

The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz

Question regarding the main GRAPE module import

Hi! One minor question/suggestion: I'm wondering whether it would be better not to load all modules as is done right now. For example, when I want to import my local module utils.py, it fails because the name utils has already been taken by ensmallen. One workaround I've tried is to remove the automatically loaded ensmallen utils module with sys.modules.pop('utils').

grape/grape/__init__.py

Lines 13 to 30 in 788e195

def import_all(module_locals):
"""Execute dynamic loading of submodules."""
import ensmallen as _ensmallen
import embiggen as _embiggen
import sys as _sys
import pkgutil as _pkgutil
for _module in (_ensmallen, _embiggen):
for _loader, _module_name, _is_pkg in _pkgutil.iter_modules(_module.__path__):
if not _is_pkg:
continue
if _module_name.startswith(("_", "~")):
continue
_loaded_module = _loader.find_module(
_module_name
).load_module(_module_name)
_sys.modules[f'grape.{_module_name}'] = _loaded_module
module_locals[_module_name] = _loaded_module

Would it be better to simply do the following instead of importing everything?

import embiggen
import ensmallen

__all__ = ["embiggen", "ensmallen"]  # or add whatever top level modules that make sense
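
The name clash described in this issue can be illustrated with the standard library alone. The following is a minimal sketch, not grape's actual loading code: it registers a fake module under a hypothetical name (`demo_shadowed_utils`, invented here), shows that a plain import then returns that registered module, and removes it again with `sys.modules.pop`.

```python
import importlib
import sys
import types

# Hypothetical module name used only for this demonstration.
NAME = "demo_shadowed_utils"

# Simulate a package registering a submodule under a bare top-level name.
shadow = types.ModuleType(NAME)
shadow.origin = "registered by another package"
sys.modules[NAME] = shadow

# A plain import now returns the registered module instead of searching the path.
imported = importlib.import_module(NAME)
shadowed = imported is shadow

# pop is a method call: parentheses, not square brackets.
removed = sys.modules.pop(NAME, None)
still_registered = NAME in sys.modules

print(shadowed, still_registered)
```

Popping the entry only affects later imports; names already bound to the old module elsewhere keep pointing at it.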

Graph visualization error

Hello.
I am trying the Using CBOW to embed Cora python notebook (linked) and after replacing "CBOWEnsmallen" with "DeepWalkCBOWEnsmallen", the first order embedding runs successfully but fails at the graph visualization. I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_3453/3695275499.py in <module>
----> 1 GraphVisualizer(
      2     graph,
      3     node_embedding_method_name="CBOW - First order"
      4 ).fit_and_plot_all(first_embedding)

~/anaconda3/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py in fit_and_plot_all(self, node_embedding, number_of_columns, show_letters, include_distribution_plots, **node_embedding_kwargs)
   4236         distribution_plot_methods_to_call = []
   4237 
-> 4238         if not self._graph.has_constant_non_zero_node_degrees():
   4239             node_scatter_plot_methods_to_call.append(
   4240                 self.plot_node_degrees,

AttributeError: The method 'has_constant_non_zero_node_degrees' does not exists, did you mean one of the following?
* 'has_constant_edge_weights'
* 'get_non_zero_subgraph_node_degrees'
* 'has_nodes'
* 'has_edges'
* 'has_selfloops'
* 'has_node_ontologies'
* 'has_node_oddities'
* 'get_node_degrees'
* 'has_node_name'
* 'has_node_types'

Looks like the issue has to do with embiggen dependencies in the graph visualization. Below are the package versions I am using:
embiggen==0.11.13
ensmallen==0.8.7
grape==0.1.9

I was also not able to run the second-order embeddings:

model = DeepWalkCBOWEnsmallen(
    return_weight=2.0,
    explore_weight=0.1
)
second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)

The above code gives the below error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_3453/3112314827.py in <module>
----> 1 model = DeepWalkCBOWEnsmallen(
      2     return_weight=2.0,
      3     explore_weight=0.1
      4 )
      5 second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)

TypeError: __init__() got an unexpected keyword argument 'return_weight'

Need documentation on how to use a knowledge graph in grape

Hello, I have another question on how to import my data into grape. It is more a request for clarification on my method of importing my KG.

kg = Graph.from_csv(directed=True,
                       edge_path="sample_mabkg.tsv",
                       sources_column_number= 0,
                       edge_list_edge_types_column_number=1,edge_list_separator="|",
                       destinations_column_number=2, name="mAbKG", verbose=True, edge_list_header=True)

However, I saw that node_path and other node-related parameters exist alongside edge_path, so I don't know whether I am calling from_csv correctly. Can you please give me some explanation, knowing that I have a KG with typed edges and nodes? Below is an example of my data.

Thank you for your answer

Gaoussou

 node source|edge|node destination
_:B4dff5e7d17225b25b13ad12737e49779|imgt:isDecidedBy|imgt:EC
pubmed:2843774|dc:title|Selective killing of HIV-infected cells by recombinant human CD4-Pseudomonas exotoxin hybrid protein.
imgt:Product_8e9250cf-276a-3282-954f-3791316ac5a6|rdf:type|obo:NCIT_C51980
imgt:Segment_212_1|obo:BFO_0000050|imgt:Construct_212
imgt:IgG4-kappa_1001|rdfs:label|IgG4-kappa_1001
imgt:V-D-GENE|owl:sameAs|obo:SO_0000510
imgt:Segment_536_1|rdf:type|imgt:Segment
imgt:LRR13|rdf:type|imgt:RepeatLabel
imgt:StudyProduct_c2bc9b3a-a15e-376f-bda5-f87089b3f54b|imgt:application_type|Therapeutic
imgt:StudyProduct_54a14ca8-f916-338b-af18-d079beb598a4|imgt:development_technology|  Dyax human antibody phage display library 

sample_mabkg.txt
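
Before loading a file like this, it can help to confirm that every row really has three pipe-separated fields, since literal values such as titles could in principle contain the separator. The sketch below is one way to do that with the standard `csv` module; the helper name `read_triples` is invented for illustration, and the sample is an abbreviated copy of the rows shown above.

```python
import csv
import io

# Abbreviated copy of the sample above, including its header with a leading space.
SAMPLE = """ node source|edge|node destination
imgt:Segment_212_1|obo:BFO_0000050|imgt:Construct_212
imgt:V-D-GENE|owl:sameAs|obo:SO_0000510
imgt:Segment_536_1|rdf:type|imgt:Segment
"""


def read_triples(text: str, separator: str = "|"):
    """Parse (source, edge type, destination) rows and fail on malformed ones."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = [cell.strip() for cell in next(reader)]
    triples = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != 3:
            raise ValueError(f"Row {row_number} has {len(row)} columns, expected 3")
        triples.append(tuple(cell.strip() for cell in row))
    return header, triples


header, triples = read_triples(SAMPLE)
edge_types = sorted({edge for _, edge, _ in triples})
print(header)
print(edge_types)
```

The column positions match the call above: sources in column 0, edge types in column 1, destinations in column 2.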

The Everything Bagel GCN failed

I am following the tutorial to predict edges in an ensemble fashion. I computed three embeddings out of the graph

embedding_hyper_sketching = HyperSketching(number_of_hops=6).fit(g)
embedding_line_2 = SecondOrderLINEEnsmallen().fit_transform(g)
embedding_glee = GLEEEnsmallen().fit_transform(g)

I instantiate the object

model = GCNEdgePrediction(
    epochs=3, # 10 for production
    number_of_units_per_graph_convolution_layers = 32,
    number_of_units_per_ffnn_body_layer = 32,
    number_of_units_per_ffnn_head_layer = 16,
    kernels=["Symmetric Normalized Laplacian", "Transposed Symmetric Normalized Laplacian"],
    dropout_rate=0.7,
    use_edge_metrics=True,
    residual_convolutional_layers=False,
    use_node_embedding=True,
    edge_embedding_methods=["Concatenate", "Hadamard"],
    node_feature_names = ["GLEE", "LINE 2nd"],
    verbose=True
)

and when I compile it, it complains

model.compile(
    graph=g,
    # The support graph is the graph whose topology is to be used for all things
    # including the convolutions, the metrics and the edge features.
    support=g,
    node_features=[embedding_glee, embedding_line_2],
    edge_features=[embedding_hyper_sketching]
)

with this message

AttributeError: 'EmbeddingResult' object has no attribute 'shape'

So then, when I fit it, the object throws this error

model.fit(
    graph=g,
    support=g,
    node_features=[embedding_glee, embedding_line_2],
    edge_features=[embedding_hyper_sketching]
)

NotImplementedError: Currently, we solely support edge features that are subclasses of AbstractEdgeFeature. This is because most commonly, it is not possible to precompute edge features for all possible edges of a complete graph and thus, we need to compute them on the fly. To do so, we need a common interface that allows us to query the edge features on demand, lazily, hence avoiding unsustainable memory peaks.You have provided an egde feature of type , which is not a subclass of AbstractEdgeFeature.

Getting figure to be inline

matplotlib plots figures inline by default or if we write

%matplotlib inline

Some of the figures produced by GRAPE get put into "subwindows" in the Jupyter notebook, and one needs to scroll up and down to see the entire figure. GRAPE does not seem to be responsive to the inline magic command above either.

For instance, for a certain figure to really appear inline, I need to make it much smaller

visualizer = GraphVisualizer(sli_graph, automatically_display_on_notebooks=False)
fig, ax, cap = visualizer.plot_node_degree_distribution()
fig.set_figheight(3)
fig.set_figwidth(3)

even though the notebook could comfortably show (5,5) or even (8,8)

Issue loading graph from KG-Hub

Seems to download, but I'm getting an error seemingly when the graph is being loaded. Possibly either the nodes or edges file is not what GRAPE expects?

To reproduce:

from grape.datasets.kghub import KGIDG
g = KGIDG(version='20220722')

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/graph_retrieval.py:419, in RetrievedGraph.__call__(self)
    413 try:
    414     (
    415         node_types_number,
    416         nodes_number,
    417         edge_types_number,
    418         edges_number
--> 419     ) = edge_list_utils.build_optimal_lists_files(
    420         # NOTE: the following parameters are supported by the parser, but
    421         # so far we have not encountered a single use case where we actually used them.
    422         # original_node_type_path,
    423         # original_node_type_list_separator,
    424         # original_node_types_column_number,
    425         # original_node_types_column,
    426         # original_numeric_node_type_ids,
    427         # original_minimum_node_type_id,
    428         # original_node_type_list_header,
    429         # original_node_type_list_support_balanced_quotes,
    430         # original_node_type_list_rows_to_skip,
    431         # original_node_type_list_max_rows_number,
    432         # original_node_type_list_comment_symbol,
    433         # original_load_node_type_list_in_parallel,
    434         # original_node_type_list_is_correct,
    435         # node_types_number,
    436         target_node_type_list_path=target_node_type_list_path,
    437         target_node_type_list_separator='\t',
    438         target_node_type_list_node_types_column_number=0,
    439         original_node_path=node_path,
    440         original_node_list_header=graph_arguments.get(
    441             "node_list_header"
    442         ),
    443         original_node_list_support_balanced_quotes=graph_arguments.get(
    444             "node_list_support_balanced_quotes"
    445         ),
    446         node_list_rows_to_skip=graph_arguments.get(
    447             "node_list_rows_to_skip"
    448         ),
    449         node_list_is_correct=graph_arguments.get(
    450             "node_list_is_correct"
    451         ),
    452         node_list_max_rows_number=graph_arguments.get(
    453             "node_list_max_rows_number"
    454         ),
    455         node_list_comment_symbol=graph_arguments.get(
    456             "node_list_comment_symbol"
    457         ),
    458         default_node_type=graph_arguments.get(
    459             "default_node_type"
    460         ),
    461         original_nodes_column_number=graph_arguments.get(
    462             "nodes_column_number"
    463         ),
    464         original_nodes_column=graph_arguments.get(
    465             "nodes_column"
    466         ),
    467         original_node_types_separator=graph_arguments.get(
    468             "node_types_separator"
    469         ),
    470         original_node_list_separator=graph_arguments.get(
    471             "node_list_separator"
    472         ),
    473         original_node_list_node_types_column_number=graph_arguments.get(
    474             "node_list_node_types_column_number"
    475         ),
    476         original_node_list_node_types_column=graph_arguments.get(
    477             "node_list_node_types_column"
    478         ),
    479         nodes_number=graph_arguments.get("nodes_number"),
    480         # original_minimum_node_id,
    481         # original_numeric_node_ids,
    482         # original_node_list_numeric_node_type_ids,
    483         original_skip_node_types_if_unavailable=True,
    484         # It make sense to load the node list in parallel only when
    485         # you have to preprocess the node types, since otherwise the nodes number
    486         # would be unknown.
    487         original_load_node_list_in_parallel=target_node_type_list_path is not None,
    488         maximum_node_id=graph_arguments.get(
    489             "maximum_node_id"
    490         ),
    491         target_node_path=target_node_path,
    492         target_node_list_separator='\t',
    493         target_nodes_column=graph_arguments.get(
    494             "nodes_column"
    495         ),
    496         target_nodes_column_number=0,
    497         target_node_list_node_types_column_number=1,
    498         target_node_types_separator="|",
    499         # original_edge_type_path,
    500         # original_edge_type_list_separator,
    501         # original_edge_types_column_number,
    502         # original_edge_types_column,
    503         # original_numeric_edge_type_ids,
    504         # original_minimum_edge_type_id,
    505         # original_edge_type_list_header,
    506         # edge_type_list_rows_to_skip,
    507         # edge_type_list_max_rows_number,
    508         # edge_type_list_comment_symbol,
    509         # load_edge_type_list_in_parallel=True,
    510         # edge_type_list_is_correct,
    511         # edge_types_number,
    512         target_edge_type_list_path=target_edge_type_list_path,
    513         target_edge_type_list_separator='\t',
    514         target_edge_type_list_edge_types_column_number=0,
    515         original_edge_path=os.path.join(
    516             self._cache_path, graph_arguments["edge_path"]),
    517         original_edge_list_header=graph_arguments.get(
    518             "edge_list_header"
    519         ),
    520         original_edge_list_support_balanced_quotes=graph_arguments.get(
    521             "edge_list_support_balanced_quotes"
    522         ),
    523         original_edge_list_separator=graph_arguments.get(
    524             "edge_list_separator"
    525         ),
    526         original_sources_column_number=graph_arguments.get(
    527             "sources_column_number"
    528         ),
    529         original_sources_column=graph_arguments.get(
    530             "sources_column"
    531         ),
    532         original_destinations_column_number=graph_arguments.get(
    533             "destinations_column_number"
    534         ),
    535         original_destinations_column=graph_arguments.get(
    536             "destinations_column"
    537         ),
    538         original_edge_list_edge_types_column_number=graph_arguments.get(
    539             "edge_list_edge_types_column_number"
    540         ),
    541         original_edge_list_edge_types_column=graph_arguments.get(
    542             "edge_list_edge_types_column"
    543         ),
    544         default_edge_type=graph_arguments.get(
    545             "default_edge_type"
    546         ),
    547         original_weights_column_number=graph_arguments.get(
    548             "weights_column_number"
    549         ),
    550         original_weights_column=graph_arguments.get(
    551             "weights_column"
    552         ),
    553         default_weight=graph_arguments.get(
    554             "default_weight"
    555         ),
    556         original_edge_list_numeric_node_ids=graph_arguments.get(
    557             "edge_list_numeric_node_ids"
    558         ),
    559         skip_weights_if_unavailable=graph_arguments.get(
    560             "skip_weights_if_unavailable"
    561         ),
    562         skip_edge_types_if_unavailable=graph_arguments.get(
    563             "skip_edge_types_if_unavailable"
    564         ),
    565         edge_list_comment_symbol=graph_arguments.get(
    566             "edge_list_comment_symbol"
    567         ),
    568         edge_list_max_rows_number=graph_arguments.get(
    569             "edge_list_max_rows_number"
    570         ),
    571         edge_list_rows_to_skip=graph_arguments.get(
    572             "edge_list_rows_to_skip"
    573         ),
    574         load_edge_list_in_parallel=True,
    575         remove_chevrons=graph_arguments.get(
    576             "remove_chevrons"
    577         ),
    578         remove_spaces=graph_arguments.get(
    579             "remove_spaces"
    580         ),
    581         edges_number=graph_arguments.get("edges_number"),
    582         target_edge_path=target_edge_path,
    583         target_edge_list_separator='\t',
    584         sort_temporary_directory=self._sort_tmp_dir,
    585         directed=self._directed,
    586         verbose=self._verbose > 0,
    587         name=self._name,
    588     )
    589 except Exception as e:

ValueError: Cannot open the file at graphs/kghub/KGIDG/20220722/KG-IDG/merged-kg_nodes.tsv

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Input In [32], in <cell line: 2>()
      1 from grape.datasets.kghub import KGIDG
----> 2 g = KGIDG(version='20220722')

File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/kghub.py:159, in KGIDG(directed, preprocess, bioregistry, load_nodes, load_node_types, load_edge_types, load_edge_weights, auto_enable_tradeoffs, sort_tmp_dir, verbose, ring_bell, cache, cache_path, cache_sys_var, version, **kwargs)
     95 def KGIDG(
     96     directed=False, preprocess="auto", bioregistry=False, load_nodes=True, load_node_types=True,
     97     load_edge_types=True, load_edge_weights=True, auto_enable_tradeoffs=True,
     98     sort_tmp_dir=None, verbose=2, ring_bell=False, cache=True, cache_path=None,
     99     cache_sys_var="GRAPH_CACHE_DIR", version="current", **kwargs
    100 ) -> Graph:
    101     """Return KG-IDG graph	
    102 
    103     Parameters
   (...)
    157 	
    158     """
--> 159     return RetrievedGraph(
    160         "KGIDG", version, "kghub", directed, preprocess, bioregistry, load_nodes,
    161         load_node_types, load_edge_types, load_edge_weights, auto_enable_tradeoffs, sort_tmp_dir,
    162         verbose, ring_bell, cache, cache_path, cache_sys_var, kwargs
    163     )()

File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/graph_retrieval.py:590, in RetrievedGraph.__call__(self)
    414     (
    415         node_types_number,
    416         nodes_number,
   (...)
    587         name=self._name,
    588     )
    589 except Exception as e:
--> 590     raise RuntimeError(
    591         f"Something went wrong while preprocessing the graph {self._name}, "
    592         f"version {self._version}, "
    593         f"retrieved from the {self._repository} repository. "
    594         "This is NOT the loading step, but a preprocessing step "
    595         "that loads remote data from third parties. "
    596         "As such there may have been some changes in the remote data "
    597         "that may have made them incompatible with the current "
    598         "expected parametrization. "
    599         "Do open up an issue in the Ensmallen's GitHub repository reporting also the complete"
    600         "exception of this error to help us keep the automatic graph retrieval "
    601         "in good shape. Thank you!"
    602     ) from e
    603 # Store the obtained metadata
    604 self.store_preprocessed_metadata(
    605     node_types_number,
    606     nodes_number,
    607     edge_types_number,
    608     edges_number
    609 )

RuntimeError: Something went wrong while preprocessing the graph KGIDG, version 20220722, retrieved from the kghub repository. This is NOT the loading step, but a preprocessing step that loads remote data from third parties. As such there may have been some changes in the remote data that may have made them incompatible with the current expected parametrization. Do open up an issue in the Ensmallen's GitHub repository reporting also the completeexception of this error to help us keep the automatic graph retrieval in good shape. Thank you!

Cannot install latest version of Grape

Trying to install grape version 0.2.2 causes an error:

System: Windows 10, 64bit, Intel
Python Version 3.11

pip install --no-cache-dir grape==0.2.2

ERROR: Could not find a version that satisfies the requirement ensmallen>=0.8.64 (from grape) (from versions: 0.0.1, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.7.0.dev19, 0.8.0, 0.8.1, 0.8.2, 0.8.25, 0.8.26, 0.8.27, 0.8.28, 0.8.29, 0.8.36, 0.8.42, 0.8.43, 0.8.44)
ERROR: No matching distribution found for ensmallen>=0.8.64

Any input on whether I am doing something wrong here?
Thank you in advance for your help!
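
The pip message can be read as a plain version comparison: none of the versions pip lists reaches the required minimum. A minimal sketch of that arithmetic follows, using the version list copied from the error above. The helper `parse_release` is invented for illustration and skips pre-releases. Why pip sees only these versions on this system is not established here; a plausible cause, stated as an assumption, is that no distribution matching this platform and Python version was published for newer releases.

```python
def parse_release(version: str):
    """Return a tuple of ints for plain X.Y.Z versions, or None otherwise."""
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        return None  # pre-releases such as 0.7.0.dev19 are skipped here
    return tuple(int(part) for part in parts)


# Versions listed in the pip error message above.
AVAILABLE = (
    "0.0.1, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.7.0.dev19, "
    "0.8.0, 0.8.1, 0.8.2, 0.8.25, 0.8.26, 0.8.27, 0.8.28, 0.8.29, 0.8.36, "
    "0.8.42, 0.8.43, 0.8.44"
).split(", ")

# Constraint from the error: ensmallen>=0.8.64
REQUIRED_MINIMUM = (0, 8, 64)

releases = [release for release in map(parse_release, AVAILABLE) if release is not None]
newest = max(releases)
satisfied = newest >= REQUIRED_MINIMUM
print(newest, satisfied)
```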

setting node ontologies

Hi,

I see references to node ontologies and predicted node ontologies in the code, and as a result of GraphVisualizer.fit_and_plot_all. Is there a way to set ontological categories or other node attributes after reading in a graph so they'd show up when plot_node_ontologies is called, or something similar? If these are predicted, how is that done?

Thanks! Hope it's not too obvious a question to answer.

Saving classifier models

Could support for saving classifier models please be added?
This came up while meeting with @LucaCappelletti94 recently but it's become relevant again in the course of updating neat-ml to use grape classifiers.

Training classifiers isn't a major time commitment, but on our neat runs we've separated the process of training+testing vs. applying classifiers, so being unable to save or at least pickle the classifier object means we need to redo training for each model.
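
The separation of training and applying described above can be sketched with the standard library. This is an illustration under stated assumptions, not grape's API: `StandInClassifier`, `get_state` and `from_state` are hypothetical names, and whether grape classifier objects can be pickled directly is exactly what this issue asks about. The sketch persists only the fitted parameters and rebuilds the object from them.

```python
import pickle
import tempfile
from pathlib import Path


class StandInClassifier:
    """Hypothetical stand-in for a trained classifier; not a grape model."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def get_state(self) -> dict:
        """Return the fitted parameters as plain Python data."""
        return {"threshold": self.threshold}

    @classmethod
    def from_state(cls, state: dict) -> "StandInClassifier":
        """Rebuild a classifier from previously saved parameters."""
        return cls(**state)

    def predict(self, scores):
        return [score >= self.threshold for score in scores]


trained = StandInClassifier(threshold=0.5)

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "classifier_state.pkl"
    # Training and testing step: persist the fitted parameters.
    with path.open("wb") as handle:
        pickle.dump(trained.get_state(), handle)
    # Applying step, possibly in a separate run: load without retraining.
    with path.open("rb") as handle:
        restored = StandInClassifier.from_state(pickle.load(handle))

predictions = restored.predict([0.2, 0.5, 0.9])
print(predictions)
```

Saving plain state rather than the object itself keeps the file readable even if the class definition moves between modules.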

Method not found in api documentation: Graph.from_pd

Hi, after installing grape version 0.2.2, I can use the Graph.from_pd method by following the example, but I cannot find a detailed description of this method in the API documentation. It would be really great if the documentation could be updated.

Thanks and best wishes!

Error when using TensorFlow embedding

Hi Team, did some experimentation and found this issue when trying different tensorflow based embedding approaches:

from grape.embedders import StructuredEmbeddingTensorFlow
embedding = StructuredEmbeddingTensorFlow().fit_transform(graph)

produces error:

ValueError: We have found an useless method in the class StubClass, implementing method Structured Embedding from library TensorFlow and task Node Embedding. It does not make sense to implement the `requires_positive_edge_weights` method when the `can_use_edge_weights` always returns False, as it is already handled in the root abstract model class.

Anything I am doing wrong?

Thanks!

macOS arm64 import grape does not work // BMI2, AVX2 incompatibility

Hello,

I am sorry to bother you with this; macOS was not my first choice to work on, but here I am. Technically, pip install grape does download and compile.

But any time I import it, it emits this warning:
UserWarning: Ensmallen is compiled for the Intel Haswell architecture (2013).On the current machine, the flags '['avx2', 'bmi2', 'popcnt']' are required but '{'avx2', 'bmi2'}' are not available. The library will use a slower but more compatible version (Intel Core2 2006).

And then the kernel always dies instead of continuing.

As you may know, the Apple M1 does not support AVX/AVX2 instructions, and I know this is beyond your reach.

But do you have any clue on how I could still install grape locally?

Thanks

Question concerning node type loading from csv

Hi Team,
thank you for the awesome library!

I am trying to import a very basic dataset for a POC and struggling with the from_csv method.

I want to construct a graph using my own data:

pl.DataFrame({
    'source': ['A', 'A', 'A', 'A', 'A', 'F', 'F', 'F', 'A', 'F'],
    'destination': ['B', 'C', 'D', 'E' , 'F', 'G', 'H', 'I', 'J', 'J'],
}
).write_csv('edges.csv')

pl.DataFrame({
    'node_type': ['link', 'sat', 'sat', 'sat' , 'sat', 'link', 'sat', 'sat', 'sat', 'sat'],
}
).write_csv('node_types.csv')

pl.DataFrame({
    'node_name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']}
).write_csv('node_names.csv')

This results in three CSV files, each with a header.

I am then constructing the graph using the following snippet:

graph = Graph.from_csv(
    #Edges
    edge_path="edges.csv",
    sources_column="source",
    destinations_column="destination",
    edge_list_header=True,
    #Nodes
    node_path = "node_names.csv",
    nodes_column = "node_name",
    node_list_header  = True,
    #Node Types
    node_type_path = "node_types.csv",
    node_types_column = "node_type",
    node_type_list_header = True,
    skip_node_types_if_unavailable = False,
    directed = False
)

When I run graph.get_node_type_names() I get the error:

ValueError                                Traceback (most recent call last)
Cell In [6], line 1
----> 1 graph.get_unique_node_type_ids()

ValueError: The current graph instance does not have node types.

Anything I am doing wrong?

Thanks for the time!
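
One thing that may be worth trying, offered as an assumption rather than a confirmed fix, is to keep each node's type in the same file as its name, so that the pairing between the two is explicit instead of depending on row order across files. A traceback elsewhere on this page lists a `node_list_node_types_column` graph argument, which suggests the node list itself can carry a type column. The sketch below only builds such a combined file with the standard `csv` module; the helper `build_node_list` is invented for illustration.

```python
import csv
import io

# Same data as in the question above.
node_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
node_types = ['link', 'sat', 'sat', 'sat', 'sat', 'link', 'sat', 'sat', 'sat', 'sat']


def build_node_list(names, types) -> str:
    """Return CSV text with one row per node: its name and its type."""
    if len(names) != len(types):
        raise ValueError("Each node needs exactly one type")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["node_name", "node_type"])
    writer.writerows(zip(names, types))
    return buffer.getvalue()


node_list_csv = build_node_list(node_names, node_types)
print(node_list_csv)
```

The resulting text can be written to a single node file and passed as the node list, with the type column named accordingly.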

Methods for generating node embeddings from word embeddings

While updating NEAT to use the most recent grape release, @justaddcoffee and @hrshdhgd and I took a look at what we're using to generate node embeddings based on pretrained word embeddings like BERT etc. : https://github.com/Knowledge-Graph-Hub/NEAT/blob/main/neat/graph_embedding/graph_embedding.py

We know we can run something like get_okapi_tfidf_weighted_textual_embedding() on a graph, but is there a more "on demand" way to run this in grape now for an arbitrary graph?

Link to the two sub packages

Hi, first of all, thanks for making such an amazing graph embedding resource!

I'm wondering whether you can add some descriptions in the README clarifying that this repo is a thin wrapper of the two core packages embiggen and ensmallen and add links accordingly. I was a bit confused for a few minutes trying to find the source code and only came to realize it wraps the two libraries after looking at __init__.py.

`Illegal instruction (core dumped)` on importing grape

In another issue that may have something to do with our aging build server:
When we import grape in this environment (see info below), we get only Illegal instruction (core dumped).

cpuinfo output:

processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
stepping        : 2
microcode       : 0x1f
cpu MHz         : 1599.987
cache size      : 12288 KB
physical id     : 1
siblings        : 12
core id         : 10
cpu cores       : 6
apicid          : 53
initial apicid  : 53
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 6133.21
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
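
The warning quoted in the macOS issue on this page names 'avx2', 'bmi2' and 'popcnt' as required flags. As a minimal sketch, the snippet below checks a cpuinfo-style text for those flags; the helper names are invented for illustration and the sample is an abbreviated excerpt of the flags line above. Whether missing flags are the actual cause of the crash reported here is not confirmed.

```python
# Flags named in the warning quoted in the macOS issue on this page.
REQUIRED_FLAGS = ("avx2", "bmi2", "popcnt")


def parse_flags(cpuinfo_text: str) -> set:
    """Return the CPU flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> list:
    """List the required flags that the CPU does not report."""
    available = parse_flags(cpuinfo_text)
    return [flag for flag in required if flag not in available]


# Abbreviated excerpt of the report above.
SAMPLE = """model name      : Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
flags           : fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt lahf_lm
"""

missing = missing_flags(SAMPLE)
print(missing)
```

On Linux, the same function can be given the contents of /proc/cpuinfo instead of the sample string.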

Edge prediction pipeline suggestion

Opening an issue as per our discussion @LucaCappelletti94:

The edge prediction evaluation pipeline could benefit from a parameter such as node_features_names so that if a user passes in features without obvious names (e.g., the 2d array text embedding in node_features = [2d_array_text_embeddings, kg_embedding_function()]), the pipeline will name the different text embedding method(s) used. This would be similar to how it names the KG embedding functions based on the function names and saves them in a column.

Bipartite Graph predict proba with undirected graph

Hi.
I noticed that the performance metrics are not identical when I swap the source and destination nodes in predict_proba_bipartite_graph_from_edge_node_types. The input graph is undirected, so I would expect similar predictions for the same edge type regardless of which nodes are the source and which are the destination. Is this behavior intentional?

Below are the versions of the software I am currently running:
grape==0.1.17
embiggen==0.11.27
ensmallen==0.8.14
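
One possible source of such an asymmetry, stated here purely as an assumption since the issue does not say how edge features were built, is the way two node embeddings are combined into an edge embedding. Concatenation depends on argument order, while an element-wise (Hadamard) product does not. The sketch below shows the difference with plain lists; both functions are simplified illustrations and not the library's implementations.

```python
def concatenate(source, destination):
    """Order-dependent: the source vector followed by the destination vector."""
    return source + destination


def hadamard(source, destination):
    """Order-independent: element-wise product of the two vectors."""
    return [s * d for s, d in zip(source, destination)]


# Toy node embeddings, invented for illustration.
u = [1.0, 2.0]
v = [3.0, 4.0]

concat_symmetric = concatenate(u, v) == concatenate(v, u)
hadamard_symmetric = hadamard(u, v) == hadamard(v, u)
print(concat_symmetric, hadamard_symmetric)
```

If the model consumes an order-dependent edge embedding, swapping source and destination changes its input even when the graph is undirected.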
