anacletolab / grape
🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations
License: MIT License
Hi!
I am using edge prediction evaluation and LogisticRegressionCVEdgePrediction for my graph, which has the following:
print(f"Number of edges in graph: {len(graph.get_edge_node_ids(directed=False))}")
Number of edges in graph: 4541
print(f"Number of nodes in graph: {len(graph.get_node_ids())}")
Number of nodes in graph: 1875.
However, when I run edge prediction evaluation:
results = edge_prediction_evaluation(
    holdouts_kwargs=dict(train_size=0.8),
    graphs=graph,
    models=LogisticRegressionCVEdgePrediction(max_iter=500),
    number_of_holdouts=1,
    node_features=model
)
where model = Node2VecSkipGramEnsmallen(embedding_size=EMBEDDING_SIZE, walk_length=WALK_LENGTH, return_weight=RETURN_WEIGHT, explore_weight=EXPLORE_WEIGHT, iterations=NUM_WALKS).fit_transform(graph)
I find that the results report:
nodes_number: 1875
edges_number: 9082
I was wondering why the evaluation reports twice as many edges as are actually present in the graph. Thanks!
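One possible explanation, offered as an assumption rather than a confirmed answer: a library that stores an undirected graph as a symmetric directed edge list keeps every undirected edge twice, once per direction. Under that assumption the reported count would be exactly twice the undirected count (2 × 4541 = 9082). The sketch below uses plain Python and a toy edge list; it does not call GRAPE, and the helper name is hypothetical.

```python
# Toy undirected edge list: each pair appears once.
undirected_edges = [(0, 1), (1, 2), (2, 3)]


def to_symmetric_directed(edges):
    """Store every undirected edge in both directions (a self-loop is kept once)."""
    directed = set()
    for src, dst in edges:
        directed.add((src, dst))
        directed.add((dst, src))
    return sorted(directed)


directed_edges = to_symmetric_directed(undirected_edges)
print(len(undirected_edges), len(directed_edges))

# The same arithmetic applied to the numbers in the report above.
reported_undirected = 4541
doubled = reported_undirected * 2
print(doubled)
```

If this is what happens here, the two numbers describe the same graph, counted once per edge in one case and once per direction in the other.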
First of all, congratulations on a great library. I just happen to have some questions surrounding the use of custom models alongside your preprocessing module.
I have a use case that might benefit from the flexibility of implementing custom models in PyG or PyKEEN, or of using already implemented models that grape might not support yet. I've seen that embiggen has some wrappers implemented, and I wonder if you could direct me to an example of how to use those PyG and PyKEEN wrappers to implement custom models.
I also think this could be a really cool tutorial to have in your documentation, and I would be happy to contribute support for more PyKEEN models if desired :)
Thanks
Big data engineering processes using Apache Spark produce triple sets. To avoid tedious IO serialisation and coalescing to/from CSV files, PySpark provides the toPandas() method. This method collects the partitioned and distributed dataset into the local memory of the driver node and makes it accessible as a pandas DataFrame.
Thus, having a graph constructor that works directly from already-produced data frames would be really convenient.
pedges = edges.toPandas()
pnodes = nodes.toPandas()
g = (
    Graph.from_pd(
        directed=True,
        node_path=pnodes,
        nodes_column_number=0,
        node_list_node_types_column_number=1,
        edge_path=pedges,
        sources_column_number=0,
        destinations_column_number=2,
        edge_list_edge_types_column_number=4,
        weights_column_number=11,
    )
    .remove_components(top_k_components=1)
)
The joy of installation on Windoze...
Collecting embiggen>=0.11.9
Downloading embiggen-0.11.38.tar.gz (154 kB)
---------------------------------------- 154.2/154.2 kB ? eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 54, in <module>
long_description=readme(),
File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 12, in readme
return f.read()
File "C:\Users\richa\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 2: character maps to <undefined>
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
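The traceback shows setup.py's readme() reading the README with the Windows default codec (cp1252). The failing byte, 0x8d at position 2, is consistent with a file that begins with the UTF-8 encoding of 🍇 (F0 9F 8D 87). A minimal sketch of the usual fix, assuming the README is opened without an explicit encoding; `read_readme` and the file name are hypothetical stand-ins, not the package's actual code.

```python
import os
import tempfile

# Assumption: the README starts with the grape emoji, as the project description does.
readme_text = "🍇 GRAPE\n"


def read_readme(path):
    """Hypothetical replacement for setup.py's readme(): always decode as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Reproduce the failure: the same bytes cannot be decoded as cp1252.
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("cp1252")
        failure_position = None
    except UnicodeDecodeError as error:
        failure_position = error.start

    # With an explicit encoding the platform default no longer matters.
    decoded = read_readme(path)

print(failure_position)
print(decoded == readme_text)
```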
Currently this is not working:
from grape.datasets.monarchinitiative import Monarch
g = Monarch()
Downloading files: 0%
0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/ensmallen/datasets/graph_retrieval.py in download(self)
349 # Download the necessary data
--> 350 self._downloader.download(
351 self._graph["urls"],
9 frames
ValueError: Request to url https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_nodes.tsv finished with status code 404.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/ensmallen/datasets/graph_retrieval.py in download(self)
353 )
354 except Exception as e:
--> 355 raise RuntimeError(
356 f"Something went wrong while downloading the graph {self._name}, "
357 f"version {self._version}, "
RuntimeError: Something went wrong while downloading the graph Monarch, version 202103, retrieved from the monarchinitiative repository. In this step, we are trying to download data provided from third parties, and such data may now be offline or moved. Please do investigate what has happened at the URLs reported below in this error message and do open up an issue in the Ensmallen's GitHub repository reporting also the completeexception of this error to help us keep the automatic graph retrieval in good shape. Thank you!Specifically, we were trying to download the following urls: ['https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_nodes.tsv', 'https://archive.monarchinitiative.org/202103/kgx/sri-reference-kg_edges.tsv']
New builds can be retrieved from here:
https://kg-hub.berkeleybop.io/kg-monarch/index.html
Hopefully this follows the pattern that Grape expects? @LucaCappelletti94 @putmantime
For consistency with other methods, this should be named
get_number_of_weighted_triads()
I was pairing with Luca and we came across this problem when running on a Mac M1:
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Connected to pydev debugger (build 231.9011.38)
Traceback (most recent call last):
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "/Users/benchamberlain/workspace/subgraph-sketching/src/data.py", line 22, in <module>
from src.datasets.elph import get_hashed_train_val_test_datasets, make_train_eval_data
File "/Users/benchamberlain/workspace/subgraph-sketching/src/datasets/elph.py", line 17, in <module>
from embiggen.embedders.ensmallen_embedders.hyper_sketching import HyperSketching
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/__init__.py", line 2, in <module>
from embiggen.visualizations import GraphVisualizer
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/visualizations/__init__.py", line 2, in <module>
from embiggen.visualizations.graph_visualizer import GraphVisualizer
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py", line 16, in <module>
from ensmallen import Graph # pylint: disable=no-name-in-module
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/ensmallen/__init__.py", line 14, in <module>
unavailable_flags = set(HASWELL_FLAGS) - set(cpuinfo.get_cpu_info()["flags"])
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/site-packages/cpuinfo/cpuinfo.py", line 2762, in get_cpu_info
output = json.loads(output, object_hook = _utf_to_str)
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/__init__.py", line 359, in loads
return cls(**kw).decode(s)
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/benchamberlain/anaconda3/envs/ss/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
python-BaseException
Another hardware-related oddity, courtesy of our aging build server.
When I import grape, it raises NameError: name 'HASWELL_FLAGS' is not defined. This is on our build server, where we previously had issues with missing AVX flags.
In this case, I'm also in a Docker container and in a virtualenv, but it happens outside that env, too.
Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import grape
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/grape/__init__.py", line 9, in <module>
from embiggen import *
File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/__init__.py", line 2, in <module>
from embiggen.visualizations import GraphVisualizer
File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/visualizations/__init__.py", line 2, in <module>
from embiggen.visualizations.graph_visualizer import GraphVisualizer
File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py", line 17, in <module>
from ensmallen import Graph # pylint: disable=no-name-in-module
File "/root/.cache/pypoetry/virtualenvs/semsim-op1wFzG_-py3.9/lib/python3.9/site-packages/ensmallen/__init__.py", line 27, in <module>
).format(HASWELL_FLAGS, unavailable_flags)
NameError: name 'HASWELL_FLAGS' is not defined
Looks like this could fail more gracefully, at least.
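Both this report and the Mac M1 traceback above fail inside the CPU-flag check that runs at import time. A sketch of a more defensive version of such a check: the required flags are defined before anything can fail, and a detector that raises is downgraded to a warning. Everything here is hypothetical (`REQUIRED_FLAGS`, `missing_cpu_flags`, the detector callables), and the flag list is an illustrative subset, not ensmallen's actual HASWELL_FLAGS.

```python
import warnings

# Illustrative subset only; this is not ensmallen's actual HASWELL_FLAGS list.
REQUIRED_FLAGS = ("avx2", "bmi2", "popcnt")


def missing_cpu_flags(detect_flags, required=REQUIRED_FLAGS):
    """Return the required flags that are unavailable, or None if detection failed.

    `detect_flags` is any callable returning an iterable of flag names, for
    example a thin wrapper around `cpuinfo.get_cpu_info()["flags"]`.
    """
    try:
        available = set(detect_flags())
    except Exception as error:  # e.g. a JSONDecodeError or KeyError from the detector
        warnings.warn(f"Could not detect CPU flags ({error!r}); skipping the check.")
        return None
    return sorted(set(required) - available)


def broken_detector():
    """Simulates a detector that fails the way the M1 traceback does."""
    raise ValueError("Expecting value: line 1 column 1 (char 0)")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    failed_result = missing_cpu_flags(broken_detector)

partial_result = missing_cpu_flags(lambda: ["sse4_2", "popcnt"])
print(failed_result, len(caught), partial_result)
```

Returning None when detection fails keeps "unknown" distinct from "nothing missing", so the caller can decide whether to continue or stop with a readable message.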
Embeddings were successfully generated with Node2VecSkipGramEnsmallen from GRAPE.
However, when trying to visualize the dimensionality reduction of the graph, only the node degree plot was shown (top left); the other plots were blank, and this error was raised:
0 points=points,
5511 nrows=nrows,
5512 ncols=ncols,
5513 plotting_callbacks=plotting_callbacks,
5514 show_letters=show_letters,
5515 )
File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:5155, in GraphVisualizer._fit_and_plot_all(self, points, nrows, ncols, plotting_callbacks, show_letters)
5150 evaluation_letters = []
5152 for ax, plot_callback, letter in zip(
5153 flat_axes, itertools.chain(plotting_callbacks), "abcdefghjkilmnopqrstuvwxyz"
5154 ):
-> 5155 figure, axes, caption = plot_callback(
5156 figure=figure,
5157 axes=ax,
5158 **(
5159 dict(loc="lower center")
5160 if "loc" in inspect.signature(plot_callback).parameters
5161 else dict()
5162 ),
5163 apply_tight_layout=False,
5164 )
5166 if "heatmap" in caption.lower():
5167 heatmaps_letters.append(letter)
File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:3720, in GraphVisualizer.plot_node_degrees(self, figure, axes, scatter_kwargs, train_indices, test_indices, train_marker, test_marker, use_log_scale, show_title, show_legend, return_caption, loc, annotate_nodes, show_edges, edge_scatter_kwargs, **kwargs)
3649 def plot_node_degrees(
3650 self,
3651 figure: Optional[Figure] = None,
(...)
3666 **kwargs: Dict,
3667 ):
3668 """Plot node degrees heatmap.
3669
3670 Parameters
(...)
3718 Figure and Axis of the plot.
3719 """
-> 3720 return self._plot_node_metric(
3721 metric=np.fromiter(
3722 (
3723 self._support.get_node_degree_from_node_id(node_id)
3724 for node_id in self.iterate_subsampled_node_ids()
3725 ),
3726 dtype=np.uint32,
3727 ),
3728 metric_name="Node degrees",
3729 figure=figure,
3730 axes=axes,
3731 scatter_kwargs=scatter_kwargs,
3732 train_indices=train_indices,
3733 test_indices=test_indices,
3734 train_marker=train_marker,
3735 test_marker=test_marker,
3736 use_log_scale=use_log_scale,
3737 show_title=show_title,
3738 show_legend=show_legend,
3739 return_caption=return_caption,
3740 loc=loc,
3741 annotate_nodes=annotate_nodes,
3742 show_edges=show_edges,
3743 edge_scatter_kwargs=edge_scatter_kwargs,
3744 **kwargs,
3745 )
File ~/Documents/VIMSS/ontology/KG-Hub/embeddings/venv/lib/python3.10/site-packages/embiggen/visualizations/graph_visualizer.py:3630, in GraphVisualizer._plot_node_metric(self, metric, metric_name, figure, axes, scatter_kwargs, train_indices, test_indices, train_marker, test_marker, use_log_scale, show_title, show_legend, return_caption, loc, annotate_nodes, show_edges, edge_scatter_kwargs, **kwargs)
3628 color_bar = figure.colorbar(scatter[0], ax=axes)
3629 color_bar.set_alpha(1)
-> 3630 color_bar.draw_all()
3632 if annotate_nodes:
3633 figure, axes = self.annotate_nodes(
3634 figure=figure,
3635 axes=axes,
3636 points=self._node_decomposition,
3637 )
AttributeError: 'Colorbar' object has no attribute 'draw_all'
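This AttributeError is consistent with a Matplotlib version in which `Colorbar.draw_all` no longer exists. To my knowledge it was deprecated in 3.6 and removed in 3.8, with `Figure.draw_without_rendering()` named as the replacement; treat the version numbers as an assumption. A sketch of a compatibility helper follows. `refresh_colorbar` is hypothetical, and the stub classes only stand in for Matplotlib objects so the snippet runs without Matplotlib installed.

```python
def refresh_colorbar(color_bar, figure):
    """Redraw a colorbar on both older and newer Matplotlib versions.

    Hypothetical helper: uses `Colorbar.draw_all()` when it still exists,
    otherwise falls back to `Figure.draw_without_rendering()`.
    """
    draw_all = getattr(color_bar, "draw_all", None)
    if callable(draw_all):
        draw_all()
        return "draw_all"
    figure.draw_without_rendering()
    return "draw_without_rendering"


# Stand-ins for Matplotlib objects so the sketch runs without Matplotlib.
class OldColorbar:
    def draw_all(self):
        pass


class NewColorbar:
    pass


class StubFigure:
    def draw_without_rendering(self):
        pass


old_path = refresh_colorbar(OldColorbar(), StubFigure())
new_path = refresh_colorbar(NewColorbar(), StubFigure())
print(old_path, new_path)
```

Until the library is updated, pinning Matplotlib to a release that still provides `draw_all` would be the other obvious workaround.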
Hello, in my graph data, each node has many features, and the type of the feature value is float. Similarly, the edges also have many features, and the type of the feature value is also float. Please tell me how I should create a graph in grape through Graph.from_pd. I read related tutorials, but they did not introduce the situation if nodes have multiple features or edges have multiple features. At the same time, I did not find any relevant description through the help(Graph.from_pd) method.
best wishes!
I am trying to follow this tutorial to try out some embedding methods. However, running
>>> from grape import get_available_models_for_node_embedding
>>>
>>> all_embedding_methods = get_available_models_for_node_embedding()
Traceback (most recent call last):
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 730, in get_model_metadata
"requires_edge_type_features": model_class.requires_edge_type_features(),
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 380, in requires_edge_type_features
raise NotImplementedError(
NotImplementedError: The `requires_edge_type_features` method must be implemented in the child classes of abstract model. It was not implemented in the class StubClass.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 763, in get_available_models_for_node_embedding
df = get_models_dataframe()
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 752, in get_models_dataframe
[
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 753, in <listcomp>
get_model_metadata(model_class)
File "/home/filco306/.envs/generalvenv/lib/python3.10/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 742, in get_model_metadata
raise NotImplementedError(
NotImplementedError: Some of the mandatory static methods were not implemented in model class StubClass. The previous exception was: The `requires_edge_type_features` method must be implemented in the child classes of abstract model. It was not implemented in the class StubClass.
Versions:
>>> grape.print_version()
{'GRAPE Version': '0.2.2', 'Python version': '3.10.6', 'Platform': 'Linux-5.4.0-150-generic-x86_64-with-glibc2.31', 'Threads number': 48, 'PyTorch version': '1.13.0', 'PyKEEN version': '1.9.0'}
embiggen==0.11.71
ensmallen==0.8.65
Is there a workaround for this? Thank you for a great package!
When using the edge prediction evaluation pipeline with some of the edge prediction models (e.g., KipfGCNEdgePrediction, though not the MLP, Decision Tree, or Random Forest), I got this error:
Traceback (most recent call last):
File [script I ran]
<module> KipfGCNEdgePrediction(),
File "/home/dylansteinecke/anaconda3/lib/python3.9/site-packages/embiggen/utils/abstract_models/model_stub.py", line 101, in __init__
super().__init__(**parent_class.smoke_test_parameters())
File "/home/dylansteinecke/anaconda3/lib/python3.9/site-packages/embiggen/utils/abstract_models/abstract_model.py", line 124, in smoke_test_parameters
raise NotImplementedError((
NotImplementedError: The `smoke_test_parameters` method must be implemented in the child classes of abstract model.
Is there a way I can address it or are these models not currently permitted in the edge prediction evaluation pipeline? Thank you.
In a fresh notebook, attempting to import grape yields an ImportError about a missing libgfortran-ed201abd.so.3.0.0.
>>> !pip install grape -U
>>> import grape
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/home/harry/kg-bioportal/data/merged/KG-Bioportal analysis.ipynb Cell 2' in <cell line: 1>()
----> 1 import grape
File ~/.local/lib/python3.8/site-packages/grape/__init__.py:9, in <module>
1 """GraPE main module.
2
3 For now, this is a simple wrapper of GraPE main two sub-modules that for
(...)
6 These packages are mimed here by the two sub-directories, ensmallen and embiggen.
7 """
----> 9 from embiggen import *
10 from ensmallen import Graph
13 def import_all(module_locals):
File ~/.local/lib/python3.8/site-packages/embiggen/__init__.py:2, in <module>
1 """Module with models for graph machine learning and visualization."""
----> 2 from embiggen.visualizations import GraphVisualizer
3 from embiggen.utils import (
4 EmbeddingResult,
5 get_models_dataframe,
(...)
9 get_available_models_for_node_embedding,
10 )
...
691 'spherical_kn',
692 ]
694 from scipy._lib._testutils import PytestTester
ImportError: libgfortran-ed201abd.so.3.0.0: cannot open shared object file: No such file or directory
I've seen that this may be related to libraries packaged with numpy, as seen in the following:
ContinuumIO/anaconda-issues#445
numpy/numpy#14348
This may be environment-specific, of course.
The image produced by GraphVisualizer has this label:
Degrees distribution of graph <graphname>
In English, one always uses the singular in phrases like this. Also, 'graph' seems superfluous.
Suggested label:
Degree distribution of <graphname>
Hi !
I am a beginner in GNNs and saw your repo. It seems that it could work for my problem, but I just need to be sure.
My goal is to try to predict the chemical composition of organisms across the tree of life. I have a CSV file that is similar to this example :
molecules | species | papers | mol_pathway | mol_sub_pathway | species_domain | species_family |
---|---|---|---|---|---|---|
H20 | Homo sapiens | 14 | Terpenoids | Monoterpenoids | Eukaryotes | Hominidae |
So each row holds a unique molecule-species pair (I'm thinking that would be the edge between two nodes of different types, hence the heterogeneous graph), the number of papers that have actually found the molecule in that species (edge weight?), and then some information about the molecule and the species.
In this database there are two things we know: how species are related (the classic phylogenetic tree) and how molecules are related (the group-subgroup structure seen above).
One fair assumption is that closely related species may share a similar set of molecules and molecules related in their synthesis may share a similar distribution across species. What I would like to have as a result is a matrix of s (species) by m (molecules) of probabilities that tell me if the edge between that molecule and that species could exist.
My questions are :
Sorry if those are very rookie questions, and thanks in advance for the reply! :)
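One way such a table could be turned into graph input, sketched with the standard library only. The assumptions are mine: each row becomes one species-molecule edge weighted by the paper count, the family and sub-pathway columns become extra nodes linked by unweighted membership edges, and the node type and edge type names (`has_molecule`, `member_of`) are hypothetical labels chosen for illustration.

```python
import csv
import io

# Same columns and example row as the table above, written as CSV.
RAW = (
    "molecules,species,papers,mol_pathway,mol_sub_pathway,species_domain,species_family\n"
    "H20,Homo sapiens,14,Terpenoids,Monoterpenoids,Eukaryotes,Hominidae\n"
)


def table_to_graph(csv_text):
    """Return (nodes, edges): node name -> node type, (src, dst, type) -> weight."""
    nodes = {}
    edges = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        species, molecule = row["species"], row["molecules"]
        family, sub_pathway = row["species_family"], row["mol_sub_pathway"]
        nodes.update({
            species: "species",
            molecule: "molecule",
            family: "species_family",
            sub_pathway: "mol_sub_pathway",
        })
        # Observed pair, weighted by the number of supporting papers.
        edges[(species, molecule, "has_molecule")] = float(row["papers"])
        # Hierarchy edges; dictionary keys deduplicate repeats across rows.
        edges[(species, family, "member_of")] = 1.0
        edges[(molecule, sub_pathway, "member_of")] = 1.0
    return nodes, edges


nodes, edges = table_to_graph(RAW)
for (source, destination, edge_type), weight in edges.items():
    print(source, destination, edge_type, weight, sep="\t")
```

The resulting node and edge lists can be written to TSV files and loaded as a typed, weighted graph; the remaining columns (domain, pathway) could be added as further hierarchy levels in the same way.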
Thanks for that repo. It seems that you have integrated several tools / libraries / approaches under Grape's hood. Do you intend to create a tutorial for a customer analytics recommendation?
Thanks in advance.
The current release on PyPI points to https://github.com/LucaCappelletti94/grape as the homepage, which apparently wasn't gracefully transferred, so it doesn't know to redirect here. If you update that metadata in setup.py (line 49 in fabc7ab) and release again, this should resolve itself.
As of grape 0.1.9, node embedding model names have changed, such that a call to embiggen's AbstractModel.get_task_data(model_name, task_name) with one of the frequently used model names like CBOW or SkipGram throws a ValueError.
I see from grape.get_available_models_for_node_embedding() that these now have more specific names like Node2Vec CBOW.
No problem with being specific, but we'd still like to be able to specify CBOW, SkipGram, or GloVe in config definitions without having to verify the exact model names embiggen is expecting first. Could we use the short names as aliases to a default model, so that CBOW will be understood as Node2Vec CBOW, etc.?
The naming convention also appears to confuse the alternative suggestions provided in the ValueError text, so we get suggestions like this:
ValueError: The provided model name `CBOW` is not available. Did you mean BoxE?
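The alias idea could look roughly like the sketch below. It is not embiggen's implementation: `resolve_model_name`, `AVAILABLE` and `ALIASES` are hypothetical, only "Node2Vec CBOW" and "BoxE" are names taken from this report, and the other full names are assumed to follow the same pattern. Suggestions prefer names that contain the query before falling back to fuzzy matching, which would avoid hints like "BoxE" for "CBOW".

```python
import difflib

# "Node2Vec CBOW" and "BoxE" come from the report; the other names are assumed.
AVAILABLE = ["Node2Vec CBOW", "Node2Vec SkipGram", "Node2Vec GloVe", "BoxE"]
ALIASES = {
    "CBOW": "Node2Vec CBOW",
    "SkipGram": "Node2Vec SkipGram",
    "GloVe": "Node2Vec GloVe",
}


def resolve_model_name(name):
    """Map a short or full model name to a full name, or raise with suggestions."""
    if name in AVAILABLE:
        return name
    if name in ALIASES:
        return ALIASES[name]
    # Prefer names that contain the query, then fall back to fuzzy matches.
    candidates = [model for model in AVAILABLE if name.lower() in model.lower()]
    if not candidates:
        candidates = difflib.get_close_matches(name, AVAILABLE, n=3)
    hint = f" Did you mean {', '.join(candidates)}?" if candidates else ""
    raise ValueError(f"The provided model name `{name}` is not available.{hint}")


print(resolve_model_name("CBOW"))
try:
    resolve_model_name("cbow")
except ValueError as error:
    message = str(error)
    print(message)
```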
Hello,
Thank you for your amazing work. I'm a PhD student working on embeddings of biomedical data, particularly in immunogenetics, and I'm currently comparing tools to embed data. I found your work very interesting. I ran into some issues when I tried to use external models from PyKEEN and Karate Club. I got this message:
ValueError: We have found an useless method in the class StubClass, implementing method HolE from library PyKEEN and task Node Embedding. It does not make sense to implement the `requires_positive_edge_weights` method when the `can_use_edge_weights` always returns False, as it is already handled in the root abstract model class.
Also, for the visualisation, when I ran
from grape import GraphVisualizer
visualizer = GraphVisualizer(kg.remove_disconnected_nodes())
visualizer.fit_and_plot_all(embedding)
I got this warning and no visualisation: FutureWarning: The parameter `square_distances` has not effect and will be removed in version 1.3.
Thank you in advance for your answer
Gaoussou
I am attempting to install grape using pip on Ubuntu 20.04.4 LTS with python 3.8.3.
Most of the build/install appears to work just fine until I hit the error below, shown with a little additional context. I have also tried to install ensmallen directly with pip install ensmallen and I get the same error. Any advice you have would be appreciated.
Requirement already satisfied: idna<3,>=2.5 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (3.0.4)
Collecting typing-extensions>=3.7.4.3
Using cached typing_extensions-4.3.0-py3-none-any.whl (25 kB)
ERROR: Could not find a version that satisfies the requirement support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape) (from versions: none)
ERROR: No matching distribution found for support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape)
Hello!
We're trying to put together some unit tests for computing pairwise Resnik similarity, but are encountering an error with a description that doesn't seem correct.
With the input here and a test like this:
def setUp(self) -> None:
    """Set up."""
    self.test_graph_path_nodes = "tests/resources/test_hpo_nodes.tsv"
    self.test_graph_path_edges = "tests/resources/test_hpo_edges.tsv"
    self.resnik_outpath = "tests/output/resnik_out"
    self.test_graph = Graph.from_csv(
        directed=True,
        node_path=self.test_graph_path_nodes,
        edge_path=self.test_graph_path_edges,
        nodes_column="id",
        node_list_node_types_column="category",
        sources_column="subject",
        destinations_column="object",
        edge_list_edge_types_column="predicate",
    )
    self.test_counts = {
        "HP:0000118": 23,
        "HP:0000001": 24,
        "HP:0001507": 1,
        "HP:0001574": 1,
        "HP:0001871": 1,
        "HP:0033127": 1,
        "HP:0025354": 1,
        "HP:0001608": 1,
        "HP:0001197": 1,
        "HP:0000119": 1,
        "HP:0001939": 1,
        "HP:0000707": 1,
        "HP:0025031": 1,
        "HP:0001626": 1,
        "HP:0000818": 1,
        "HP:0025142": 1,
        "HP:0002086": 1,
        "HP:0002715": 1,
        "HP:0000478": 1,
        "HP:0040064": 1,
        "HP:0002664": 1,
        "HP:0000598": 1,
        "HP:0000769": 1,
        "HP:0045027": 1,
        "HP:0000152": 1,
    }
...
def test_compute_pairwise_resnik(self) -> None:
    """Test pairwise Resnik computation."""
    compute_pairwise_resnik(
        dag=self.test_graph,
        counts=self.test_counts,
        path=self.resnik_outpath,
    )
    self.assertTrue(os.path.exists(self.resnik_outpath))
we consistently get an error like this:
ValueError: The provided two nodes 3 and 1 do not have a shared parent node. Perhaps, the provided DAG has multiple root nodes and these two nodes are in different root portions of the DAG. Another analogous explanation is that the two nodes may be in different connected components.
Any idea what's going on here? The two nodes definitely have shared ancestry and are in the same DAG.
Does this have something to do with the directionality of the graph? A full version of HPO subclass_of nodes/edges works without issue.
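The directionality question can be explored independently of GRAPE with a toy check. The sketch below is plain Python and is not the library's Resnik implementation. It shows that the same subclass_of edge list yields shared ancestors when edges are read as child → parent, and none when the traversal assumes parent → child. The three edges are a toy subset and both helper names are hypothetical.

```python
def build_parent_map(edges, subject_is_child=True):
    """Map each node to its parents under the chosen reading of (subject, object)."""
    parents_of = {}
    for subject, obj in edges:
        child, parent = (subject, obj) if subject_is_child else (obj, subject)
        parents_of.setdefault(child, set()).add(parent)
    return parents_of


def ancestors(node, parents_of):
    """All nodes reachable by repeatedly following the parent map."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy subclass_of edges written as (subject, object).
edges = [
    ("HP:0000118", "HP:0000001"),
    ("HP:0001507", "HP:0000118"),
    ("HP:0001574", "HP:0000118"),
]

results = {}
for subject_is_child in (True, False):
    parents_of = build_parent_map(edges, subject_is_child)
    shared = ancestors("HP:0001507", parents_of) & ancestors("HP:0001574", parents_of)
    results[subject_is_child] = sorted(shared)
    print(subject_is_child, results[subject_is_child])
```

If the loader or the similarity routine expects the opposite orientation from the one in the edge file, two sibling terms can look unrelated even though they share a root.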
It would be handy to know what versions of a graph are available, especially for KG-Hub projects.
For example, it'd be nice to be able to do this:
from grape.datasets.kghub import KGCOVID19
KGCOVID19.versions()
and then choose a KG build to load
Hey,
I'm trying to process a directed graph with about 5 million nodes and 100 million edges.
I've managed to load the graph from a CSV file, and I get a very nice Graph object (within 5 minutes).
I'm now trying to embed the graph with grape.embedders.Node2VecSkipGramEnsmallen, but it doesn't seem to finish; I've let it run for over 10 hours.
To make it faster, I enabled the Graph's vector_source, vector_cumulative_node_degree and vector_reciprocal_sqrt_degrees.
From your paper, it seems that the embedding process could be parallelized, but I can't find a way to do that.
I'd appreciate it if you could describe which parts of the embedding process are parallelized, and how I can make it run in parallel.
Thank you,
Bruria.
I'm emulating a lazy reviewer by only reading the README. I'd much rather bring my own data than use something built in. It would be great to have an example in the README that shows how the data should look (i.e., the file format) and what code you need to embed that network.
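A sketch of what such a README example might contain, under stated assumptions. The file layout (tab-separated node and edge lists with header rows) and the column names mirror the Graph.from_csv call quoted in the Resnik report on this page, and the embedder is the one used in other reports here. The toy rows and both helper names are hypothetical. `load_and_embed` is defined but not called, so the snippet runs without grape installed.

```python
import os
import tempfile

# Toy data: a node list with types and an edge list with edge types.
NODES_TSV = "id\tcategory\nA\tgene\nB\tgene\nC\tdisease\n"
EDGES_TSV = "subject\tpredicate\tobject\nA\tinteracts_with\tB\nB\tassociated_with\tC\n"


def write_example_files(directory):
    """Write the two TSV files and return their paths."""
    node_path = os.path.join(directory, "nodes.tsv")
    edge_path = os.path.join(directory, "edges.tsv")
    with open(node_path, "w", encoding="utf-8") as f:
        f.write(NODES_TSV)
    with open(edge_path, "w", encoding="utf-8") as f:
        f.write(EDGES_TSV)
    return node_path, edge_path


def load_and_embed(node_path, edge_path):
    """Load the files and embed the graph. Requires grape; not called below."""
    from grape import Graph
    from grape.embedders import Node2VecSkipGramEnsmallen

    graph = Graph.from_csv(
        directed=False,
        node_path=node_path,
        edge_path=edge_path,
        nodes_column="id",
        node_list_node_types_column="category",
        sources_column="subject",
        destinations_column="object",
        edge_list_edge_types_column="predicate",
    )
    return Node2VecSkipGramEnsmallen().fit_transform(graph)


with tempfile.TemporaryDirectory() as tmp:
    node_path, edge_path = write_example_files(tmp)
    with open(edge_path, encoding="utf-8") as f:
        edge_header = f.readline().rstrip("\n").split("\t")

print(edge_header)
```

With grape installed, calling `load_and_embed(node_path, edge_path)` while the files still exist would be the step that produces the embedding.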
Hi,
I am using the Node2VecSkipGramEnsmallen function and noticed that it does not take a num_walks parameter to represent the number of walks per node. The parameters that I see that are available are:
mappingproxy({'embedding_size': <Parameter "embedding_size: int = 100">,
'epochs': <Parameter "epochs: int = 30">,
'clipping_value': <Parameter "clipping_value: float = 6.0">,
'number_of_negative_samples': <Parameter "number_of_negative_samples: int = 10">,
'walk_length': <Parameter "walk_length: int = 128">,
'iterations': <Parameter "iterations: int = 10">,
'window_size': <Parameter "window_size: int = 5">,
'return_weight': <Parameter "return_weight: float = 0.25">,
'explore_weight': <Parameter "explore_weight: float = 4.0">,
'max_neighbours': <Parameter "max_neighbours: Optional[int] = 100">,
'learning_rate': <Parameter "learning_rate: float = 0.01">,
'learning_rate_decay': <Parameter "learning_rate_decay: float = 0.9">,
'central_nodes_embedding_path': <Parameter "central_nodes_embedding_path: Optional[str] = None">,
'contextual_nodes_embedding_path': <Parameter "contextual_nodes_embedding_path: Optional[str] = None">,
'normalize_by_degree': <Parameter "normalize_by_degree: bool = False">,
'stochastic_downsample_by_degree': <Parameter "stochastic_downsample_by_degree: Optional[bool] = False">,
'normalize_learning_rate_by_degree': <Parameter "normalize_learning_rate_by_degree: Optional[bool] = False">,
'use_scale_free_distribution': <Parameter "use_scale_free_distribution: Optional[bool] = True">,
'random_state': <Parameter "random_state: int = 42">,
'dtype': <Parameter "dtype: str = 'f32'">,
'ring_bell': <Parameter "ring_bell: bool = False">,
'enable_cache': <Parameter "enable_cache: bool = False">,
'verbose': <Parameter "verbose: bool = True">})
I was wondering if iterations is the equivalent to num_walks? Thanks!
Hi, I've been exploring this repository to work on graphs. The text report for graphs is very useful, but I find it difficult to find the graph metadata, apart from the weight. The documentation does not make this clear either: it basically provides a description of the function name and the data types, but not what these data types represent or the intuition behind them.
One example is when a Graph is created with the parameter additional_graph_kwargs. Where do I find which kwargs I can set?
There are other examples that are not only regarding the dataset, but graph processing in general. It is not clear what the intuition behind certain functions is.
Is there any plan on improving the documentation so that it is more likely that users use your framework?
I ran the default pipeline of using CBOW, SkipGram and GloVe to embed Cora.ipynb. The pipeline ran without error until it reached "compute_node_embedding", where the following error appeared:
IndexError Traceback (most recent call last)
in ()
6 first_order_rw_node_embedding, training_history = compute_node_embedding(
7 graph,
----> 8 node_embedding_method_name=node_embedding_method_name,
9 )
...
IndexError: pop from empty list
When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:
ValueError Traceback (most recent call last)
in
----> 1 embedding = model.fit_transform(kg)
~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result
~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
164 graph=graph,
165 return_dataframe=return_dataframe,
--> 166 verbose=verbose
167 )
168
~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
--> 114 edge_type_embeddings= edge_type_embedding,
115 )
116
~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in init(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
76 if np.isnan(numpy_embedding).any():
77 raise ValueError(
---> 78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."
ValueError: One of the provided node embedding computed with the TransE method contains NaN values.
I am attaching a jupyter notebook to reproduce the problem.
load_graph_and.ipynb.zip
The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz
Hi! One minor question/suggestion: I'm wondering whether it would be better not to load all modules as is done right now. For example, when I want to import from my local module utils.py, it fails because the name utils was already taken by ensmallen. One way I've tried to work around this is to pop the automatically loaded ensmallen utils module with sys.modules.pop('utils').
Lines 13 to 30 in 788e195
Would it be better to simply do the following instead of importing everything?
import embiggen
import ensmallen
__all__ = ["embiggen", "ensmallen"] # or add whatever top level modules that make sense
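The name clash described in this issue can be reproduced with the standard library alone. The sketch below is only an illustration: it registers an in-memory stand-in module under the name utils (the stand-in and its origin attribute are hypothetical, not part of ensmallen) and shows that a later import returns the cached entry until it is removed from sys.modules.

```python
import sys
import types

# Hypothetical stand-in for a third-party module that was registered
# under the generic top-level name "utils".
third_party_utils = types.ModuleType("utils")
third_party_utils.origin = "third-party"
sys.modules["utils"] = third_party_utils

# The import system checks sys.modules first, so this resolves to the
# cached stand-in rather than to any local utils.py on sys.path.
import utils

shadowed_origin = utils.origin

# Note that pop is a method call: sys.modules.pop("utils"), not pop["utils"].
# Once the entry is gone, the next `import utils` searches sys.path again.
removed = sys.modules.pop("utils", None)
still_cached = "utils" in sys.modules

print(shadowed_origin, still_cached)
```

Popping the entry is a workaround on the user's side; avoiding generic top-level module names in the package is what would remove the clash for everyone.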
Hello.
I am trying the Using CBOW to embed Cora python notebook (linked) and after replacing "CBOWEnsmallen" with "DeepWalkCBOWEnsmallen", the first order embedding runs successfully but fails at the graph visualization. I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_3453/3695275499.py in <module>
----> 1 GraphVisualizer(
2 graph,
3 node_embedding_method_name="CBOW - First order"
4 ).fit_and_plot_all(first_embedding)
~/anaconda3/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py in fit_and_plot_all(self, node_embedding, number_of_columns, show_letters, include_distribution_plots, **node_embedding_kwargs)
4236 distribution_plot_methods_to_call = []
4237
-> 4238 if not self._graph.has_constant_non_zero_node_degrees():
4239 node_scatter_plot_methods_to_call.append(
4240 self.plot_node_degrees,
AttributeError: The method 'has_constant_non_zero_node_degrees' does not exists, did you mean one of the following?
* 'has_constant_edge_weights'
* 'get_non_zero_subgraph_node_degrees'
* 'has_nodes'
* 'has_edges'
* 'has_selfloops'
* 'has_node_ontologies'
* 'has_node_oddities'
* 'get_node_degrees'
* 'has_node_name'
* 'has_node_types'
Looks like the issue has to do with embiggen dependencies in the graph visualization. Below are the package versions I am using:
embiggen==0.11.13
ensmallen==0.8.7
grape==0.1.9
I was also unable to run the second-order embeddings:
model = DeepWalkCBOWEnsmallen(
return_weight=2.0,
explore_weight=0.1
)
second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)
The above code gives the below error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_3453/3112314827.py in <module>
----> 1 model = DeepWalkCBOWEnsmallen(
2 return_weight=2.0,
3 explore_weight=0.1
4 )
5 second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)
TypeError: __init__() got an unexpected keyword argument 'return_weight'
Hello, I have another question on how to import my data in grape. I think it is more a clarification on my method to import my KG.
kg = Graph.from_csv(directed=True,
edge_path="sample_mabkg.tsv",
sources_column_number= 0,
edge_list_edge_types_column_number=1,edge_list_separator="|",
destinations_column_number=2, name="mAbKG", verbose=True, edge_list_header=True)
However, I saw that there is also a node_path parameter, with properties similar to those of edge_path, so I don't know whether I am reading my data with from_csv in the right way. Could you please give me some explanation, knowing that I have a KG with typed edges and nodes? Below is an example of my data.
Thank you for your answer
Gaoussou
node source|edge|node destination
_:B4dff5e7d17225b25b13ad12737e49779|imgt:isDecidedBy|imgt:EC
pubmed:2843774|dc:title|Selective killing of HIV-infected cells by recombinant human CD4-Pseudomonas exotoxin hybrid protein.
imgt:Product_8e9250cf-276a-3282-954f-3791316ac5a6|rdf:type|obo:NCIT_C51980
imgt:Segment_212_1|obo:BFO_0000050|imgt:Construct_212
imgt:IgG4-kappa_1001|rdfs:label|IgG4-kappa_1001
imgt:V-D-GENE|owl:sameAs|obo:SO_0000510
imgt:Segment_536_1|rdf:type|imgt:Segment
imgt:LRR13|rdf:type|imgt:RepeatLabel
imgt:StudyProduct_c2bc9b3a-a15e-376f-bda5-f87089b3f54b|imgt:application_type|Therapeutic
imgt:StudyProduct_54a14ca8-f916-338b-af18-d079beb598a4|imgt:development_technology| Dyax human antibody phage display library
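Before loading, it can help to confirm that the file really splits into the three columns that the from_csv call above addresses by index (0 for sources, 1 for edge types, 2 for destinations). The following is a minimal sketch using only the standard library on a few rows copied from the sample; it does not call grape and makes no claim about how grape parses the file.

```python
import csv
import io
from collections import Counter

# A few rows copied from the sample above, header included.
sample = """node source|edge|node destination
imgt:Segment_212_1|obo:BFO_0000050|imgt:Construct_212
imgt:V-D-GENE|owl:sameAs|obo:SO_0000510
imgt:Segment_536_1|rdf:type|imgt:Segment
imgt:LRR13|rdf:type|imgt:RepeatLabel
"""

reader = csv.reader(io.StringIO(sample), delimiter="|")
header = next(reader)
rows = list(reader)

# Every row should split into exactly source, edge type, destination.
malformed = [row for row in rows if len(row) != 3]

# Column 1 holds the edge types; columns 0 and 2 hold the node names.
edge_type_counts = Counter(row[1] for row in rows)
node_names = {row[0] for row in rows} | {row[2] for row in rows}

print(len(malformed), dict(edge_type_counts), len(node_names))
```

In the full sample, some third-column values are literals (titles, labels) rather than identifiers. With the call above they would presumably be treated as destination node names like any other value, which may or may not be what is wanted.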
I am following the tutorial to predict edges in an ensemble fashion. I computed three embeddings out of the graph
embedding_hyper_sketching = HyperSketching(number_of_hops=6).fit(g)
embedding_line_2 = SecondOrderLINEEnsmallen().fit_transform(g)
embedding_glee = GLEEEnsmallen().fit_transform(g)
I instantiate the object
model = GCNEdgePrediction(
epochs=3, # 10 for production
number_of_units_per_graph_convolution_layers = 32,
number_of_units_per_ffnn_body_layer = 32,
number_of_units_per_ffnn_head_layer = 16,
kernels=["Symmetric Normalized Laplacian", "Transposed Symmetric Normalized Laplacian"],
dropout_rate=0.7,
use_edge_metrics=True,
residual_convolutional_layers=False,
use_node_embedding=True,
edge_embedding_methods=["Concatenate", "Hadamard"],
node_feature_names = ["GLEE", "LINE 2nd"],
verbose=True
)
and when I compile it, it complains
model.compile(
graph=g,
# The support graph is the graph whose topology is to be used for all things
# including the convolutions, the metrics and the edge features.
support=g,
node_features=[embedding_glee, embedding_line_2],
edge_features=[embedding_hyper_sketching]
)
with this message
AttributeError: 'EmbeddingResult' object has no attribute 'shape'
So then, when I fit it, the object throws this error
model.fit(
graph=g,
support=g,
node_features=[embedding_glee, embedding_line_2],
edge_features=[embedding_hyper_sketching]
)
NotImplementedError: Currently, we solely support edge features that are subclasses of AbstractEdgeFeature. This is because most commonly, it is not possible to precompute edge features for all possible edges of a complete graph and thus, we need to compute them on the fly. To do so, we need a common interface that allows us to query the edge features on demand, lazily, hence avoiding unsustainable memory peaks.You have provided an egde feature of type , which is not a subclass of AbstractEdgeFeature.
matplotlib plots figures inline by default or if we write
%matplotlib inline
Some of the figures produced by GRAPE get put into scrollable "subwindows" in the Jupyter notebook, and one needs to scroll up and down to see the entire figure. GRAPE does not seem to respond to the inline magic command above either.
For instance, for a certain figure to really appear inline, I need to make it much smaller:
visualizer = GraphVisualizer(sli_graph, automatically_display_on_notebooks=False)
fig, ax, cap = visualizer.plot_node_degree_distribution()
fig.set_figheight(3)
fig.set_figwidth(3)
even though the notebook could comfortably show (5,5) or even (8,8)
Seems to download, but I'm getting an error seemingly when the graph is being loaded. Possibly either the nodes or edges file is not what GRAPE expects?
To reproduce:
from grape.datasets.kghub import KGIDG
g = KGIDG(version='20220722')
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/graph_retrieval.py:419, in RetrievedGraph.__call__(self)
413 try:
414 (
415 node_types_number,
416 nodes_number,
417 edge_types_number,
418 edges_number
--> 419 ) = edge_list_utils.build_optimal_lists_files(
420 # NOTE: the following parameters are supported by the parser, but
421 # so far we have not encountered a single use case where we actually used them.
422 # original_node_type_path,
423 # original_node_type_list_separator,
424 # original_node_types_column_number,
425 # original_node_types_column,
426 # original_numeric_node_type_ids,
427 # original_minimum_node_type_id,
428 # original_node_type_list_header,
429 # original_node_type_list_support_balanced_quotes,
430 # original_node_type_list_rows_to_skip,
431 # original_node_type_list_max_rows_number,
432 # original_node_type_list_comment_symbol,
433 # original_load_node_type_list_in_parallel,
434 # original_node_type_list_is_correct,
435 # node_types_number,
436 target_node_type_list_path=target_node_type_list_path,
437 target_node_type_list_separator='\t',
438 target_node_type_list_node_types_column_number=0,
439 original_node_path=node_path,
440 original_node_list_header=graph_arguments.get(
441 "node_list_header"
442 ),
443 original_node_list_support_balanced_quotes=graph_arguments.get(
444 "node_list_support_balanced_quotes"
445 ),
446 node_list_rows_to_skip=graph_arguments.get(
447 "node_list_rows_to_skip"
448 ),
449 node_list_is_correct=graph_arguments.get(
450 "node_list_is_correct"
451 ),
452 node_list_max_rows_number=graph_arguments.get(
453 "node_list_max_rows_number"
454 ),
455 node_list_comment_symbol=graph_arguments.get(
456 "node_list_comment_symbol"
457 ),
458 default_node_type=graph_arguments.get(
459 "default_node_type"
460 ),
461 original_nodes_column_number=graph_arguments.get(
462 "nodes_column_number"
463 ),
464 original_nodes_column=graph_arguments.get(
465 "nodes_column"
466 ),
467 original_node_types_separator=graph_arguments.get(
468 "node_types_separator"
469 ),
470 original_node_list_separator=graph_arguments.get(
471 "node_list_separator"
472 ),
473 original_node_list_node_types_column_number=graph_arguments.get(
474 "node_list_node_types_column_number"
475 ),
476 original_node_list_node_types_column=graph_arguments.get(
477 "node_list_node_types_column"
478 ),
479 nodes_number=graph_arguments.get("nodes_number"),
480 # original_minimum_node_id,
481 # original_numeric_node_ids,
482 # original_node_list_numeric_node_type_ids,
483 original_skip_node_types_if_unavailable=True,
484 # It make sense to load the node list in parallel only when
485 # you have to preprocess the node types, since otherwise the nodes number
486 # would be unknown.
487 original_load_node_list_in_parallel=target_node_type_list_path is not None,
488 maximum_node_id=graph_arguments.get(
489 "maximum_node_id"
490 ),
491 target_node_path=target_node_path,
492 target_node_list_separator='\t',
493 target_nodes_column=graph_arguments.get(
494 "nodes_column"
495 ),
496 target_nodes_column_number=0,
497 target_node_list_node_types_column_number=1,
498 target_node_types_separator="|",
499 # original_edge_type_path,
500 # original_edge_type_list_separator,
501 # original_edge_types_column_number,
502 # original_edge_types_column,
503 # original_numeric_edge_type_ids,
504 # original_minimum_edge_type_id,
505 # original_edge_type_list_header,
506 # edge_type_list_rows_to_skip,
507 # edge_type_list_max_rows_number,
508 # edge_type_list_comment_symbol,
509 # load_edge_type_list_in_parallel=True,
510 # edge_type_list_is_correct,
511 # edge_types_number,
512 target_edge_type_list_path=target_edge_type_list_path,
513 target_edge_type_list_separator='\t',
514 target_edge_type_list_edge_types_column_number=0,
515 original_edge_path=os.path.join(
516 self._cache_path, graph_arguments["edge_path"]),
517 original_edge_list_header=graph_arguments.get(
518 "edge_list_header"
519 ),
520 original_edge_list_support_balanced_quotes=graph_arguments.get(
521 "edge_list_support_balanced_quotes"
522 ),
523 original_edge_list_separator=graph_arguments.get(
524 "edge_list_separator"
525 ),
526 original_sources_column_number=graph_arguments.get(
527 "sources_column_number"
528 ),
529 original_sources_column=graph_arguments.get(
530 "sources_column"
531 ),
532 original_destinations_column_number=graph_arguments.get(
533 "destinations_column_number"
534 ),
535 original_destinations_column=graph_arguments.get(
536 "destinations_column"
537 ),
538 original_edge_list_edge_types_column_number=graph_arguments.get(
539 "edge_list_edge_types_column_number"
540 ),
541 original_edge_list_edge_types_column=graph_arguments.get(
542 "edge_list_edge_types_column"
543 ),
544 default_edge_type=graph_arguments.get(
545 "default_edge_type"
546 ),
547 original_weights_column_number=graph_arguments.get(
548 "weights_column_number"
549 ),
550 original_weights_column=graph_arguments.get(
551 "weights_column"
552 ),
553 default_weight=graph_arguments.get(
554 "default_weight"
555 ),
556 original_edge_list_numeric_node_ids=graph_arguments.get(
557 "edge_list_numeric_node_ids"
558 ),
559 skip_weights_if_unavailable=graph_arguments.get(
560 "skip_weights_if_unavailable"
561 ),
562 skip_edge_types_if_unavailable=graph_arguments.get(
563 "skip_edge_types_if_unavailable"
564 ),
565 edge_list_comment_symbol=graph_arguments.get(
566 "edge_list_comment_symbol"
567 ),
568 edge_list_max_rows_number=graph_arguments.get(
569 "edge_list_max_rows_number"
570 ),
571 edge_list_rows_to_skip=graph_arguments.get(
572 "edge_list_rows_to_skip"
573 ),
574 load_edge_list_in_parallel=True,
575 remove_chevrons=graph_arguments.get(
576 "remove_chevrons"
577 ),
578 remove_spaces=graph_arguments.get(
579 "remove_spaces"
580 ),
581 edges_number=graph_arguments.get("edges_number"),
582 target_edge_path=target_edge_path,
583 target_edge_list_separator='\t',
584 sort_temporary_directory=self._sort_tmp_dir,
585 directed=self._directed,
586 verbose=self._verbose > 0,
587 name=self._name,
588 )
589 except Exception as e:
ValueError: Cannot open the file at graphs/kghub/KGIDG/20220722/KG-IDG/merged-kg_nodes.tsv
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Input In [32], in <cell line: 2>()
1 from grape.datasets.kghub import KGIDG
----> 2 g = KGIDG(version='20220722')
File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/kghub.py:159, in KGIDG(directed, preprocess, bioregistry, load_nodes, load_node_types, load_edge_types, load_edge_weights, auto_enable_tradeoffs, sort_tmp_dir, verbose, ring_bell, cache, cache_path, cache_sys_var, version, **kwargs)
95 def KGIDG(
96 directed=False, preprocess="auto", bioregistry=False, load_nodes=True, load_node_types=True,
97 load_edge_types=True, load_edge_weights=True, auto_enable_tradeoffs=True,
98 sort_tmp_dir=None, verbose=2, ring_bell=False, cache=True, cache_path=None,
99 cache_sys_var="GRAPH_CACHE_DIR", version="current", **kwargs
100 ) -> Graph:
101 """Return KG-IDG graph
102
103 Parameters
(...)
157
158 """
--> 159 return RetrievedGraph(
160 "KGIDG", version, "kghub", directed, preprocess, bioregistry, load_nodes,
161 load_node_types, load_edge_types, load_edge_weights, auto_enable_tradeoffs, sort_tmp_dir,
162 verbose, ring_bell, cache, cache_path, cache_sys_var, kwargs
163 )()
File ~/anaconda3/lib/python3.9/site-packages/ensmallen/datasets/graph_retrieval.py:590, in RetrievedGraph.__call__(self)
414 (
415 node_types_number,
416 nodes_number,
(...)
587 name=self._name,
588 )
589 except Exception as e:
--> 590 raise RuntimeError(
591 f"Something went wrong while preprocessing the graph {self._name}, "
592 f"version {self._version}, "
593 f"retrieved from the {self._repository} repository. "
594 "This is NOT the loading step, but a preprocessing step "
595 "that loads remote data from third parties. "
596 "As such there may have been some changes in the remote data "
597 "that may have made them incompatible with the current "
598 "expected parametrization. "
599 "Do open up an issue in the Ensmallen's GitHub repository reporting also the complete"
600 "exception of this error to help us keep the automatic graph retrieval "
601 "in good shape. Thank you!"
602 ) from e
603 # Store the obtained metadata
604 self.store_preprocessed_metadata(
605 node_types_number,
606 nodes_number,
607 edge_types_number,
608 edges_number
609 )
RuntimeError: Something went wrong while preprocessing the graph KGIDG, version 20220722, retrieved from the kghub repository. This is NOT the loading step, but a preprocessing step that loads remote data from third parties. As such there may have been some changes in the remote data that may have made them incompatible with the current expected parametrization. Do open up an issue in the Ensmallen's GitHub repository reporting also the completeexception of this error to help us keep the automatic graph retrieval in good shape. Thank you!
Trying to install grape version 0.2.2 causes an error:
System: Windows 10, 64bit, Intel
Python Version 3.11
pip install --no-cache-dir grape==0.2.2
ERROR: Could not find a version that satisfies the requirement ensmallen>=0.8.64 (from grape) (from versions: 0.0.1, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.7.0.dev19, 0.8.0, 0.8.1, 0.8.2, 0.8.25, 0.8.26, 0.8.27, 0.8.28, 0.8.29, 0.8.36, 0.8.42, 0.8.43, 0.8.44)
ERROR: No matching distribution found for ensmallen>=0.8.64
Any input if I am doing something wrong here?
Thank you in advance for your help!
Hi,
I see references to node ontologies and predicted node ontologies in the code, and as a result of GraphVisualizer.fit_and_plot_all. Is there a way to set ontological categories or other node attributes after reading in a graph so they'd show up when plot_node_ontologies is called, or something similar? If these are predicted, how is that done?
Thanks! Hope it's not too obvious a question to answer.
Could support for saving classifier models please be added?
This came up while meeting with @LucaCappelletti94 recently, but it has become relevant again in the course of updating neat-ml to use grape classifiers.
Training classifiers isn't a major time commitment, but on our neat runs we've separated the process of training+testing vs. applying classifiers, so being unable to save or at least pickle the classifier object means we need to redo training for each model.
Hi, after installing grape version 0.2.2, I can use the Graph.from_pd method by following the example, but I cannot find a detailed description of this method in the API documentation. It would be really great if the documentation could be updated.
Thanks and best wishes!
Hi Team, did some experimentation and found this issue when trying different tensorflow based embedding approaches:
from grape.embedders import StructuredEmbeddingTensorFlow
embedding = StructuredEmbeddingTensorFlow().fit_transform(graph)
produces error:
ValueError: We have found an useless method in the class StubClass, implementing method Structured Embedding from library TensorFlow and task Node Embedding. It does not make sense to implement the `requires_positive_edge_weights` method when the `can_use_edge_weights` always returns False, as it is already handled in the root abstract model class.
Anything I am doing wrong?
Thanks!
Hello,
I am sorry to bother you with this; macOS was not my first choice to work on, but here I am. Technically, pip install grape does download and compile.
But whenever I try to import it, it starts with this warning:
UserWarning: Ensmallen is compiled for the Intel Haswell architecture (2013).On the current machine, the flags '['avx2', 'bmi2', 'popcnt']' are required but '{'avx2', 'bmi2'}' are not available. The library will use a slower but more compatible version (Intel Core2 2006).
And then the kernel always dies instead of continuing.
As you may know, the Apple M1 does not support AVX/AVX2 instructions, and I know this is beyond reach.
But do you have any clue on how I could still install grape locally?
Thanks
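One thing worth including in a report like this is which architecture the Python interpreter itself is running as, since the warning above concerns x86-64 instruction flags. The snippet below is a small diagnostic sketch using only the standard library; it does not touch grape or ensmallen. On Apple Silicon a native interpreter typically reports arm64, while an x86-64 interpreter running under translation typically reports x86_64.

```python
import platform
import sys

# Architecture the running interpreter was built for, not necessarily
# the architecture of the physical CPU.
machine = platform.machine()

report = {
    "python": sys.version.split()[0],
    "system": platform.system(),
    "machine": machine,
}

print(report)
```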
Hi Team,
thank you for the awesome library!
I am trying to import a very basic dataset for a POC and struggling with the from_csv method.
I want to construct a graph using my own data:
pl.DataFrame({
'source': ['A', 'A', 'A', 'A', 'A', 'F', 'F', 'F', 'A', 'F'],
'destination': ['B', 'C', 'D', 'E' , 'F', 'G', 'H', 'I', 'J', 'J'],
}
).write_csv('edges.csv')
pl.DataFrame({
'node_type': ['link', 'sat', 'sat', 'sat' , 'sat', 'link', 'sat', 'sat', 'sat', 'sat'],
}
).write_csv('node_types.csv')
pl.DataFrame({
'node_name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']}
).write_csv('node_names.csv')
Resulting in three csv files with a header each.
I am then constructing the graph using the following snippet:
graph = Graph.from_csv(
#Edges
edge_path="edges.csv",
sources_column="source",
destinations_column="destination",
edge_list_header=True,
#Nodes
node_path = "node_names.csv",
nodes_column = "node_name",
node_list_header = True,
#Node Types
node_type_path = "node_types.csv",
node_types_column = "node_type",
node_type_list_header = True,
skip_node_types_if_unavailable = False,
directed = False
)
When I run graph.get_node_type_names(), I get the error:
ValueError Traceback (most recent call last)
Cell In [6], line 1
----> 1 graph.get_unique_node_type_ids()
ValueError: The current graph instance does not have node types.
Anything I am doing wrong?
Thanks for the time!
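One possible explanation, offered as an assumption rather than a confirmed diagnosis: node_types.csv above holds one row per node, whereas a separate node type file may be meant to list the distinct node types, with the per-node assignment living in a column of the node list itself. Under that assumption, the two files can be merged into a single node list. The sketch below builds that file content with the standard library only, using the data from the issue.

```python
import csv
import io

# Data copied from the issue; the two lists are aligned by position.
node_names = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
node_types = ["link", "sat", "sat", "sat", "sat", "link", "sat", "sat", "sat", "sat"]

# Build one node list with a name column and a type column.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["node_name", "node_type"])
writer.writerows(zip(node_names, node_types))
nodes_csv = buffer.getvalue()

print(nodes_csv)
```

Saved as nodes.csv, this could then be passed as node_path with nodes_column="node_name" and node_list_node_types_column="node_type". That last parameter name appears in the ensmallen traceback quoted elsewhere on this page, but whether it resolves this particular error is untested here.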
While updating NEAT to use the most recent grape release, @justaddcoffee and @hrshdhgd and I took a look at what we're using to generate node embeddings based on pretrained word embeddings like BERT etc. : https://github.com/Knowledge-Graph-Hub/NEAT/blob/main/neat/graph_embedding/graph_embedding.py
We know we can run something like get_okapi_tfidf_weighted_textual_embedding() on a graph, but is there a more "on demand" way to run this in grape now for an arbitrary graph?
Hi, first of all, thanks for making such an amazing graph embedding resource!
I'm wondering whether you can add some descriptions in the README clarifying that this repo is a thin wrapper of the two core packages embiggen and ensmallen and add links accordingly. I was a bit confused for a few minutes trying to find the source code and only came to realize it wraps the two libraries after looking at __init__.py.
In another issue that may have something to do with our aging build server:
When we import grape in this environment (see info below), we get only Illegal instruction (core dumped).
cpuinfo output:
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
stepping : 2
microcode : 0x1f
cpu MHz : 1599.987
cache size : 12288 KB
physical id : 1
siblings : 12
core id : 10
cpu cores : 6
apicid : 53
initial apicid : 53
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6133.21
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
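The flags line above can be checked against the instruction-set flags that the Ensmallen warning quoted in another issue on this page names as required by its Haswell build ('avx2', 'bmi2', 'popcnt'). The sketch below is a plain set comparison on the text of this cpuinfo output. It does not reproduce how Ensmallen itself detects CPU features, and it does not claim that the missing flags are the cause of the crash.

```python
# Flags named as required in the Ensmallen warning quoted on this page.
REQUIRED_FLAGS = {"avx2", "bmi2", "popcnt"}

# The "flags" line from the cpuinfo output above, copied verbatim.
CPUINFO_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr
pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti ssbd ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
"""

available = set(CPUINFO_FLAGS.split())
missing = sorted(REQUIRED_FLAGS - available)

print(missing)  # ['avx2', 'bmi2']
```

On this Xeon X5675, popcnt is present but avx2 and bmi2 are not, which is consistent with an illegal-instruction crash if a binary compiled for a newer architecture is loaded.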
Opening an issue as per our discussion @LucaCappelletti94:
The edge prediction evaluation pipeline could benefit from a parameter such as node_features_names so that if a user passes in features without obvious names (e.g., the 2d array text embedding in node_features = [2d_array_text_embeddings, kg_embedding_function()]), the pipeline will name the different text embedding method(s) used. This would be similar to how it names the KG embedding functions based on the function names and saves them in a column.
In the latest version (0.1.24), the link in the README for "manually compile Ensmallen" (https://github.com/AnacletoLAB/ensmallen/blob/master/bindings/python/README.md) is broken
Hi.
I noticed that the performance metrics are not identical when I swap the source and destination nodes in predict_proba_bipartite_graph_from_edge_node_types. The graph used as input is undirected, so I would expect similar predictions for the same edge type regardless of which nodes are the source and which are the destination. Is this behavior intentional?
Below are the versions of the software I am currently running:
grape==0.1.17
embiggen==0.11.27
ensmallen==0.8.14
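One general reason an edge predictor can be sensitive to node order, even on an undirected graph, is the way the two node embeddings are combined into an edge feature. The sketch below is only an illustration in plain Python with made-up vectors: concatenation depends on the order of its arguments, whereas an element-wise (Hadamard) product does not. Both method names appear as edge embedding options in another issue on this page. Which one was used in this report is not stated, so whether this explains the observed difference is an assumption.

```python
def concatenate(source, destination):
    """Edge feature built by joining the two node vectors end to end."""
    return source + destination


def hadamard(source, destination):
    """Edge feature built by multiplying the two node vectors element-wise."""
    return [a * b for a, b in zip(source, destination)]


# Made-up node embeddings, for illustration only.
u = [1.0, 2.0]
v = [3.0, 4.0]

# Swapping the arguments changes the concatenated feature...
concat_is_symmetric = concatenate(u, v) == concatenate(v, u)
# ...but leaves the Hadamard feature unchanged.
hadamard_is_symmetric = hadamard(u, v) == hadamard(v, u)

print(concat_is_symmetric, hadamard_is_symmetric)  # False True
```

A model trained on an order-dependent edge feature can produce different scores for (u, v) and (v, u) unless both orientations are seen during training or the scores are averaged over the two orders.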
and multi-layer network?