aplbrain / grand Goto Github PK

Your favorite Python graph libraries, scalable and interoperable. Graph databases in memory, and familiar graph APIs for cloud databases.

License: Apache License 2.0

Python 100.00%

graph grand neptune serverless dynamodb networkx networkit sql network-analysis graph-theory

grand's Issues

Multi-edges between two nodes

Hello, I use the new version 0.5.1 of grand-graph:

from grand import Graph
from grandcypher import GrandCypher
from grand.backends._sqlbackend import SQLBackend

backend=SQLBackend(db_url="sqlite:///demo2.db")
G = Graph(backend=backend)

G.nx.add_node("spranger", type="Person")
G.nx.add_node("meier", type="Person")
G.nx.add_node("krause", type="Person")
G.nx.add_node("Berlin", type="City")
G.nx.add_node("Paris", type="City")
G.nx.add_node("London", type="City")

G.nx.add_edge("spranger", "Paris", type="LIVES_IN")
G.nx.add_edge("krause", "Berlin", type="LIVES_IN")
G.nx.add_edge("meier", "London", type="LIVES_IN")
G.nx.add_edge("spranger", "Berlin", type="BORN_IN")
G.nx.add_edge("krause", "Berlin", type="BORN_IN")
G.nx.add_edge("meier", "Berlin", type="BORN_IN")


result1 = GrandCypher(G.nx).run("""
MATCH (n)-[r]->(c)
WHERE
    n.type == "Person"
    and
    c.type == "City"
    
RETURN n, r, c    
""")

from lark.lexer import Token
n = result1[Token('CNAME', 'n')]
r = result1[Token('CNAME', 'r')]
c = result1[Token('CNAME', 'c')]

for i in range(len(n)):
    print(f"{n[i]} - {r[i].get('type')} -> {c[i]}")

backend.commit()
backend.close()

results in

spranger - BORN_IN -> Berlin
spranger - LIVES_IN -> Paris
meier - BORN_IN -> Berlin
meier - LIVES_IN -> London
krause - BORN_IN -> Berlin

The "krause - LIVES_IN -> Berlin" relation is not stored (nor is any second relation between the same two nodes).
This might be because "G.nx" doesn't support multigraphs.
In plain grandcypher I can query "Match (p:Person)". How do I do this in my Cypher query above?
Would it be a good idea to make the backend and the graph layer (e.g. networkx) completely transparent and just run Cypher queries, including for creating nodes and relations?

Kind Regards.
Steffen, the graphologist
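
The output above is consistent with how a simple directed graph stores edges: one attribute dictionary per (source, target) pair, so adding a second edge between the same two nodes updates the first instead of creating a new one. The following is a plain-Python sketch of that storage model, not grand's or networkx's actual implementation; the class names `SimpleDiGraphStore` and `MultiDiGraphStore` are hypothetical.

```python
from collections import defaultdict


class SimpleDiGraphStore:
    """Hypothetical store: at most one edge per (source, target) pair."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, u, v, **attrs):
        # Re-adding an existing (u, v) pair updates its attributes in place,
        # so a differing "type" value replaces the earlier one.
        self.edges.setdefault((u, v), {}).update(attrs)


class MultiDiGraphStore:
    """Hypothetical store: any number of parallel edges per pair."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, u, v, **attrs):
        # Each call appends a new parallel edge.
        self.edges[(u, v)].append(dict(attrs))


simple = SimpleDiGraphStore()
multi = MultiDiGraphStore()
for store in (simple, multi):
    store.add_edge("krause", "Berlin", type="LIVES_IN")
    store.add_edge("krause", "Berlin", type="BORN_IN")

simple_types = [simple.edges[("krause", "Berlin")]["type"]]
multi_types = [e["type"] for e in multi.edges[("krause", "Berlin")]]
print(simple_types)
print(multi_types)
```

In networkx terms, the first class behaves like `DiGraph` and the second like `MultiDiGraph`.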

Small wiki doc issue with SQLBackend

In the wiki this:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))

should be this:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend(db_url="sqlite:///my-file.db"))

since the SQLBackend constructor accepts only keyword arguments, not positional ones.
I am unable to edit the wiki, hence the issue.

best

Add CypherDialect

https://github.com/aplbrain/grand-cypher/

Add default dialect

To get drop-in compatibility with networkx, we could make nx the default dialect, such that graph.nx.{method} == graph.{method}. Duck-test against an nx DiGraph.
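
One way to get `graph.{method}` to resolve to `graph.nx.{method}` is attribute delegation through `__getattr__`. The sketch below only illustrates that idea; `DialectStub` and `GraphWithDefaultDialect` are hypothetical stand-ins, not grand classes.

```python
class DialectStub:
    """Stand-in for a networkx-style dialect object (hypothetical)."""

    def __init__(self):
        self._nodes = {}

    def add_node(self, name, **attrs):
        self._nodes.setdefault(name, {}).update(attrs)

    def number_of_nodes(self):
        return len(self._nodes)


class GraphWithDefaultDialect:
    """Hypothetical wrapper that forwards unknown attributes to `nx`."""

    def __init__(self):
        self.nx = DialectStub()

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes
        # defined on the wrapper itself (such as `nx`) take precedence.
        if name == "nx":
            # Guard against infinite recursion if `nx` was never set.
            raise AttributeError(name)
        return getattr(self.nx, name)


graph = GraphWithDefaultDialect()
graph.add_node("A", foo="bar")   # forwarded to graph.nx.add_node
graph.nx.add_node("B")
node_count = graph.number_of_nodes()
```

A duck test could then run the same calls against a real `networkx.DiGraph` and compare the results.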

Add pandas/dask edgelist backend

Use compound PK to get_edge by ID

grand/grand/backends/sqlbackend.py

Lines 278 to 282 in a2d350a

self._edge_table.select().where(
    and_(
        (self._edge_table.c[self._edge_source_key] == u),
        (self._edge_table.c[self._edge_target_key] == v),
    )
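
One reading of this issue is that the (source, target) pair should be declared as a compound primary key, so that looking up a single edge becomes an indexed key lookup. The sketch below shows that idea with the standard-library `sqlite3` module; the table and column names are assumptions and do not reflect grand's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the (source, target) pair is the primary key, so the
# database maintains an index that can serve exact-pair lookups.
conn.execute(
    """
    CREATE TABLE edges (
        source TEXT NOT NULL,
        target TEXT NOT NULL,
        attrs  TEXT,
        PRIMARY KEY (source, target)
    )
    """
)
conn.executemany(
    "INSERT INTO edges (source, target, attrs) VALUES (?, ?, ?)",
    [("a", "b", "{}"), ("a", "c", "{}"), ("b", "a", "{}")],
)


def get_edge(u, v):
    """Return the single row for edge (u, v), or None if it is absent."""
    return conn.execute(
        "SELECT source, target, attrs FROM edges "
        "WHERE source = ? AND target = ?",
        (u, v),
    ).fetchone()


edge = get_edge("a", "b")
missing = get_edge("c", "a")
```

Note that a compound key of this kind allows only one edge per ordered node pair, so it would not suit a multigraph.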

Migrate to pyproject.toml (PEP 518)

This repo is still on the "old" setup.py instead of the new pyproject.toml; it would be good for version pinning to upgrade!

Allow user to specify source/target column names in sqlbackend

Optionally support TTL cache on Backend function calls

Especially when interrogating larger networks, values might not change enough in between function calls to make re-calling out to a database worthwhile. In these cases, it would be advantageous to cache results either for a certain amount of time or for the lifespan of the Backend.
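
A minimal sketch of the idea, assuming a decorator-based design: `ttl_cache` is a hypothetical helper, not part of grand, and it requires hashable arguments. Passing `float("inf")` as the TTL keeps entries for the lifetime of the decorated function, which approximates caching for the lifespan of the Backend.

```python
import functools
import time


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache results per argument tuple for at most ttl_seconds."""

    def decorator(func):
        cache = {}  # args -> (expiry_time, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]
            value = func(*args)
            cache[args] = (now + ttl_seconds, value)
            return value

        wrapper.cache_clear = cache.clear
        return wrapper

    return decorator


calls = []
fake_now = [0.0]  # controllable clock so the example runs without sleeping


@ttl_cache(ttl_seconds=10, clock=lambda: fake_now[0])
def get_node_count(backend_name):
    calls.append(backend_name)  # stands in for an expensive database call
    return 42


first = get_node_count("db")   # miss: calls the function
second = get_node_count("db")  # hit: served from the cache
fake_now[0] = 11.0             # advance past the TTL
third = get_node_count("db")   # expired: calls the function again
```

A cache like this trades freshness for speed, so any write through the same Backend would probably need to call `cache_clear()`.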

Add rustworkx backend

https://github.com/Qiskit/rustworkx

Will need support for vanity IDs like we did in #6

Add DynamoDB batch insertion retries and error management

As per @movestill's comment on #1, it's no good to just assume DynamoDB batch writes will succeed!

#1 (comment)
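
DynamoDB's batch-write call can report some items as unprocessed, and the caller is expected to resend them. The sketch below shows a generic retry loop with exponential backoff and jitter. `write_batch` is a hypothetical callable that takes a list of items and returns those that were not written; it stands in for whatever wrapper the backend would use around the real client.

```python
import random
import time


def write_with_retries(items, write_batch, max_attempts=5, base_delay=0.05,
                       sleep=time.sleep):
    """Send items via write_batch, retrying whatever it reports as unprocessed.

    Returns the number of attempts used, or raises RuntimeError if items
    remain unprocessed after max_attempts.
    """
    pending = list(items)
    for attempt in range(max_attempts):
        pending = write_batch(pending)
        if not pending:
            return attempt + 1
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter before retrying the leftovers.
            sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
    raise RuntimeError(
        f"{len(pending)} items still unprocessed after {max_attempts} attempts"
    )


# Fake writer that accepts at most two items per call.
written = []


def flaky_writer(batch):
    written.extend(batch[:2])
    return batch[2:]


attempts_used = write_with_retries(
    ["n1", "n2", "n3", "n4", "n5"], flaky_writer, sleep=lambda _: None
)
```

Raising once the attempt budget is exhausted makes a partial write visible to the caller instead of letting it pass silently.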

wiki page entry on the labels convention

Improve SQLBackend degree performance

Can use something akin to:

out_degree

SELECT 
    source,
    COUNT(*) as source_count
FROM {G.backend._edge_table_name}
GROUP BY
    source

degree (undirected)

SELECT 
    vert, COUNT(*) as vert_count 
FROM 
    (
        SELECT source as vert FROM {G.backend._edge_table_name} 
        UNION ALL
        SELECT target as vert FROM {G.backend._edge_table_name}
    ) as endpoints
GROUP BY vert
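
The two queries can be tried against an in-memory SQLite table. This is a sketch with an assumed `edges(source, target)` table, not grand's schema. It counts rows per group with `COUNT(*)`, because counting the distinct values of the grouping column itself would always give 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

# Out-degree: one row per source, counting its outgoing edges.
out_degree = dict(conn.execute(
    "SELECT source, COUNT(*) FROM edges GROUP BY source"
))

# Undirected degree: stack both endpoint columns, then count appearances.
degree = dict(conn.execute(
    """
    SELECT vert, COUNT(*) FROM (
        SELECT source AS vert FROM edges
        UNION ALL
        SELECT target AS vert FROM edges
    ) AS endpoints
    GROUP BY vert
    """
))

print(out_degree)
print(degree)
```

Nodes without any edges do not appear in either result, so a caller would still need to fill in zeros from the node table.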

Implement EdgeAtlas interface for Graph.nx#edges

Auto-detect graph objects (and guess Backend) from URI

To enable this syntax:

from grand import Graph

g1 = Graph("https://example.com/net.graphml")

g2 = Graph("~/graph.sqlite3")

g3 = Graph("s3://example/my-open-cypher")
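
A rough sketch of how such detection might dispatch on the URI scheme and file suffix. The function `guess_backend` and the labels it returns are hypothetical; a real implementation would map them to actual Backend classes.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def guess_backend(uri):
    """Return a hypothetical backend label for a graph URI or path."""
    parsed = urlparse(uri)
    suffix = PurePosixPath(parsed.path).suffix.lower()

    if parsed.scheme in ("http", "https") and suffix == ".graphml":
        return "networkx-graphml"
    if parsed.scheme == "s3":
        return "opencypher-s3"
    if parsed.scheme in ("", "file") and suffix in (".sqlite", ".sqlite3", ".db"):
        return "sql-sqlite"
    raise ValueError(f"Cannot guess a backend for {uri!r}")


guesses = [
    guess_backend("https://example.com/net.graphml"),
    guess_backend("~/graph.sqlite3"),
    guess_backend("s3://example/my-open-cypher"),
]
print(guesses)
```

Raising on unknown inputs keeps the guess explicit, so a user can still pass a Backend directly when detection fails.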

NetworkXDialect does not work correctly with networkx.DiGraph

Hi @j6k4m8,

There is an issue with grand.Graph and grand.dialects.NetworkXDialect.

Since NetworkXDialect inherits from networkx.Graph, there are discrepancies between grand.dialects.NetworkXDialect and networkx.DiGraph, which are propagated back to grand.Graph. One of them is that networkx.Graph.edges returns an EdgeView while networkx.DiGraph.edges returns an OutEdgeView.

Below is one of the tests that replicates the issue:

def test_nx_edges(self):
    G = Graph(directed=True).nx
    H = nx.DiGraph()
    G.add_edge("1", "2")
    G.add_edge("2", "1")   # <<< this won't work with EdgeView for G
    G.add_edge("1", "3")
    H.add_edge("1", "2")
    H.add_edge("2", "1")   # <<< OutEdgeView returns this for H
    H.add_edge("1", "3")
    self.assertEqual(dict(G.edges), dict(H.edges))
    self.assertEqual(dict(G.edges()), dict(H.edges()))
    self.assertEqual(list(G.edges["1", "2"]), list(H.edges["1", "2"]))

The result is

    def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        # H = nx.Graph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")
        H.add_edge("1", "3")
>       self.assertEqual(dict(G.edges), dict(H.edges))
E       AssertionError: {('1', '2'): {}, ('1', '3'): {}} != {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       - {('1', '2'): {}, ('1', '3'): {}}
E       + {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       ?                              ++++++++++++++++

Add a performance tracker like codspeed or asv

...so that we can track performance and regression over code changes!

Add fast backend short-circuit for len(edges)

Right now it's very slow to get the count of edges for certain backends. Adding a query for this directly (rather than requiring enumeration of all edges) would make some functions (such as nx.density) MUCH faster.
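
A sketch of the short-circuit, assuming the backend may optionally expose a direct count. Here `SqlEdgeStore`, `EdgeView`, and the `edge_count` method name are all hypothetical; the view falls back to enumeration when the backend has no fast path.

```python
import sqlite3


class SqlEdgeStore:
    """Hypothetical backend with an optional fast edge count."""

    def __init__(self, edges):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
        self.conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)

    def all_edges(self):
        yield from self.conn.execute("SELECT source, target FROM edges")

    def edge_count(self):
        # Let the database count rows instead of shipping them to Python.
        return self.conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]


class EdgeView:
    """Hypothetical view whose len() prefers the backend's own count."""

    def __init__(self, backend):
        self._backend = backend

    def __iter__(self):
        return iter(self._backend.all_edges())

    def __len__(self):
        fast_count = getattr(self._backend, "edge_count", None)
        if fast_count is not None:
            return fast_count()
        return sum(1 for _ in self)  # fallback: enumerate every edge


view = EdgeView(SqlEdgeStore([("a", "b"), ("b", "c"), ("c", "a")]))
edge_total = len(view)
```

Any function that calls `len()` on the edge view would then benefit without changes on its side.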

SQLBackend should allow multiple writes to nx.add_node

Calling add_node twice in networkx updates the existing node; it should do the same in graph.Graph#nx.add_node.
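
The intended behaviour can be sketched as a read-merge-write against a table keyed by node ID. The schema below (a `nodes` table with JSON-encoded attributes) is an assumption for illustration only, not SQLBackend's actual layout.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, attrs TEXT NOT NULL)")


def add_node(node_id, **attrs):
    """Insert the node, or merge attrs into the existing row."""
    row = conn.execute(
        "SELECT attrs FROM nodes WHERE id = ?", (node_id,)
    ).fetchone()
    merged = json.loads(row[0]) if row else {}
    merged.update(attrs)  # later values win, as with networkx's add_node
    conn.execute(
        "INSERT OR REPLACE INTO nodes (id, attrs) VALUES (?, ?)",
        (node_id, json.dumps(merged)),
    )


add_node("A", foo="bar")
add_node("A", foo="baz", size=3)  # second call updates instead of failing

stored = json.loads(
    conn.execute("SELECT attrs FROM nodes WHERE id = 'A'").fetchone()[0]
)
row_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
```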

Question: edge attributes

My situation is that I use networkx and have access to a postgres db. I find networkx to be quite slow and thought of using some of the alternatives, especially networkit. The challenge I have is with attributes, i.e. networkit seems to allow only a single numerical 'weight' as an edge attribute. My graphs need pretty rich edge attributes, and networkx accommodates those. So:

Would grand allow me to use networkx syntax/features and edge attribute functionality with networkit, e.g. filter edges on a rich attribute set while keeping the algorithms running at higher speeds?
You mention grand interacting with DynamoDB. I'm not sure I understand that. Is grand using the db to store the graph structure, and if so, could it do that with a postgres db? Note: I had a look at this and it seems like this is what I had in mind but when I read your readme.

Tightly pinned versions in requirements

Hi, I was trying to use the project, but the tight pins on the requirements seem to be preventing me from installing it in my virtualenv.

https://github.com/aplbrain/grand/blob/master/setup.py#L18

Can we relax these constraints to >= so that it's clearer what the minimum requirements are?

Some notes:

I still need to support pandas 0.25
Support for lower numpy versions like 1.11 is generally preferred as conda will use that by default
SQLAlchemy 1.4 is what I'm using right now - but a bunch of folks are using 1.3 also. There are some backward incompatible changes between them
networkx - I try to use the latest. But am open to trying older versions if needed

no nodes persisted in sqlite SQLBackend

If I run
G = Graph(backend=SQLBackend(db_url="sqlite:///demo.db"))
G.nx.add_node("A", foo="bar")
demo.db contains the tables (grand_Nodes, grand_Edges), but they are empty; the node data is not stored.
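
One possible explanation, which the issue itself does not confirm, is that the write was never committed; the first issue in this list calls backend.commit() before backend.close(). The sketch below reproduces the same symptom (table present, no rows) with the standard-library `sqlite3` module alone, so it illustrates SQLite's transaction behaviour and not grand's internals.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")


def count_nodes():
    """Open a fresh connection and count the rows that were persisted."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    finally:
        conn.close()


# Session 1: insert without committing, then close.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY)")
conn.commit()
conn.execute("INSERT INTO nodes VALUES ('A')")
conn.close()  # the open transaction is rolled back
count_without_commit = count_nodes()

# Session 2: same insert, followed by an explicit commit.
conn = sqlite3.connect(db_path)
conn.execute("INSERT INTO nodes VALUES ('A')")
conn.commit()
conn.close()
count_with_commit = count_nodes()

print(count_without_commit, count_with_commit)
```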

Neo4j backend?

Is there a userbase and use-case for this?

SQLBackend: how to connect to an existing SQL DB file and read the graph into memory

As shown in the title.

The 'graph' attribute isn't present in `G.nx`

Via @MikeB2019x:

@j6k4m8 screenshare not required at the moment but I may take you up on that in the future. So trying to write out a graphml as suggested throws an error (stack trace below). If I compare a networkx graph's attributes and those of G.nx, you'll see: [...,'edges', 'get_edge_data','graph','graph_attr_dict_factory','has_edge','has_node'...] for the former compared to [...'edges','get_edge_data','graph_attr_dict_factory','has_edge','has_node',...] for the latter. That is, the 'graph' attribute isn't present in G.nx. I'm guessing that's intentional?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [35], in <module>
      1 graphml_file_name = 'graphtools.graphml'
----> 3 nx.write_graphml(G.nx, graphml_file_name)

File <class 'networkx.utils.decorators.argmap'> compilation 17:5, in argmap_write_graphml_lxml_13(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
      3 from contextlib import contextmanager
      4 from pathlib import Path
----> 5 import warnings
      7 import networkx as nx
      8 from networkx.utils import create_random_state, create_py_random_state

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:171, in write_graphml_lxml(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    160 except ImportError:
    161     return write_graphml_xml(
    162         G,
    163         path,
   (...)
    168         edge_id_from_attribute,
    169     )
--> 171 writer = GraphMLWriterLxml(
    172     path,
    173     graph=G,
    174     encoding=encoding,
    175     prettyprint=prettyprint,
    176     infer_numeric_types=infer_numeric_types,
    177     named_key_ids=named_key_ids,
    178     edge_id_from_attribute=edge_id_from_attribute,
    179 )
    180 writer.dump()

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:729, in GraphMLWriterLxml.__init__(self, path, graph, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    726 self.attribute_types = defaultdict(set)
    728 if graph is not None:
--> 729     self.add_graph_element(graph)

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:740, in GraphMLWriterLxml.add_graph_element(self, G)
    737 else:
    738     default_edge_type = "undirected"
--> 740 graphid = G.graph.pop("id", None)
    741 if graphid is None:
    742     graph_element = self._xml.element("graph", edgedefault=default_edge_type)

AttributeError: 'NetworkXDialect' object has no attribute 'graph'

Originally posted by @MikeB2019x in #35 (comment)

SQLBackend all_edges_as_iterable accesses _node_table instead of _edge_table resulting in KeyError: 'Source'

Just a quick find which results in:

  File ".local/lib/python3.10/site-packages/grand/backends/_sqlbackend.py", line 309, in all_edges_as_iterable
    self._node_table.c[self._edge_source_key],
  File ".local/lib/python3.10/site-packages/sqlalchemy/sql/base.py", line 1608, in __getitem__
    return self._index[key][1]
KeyError: 'Source'

This:
https://github.com/aplbrain/grand/blob/e71be46259fde37136cff0bdad3f998787f92cf3/grand/backends/_sqlbackend.py#L297C9-L317

should access self._edge_table instead of self._node_table.

best

