
grand's Introduction

Graph toolkit interoperability and scalability for Python

Installation

pip install grand-graph

Example use-cases

  • Write NetworkX commands to analyze true-serverless graph databases using DynamoDB*
  • Query a host graph in SQL for subgraph isomorphisms with DotMotif
  • Write iGraph code to construct a graph, and then play with it in Networkit
  • Attach node and edge attributes to Networkit or IGraph graphs

* Neptune is not true-serverless.

Why it's a big deal

Grand is a Rosetta Stone of graph technologies. A Grand graph has a "Backend," which handles the implementation details of talking to data on disk (or in the cloud), and a "Dialect," which is your preferred way of talking to a graph.

For example, here's how you make a graph that is persisted in DynamoDB (the "Backend") but that you can talk to as though it's a networkx.DiGraph (the "Dialect"):

import grand

G = grand.Graph(backend=grand.DynamoDBBackend())

G.nx.add_node("Jordan", type="Person")
G.nx.add_node("DotMotif", type="Project")

G.nx.add_edge("Jordan", "DotMotif", type="Created")

assert len(G.nx.edges()) == 1
assert len(G.nx.nodes()) == 2

It doesn't stop there. If you like the way IGraph handles anonymous node insertion (ugh) but you want to handle the graph using regular NetworkX syntax, use an IGraphDialect and then switch to a NetworkXDialect halfway through:

import grand

G = grand.Graph()

# Start in igraph:
G.igraph.add_vertices(5)

# A little bit of networkit:
G.networkit.addNode()

# And switch to networkx:
assert len(G.nx.nodes()) == 6

# And back to igraph!
assert len(G.igraph.vs) == 6

You should be able to use the "dialect" objects the same way you'd use a real graph from the constituent libraries. For example, here is a NetworkX algorithm running on NetworkX graphs alongside Grand graphs:

import networkx as nx

nx.algorithms.isomorphism.GraphMatcher(networkxGraph, grandGraph.nx)

Here is an example of using Networkit, a highly performant graph library, and attaching node/edge attributes, which are not supported by the library by default:

import grand
from grand.backends.networkit import NetworkitBackend

G = grand.Graph(backend=NetworkitBackend())

G.nx.add_node("Jordan", type="Person")
G.nx.add_node("Grand", type="Software")
G.nx.add_edge("Jordan", "Grand", weight=1)

print(G.nx.edges(data=True)) # contains attributes, even though graph is stored in networkit

Current Support

✅ = Fully Implemented, 🤔 = In Progress, 🔴 = Unsupported

Dialect           Description & Notes           Status
IGraphDialect     Python-IGraph interface       ✅
NetworkXDialect   NetworkX-like interface       ✅
NetworkitDialect  Networkit-like interface      ✅

Backend           Description & Notes           Status
DataFrameBackend  Stored in pandas-like tables  ✅
DynamoDBBackend   Edge/node tables in DynamoDB  ✅
GremlinBackend    For Gremlin datastores        ✅
IGraphBackend     An IGraph graph, in memory    ✅
NetworkitBackend  A Networkit graph, in memory  ✅
NetworkXBackend   A NetworkX graph, in memory   ✅
SQLBackend        Two SQL-queryable tables      ✅

You can read more about usage and learn about backends and dialects in the wiki.

Citing

If this tool is helpful to your research, please consider citing it with:

# https://doi.org/10.1038/s41598-021-91025-5
@article{Matelsky_Motifs_2021,
    title={{DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries}},
    volume={11},
    ISSN={2045-2322},
    url={http://dx.doi.org/10.1038/s41598-021-91025-5},
    DOI={10.1038/s41598-021-91025-5},
    number={1},
    journal={Scientific Reports},
    publisher={Springer Science and Business Media LLC},
    author={Matelsky, Jordan K. and Reilly, Elizabeth P. and Johnson, Erik C. and Stiso, Jennifer and Bassett, Danielle S. and Wester, Brock A. and Gray-Roncal, William},
    year={2021},
    month={Jun}
}

Made with 💙 at JHU APL

grand's People

Contributors

acthecoder23, davidmezzetti, j6k4m8, raphtor

grand's Issues

Small wiki doc issue with SQLBackend

In the wiki this:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))

should be this:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend(db_url="sqlite:///my-file.db"))

since the SQLBackend constructor only has kwargs, and no positional args.
I am unable to edit the wiki, hence the issue.

best

Question: edge attributes

My situation is that I use networkx and have access to a Postgres DB. I find networkx to be quite slow and thought of using one of the alternatives, especially networkit. The challenge I have is with attributes, i.e., networkit seems to allow only a single numerical 'weight' as an edge attribute. My graphs need pretty rich edge attributes, and networkx accommodates those. So:

  1. Would grand allow me to use networkx syntax/features and edge attribute functionality with networkit, e.g., filter edges on a rich attribute set but keep algorithms running at higher speeds?

  2. You mention grand interacting with DynamoDB. I'm not sure I understand that. Is grand using the DB to store the graph structure and, if so, could it do that with a Postgres DB? Note: I had a look at this and it seems like this is what I had in mind but when I read your readme.

Tightly pinned versions in requirements

Hi, I was trying to use the project, but the tight pins in the requirements seem to be preventing me from installing it in my virtualenv.

https://github.com/aplbrain/grand/blob/master/setup.py#L18

Can we relax these constraints to >= so that it's clearer what the minimum requirements are?

Some notes:

  • I still need to support pandas 0.25
  • Support for lower numpy versions like 1.11 is generally preferred as conda will use that by default
  • SQLAlchemy 1.4 is what I'm using right now - but a bunch of folks are using 1.3 also. There are some backward incompatible changes between them
  • network - I try to use the latest. But am open to trying older versions if needed

Optionally support TTL cache on Backend function calls

Especially when interrogating larger networks, values might not change enough in between function calls to make re-calling out to a database worthwhile. In these cases, it would be advantageous to cache results either for a certain amount of time or for the lifespan of the Backend.
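One way the request could be approached is a small time-to-live decorator applied to expensive Backend methods. The following is only a sketch under stated assumptions: `ttl_cache`, `SlowBackend`, and `get_node_count` are hypothetical names made up for illustration and are not part of grand's API. The clock is injectable so the expiry logic can be exercised without sleeping.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache results per argument tuple and reuse them for ttl_seconds."""

    def decorator(func):
        cache = {}  # maps (args, sorted kwargs) -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            hit = cache.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args, **kwargs)
            cache[key] = (now + ttl_seconds, value)
            return value

        wrapper.cache_clear = cache.clear
        return wrapper

    return decorator


class SlowBackend:
    """Hypothetical stand-in for a Backend that queries a remote database."""

    def __init__(self):
        self.calls = 0

    @ttl_cache(ttl_seconds=60)
    def get_node_count(self):
        self.calls += 1  # pretend this is an expensive round trip
        return 42


backend = SlowBackend()
backend.get_node_count()
backend.get_node_count()
print(backend.calls)  # the second call is served from the cache
```

Note that this sketch keys the cache on `self` as well, so it keeps every instance alive for as long as the decorated function exists; a cache stored on the instance would match the "lifespan of the Backend" option more closely.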

Multi-edges between two nodes

Hello, I use the new version 0.5.1 of grand-graph:

from grand import Graph
from grandcypher import GrandCypher
from grand.backends._sqlbackend import SQLBackend

backend=SQLBackend(db_url="sqlite:///demo2.db")
G = Graph(backend=backend)

G.nx.add_node("spranger", type="Person")
G.nx.add_node("meier", type="Person")
G.nx.add_node("krause", type="Person")
G.nx.add_node("Berlin", type="City")
G.nx.add_node("Paris", type="City")
G.nx.add_node("London", type="City")

G.nx.add_edge("spranger", "Paris", type="LIVES_IN")
G.nx.add_edge("krause", "Berlin", type="LIVES_IN")
G.nx.add_edge("meier", "London", type="LIVES_IN")
G.nx.add_edge("spranger", "Berlin", type="BORN_IN")
G.nx.add_edge("krause", "Berlin", type="BORN_IN")
G.nx.add_edge("meier", "Berlin", type="BORN_IN")


result1 = GrandCypher(G.nx).run("""
MATCH (n)-[r]->(c)
WHERE
    n.type == "Person"
    and
    c.type == "City"
    
RETURN n, r, c    
""")

from lark.lexer import Token
n = result1[Token('CNAME', 'n')]
r = result1[Token('CNAME', 'r')]
c = result1[Token('CNAME', 'c')]

for i in range(len(n)):
    print(f"{n[i]} - {r[i].get('type')} -> {c[i]}")

backend.commit()
backend.close()

results in

  • spranger - BORN_IN -> Berlin
  • spranger - LIVES_IN -> Paris
  • meier - BORN_IN -> Berlin
  • meier - LIVES_IN -> London
  • krause - BORN_IN -> Berlin

  1. The "krause - LIVES_IN -> Berlin" relation is not stored (nor is any second relation between the same two nodes).
    This may be because "G.nx" does not handle multigraphs.
  2. In plain grandcypher I can query "MATCH (p:Person)". How do I do this in my Cypher query above?
  3. Would it be a good idea to have the backend and the graph layer (e.g. networkx) completely transparent and just run Cypher queries, also for creating nodes and relations?

Kind Regards.
Steffen, the graphologist
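The missing LIVES_IN relation is consistent with how a simple (non-multi) directed graph stores edges: one attribute dictionary per (source, target) pair, updated in place when the same pair is added again. The sketch below models that with plain dictionaries. It is an illustration of the general behavior, not grand's storage code, and the helper names are hypothetical.

```python
# Simple-graph storage: one attribute dict per (source, target) pair.
simple_edges = {}


def add_edge_simple(u, v, **attrs):
    # Adding the same pair again updates the existing attributes.
    simple_edges.setdefault((u, v), {}).update(attrs)


# Multigraph storage: a list of attribute dicts per (source, target) pair.
multi_edges = {}


def add_edge_multi(u, v, **attrs):
    # Every call appends a new parallel edge.
    multi_edges.setdefault((u, v), []).append(attrs)


for add_edge in (add_edge_simple, add_edge_multi):
    add_edge("krause", "Berlin", type="LIVES_IN")
    add_edge("krause", "Berlin", type="BORN_IN")

print(simple_edges[("krause", "Berlin")])  # only the last "type" survives
print(multi_edges[("krause", "Berlin")])   # both relations are kept
```

Under this model, a workaround that stays within a simple graph would be to encode both relations on the single edge, for example with distinct attribute keys, at the cost of more awkward queries.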

NetworkXDialect does not work correctly with networkx.DiGraph

Hi @j6k4m8,

There is an issue with grand.Graph and grand.dialects.NetworkXDialect.

Since NetworkXDialect inherits from networkx.Graph, there are discrepancies between grand.dialects.NetworkXDialect and networkx.DiGraph, which propagate back to grand.Graph. One of them is that networkx.Graph.edges returns an EdgeView, while networkx.DiGraph.edges returns an OutEdgeView.

Below is a test that reproduces the issue:

def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")   # <<< this won't work with EdgeView for G
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")   # <<< OutEdgeView returns this for H
        H.add_edge("1", "3")
        self.assertEqual(dict(G.edges), dict(H.edges))
        self.assertEqual(dict(G.edges()), dict(H.edges()))
        self.assertEqual(list(G.edges["1", "2"]), list(H.edges["1", "2"]))

The result is

    def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        # H = nx.Graph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")
        H.add_edge("1", "3")
>       self.assertEqual(dict(G.edges), dict(H.edges))
E       AssertionError: {('1', '2'): {}, ('1', '3'): {}} != {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       - {('1', '2'): {}, ('1', '3'): {}}
E       + {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       ?                              ++++++++++++++++
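The difference in the assertion output can be reproduced with a simplified model of the two edge views. An undirected view skips an edge whose other endpoint has already been visited, while a directed view reports every (source, target) pair. The functions below are a stand-alone sketch of that idea over a plain adjacency dictionary; they are not networkx's or grand's actual classes.

```python
# Adjacency as stored for the three directed edges in the test.
adjacency = {"1": ["2", "3"], "2": ["1"], "3": []}


def iter_edges_undirected(adj):
    """Report each unordered pair once, as an undirected edge view would."""
    seen = set()
    for u, neighbors in adj.items():
        for v in neighbors:
            if v not in seen:
                yield (u, v)
        seen.add(u)


def iter_edges_directed(adj):
    """Report every (source, target) pair, as a directed edge view would."""
    for u, neighbors in adj.items():
        for v in neighbors:
            yield (u, v)


print(list(iter_edges_undirected(adjacency)))
print(list(iter_edges_directed(adjacency)))
```

The undirected iteration drops ("2", "1") because "1" was already visited, which matches the two-edge dictionary on the left-hand side of the failing assertion.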

Add fast backend short-circuit for len(edges)

Right now it's very slow to get the count of edges for certain backends. Adding a query for this directly (rather than requiring enumeration of all edges) would make some functions (such as nx.density) MUCH faster.
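To illustrate the gap between the two approaches, here is a minimal sketch using the standard-library sqlite3 module. The `edges` table and both function names are assumptions made for the example and do not reflect grand's real schema or Backend interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a")],
)


def edge_count_by_enumeration(connection):
    # Pulls every edge row into Python just to count them.
    rows = connection.execute("SELECT source, target FROM edges").fetchall()
    return len(rows)


def edge_count_by_query(connection):
    # Lets the database do the counting; a single integer comes back.
    return connection.execute("SELECT COUNT(*) FROM edges").fetchone()[0]


print(edge_count_by_enumeration(conn), edge_count_by_query(conn))
```

Both functions return the same number, but the second transfers a constant amount of data regardless of graph size, which is where a function like nx.density would benefit.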

Add default dialect

For drop-in compatibility with networkx, we could make nx the default dialect, such that graph.nx.{method} == graph.{method}. Duck-test against nx.DiGraph.
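One possible way to get this behavior is attribute delegation through `__getattr__`. The sketch below is an assumption about how it might look, not the project's implementation; `DefaultDialectGraph` and `FakeNxDialect` are hypothetical classes used only to keep the example self-contained.

```python
class FakeNxDialect:
    """Stand-in for a NetworkX-like dialect object."""

    def __init__(self):
        self._nodes = {}

    def add_node(self, name, **attrs):
        self._nodes[name] = attrs

    def nodes(self):
        return list(self._nodes)


class DefaultDialectGraph:
    """Hypothetical graph wrapper that falls back to the nx dialect."""

    def __init__(self, nx_dialect):
        self.nx = nx_dialect

    def __getattr__(self, name):
        # Called only when normal lookup fails, so real attributes win.
        if name == "nx":
            # Avoid infinite recursion if `nx` was never set.
            raise AttributeError(name)
        return getattr(self.nx, name)


graph = DefaultDialectGraph(FakeNxDialect())
graph.add_node("Jordan", type="Person")  # delegated to graph.nx.add_node
print(graph.nodes() == graph.nx.nodes())
```

A drawback of this design is that delegation is invisible to static tooling, and functions that check `isinstance(G, nx.Graph)` would still reject the wrapper.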

The 'graph' attribute isn't present in `G.nx`

Via @MikeB2019x:

@j6k4m8 screenshare not required at the moment but I may take you up on that in the future. So trying to write out a graphml as suggested throws an error (stack trace below). If I compare a networkx graph's attributes and those of G.nx, you'll see: [...,'edges', 'get_edge_data','graph','graph_attr_dict_factory','has_edge','has_node'...] for the former compared to [...'edges','get_edge_data','graph_attr_dict_factory','has_edge','has_node',...] for the latter. That is, the 'graph' attribute isn't present in G.nx. I'm guessing that's intentional?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [35], in <module>
      1 graphml_file_name = 'graphtools.graphml'
----> 3 nx.write_graphml(G.nx, graphml_file_name)

File <class 'networkx.utils.decorators.argmap'> compilation 17:5, in argmap_write_graphml_lxml_13(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
      3 from contextlib import contextmanager
      4 from pathlib import Path
----> 5 import warnings
      7 import networkx as nx
      8 from networkx.utils import create_random_state, create_py_random_state

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:171, in write_graphml_lxml(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    160 except ImportError:
    161     return write_graphml_xml(
    162         G,
    163         path,
   (...)
    168         edge_id_from_attribute,
    169     )
--> 171 writer = GraphMLWriterLxml(
    172     path,
    173     graph=G,
    174     encoding=encoding,
    175     prettyprint=prettyprint,
    176     infer_numeric_types=infer_numeric_types,
    177     named_key_ids=named_key_ids,
    178     edge_id_from_attribute=edge_id_from_attribute,
    179 )
    180 writer.dump()

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:729, in GraphMLWriterLxml.__init__(self, path, graph, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    726 self.attribute_types = defaultdict(set)
    728 if graph is not None:
--> 729     self.add_graph_element(graph)

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:740, in GraphMLWriterLxml.add_graph_element(self, G)
    737 else:
    738     default_edge_type = "undirected"
--> 740 graphid = G.graph.pop("id", None)
    741 if graphid is None:
    742     graph_element = self._xml.element("graph", edgedefault=default_edge_type)

AttributeError: 'NetworkXDialect' object has no attribute 'graph'

Originally posted by @MikeB2019x in #35 (comment)
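The traceback fails on `G.graph.pop("id", None)`, which only needs `graph` to be a dictionary-like attribute on the object. The sketch below reproduces the error and one possible fix with two hypothetical stand-in classes; how grand would persist graph-level attributes in a backend is left open.

```python
class DialectWithoutGraph:
    """Hypothetical dialect lacking the graph-level attribute dict."""


class DialectWithGraph:
    """Hypothetical dialect exposing an empty graph-level attribute dict."""

    def __init__(self):
        self.graph = {}


def pop_graph_id(G):
    # Mirrors the failing line from the traceback.
    return G.graph.pop("id", None)


try:
    pop_graph_id(DialectWithoutGraph())
except AttributeError as error:
    print("fails:", error)

print("works:", pop_graph_id(DialectWithGraph()))
```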

SQLBackend all_edges_as_iterable accesses _node_table instead of _edge_table resulting in KeyError: 'Source'

Just a quick find which results in:

  File ".local/lib/python3.10/site-packages/grand/backends/_sqlbackend.py", line 309, in all_edges_as_iterable
    self._node_table.c[self._edge_source_key],
  File ".local/lib/python3.10/site-packages/sqlalchemy/sql/base.py", line 1608, in __getitem__
    return self._index[key][1]
KeyError: 'Source'

This:
https://github.com/aplbrain/grand/blob/e71be46259fde37136cff0bdad3f998787f92cf3/grand/backends/_sqlbackend.py#L297C9-L317

should access self._edge_table instead of self._node_table.

best

Improve SQLBackend degree performance

Can use something akin to:

out_degree

SELECT
    source,
    COUNT(*) AS out_degree
FROM {G.backend._edge_table_name}
GROUP BY
    source

degree (undirected)

SELECT
    vert, COUNT(*) AS degree
FROM
    (
        SELECT source AS vert FROM {G.backend._edge_table_name}
        UNION ALL
        SELECT target AS vert FROM {G.backend._edge_table_name}
    ) AS endpoints
GROUP BY vert
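As a quick check of the idea, the sketch below runs both aggregations against an in-memory SQLite table. It counts rows per group with COUNT(*), since counting distinct values of the grouping column itself would always yield 1. The `edges` table and its lowercase column names are assumptions for the example, not grand's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

# Out-degree: number of edge rows per source node.
out_degree = dict(
    conn.execute("SELECT source, COUNT(*) FROM edges GROUP BY source")
)

# Undirected degree: stack sources and targets, then count appearances.
degree = dict(
    conn.execute(
        """
        SELECT vert, COUNT(*) FROM (
            SELECT source AS vert FROM edges
            UNION ALL
            SELECT target AS vert FROM edges
        ) AS endpoints
        GROUP BY vert
        """
    )
)

print(out_degree)
print(degree)
```

Nodes with no outgoing edges do not appear in the out-degree result at all, so a caller would need to treat a missing key as zero.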

no nodes persisted in sqlite SQLBackend

If I run

G = Graph(backend=SQLBackend(db_url="sqlite:///demo.db"))
G.nx.add_node("A", foo="bar")

demo.db contains the tables (grand_Nodes, grand_Edges), but they are empty: no node data is stored.
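One possible explanation, offered as an assumption rather than a diagnosis, is that the inserts sit in an open transaction that is never committed. The multi-edge report elsewhere on this page does call backend.commit() and backend.close() at the end of its script. The sketch below shows the same symptom with the standard-library sqlite3 module alone: the table exists, but a row inserted without a commit is gone after the connection closes. The `nodes` table is hypothetical.

```python
import os
import sqlite3
import tempfile


def count_rows_after(commit_before_close):
    """Insert one row, optionally commit, then reopen and count rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE nodes (id TEXT)")
        conn.commit()  # make sure the empty table itself is persisted

        conn.execute("INSERT INTO nodes VALUES ('A')")
        if commit_before_close:
            conn.commit()
        conn.close()  # closing without commit discards the pending insert

        conn = sqlite3.connect(path)
        (count,) = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
        conn.close()
        return count


print(count_rows_after(commit_before_close=False))
print(count_rows_after(commit_before_close=True))
```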
