Giter Club home page Giter Club logo

ehab-abdelhamid / grami Goto Github PK

View Code? Open in Web Editor NEW
104.0 8.0 39.0 6.51 MB

GraMi is a novel framework for frequent subgraph mining in a single large graph, GraMi outperforms existing techniques by 2 orders of magnitudes. GraMi supports finding frequent subgraphs as well as frequent patterns, Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Also, GraMi supports user-defined structural and semantic constraints over the results, as well as approximate results. For more details, check our paper: Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph. PVLDB, 7(7):517-528, 2014.

License: Other

Java 100.00%
frequent-patterns frequent-subgraph-mining graph-mining graph

grami's People

Contributors

ehab-abdelhamid avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

grami's Issues

interpreting output

Hi,
This is not really an issue... more of a question... I would really appreciate if you could briefly respond.

Could you please explain how to interpret the output.txt? I ran grami on test2.lg:
./grami -f test2.lg -s 3 -t 1 -p 0
what I got was this:

0.073
3
0:
v 0 1
v 1 2
v 2 3
e 0 1 5
e 1 2 6
1:
v 0 1
v 1 2
e 0 1 5
2:
v 0 2
v 1 3
e 0 1 6

Could you please briefly explain what each value means? For example, I am not sure what is "v 0 2". There was no such node in the graph...

Thank you.

Cannot find AMI on AWS

When I search for the public image "ami-90fc15f8", no matching images are found. Does the image still exist under a new ID?

The frequency of each subgraph in Output.txt

Hello, would u please tell me about where's the var counts each subgraph's frequency in ur code of 'GRAMI_UNDIRECTED_SUBGRAPH', I want to know the frequency of these subgraphs in the Output file. Thank u.

build script produces warnings and grami script cannot be run correctly

Hello, I had tried to run GraMi on Windows, and after manually compiling the files into their bin folders (because I couldn't get the build script to work), I couldn't get the grami script to work.
I figured they were Linux shell scripts. Thus, I have switched to a Linux environment (Ubuntu 18.04.3), and now the build script can be run but it produces the following warnings:
Note: Some input files use unchecked or unsafe operations. \nNote: Recompile with -Xlint:unchecked for details.
I am unsure about whether these messages should appear, but I believe the files did compile with no errors.
However, when I run the grami script, I receive the following result:
Dataset: ./grami: line 92: [: -eq: unary operator expected [UNDIRECTED] ./grami: line 99: [: -eq: unary operator expected Frequent Subgraph Mining No constraints on labels Minimum frequency: ./grami: line 117: [: -eq: unary operator expected ./grami: line 122: [: -eq: unary operator expected ./grami: line 127: [: -eq: unary operator expected ./grami: line 132: [: -eq: unary operator expected Starting GraMi ... Error: Could not find or load main class Dijkstra.main GraMi Finished.
I am using Java 1.8.0_222.
Could you please help me solve this issue?

Multiple labels per node/edge

The proposed methods also support directed graphs and multiple labels per node/edge.
Does this software support this feature? If not so, is it possible to patch it via software?
I would be appreciated if you had an idea on how to change the software..

Thank you.

Readme.txt examples don't work 'Error: Could not find or load main class Dijkstra.main'

I followed the readme.txt and tried to run the examples. None of them worked due to 'Error: Could not find or load main class Dijkstra.main'. I can see that the main files are there but I suppose they are not being referenced correctly by grami executable?

Genghis:GraMi-master user$ ./grami -f citeseer.lg -s 160 -t 1 -p 1 -d 200
Dataset: citeseer.lg
[DIRECTED]
Frequenct Pattern Mining, with minimum distance: 200
No constraints on labels
Minimum frequency: 160
Starting GraMi ...
Error: Could not find or load main class Dijkstra.main
GraMi Finished.

How to perform frequent pattern mining

In frequent pattern mining, i think need to input a subgraph pattern(P), a single large graph(G) and a frequency threshold to find the number of occurrences of the subgraph P in G.
Question 2 in the GraMi paper.

So in the GraMi framework, how to input subgraph pattern(P)?

OutOfMemoryError

Hi,
I run code with my own graph which consist of 30K nodes(all labels are 1), and about 600K edges(most labels are different each other).

However, when I run the code about 5~6 hours, it occurs OutOfMemoryError exception.
This happens even I set the initial size of heap memory in JVM as 40GB .

Do you know why exception occurs and solutions?

GraMi seems too slow for my dataset.

I am trying to do frequent subgraph mining on my dataset, there are 24577 nodes, and almost 90k edges.
I set the frequency threshold to 2000, but it takes a long time.
Q:

  1. is there any argument to set the time-up value?
  2. how can I know the progress of mining?

Approximate mining

This is a question not an issue. I run GraMi to mine frequent subgraphs in undirected graph and I always run out of memory, so I thought of using approximate mining. However, I do not understand quite well the command in the example:
Find frequent subgraphs in the "mico" undirected graph, with minimum frequency = 14000 and approximation: ./grami -f mico.lg -s 9340 -t 0 -p 0 -approxA 0.0002 -approxB=0
if minimum frequency = 14000 then why -s 9340 and not -s 14000?
approxA 0.0002 isn't that very low? a good approximation will have alpha close to 1, something like 0.7 or 0.8, right?

About input file doubts

I have two questions, I’m glad to have your replay to answer my doubts:
1, for graph input file, each vertex has its label, is this label has already been DFS encoded or it’s just a manual define label to set different vertex has different label, not encoding for it?
2, for graph input file, do you have any parse scripts to parse related graph?

questions about the node label and the edge label?

Dear author, I have several questions about the node label and the edge label.

  1. Node labels typically represent some attribute or classification of a node, GraMi will not only look for substructures that are not only structurally identical but also of the same label, is that right?
  2. Does the edge label do the same thing?

Run on other dataset

Great work! But I wonder how can I run GraMi on my own dataset instead of the ones given in Dataset folder? There're some details that I do not understand:

  1. What is the meaning of the first line, i.e. # t 1 in mico.lg

  2. Are the edges directed or not? If there is an undirected edge between 0 and 1, and there exists e 0 1 0 in .lg file, do I need to add e 1 0 0 as well?

Thank you for your prompt rely.

times up message

when I run GramMi with ./grami -f mico.lg -s 9340 -t 0 -p 1 -approxA 0.002 -approxB=0, sometimes it displays time's up! .

What is it doing? I see code like this:

timer.schedule(new StopTask(), 5*1000);

What's it doing?

Cycle sub pattern exist in the result

Hi, I use this tool to figure out the sub graphs in a Directed Acyclic Graph(DAG), but there are cycle existing in the result. I don't know why this happened. such as below:

-------------------------Looking into frequency of: v 0 15
v 1 4
e 0 1 1
e 1 0 1

Freq: 10

installation on linux fails b/c bin directory does not exist

Hello Ehab, I forked your project, downloaded it, installed, but the install fails b/c the bin folder doesn't exist and I had to create it, I wasn't sure if this is by design or a minor issue. I am interested in your work. Specifically, I'm interested in not only learning what the most frequent subgraphs are, but print out the embeddings: print the specific node #s & edges that countribute to the most FS.

what does the paper mean by 'a single large graph'?

hi, Ehab.You do a excellent job in this work.I appreciate that.Now i just have a confusion when i don't know what kind of subgraphs will be mined.Here i give some examples.
ie.1

t 1

v 0 1
v 1 2
v 2 1
v 3 2
e 0 1 1
e 2 3 1

(This got a right result)
0.122
1
0:
v 0 1
v 1 2
e 0 1 1

ie.2

t 1

v 0 1
v 1 1
v 2 2
e 0 2 1
e 1 2 1
(And this got nothing)

I expect the result looks like
v 0 1
v 1 2
e 0 1 1.

Could you help me?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.