jshun / ligra Goto Github PK
View Code? Open in Web Editor NEWLigra: A Lightweight Graph Processing Framework for Shared Memory
Home Page: http://jshun.github.io/ligra
License: MIT License
Ligra: A Lightweight Graph Processing Framework for Shared Memory
Home Page: http://jshun.github.io/ligra
License: MIT License
Not sure if this is a known issue - {Bfs, bfs-bitvector, and Components} get a segfault when compiled with -O0 -g
. These applications don't crash when compiled with -O2
. The applications seem to fail consistently for different input graphs.
System Info - debian stretch, g++-6.3, using CILK for parallelization
Update
Made the following observations on further testing:
no_output
edgeMap applications are not affectedBacktrace of segfault
Thread 1 "BFS-Bitvector" received signal SIGSEGV, Segmentation fault.
0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
299 _M_head(this) = std::forward<_Head>(_M_head(__in));
(gdb) bt
#0 0x0000555555577b3c in std::_Tuple_impl<0ul, unsigned int, pbbs::empty>::operator=(std::_Tuple_impl<0ul, unsigned int, pbbs::empty>&&) (
this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f0d>)
at /usr/include/c++/6/tuple:299
#1 0x000055555557274e in std::tuple<unsigned int, pbbs::empty>::operator=(std::tuple<unsigned int, pbbs::empty>&&) (this=0x0,
__in=<unknown type in /home/vignesh/ligra/apps/BFS-Bitvector, CU 0x0, DIE 0x8f23>)
at /usr/include/c++/6/tuple:1178
#2 0x000055555556eadd in auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}::operator()(unsigned int, unsigned int, bool) const (__closure=0x7fffffffdbb0, ngh=1, offset=0, m=true) at edgeMap_utils.h:47
#3 0x000055555557a28b in decode_uncompressed::decodeOutNghSparse<asymmetricVertex, BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}>(asymmetricVertex, long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}&) (v=0x7fffffffd8e0, i=0, o=0, f=..., g=...)
at vertex.h:66
#4 0x0000555555574a2f in asymmetricVertex::decodeOutNghSparse<BFS_F, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>)::{lambda(unsigned int, unsigned int, bool)#1}>(long, unsigned int, BFS_F&, auto get_emsparse_gen<pbbs::empty, 0>(std::tuple<unsigned int, pbbs::empty>*)::{lambda(unsigned int, unsigned int, bool)#1}&) (this=0x7fffffffd8e0, i=0, o=0, f=..., g=...) at vertex.h:337
#5 0x0000555555563297 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> () at ligra.h:125
#6 0x00007ffff7bc43b1 in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#7 0x00007ffff7bc4d4b in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#8 0x00007ffff7bc50fe in ?? () from /usr/lib/x86_64-linux-gnu/libcilkrts.so.5
#9 0x0000555555571485 in edgeMapSparse<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=...,
frontierVertices=0x55563c61b720, indices=..., degrees=0x55555579c6e0, m=1, f=..., fl=0) at ligra.h:122
#10 0x000055555556ca46 in edgeMapData<pbbs::empty, asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=...,
threshold=23149703, fl=@0x7fffffffdf0c: 0) at ligra.h:266
#11 0x000055555556a506 in edgeMap<asymmetricVertex, vertexSubsetDatapbbs::empty, BFS_F> (GA=..., vs=..., f=..., threshold=-1,
fl=@0x7fffffffdf0c: 0) at ligra.h:276
#12 0x0000555555566d63 in Compute (GA=..., P=...) at BFS-Bitvector.C:70
#13 0x00005555555588f4 in main (argc=2, argv=0x7fffffffe448) at ligra.h:510
Is it possible to have a header describing what each file does and which problem does it solve? For example, I only know that Bellman Ford solves the Single source shortest path, because I've implemented Bellman-Ford.
Docstrings for functions would be nice. Right now it's near impossible to tell which function does what and what its return type is. E.g. I don't know where or how to find the result of the Bellman Ford computation. For all I know it might not even compute the Shortest Path correctly
I am trying to convert the Twitter graph to the adjacency list format but it always seems to end in a segfault.
I tried debugging with gdb and it seems to be happening in theutils/graphIO.hpp file when the words are being converted into edges.
Any help would be greatly appreciated.
Lack of header file/ implementation file is bad enough (compilation takes a lot longer, not to mention that C++ wasn't designed that way), but lack of inline qualifiers in the header files, means that there would be redefinitions of all functions as soon as there is need to link two files that include ligra.h.
When I run ./BFS and ./PageRank on a generated graph with 100M nodes, cpu utilization is only around 100%. Is it normal? Shouldnt ligra uses all cpus to compute?
Below is the edgeMap function in https://github.com/jshun/ligra/blob/master/apps/Components.C
inline bool update(uintE s, uintE d){ //Update function writes min ID
uintE origID = IDs[d];
if(IDs[s] < origID) {
IDs[d] = min(origID,IDs[s]);
if(origID == prevIDs[d]) return 1;
}
return 0;
}
Why do we need if(origID == prevIDs[d])
? Is this check always true?
I downloaded the snap file soc-LiveJournal1
Converted it using SNAPtoAdj in /utils
$ ./SNAPtoAdj soc-LiveJournal1.txt lig_socLJ1
Then I tried to Run BFS on this data in the /apps folder
This was the output
$ ./BFS lig_socLJ1
Segmentation fault(core dumped)
However, when I used the -s flag
$ ./BFS -s lig_socLJ1
Running time : 0.571
Running time : 0.518
Running time : 0.511
As you can see, I did not use the SNAPtoAdj utility with the -s flag.
I was wondering, where I could be wrong. Please, help me out.
It looks like you are discarding the residuals of all vertices every round, as opposed to only resetting residuals that are greater than the activation threshold. As a consequence, a vertex that takes more than one round to accumulate enough residual error to become active again, stays inactive, and the algorithm converges in less iterations than it should.
Maybe something like:
struct PR_Vertex_Reset {
double* nghSum;
PR_Vertex_Reset(double* _nghSum) :
nghSum(_nghSum) {}
inline bool operator () (uintE i) {
if (nghSum[i] > epsilon2) nghSum[i] = 0.0; //reset only if we crossed threshold
return 1;
}
};
Please let me know if I am misunderstanding something here.
This is on master branch. Steps to replicate:
BellmanFord.C
to use no_dense
instead of dense_forward
, but don't change the initial threshold of G.m/20
here:Line 79 in 7755d95
Why:
Step 1 passes a threshold of 0 to edgeMap
. This is not a problem when in dense mode, but when in sparse mode, it leads to activating a code path that skips computation of out degrees here:
Line 248 in 7755d95
Later edgeMap tries to deference a nullptr by attempting to run a prefix-sum on the offsets
array, which is pointing to degrees
(now null because of above):
Line 119 in 7755d95
In the example above, threshold
was set to 0 by the application. However, edgeMap can also end up setting it 0 here:
Line 238 in 7755d95
(A threshold of 0 again is not a problem when in dense mode, but skips degrees computation when in sparse mode)
A potential fix on edgeMap
(instead of on the application) would be to do this, which is what I do locally (UPDATE: changed it so that we don't accidentally overwrite user-specified threshold):
if (threshold == -1) {
if (numEdges > 20) {
threshold = numEdges / 20; // default threshold
} else {
// num edges is too small, use 1 as a threshold instead
threshold = 1;
}
}
Julian, let me know if you're ok with this and I'll send it over.
Does BFS cannot traverse all vertices?
while(!Frontier.isEmpty()){ //loop until frontier is empty
vertexSubset output = edgeMap(GA, Frontier, BFS_F(Parents));
Frontier.del();
Frontier = output; //set new frontier
}
[clong@shamrock apps]$ make -j
g++ -std=c++14 -O3 -DBYTERLE -o encoder encoder.C
g++ -std=c++14 -O3 -DBYTERLE -o decoder decoder.C
g++ -std=c++14 -O3 -DBYTERLE -o BFS BFS.C
g++ -std=c++14 -O3 -DBYTERLE -o BC BC.C
g++ -std=c++14 -O3 -DBYTERLE -o BellmanFord BellmanFord.C
g++ -std=c++14 -O3 -DBYTERLE -o Components Components.C
g++: error: unrecognized command line option ‘-std=c++14’
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [encoder] Error 1
make: *** Waiting for unfinished jobs....
make: *** [decoder] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BFS] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BC] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [BellmanFord] Error 1
g++: error: unrecognized command line option ‘-std=c++14’
make: *** [Components] Error 1
Adjacency graph provides the largest amount of words with minimum information. The word that would best describe the format would be adjacency list. According to #21 the internal storage type is Compressed Sparse Row, but you'd never guess that from the readMe.
could I also get all the members of returned set S? I want to know what are the index of nodes that belong to the local community of the certain input node. Many thanks!
I compiled SNAPtoAdj with Cilk enabled, and the tool works well for all my graphs, except for friendster.
It ends up with segmentation fault, if I tell the tool to convert the friendster graph into a symmetric one.
The command is like
./SNAPtoAdj -s friendster.txt friendster_J
wghSNAPtoAdj has similar issue as well.
I tried different compilers, icpc and g++ 5.4.0.
When compiling, it prints out the following commands respectively.
icpc -std=c++14 -O3 -DCILKP -o SNAPtoAdj SNAPtoAdj.C
or
g++ -std=c++14 -fcilkplus -lcilkrts -O2 -DCILK -o SNAPtoAdj SNAPtoAdj.C
However, the executable files of both failed on friendster.
Could someone give me any idea what is going wrong?
Thanks in advance!
I have a bunch of weighted/unweighted graphs in .adj
format and I'm trying to run BFS, CC and Pagerank on them. Since, BFS doesn't directly run weighted graphs, I used the ./encode
to compress the graphs and then run BFS on it. I'm getting segmentation faults when I run it with compressed weighted graphs, but they run fine when I run with compressed/uncompressed unweighted graphs.
To reproduce:
Download CTU-46.adj from here: CTU-46 and download CTu-uw-46.adj from here: CTu-uw-46
Compress the matrix:
./encoder -w CTU-46.adj out46-w.compressed
./encoder CTu-uw-46.adj out46.compressed
Weighted
./BFS -c out46-w.compressed
size = 1396572
n = 41474 m = 85951 totalSpace = 369766
reading file...
inTotalSpace = 363182
creating graph...
[1] 15186 segmentation fault ./BFS -c out46-w.compressed
Unweighted
./BFS -c out46-uw.compressed
size = 994259
n = 41474 m = 85951 totalSpace = 164219
reading file...
inTotalSpace = 166416
creating graph...
Running time : 0.00235
Running time : 0.00234
Running time : 0.00164
I think your work great and I am very interested in it.
I get the code and when I try to run some algorithm in a graph larger than the naive example, I find it use much more memory than the origin graph expression,which is simplely each line. I think it is strange.
As I get into the source code, a struct named Word catch my eye. through the /proc/pid/status I find that the form of this struct take more than half memory of the program. And I find some comment in ligra/IO.h#220 (sorry I dont know how to give the link) that cancel the destruct function.
I am confused about these:
Why it takes so much memory to compute in ligra even in symmetric graph?
why the destruct func commented ?
Thanks.
Going through the code, I wondered in the ligra.h
file why the lines 475 - 478 call the function char* getOptionValue(string option)
and not bool getOption(string option)
and why this works. After a while I undestood this works because the last argument is always the file, thus after every option is always another argument. I don't think this is wanted, because the function bool getOption(string option)
exists, too.
How do I run the Ligra benchmarking suit on a bare-metal multicore RISC-V emulator? If an OS like Linux is required, can I run it on a non-MMU Linux environment on my multicore RISC-V emulator?
I wanted to know how is the Graph is actually represented in memory for some algorithms written here in this framework. I wanted to run some algorithms for my school project where the input to the algorithm is a graph.
I can see the input data provided here but I can't conclude the representation of the Graph(Although, it's written 'Adjacency Graph' on the top, but is it 'Adjacency List' or what?). It seems like just vertices of the graph is provided in the input file, but how do I form edges in the Graph?
In the docs here I see that:
One of the main goals of Ligra is to abstract away implementation specific details about how the graph is represented (Is the graph compressed/uncompressed? Is it directed/undirected?)
But still, I wanted to pass my own graph input which is represented as Adjacency List
? Can you please tell how is the Graph formed from the data given below? How can I form Adjacency List out of it?
AdjacencyGraph
128
708
0
8
8
17
24
28
28
33
37
.
.
.
These are the first few lines of the input file written here.
Or can you please tell how to pass our own graph (represented as Adjacency List) as the input to the algorithms written here.
Example: How do I pass the below graph in the input?
1
/ \
2 3
/ \ / \
4 5,6 7
\ | | /
8
If I have to represent the above graph in memory then I would write the file something like this:
(source node, destination node)
1, 2
1, 3
2, 4
2, 5
3, 6
3, 7
4, 8
5, 8
6, 8
7, 8
So how can I pass this representation(or modify this representation) to work with the algorithms written here.
The header of the generated files perfectly well describes that a specific graph is weighted or that it is unweighted. I suggest that using the encoder with the -w
flag is offloading the responsibility onto the user.
However, if the user determines that a graph is weighted, by looking at the first line of the input graph, then so can the program. Otherwise, a user will only have to guess, and the program (given it exits abnormally) can again try interpreting the graph as weighted.
How can I use the closeness and betweenness centrality measures?
Use the markdown format to make README look pretty.
How to set the threads numbers of Ligra?
I tried to run a couple of them (BFS, CC, PR etc.) on large graphs and I see only one core at 100% with htop. Are those implementations in apps/
multicore? If so, how do I make it run on multiple cores?
Being able to convert to and from Graphviz (or any other format, for that matter) would allow for this freamework to be somewhat more useable than just for benchmarking. Right now, for use with anything but the included generator I need to write my own parser, which defeats the purpose of this being a framework.
I noticed "long rounds = P.getOptionLongValue("-rounds",3);" in ligra.h file. And I am finding the same result is calculated four times, which seems taking more time.
I reduced rounds to 1, does that mean actually setting rounds to 1 will take even longer time in terms of parallel computing? I do not understand why we need to compute the same thing for four times.
Also , if I deleted the for loops, and replace it with below code
graph<compressedSymmetricVertex> G =
readCompressedGraph<compressedSymmetricVertex>(iFile,symmetric); //symmetric graph
Compute(G,P);
G.del();
And I got the result:
size = 13233762
n = 7164775534703567937 m = 3677866877581936505 totalSpace = 3904913592795935540
reading file...
Segmentation fault: 11
I am thinking maybe it is because my input data file is too large, and some memory problem occur. But I am still confused, what is the best strategy here. I am running local graph clustering to get PPR vector for each node of a graph of 343299 nodes. So I also want to make sure that I choose an approach that saves time. Thanks!
If I have an undirected graph, but I did not put -s in the command line, will the result be affected? Based on my result and ligra.h file, I think no, but just want to double check. Many thanks!
Hi, how do I view the computed betweenness values for each vertex in BC?
I am using the wikipedia-ru graph, which is in the SNAP format to create the Ligra supported Adjacency graph format.
But, it is segfaulting for me. I am using g++ compiler with -fopenmp flag.
Please, let me know where I am going wrong.
Thanks.
-Nibedita
Is there any conversion code from graphs in Matrix Market format to the graph format used by libra? Thanks
How to produce :
$./utils/randLocalGraph -s -m 100 -n 101 graph_file.txt
Segmentation fault (core dumped)
Does randLocalGraph not permit to generate graph where n > m ? (it works for n <= m)
( I am trying to generate graph with many components, any help on that will be great)
Thanks !
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.