
powergraph's Issues

graphlab hits LLVM bug on older 64-bit Mac

What I suspect is the root cause: http://llvm.org/bugs/show_bug.cgi?id=14947

I have a full email written out that I sent to [email protected], but it bounced. Is the list broken, or is the email address on the download page bad?

I've transcribed the email below.

Hey folks,

See the information you requested below.  (P.S. I sit at UW Allen Center in CSE 434 if you happen to be around and want to debug in person.)

Thanks,
Dan

1.
OS X, latest Xcode installed; I use MacPorts as the package manager. ./configure worked fine. Then I went into the graph_analytics subdirectory and ran make -j3 and got the following error. (The output below is actually from a plain make, since I wanted to turn off parallelism for the clearest possible output.)
[ 87%] Building CXX object toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/approximate_diameter.cpp.o
fatal error: error in backend: Cannot select: intrinsic
      %llvm.x86.sse42.crc32.64.64
make[2]: *** [toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/approximate_diameter.cpp.o] Error 1
make[1]: *** [toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/all] Error 2
make: *** [all] Error 2

2. Mac OS X 10.8.4

3. % uname -a
Darwin dhm.dyn.cs.washington.edu 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

4.
MacBook Pro 13-inch, Mid 2010. Processor: 2.66 GHz Intel Core 2 Duo / Memory 8 GB 1067 MHz DDR3 / Graphics NVIDIA GeForce 320M 256 MB

5. % g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/src/configure --disable-checking --enable-werror -prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

% clang++ -v
Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix


6. graphlabapi/config.log and graphlabapi/configure.deps are attached.

7. Git Log:

commit c9b5637729f7ee7d8dc9c3ac2bb68c127b43704e
Merge: 6d8ac8a bba8592
Author: Yucheng Low <[email protected]>
Date:   Thu Aug 8 09:59:24 2013 -0700

    Merge pull request #11 from ylow/master

    Removed old research-experimental-legacy-stuff from apps and ext-apis.

Missing libraries in linking instructions?

I have installed GraphLab and built the hello world application against it, working outside GraphLab's source tree (as described in the second point of the "writing your own apps" section in the README).

I found that more libraries than the ones indicated in the README were required to compile successfully. I wonder if this is normal and, if it is, whether it would be worth updating these instructions.

The line in the README is:

g++ -pthread -lz -ltcmalloc -levent -levent_pthreads -ljson -lboost_filesystem -lboost_program_options -lboost_system -lboost_iostreams -lboost_date_time -lhdfs -lgraphlab hello_world.cpp

In my case I also had to use the following flags:

-lmpi -lmpi++ -lzookeeper_mt -lzookeeper_st -lboost_context
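For reference, the full link line that worked for me looks something like this (note that with GNU ld the source file should come before the libraries):

g++ -pthread hello_world.cpp -lz -ltcmalloc -levent -levent_pthreads -ljson -lboost_filesystem -lboost_program_options -lboost_system -lboost_iostreams -lboost_date_time -lhdfs -lgraphlab -lmpi -lmpi++ -lzookeeper_mt -lzookeeper_st -lboost_context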

My machine is running Ubuntu 12.04, Open MPI 1.4.3, configured with --no_jvm, and the rest of the libraries' versions are the ones bundled with cmake.

Thanks in advance!

Warp system documentation

In the basic tutorial it is mentioned:

All of GraphLab lives in the graphlab namespace. You may use

using namespace graphlab;

if you wish, but we recommend against it.

Then, when one progresses to the warp tutorial, the code is written with graphlab in the namespace. It is a bit confusing for a tutorial. If you wish I can change it and submit a pull request.
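For reference, the basic tutorial's recommendation amounts to qualifying names explicitly, along these lines (a minimal sketch; my_program is a placeholder vertex program):

// recommended: fully qualified names instead of "using namespace graphlab;"
graphlab::distributed_control dc;
graphlab::omni_engine<my_program> engine(dc, graph, "sync");

whereas the warp tutorial's snippets silently assume the using-directive.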

Jesús.

ec2 tutorial demo is broken

Yue Zhao reported the following error:

./gl-ec2 -i ~/.ssh/amazonec2.pem -z us-east-1a -s 1 launch launchtest
Setting up security groups...
Checking for running cluster...
GraphLab AMI for Standard Instances: ami-108d1c79
Launching instances...
Launched slaves, regid = r-2ff0b74b
Launched master, regid = r-57f0b733
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /Users/bickson/.ssh/amazonec2.pem to master...
Copy hostfile to master...
Searching for existing cluster launchtest...
Found 1 master(s), 1 slaves, 0 ZooKeeper nodes
lost connection
Traceback (most recent call last):
File "./gl_ec2.py", line 700, in
main()
File "./gl_ec2.py", line 508, in main
setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, cluster_name, True)
File "./gl_ec2.py", line 369, in setup_cluster
scp(master, opts, "machines", '~/machines')
File "./gl_ec2.py", line 482, in scp
(opts.identity_file, local_file, host, dest_file), shell=True)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'scp -q -o StrictHostKeyChecking=no -i /Users/bickson/.ssh/amazonec2.pem 'machines' '[email protected]:~/machines'' returned non-zero exit status 1

Excessive calls to graph finalize()

Graph finalization needs a "fast path" to avoid performing the full finalization communication path even when there are no vertices/edges added. This will improve performance a lot for large distributed graphs when there are repeated calls to engine start(), or save() functions.
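A minimal sketch of the intended fast path (hypothetical flag name; the real logic lives in the distributed graph's finalize()):

void finalize() {
  // Fast path: if no vertices/edges were added since the last finalize,
  // skip the full finalization communication entirely.
  if (!modified_since_last_finalize) return;
  // ... existing full finalization / exchange path ...
  modified_since_last_finalize = false;
}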

PageRank fails on a graph with 2.2 billion edges

With the following assertion:

INFO: mpi_tools.hpp(init:63): MPI Support was not compiled.
INFO: dc.cpp(init:573): Cluster of 1 instances created.
INFO: distributed_graph.hpp(set_ingress_method:3200): Automatically determine ingress method: grid
Loading graph in format: tsv
INFO: distributed_graph.hpp(load_from_posixfs:2189): Loading graph from file: ./num.tsv.dir
INFO: distributed_graph.hpp(load_from_stream:3236): 7742363 Lines read
INFO: distributed_graph.hpp(load_from_stream:3236): 16209633 Lines read
[... progress lines elided ...]
INFO: distributed_graph.hpp(load_from_stream:3236): 2157576857 Lines read
INFO: distributed_graph.hpp(load_from_stream:3236): 2166497521 Lines read
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
INFO: memory_info.cpp(log_usage:90): Memory Info: Post Flush
Heap: 51676 MB
Allocated: 51153.8 MB
INFO: distributed_ingress_base.hpp(finalize:232): Graph Finalize: constructing local graph
tcmalloc: large alloc 17349451776 bytes == 0xca5502000 @ 0x6826df 0x50e200
tcmalloc: large alloc 17349451776 bytes == 0x10b072c000 @ 0x6826df 0x50e200
tcmalloc: large alloc 1610620928 bytes == 0x151bb2a000 @ 0x6826df 0x51c90d
tcmalloc: large alloc 3221233664 bytes == 0x157bcac000 @ 0x6826df 0x51c90d
INFO: memory_info.cpp(log_usage:90): Memory Info: Finished populating local graph.
Heap: 90912 MB
Allocated: 36696.1 MB
INFO: distributed_ingress_base.hpp(finalize:277): Graph Finalize: finalizing local graph.
tcmalloc: large alloc 17349451776 bytes == 0x163bfae000 @ 0x6826df 0x50b268
tcmalloc: large alloc 8674729984 bytes == 0x1a47194000 @ 0x6826df 0x51fc4d
tcmalloc: large alloc 17349451776 bytes == 0x1c4ca8a000 @ 0x6826df 0x50b268
tcmalloc: large alloc 17349451776 bytes == 0x2057c74000 @ 0x6826df 0x50b268
tcmalloc: large alloc 34698895360 bytes == 0x2462e5a000 @ 0x6826df 0x5063ea
tcmalloc: large alloc 34698895360 bytes == 0x2c79224000 @ 0x6826df 0x5063ea
ERROR: dynamic_csr_storage.hpp(wrap:81): Check failed: valueptr_vec[i]<value_vec.size() [18446744071562067971 < 2168680615]

Extensions toolkit fails to compile

Its CMakeLists.txt says that it requires C++11, but when the project is configured to be compiled with C++11, this toolkit fails to compile.

GraphLab AMI is not available in the us-east region

The GraphLab AMIs are not available outside the us-west region. Since some users do not have access to that region or prefer using other regions, it would be beneficial to copy the GraphLab AMI to other regions, e.g. us-east.

Thanks,
-Khaled

Problem with SVD singular vectors accuracy

Bug report by Carlos Del Cacho:

Hello Danny,

Thank you for your fast response.

After correcting the issue I told you about with read.csv in R, I get the same singular values. However, the accuracy of the first singular vector is still not very good; I can't match R in terms of classifier performance after training. Perhaps I am still invoking it incorrectly.

Here are the first 10 components of the first V vector of the matrix I sent you. My understanding is that the values should match, disregarding signs.

GraphLab:

v[1:10,1]
[1] -0.29007777 -0.02705319 -0.03089252 -0.02594989 -0.03459277
[6] -0.03003008 -0.02826690 -0.03112386 -0.02758899 -0.02632362

R Lanczos:

irlba(m,nv=100)$v[1:10,1]
[1] 0.03409986 0.03272688 0.03304065 0.02822084 0.02978243 0.02544790
[7] 0.02730222 0.02760405 0.03618342 0.03369470

R exact SVD

svd(m)$v[1:10,1]
[1] -0.03409986 -0.03272688 -0.03304065 -0.02822084 -0.02978243
[6] -0.02544790 -0.02730222 -0.02760405 -0.03618342 -0.03369470

SLEPc
0.0340999
0.0327269
0.0330406
0.0282208
0.0297824
0.0254479
0.0273022
0.0276041
0.0361834
0.0336947

ncpus does not work if NO_OPENMP is true

On Linux, configure GraphLab with --no_openmp, then run GraphLab PageRank like this:

mpiexec -n 5 -hostfile ~/machines /home/zork/Dev-pla/graphlab-trace/release/toolkits/graph_analytics/pagerank \
--graph /home/zork/graph/twitter_rv  \
--format snap --iterations=10 --ncpus=10 

It will only use 1 CPU.

Typo in fiber documentation.

In the "Fiber Compatible Remote Requests" section, an example is given as::

... /* elsewhere /
graphlab::remote_future future = fiber_remote_request(1, /
call to machine 1 */
add_one,
1);

This should be request_future, not remote_future
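That is, the corrected call should read:

graphlab::request_future future = fiber_remote_request(1, /* call to machine 1 */
                                                       add_one,
                                                       1);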

PageRank Performance Regression

There is a performance regression in PageRank between v2.1 and v2.2, which appears in both single-machine and distributed deployments. The observed slowdown is about 2-3x.

This can be tested with:

mpiexec -f ~/mpd.hosts -n 1 ./pagerank --graph ~/data/nfs/input/soc-LiveJournal.txt --format tsv --ncpus 16 --engine synchronous
mpiexec -f ~/mpd.hosts -n 4 ./pagerank --graph ~/data/nfs/input/soc-LiveJournal.txt --format tsv --ncpus 16 --engine synchronous

Running the kmeans executable in a distributed environment

GraphLab offers a kmeans executable to perform clustering. I've tried it on a single node and it works perfectly. My question is: how can I do the same in a distributed environment?
I've created two virtual machines on the same network; the IP address of each one is reachable from the other (I've tested this with ping), and each machine has the kmeans executable compiled from the GraphLab source.
I've seen in the official documentation that the command for running kmeans distributed is:

mpiexec -n [N machines] --hostfile [host file] ./kmeans ....

What should a hostfile look like?
Has anyone ever run kmeans using MPI?

Thanks in advance for the help.
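(For what it's worth: with Open MPI, a hostfile is typically just one machine per line, optionally with a slot count, e.g.

192.168.1.10 slots=1
192.168.1.11 slots=1

The exact syntax depends on the MPI implementation; MPICH, for instance, uses host:nprocs lines instead.)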

Issue with parallel ingress using the oblivious heuristic

It seems that there is no general thread-level parallelism support for this:

file: ./graphlab-master/src/graphlab/graph/distributed_graph.hpp
1902 #ifdef _OPENMP
1903 #pragma omp parallel for
1904 #endif
1905 for(size_t i = 0; i < graph_files.size(); ++i) {
1906 if ((parallel_ingress && (i % rpc.numprocs() == rpc.procid()))

When loading from multiple files with multiple processes in 'oblivious' mode, OpenMP does not work correctly here.

User Guide's local deployment example getting error

Using GraphLab V1.0.1, I'm following the User Guide's local deployment example at: http://graphlab.com/learn/userguide/index.html#Deployment

After the job finishes, checking execution.log shows this error:
...
[INFO] Task completed: train
[INFO] Task started: recommend
[ERROR] Exception raised from task: 'recommend' code: 'model'
Exception: ("Unable to complete task successfully, Exception raised, trace: Traceback (most recent call last):\nKeyError: 'model'\n", KeyError('model',))
[INFO] Stopping the server connection.
[INFO] GraphLab server shutdown

gl_ec2.py script fails silently when private key permissions are not 400

error message is:

ubuntu@ip-10-236-158-207:~/graphlab/scripts/ec2$ ./gl-ec2 -i ~/yxzhao02.pem -k yxzhao02 -s 1 launch launchtest
Setting up security groups...
Checking for running cluster...
GraphLab AMI for Standard Instances: ami-108d1c79
Launching instances...
Launched slaves, regid = r-77bf7313
Launched master, regid = r-ee76598c
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /home/ubuntu/yxzhao02.pem to master...
Copy hostfile to master...

Searching for existing cluster launchtest...
Found 3 master(s), 6 slaves, 0 ZooKeeper nodes
lost connection
Traceback (most recent call last):
File "./gl_ec2.py", line 700, in
main()
File "./gl_ec2.py", line 508, in main
setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, cluster_name, True)
File "./gl_ec2.py", line 369, in setup_cluster
scp(master, opts, "machines", '~/machines')
File "./gl_ec2.py", line 482, in scp
(opts.identity_file, local_file, host, dest_file), shell=True)
File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'scp -q -o StrictHostKeyChecking=no -i /hom

Ubuntu 14.04 build issue

There is a build issue with Boost on Ubuntu 14.04: with that release's g++/gcc versions, the downloaded Boost version miscommunicates with g++ about whether 64-bit ints are explicitly defined, causing a build error. A patch exists, but it has not yet been published at the download link, so the resulting build failure cannot be prevented beforehand.

The dmcennis/graphlab GraphRAT branch has a script that will patch both versions of the offending header file with the appropriate Boost patch, but it only works after the build has already failed. Hopefully upstream releases a new link soon...

GraphLab doesn't build under Mac OS X 10.9.4 and Java 8. Asks for OpenMP too.

Hi, I ran into problems building GraphLab on Mac OS X 10.9.4 with Java 8. You can see here that a file called jni_md.h is missing. I managed to find this file at /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include/darwin/jni_md.h, so I created a link to it in the /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include folder, and that solved the problem.
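Concretely, the link was along these lines:

sudo ln -s /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include/darwin/jni_md.h /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include/jni_md.h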

Then I continued the compilation and got stuck on the following problem: for some reason GraphLab is asking me for OpenMP, even though at configure time GraphLab detected that my compiler does not support OpenMP.

What should I do?

User Guide's EC2 deployment example getting error

Using GraphLab V1.0.1, I'm following the User Guide's EC2 deployment example at: http://graphlab.com/learn/userguide/index.html#Deployment

Getting the error because 'conn' is None:

[INFO] Preparing using environment: ec2
[INFO] Beginning Job Validation.

[INFO] Validation complete. Job: 'ec2-exec' ready for execution

AttributeError Traceback (most recent call last)
in ()
1 # spin up an EC2 instance to run this work
2 job_ec2 = gl.deploy.job.create(tasks_with_bindings, name='ec2-exec',
----> 3 environment=ec2)

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/job.pyc in create(tasks, name, environment, function, function_arguments, required_packages)
338
339 LOGGER.info("Validation complete. Job: '%s' ready for execution" % name)
--> 340 job = env.run(_session, cloned_artifacts, name, environment)
341 _session.register(job)
342 job.save() # save the job once prior to returning.

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in run(self, session, tasks, name, environment)
66 """
67 job = _job.Job(name, tasks=tasks, environment=environment)
---> 68 return self.run_job(job, session)
69
70

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in run_job(self, job, session)
299 job._serialize(serialized_job_file_path)
300
--> 301 commander = Ec2ExecutionEnvironment._start_commander_host(job.environment, credentials)
302 post_url = "http://%s:9004/submit" % commander.public_dns_name
303 LOGGER.debug("Sending %s to %s" % (serialized_job_file_path, post_url))

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in _start_commander_host(environment, credentials)
266 security_group_name = environment.security_group,
267 tags = environment.tags, user_data = user_data,
--> 268 credentials = credentials)
269 return commander
270

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/connect/aws/_ec2.pyc in _ec2_factory(instance_type, region, CIDR_rule, security_group_name, tags, user_data, credentials, ami_service_parameters, num_hosts)
419 # Does the security group already exist?
420 security_group = None
--> 421 for sg in conn.get_all_security_groups():
422 if(security_group_name == sg.name):
423 security_group = sg

AttributeError: 'NoneType' object has no attribute 'get_all_security_groups'

Getting an error when loading a graph from a binary file

I build a small distributed graph and then save it to a binary file with the save_binary call. I then try to load the binary file; however, I get the error

dynamic_local_graph.hpp(finalize:334): Check failed: _csr_storage.num_values()==edges.size() [0 == 8]

I was testing on a really simple graph with 8 edges, which explains why edges.size() is 8. But I am not sure what exactly _csr_storage.num_values() is. I am running this on Ubuntu 12.04.
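For context, the sequence is roughly this (a sketch, assuming the save_binary/load_binary pair on distributed_graph):

graph_type graph(dc, clopts);
// ... add the 8 edges ...
graph.finalize();
graph.save_binary("graph_prefix");    // saving succeeds

// later, in a fresh run:
graph_type graph2(dc, clopts);
graph2.load_binary("graph_prefix");   // fails the _csr_storage check above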

PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs

Hi, GraphLab Experts,

This email is the first public disclosure of project PowerLyra, a new hybrid graph analytics engine based on GraphLab 2.2 (PowerGraph).

As you can see, natural graphs with skewed degree distributions raise unique challenges for graph computation and partitioning. Existing graph analytics frameworks usually use a "one size fits all" design that uniformly processes all vertices, resulting in suboptimal performance for natural graphs: they either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost among vertices even for low-degree vertices (e.g., PowerGraph).

We argue that the skewed distribution in natural graphs also calls for differentiated processing of high-degree and low-degree vertices. We have therefore developed PowerLyra, a new graph analytics engine that embraces the best of both worlds of existing frameworks by dynamically applying different computation and partitioning strategies to different vertices. PowerLyra uses Pregel/GraphLab-like computation models for processing low-degree vertices to minimize computation, communication and synchronization overhead, and a PowerGraph-like computation model for processing high-degree vertices to reduce load imbalance and contention. To seamlessly support all PowerLyra applications, PowerLyra further introduces an adaptive unidirectional graph communication scheme.

PowerLyra additionally proposes a new hybrid graph cut algorithm that embraces the best of both edge-cut and vertex-cut, adopting edge-cut for low-degree vertices and vertex-cut for high-degree vertices. Theoretical analysis shows that the expected replication factor of random hybrid-cut is always better than that of both random vertex-cut and edge-cut. For skewed power-law graphs, empirical validation shows that random hybrid-cut also decreases the replication factor of the current default heuristic vertex-cut (Grid) from 5.76X to 3.59X and from 18.54X to 6.76X for synthetic graphs with power-law constants 2.2 and 1.8, respectively. We have also developed a new distributed greedy heuristic hybrid-cut algorithm, named Ginger, inspired by Fennel (a greedy streaming edge-cut algorithm for a single machine). Compared to Grid vertex-cut, Ginger can reduce the replication factor by up to 2.92X (from 2.03X) and 3.11X (from 1.26X) for synthetic and real-world graphs, respectively.

Finally, PowerLyra adopts a locality-conscious data layout optimization in the graph ingress phase to mitigate poor locality during vertex communication. We argue that a small increase in graph ingress time (less than 10% for power-law graphs and 5% for real-world graphs) is worthwhile for an often larger speedup in execution time (usually more than 10%, and notably 21% for the Twitter follow graph).

Right now, PowerLyra is implemented as an execution engine and set of graph partitionings for GraphLab, and can seamlessly support all GraphLab applications. A detailed evaluation on a 48-node cluster using three different graph algorithms (PageRank, Approximate Diameter and Connected Components) shows that PowerLyra outperforms the current synchronous engine with the Grid partitioning of PowerGraph (Jul. 8, 2013, commit:fc3d6c6) by up to 5.53X (from 1.97X) and 3.26X (from 1.49X) for real-world (Twitter, UK-2005, Wiki, LiveJournal and WebGoogle) and synthetic (10-million-vertex power-law graphs with constants ranging from 1.8 to 2.2) graphs respectively, due to a significantly reduced replication factor, lower communication cost and improved load balance.

The website of PowerLyra: http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

The latest release has been ported to GraphLab 2.2 (Oct. 22, 2013, commit:e8022e6), which aims to provide the best compatibility with minimal changes to the framework (perhaps only adding a "type" field to vertex_record). This version does not yet have the locality-conscious graph layout optimization. You can check out the branch from IPADS's gitlab server: git clone http://ipads.se.sjtu.edu.cn:1312/opensource/powerlyra.git

If you are interested in evaluating or working with the full PowerLyra, you can obtain a snapshot from Sep. 25, 2013, which is based on GraphLab 2.2 (Jul. 8, 2013, commit:fc3d6c6). Snapshot: http://ipads.se.sjtu.edu.cn/projects/powerlyra/powerlyra-snapshot-0.8-32685a.tar.gz (MD5: 32685a65d6edc2e52d791a2cffef1dfa)

If you are interested in trying it, you can first refer to the documentation and tutorials at GraphLab.org, which provide step-by-step details on building, configuring and running. Then you need to select our PowerLyra engines and partitionings when running applications (see "quick start" on the PowerLyra website).

Any comments are welcome!

Rong Chen
Institute of Parallel and Distributed Systems,
Shanghai Jiao Tong University, China
http://ipads.se.sjtu.edu.cn/

Pure Virtual Function Called

I have a rather simple vertex program which runs completely fine with the synchronous engine. However, it crashes with the error "Pure Virtual Function Called" if I use the asynchronous engine. Maybe I have somehow missed that there is an additional method to be implemented when using the latter engine?

The code of the vertex program can be found here https://github.com/iglesias/graphlab-benchmark/blob/master/benchmark.cpp#L125

Here is GDB's backtrace https://gist.github.com/iglesias/7378890.

printlock.lock() mutex assertion when using USE_TRACEPOINT performance monitoring

I am getting the following error:
dc_call_dispatch: dc: time spent issuing RPC calls
Events: 1262
Total: 671.31 ms
Mean: 0.531941 ms
Min: 0.0446786 ms
Max: 0.797284 ms
dc_receive_multiplexing: dc: time spent exploding a chunk
Events: 0
Total: 0 ms
[Thread 0x7fffefe9f700 (LWP 10122) exited]
[Thread 0x7fffeee9d700 (LWP 10124) exited]
[Thread 0x7fffee69c700 (LWP 10125) exited]
[Thread 0x7fffef69e700 (LWP 10123) exited]
ERROR: mutex.hpp(lock:69): Check failed: !error

Program received signal SIGABRT, Aborted.
0x00007ffff5965425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0 0x00007ffff5965425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff5968b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000000000076b7bf in graphlab::mutex::lock (this=0xd22f00) at /home/ubuntu/graphlab/src/graphlab/parallel/mutex.hpp:69
#3 0x00000000009043ad in graphlab::trace_count::~trace_count (this=0xd205a0, __in_chrg=) at /home/ubuntu/graphlab/src/graphlab/util/tracepoint.cpp:65
#4 0x00007ffff596a901 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff596a985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007ffff5950774 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x000000000075b4c9 in _start ()

(gdb) f 3
#3 0x00000000009043ad in graphlab::trace_count::~trace_count (this=0xd205a0, __in_chrg=) at /home/ubuntu/graphlab/src/graphlab/util/tracepoint.cpp:65

65 printlock.lock();

It seems that the printlock is already locked, or the mutex is in an error state for some other reason.

The way to reproduce this error is to enable USE_TRACEPOINT and run

./svd smallnetflix/ --rows=95527 --cols=3562 --nv=6 --nsv=2 --max_iter=2

where the folder smallnetflix contains the smallnetflix_mm.train input file.

Thanks!

Redundant code in /toolkits/graph_analytics/pagerank.cpp


/* The scatter edges depend on whether the pagerank has converged */
edge_dir_type scatter_edges(icontext_type& context,
                            const vertex_type& vertex) const {
  // If an iteration counter is set then scatter to no edges.
  if (ITERATIONS) return graphlab::NO_EDGES;
  // In the dynamic case we run scatter on out edges if we need
  // to maintain the delta cache or the tolerance is above bound.
  if (USE_DELTA_CACHE || std::fabs(last_change) > TOLERANCE) {
    return graphlab::OUT_EDGES;
  } else {
    return graphlab::NO_EDGES;
  }
}

/* The scatter function just signals adjacent pages */
void scatter(icontext_type& context, const vertex_type& vertex,
             edge_type& edge) const {
  if (USE_DELTA_CACHE) {
    context.post_delta(edge.target(), last_change);
  }

  if (last_change > TOLERANCE || last_change < -TOLERANCE) {
    context.signal(edge.target());
  } else {
    context.signal(edge.target()); //, std::fabs(last_change));
  }
}

The if/else code in scatter() is duplicated.

Perhaps it should be something like the following:

void scatter(icontext_type& context, const vertex_type& vertex,
             edge_type& edge) const {
  if (USE_DELTA_CACHE) {
    context.post_delta(edge.target(), last_change);
    if (last_change > TOLERANCE || last_change < -TOLERANCE)
      context.signal(edge.target());
  } else {
    context.signal(edge.target()); //, std::fabs(last_change));
  }
}

"set_ingress_method" may choose incompatible method !

Hi,

The set_ingress_method method in distributed_graph.hpp can choose "PDS" by mistake, because sharding_constraint::is_pds_compatible(num_shards, p) does not check whether p is a prime number!

static bool is_pds_compatible(size_t num_shards, int& p) {
  p = floor(sqrt(num_shards - 1));
  return (p > 0 && ((p*p + p + 1) == (int)num_shards));
}

The source code of is_pds_compatible only checks that the p^2 + p + 1 equation is satisfied, with no concern for the second condition, "p should be a prime number". For example, using auto ingress on a cluster of 21 machines should lead to Oblivious, but GraphLab chooses PDS and then fails with the error:

ERROR: generate_pds.hpp(get_pds:50): Fail to generate pds for p = 4
ERROR: sharding_constraint.hpp(sharding_constraint:96): Check failed: joint_nbr_cache[i][j].size()>0 [0 > 0]
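A possible fix is to add the missing primality test (a sketch; is_prime is a helper that would need to be added):

static bool is_prime(int p) {
  if (p < 2) return false;
  for (int d = 2; d * d <= p; ++d)
    if (p % d == 0) return false;
  return true;
}

static bool is_pds_compatible(size_t num_shards, int& p) {
  p = floor(sqrt(num_shards - 1));
  // require p*p + p + 1 == num_shards AND p prime
  return (p > 0 && ((p*p + p + 1) == (int)num_shards) && is_prime(p));
}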

Thanks,
-Khaled

Can't run on machines with more than 64 cores

A program written with the GraphLab APIs can't run on machines that have more than 64 cores. The error message is:
ERROR: fiber_control.cpp(launch:226): Check failed: affinity.popcount() > 0 [0 > 0]

In fiber_control.hpp, I found the definition "typedef fixed_dense_bitset<64> affinity_type". I guess this error results from this fixed length. Maybe you can change 64 to a larger number or make some other modification.
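For example, the suggested change in fiber_control.hpp would be along these lines (a sketch; the right size is a design decision):

// current definition:
typedef fixed_dense_bitset<64> affinity_type;
// possible replacement, e.g. for machines with up to 256 cores:
typedef fixed_dense_bitset<256> affinity_type;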

c++: error: unrecognized command line option '-fast'

Hi,

I'm compiling graphlab on a Mac, running OS X 10.9.4 Mavericks.

I got the following compilation error:

libjson version: 7.6.0 target: OS: Darwin

c++: error: unrecognized command line option '-fast'
make[3]: *** [Objects_static/internalJSONNode.o] Error 1
make[2]: *** [../deps/json/src/libjson-stamp/libjson-build] Error 2
make[1]: *** [CMakeFiles/libjson.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs.

Any idea what's wrong?

Thanks in advance!

Compile error on OSX 10.9

Hi,

I'm getting this error while compiling graphlab (master) on OSX 10.9:

In file included from /Users/hstm/Development/src/graphlab/deps/local/include/opencv2/contrib/retina.hpp:76:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/valarray:4257:55: error:
'value_type' is a private member of
'graphlab::discrete_domain<4>::ConstIterator'
__val_expr<_BinaryOp<not_equal_to

Thanks,

Helge

GraphLab 2.2 requires OpenMPI 1.7.2 or greater

In both the tutorial:

http://graphlab.org/tutorials-2/graphlab-cluster-deployment-quick-start/

and the configure checks, there is no mention of the fact that GraphLab 2.2 seems to assume OpenMPI version 1.7.2 rather than the actual official stable OpenMPI release, 1.6.2. This is apparently due to the event library conflicts discussed here:

https://groups.google.com/d/msg/graphlabapi/2K06PWBZUYk/EjhlQXJ2pNgJ

Quoted here:

"I suspected as much. We have an annoying compatibility problem with
OpenMPI 1.5 and 1.6 due to a bug in the OpenMPI build.
OpenMPI uses some parts of libevent, and the OpenMPI library incorrectly
exports some functions from libevent. This conflicts with the libevent we
use in our code.

Unfortunately I can't think of a simple solution to this problem...
Does anyone else have any ideas?

If you control the server/cluster, a simple workaround will be to install
OpenMPI 1.7 which fixes the bug, or MPICH2."

This should be fixed in the tutorial by clearly specifying which version is required, and it should also be fixed in the configure script by having it check the OpenMPI version. Unfortunately I can't figure out where all of the M4 autoconf scripts are to fix configure, or else I'd do it and send in the pull request.

Enable/fix likelihood calculation in cgs_lda

The likelihood calculation for the collapsed Gibbs sampler appears to be unstable and so it has been disabled. Here are the correct calculations based on my matlab code:

function llik = eval_llik(counts, n_td, n_wt, alpha, beta)
[ndocs, nvocab] = size(counts);
[ntopics, ~] = size(n_td);

llik_w_given_z = ...
  ntopics * (gammaln(nvocab * beta) - nvocab * gammaln(beta)) + ...
  sum((sum(gammaln(n_wt + beta)) - gammaln( sum(n_wt) + nvocab*beta)));

llik_z = ...
  ndocs * (gammaln(ntopics * alpha) - ntopics * gammaln(alpha)) + ...
  sum(sum(gammaln(n_td + alpha)) - gammaln(sum(n_td) + ntopics * alpha));

llik = llik_w_given_z + llik_z;

end

It would be helpful to have the likelihood calculation re-enabled for diagnostics.

Compile error on Ubuntu 12.04 32bit

I am getting the following error:
make[1]: Entering directory `/usr/local/graphlab/release/CMakeFiles/CMakeTmp'
/usr/bin/cmake -E cmake_progress_report /usr/local/graphlab/release/CMakeFiles/CMakeTmp/CMakeFiles 1
Building CXX object CMakeFiles/cmTryCompileExec.dir/src.cxx.o
/usr/bin/c++ -DHAS_CRC32 -O3 -Wno-unused-local-typedefs -Wno-attributes -march=native -mtune=native -Wall -g -fopenmp -o CMakeFiles/cmTryCompileExec.dir/src.cxx.o -c /usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx
/usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx: In function 'int main(int, char**)':
/usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx:1:68: error: '__builtin_ia32_crc32di' was not declared in this scope
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[1]: *** [CMakeFiles/cmTryCompileExec.dir/src.cxx.o] Error 1
make[1]: Leaving directory `/usr/local/graphlab/release/CMakeFiles/CMakeTmp'
make: *** [cmTryCompileExec/fast] Error 2

uname -a
Linux ray-pc 3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 GNU/Linux

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

Add a configure option that disables tcmalloc

Since on Mac OS there are sometimes linker problems.

The configure option should do the following:

We should compile fine with clang. That should not be a problem.
The segfault appears to be inside the dynamic library loader, and from the looks of the compile warnings, might be related to tcmalloc. Indeed we have found that sometimes tcmalloc does not work on certain Mac setups, but we have never managed to figure out why.

You can try disabling tcmalloc. Unfortunately, we do not have a configure option to disable tcmalloc; you will need to edit the CMakeLists.txt for that.

Comment out / delete the following lines (should be lines 223 to 240):
if(APPLE)
  set(tcmalloc_shared "--enable-shared=yes")
else()
  set(tcmalloc_shared "--enable-shared=no")
endif()

ExternalProject_Add(libtcmalloc
  PREFIX ${GraphLab_SOURCE_DIR}/deps/tcmalloc
  URL http://gperftools.googlecode.com/files/gperftools-2.0.tar.gz
  URL_MD5 13f6e8961bc6a26749783137995786b6
  PATCH_COMMAND patch -N -p0 -i ${GraphLab_SOURCE_DIR}/patches/tcmalloc.patch || true
  CONFIGURE_COMMAND <SOURCE_DIR>/configure --enable-frame-pointers --prefix=<INSTALL_DIR> ${tcmalloc_shared}
  INSTALL_DIR ${GraphLab_SOURCE_DIR}/deps/local)

link_libraries(tcmalloc)

set(TCMALLOC-FOUND 1)
add_definitions(-DHAS_TCMALLOC)

Then look for the following block of code (should be lines 481-499) and delete the two occurrences of tcmalloc and libtcmalloc:

macro(requires_core_deps NAME)
  target_link_libraries(${NAME}
    ${Boost_LIBRARIES}
    z
    tcmalloc
    event event_pthreads
    zookeeper_mt
    json)
  add_dependencies(${NAME} boost libevent libjson zookeeper libtcmalloc)
  if(MPI_FOUND)
    target_link_libraries(${NAME} ${MPI_LIBRARY} ${MPI_EXTRA_LIBRARY})
  endif(MPI_FOUND)
  if(HADOOP_FOUND)
    target_link_libraries(${NAME} hdfs ${JAVA_JVM_LIBRARY})
    add_dependencies(${NAME} hadoop)
  endif(HADOOP_FOUND)
endmacro(requires_core_deps)
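The option itself could then be a simple guard around both blocks, e.g. (a sketch with a hypothetical NO_TCMALLOC flag):

option(NO_TCMALLOC "Build GraphLab without tcmalloc" OFF)
if(NOT NO_TCMALLOC)
  # ... the ExternalProject_Add(libtcmalloc ...) block from above ...
  link_libraries(tcmalloc)
  set(TCMALLOC-FOUND 1)
  add_definitions(-DHAS_TCMALLOC)
endif()

with the tcmalloc / libtcmalloc references in requires_core_deps made conditional on TCMALLOC-FOUND in the same way.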

Collapsed Variational Bayes (CVB0) for Topic Modeling

The current collapsed Gibbs sampling (CGS) algorithm for topic modeling performs well but poses the challenge of assessing convergence. It would be interesting to try CVB0 updates instead, which would simplify convergence assessment. In addition, CVB0 actually fits the GraphLab abstraction slightly better. The challenge will be in reducing the memory footprint of CVB0, which naively requires ntopics * sizeof(double) * ntokens bytes to store the variational approximations, rather than the sizeof(int) * ntokens required for CGS (e.g., at 100 topics and 10^9 tokens, roughly 800 GB versus 4 GB).

Allow manually disabling MPI

For some platforms and many use cases it may be desirable to run GraphLab as a single-machine multicore platform. In these situations, incompatibility with (or bugs in) existing MPI installations could lead to issues.

error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’

Just checked out 2.2 and got this error building apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o

[ 98%] Building CXX object apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o
/usr/local/projects/graphlab2/graphlab/apps/cascades/cascades.cpp: In function ‘int main(int, char**)’:
/usr/local/projects/graphlab2/graphlab/apps/cascades/cascades.cpp:239:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
In file included from /usr/local/projects/graphlab2/graphlab/apps/ldademo/ldademo.cpp:24:0:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::set_param_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:150:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:151:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:158:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:159:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::add_topic_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:218:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:219:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::lock_word_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:237:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:238:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:260:41: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:261:13: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
In file included from /usr/local/projects/graphlab2/graphlab/apps/twitterrank/twitterrank.cpp:27:0:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > pagerank::weight_update_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp:97:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp:98:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In member function ‘virtual void lda::cgs_lda_vertex_program::scatter(graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::icontext_type&, const vertex_type&, graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::edge_type&) const’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:926:65: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:940:80: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
In file included from /usr/local/projects/graphlab2/graphlab/apps/twitterrank/twitterrank.cpp:28:0:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::set_param_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:141:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:142:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:149:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:150:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::add_topic_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:209:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:210:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::lock_word_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:228:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:229:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:251:35: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:252:7: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In member function ‘virtual void lda::cgs_lda_vertex_program::scatter(graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::icontext_type&, const vertex_type&, graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::edge_type&) const’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:938:61: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:952:76: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[2]: *** [apps/ldademo/CMakeFiles/ldademo.dir/ldademo.cpp.o] Error 1
make[1]: *** [apps/ldademo/CMakeFiles/ldademo.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[2]: *** [apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o] Error 1
make[1]: *** [apps/twitterrank/CMakeFiles/twitterrank.dir/all] Error 2
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
Linking CXX executable label_propagation
[ 98%] Built target label_propagation
Linking CXX executable cascades
[ 98%] Built target cascades
make: *** [all] Error 2
