
GraphLab PowerGraph v2.2

UPDATE: For a significant evolution of this codebase, see GraphLab Create, which is available for download at turi.com

History

In 2013, the team that created GraphLab PowerGraph started the Seattle-based company GraphLab, Inc. The learnings from the GraphLab PowerGraph and GraphChi projects culminated in GraphLab Create, an enterprise-class data science platform for data scientists and software engineers that simplifies building and deploying advanced machine learning models as a RESTful predictive service. In January 2015, GraphLab, Inc. was renamed Turi. See turi.com for more information.

Status

GraphLab PowerGraph is no longer in active development by the founding team. GraphLab PowerGraph is now supported by the community at http://forum.turi.com/.

Introduction

GraphLab PowerGraph is a graph-based, high performance, distributed computation framework written in C++.

The GraphLab PowerGraph academic project was started in 2009 at Carnegie Mellon University to develop a new parallel computation abstraction tailored to machine learning. GraphLab PowerGraph 1.0 employed a shared-memory design. In GraphLab PowerGraph 2.1, the framework was redesigned to target the distributed environment; it addressed the difficulties of real-world power-law graphs and achieved unparalleled performance at the time. GraphLab PowerGraph 2.2 introduced the Warp System, a new flexible, distributed architecture built around fine-grained user-mode threading (fibers). The Warp System allows one to easily extend the abstraction (to improve optimization, for example) while also improving usability.

GraphLab PowerGraph is the culmination of four years of research and development into graph computation, distributed computing, and machine learning. It easily scales to graphs with billions of vertices and edges, performing orders of magnitude faster than competing systems. It combines advances in machine learning algorithms, asynchronous distributed graph computation, prioritized scheduling, and graph placement with optimized low-level system design and efficient data structures to achieve unmatched performance and scalability on challenging machine learning tasks.

Related is GraphChi, a spin-off project separate from the GraphLab PowerGraph project. GraphChi was designed to run very large graph computations on just a single machine by using a novel algorithm for processing the graph from disk (SSD or hard drive), enabling a single desktop computer (actually a Mac Mini) to tackle problems that previously demanded an entire cluster. For more information, see https://github.com/GraphChi.

License

GraphLab PowerGraph is released under the Apache 2 license.

If you use GraphLab PowerGraph in your research, please cite our paper:

    @inproceedings{Low+al:uai10graphlab,
      title = {GraphLab: A New Parallel Framework for Machine Learning},
      author = {Yucheng Low and
                Joseph Gonzalez and
                Aapo Kyrola and
                Danny Bickson and
                Carlos Guestrin and
                Joseph M. Hellerstein},
      booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
      month = {July},
      year = {2010}
    }

Academic and Conference Papers

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin (2012). "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs." Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12).

Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin and Joseph M. Hellerstein (2012). "Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud." Proceedings of the VLDB Endowment (PVLDB).

Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein (2010). "GraphLab: A New Parallel Framework for Machine Learning." Conference on Uncertainty in Artificial Intelligence (UAI).

Kevin Li, Charles Gibson, David Ho, Qi Zhou, Jason Kim, Omar Buhisi, Donald E. Brown, and Matthew Gerber (2013). "Assessment of Machine Learning Algorithms in Cloud Computing Frameworks." IEEE Systems and Information Engineering Design Symposium (SIEDS), pp. 98-103, April 2013.

Yong Guo (Delft University of Technology), Marcin Biczak (Delft University of Technology), Ana Lucia Varbanescu (University of Amsterdam), Alexandru Iosup (Delft University of Technology), Claudio Martella (VU University Amsterdam), and Theodore L. Willke (Intel Corporation) (2013). "Towards Benchmarking Graph-Processing Platforms." Supercomputing '13 (SC13).

Aapo Kyrola, Guy Blelloch, and Carlos Guestrin (2012). "GraphChi: Large-Scale Graph computation on Just a PC." Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12).

The Software Stack

The GraphLab PowerGraph project consists of a core API and a collection of high-performance machine learning and data mining toolkits built on top. The API is written in C++ and built on top of standard cluster and cloud technologies. Inter-process communication is accomplished over TCP/IP, and MPI is used to launch and manage GraphLab PowerGraph programs. Each process is multithreaded to fully utilize the multicore resources available on modern cluster nodes. It supports reading and writing to both POSIX and HDFS filesystems.

GraphLab PowerGraph Software Stack

GraphLab PowerGraph has a large selection of machine learning methods already implemented (see /toolkits directory in this repo). You can also implement your own algorithms on top of the graph programming API (a certain degree of C++ knowledge is required).
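To give a feel for the abstraction before diving into the tutorials, here is a minimal, self-contained C++ sketch of the gather-apply-scatter (GAS) pattern that the graph programming API is built around, using PageRank as the example. The Graph struct and function names here are illustrative only, not GraphLab's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative single-machine sketch of the gather-apply-scatter (GAS)
// pattern underlying GraphLab PowerGraph. In the real API these phases are
// methods of a vertex program class run by a distributed engine.
struct Graph {
    std::size_t n;                                  // number of vertices
    std::vector<std::vector<std::size_t>> in_nbrs;  // incoming neighbors per vertex
    std::vector<std::size_t> out_degree;            // out-degree per vertex
};

std::vector<double> pagerank(const Graph& g, int iters, double d = 0.85) {
    std::vector<double> rank(g.n, 1.0 / g.n);
    for (int it = 0; it < iters; ++it) {
        std::vector<double> next(g.n);
        for (std::size_t v = 0; v < g.n; ++v) {
            // gather: sum contributions along in-edges
            double acc = 0.0;
            for (std::size_t u : g.in_nbrs[v])
                acc += rank[u] / g.out_degree[u];
            // apply: update the vertex value
            next[v] = (1.0 - d) / g.n + d * acc;
            // scatter: in the real engine, this phase would signal neighbors
        }
        rank = next;
    }
    return rank;
}
```

In the actual API, the engine schedules these phases per vertex over a partitioned distributed graph, which is what lets the same program run on one machine or a cluster.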

GraphLab PowerGraph Feature Highlights

  • Unified multicore/distributed API: write once run anywhere

  • Tuned for performance: optimized C++ execution engine leverages extensive multi-threading and asynchronous IO

  • Scalable: Run on large cluster deployments by intelligently placing data and computation

  • HDFS Integration: Access your data directly from HDFS

  • Powerful Machine Learning Toolkits: Tackle challenging machine learning problems with ease

Building

The current version of GraphLab PowerGraph was tested on 64-bit Ubuntu Linux 10.04, 11.04 (Natty) and 12.04 (Pangolin), as well as Mac OS X 10.7 (Lion) and 10.8 (Mountain Lion). It requires a 64-bit operating system.

Dependencies

To simplify installation, GraphLab PowerGraph currently downloads and builds most of its required dependencies using CMake’s External Project feature. This also means the first build could take a long time.

There are, however, a few dependencies which must be manually satisfied.

  • On OS X: g++ (>= 4.2) or clang (>= 3.0) [Required]

    • Required for compiling GraphLab.
  • On Linux: g++ (>= 4.3) or clang (>= 3.0) [Required]

    • Required for compiling GraphLab.
  • *nix build tools: patch, make [Required]

    • Should come with most Mac/Linux systems by default. Recent Ubuntu versions require installing the build-essential package.
  • zlib [Required]

    • Comes with most Mac/Linux systems by default. Recent Ubuntu versions require the zlib1g-dev package.
  • Open MPI or MPICH2 [Strongly Recommended]

    • Required for running GraphLab in the distributed setting.
  • JDK 6 or greater [Optional]

    • Required for HDFS support

Satisfying Dependencies on Mac OS X

Installing Xcode with the command line tools (in Xcode 4.3 you have to do this manually in the Xcode Preferences -> Downloads pane) satisfies all of these dependencies.

Satisfying Dependencies on Ubuntu

All the dependencies can be satisfied from the repository:

sudo apt-get update
sudo apt-get install gcc g++ build-essential libopenmpi-dev openmpi-bin default-jdk cmake zlib1g-dev git

Downloading GraphLab PowerGraph

You can download GraphLab PowerGraph directly from the GitHub repository. GitHub also offers a zip download of the repository if you do not have git.

The git command line for cloning the repository is:

git clone https://github.com/graphlab-code/graphlab.git
cd graphlab

Compiling and Running

./configure

Running ./configure in the graphlab directory will create two sub-directories, release/ and debug/. cd into either of these directories and run make to build the release or debug version respectively. Note that this will compile all of GraphLab, including all toolkits. Since some toolkits require additional dependencies (for instance, the Computer Vision toolkit needs OpenCV), this will also download and build all optional dependencies.

We recommend using make’s parallel build feature to accelerate the compilation process. For instance:

make -j4

will perform up to 4 build tasks in parallel. When building in release/ mode, GraphLab requires a large amount of memory to compile, with the heaviest toolkit requiring 1GB of RAM.

Alternatively, if you know exactly which toolkit you want to build, cd into the toolkit's sub-directory and run make; this will be significantly faster as it only downloads the minimal set of dependencies for that toolkit. For instance:

cd release/toolkits/graph_analytics
make -j4

will build only the Graph Analytics toolkit and will not need to obtain OpenCV, Eigen, etc. used by the other toolkits.

Compilation Issues

If you encounter issues, please post the following on the GraphLab forum:

  • detailed description of the problem you are facing
  • OS and OS version
  • output of uname -a
  • hardware of the machine
  • output of g++ -v and clang++ -v
  • contents of graphlab/config.log and graphlab/configure.deps

Writing Your Own Apps

There are two ways to write your own apps.

  • Working in the GraphLab PowerGraph source tree (recommended)
  • Installing and linking against GraphLab PowerGraph (not recommended)

1: Working in the GraphLab PowerGraph Source Tree

This is the best option if you just want to try using GraphLab PowerGraph quickly. GraphLab PowerGraph uses the CMake build system which enables you to quickly create a C++ project without having to write complicated Makefiles.

  1. Create your own sub-directory in the apps/ directory, for example apps/my_app.

  2. Create a CMakeLists.txt in apps/my_app containing the following lines:

    project(GraphLab)
    add_graphlab_executable(my_app [List of cpp files space separated])

  3. Substitute the right values into the square brackets. For instance:

    project(GraphLab)
    add_graphlab_executable(my_app my_app.cpp)

  4. Running "make" in the apps/ directory of any of the build directories should compile your app. If your app does not show up, try running

    cd [the GraphLab API directory]
    touch apps/CMakeLists.txt

2: Installing and Linking Against GraphLab PowerGraph

Installing and using GraphLab PowerGraph this way requires your system to completely satisfy all remaining dependencies, which GraphLab PowerGraph normally builds automatically. This path is not extensively tested and is not recommended.

You will require the following additional dependencies:

  • libevent (>=2.0.18)
  • libjson (>=7.6.0)
  • libboost (>=1.53)
  • libhdfs (required for HDFS support)
  • tcmalloc (optional)

Follow the instructions in the Compiling and Running section to build the release/ version of the library. Then cd into the release/ build directory and run make install. This will install the following:

  • include/graphlab.hpp: the primary GraphLab header
  • include/graphlab/...: the folder containing the headers for the rest of the GraphLab library
  • lib/libgraphlab.a: the GraphLab static library

Once you have installed GraphLab PowerGraph you can compile your program by running:

g++ -O3 -pthread -lzookeeper_mt -lzookeeper_st -lboost_context -lz -ltcmalloc -levent -levent_pthreads -ljson -lboost_filesystem -lboost_program_options -lboost_system -lboost_iostreams -lboost_date_time -lhdfs -lgraphlab hello_world.cpp

If you have compiled with MPI support, you will also need

-lmpi -lmpi++

Tutorials

See tutorials

Datasets

The following are dataset links we found useful when getting started with GraphLab PowerGraph.

Social Graphs

Collaborative Filtering

Classification

Misc

Release Notes

map_reduce_vertices/edges and transform_vertices/edges are not parallelized on Mac OS X

These operations currently rely on OpenMP for parallelism.

On OS X 10.6 and earlier, gcc 4.2 has several OpenMP bugs and is not stable enough to use reliably.

On OS X 10.7, the clang++ compiler does not yet support OpenMP.

map_reduce_vertices/edges and transform_vertices/edges use many more processors than specified in --ncpus

This is related to the issue above. While there is a simple temporary workaround (omp_set_num_threads), we intend to resolve the issue properly by not using OpenMP at all.

Unable to launch distributed GraphLab when each machine has multiple network interfaces

The communication initialization currently takes the first non-localhost IP address as the machine's IP. A more reliable solution would be to use the hostname used by MPI.



Issues

GraphLab AMI is not available in the us-east region

The GraphLab AMIs are not available outside the us-west zone. Since some users do not have access to this zone or prefer using other zones, it would be beneficial to copy the GraphLab AMI to other zones, e.g. us-east.

Thanks,
-Khaled

Add a configure option that disables tcmalloc

Since there are sometimes linker problems on Mac OS.

The configure option should do the following:

We should compile fine with clang; that should not be a problem.
The segfault appears to be inside the dynamic library loader and, from the looks of the compile warnings, might be related to tcmalloc. Indeed, we have found that tcmalloc sometimes does not work on certain Mac setups, but we have never managed to figure out why.

You can try disabling tcmalloc. Unfortunately, we do not have a configure option to disable tcmalloc; you will need to edit the CMakeLists.txt for that.

Comment out or delete the following lines (should be lines 223 to 240):
if(APPLE)
  set(tcmalloc_shared "--enable-shared=yes")
else()
  set(tcmalloc_shared "--enable-shared=no")
endif()

ExternalProject_Add(libtcmalloc
  PREFIX ${GraphLab_SOURCE_DIR}/deps/tcmalloc
  URL http://gperftools.googlecode.com/files/gperftools-2.0.tar.gz
  URL_MD5 13f6e8961bc6a26749783137995786b6
  PATCH_COMMAND patch -N -p0 -i ${GraphLab_SOURCE_DIR}/patches/tcmalloc.patch || true
  CONFIGURE_COMMAND <SOURCE_DIR>/configure --enable-frame-pointers --prefix=<INSTALL_DIR> ${tcmalloc_shared}
  INSTALL_DIR ${GraphLab_SOURCE_DIR}/deps/local)

link_libraries(tcmalloc)

set(TCMALLOC-FOUND 1)
add_definitions(-DHAS_TCMALLOC)

Look for the following block of code (should be lines 481-499) and delete the two occurrences of tcmalloc and libtcmalloc:

macro(requires_core_deps NAME)
  target_link_libraries(${NAME}
    ${Boost_LIBRARIES}
    z
    tcmalloc
    event event_pthreads
    zookeeper_mt
    json)
  add_dependencies(${NAME} boost libevent libjson zookeeper libtcmalloc)
  if(MPI_FOUND)
    target_link_libraries(${NAME} ${MPI_LIBRARY} ${MPI_EXTRA_LIBRARY})
  endif(MPI_FOUND)
  if(HADOOP_FOUND)
    target_link_libraries(${NAME} hdfs ${JAVA_JVM_LIBRARY})
    add_dependencies(${NAME} hadoop)
  endif(HADOOP_FOUND)
endmacro(requires_core_deps)

Problem with SVD singular vectors accuracy

Bug report by Carlos Del Cacho:

Hello Danny,

Thank you for your fast response.

After correcting the issue I told you about read.csv in R, I get the same singular values. However, the accuracy of the first singular value is still not very good, I can't match R in terms of classifier performance after training. Perhaps I am still invoking it wrong.

Here are the first 10 components of the first V vector on the matrix I sent you. My understanding is that values should match, disregarding signs.

GraphLab:

v[1:10,1]
[1] -0.29007777 -0.02705319 -0.03089252 -0.02594989 -0.03459277
[6] -0.03003008 -0.02826690 -0.03112386 -0.02758899 -0.02632362

R Lanczos:

irlba(m,nv=100)$v[1:10,1]
[1] 0.03409986 0.03272688 0.03304065 0.02822084 0.02978243 0.02544790
[7] 0.02730222 0.02760405 0.03618342 0.03369470

R exact SVD

svd(m)$v[1:10,1]
[1] -0.03409986 -0.03272688 -0.03304065 -0.02822084 -0.02978243
[6] -0.02544790 -0.02730222 -0.02760405 -0.03618342 -0.03369470

SLEPc
0.0340999
0.0327269
0.0330406
0.0282208
0.0297824
0.0254479
0.0273022
0.0276041
0.0361834
0.0336947

ncpus does not work if NO_OPENMP is true

On Linux, configure GraphLab with --no_openmp and run the GraphLab pagerank toolkit like this:

mpiexec -n 5 -hostfile ~/machines /home/zork/Dev-pla/graphlab-trace/release/toolkits/graph_analytics/pagerank \
--graph /home/zork/graph/twitter_rv  \
--format snap --iterations=10 --ncpus=10 

will use only 1 CPU.

Missing libraries in linking instructions?

I have installed and linked against GraphLab to get the hello world application working outside GraphLab's source tree (as described in the second point of the "writing your own apps" section in the README).

I found that more libraries than the ones indicated in the README were required to compile successfully. I wonder if this is normal and, if it is, whether it would be worth updating these instructions.

The line in the README is:

g++ -pthread -lz -ltcmalloc -levent -levent_pthreads -ljson -lboost_filesystem -lboost_program_options -lboost_system -lboost_iostreams -lboost_date_time -lhdfs -lgraphlab hello_world.cpp

In my case I also had to use the following flags:

-lmpi -lmpi++ -lzookeeper_mt -lzookeeper_st -lboost_context

My machine is running Ubuntu 12.04, Open MPI 1.4.3, configured with --no_jvm, and the rest of the libraries' versions are the ones bundled with cmake.

Thanks in advance!

Collapsed Variational Bayes (CVB0) for Topic Modeling

The current collapsed Gibbs (CGS) sampler algorithm for topic modeling performs well but has the challenge of assessing convergence. It would be interesting to try CVB0 updates instead which would simplify convergence assessment. In addition CVB0 actually fits the GraphLab abstraction slightly better. The challenge will be in reducing the memory footprint of CVB0 which naively requires ntopics * double * ntokens to store the variational approximations (rather than the int * ntokens required for CGS).

Typo in fiber documentation.

In the "Fiber Compatible Remote Requests" section, an example is given as::

... /* elsewhere */
graphlab::remote_future future = fiber_remote_request(1, /* call to machine 1 */
                                                      add_one,
                                                      1);

This should be request_future, not remote_future

Ubuntu 14.04 build issue

There is a build issue with boost on Ubuntu 14.04: the downloaded boost version mis-communicates with g++ about whether or not 64-bit ints are explicitly defined, causing a build error. There is a patch, but it has not yet been published to the download link. This causes a build failure that cannot be prevented beforehand.

The dmcennis/graphlab GraphRAT branch has a script that will patch both versions of the offending header file with the appropriate boost patch, but it only works after the build has already failed. Hopefully upstream releases a new link soon.

Extensions toolkit fails to compile

Its CMakeLists.txt says that it requires C++11, but when the project is configured to compile with C++11, this toolkit fails to compile.

"set_ingress_method" may choose incompatible method !

Hi,

The set_ingress_method method in distributed_graph.hpp can choose "PDS" by mistake, because the function sharding_constraint::is_pds_compatible(num_shards, p) does not check whether p is a prime number:

static bool is_pds_compatible(size_t num_shards, int& p) {
  p = floor(sqrt(num_shards - 1));
  return (p > 0 && ((p*p + p + 1) == (int)num_shards));
}

The source code of is_pds_compatible only checks that the equation p^2 + p + 1 = num_shards is satisfied, with no concern for the second condition that p be a prime number. For example, using auto ingress on a cluster of 21 machines should fall back to Oblivious, but GraphLab chooses PDS and then fails with the error:

ERROR: generate_pds.hpp(get_pds:50): Fail to generate pds for p = 4
ERROR: sharding_constraint.hpp(sharding_constraint:96): Check failed: joint_nbr_cache[i][j].size()>0 [0 > 0]
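A minimal sketch of the fix, assuming the only missing piece is a primality test on p (this follows the snippet quoted above and is not the committed fix):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Trial-division primality test, sufficient for the small p values
// that arise from cluster sizes.
static bool is_prime(int p) {
    if (p < 2) return false;
    for (int d = 2; d * d <= p; ++d)
        if (p % d == 0) return false;
    return true;
}

// Corrected check: in addition to p*p + p + 1 == num_shards, verify that
// p is prime, as required for a perfect difference set to exist.
static bool is_pds_compatible(std::size_t num_shards, int& p) {
    p = static_cast<int>(std::floor(std::sqrt(static_cast<double>(num_shards) - 1)));
    return p > 0 && (p * p + p + 1) == static_cast<int>(num_shards) && is_prime(p);
}
```

With this check, 21 shards (p = 4, which satisfies the equation but is not prime) is rejected, so auto ingress can fall back to Oblivious as expected.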

Thanks,
-Khaled

PageRank Performance Regression

There is a performance regression in PageRank between v2.2 and v2.1, which appears in both single-machine and distributed deployments. The observed slowdown is about 2-3x.

This can be tested with:

mpiexec -f ~/mpd.hosts -n 1 ./pagerank --graph ~/data/nfs/input/soc-LiveJournal.txt --format tsv --ncpus 16 --engine synchronous
mpiexec -f ~/mpd.hosts -n 4 ./pagerank --graph ~/data/nfs/input/soc-LiveJournal.txt --format tsv --ncpus 16 --engine synchronous

Enable/fix likelihood calculation in cgs_lda

The likelihood calculation for the collapsed Gibbs sampler appears to be unstable, so it has been disabled. Here are the correct calculations, based on my MATLAB code:

function llik = eval_llik(counts, n_td, n_wt, alpha, beta)
[ndocs, nvocab] = size(counts);
[ntopics, ~] = size(n_td);

llik_w_given_z = ...
  ntopics * (gammaln(nvocab * beta) - nvocab * gammaln(beta)) + ...
  sum((sum(gammaln(n_wt + beta)) - gammaln( sum(n_wt) + nvocab*beta)));

llik_z = ...
  ndocs * (gammaln(ntopics * alpha) - ntopics * gammaln(alpha)) + ...
  sum(sum(gammaln(n_td + alpha)) - gammaln(sum(n_td) + ntopics * alpha));

llik = llik_w_given_z + llik_z;

end

It would be helpful to have the likelihood calculation re-enabled for diagnostics.
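For reference, a hypothetical self-contained C++ port of the MATLAB code above, assuming n_wt is indexed as n_wt[word][topic] (nvocab x ntopics) and n_td as n_td[topic][doc] (ntopics x ndocs); the counts matrix is only used for its dimensions in the original, so it is omitted here:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Joint log-likelihood for collapsed Gibbs LDA, following the MATLAB
// reference: llik = llik_w_given_z + llik_z, computed with lgamma.
double eval_llik(const std::vector<std::vector<double>>& n_td,   // ntopics x ndocs
                 const std::vector<std::vector<double>>& n_wt,   // nvocab x ntopics
                 double alpha, double beta) {
    const std::size_t ntopics = n_td.size();
    const std::size_t ndocs   = n_td.empty() ? 0 : n_td[0].size();
    const std::size_t nvocab  = n_wt.size();

    // ntopics * (gammaln(nvocab*beta) - nvocab*gammaln(beta))
    double llik_w_given_z =
        ntopics * (std::lgamma(nvocab * beta) - nvocab * std::lgamma(beta));
    for (std::size_t t = 0; t < ntopics; ++t) {
        double col_sum = 0.0, col_lgamma = 0.0;
        for (std::size_t w = 0; w < nvocab; ++w) {
            col_lgamma += std::lgamma(n_wt[w][t] + beta);
            col_sum    += n_wt[w][t];
        }
        llik_w_given_z += col_lgamma - std::lgamma(col_sum + nvocab * beta);
    }

    // ndocs * (gammaln(ntopics*alpha) - ntopics*gammaln(alpha))
    double llik_z =
        ndocs * (std::lgamma(ntopics * alpha) - ntopics * std::lgamma(alpha));
    for (std::size_t d = 0; d < ndocs; ++d) {
        double col_sum = 0.0, col_lgamma = 0.0;
        for (std::size_t t = 0; t < ntopics; ++t) {
            col_lgamma += std::lgamma(n_td[t][d] + alpha);
            col_sum    += n_td[t][d];
        }
        llik_z += col_lgamma - std::lgamma(col_sum + ntopics * alpha);
    }
    return llik_w_given_z + llik_z;
}
```

Each inner loop mirrors the MATLAB column-wise sum(gammaln(...)) minus gammaln(sum(...)) terms, one column per topic and per document respectively.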

c++: error: unrecognized command line option '-fast'

Hi,

I'm compiling graphlab on a Mac running OS X 10.9.4 (Mavericks).

I got the following compilation error:

libjson version: 7.6.0 target: OS: Darwin

c++: error: unrecognized command line option '-fast'
make[3]: *** [Objects_static/internalJSONNode.o] Error 1
make[2]: *** [../deps/json/src/libjson-stamp/libjson-build] Error 2
make[1]: *** [CMakeFiles/libjson.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs.

Any idea what's wrong?

Thanks in advance!

PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs

Hi, GraphLab Experts,

This email is aimed at a first-time disclosure of the PowerLyra project, a new hybrid graph analytics engine based on GraphLab 2.2 (PowerGraph).

As you can see, natural graphs with skewed degree distributions raise unique challenges for graph computation and partitioning. Existing graph analytics frameworks usually use a "one size fits all" design that uniformly processes all vertices, resulting in suboptimal performance on natural graphs: they either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost among vertices even for low-degree vertices (e.g., PowerGraph).

We argued that the skewed distribution in natural graphs also calls for differentiated processing of high-degree and low-degree vertices. We then developed PowerLyra, a new graph analytics engine that embraces the best of both worlds of existing frameworks by dynamically applying different computation and partition strategies to different vertices. PowerLyra uses Pregel/GraphLab-like computation models to process low-degree vertices, minimizing computation, communication and synchronization overhead, and uses a PowerGraph-like computation model to process high-degree vertices, reducing load imbalance and contention. To seamlessly support all PowerLyra applications, PowerLyra further introduces adaptive unidirectional graph communication.

PowerLyra additionally proposes a new hybrid graph-cut algorithm that embraces the best of both worlds of edge-cut and vertex-cut, adopting edge-cut for low-degree vertices and vertex-cut for high-degree vertices. Theoretical analysis shows that the expected replication factor of random hybrid-cut is always better than that of both random vertex-cut and edge-cut. For skewed power-law graphs, empirical validation shows that random hybrid-cut also decreases the replication factor of the current default heuristic vertex-cut (Grid) from 5.76X to 3.59X and from 18.54X to 6.76X for power-law constants 2.2 and 1.8 of the synthetic graphs respectively. We also developed a new distributed greedy heuristic hybrid-cut algorithm, namely Ginger, inspired by Fennel (a greedy streaming edge-cut algorithm for a single machine). Compared to Grid vertex-cut, Ginger can reduce the replication factor by up to 2.92X (from 2.03X) and 3.11X (from 1.26X) for synthetic and real-world graphs respectively.

Finally, PowerLyra adopts locality-conscious data layout optimization in the graph ingress phase to mitigate poor locality during vertex communication. We argue that a small increase in graph ingress time (less than 10% for power-law graphs and 5% for real-world graphs) is worthwhile for an often larger speedup in execution time (usually more than 10%, notably 21% for the Twitter follower graph).

Right now, PowerLyra is implemented as an execution engine and graph partitions of GraphLab, and can seamlessly support all GraphLab applications. A detailed evaluation on a 48-node cluster using three different graph algorithms (PageRank, Approximate Diameter and Connected Components) shows that PowerLyra outperforms the current synchronous engine with the Grid partition of PowerGraph (Jul. 8, 2013. commit:fc3d6c6) by up to 5.53X (from 1.97X) and 3.26X (from 1.49X) for real-world (Twitter, UK-2005, Wiki, LiveJournal and WebGoogle) and synthetic (10-million-vertex power-law graphs with constants ranging from 1.8 to 2.2) graphs respectively, due to a significantly reduced replication factor, lower communication cost and improved load balance.

The website of PowerLyra: http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

The latest release has been ported to GraphLab 2.2 (Oct. 22, 2013. commit:e8022e6), which aims to provide the best compatibility with minimal changes to the framework (perhaps only adding a "type" field to vertex_record). But this version has no locality-conscious graph layout optimization yet. You can check out the branch from IPADS's gitlab server: git clone http://ipads.se.sjtu.edu.cn:1312/opensource/powerlyra.git

If you are interested in evaluating or working with full PowerLyra, you can obtain a snapshot from Sep. 25, 2013, which is based on GraphLab 2.2 (Jul. 8, 2013. commit:fc3d6c6). Snapshot: http://ipads.se.sjtu.edu.cn/projects/powerlyra/powerlyra-snapshot-0.8-32685a.tar.gz (MD5: 32685a65d6edc2e52d791a2cffef1dfa)

If you are interested in trying it, you can first refer to the documentation and tutorials at GraphLab.org, which provide step-by-step details on building, configuring and running. Then you need to select the PowerLyra engines and partitions when running applications (see "quick start" on the PowerLyra website).

Any comments are welcome!

Rong Chen
Institute of Parallel and Distributed Systems,
Shanghai Jiao Tong University, China
http://ipads.se.sjtu.edu.cn/

Getting error when loading graph from a binary

I built a small distributed graph and then saved it to a binary file with the save_binary call. I then try to load the binary file; however, I get the error:

dynamic_local_graph.hpp(finalize:334): Check failed: _csr_storage.num_values()==edges.size() [0 == 8]

I was testing on a really simple graph with 8 edges, which explains why edges.size() is 8. But I am not sure what exactly _csr_storage.num_values() is. I am running this on Ubuntu 12.04.

GraphLab doesn't build under Mac OS X 10.9.4 and Java 8. Asks for OpenMP too.

Hi, I encountered problems building GraphLab on Mac OS X 10.9.4 with Java 8. You can see here that a file called jni_md.h is missing. I managed to find this file in /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include/darwin/jni_md.h, then created a link in the /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/include folder, and that solved the problem.

Then I continued the compilation and got stuck on this problem: for some reason GraphLab is asking me for OpenMP, but at configuration time GraphLab detects that I don't support OpenMP.

What should I do?

Warp system documentation

In the basic tutorial it is mentioned:

All of GraphLab lives in the graphlab namespace. You may use

using namespace graphlab;

if you wish, but we recommend against it.

Then, when one progresses to the warp tutorial, the code is written with graphlab brought into the namespace. It is a bit confusing for a tutorial. If you wish, I can change it and submit a pull request.

Jesús.

error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’

Just checked out 2.2 and got this error building apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o

[ 98%] Building CXX object apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o
/usr/local/projects/graphlab2/graphlab/apps/cascades/cascades.cpp: In function ‘int main(int, char**)’:
/usr/local/projects/graphlab2/graphlab/apps/cascades/cascades.cpp:239:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
In file included from /usr/local/projects/graphlab2/graphlab/apps/ldademo/ldademo.cpp:24:0:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::set_param_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:150:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:151:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:158:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:159:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::add_topic_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:218:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:219:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::lock_word_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:237:39: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:238:11: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:260:41: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:261:13: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
In file included from /usr/local/projects/graphlab2/graphlab/apps/twitterrank/twitterrank.cpp:27:0:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > pagerank::weight_update_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp:97:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/pagerank.hpp:98:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp: In member function ‘virtual void lda::cgs_lda_vertex_program::scatter(graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::icontext_type&, const vertex_type&, graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::edge_type&) const’:
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:926:65: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/usr/local/projects/graphlab2/graphlab/apps/ldademo/cgs_lda.hpp:940:80: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
In file included from /usr/local/projects/graphlab2/graphlab/apps/twitterrank/twitterrank.cpp:28:0:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::set_param_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:141:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:142:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:149:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:150:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::add_topic_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:209:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:210:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In function ‘std::pair<std::basic_string<char>, std::basic_string<char> > lda::lock_word_callback(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:228:33: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:229:5: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:251:35: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:252:7: error: ‘get_last_dc’ is not a member of ‘graphlab::dc_impl’
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp: In member function ‘virtual void lda::cgs_lda_vertex_program::scatter(graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::icontext_type&, const vertex_type&, graphlab::ivertex_program<graphlab::distributed_graph<lda::vertex_data, lda::edge_data>, lda::gather_type>::edge_type&) const’:
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:938:61: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/usr/local/projects/graphlab2/graphlab/apps/twitterrank/cgs_lda.hpp:952:76: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[2]: *** [apps/ldademo/CMakeFiles/ldademo.dir/ldademo.cpp.o] Error 1
make[1]: *** [apps/ldademo/CMakeFiles/ldademo.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[2]: *** [apps/twitterrank/CMakeFiles/twitterrank.dir/twitterrank.cpp.o] Error 1
make[1]: *** [apps/twitterrank/CMakeFiles/twitterrank.dir/all] Error 2
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
Linking CXX executable label_propagation
[ 98%] Built target label_propagation
Linking CXX executable cascades
[ 98%] Built target cascades
make: *** [all] Error 2

PageRank fails for a graph with ~2.2 billion edges

With the following assertion:

INFO: mpi_tools.hpp(init:63): MPI Support was not compiled.
INFO: dc.cpp(init:573): Cluster of 1 instances created.
INFO: distributed_graph.hpp(set_ingress_method:3200): Automatically determine ingress method: grid
Loading graph in format: tsv
INFO: distributed_graph.hpp(load_from_posixfs:2189): Loading graph from file: ./num.tsv.dir
INFO: distributed_graph.hpp(load_from_stream:3236): 7742363 Lines read
INFO: distributed_graph.hpp(load_from_stream:3236): 16209633 Lines read
INFO: distributed_graph.hpp(load_from_stream:3236): 24663743 Lines read
[... repeated "Lines read" progress messages elided ...]
INFO: distributed_graph.hpp(load_from_stream:3236): 2166497521 Lines read
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
INFO: memory_info.cpp(log_usage:90): Memory Info: Post Flush
Heap: 51676 MB
Allocated: 51153.8 MB
INFO: distributed_ingress_base.hpp(finalize:232): Graph Finalize: constructing local graph
tcmalloc: large alloc 17349451776 bytes == 0xca5502000 @ 0x6826df 0x50e200
tcmalloc: large alloc 17349451776 bytes == 0x10b072c000 @ 0x6826df 0x50e200
tcmalloc: large alloc 1610620928 bytes == 0x151bb2a000 @ 0x6826df 0x51c90d
tcmalloc: large alloc 3221233664 bytes == 0x157bcac000 @ 0x6826df 0x51c90d
INFO: memory_info.cpp(log_usage:90): Memory Info: Finished populating local graph.
Heap: 90912 MB
Allocated: 36696.1 MB
INFO: distributed_ingress_base.hpp(finalize:277): Graph Finalize: finalizing local graph.
tcmalloc: large alloc 17349451776 bytes == 0x163bfae000 @ 0x6826df 0x50b268
tcmalloc: large alloc 8674729984 bytes == 0x1a47194000 @ 0x6826df 0x51fc4d
tcmalloc: large alloc 17349451776 bytes == 0x1c4ca8a000 @ 0x6826df 0x50b268
tcmalloc: large alloc 17349451776 bytes == 0x2057c74000 @ 0x6826df 0x50b268
tcmalloc: large alloc 34698895360 bytes == 0x2462e5a000 @ 0x6826df 0x5063ea
tcmalloc: large alloc 34698895360 bytes == 0x2c79224000 @ 0x6826df 0x5063ea
ERROR: dynamic_csr_storage.hpp(wrap:81): Check failed: valueptr_vec[i]<value_vec.size() [18446744071562067971 < 2168680615]

issue on parallel ingress using the oblivious heuristic

It seems that there is no general thread-level parallel support for this. In ./graphlab-master/src/graphlab/graph/distributed_graph.hpp, around line 1902:

#ifdef _OPENMP
#pragma omp parallel for
#endif
for(size_t i = 0; i < graph_files.size(); ++i) {
  if ((parallel_ingress && (i % rpc.numprocs() == rpc.procid()))

When loading from multiple files with multiple processes in oblivious mode, OpenMP does not work correctly here.

printlock.lock() mutex assertion when using USE_TRACEPOINT performance monitoring

I am getting the following error:
dc_call_dispatch: dc: time spent issuing RPC calls
Events: 1262
Total: 671.31 ms
Mean: 0.531941 ms
Min: 0.0446786 ms
Max: 0.797284 ms
dc_receive_multiplexing: dc: time spent exploding a chunk
Events: 0
Total: 0 ms
[Thread 0x7fffefe9f700 (LWP 10122) exited]
[Thread 0x7fffeee9d700 (LWP 10124) exited]
[Thread 0x7fffee69c700 (LWP 10125) exited]
[Thread 0x7fffef69e700 (LWP 10123) exited]
ERROR: mutex.hpp(lock:69): Check failed: !error

Program received signal SIGABRT, Aborted.
0x00007ffff5965425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0 0x00007ffff5965425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff5968b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000000000076b7bf in graphlab::mutex::lock (this=0xd22f00) at /home/ubuntu/graphlab/src/graphlab/parallel/mutex.hpp:69
#3 0x00000000009043ad in graphlab::trace_count::~trace_count (this=0xd205a0, __in_chrg=) at /home/ubuntu/graphlab/src/graphlab/util/tracepoint.cpp:65
#4 0x00007ffff596a901 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff596a985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007ffff5950774 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x000000000075b4c9 in _start ()

(gdb) f 3
#3 0x00000000009043ad in graphlab::trace_count::~trace_count (this=0xd205a0, __in_chrg=) at /home/ubuntu/graphlab/src/graphlab/util/tracepoint.cpp:65

65 printlock.lock();

It seems that the printlock is already locked, or the mutex is in an error state for some other reason.

The way to reproduce this error is to enable USE_TRACEPOINT
and run
./svd smallnetflix/ --rows=95527 --cos=3562 --nv=6 --nsv=2 --max_iter=2
where the folder smallnetflix includes the smallnetflix_mm.train input file.

Thanks!

ec2 tutorial demo is broken

Yue Zhao reported the following error:

./gl-ec2 -i ~/.ssh/amazonec2.pem -z us-east-1a -s 1 launch launchtest
Setting up security groups...
Checking for running cluster...
GraphLab AMI for Standard Instances: ami-108d1c79
Launching instances...
Launched slaves, regid = r-2ff0b74b
Launched master, regid = r-57f0b733
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /Users/bickson/.ssh/amazonec2.pem to master...
Copy hostfile to master...
Searching for existing cluster launchtest...
Found 1 master(s), 1 slaves, 0 ZooKeeper nodes
lost connection
Traceback (most recent call last):
File "./gl_ec2.py", line 700, in <module>
main()
File "./gl_ec2.py", line 508, in main
setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, cluster_name, True)
File "./gl_ec2.py", line 369, in setup_cluster
scp(master, opts, "machines", '~/machines')
File "./gl_ec2.py", line 482, in scp
(opts.identity_file, local_file, host, dest_file), shell=True)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'scp -q -o StrictHostKeyChecking=no -i /Users/bickson/.ssh/amazonec2.pem 'machines' '[email protected]:~/machines'' returned non-zero exit status 1

Allow manually disabling MPI

For some platforms and many use cases it may be desirable to run GraphLab as a single-machine multicore platform. In these situations, incompatibility with existing MPI installations (or bugs in them) can cause problems.

can't run on machines which have more than 64 cores

Programs written with the GraphLab APIs can't run on machines which have more than 64 cores. The error message is:
ERROR: fiber_control.cpp(launch:226): Check failed: affinity.popcount() > 0 [0 > 0]

In fiber_control.hpp I found the definition "typedef fixed_dense_bitset<64> affinity_type". I guess this error results from the fixed length. Maybe you can change 64 to a larger number or make some other modification.

GraphLab 2.2 requires OpenMPI 1.7.2 or greater

In both the tutorial:

http://graphlab.org/tutorials-2/graphlab-cluster-deployment-quick-start/

and the configure checks, there is no mention that GraphLab 2.2 seems to assume OpenMPI 1.7.2 and not the actual OpenMPI official stable release, 1.6.2. This is apparently due to the libevent conflicts discussed here:

https://groups.google.com/d/msg/graphlabapi/2K06PWBZUYk/EjhlQXJ2pNgJ

Quoted here:

"I suspected as much. We have an annoying compatibility problem with OpenMPI 1.5 and 1.6 due to a bug in the OpenMPI build. OpenMPI uses some parts of libevent, and the OpenMPI library incorrectly exports some functions from libevent. This conflicts with the libevent we use in our code.

Unfortunately I can't think of a simple solution to this problem... Does anyone else have any ideas?

If you control the server/cluster, a simple workaround is to install OpenMPI 1.7, which fixes the bug, or MPICH2."

This should be fixed in the tutorial by clearly specifying which version is required, and in the configure script by having it check the OpenMPI version. Unfortunately I can't figure out where all of the M4 autoconf scripts are to fix configure, or else I'd do it and send the pull request in.

Pure Virtual Function Called

I have a rather simple vertex program which runs completely fine with the synchronous engine. However, it crashes with the error "Pure Virtual Function Called" if I use the asynchronous engine. Maybe I have somehow missed that there is an additional method to be implemented when using the latter engine?

The code of the vertex program can be found here https://github.com/iglesias/graphlab-benchmark/blob/master/benchmark.cpp#L125

Here is GDB's backtrace https://gist.github.com/iglesias/7378890.

Redundant code in toolkits/graph_analytics/pagerank.cpp


  /* The scatter edges depend on whether the pagerank has converged */
  edge_dir_type scatter_edges(icontext_type& context,
                              const vertex_type& vertex) const {
    // If an iteration counter is set then
    if (ITERATIONS) return graphlab::NO_EDGES;
    // In the dynamic case we run scatter on out edges if the we need
    // to maintain the delta cache or the tolerance is above bound.
    if (USE_DELTA_CACHE || std::fabs(last_change) > TOLERANCE) {
      return graphlab::OUT_EDGES;
    } else {
      return graphlab::NO_EDGES;
    }
  }

  /* The scatter function just signal adjacent pages */
  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    if (USE_DELTA_CACHE) {
      context.post_delta(edge.target(), last_change);
    }

    if (last_change > TOLERANCE || last_change < -TOLERANCE) {
      context.signal(edge.target());
    } else {
      context.signal(edge.target()); //, std::fabs(last_change));
    }
  }

Both branches of the if/else in scatter() execute identical code, so the condition has no effect.

Perhaps it should be something like the following:

  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    if (USE_DELTA_CACHE) {
      context.post_delta(edge.target(), last_change);
      if (last_change > TOLERANCE || last_change < -TOLERANCE)
        context.signal(edge.target());
    } else {
      context.signal(edge.target()); //, std::fabs(last_change));
    }
  }

Compile error on Ubuntu 12.04 32bit

I am getting the following error:
make[1]: Entering directory `/usr/local/graphlab/release/CMakeFiles/CMakeTmp'
/usr/bin/cmake -E cmake_progress_report /usr/local/graphlab/release/CMakeFiles/CMakeTmp/CMakeFiles 1
Building CXX object CMakeFiles/cmTryCompileExec.dir/src.cxx.o
/usr/bin/c++ -DHAS_CRC32 -O3 -Wno-unused-local-typedefs -Wno-attributes -march=native -mtune=native -Wall -g -fopenmp -o CMakeFiles/cmTryCompileExec.dir/src.cxx.o -c /usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx
/usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx: In function 'int main(int, char**)':
/usr/local/graphlab/release/CMakeFiles/CMakeTmp/src.cxx:1:68: error: '__builtin_ia32_crc32di' was not declared in this scope
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make[1]: *** [CMakeFiles/cmTryCompileExec.dir/src.cxx.o] Error 1
make[1]: Leaving directory `/usr/local/graphlab/release/CMakeFiles/CMakeTmp'
make: *** [cmTryCompileExec/fast] Error 2

uname -a
Linux ray-pc 3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 GNU/Linux

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

Compile error on OSX 10.9

Hi,

I'm getting this error while compiling graphlab (master) on OSX 10.9:

In file included from /Users/hstm/Development/src/graphlab/deps/local/include/opencv2/contrib/retina.hpp:76:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/valarray:4257:55: error:
'value_type' is a private member of
'graphlab::discrete_domain<4>::ConstIterator'
__val_expr<_BinaryOp<not_equal_to

Thanks,

Helge

graphlab hits LLVM bug on older 64-bit Mac

What I suspect is the root cause: http://llvm.org/bugs/show_bug.cgi?id=14947

I have a full email written out that I sent to [email protected], but it bounced. Is the list broken, or is the email address on the download page bad?

I transcribe the email below.

Hey folks,

See the information you requested below.  (P.S. I sit at UW Allen Center in CSE 434 if you happen to be around and want to debug in person.)

Thanks,
Dan

1.
OS X, latest Xcode installed; I use MacPorts as my package manager. ./configure worked fine. Then I went into the graph_analytics subdirectory and ran make -j3 and got the following error. (The output is actually from make, since I wanted to turn off parallelism for the clearest possible output.)
[ 87%] Building CXX object toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/approximate_diameter.cpp.o
fatal error: error in backend: Cannot select: intrinsic
      %llvm.x86.sse42.crc32.64.64
make[2]: *** [toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/approximate_diameter.cpp.o] Error 1
make[1]: *** [toolkits/graph_analytics/CMakeFiles/approximate_diameter.dir/all] Error 2
make: *** [all] Error 2

2. Mac OS X 10.8.4

3. % uname -a
Darwin dhm.dyn.cs.washington.edu 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

4.
MacBook Pro 13-inch, Mid 2010. Processor: 2.66 GHz Intel Core 2 Duo / Memory 8 GB 1067 MHz DDR3 / Graphics NVIDIA GeForce 320M 256 MB

5. % g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/src/configure --disable-checking --enable-werror -prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

% clang++ -v
Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix


6. graphlabapi/config.log and graphlabapi/configure.deps are attached.

7. Git Log:

commit c9b5637729f7ee7d8dc9c3ac2bb68c127b43704e
Merge: 6d8ac8a bba8592
Author: Yucheng Low <[email protected]>
Date:   Thu Aug 8 09:59:24 2013 -0700

    Merge pull request #11 from ylow/master

    Removed old research-experimental-legacy-stuff from apps and ext-apis.

Running kmeans executable in distributed Environment

GraphLab offers a kmeans executable to perform clustering. I've tried it on a single node and it works perfectly. My question is: how can I do that in a distributed environment?
I've created two virtual machines on the same network; the IP address of each one is reachable from the other (I've tested with ping), and each machine has the kmeans executable compiled from the GraphLab source.
I've seen in the official documentation that the command for running kmeans distributed is:

mpiexec -n [N machines] --hostfile [host file] ./kmeans ....

What should a hostfile look like?
Has anyone ever run kmeans using MPI?

Thanks in advance for the help.
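For OpenMPI's mpiexec, a hostfile is just a plain-text file listing one machine per line, by hostname or IP address, optionally with a slot count bounding how many processes run on that host. A minimal sketch for the two-VM setup described above (the IP addresses are placeholders):

```
# hostfile: one machine per line; "slots" caps the processes per host
192.168.0.10 slots=1
192.168.0.11 slots=1
```

You would then launch with `mpiexec -n 2 --hostfile hostfile ./kmeans ...`, assuming passwordless SSH between the machines and the kmeans binary at the same path on both.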

User Guide's local deployment example produces an error

Using GraphLab V1.0.1, I'm following the User Guide's local deployment example at: http://graphlab.com/learn/userguide/index.html#Deployment

After the job finished, I checked execution.log and saw this error:
...
[INFO] Task completed: train
[INFO] Task started: recommend
[ERROR] Exception raised from task: 'recommend' code: 'model'
Exception: ("Unable to complete task successfully, Exception raised, trace: Traceback (most recent call last):\nKeyError: 'model'\n", KeyError('model',))
[INFO] Stopping the server connection.
[INFO] GraphLab server shutdown

User Guide's EC2 deployment example produces an error

Using GraphLab V1.0.1, I'm following the User Guide's EC2 deployment example at: http://graphlab.com/learn/userguide/index.html#Deployment

I'm getting the following error because 'conn' is None:

[INFO] Preparing using environment: ec2
[INFO] Beginning Job Validation.

[INFO] Validation complete. Job: 'ec2-exec' ready for execution

AttributeError                          Traceback (most recent call last)
in <module>()
      1 # spin up an EC2 instance to run this work
      2 job_ec2 = gl.deploy.job.create(tasks_with_bindings, name='ec2-exec',
----> 3                                environment=ec2)

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/job.pyc in create(tasks, name, environment, function, function_arguments, required_packages)
338
339 LOGGER.info("Validation complete. Job: '%s' ready for execution" % name)
--> 340 job = env.run(_session, cloned_artifacts, name, environment)
341 _session.register(job)
342 job.save() # save the job once prior to returning.

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in run(self, session, tasks, name, environment)
66 """
67 job = _job.Job(name, tasks=tasks, environment=environment)
---> 68 return self.run_job(job, session)
69
70

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in run_job(self, job, session)
299 job._serialize(serialized_job_file_path)
300
--> 301 commander = Ec2ExecutionEnvironment._start_commander_host(job.environment, credentials)
302 post_url = "http://%s:9004/submit" % commander.public_dns_name
303 LOGGER.debug("Sending %s to %s" % (serialized_job_file_path, post_url))

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.pyc in _start_commander_host(environment, credentials)
266 security_group_name = environment.security_group,
267 tags = environment.tags, user_data = user_data,
--> 268 credentials = credentials)
269 return commander
270

/Users/ttam/anaconda/lib/python2.7/site-packages/graphlab/connect/aws/_ec2.pyc in _ec2_factory(instance_type, region, CIDR_rule, security_group_name, tags, user_data, credentials, ami_service_parameters, num_hosts)
419 # Does the security group already exist?
420 security_group = None
--> 421 for sg in conn.get_all_security_groups():
422 if(security_group_name == sg.name):
423 security_group = sg

AttributeError: 'NoneType' object has no attribute 'get_all_security_groups'

gl_ec2.py script fails silently when private key permissions are not 400

The error message is:

ubuntu@ip-10-236-158-207:~/graphlab/scripts/ec2$ ./gl-ec2 -i ~/yxzhao02.pem -k yxzhao02 -s 1 launch launchtest
Setting up security groups...
Checking for running cluster...
GraphLab AMI for Standard Instances: ami-108d1c79
Launching instances...
Launched slaves, regid = r-77bf7313
Launched master, regid = r-ee76598c
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /home/ubuntu/yxzhao02.pem to master...
Copy hostfile to master...

Searching for existing cluster launchtest...
Found 3 master(s), 6 slaves, 0 ZooKeeper nodes
lost connection
Traceback (most recent call last):
  File "./gl_ec2.py", line 700, in <module>
    main()
  File "./gl_ec2.py", line 508, in main
    setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, cluster_name, True)
  File "./gl_ec2.py", line 369, in setup_cluster
    scp(master, opts, "machines", '~/machines')
  File "./gl_ec2.py", line 482, in scp
    (opts.identity_file, local_file, host, dest_file), shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'scp -q -o StrictHostKeyChecking=no -i /hom
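A workaround until the script validates permissions itself: restrict the key to owner read-only before launching. ssh and scp refuse a private key that is group- or world-readable, which is what makes the copy step die with "lost connection". The snippet below is a minimal sketch; the key path matches the report above, and the mktemp demonstration just shows the mode ssh expects.

```shell
# Restrict the key to owner read-only, then re-run the launch:
#   chmod 400 ~/yxzhao02.pem
#   ./gl-ec2 -i ~/yxzhao02.pem -k yxzhao02 -s 1 launch launchtest
# Demonstration of the required mode on a stand-in key file:
key=$(mktemp)
chmod 400 "$key"
ls -l "$key" | cut -c1-10   # prints -r--------
rm -f "$key"
```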

Excessive calls to graph finalize()

Graph finalization needs a "fast path" that avoids the full finalization communication step when no vertices or edges have been added. This would significantly improve performance for large distributed graphs when engine start() or save() is called repeatedly.
