ldbc / ldbc_snb_datagen_hadoop
The Hadoop-based variant of the SNB Datagen
Home Page: https://ldbcouncil.org/benchmarks/snb
License: Apache License 2.0
The files still have the GPLv3 license headers, which are deprecated.
Hi, the following error occurs when running run.sh:
Downloading from simulation: http://www.simulation.tudelft.nl/maven/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar
Downloading from simulation: http://www.simulation.tudelft.nl/maven/com/github/rwl/optimization/1.3/optimization-1.3.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:52 min
[INFO] Finished at: 2024-01-16T19:25:30+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project ldbc_snb_datagen: Could not resolve dependencies for project ldbc.snb.datagen:ldbc_snb_datagen:jar:1.0.0: The following artifacts could not be resolved: junit:junit:jar:4.12, org.hamcrest:hamcrest-core:jar:1.3, xerces:xercesImpl:jar:2.9.1, xml-apis:xml-apis:jar:1.3.04, xalan:xalan:jar:2.7.1, xalan:serializer:jar:2.7.1, org.jdom:jdom:jar:1.1.3, org.apache.hadoop:hadoop-client:jar:3.2.1, org.apache.hadoop:hadoop-common:jar:3.2.1, com.google.guava:guava:jar:27.0-jre, com.google.guava:failureaccess:jar:1.0, com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava, org.checkerframework:checker-qual:jar:2.5.2, com.google.errorprone:error_prone_annotations:jar:2.2.0, com.google.j2objc:j2objc-annotations:jar:1.1, org.codehaus.mojo:animal-sniffer-annotations:jar:1.17, commons-cli:commons-cli:jar:1.2, org.apache.httpcomponents:httpclient:jar:4.5.6, org.apache.httpcomponents:httpcore:jar:4.4.10, commons-codec:commons-codec:jar:1.11, commons-io:commons-io:jar:2.5, commons-net:commons-net:jar:3.6, commons-collections:commons-collections:jar:3.2.2, org.eclipse.jetty:jetty-servlet:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-security:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-webapp:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-xml:jar:9.3.24.v20180605, javax.servlet.jsp:jsp-api:jar:2.1, com.sun.jersey:jersey-servlet:jar:1.19, commons-logging:commons-logging:jar:1.1.3, log4j:log4j:jar:1.2.17, commons-beanutils:commons-beanutils:jar:1.9.3, org.apache.commons:commons-configuration2:jar:2.1.1, org.apache.commons:commons-lang3:jar:3.7, org.apache.commons:commons-text:jar:1.4, org.slf4j:slf4j-api:jar:1.7.25, org.apache.avro:avro:jar:1.7.7, org.codehaus.jackson:jackson-core-asl:jar:1.9.13, org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13, com.thoughtworks.paranamer:paranamer:jar:2.3, org.xerial.snappy:snappy-java:jar:1.0.5, com.google.re2j:re2j:jar:1.1, com.google.protobuf:protobuf-java:jar:2.5.0, org.apache.hadoop:hadoop-auth:jar:3.2.1, com.nimbusds:nimbus-jose-jwt:jar:4.41.1, com.github.stephenc.jcip:jcip-annotations:jar:1.0-1, net.minidev:json-smart:jar:2.3, net.minidev:accessors-smart:jar:1.2, org.apache.curator:curator-framework:jar:2.13.0, org.apache.curator:curator-client:jar:2.13.0, org.apache.curator:curator-recipes:jar:2.13.0, com.google.code.findbugs:jsr305:jar:3.0.0, org.apache.htrace:htrace-core4:jar:4.1.0-incubating, org.apache.commons:commons-compress:jar:1.18, org.apache.kerby:kerb-simplekdc:jar:1.0.1, org.apache.kerby:kerb-client:jar:1.0.1, org.apache.kerby:kerby-config:jar:1.0.1, org.apache.kerby:kerb-core:jar:1.0.1, org.apache.kerby:kerby-pkix:jar:1.0.1, org.apache.kerby:kerby-asn1:jar:1.0.1, org.apache.kerby:kerby-util:jar:1.0.1, org.apache.kerby:kerb-common:jar:1.0.1, org.apache.kerby:kerb-crypto:jar:1.0.1, org.apache.kerby:kerb-util:jar:1.0.1, org.apache.kerby:token-provider:jar:1.0.1, org.apache.kerby:kerb-admin:jar:1.0.1, org.apache.kerby:kerb-server:jar:1.0.1, org.apache.kerby:kerb-identity:jar:1.0.1, org.apache.kerby:kerby-xdr:jar:1.0.1, com.fasterxml.jackson.core:jackson-databind:jar:2.9.8, com.fasterxml.jackson.core:jackson-core:jar:2.9.8, org.codehaus.woodstox:stax2-api:jar:3.1.4, com.fasterxml.woodstox:woodstox-core:jar:5.0.3, dnsjava:dnsjava:jar:2.1.7, org.apache.hadoop:hadoop-hdfs-client:jar:3.2.1, com.squareup.okhttp:okhttp:jar:2.7.5, com.squareup.okio:okio:jar:1.6.0, com.fasterxml.jackson.core:jackson-annotations:jar:2.9.8, org.apache.hadoop:hadoop-yarn-api:jar:3.2.1, javax.xml.bind:jaxb-api:jar:2.2.11, 
org.apache.hadoop:hadoop-yarn-client:jar:3.2.1, org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.2.1, org.apache.hadoop:hadoop-yarn-common:jar:3.2.1, javax.servlet:javax.servlet-api:jar:3.1.0, org.eclipse.jetty:jetty-util:jar:9.3.24.v20180605, com.sun.jersey:jersey-core:jar:1.19, javax.ws.rs:jsr311-api:jar:1.1.1, com.sun.jersey:jersey-client:jar:1.19, com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.8, com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.9.8, com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.9.8, org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.2.1, org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.2.1, org.apache.hadoop:hadoop-annotations:jar:3.2.1, ca.umontreal.iro:ssj:jar:2.5, jfree:jfreechart:jar:1.0.13, org.apache.mahout:mahout-collections:jar:1.0, jfree:jcommon:jar:1.0.16, colt:colt:jar:1.2.0, concurrent:concurrent:jar:1.3.4, com.github.rwl:optimization:jar:1.3, com.google.code.gson:gson:jar:2.2.4, org.codehaus.groovy:groovy:jar:2.1.6, antlr:antlr:jar:2.7.7, org.ow2.asm:asm-tree:jar:4.0, org.ow2.asm:asm-commons:jar:4.0, org.ow2.asm:asm:jar:4.0, org.ow2.asm:asm-util:jar:4.0, org.ow2.asm:asm-analysis:jar:4.0, org.codehaus.groovy:groovy-templates:jar:2.1.6, org.codehaus.groovy:groovy-xml:jar:2.1.6, org.codehaus.groovy:groovy-jsr223:jar:2.1.6, org.apache.commons:commons-math3:jar:3.4.1, org.roaringbitmap:RoaringBitmap:jar:0.6.18: Could not transfer artifact junit:junit:jar:4.12 from/to central (https://repo.maven.org/maven2): Transfer failed for https://repo.maven.org/maven2/junit/junit/4.12/junit-4.12.jar: Unknown host repo.maven.org -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
JAR does not exist or is not a normal file: /home/smc/software/ldbc_snb_datagen_hadoop/target/ldbc_snb_datagen-1.0.0-jar-with-dependencies.jar
I opened https://repo.maven.org/maven2 in the browser, but it cannot be opened either. How can I solve this problem? I look forward to your reply.
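One thing that stands out: repo.maven.org is not the canonical Maven Central host (that is repo.maven.apache.org, or repo1.maven.org), so the DNS lookup fails regardless of connectivity. A possible workaround, sketched below on the assumption that the repository with id central is the misconfigured one (as the log suggests), is to redirect it through a mirror in ~/.m2/settings.xml:
<!-- ~/.m2/settings.xml : redirect "central" to the canonical host -->
<settings>
  <mirrors>
    <mirror>
      <id>central-canonical</id>
      <mirrorOf>central</mirrorOf>
      <url>https://repo.maven.apache.org/maven2</url>
    </mirror>
  </mirrors>
</settings>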
Hi!
I tried to follow the instructions to generate a dataset with a different scale factor (250) instead of 1. I copied params-csv-composite.ini to params.ini and tried to follow https://github.com/ldbc/ldbc_snb_datagen_hadoop#pseudo-distributed-hadoop-node. I changed HADOOP_CLIENT_OPTS to "-Xmx100G".
Running the bash script gives me the following:
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 33.642 s
[INFO] Finished at: 2021-09-28T19:08:17Z
[INFO] ------------------------------------------------------------------------
Reading scale factors..
Available scale factor configuration set snb.interactive.0.1
Available scale factor configuration set snb.interactive.0.3
Available scale factor configuration set snb.interactive.1
Available scale factor configuration set snb.interactive.3
Available scale factor configuration set snb.interactive.10
Available scale factor configuration set snb.interactive.30
Available scale factor configuration set snb.interactive.100
Available scale factor configuration set snb.interactive.300
Available scale factor configuration set snb.interactive.1000
Available scale factor configuration set graphalytics.1
Available scale factor configuration set graphalytics.3
Available scale factor configuration set graphalytics.10
Available scale factor configuration set graphalytics.30
Available scale factor configuration set graphalytics.100
Available scale factor configuration set graphalytics.300
Available scale factor configuration set graphalytics.1000
Available scale factor configuration set graphalytics.3000
Available scale factor configuration set graphalytics.10000
Available scale factor configuration set graphalytics.30000
Number of scale factors read 19
Applied configuration of scale factor snb.interactive.250
null
Error during execution
java.lang.NullPointerException
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:165)
at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:133)
at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:341)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NullPointerException
at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:148)
... 8 more
Somehow it seems like there is a problem parsing the parameters file. Does anyone know something about this that could help?
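Judging from the log above, snb.interactive.250 is not among the 19 predefined scale factor configurations, so the lookup presumably comes back null and readConfig throws. A minimal params.ini sketch that sticks to a predefined scale factor, keeping the CsvComposite serializers from params-csv-composite.ini:
# params.ini -- sketch using a predefined scale factor (300 is the
# closest configuration above 250 in the list the generator prints)
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300
ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer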
~/git-projects/ldbc_snb_datagen_hadoop$ docker run -it --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen bash
root@5b0d687849c5:/opt/ldbc_snb_datagen# ll
total 136
drwxr-xr-x 11 root root 4096 Dec 6 11:31 ./
drwxr-xr-x 4 root root 4096 Dec 6 09:23 ../
drwxrwxrwx 2 root root 4096 Dec 1 15:46 .circleci/
drwxrwxrwx 8 root root 4096 Dec 1 15:46 .git/
-rwxrwxrwx 1 root root 661 Dec 1 15:46 .gitignore*
-rwxrwxrwx 1 root root 568 Dec 1 15:46 .project*
-rwxrwxrwx 1 root root 2475 Dec 1 15:46 CHANGELOG.txt*
-rwxrwxrwx 1 root root 344 Dec 1 15:46 Dockerfile*
-rwxrwxrwx 1 root root 11560 Dec 1 15:46 LICENSE.txt*
-rwxrwxrwx 1 root root 616 Dec 1 15:46 NOTICE.txt*
-rwxrwxrwx 1 root root 5808 Dec 1 15:46 README.md*
drwxrwxrwx 2 root root 4096 Dec 1 15:46 base-docker-image/
-rwxrwxrwx 1 root root 4614 Dec 1 15:46 check-md5sums-csv-basic.sh*
-rwxrwxrwx 1 root root 2609 Dec 1 15:46 check-md5sums-csv-composite-merge-foreign.sh*
-rwxrwxrwx 1 root root 4352 Dec 1 15:46 check-md5sums-csv-composite.sh*
-rwxrwxrwx 1 root root 2867 Dec 1 15:46 check-md5sums-csv-merge-foreign.sh*
-rwxrwxrwx 1 root root 612 Dec 1 15:46 check-md5sums-ttl.sh*
-rwxrwxrwx 1 root root 595 Dec 1 15:46 docker_run.sh*
-rwxrwxrwx 1 root root 860 Dec 1 15:46 generate-update-streams.sh*
-rwxrwxrwx 1 root root 4609 Dec 1 15:46 graphalytics-generate-old.sh*
drwxrwxrwx 1 root root 512 Dec 6 11:21 out/
drwxrwxrwx 2 root root 4096 Dec 1 15:46 paramgenerator/
-rwxrwxrwx 1 root root 494 Dec 1 16:26 params.ini*
-rwxrwxrwx 1 root root 3744 Dec 1 15:46 pom.xml*
-rwxrwxrwx 1 root root 1636 Dec 1 15:46 run.sh*
drwxrwxrwx 4 root root 4096 Dec 1 15:46 src/
drwxr-xr-x 9 root root 4096 Dec 6 09:24 target/
drwxrwxrwx 2 root root 4096 Dec 1 15:46 test/
-rwxrwxrwx 1 root root 577 Dec 1 15:46 test_params.ini*
drwxrwxrwx 4 root root 4096 Dec 1 15:46 tools/
Some CI/CD-specific files, git files, IDE-specific files, etc. We could get rid of them using a .dockerignore.
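A sketch of such a .dockerignore, based on the listing above (exactly which entries the build can spare is an assumption):
# .dockerignore -- keep the build context down to what mvn actually needs
.circleci/
.git/
.gitignore
.project
out/
target/
*.tar.zst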
Grab & extract Hadoop.
Set the HADOOP_HOME and HADOOP_CLIENT_OPTS environment variables in ~/.bashrc or a similar file.
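A sketch of these two steps, assuming Hadoop 3.2.1 (the version the build's hadoop-client dependency targets) and an arbitrary heap size:
#!/bin/bash
# Download and unpack Hadoop 3.2.1 into the home directory
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xzf hadoop-3.2.1.tar.gz -C "$HOME"

# Persist the environment variables (heap size is an example value)
cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME="$HOME/hadoop-3.2.1"
export HADOOP_CLIENT_OPTS="-Xmx8G"
export PATH="$HADOOP_HOME/bin:$PATH"
EOF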
Set a temp directory with ample free space in $HADOOP_HOME/etc/hadoop/core-site.xml (the property goes inside the <configuration> element):
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/path/to/dir</value>
    </property>
</configuration>
Generate data:
#!/bin/bash
set -eu
rm -rf social_network/
rm -f datagen.log
export HADOOP_CLIENT_OPTS="-Xmx900G"
# set serializer to be one of:
# - CsvBasic
# - CsvComposite
# - CsvMergeForeign
# - CsvCompositeMergeForeign
SERIALIZER=CsvMergeForeign
for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
echo "=> SF: ${SF}" | tee -a datagen.log
rm -rf /tmp/hadoop*
echo > params.ini
echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini
# dateformat
echo ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter >> params.ini
# no update streams, no serializers
echo ldbc.snb.datagen.parametergenerator.parameters:false >> params.ini
echo ldbc.snb.datagen.serializer.updateStreams:false >> params.ini
# serializers
echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.${SERIALIZER}DynamicActivitySerializer >> params.ini
echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.${SERIALIZER}DynamicPersonSerializer >> params.ini
echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.${SERIALIZER}StaticSerializer >> params.ini
./run.sh
cp params.ini social_network/
mv social_network/ social_network-${SERIALIZER}-sf${SF}
done
Make sure the filenames are correct.
Compress with:
export ZSTD_NBTHREADS=`nproc`
tar --zstd -cf social_network-<...>-sf${SF}.tar.zst social_network-<...>-sf${SF}/
Hi, sorry for disturbing you. I tried several times to follow the guidance in https://github.com/ldbc/ldbc_snb_datagen_hadoop/wiki/Spark, but it didn't work when I set the generator.dateFormatter property. Could you give something clearer about the parameters? Thank you very much.
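Note that the property name used elsewhere on this page is ldbc.snb.datagen.serializer.dateFormatter rather than generator.dateFormatter; a sketch for the Hadoop variant (whether the Spark wiki uses a different prefix is not something this page confirms):
# params.ini -- date formatter property as used by this repo's scripts
ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter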
v0.3.5 should be released (a version that's in sync with the driver/spec/interactive repository)
ldbc.snb.datagen.parametergenerator.parameters:true
I use Hadoop to generate data sets.
When I set ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1, the output contains a substitution_parameters folder and the updateStream_0_0_person.csv and updateStream_0_0_forum.csv files.
But when I set ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300, the substitution_parameters folder and the update files are missing.
I need to generate the parameter set.
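If the goal is to force both outputs on, the relevant switches are the same ones the generate-data script earlier on this page flips off; a sketch:
# params.ini -- explicitly enable parameter generation and update streams
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300
ldbc.snb.datagen.parametergenerator.parameters:true
ldbc.snb.datagen.serializer.updateStreams:true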
Running on a distributed Hadoop cluster.
************************************************
* Sorting update streams *
************************************************
23/03/23 10:31:15 INFO client.AHSProxy: Connecting to Application History server at hadoop011/10.21.33.11:13200
23/03/23 10:31:15 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
23/03/23 10:31:16 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1679366819090_0118
23/03/23 10:31:16 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hdfs/.staging/job_1679366819090_0118
Error during execution
Input path does not exist: hdfs://hadoop3-cluster/user/hdfs/hadoop/temp_updateStream_person_0_0
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://hadoop3-cluster/user/hdfs/hadoop/temp_updateStream_person_0_0
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:332)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at ldbc.snb.datagen.hadoop.serializer.HadoopUpdateStreamSorterAndSerializer.run(HadoopUpdateStreamSorterAndSerializer.java:130)
at ldbc.snb.datagen.LdbcDatagen.runGenerateJob(LdbcDatagen.java:204)
at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:344)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.100
ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer
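One thing to check, though this is only a guess from the missing path's name: the temp_updateStream_* files the sorting job reads would only exist if update stream serialization ran, so it may be worth setting that flag explicitly:
# params.ini -- assumption: the sort job's temp_updateStream_* inputs
# exist only when update stream serialization is on
ldbc.snb.datagen.serializer.updateStreams:true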
Problem:
Cannot build datagen-base image.
Suspected cause:
Missing Maven dependencies
To reproduce:
Start in the root directory of the project (branch: main)
Run:
cd base-docker-image/
docker build . --tag ldbc/datagen_base:
Tail of the execution log:
...
:: loading settings :: url = jar:file:/root/.m2/repository/org/walkmod/walkmod-cmd/3.0.0/walkmod-cmd-3.0.0.jar!/ivysettings.xml
:: loading settings :: url = jar:file:/root/.m2/repository/org/apache/ivy/ivy/2.4.0-rc1/ivy-2.4.0-rc1.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: org.walkmod#walkmod-pmd-plugin-caller;working
confs: [default]
:: resolution report :: resolve 11745ms :: artifacts dl 0ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.walkmod#walkmod-pmd-plugin;latest.integration
==== maven2: tried
http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom
-- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:
http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar
==== jboss: tried
-- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:
http://repository.jboss.com/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar
==== springsource-external: tried
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom
-- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar
==== springsource-release: tried
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom
-- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar
==== springio-release: tried
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom
-- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar
module not found: org.walkmod#walkmod-maven-plugin;latest.integration
==== maven2: tried
http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom
-- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:
http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar
==== jboss: tried
-- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:
http://repository.jboss.com/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar
==== springsource-external: tried
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom
-- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar
==== springsource-release: tried
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom
-- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar
==== springio-release: tried
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom
-- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar
module not found: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)
==== maven2: tried
http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom
-- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:
http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar
==== jboss: tried
-- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:
http://repository.jboss.com/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar
==== springsource-external: tried
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom
-- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:
http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar
==== springsource-release: tried
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom
-- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:
http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar
==== springio-release: tried
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom
-- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:
http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.walkmod#walkmod-pmd-plugin;latest.integration: not found
:: org.walkmod#walkmod-maven-plugin;latest.integration: not found
:: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0): not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/maven-metadata.xml
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/maven-metadata.xml
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/maven-metadata.xml
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/
SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
WARN [main] - unresolved dependency: org.walkmod#walkmod-pmd-plugin;latest.integration: not found
WARN [main] - unresolved dependency: org.walkmod#walkmod-maven-plugin;latest.integration: not found
WARN [main] - unresolved dependency: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0): not found
ERROR [main] - /opt/ldbc_snb_datagen/walkmod.xml is invalid. Please, execute walkmod with -e to see the details.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:25 min
[INFO] Finished at: 2021-12-06T09:47:56+00:00
[INFO] Final Memory: 21M/192M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.walkmod.maven.plugins:walkmod-maven-plugin:1.0.3:apply (default) on project ldbc_snb_datagen: Execution default of goal org.walkmod.maven.plugins:walkmod-maven-plugin:1.0.3:apply failed. NullPointerException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
Error: error building at STEP "RUN mvn -DskipTests clean assembly:assembly": error while running runtime: exit status 1
Due to open items like ldbc/ldbc_snb_datagen_spark#206 with dev, I am currently running the stable branch (commit hash d6620b9) on an 8-node Hadoop cluster (Hadoop 3.3.0, CentOS 7.5, Java 8). There seems to be some weird issue with the output folders: initially, the data generation seemed to succeed, but unfortunately the dynamic folder did not seem to have been produced. After trying some things, I set ldbc.snb.datagen.serializer.outputDir to /ldbc_dataset/sf1; now the generation consistently fails at the Person Serializer job with the stack trace below. I have checked Hadoop permissions, ownership and groups; nothing helped.
Do you see anything here which I can do?
2020-10-15 19:02:00,714 INFO mapreduce.Job: Task Id : attempt_1602777928494_0089_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:97)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:473)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:458)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1164)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1144)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1103)
at ldbc.snb.datagen.hadoop.writer.HdfsWriter.<init>(HdfsWriter.java:66)
at ldbc.snb.datagen.hadoop.writer.HdfsCsvWriter.<init>(HdfsCsvWriter.java:49)
at ldbc.snb.datagen.serializer.snb.csv.CsvSerializer.initialize(CsvSerializer.java:23)
at ldbc.snb.datagen.serializer.LdbcSerializer.initialize(LdbcSerializer.java:20)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:91)
... 8 more
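The cwd=file:/... part of the trace suggests the reducer resolved /ldbc_dataset/sf1 against the worker's local file system (ChecksumFileSystem) rather than HDFS. As a guess, not verified against the Hadoop variant's path handling, qualifying the output directory with an explicit scheme might help:
# params.ini -- hypothetical: force the intended file system with a full URI
ldbc.snb.datagen.serializer.outputDir:hdfs://<namenode>/ldbc_dataset/sf1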
Running Parameter Generation
Generating Interactive Parameters
Error during execution
Cannot run program "python2" (in directory "."): error=2, No such file or directory
Exception in thread "main" java.io.IOException: Cannot run program "python2" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at ldbc.snb.datagen.LdbcDatagen.runGenerateJob(LdbcDatagen.java:305)
at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:344)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 8 more
This exception occurs when serializing the static graph.
I don't know which directory it refers to.
Running on my MacBook Pro 2019 with 64 GB RAM.
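The interpreter name is read from a configuration property (the LdbcDatagen.java excerpt later on this page shows conf.get("ldbc.snb.datagen.parametergenerator.python")), so one workaround is to point it at a Python 2 binary that actually exists on the machine; the path below is only an example:
# params.ini -- point the parameter generator at an existing Python 2 binary
ldbc.snb.datagen.parametergenerator.python:/usr/local/bin/python2.7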
I'm currently generating "all combinations" of update streams and update partitions of powers of 2 and a few numbers in between: 1 2 4 8 16 24 32 48 64 96 128 192 256 384 512 768 1024.
When doing this exercise, it makes sense to turn off the parameter generator by removing the call in src/main/java/ldbc/snb/datagen/LdbcDatagen.java:
299: ProcessBuilder pb2 = new ProcessBuilder(conf.get("ldbc.snb.datagen.parametergenerator.python"), "paramgenerator/generateparams.py", "./", conf
300- .get("ldbc.snb.datagen.serializer.outputDir") + "/substitution_parameters");
301- pb2.directory(new File("./"));
302- File logInteractive = new File("parameters_interactive.log");
303- pb2.redirectErrorStream(true);
304- pb2.redirectOutput(ProcessBuilder.Redirect.appendTo(logInteractive));
305- Process p2 = pb2.start();
306- int r2 = p2.waitFor();
307- if (r2 != 0) {
308- throw new RuntimeException("Parameter generation failed with exit code " + r2);
309- }
There's a script to reproduce this. Note that if you include all combinations, this will literally take a month to run.
#!/bin/bash
set -eu
rm -rf social_network/
rm -f datagen.log
export HADOOP_CLIENT_OPTS="-Xmx900G"
for SF in 1 3 10 30 100 300 1000; do
echo "=> SF: ${SF}" | tee -a datagen.log
for NUMPART in 1 2 4 8 16 32 64 128 256 512 1024 24 48 96 192 384 768; do
echo "--> NUMPART: ${NUMPART}" | tee -a datagen.log
echo > params.ini
echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini
echo ldbc.snb.datagen.serializer.numUpdatePartitions:${NUMPART} >> params.ini
echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer >> params.ini
echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer >> params.ini
echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer >> params.ini
./run.sh
cp params.ini social_network/
# drop data files, we only need the update streams
rm -rf social_network/{static,dynamic}
mv social_network/ social_network-sf${SF}-numpart-${NUMPART}/
export ZSTD_NBTHREADS=`nproc`
tar --zstd -cf social_network-sf${SF}-numpart-${NUMPART}.tar.zst social_network-sf${SF}-numpart-${NUMPART}/
rm -rf social_network-sf${SF}-numpart-${NUMPART}/
done
done
Hi there 👋 Thanks for the work on this benchmark!
I'm currently attempting to use this benchmark to measure the correctness of some systems.
I'm not using the driver; instead, I'm generating the dataset directly and running the (interactive) queries manually.
I was wondering if you could provide any pointers on how to obtain a validation dataset?
I found this wiki page, but it seems to suggest that a validation dataset for the interactive queries should already exist somewhere:
It is the responsibility of the workload developer to also provide a validation dataset and validation operation set, which others can then use to test their database connector implementations.
Is there a place where I can find this dataset and operation set? Or can I generate it myself somehow?
I found this file that appears to be a validation operation set, but it is unclear what dataset this should be run on, or what scale factor was used to generate the required dataset.
Thanks in advance for the help.
I see that we can get the raw interactive SNB data from the ldbc_snb_datagen_spark repo. The raw data has columns like creationDate, deletionDate and explicitlyDeleted. So how can I get the raw data using ldbc_snb_datagen_hadoop?
Currently, the return value of the paramgen process (python2 script) is not checked. If it fails, the run should fail as well.
In the updateStream_0_26_forum.csv file of the dataset, when listing all updates related to post-70368991551179 through
$ grep -w 70368991551179 updateStream_0_26_forum.csv
1349790181752|1278281267146|2|6597069905389|70368991551179|1349790181752
1349790181752|1289052676356|6|70368991551179||1349790181752|24.32.177.170|Internet Explorer|uz|About Niki Lauda, er Formula One racing driver and three-time F1 World Champion. More recently an aviati|104|10995116536216|21990277706324|57|164
1349790237878|1291770543934|2|10995116326590|70368991551179|1349790237878
1349790272484|1327084360170|2|26388279332421|70368991551179|1349790272484
1349790380456|1322943558393|2|24189256307723|70368991551179|1349790380456
(Vertex post-70368991551179 doesn't appear in any other streams or in the initial snapshot.)
The first insert query related to post-70368991551179 is adding a like edge (INS2) to this post, and the second one is adding the post (INS6).
Although these two queries have the same timestamp, they will be issued by the driver sequentially.
So there will be an edge-insertion query targeting the post vertex before the vertex itself is added.
A similar problem was also found in the sf300 dataset when adding a like edge to a comment (INS3).
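A quick way to eyeball such ordering violations for any vertex id (a sketch; the column layout is inferred from the rows quoted above, with the third pipe-separated field as the event type):
#!/bin/bash
# Print every event that touches a given vertex id, in stream order,
# and flag any event that appears before the INS6 (add-post) event.
# Assumption: field 3 is the event type (2 = INS2 add-like, 6 = INS6 add-post).
ID=70368991551179
grep -wn "$ID" updateStream_0_26_forum.csv |
awk -F'|' '{
    split($1, a, ":")          # grep -n prefixes each line with "lineno:"
    if ($3 == 6) { seen = 1 }  # the vertex itself is inserted here
    else if (!seen) printf "line %s: type-%s event before INS6\n", a[1], $3
}'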