
ldbc_snb_datagen_hadoop's Issues

Transfer failed for https://repo.maven.org/maven2/junit/junit/4.12/junit-4.12.jar

Hi, the following error occurs when running run.sh:
Downloading from simulation: http://www.simulation.tudelft.nl/maven/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar
Downloading from simulation: http://www.simulation.tudelft.nl/maven/com/github/rwl/optimization/1.3/optimization-1.3.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:52 min
[INFO] Finished at: 2024-01-16T19:25:30+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project ldbc_snb_datagen: Could not resolve dependencies for project ldbc.snb.datagen:ldbc_snb_datagen:jar:1.0.0: The following artifacts could not be resolved: junit:junit:jar:4.12, org.hamcrest:hamcrest-core:jar:1.3, xerces:xercesImpl:jar:2.9.1, xml-apis:xml-apis:jar:1.3.04, xalan:xalan:jar:2.7.1, xalan:serializer:jar:2.7.1, org.jdom:jdom:jar:1.1.3, org.apache.hadoop:hadoop-client:jar:3.2.1, org.apache.hadoop:hadoop-common:jar:3.2.1, com.google.guava:guava:jar:27.0-jre, com.google.guava:failureaccess:jar:1.0, com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava, org.checkerframework:checker-qual:jar:2.5.2, com.google.errorprone:error_prone_annotations:jar:2.2.0, com.google.j2objc:j2objc-annotations:jar:1.1, org.codehaus.mojo:animal-sniffer-annotations:jar:1.17, commons-cli:commons-cli:jar:1.2, org.apache.httpcomponents:httpclient:jar:4.5.6, org.apache.httpcomponents:httpcore:jar:4.4.10, commons-codec:commons-codec:jar:1.11, commons-io:commons-io:jar:2.5, commons-net:commons-net:jar:3.6, commons-collections:commons-collections:jar:3.2.2, org.eclipse.jetty:jetty-servlet:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-security:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-webapp:jar:9.3.24.v20180605, org.eclipse.jetty:jetty-xml:jar:9.3.24.v20180605, javax.servlet.jsp:jsp-api:jar:2.1, com.sun.jersey:jersey-servlet:jar:1.19, commons-logging:commons-logging:jar:1.1.3, log4j:log4j:jar:1.2.17, commons-beanutils:commons-beanutils:jar:1.9.3, org.apache.commons:commons-configuration2:jar:2.1.1, org.apache.commons:commons-lang3:jar:3.7, org.apache.commons:commons-text:jar:1.4, org.slf4j:slf4j-api:jar:1.7.25, org.apache.avro:avro:jar:1.7.7, org.codehaus.jackson:jackson-core-asl:jar:1.9.13, org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13, com.thoughtworks.paranamer:paranamer:jar:2.3, org.xerial.snappy:snappy-java:jar:1.0.5, com.google.re2j:re2j:jar:1.1, com.google.protobuf:protobuf-java:jar:2.5.0, 
org.apache.hadoop:hadoop-auth:jar:3.2.1, com.nimbusds:nimbus-jose-jwt:jar:4.41.1, com.github.stephenc.jcip:jcip-annotations:jar:1.0-1, net.minidev:json-smart:jar:2.3, net.minidev:accessors-smart:jar:1.2, org.apache.curator:curator-framework:jar:2.13.0, org.apache.curator:curator-client:jar:2.13.0, org.apache.curator:curator-recipes:jar:2.13.0, com.google.code.findbugs:jsr305:jar:3.0.0, org.apache.htrace:htrace-core4:jar:4.1.0-incubating, org.apache.commons:commons-compress:jar:1.18, org.apache.kerby:kerb-simplekdc:jar:1.0.1, org.apache.kerby:kerb-client:jar:1.0.1, org.apache.kerby:kerby-config:jar:1.0.1, org.apache.kerby:kerb-core:jar:1.0.1, org.apache.kerby:kerby-pkix:jar:1.0.1, org.apache.kerby:kerby-asn1:jar:1.0.1, org.apache.kerby:kerby-util:jar:1.0.1, org.apache.kerby:kerb-common:jar:1.0.1, org.apache.kerby:kerb-crypto:jar:1.0.1, org.apache.kerby:kerb-util:jar:1.0.1, org.apache.kerby:token-provider:jar:1.0.1, org.apache.kerby:kerb-admin:jar:1.0.1, org.apache.kerby:kerb-server:jar:1.0.1, org.apache.kerby:kerb-identity:jar:1.0.1, org.apache.kerby:kerby-xdr:jar:1.0.1, com.fasterxml.jackson.core:jackson-databind:jar:2.9.8, com.fasterxml.jackson.core:jackson-core:jar:2.9.8, org.codehaus.woodstox:stax2-api:jar:3.1.4, com.fasterxml.woodstox:woodstox-core:jar:5.0.3, dnsjava:dnsjava:jar:2.1.7, org.apache.hadoop:hadoop-hdfs-client:jar:3.2.1, com.squareup.okhttp:okhttp:jar:2.7.5, com.squareup.okio:okio:jar:1.6.0, com.fasterxml.jackson.core:jackson-annotations:jar:2.9.8, org.apache.hadoop:hadoop-yarn-api:jar:3.2.1, javax.xml.bind:jaxb-api:jar:2.2.11, org.apache.hadoop:hadoop-yarn-client:jar:3.2.1, org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.2.1, org.apache.hadoop:hadoop-yarn-common:jar:3.2.1, javax.servlet:javax.servlet-api:jar:3.1.0, org.eclipse.jetty:jetty-util:jar:9.3.24.v20180605, com.sun.jersey:jersey-core:jar:1.19, javax.ws.rs:jsr311-api:jar:1.1.1, com.sun.jersey:jersey-client:jar:1.19, com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.8, 
com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.9.8, com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.9.8, org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.2.1, org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.2.1, org.apache.hadoop:hadoop-annotations:jar:3.2.1, ca.umontreal.iro:ssj:jar:2.5, jfree:jfreechart:jar:1.0.13, org.apache.mahout:mahout-collections:jar:1.0, jfree:jcommon:jar:1.0.16, colt:colt:jar:1.2.0, concurrent:concurrent:jar:1.3.4, com.github.rwl:optimization:jar:1.3, com.google.code.gson:gson:jar:2.2.4, org.codehaus.groovy:groovy:jar:2.1.6, antlr:antlr:jar:2.7.7, org.ow2.asm:asm-tree:jar:4.0, org.ow2.asm:asm-commons:jar:4.0, org.ow2.asm:asm:jar:4.0, org.ow2.asm:asm-util:jar:4.0, org.ow2.asm:asm-analysis:jar:4.0, org.codehaus.groovy:groovy-templates:jar:2.1.6, org.codehaus.groovy:groovy-xml:jar:2.1.6, org.codehaus.groovy:groovy-jsr223:jar:2.1.6, org.apache.commons:commons-math3:jar:3.4.1, org.roaringbitmap:RoaringBitmap:jar:0.6.18: Could not transfer artifact junit:junit:jar:4.12 from/to central (https://repo.maven.org/maven2): Transfer failed for https://repo.maven.org/maven2/junit/junit/4.12/junit-4.12.jar: Unknown host repo.maven.org -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
JAR does not exist or is not a normal file: /home/smc/software/ldbc_snb_datagen_hadoop/target/ldbc_snb_datagen-1.0.0-jar-with-dependencies.jar
I opened https://repo.maven.org/maven2 in a browser, but it cannot be reached either. How can I solve this problem? I look forward to your reply.
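One thing worth noting: the failing URL uses the host repo.maven.org, which does not resolve; Maven Central's canonical address is https://repo.maven.apache.org/maven2. A sketch of one common fix is a mirror entry in ~/.m2/settings.xml that redirects requests for central (the mirror id below is an arbitrary choice; any reachable Central mirror works):

```xml
<!-- ~/.m2/settings.xml -->
<settings>
  <mirrors>
    <mirror>
      <!-- redirect everything aimed at the "central" repository -->
      <id>central-https</id>
      <mirrorOf>central</mirrorOf>
      <url>https://repo.maven.apache.org/maven2</url>
    </mirror>
  </mirrors>
</settings>
```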

Error generating an SNB dataset with a custom scale factor

Hi!
I tried to follow the instructions to generate a dataset with a different scale factor (250) instead of 1. I copied the params-csv-composite.ini into params.ini and tried to follow https://github.com/ldbc/ldbc_snb_datagen_hadoop#pseudo-distributed-hadoop-node.
I changed HADOOP_CLIENT_OPTS to "-Xmx100G".

Running the bash script gives me the following:

[INFO] ------------------------------------------------------------------------
[INFO] Total time:  33.642 s
[INFO] Finished at: 2021-09-28T19:08:17Z
[INFO] ------------------------------------------------------------------------
Reading scale factors..
Available scale factor configuration set snb.interactive.0.1
Available scale factor configuration set snb.interactive.0.3
Available scale factor configuration set snb.interactive.1
Available scale factor configuration set snb.interactive.3
Available scale factor configuration set snb.interactive.10
Available scale factor configuration set snb.interactive.30
Available scale factor configuration set snb.interactive.100
Available scale factor configuration set snb.interactive.300
Available scale factor configuration set snb.interactive.1000
Available scale factor configuration set graphalytics.1
Available scale factor configuration set graphalytics.3
Available scale factor configuration set graphalytics.10
Available scale factor configuration set graphalytics.30
Available scale factor configuration set graphalytics.100
Available scale factor configuration set graphalytics.300
Available scale factor configuration set graphalytics.1000
Available scale factor configuration set graphalytics.3000
Available scale factor configuration set graphalytics.10000
Available scale factor configuration set graphalytics.30000
Number of scale factors read 19
Applied configuration of scale factor snb.interactive.250
null
Error during execution
java.lang.NullPointerException
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:165)
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:133)
        at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:341)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NullPointerException
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:148)
        ... 8 more

It seems there is a problem parsing the parameters file. Does anyone know something about this that could help?
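A hint from the log itself: the "Available scale factor configuration set" list enumerates the predefined SFs (0.1 through 1000 for snb.interactive), and 250 is not among them, which would explain the null configuration and the NullPointerException in ConfigParser.readConfig. A minimal params.ini using a predefined scale factor would look like:

```ini
# snb.interactive.250 is not a predefined configuration set;
# picking one from the list printed in the log (e.g. 300) avoids the NPE
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300
```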

Docker image includes a lot of unnecessary files

~/git-projects/ldbc_snb_datagen_hadoop$ docker run -it --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen bash
root@5b0d687849c5:/opt/ldbc_snb_datagen# ll
total 136
drwxr-xr-x 11 root root  4096 Dec  6 11:31 ./
drwxr-xr-x  4 root root  4096 Dec  6 09:23 ../
drwxrwxrwx  2 root root  4096 Dec  1 15:46 .circleci/
drwxrwxrwx  8 root root  4096 Dec  1 15:46 .git/
-rwxrwxrwx  1 root root   661 Dec  1 15:46 .gitignore*
-rwxrwxrwx  1 root root   568 Dec  1 15:46 .project*
-rwxrwxrwx  1 root root  2475 Dec  1 15:46 CHANGELOG.txt*
-rwxrwxrwx  1 root root   344 Dec  1 15:46 Dockerfile*
-rwxrwxrwx  1 root root 11560 Dec  1 15:46 LICENSE.txt*
-rwxrwxrwx  1 root root   616 Dec  1 15:46 NOTICE.txt*
-rwxrwxrwx  1 root root  5808 Dec  1 15:46 README.md*
drwxrwxrwx  2 root root  4096 Dec  1 15:46 base-docker-image/
-rwxrwxrwx  1 root root  4614 Dec  1 15:46 check-md5sums-csv-basic.sh*
-rwxrwxrwx  1 root root  2609 Dec  1 15:46 check-md5sums-csv-composite-merge-foreign.sh*
-rwxrwxrwx  1 root root  4352 Dec  1 15:46 check-md5sums-csv-composite.sh*
-rwxrwxrwx  1 root root  2867 Dec  1 15:46 check-md5sums-csv-merge-foreign.sh*
-rwxrwxrwx  1 root root   612 Dec  1 15:46 check-md5sums-ttl.sh*
-rwxrwxrwx  1 root root   595 Dec  1 15:46 docker_run.sh*
-rwxrwxrwx  1 root root   860 Dec  1 15:46 generate-update-streams.sh*
-rwxrwxrwx  1 root root  4609 Dec  1 15:46 graphalytics-generate-old.sh*
drwxrwxrwx  1 root root   512 Dec  6 11:21 out/
drwxrwxrwx  2 root root  4096 Dec  1 15:46 paramgenerator/
-rwxrwxrwx  1 root root   494 Dec  1 16:26 params.ini*
-rwxrwxrwx  1 root root  3744 Dec  1 15:46 pom.xml*
-rwxrwxrwx  1 root root  1636 Dec  1 15:46 run.sh*
drwxrwxrwx  4 root root  4096 Dec  1 15:46 src/
drwxr-xr-x  9 root root  4096 Dec  6 09:24 target/
drwxrwxrwx  2 root root  4096 Dec  1 15:46 test/
-rwxrwxrwx  1 root root   577 Dec  1 15:46 test_params.ini*
drwxrwxrwx  4 root root  4096 Dec  1 15:46 tools/

The image contains CI/CD-specific files, git internals, IDE-specific files, etc.

We could get rid of them using a .dockerignore file.
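A sketch of such a .dockerignore, covering the unnecessary entries visible in the listing above (exact set to be decided):

```
# version control and CI metadata
.git/
.circleci/
.gitignore
# IDE files
.project
# build output and mounted result directories
target/
out/
```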

Pre-generate LongDateFormatter variants

Grab & extract Hadoop.

Set the HADOOP_HOME and HADOOP_CLIENT_OPTS environment variables in the ~/.bashrc or similar file.

Set a temp directory with ample free space:

$HADOOP_HOME/etc/hadoop/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/path/to/dir</value>
</property>

Generate data:

#!/bin/bash

set -eu

rm -rf social_network/
rm -f datagen.log

export HADOOP_CLIENT_OPTS="-Xmx900G"

# set serializer to be one of:
# - CsvBasic
# - CsvComposite
# - CsvMergeForeign
# - CsvCompositeMergeForeign
SERIALIZER=CsvMergeForeign

for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    echo "=> SF: ${SF}" | tee -a datagen.log

    rm -rf /tmp/hadoop*

    echo > params.ini
    echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini

    # dateformat
    echo ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter >> params.ini

    # no update streams, no serializers
    echo ldbc.snb.datagen.parametergenerator.parameters:false >> params.ini
    echo ldbc.snb.datagen.serializer.updateStreams:false >> params.ini

    # serializers
    echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.${SERIALIZER}DynamicActivitySerializer >> params.ini
    echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.${SERIALIZER}DynamicPersonSerializer >> params.ini
    echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.${SERIALIZER}StaticSerializer >> params.ini

    ./run.sh
    cp params.ini social_network/

    mv social_network/ social_network-${SERIALIZER}-sf${SF}
done

Make sure the filenames are correct.

Compress with:

export ZSTD_NBTHREADS=`nproc`
tar --zstd -cf social_network-<...>-sf${SF}.tar.zst social_network-<...>-sf${SF}/

SF300 does not generate the parameter set

ldbc.snb.datagen.parametergenerator.parameters:true

I use Hadoop to generate data sets.
When I set ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1, the output has a substitution_parameters folder and the updateStream_0_0_person.csv and updateStream_0_0_forum.csv files.

But when I set ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300, the substitution_parameters folder and the update stream files are missing.

I need to generate the parameter set.
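For reference, a params.ini that explicitly enables both outputs (the flag names appear elsewhere on this page; whether they resolve the SF300 behavior is an assumption, not a confirmed fix):

```ini
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.300
# explicitly request substitution parameters and update streams
ldbc.snb.datagen.parametergenerator.parameters:true
ldbc.snb.datagen.serializer.updateStreams:true
```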

exception: Input path does not exist

Running on a distributed Hadoop cluster.

Error log:

************************************************
* Sorting update streams  *
************************************************
23/03/23 10:31:15 INFO client.AHSProxy: Connecting to Application History server at hadoop011/10.21.33.11:13200
23/03/23 10:31:15 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
23/03/23 10:31:16 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1679366819090_0118
23/03/23 10:31:16 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hdfs/.staging/job_1679366819090_0118
Error during execution
Input path does not exist: hdfs://hadoop3-cluster/user/hdfs/hadoop/temp_updateStream_person_0_0
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://hadoop3-cluster/user/hdfs/hadoop/temp_updateStream_person_0_0
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:332)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:396)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
        at ldbc.snb.datagen.hadoop.serializer.HadoopUpdateStreamSorterAndSerializer.run(HadoopUpdateStreamSorterAndSerializer.java:130)
        at ldbc.snb.datagen.LdbcDatagen.runGenerateJob(LdbcDatagen.java:204)
        at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:344)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:232)

params.ini

ldbc.snb.datagen.generator.scaleFactor:snb.interactive.100

ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer
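One hedged workaround, if update streams are not actually needed for this run: the failure occurs while sorting update streams from a temp file that was never produced, and the property below (shown in another issue on this page) skips that stage entirely. Whether it addresses the root cause on a distributed cluster is an assumption.

```ini
# skip update stream serialization, and with it the failing sort job
ldbc.snb.datagen.serializer.updateStreams:false
```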

Problem building datagen-base image

Problem:
Cannot build datagen-base image.

Suspected cause:
Missing maven dependencies

To reproduce:
Start in the root directory of the project (branch: main)
Run:

cd base-docker-image/
docker build . --tag ldbc/datagen_base:

Tail of the execution log:


...

:: loading settings :: url = jar:file:/root/.m2/repository/org/walkmod/walkmod-cmd/3.0.0/walkmod-cmd-3.0.0.jar!/ivysettings.xml
:: loading settings :: url = jar:file:/root/.m2/repository/org/apache/ivy/ivy/2.4.0-rc1/ivy-2.4.0-rc1.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: org.walkmod#walkmod-pmd-plugin-caller;working
        confs: [default]
:: resolution report :: resolve 11745ms :: artifacts dl 0ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
                module not found: org.walkmod#walkmod-pmd-plugin;latest.integration

        ==== maven2: tried

          http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:

          http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar

        ==== jboss: tried

          -- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:

          http://repository.jboss.com/maven2/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar

        ==== springsource-external: tried

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar

        ==== springsource-release: tried

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar

        ==== springio-release: tried

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-pmd-plugin;latest.integration!walkmod-pmd-plugin.jar:

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-pmd-plugin/[revision]/walkmod-pmd-plugin-[revision].jar

                module not found: org.walkmod#walkmod-maven-plugin;latest.integration

        ==== maven2: tried

          http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:

          http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar

        ==== jboss: tried

          -- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:

          http://repository.jboss.com/maven2/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar

        ==== springsource-external: tried

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar

        ==== springsource-release: tried

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar

        ==== springio-release: tried

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-maven-plugin;latest.integration!walkmod-maven-plugin.jar:

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-maven-plugin/[revision]/walkmod-maven-plugin-[revision].jar

                module not found: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)

        ==== maven2: tried

          http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:

          http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar

        ==== jboss: tried

          -- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:

          http://repository.jboss.com/maven2/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar

        ==== springsource-external: tried

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:

          http://repository.springsource.com/maven/bundles/external/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar

        ==== springsource-release: tried

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:

          http://repository.springsource.com/maven/bundles/release/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar

        ==== springio-release: tried

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].pom

          -- artifact org.walkmod#walkmod-javalang-plugin;[2.0, 4.0)!walkmod-javalang-plugin.jar:

          http://repo.spring.io/libs-release-remote/org/walkmod/walkmod-javalang-plugin/[revision]/walkmod-javalang-plugin-[revision].jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::          UNRESOLVED DEPENDENCIES         ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: org.walkmod#walkmod-pmd-plugin;latest.integration: not found

                :: org.walkmod#walkmod-maven-plugin;latest.integration: not found

                :: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0): not found

                ::::::::::::::::::::::::::::::::::::::::::::::


:::: ERRORS
        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/maven-metadata.xml

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-pmd-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/maven-metadata.xml

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-maven-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/maven-metadata.xml

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/

        SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/walkmod/walkmod-javalang-plugin/


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
 WARN [main] - unresolved dependency: org.walkmod#walkmod-pmd-plugin;latest.integration: not found
 WARN [main] - unresolved dependency: org.walkmod#walkmod-maven-plugin;latest.integration: not found
 WARN [main] - unresolved dependency: org.walkmod#walkmod-javalang-plugin;[2.0, 4.0): not found
ERROR [main] - /opt/ldbc_snb_datagen/walkmod.xml is invalid. Please, execute walkmod with -e to see the details.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:25 min
[INFO] Finished at: 2021-12-06T09:47:56+00:00
[INFO] Final Memory: 21M/192M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.walkmod.maven.plugins:walkmod-maven-plugin:1.0.3:apply (default) on project ldbc_snb_datagen: Execution default of goal org.walkmod.maven.plugins:walkmod-maven-plugin:1.0.3:apply failed. NullPointerException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
Error: error building at STEP "RUN mvn -DskipTests clean assembly:assembly": error while running runtime: exit status 1

[stable branch] Person Serializer failing with `Mkdirs failed to create` exception

Due to open items like ldbc/ldbc_snb_datagen_spark#206 on the dev branch, I am currently running the stable branch (commit d6620b9) on an 8-node Hadoop cluster (Hadoop 3.3.0, CentOS 7.5, Java 8). There seems to be a weird issue with the output folders: initially, data generation seemed to succeed, but unfortunately the dynamic folder was not produced. After trying some things, I set ldbc.snb.datagen.serializer.outputDir to /ldbc_dataset/sf1; since then, generation consistently fails at the Person Serializer job with the stack trace below. I have checked Hadoop permissions, ownership, and groups; nothing helped.

Do you see anything here that I can do?

2020-10-15 19:02:00,714 INFO mapreduce.Job: Task Id : attempt_1602777928494_0089_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
        at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:97)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:473)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:458)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1164)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1144)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1103)
        at ldbc.snb.datagen.hadoop.writer.HdfsWriter.<init>(HdfsWriter.java:66)
        at ldbc.snb.datagen.hadoop.writer.HdfsCsvWriter.<init>(HdfsCsvWriter.java:49)
        at ldbc.snb.datagen.serializer.snb.csv.CsvSerializer.initialize(CsvSerializer.java:23)
        at ldbc.snb.datagen.serializer.LdbcSerializer.initialize(LdbcSerializer.java:20)
        at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:91)
        ... 8 more
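One observation, offered as a guess rather than a confirmed diagnosis: the `cwd=file:/mydata4/hadoop/yarn/local/...` in the trace suggests the reducer resolved the output directory against the local filesystem of the worker node instead of HDFS. An explicit HDFS URI in params.ini might rule that out:

```ini
# hypothetical: force the output dir to be interpreted as an HDFS path
ldbc.snb.datagen.serializer.outputDir:hdfs:///ldbc_dataset/sf1
```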

Cannot run program "python2" (in directory "."): error=2, No such file or directory

Running Parameter Generation
Generating Interactive Parameters
Error during execution
Cannot run program "python2" (in directory "."): error=2, No such file or directory
Exception in thread "main" java.io.IOException: Cannot run program "python2" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at ldbc.snb.datagen.LdbcDatagen.runGenerateJob(LdbcDatagen.java:305)
at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:344)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 8 more

This exception occurs when serializing the static graph.
I don't know which directory it refers to.
Running on my MacBook Pro 2019 with 64 GB RAM.
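For the `Cannot run program "python2"` error, the parameter generator shells out to an interpreter named by the `ldbc.snb.datagen.parametergenerator.python` property (visible in the `LdbcDatagen.java` excerpt below). A hedged sketch of two possible fixes; the interpreter paths are examples for your environment, not prescribed values:

```shell
# Option A: expose an interpreter under the name "python2" on PATH.
# (This falls back to python3 only so the sketch is runnable; the
# parameter generator itself requires Python 2.)
PYBIN="$(command -v python2.7 || command -v python2 || command -v python3)"
ln -sf "$PYBIN" ./python2
export PATH="$PWD:$PATH"

# Option B: point datagen at an explicit interpreter path via params.ini
echo "ldbc.snb.datagen.parametergenerator.python:/usr/bin/python2.7" >> params.ini
```

On recent macOS there is no system `python2` at all, so installing Python 2 (or redirecting the property to an installed interpreter) is required either way.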

Generate update streams

I'm currently generating "all combinations" of update streams, with update-partition counts of powers of 2 and a few numbers in between: 1 2 4 8 16 24 32 48 64 96 128 192 256 384 512 768 1024.

When doing this exercise, it makes sense to turn off the parameter generator by removing this call in src/main/java/ldbc/snb/datagen/LdbcDatagen.java (around line 299):

ProcessBuilder pb2 = new ProcessBuilder(conf.get("ldbc.snb.datagen.parametergenerator.python"), "paramgenerator/generateparams.py", "./", conf
        .get("ldbc.snb.datagen.serializer.outputDir") + "/substitution_parameters");
pb2.directory(new File("./"));
File logInteractive = new File("parameters_interactive.log");
pb2.redirectErrorStream(true);
pb2.redirectOutput(ProcessBuilder.Redirect.appendTo(logInteractive));
Process p2 = pb2.start();
int r2 = p2.waitFor();
if (r2 != 0) {
    throw new RuntimeException("Parameter generation failed with exit code " + r2);
}

There's a script to reproduce this. Note that if you include all combinations, this will literally take a month to run.

#!/bin/bash

set -eu

rm -rf social_network/
rm -f datagen.log

export HADOOP_CLIENT_OPTS="-Xmx900G"

for SF in 1 3 10 30 100 300 1000; do
    echo "=> SF: ${SF}" | tee -a datagen.log

    for NUMPART in 1 2 4 8 16 32 64 128 256 512 1024 24 48 96 192 384 768; do
        echo "--> NUMPART: ${NUMPART}" | tee -a datagen.log

        echo > params.ini
        echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini
        echo ldbc.snb.datagen.serializer.numUpdatePartitions:${NUMPART} >> params.ini
        echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer >> params.ini
        echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer >> params.ini
        echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer >> params.ini

        ./run.sh
        cp params.ini social_network/
        # drop data files, we only need the update streams
        rm -rf social_network/{static,dynamic}
        mv social_network/ social_network-sf${SF}-numpart-${NUMPART}/

        export ZSTD_NBTHREADS=`nproc`
        tar --zstd -cf social_network-sf${SF}-numpart-${NUMPART}.tar.zst social_network-sf${SF}-numpart-${NUMPART}/
        rm -rf social_network-sf${SF}-numpart-${NUMPART}/
    done

done

Validation datasets

Hi there 👋 Thanks for the work on this benchmark!

I'm currently attempting to use this benchmark to measure the correctness of some systems.
I'm not using the driver; instead, I'm generating the dataset directly and running the (interactive) queries manually.
I was wondering if you could provide any pointers on how to obtain a validation dataset?

I found this wiki page, but it seems to suggest that a validation dataset for the interactive queries should already exist somewhere:

It is the responsibility of the workload developer to also provide a validation dataset and validation operation set, which others can then use to test their database connector implementations.

Is there a place where I can find this dataset and operation set? Or can I generate it myself somehow?

I found this file that appears to be a validation operation set, but it is unclear what dataset this should be run on, or what scale factor was used to generate the required dataset.

Thanks in advance for the help.

Reversed order of add post (INS6) and add like to post (INS2)

In the updateStream_0_26_forum.csv file of the dataset, when listing all updates related to post-70368991551179 through

$ grep -w 70368991551179 updateStream_0_26_forum.csv
1349790181752|1278281267146|2|6597069905389|70368991551179|1349790181752
1349790181752|1289052676356|6|70368991551179||1349790181752|24.32.177.170|Internet Explorer|uz|About Niki Lauda, er Formula One racing driver and three-time F1 World Champion. More recently an aviati|104|10995116536216|21990277706324|57|164
1349790237878|1291770543934|2|10995116326590|70368991551179|1349790237878
1349790272484|1327084360170|2|26388279332421|70368991551179|1349790272484
1349790380456|1322943558393|2|24189256307723|70368991551179|1349790380456

(Vertex post-70368991551179 doesn't appear in any other streams and the initial snapshot.)

The first insert query related to post-70368991551179 is adding a like edge (INS2) to this post, and the second one is adding the post (INS6).

Though these two queries have the same timestamp, they will be issued by the driver sequentially.

So an add-edge query referencing the post vertex will be issued before the vertex itself is added.

A similar problem is also found in the sf300 dataset when adding a like edge to a comment (INS3).
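A quick way to scan a stream file for this class of violation: the awk sketch below records every post id introduced by an INS6 and flags any INS2 that arrives first. The field positions are inferred from the sample rows above, so treat them as an assumption.

```shell
# Flag INS2 rows (add like to post, type 2, post id in field 5) whose post
# id has not yet appeared in an INS6 row (add post, type 6, post id in field 4).
awk -F'|' '
    $3 == 6 { seen[$4] = 1 }                   # INS6: remember the new post id
    $3 == 2 && !($5 in seen) { print NR, $5 }  # INS2 before its post: report it
' updateStream_0_26_forum.csv
```

Any output line is a like edge issued before its post vertex; run it against each updateStream_*_forum.csv to find all occurrences in a partition.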
