NebulaGraph Exchange

NebulaGraph Exchange (referred to as Exchange) is an Apache Spark™ application that migrates data in bulk from different sources to NebulaGraph in a distributed way. It supports a variety of batch and streaming data sources and also allows writing to NebulaGraph through side-loading (SST files).

Exchange supports Spark 2.2, 2.4, and 3.0, each with its own toolkit: nebula-exchange_spark_2.2, nebula-exchange_spark_2.4, and nebula-exchange_spark_3.0.

Note:

  • Exchange 3.4.0 does not support Apache Kafka or Apache Pulsar. To load data from Apache Kafka or Apache Pulsar into NebulaGraph, use Exchange 3.0.0, 3.3.0, or 3.5.0 for now.
  • This repo covers only NebulaGraph 2.x and 3.x. For NebulaGraph v1.x, use NebulaGraph Exchange v1.0.

Build or Download Exchange

  1. Build the latest Exchange

    $ git clone https://github.com/vesoft-inc/nebula-exchange.git
    $ cd nebula-exchange
    $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.2 -am -Pscala-2.11 -Pspark-2.2
    $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_2.4 -am -Pscala-2.11 -Pspark-2.4
    $ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_3.0 -am -Pscala-2.12 -Pspark-3.0

    After packaging, the newly generated JAR files can be found in the following paths:

    • nebula-exchange/nebula-exchange_spark_2.2/target/ contains nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar
    • nebula-exchange/nebula-exchange_spark_2.4/target/ contains nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar
    • nebula-exchange/nebula-exchange_spark_3.0/target/ contains nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar
  2. Download from the GitHub artifact

    Released Version:

    GitHub Releases or Downloads

    Snapshot Version:

    GitHub Actions Artifacts
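
The three build commands in step 1 differ only in the module name and the Scala profile. As a minimal sketch, the helper below prints the matching command for one Spark line. The function name exchange_build_cmd is hypothetical and not part of the repository, and the function only echoes the command instead of running Maven.

```shell
# Hypothetical helper (not part of the repo): print the Maven build command
# from step 1 for a single Spark line. Nothing is executed here.
exchange_build_cmd() {
  _spark="$1"
  case "$_spark" in
    2.2|2.4) _scala="2.11" ;;  # the Spark 2.x modules are built with -Pscala-2.11
    3.0)     _scala="2.12" ;;  # the Spark 3.0 module is built with -Pscala-2.12
    *) echo "unsupported Spark line: $_spark" >&2; return 1 ;;
  esac
  echo "mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true -pl nebula-exchange_spark_${_spark} -am -Pscala-${_scala} -Pspark-${_spark}"
}

# Example: show the command for the Spark 3.0 toolkit.
exchange_build_cmd 3.0
```

Run the printed command from the nebula-exchange directory to build only the toolkit you need.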

Get Started

Here is an example command to run the Exchange:

$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf

When the source is Hive, add the -h flag:

$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar -c /path/to/application.conf -h

Run Exchange on YARN in cluster mode:

$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master yarn --deploy-mode cluster \
--files application.conf \
--conf spark.driver.extraClassPath=./ \
--conf spark.executor.extraClassPath=./ \
nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \
-c application.conf

Note: When using Exchange to generate SST files, set spark.sql.shuffle.partitions with --conf for Spark's shuffle operation:

$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master local \
--conf spark.sql.shuffle.partitions=200 \
nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar \
-c application.conf
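
The local-mode commands above share the same skeleton, and the SST case only adds one --conf entry. The sketch below assembles that command as a dry run. The function name exchange_submit_cmd and its argument order are assumptions made for illustration; the flags themselves are the ones shown in the commands above.

```shell
# Hypothetical dry-run helper: print a local-mode spark-submit command.
# Usage: exchange_submit_cmd JAR CONF [SHUFFLE_PARTITIONS]
exchange_submit_cmd() {
  _jar="$1"
  _conf="$2"
  _parts="${3:-}"
  # Single quotes keep $SPARK_HOME literal so the printed command stays portable.
  _cmd='$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local'
  if [ -n "$_parts" ]; then
    # Only added when a partition count is given, as in the SST example.
    _cmd="$_cmd --conf spark.sql.shuffle.partitions=$_parts"
  fi
  echo "$_cmd $_jar -c $_conf"
}

# Example: the SST-generation command with 200 shuffle partitions.
exchange_submit_cmd nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar application.conf 200
```

Printing the command first makes it easy to review the options before submitting the job.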

For more details, refer to the NebulaGraph Exchange Docs.

Version Compatibility Matrix

Here is the version correspondence between Exchange and NebulaGraph:

Exchange Version                            NebulaGraph Version  Spark Version
nebula-exchange-2.0.0.jar                   2.0.0, 2.0.1         2.4.*
nebula-exchange-2.0.1.jar                   2.0.0, 2.0.1         2.4.*
nebula-exchange-2.1.0.jar                   2.0.0, 2.0.1         2.4.*
nebula-exchange-2.5.0.jar                   2.5.0, 2.5.1         2.4.*
nebula-exchange-2.5.1.jar                   2.5.0, 2.5.1         2.4.*
nebula-exchange-2.5.2.jar                   2.5.0, 2.5.1         2.4.*
nebula-exchange-2.6.0.jar                   2.6.0, 2.6.1         2.4.*
nebula-exchange-2.6.1.jar                   2.6.0, 2.6.1         2.4.*
nebula-exchange-2.6.2.jar                   2.6.0, 2.6.1         2.4.*
nebula-exchange-2.6.3.jar                   2.6.0, 2.6.1         2.4.*
nebula-exchange_spark_2.2-3.x.x.jar         3.x.x                2.2.*
nebula-exchange_spark_2.4-3.x.x.jar         3.x.x                2.4.*
nebula-exchange_spark_3.0-3.x.x.jar         3.x.x                3.0.*, 3.1.*, 3.2.*, 3.3.*
nebula-exchange_spark_2.2-3.0-SNAPSHOT.jar  nightly              2.2.*
nebula-exchange_spark_2.4-3.0-SNAPSHOT.jar  nightly              2.4.*
nebula-exchange_spark_3.0-3.0-SNAPSHOT.jar  nightly              3.0.*, 3.1.*, 3.2.*, 3.3.*
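
For Exchange 3.x, the matrix reduces to a lookup from the Spark version to a toolkit name. The sketch below encodes only the 3.x rows. The function name exchange_toolkit_for_spark is hypothetical, and the Spark version string is assumed to be supplied by the caller.

```shell
# Hypothetical helper: map a full Spark version (e.g. 3.2.1) to the Exchange 3.x
# toolkit name, following the 3.x rows of the compatibility matrix.
exchange_toolkit_for_spark() {
  case "$1" in
    2.2.*)                   echo "nebula-exchange_spark_2.2" ;;
    2.4.*)                   echo "nebula-exchange_spark_2.4" ;;
    3.0.*|3.1.*|3.2.*|3.3.*) echo "nebula-exchange_spark_3.0" ;;
    *) echo "no Exchange 3.x toolkit listed for Spark $1" >&2; return 1 ;;
  esac
}

# Example: Spark 3.2.1 falls under the Spark 3.0 toolkit.
exchange_toolkit_for_spark 3.2.1
```

Versions outside the matrix return a non-zero status, so a wrapper script can stop before submitting a job with a mismatched JAR.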

Feature History

  1. Since 2.0: Exchange can import vertex data with either String or Integer type IDs.
  2. Since 2.0: Exchange supports importing data of various types, including Null, Date, DateTime (using UTC instead of local time), and Time.
  3. Since 2.0: In addition to Hive on Spark, Exchange can import data from other Hive sources.
  4. Since 2.0: If failures occur during data import, Exchange supports recording and retrying the failed INSERT statements.
  5. Since 2.5: Exchange supports SST import, but property default values are not yet supported.
  6. Since 3.0: Exchange is compatible with Spark 2.2, Spark 2.4, and Spark 3.0.

Refer to application.conf as an example to edit the configuration file.
