bytedance / bitsail

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.

Home Page: https://bytedance.github.io/bitsail/

License: Apache License 2.0

Java 99.49% Shell 0.13% TypeScript 0.23% SCSS 0.04% Dockerfile 0.07% Python 0.05%
flink big-data data-integration data-lake data-pipeline data-synchronization high-performance real-time

bitsail's Introduction


Introduction

BitSail is ByteDance's open source data integration engine, built on a distributed architecture for high performance. It supports data synchronization between multiple heterogeneous data sources and provides a global data integration solution for batch, streaming, and incremental scenarios. It currently serves almost all business lines at ByteDance, such as Douyin and Toutiao, and synchronizes hundreds of trillions of records every day.

Official website of BitSail: https://bytedance.github.io/bitsail/

Why Do We Use BitSail

BitSail is widely used and handles traffic on the order of hundreds of trillions of records. It has also been verified in various scenarios, such as the cloud-native environment of Volcano Engine and on-premises private cloud environments.

We have accumulated a lot of experience and made a number of optimizations to improve data integration:

  • Global Data Integration, covering batch, streaming and incremental scenarios

  • Distributed and cloud-native architecture, supporting horizontal scaling

  • High maturity in terms of accuracy, stability and performance

  • Rich basic functions, such as type conversion, dirty data processing, flow control, data lake integration, and automatic parallelism calculation

  • Task running status monitoring, such as traffic, QPS, dirty data, latency, etc.

BitSail Use Scenarios

  • Mass data synchronization in heterogeneous data sources

  • Unified streaming and batch data processing

  • Unified data lake and data warehouse processing

  • High-performance, highly reliable data synchronization

  • Distributed, cloud-native data integration engine

Features of BitSail

  • Low start-up cost and high flexibility

  • Unified stream-batch and data lake-warehouse architecture; one framework covers almost all data synchronization scenarios

  • High-performance, massive data processing capabilities

  • DDL automatic synchronization

  • Type system that converts between the types of different data sources

  • Engine-independent read and write interfaces, keeping development cost low

  • Real-time display of task progress (under development)

  • Real-time monitoring of task status

Architecture of BitSail

Source[Input Sources] -> Framework[Data Transmission] -> Sink[Output Sinks]

The data processing pipeline is as follows: Input Sources pull the source data, the intermediate framework layer processes it, and Output Sinks write the data to the target.

At the framework layer, we provide rich functions that take effect in all synchronization scenarios, such as dirty data collection, automatic parallelism calculation, and task monitoring.

In terms of data synchronization scenarios, it covers batch, streaming, and incremental data synchronization.

At the runtime layer, it supports multiple execution modes, such as YARN and local; Kubernetes support is under development.
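The Source -> Framework -> Sink flow described above can be sketched in a few lines of Java. This is only a minimal illustration and not BitSail's actual connector API: the class `PipelineSketch`, its `run` method, and the `Result` holder are hypothetical names, and the "framework layer" is reduced to a single conversion step plus dirty data collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical illustration of a source -> framework -> sink flow; not BitSail's real interfaces. */
final class PipelineSketch {

  /** Counters that a framework layer could expose for task monitoring. */
  static final class Result {
    final long written;
    final List<String> dirtyRecords;

    Result(long written, List<String> dirtyRecords) {
      this.written = written;
      this.dirtyRecords = dirtyRecords;
    }
  }

  /**
   * Pulls raw records from the source, converts each one in the framework step,
   * and hands the converted value to the sink. Records that fail conversion are
   * collected as dirty data instead of failing the whole run.
   */
  static <T> Result run(Iterator<String> source, Function<String, T> convert, Consumer<T> sink) {
    long written = 0;
    List<String> dirty = new ArrayList<>();
    while (source.hasNext()) {
      String raw = source.next();
      T value;
      try {
        value = convert.apply(raw);
      } catch (RuntimeException e) {
        dirty.add(raw); // dirty data collection
        continue;
      }
      sink.accept(value);
      written++;
    }
    return new Result(written, dirty);
  }

  public static void main(String[] args) {
    List<Integer> target = new ArrayList<>();
    Function<String, Integer> convert = Integer::parseInt;
    Consumer<Integer> sink = target::add;
    Result result = run(Arrays.asList("1", "2", "oops", "4").iterator(), convert, sink);
    System.out.println("written=" + result.written + ", dirty=" + result.dirtyRecords);
  }
}
```

Keeping conversion failures in a separate list, rather than throwing, mirrors the idea that dirty data handling belongs to the framework layer and applies regardless of which source or sink is plugged in.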

Supported Connectors

DataSource        Sub Modules
Assert            -
ClickHouse        -
Doris             -
Druid             -
Elasticsearch     -
Fake              -
FTP/SFTP          -
Hadoop            -
HBase             -
Hive              -
Hudi              -
LocalFileSystem   -
JDBC              MySQL, Oracle, PostgreSQL, SqlServer
Kafka             -
Kudu              -
LarkSheet         -
MongoDB           -
Print             -
Redis             -
RocketMQ          -
SelectDB          -

Documentation for Connectors.

Community Support

Slack

Join BitSail Slack channel via this link

Mailing List

Currently, the BitSail community uses Google Groups as the mailing list provider. You need to subscribe to the mailing list before starting a conversation.

Subscribe: Email to this address [email protected]

Start a conversation: Email to this address [email protected]

Unsubscribe: Email to this address [email protected]

WeChat Group

Scan this QR code to join the WeChat group chat.

Environment Setup

Link to Environment Setup.

Deployment Guide

Link to Deployment Guide.

BitSail Configuration

Link to Configuration Guide.

Contributing Guide

Link to Contributing Guide.

Contributors

Thanks to all contributors.

License

Apache 2.0 License.

bitsail's People

Contributors

aozeyu, ayonel, beyond-up, blockliu, dongliang-0, garyli1019, healchow, hityangfei, hk-lrzy, humengyu2012, jake-00, jliao07, john8628, klaus1995, kyle-hawk, lfyzjck, lichang-bd, liugddx, liumengkai, liuxiaocs7, love-star, lujg, manymango, niuxiangqian, qiluo-bd, xiamidavid00, yanghuaigit, ysamchu, yucongcong654, zeliu


bitsail's Issues

[BitSail][Connector] Support Iceberg sink connector

Is your feature request related to a problem? Please describe

Iceberg is a popular table format for analytics, often used with object stores such as S3 and OSS. BitSail currently lacks the related capabilities, especially writing to Iceberg.

Describe the solution you'd like

An Iceberg sink with object store support.

Describe alternatives you've considered

N/A

Additional context

You can refer to the Flink implementation of the Iceberg connector.

[BitSail][Connector] Support PB format serializer in Kafka Source

Is your feature request related to a problem? Please describe

Currently, the legacy Kafka source only supports the JSON format, and we want to support the PB (Protocol Buffers) format as well.

Describe the solution you'd like

We need to move com.bytedance.bitsail.dump.datasink.file.parser.PbBytesParser to a common place so that it can be reused by other connectors, such as the Kafka source.


[BitSail][Core] Unified stream and batch data parsing module, such as BytesParser

Is your feature request related to a problem? Please describe

At present, the BytesParser implementations for the batch channel and the stream channel are separate and need to be unified.


[Connector-JDBC] MariaDB driver does not support Integer.MIN_VALUE when setting the fetch size
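
For context on this report: MySQL Connector/J treats `Statement.setFetchSize(Integer.MIN_VALUE)` as a request to stream rows one at a time, which is a driver-specific convention that other drivers need not honor. The sketch below shows one possible guard that picks a positive fetch size when the MariaDB driver is in use. It is an assumption-laden illustration, not BitSail's actual fix: `FetchSizeSketch`, `effectiveFetchSize`, and the fallback value of 1000 are hypothetical.

```java
/** Hypothetical guard for driver-specific fetch size handling; not taken from BitSail. */
final class FetchSizeSketch {

  /** Assumed fallback batch size used when the streaming sentinel is not accepted. */
  static final int FALLBACK_FETCH_SIZE = 1000;

  /**
   * Returns the fetch size to pass to Statement.setFetchSize for the given driver.
   * Integer.MIN_VALUE is kept for drivers that understand it as "stream results"
   * and replaced by a positive value for the MariaDB driver.
   */
  static int effectiveFetchSize(String driverClassName, int requested) {
    boolean mariaDb = driverClassName != null && driverClassName.startsWith("org.mariadb.");
    if (mariaDb && requested == Integer.MIN_VALUE) {
      return FALLBACK_FETCH_SIZE;
    }
    return requested;
  }

  public static void main(String[] args) {
    System.out.println(effectiveFetchSize("org.mariadb.jdbc.Driver", Integer.MIN_VALUE));
    System.out.println(effectiveFetchSize("com.mysql.cj.jdbc.Driver", Integer.MIN_VALUE));
  }
}
```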


[BitSail][Connector] Migrate Kafka legacy connector to V1 interface

Is your feature request related to a problem? Please describe

We want to migrate the legacy Kafka connector to the V1 interface, similar to #98 for Redis.


[BitSail][Core] Support real-time display of task progress

Is your feature request related to a problem? Please describe

The real-time progress can be derived from the split completion status and the task's original topology information.
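
As a rough illustration of the idea, progress could be expressed as the fraction of splits that have completed. The sketch below assumes the total split count is known from the task topology; `SplitProgressSketch` and `progress` are hypothetical names, and nothing here reflects how BitSail actually tracks splits.

```java
/** Hypothetical progress calculation from split completion counts; not BitSail's implementation. */
final class SplitProgressSketch {

  /**
   * Returns the completed fraction in the range [0.0, 1.0].
   * A task with no known splits reports 0.0, and out-of-range counts are clamped.
   */
  static double progress(int completedSplits, int totalSplits) {
    if (totalSplits <= 0) {
      return 0.0;
    }
    int done = Math.max(0, Math.min(completedSplits, totalSplits));
    return (double) done / totalSplits;
  }

  public static void main(String[] args) {
    System.out.println(progress(3, 12));
  }
}
```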


[BitSail][Connector] Support local file source connector

Is your feature request related to a problem? Please describe

We need to read local files as a source. The CSV file format could be a good start.

Describe the solution you'd like

Use the V1 source connector interface to read a CSV file as the source and write it to PrintSink.

[bugfix] KerberosAuthenticator#getUgi error

Describe the bug

KerberosAuthenticator.getUgi does not execute UserGroupInformation#loginUserFromKeytabAndReturnUGI, even though Kerberos parameters are passed.

To Reproduce

In my test, securityConfiguration.get(KerberosOptions.KERBEROS_ENABLE) is always false because it is never assigned anywhere. The following should be added:

securityConf.set(KerberosOptions.KERBEROS_ENABLE, true);

There is also a second problem: UserGroupInformation#loginUserFromKeytabAndReturnUGI should be executed in HadoopSecurityModule#login, but the value returned is UserGroupInformation.getCurrentUser(), so securityModule#login has no effect.
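
The control flow the reporter is asking for can be sketched without a Hadoop dependency. In the example below, plain strings stand in for `UserGroupInformation` objects, `keytabLogin` stands in for `UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)`, and `currentUser` stands in for `UserGroupInformation.getCurrentUser()`. The class `KerberosLoginSketch` and its `login` method are hypothetical and are not BitSail's code.

```java
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Kerberos login decision; strings replace real UGI objects. */
final class KerberosLoginSketch {

  /**
   * Returns the identity that later calls should run as. When Kerberos is enabled
   * and credentials are present, the identity produced by the keytab login is
   * returned directly instead of falling back to the current user.
   */
  static String login(boolean kerberosEnabled,
                      String principal,
                      String keytabPath,
                      BiFunction<String, String, String> keytabLogin,
                      Supplier<String> currentUser) {
    if (kerberosEnabled && principal != null && keytabPath != null) {
      return keytabLogin.apply(principal, keytabPath);
    }
    return currentUser.get();
  }

  public static void main(String[] args) {
    String ugi = login(true, "user@EXAMPLE.COM", "/path/to/user.keytab",
        (p, k) -> "kerberos:" + p, () -> "current-user");
    System.out.println(ugi);
  }
}
```

The point of the sketch is only that the enable flag must actually be set for the keytab branch to run, and that the branch must return its own result.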


[BitSail][Core] Support metrics collection based on Prometheus

Is your feature request related to a problem? Please describe

Currently only LogMetricReporter is supported.


mvn clean package got compilation failure

I checked out the master branch and ran
mvn clean package -D maven.test.skip=true
but got the following compilation failure:
/bitsail/bitsail-connectors/bitsail-connectors-legacy/bitsail-connector-ftp/src/main/java/com/bytedance/bitsail/connector/legacy/ftp/source/FtpInputFormat.java:[75,91] incompatible types: java.lang.String cannot be converted to com.bytedance.bitsail.common.type.TypeInfoConverter
The code at line 75 was attached as a screenshot.

[BitSail][Connector] Support Pulsar sink connector


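Support custom SQL for Hive reads

Will the Hive reader support custom SQL, as the JDBC reader does?
Synchronizing Hive tables across different databases can involve large data volumes, so syncing whole tables is not really practical.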

[BitSail][Connector] Migrate hadoop legacy connector to V1 interface


Javahost dependency doesn't work on latest JDK (11+)

It uses reflection on an InetAddress field that no longer exists:

final var cacheField = InetAddress.class.getDeclaredField("addressCache");
cacheField.setAccessible(true);
final var addressCache = cacheField.get(InetAddress.class);

var clazz = addressCache.getClass();
final var cacheMapField = clazz.getDeclaredField("cache");
cacheMapField.setAccessible(true);
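
One way to check the report on a given JDK is to look the field up by name and see whether the lookup fails. The probe below is a hypothetical helper (`InetAddressCacheProbe` is not part of Javahost or BitSail). It only calls `getDeclaredField` and never calls `setAccessible`, so it does not attempt to open the field for access.

```java
import java.net.InetAddress;

/** Hypothetical probe that checks whether InetAddress declares a field with a given name. */
final class InetAddressCacheProbe {

  /** Returns true if java.net.InetAddress declares a field called {@code name} on this JDK. */
  static boolean hasDeclaredField(String name) {
    try {
      InetAddress.class.getDeclaredField(name);
      return true;
    } catch (NoSuchFieldException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("java.specification.version = "
        + System.getProperty("java.specification.version"));
    System.out.println("addressCache present: " + hasDeclaredField("addressCache"));
  }
}
```

A library that depends on such internals could run a check like this at startup and fail with a clear message instead of a reflection error.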

[BitSail][Connector] Support Pulsar source connector


[BitSail][Doc] Change LarkSheet_connector to LarkSheet connector in connector introduction


[BitSail][Connector] Support S3 sink connector

Is your feature request related to a problem? Please describe

Currently only HDFS storage is supported; this could be extended to object storage such as S3.


[BitSail][Doc] Add Kudu Connector in ReadMe


[BitSail][Test] Add FTP/SFTP test container in bitsail-test


MariaDB driver fails to access MySQL

Running a MySQL reader job throws a SQL driver exception (attached as a screenshot).

The driver name in the source code is mariadb. Is it fully compatible for connecting to MySQL?

Is this problem caused by the MySQL version, or is there another reason?

Found some typos in the README


[BitSail][Test] Add Hbase test container in bitsail-test


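Requirement for a binlog-to-Hudi pipeline

Does BitSail currently support a binlog connector? Looking through the code, there is nothing related apart from MysqlBinlogEventTimeExtractor. Has it just not been open sourced yet?

Our current binlog -> Kafka -> Hudi architecture suffers from out-of-order messages when Kafka partitions are expanded. Can BitSail offer a solution to this problem?

Also, does BitSail have a discussion group?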

[BitSail][Runtime] Support multiple versions of Flink (1.13, 1.14, 1.15, 1.16)

Is your feature request related to a problem? Please describe

Currently we only support Flink 1.11. We want to support multiple Flink versions and be able to compile against each of them with a different Maven profile. Some Flink APIs changed between versions, so we need to find a way to resolve this.


[BitSail][Connector] Migrate HBase legacy connector to V1 interface

Is your feature request related to a problem? Please describe

Migrate the HBase legacy connector to the V1 interface. We did the same migration for Redis in #98.


Is Flink 1.13 supported?


[BitSail][Doc] Building the project home page based on GitHub Pages


[BitSail][Runtime] Support SourceReader DirtyCollector & Messenger

Is your feature request related to a problem? Please describe

We now support the source reader for the unified architecture in BitSail. In legacy mode, we already support the messenger and the dirty collector, which provide the rate limiting and dirty data collection features.

We want to support these features in SourceReader as well.


[BitSail][Connector] Support clickhouse sink connector


[BitSail][Doc] Add BitSail Logo to the introduction section in ReadMe


[BitSail][Connector] Migrate Redis legacy connector to V1 interface


[BitSail][Core] Unified stream and batch hive row builder module

Is your feature request related to a problem? Please describe

At present, the row builders for the batch channel and the stream channel are implemented separately and need to be unified.


[BitSail][Connector] Support Druid sink connector


[BitSail][Connector] Support Local file sink connector


[BitSail][Connector] Support Druid sink connector


[BitSail][Runtime] Support latest JDK (11+)


[BitSail][Doc] Add Elasticsearch Connector in ReadMe


[BitSail][Connector] Migrate MongoDB legacy connector to V1 interface

Is your feature request related to a problem? Please describe

Migrate the MongoDB connector to the V1 interface. We did the same for Redis in #98.

