moiot / gravity
A Data Replication Center
License: Apache License 2.0
The argument to the current_timestamp() function is lost after parsing. Waiting for a response on pingcap/parser#186.
The link describing how MySQL needs to be configured is broken here:
https://github.com/moiot/gravity/blob/master/docs/2.0/01-quick-start-en.md
Can you let me know what it should point to, so I can get started?
As titled, thx
In the [[input.config.table-configs]] and [[output.config.routes]] sections, when database order_1 has many tables (e.g. a through g) and I only want to synchronize tables a, b, c, and d, I currently need to write four route sections. I'd like the configuration simplified to something like this:
[[output.config.routes]]
match-schema = "order_1"
match-table = ["a","b","c","d"]
target-schema = "OrderDB"
target-table = ["a","b","c","d"]
The mysql_replace engine's performance on huge transactions full of deletions (e.g. DELETE FROM big_table) is bad: on our sandbox TiDB cluster, about 5k rows/sec with 200 connections.
Add a new SQL engine that supports INSERT IGNORE INTO; it's very useful for fixing data.
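A minimal sketch of the statement such an engine would generate (this is illustrative, not gravity's actual engine code): rows whose key already exists are silently skipped instead of replaced, which is what makes it useful for repairing data.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInsertIgnore builds a multi-row INSERT IGNORE statement with
// placeholders. It is a sketch of what a new SQL engine could emit.
func buildInsertIgnore(table string, cols []string, nRows int) string {
	// One "(?,?,...)" group per row.
	ph := "(" + strings.TrimRight(strings.Repeat("?,", len(cols)), ",") + ")"
	rows := make([]string, nRows)
	for i := range rows {
		rows[i] = ph
	}
	return fmt.Sprintf("INSERT IGNORE INTO `%s` (`%s`) VALUES %s",
		table, strings.Join(cols, "`,`"), strings.Join(rows, ","))
}

func main() {
	fmt.Println(buildInsertIgnore("t", []string{"id", "name"}, 2))
	// INSERT IGNORE INTO `t` (`id`,`name`) VALUES (?,?),(?,?)
}
```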
Create _gravity.gravity_txn_tags in the input plugin.
When a unique key changes, we should guarantee the event sequence.
When doing bi-directional replication, there will be conflicts when the primary key is auto-increment (generated by MySQL).
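A standard MySQL-level mitigation (not gravity-specific) is to give each side disjoint auto-increment sequences, so the two nodes can never generate the same key:

```sql
-- On node A: generate 1, 3, 5, ...
SET GLOBAL auto_increment_increment = 2, auto_increment_offset = 1;
-- On node B: generate 2, 4, 6, ...
SET GLOBAL auto_increment_increment = 2, auto_increment_offset = 2;
```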
There is a WAL archiving tool for PostgreSQL, written in Go, here:
https://github.com/wal-g/wal-g
It does not require any add-ons to PostgreSQL, but it needs a little work to integrate. Is this something the team is interested in? I can work on the integration.
Refactor metrics according to our plugin architecture. The framework should collect TPS and latency for input, filter, scheduler, output, and end-to-end.
Use English as the default documentation language, and use the binary distribution directly in the quick start @ming-relax
Push the Helm chart into a public repo, so that other people can install a gravity cluster using a single helm command @Ryan-Git
Push the Grafana JSON dashboard definition into the repo @ming-relax
If I stop gravity for a while and then restart replication, the following error appears:
ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.
According to http://mysql.taobao.org/monthly/2016/01/08/, this may be the cause; does gravity handle this internally?
Because MySQL-to-TiDB replication is asynchronous, gravity may go down or lag for other reasons, and in some cases I want to know when the data was written to TiDB, e.g. to query the incremental data.
So I need a special column on every table in the target database to record that time. In the current version, if I restart gravity it cannot synchronize the data. I suggest adding a special column to every table during synchronization, e.g. _loadtime:
alter table t add column _loadtime datetime(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6);
Then every table in the target database would automatically gain a column such as _loadtime recording when the data reached the target, and replication would work normally after restarting gravity. The best way to handle this, I think, is a dedicated configuration option: turn it on when you need it, off otherwise, with OFF as the default.
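Such a switch might look like this in the pipeline config (the option name is hypothetical, not an existing gravity setting):

```toml
[output.config]
# Hypothetical option: when true, add a _loadtime column to every target
# table that records when each row arrived in the target database.
add-load-time-column = false
```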
Gravity version:
./gravity -V
Release Version (gravity): 0.0.1+git.
Git Commit Hash:
Git Branch:
UTC Build Time: 2019-03-12 12:21:13
The version may be 0.9.27; I cloned the master branch and compiled it myself.
Error log:
{"file":"mysql_table_scanner.go","level":"info","line":185,"msg":"[LoopInBatch] prepare current: [{Value:{String:1467178125170318497 Valid:true} Type: Column:id}]","pipeline":"crm_mysql2tidb","time":"2019/03/14 11:20:32.051"}
{"file":"cache.go","level":"fatal","line":63,"msg":"[defaultPositionCache] ticker flush failed: mysqlbatch.BatchPositionValueV1.TableStates: mysqlbatch.TableStatsV1.Done: ScannedCount: EstimatedRowCount: Current: Min: []mysqlbatch.TablePosition: [MapString] unknown type: int32, column: Id\n/usr/local/go/src/github.com/moiot/gravity/pkg/position_store/cache.go:229: \n/usr/local/go/src/github.com/moiot/gravity/pkg/position_store/cache.go:169: ","pipeline":"crm_mysql2tidb","time":"2019/03/14 11:20:32.694"}
The configuration file looks like this:
# MySQL to TiDB full + incremental replication
name="crm_mysql2tidb"
[input]
type = "mysql"
mode = "replication"
[input.config]
nr-scanner = 10
table-scan-batch = 10000
batch-per-second-limit = 1
max-full-dump-count = 10000
[input.config.source]
host = "172.16.1.86"
username = "_gravity"
password = "gravity"
port = 3310
[input.config.source-slave]
host = "172.16.1.87"
username = "_gravity"
password = "gravity"
port = 3310
# Tables to scan
[[input.config.table-configs]]
schema = "yjp_broker"
table = "*"
[[input.config.table-configs]]
schema = "yjp_office"
table = "*"
[[input.config.table-configs]]
schema = "yjp_czbank"
table = "*"
# Target databases/tables:
[output]
type = "mysql"
[output.config.target]
host = "172.17.14.19"
username = "root"
password = "yijiupi"
port = 4000
[output.config]
enable-ddl=true
[[output.config.routes]]
match-schema = "yjp_broker"
match-table = ""
target-schema = "BrokerDB"
target-table = ""
[[output.config.routes]]
match-schema = "yjp_office"
match-table = ""
target-schema = "OfficeDB"
target-table = ""
[[output.config.routes]]
match-schema = "yjp_czbank"
match-table = ""
target-schema = "BankDB"
target-table = ""
On TiDB I just created the databases, like this:
create database BrokerDB;
create database OfficeDB;
create database BankDB;
Then I start gravity:
/usr/local/gravity/gravity -config=/usr/local/gravity/scm_mysql2tidb.toml -http-addr=172.17.14.22:8091 -log-file=/usr/local/gravity/scm.log -L=info
and the table is defined like this:
mysql> show create table brokerregionordersync\G
*************************** 1. row ***************************
       Table: brokerregionordersync
Create Table: CREATE TABLE `brokerregionordersync` (
  `id` bigint(20) NOT NULL COMMENT '主键',
  `orderid` bigint(20) NOT NULL COMMENT '订单ID',
  `orderitemid` bigint(20) NOT NULL COMMENT '订单项ID',
  `city_id` int(11) NOT NULL COMMENT '城市id',
  `state` tinyint(4) NOT NULL COMMENT '订单同步状态 1 下单 2 完成 3 取消 4 退货',
  `regionbrokerid` int(11) DEFAULT NULL COMMENT '专区经纪人id',
  `crmchannelid` bigint(20) DEFAULT NULL COMMENT 'CRM频道id',
  `salaryratio` decimal(4,2) DEFAULT NULL COMMENT '业绩系数',
  `returnorderid` bigint(20) DEFAULT NULL COMMENT '退货订单ID',
  `returnorderitemid` bigint(20) DEFAULT NULL COMMENT '退货订单项ID',
  `createtime` datetime NOT NULL COMMENT '创建时间',
  `lastmodifytime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最后修改时间',
  `salarybrokerid` int(11) DEFAULT NULL COMMENT '业绩经纪人id',
  PRIMARY KEY (`id`),
  KEY `ix_orderid` (`orderid`) USING BTREE COMMENT '订单id',
  KEY `ix_regionbrokerid` (`regionbrokerid`) USING BTREE COMMENT '经纪人id',
  KEY `ix_city_id` (`city_id`) USING BTREE COMMENT '城市id',
  KEY `ix_orderitemid` (`orderitemid`) USING BTREE COMMENT '订单项id',
  KEY `ix_returnorderitemid` (`returnorderitemid`) USING BTREE COMMENT '退货单项id'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='经纪人多专区订单同步表'
1 row in set (0.00 sec)
Conclusion:
I guess the MySQL data type (bigint unsigned) is not mapped correctly in the gravity source code, hence the error: unknown type: int32, column: Id.
Please check.
Initial design: https://github.com/moiot/gravity/blob/master/docs/rfc_schema_storage.md
We can consider using https://github.com/bytewatch/ddl-executor
Currently one working set is used for all table buffers. Since messages in different table buffers cannot conflict, each table buffer should have its own working set.
MySQL decimal(M,N) values replicated to Kafka come out unreadable.
The original data:
ExactPayAmount: 142.00
OrderAmount: 142.00
PayableAmount: 142.00
ReduceAmount: 0.00
UseCouponAmount: 0.00
GiveBonusAmount: 0.00
After gravity writes it to Kafka:
{
"version": "0.1",
"database": "order_3",
"table": "orders",
"type": "insert",
"ts": 1553479471,
"time_zone": "Asia/Shanghai",
"host": "",
"data": {
"Address_Id": "1293184",
"City": "太原市",
"City_Id": "704",
"Contact": "郭生",
"CostWineScore": "[48 46 48 48]",
"County": "小店区",
"CreateTime": "2018-10-10 15:24:41",
"CreateUserId": "408501",
"DeliveryMode": "0",
"DetailAddress": "御龙庭底*****",
"ExactPayAmount": "[49 52 50 46 48 48]",
"GiveBonusAmount": "[48 46 48 48]",
"GiveCouponAmount": "[48 46 48 48]",
"GiveWineScore": "[48 46 48 48]",
"HasPayment": "0",
"Id": "704118101015312437",
"IsTestOrder": "0",
"LastModifyTime": "2019-01-27 00:05:00.1",
"LastUpdateTime": "2018-10-12 10:19:00",
"LastUpdateUserId": "891",
"OrderAmount": "[49 52 50 46 48 48]",
"OrderAttachedGiftRule_Id": null,
"OrderAuditTime": null,
"OrderBatch_Id": "704418101015312436",
"OrderCancelTime": "2018-10-12 10:19:00",
"OrderClassify": "0",
"OrderCompleteTime": null,
"OrderConfirmTime": "2018-10-10 15:24:41",
"OrderCreateTime": "2018-10-10 15:24:41",
"OrderDeliverTime": null,
"OrderNo": "704828300340",
"OrderPrintTime": null,
"OrderReducationRule_Id": null,
"OrderSyncTime": null,
"OrderType": "0",
"PayType": "0",
"PayableAmount": "[49 52 50 46 48 48]",
"PaymentState": null,
"Phone": "1873457****",
"ProductReduceAmount": "[48 46 48 48]",
"Province": "山西省",
"QrCodeFileId": null,
"ReduceAmount": "[48 46 48 48]",
"RemarkService": null,
"RemarkUser": "",
"SalesmanId": "762",
"State": "8",
"StateUser": "7",
"Street": "平阳路西二巷",
"SysRemark": "",
"UseAwardBonusAmount": null,
"UseBonusAmount": "[48 46 48 48]",
"UseCouponAmount": "[48 46 48 48]",
"UseCouponCodeAmount": "[48 46 48 48]",
"UseRewardBonusAmount": "[48 46 48 48]",
"UserCancelReason": null,
"UserCompanyName": "生鲜超市****",
"UserId": "408501",
"UserMobileNo": "1873457****",
"UserName": "郭*生",
"Verison": "1.0.0",
"ZipCode": null,
"area_id": null,
"onlineSalesmanDeptId": null,
"onlineSalesmanId": null,
"orderItemCount": "1",
"pickup_warehouse_Id": "7041",
"salesmanDeptId": "2",
"selfPickUpReduceAmount": "[48 46 48 48]",
"shop_id": null
},
"old": {},
"pks": {
"Id": "704118101015312437"
}
}
The unreadable values:
"ExactPayAmount": "[49 52 50 46 48 48]",
"GiveBonusAmount": "[48 46 48 48]",
"GiveCouponAmount": "[48 46 48 48]",
"GiveWineScore": "[48 46 48 48]",
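Those arrays look like a Go []byte marshaled to JSON as an array of numbers: each element is the ASCII code of one character of the decimal string. A minimal decoding sketch:

```go
package main

import "fmt"

func main() {
	// [49 52 50 46 48 48] are the ASCII codes of '1' '4' '2' '.' '0' '0':
	// the decimal was serialized as raw bytes instead of as a string.
	raw := []byte{49, 52, 50, 46, 48, 48}
	fmt.Println(string(raw)) // prints "142.00"
}
```

So the fix is presumably to convert decimal values to strings before JSON-encoding the Kafka message.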
We should consider starting a snapshot-consistent read.
Friends told me that gravity has a front-end for configuring tasks visually.
But I couldn't find steps in the repo on how to use it. I'd like to check whether gravity has such a front-end and, if it exists, how to deploy and use it.
I just want to synchronize specific databases and tables out of thousands of tables, like this:
# Tables to scan (required)
[[input.config.table-configs]]
schema = "yjp_productsku_1"
table = "productsku,productinfo,ProductSkuSalePolicy"
[[input.config.table-configs]]
schema = "yjp_productsku_2"
table = "productsku,ProductSkuSalePolicy"
[[input.config.table-configs]]
schema = "yjp_prodcutsku_3"
table = "productsku,ProductSkuSalePolicy"
[output]
type = "mysql"
[output.config.target]
host = "172.17.14.19"
username = "root"
password = "xyz"
port = 4000
[output.config]
enable-ddl=true
# Merge sharded databases/tables
[[output.config.routes]]
match-schema = "jp_productsku_1"
match-table = "ProductSku"
target-schema = "ProductDB"
target-table = ""
[[output.config.routes]]
match-schema = "jp_productsku_1"
match-table = "ProductSkuSalePolicy"
target-schema = "ProductDB"
target-table = ""
[[output.config.routes]]
match-schema = "jp_productsku_2"
match-table = "ProductSku"
target-schema = "ProductDB"
target-table = ""
[[output.config.routes]]
match-schema = "jp_productsku_2"
match-table = "ProductSkuSalePolicy"
target-schema = "ProductDB"
target-table = ""
[[output.config.routes]]
match-schema = "jp_productsku_3"
match-table = "ProductSku"
target-schema = "ProductDB"
target-table = ""
[[output.config.routes]]
match-schema = "jp_productsku_3"
match-table = "ProductSkuSalePolicy"
target-schema = "ProductDB"
target-table = ""
# Global tables
[[output.config.routes]]
match-schema = "jp_productsku_1"
match-table = "ProductInfo"
target-schema = "ProductDB"
target-table = "ProductInfo"
However, the logs and monitoring show that it scans all of the database's tables.
How should I write the configuration file so that only the databases and tables I want are synchronized, for both historical and incremental data, and the others are not?
When replicating MongoDB to TiDB, I need to use the value of the _id field of every MongoDB collection as the primary key; in TiDB I can define it as VARCHAR(32) to store it. However, in the latest version it appears like:
\kԸ���
or:
output exec error: Error 1366: incorrect utf8 value 5c9452256c6aac201560b38d(\\ufffdR%lj\ufffd \u0015`\ufffd\ufffd) for column _id\n
I guess that when reading data from MongoDB, gravity does not stringify the _id field; it just passes the raw value of:
ObjectId("5c949127edbfd32f36eea4cd")
instead of 5c949127edbfd32f36eea4cd.
Compatible with old configuration
Update documentation
Batch mode should exit on success
gravity-operator should create a Job when in batch mode
Support mongo to tidb
First of all, thanks to the author for open-sourcing this; I have a question here. I want to use gravity in a data-migration scenario.
Do I have to create the same databases and tables on the target as on the source before replication can run?
Could you also share the WeChat group number, to make later learning and discussion easier?
Thanks.
I want to replicate from MongoDB to TiDB/MySQL, but I don't know how to configure it. Could you provide an example?
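A sketch of what such a pipeline config might look like, modeled on the MySQL examples elsewhere on this page; the mongo input option names here are assumptions, so check them against the docs:

```toml
name = "mongo2tidb"

[input]
type = "mongo"        # assumption: name of the mongo input plugin
mode = "replication"

[input.config.source]
host = "127.0.0.1"
port = 27017
username = "_gravity"
password = "gravity"

[output]
type = "mysql"

[output.config.target]
host = "127.0.0.1"
port = 4000
username = "root"
password = ""
```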
Match specific DDL statement
Use a hint in SQL to tag gravity's internal DDL, so that DDL can be synchronized bidirectionally.
Requirement 1: with two data sources A and B, DDL on A must be replicated to B, and DDL on B to A. For now this can be worked around with a filter:
[[filters]]
type = "reject"
match-dml-op = ["insert", "update", "delete"]
But the write-loop problem remains: DDL generated on A is replicated to B via the binlog, but the binlog that B itself writes as a result is replicated back to A, and the B->A link then fails with conflicts.
Yubai's suggestion is to tag the SQL that gravity executes with a hint; when the B->A link processes B's binlog, it filters on that hint and skips SQL executed by gravity, breaking the write loop.
Requirement 2: distinguish DDL types at a finer granularity: CREATE TABLE, ALTER TABLE ADD, ALTER TABLE DROP, ALTER TABLE MODIFY, and so on. Provide a matcher that filters replication to only the specified DDL types.
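For requirement 2, such a matcher might look like this (match-ddl-op is hypothetical, named by analogy with the existing match-dml-op, not an actual gravity option):

```toml
[[filters]]
type = "reject"
# Hypothetical: reject only these DDL statement types on this link.
match-ddl-op = ["alter-table-drop", "alter-table-modify"]
```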
Problem:
There is a Go library, written by the Go team, that lets Go code use different message queues and stores interchangeably.
It would make it easier for gravity to support different queues and stores, which is exactly its intent.
Message queues:
https://github.com/google/go-cloud/blob/master/pubsub/pubsub.go
There is no Kafka driver yet, but it has NATS.
Blob stores (like S3):
https://github.com/google/go-cloud/tree/master/blob
I would like to add NATS and MinIO.
I think this is easy to do with the existing code base, and it is 100% Go.
NATS is easy.
MinIO is also easy and has SQL support.
Do you also use Benthos for the transformations that customize the data?
Do you have a plan to support transferring data from MySQL and TiDB to Elasticsearch?
Elasticsearch is much better suited than MySQL or TiDB for full-text indexing.
I would like to add Oracle DB as a source.
Is the team ok with this ?
As of 2019-02-27, gravity does not support the DROP TABLE DDL statement. I wish it were supported; since it is rarely used, it could be disabled by default.
First of all, thanks to the author for open-sourcing this. I want to use gravity in a migration scenario.
According to the manual, the target must be created with the same databases and table structures as the source; is there a way to avoid doing this?
If convenient, please share a WeChat or QQ group for learning and discussion.
Thanks, ^_^
If deployed in cluster mode, how can I distribute different toml configs to different instances?
E.g., if I have 10 different mysql2mysql.toml config files and deploy a cluster with 10 instances, how can each instance serve exactly one toml, so there is no duplicate sync?
Replication did not succeed; the messages are as follows:
[mysql] 2019/01/05 17:26:05 packets.go:36: read tcp 10.0.1.4:46838->10.0.1.5:3380: i/o timeout
[mysql] 2019/01/05 17:26:37 packets.go:36: read tcp 10.0.1.4:47064->10.0.1.5:3380: i/o timeout
gravity version:
./gravity -V
Release Version (gravity): 0.0.1+git.dc0de6e
Git Commit Hash: dc0de6e
Git Branch: master
UTC Build Time: 2019-03-18 02:49:05
error message:
{"file":"batch_table_scheduler.go","level":"fatal","line":211,"msg":"[batchScheduler] output exec error: Error 1406: Data too long for column 'AuditRemark' at row 1\n/usr/local/go/src/github.com/moiot/gravity/pkg/sql_execution_engine/mysql_replace_engine.go:101: \n/usr/local/go/src/github.com/moiot/gravity/pkg/outputs/mysql/mysql.go:336: ","pipeline":"crm_mysql2tidb","time":"2019/03/18 10:59:01.054"}
The AuditRemark column's data type is varchar(1024).
desc yjp_broker.brokerdisplaycontractcyclerecord;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
| CycleId | int(11) | NO | | NULL | |
| Location | varchar(255) | YES | | NULL | |
| Longitude | double(13,10) | YES | | NULL | |
| Latitude | double(13,10) | YES | | NULL | |
| ImgId | varchar(128) | YES | | NULL | |
| UploadTime | datetime | YES | | NULL | |
| State | tinyint(4) | YES | | NULL | |
| CheckTime | datetime | YES | | NULL | |
| CreateTime | datetime | NO | | NULL | |
| CreateUser_Id | int(11) | NO | | NULL | |
| LastUpdateTime | datetime | YES | | NULL | |
| LastUpdateUser_Id | int(11) | YES | | NULL | |
| AuditRemark | varchar(1024) | YES | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
14 rows in set (0.01 sec)
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test
Database changed
mysql> CREATE TABLE t(id BIGINT,remark VARCHAR(255));
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t(id,remark)values(1,REPEAT('汉',255));
Query OK, 1 row affected (0.00 sec)
mysql> select CHARACTER_LENGTH(remark),length(remark) from t;
+--------------------------+----------------+
| CHARACTER_LENGTH(remark) | length(remark) |
+--------------------------+----------------+
| 255 | 765 |
+--------------------------+----------------+
1 row in set (0.00 sec)