pingcap / tiflow
This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
License: Apache License 2.0
Is your feature request related to a problem? Please describe:
Currently a user can't detect whether a replication task is running normally.
Describe the feature you'd like:
TiCDC should collect replication status and provide a convenient way to query it; the status may include
Currently TiCDC may replicate some DDLs that are not compatible between the upstream and the downstream. If a DDL fails to execute in the downstream, the replication task will be stopped.
Such compatibility issues exist between a higher-version TiDB and a lower-version TiDB, or between TiDB and MySQL.
To improve user experience, we can:
Is your feature request related to a problem? Please describe:
CDC has a common use case as follows: restore or import a snapshot taken at a checkpoint ts into the downstream, then start replication from that checkpoint ts.
However, the restore/import procedure can be very long and TiKV doesn't have a long enough GC interval for us to catch data from the checkpoint ts, so in practice the replication often cannot start from the original checkpoint ts.
Describe the feature you'd like:
We need CDC to provide a new work mode:
When a user creates a new changefeed, TiCDC should perform some pre-verification of the changefeed config, including:
StartTs must be larger than tikv_gc_safe_point
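A minimal sketch of what such a pre-verification could look like; the getGCSafePoint helper and the error wording are assumptions for illustration, not TiCDC's actual code:

```go
package verification

import (
	"context"
	"fmt"
)

// getGCSafePoint is a hypothetical helper that would fetch the current
// tikv_gc_safe_point from PD or from the upstream TiDB cluster.
func getGCSafePoint(ctx context.Context) (uint64, error) {
	// ... query PD / the upstream cluster here ...
	return 0, nil
}

// verifyStartTs rejects a changefeed whose StartTs is not larger than the
// GC safe point, because data before the safe point may already have been
// garbage collected and can no longer be captured.
func verifyStartTs(ctx context.Context, startTs uint64) error {
	safePoint, err := getGCSafePoint(ctx)
	if err != nil {
		return err
	}
	if startTs <= safePoint {
		return fmt.Errorf("start-ts %d must be larger than tikv_gc_safe_point %d",
			startTs, safePoint)
	}
	return nil
}
```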
/ # ./cdc version
Release Version:
Git Commit Hash: 80ee381230d5b7a3181464ad874f9a54c9220184
Git Branch: master
UTC Build Time: 2019-12-13 09:54:47
Go Version: go version go1.13.4 linux/amd64
sh-4.2# ./tidb-server -V
Release Version: v4.0.0-alpha-516-g5466a3c31
Git Commit Hash: 5466a3c31bf4b93fb3a2c595dd6aeac46aca7b8e
Git Branch: HEAD
UTC Build Time: 2019-12-02 09:22:52
GoVersion: go version go1.13.4 linux/amd64
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
sh-4.2# ./tikv-server -V
TiKV
Release Version: 4.0.0-alpha
Git Commit Hash: 56dc6d63ade182289c4ab1e37996746040bc07d6
Git Commit Branch: cdc
UTC Build Time: 2019-11-06 03:29:57
Rust Version: rustc 1.39.0-nightly (c6e9c76c5 2019-09-04)
/ # ./pd-server -V
Release Version: v4.0.0-alpha-200-gf7f643c61
Git Commit Hash: f7f643c6138cc5240d954bfa1a560e3b14bfdc6e
Git Branch: HEAD
UTC Build Time: 2019-12-13 11:42:23
Used test-infra to test cdc and found that some tables failed to sync to MySQL, with some error logs in the cdc log.
tidb tables:
+----------------+
| Tables_in_test |
+----------------+
| amfev |
| cmqwqm |
| dcnlvf |
| dofvv |
| eacnnohz |
| iyotimi |
| mnrbu |
| mvtabkee |
| nqjftinj |
| phfxkijuy |
| pklmxor |
| sflzhfns |
| sfrpmvpa |
| sxdho |
| t1576252933 |
| wzgzdkdwq |
| xlfmlygpi |
| zpmlqvdcr |
+----------------+
18 rows in set (0.00 sec)
mysql tables:
+----------------+
| Tables_in_test |
+----------------+
| amfev |
| cmqwqm |
| dcnlvf |
| dofvv |
| eacnnohz |
| mvtabkee |
| phfxkijuy |
| pklmxor |
| sflzhfns |
| sfrpmvpa |
| sxdho |
| wzgzdkdwq |
| xlfmlygpi |
| zpmlqvdcr |
+----------------+
14 rows in set (0.00 sec)
cdc log:
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
the full log: http://139.219.11.38:8000/qTl2Q/cdc-2019-12-13T16-05-14.537.log
Our first milestone is to make CDC work for syncing to MySQL/TiDB, which corresponds to the mysqlSink in our code.
We should update the way Capture interacts with Sink so that the necessary information can be passed to create a mysqlSink.
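As a rough illustration of the idea (the interface and constructor below are simplified assumptions, not the actual TiCDC Sink API), the Capture could hand a sink URI down and let a factory build the mysqlSink from it:

```go
package sink

import (
	"context"
	"database/sql"
	"fmt"
	"net/url"

	_ "github.com/go-sql-driver/mysql" // register the MySQL driver for database/sql
)

// Sink is a simplified sketch of a sink abstraction; the real TiCDC
// interface carries row events, resolved timestamps, and more.
type Sink interface {
	EmitDDL(ctx context.Context, query string) error
	Close() error
}

type mysqlSink struct {
	db *sql.DB
}

func (s *mysqlSink) EmitDDL(ctx context.Context, query string) error {
	_, err := s.db.ExecContext(ctx, query)
	return err
}

func (s *mysqlSink) Close() error { return s.db.Close() }

// NewMySQLSink shows how the information a Capture holds (here just a sink
// URI such as "mysql://root:pass@127.0.0.1:3306/test") could be passed to
// the sink layer to build a mysqlSink.
func NewMySQLSink(sinkURI string) (Sink, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return nil, err
	}
	user, pass := "root", ""
	if u.User != nil {
		user = u.User.Username()
		pass, _ = u.User.Password()
	}
	dsn := fmt.Sprintf("%s:%s@tcp(%s)%s", user, pass, u.Host, u.Path)
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return nil, err
	}
	return &mysqlSink{db: db}, nil
}
```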
Please answer these questions before submitting your issue. Thanks!
create table t (a int, b int as (a + 1) stored primary key);
insert into t(a) values (1),(2), (3);
update t set a = 10 where a = 1;
mysql [email protected]:aa> select * from t;
+----+----+
| a | b |
+----+----+
| 2 | 3 |
| 3 | 4 |
| 10 | 11 |
+----+----+
mysql [email protected]:aa> select * from t;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
+---+---+
3 rows in set
topology:
Upstream TiDB: 172.16.6.206
Downstream: MySQL 5.6.46
[tidb@localhost tidb-ansible]$ /data1/tidb/deploy/bin/tidb-server -V
Release Version: v4.0.0-alpha-516-g5466a3c31
Git Commit Hash: 5466a3c31bf4b93fb3a2c595dd6aeac46aca7b8e
Git Branch: master
UTC Build Time: 2019-10-14 03:55:02
GoVersion: go version go1.13 linux/amd64
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
[tidb@localhost tidb-ansible]$ /data1/tidb/deploy/bin/tikv-server -V
TiKV
Release Version: 4.0.0-alpha
Git Commit Hash: 56dc6d63ade182289c4ab1e37996746040bc07d6
Git Commit Branch: cdc
UTC Build Time: 2019-11-06 03:29:57
Rust Version: rustc 1.39.0-nightly (c6e9c76c5 2019-09-04)
[tidb@localhost tidb-ansible]$ /data1/tidb/deploy/bin/pd-server -V
Release Version: v4.0.0-alpha-191-g7811255c
Git Commit Hash: 7811255c7345503ed5f44afb981bbf9712fd25c6
Git Branch: master
UTC Build Time: 2019-12-06 05:07:33
mysql -h 127.0.0.1 -P 4000 -u root
CREATE table test.simple1(id int primary key, val int);
CREATE table test.simple2(id int primary key, val int);
## start_ts=$(($(date +%s%N | cut -b1-13)<<18)) => 413114580074496000
INSERT INTO test.simple1(id, val) VALUES (1, 1);
INSERT INTO test.simple1(id, val) VALUES (2, 2);
INSERT INTO test.simple1(id, val) VALUES (3, 3);
UPDATE test.simple1 set val = 22 where id = 2;
DELETE from test.simple1 where id = 3
mysql -h 127.0.0.1 -P 3306 -u root -e 'create database test'
nohup /home/tidb/cdc server --pd-endpoints http://172.16.6.206:2379 &
/home/tidb/cdc cli --pd-addr http://172.16.6.206:2379 --start-ts=413114580074496000 --sink-uri 'root@tcp(127.0.0.1:3306)/test'
cdc.log:
[2019/12/09 21:20:10.843 -05:00] [DEBUG] [storage.go:302] ["handle job: "] ["sql query"="CREATE TABLE if not exists mysql.stats_top_n (\n\t\ttable_id bigint(64) NOT NULL,\n\t\tis_index tinyint(2) NOT NULL,\n\t\thist_id bigint(64) NOT NULL,\n\t\tvalue longblob,\n\t\tcount bigint(64) UNSIGNED NOT NULL,\n\t\tindex tbl(table_id, is_index, hist_id)\n\t);"] [job="ID:38, Type:create table, State:synced, SchemaState:public, SchemaID:3, TableID:37, RowCount:0, ArgLen:0, start time: 2019-12-09 21:14:55.003 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/12/09 21:20:10.843 -05:00] [DEBUG] [storage.go:221] ["create table success"] [name=mysql.stats_top_n] [id=37]
[2019/12/09 21:20:10.843 -05:00] [DEBUG] [storage.go:302] ["handle job: "] ["sql query"="CREATE TABLE IF NOT EXISTS mysql.expr_pushdown_blacklist (\n\t\tname char(100) NOT NULL\n\t);"] [job="ID:40, Type:create table, State:synced, SchemaState:public, SchemaID:3, TableID:39, RowCount:0, ArgLen:0, start time: 2019-12-09 21:14:55.103 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/12/09 21:20:10.843 -05:00] [DEBUG] [storage.go:221] ["create table success"] [name=mysql.expr_pushdown_blacklist] [id=39]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:302] ["handle job: "] ["sql query"="CREATE TABLE IF NOT EXISTS mysql.opt_rule_blacklist (\n\t\tname char(100) NOT NULL\n\t);"] [job="ID:42, Type:create table, State:synced, SchemaState:public, SchemaID:3, TableID:41, RowCount:0, ArgLen:0, start time: 2019-12-09 21:14:55.153 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:221] ["create table success"] [name=mysql.opt_rule_blacklist] [id=41]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:302] ["handle job: "] ["sql query"="CREATE table test.simple1(id int primary key, val int)"] [job="ID:44, Type:create table, State:synced, SchemaState:public, SchemaID:1, TableID:43, RowCount:0, ArgLen:0, start time: 2019-12-09 21:17:06.003 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:221] ["create table success"] [name=test.simple1] [id=43]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:302] ["handle job: "] ["sql query"="CREATE table test.simple2(id int primary key, val int)"] [job="ID:46, Type:create table, State:synced, SchemaState:public, SchemaID:1, TableID:45, RowCount:0, ArgLen:0, start time: 2019-12-09 21:17:12.253 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/12/09 21:20:10.844 -05:00] [DEBUG] [storage.go:221] ["create table success"] [name=test.simple2] [id=45]
[2019/12/09 21:20:10.845 -05:00] [DEBUG] [client.go:228] ["singleEventFeed quit"]
[2019/12/09 21:20:10.845 -05:00] [INFO] [processor.go:353] ["Checkpoint worker exited"]
[2019/12/09 21:20:10.845 -05:00] [INFO] [client.go:235] ["EventFeed disconnected"] [span="{\"Start\":\"bURETEpvYkxp/3N0AAAAAAAA+QAAAAAAAABs\",\"End\":\"bURETEpvYkxp/3N0AAAAAAAA+QAAAAAAAABt\"}"] [checkpoint=413124368270098433] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).singleEventFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:408\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).partialRegionFeed.func1\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:227\ngithub.com/pingcap/ticdc/pkg/retry.Run.func1\n\tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:31\ngithub.com/cenkalti/backoff.RetryNotify\n\tgithub.com/cenkalti/[email protected]+incompatible/retry.go:37\ngithub.com/cenkalti/backoff.Retry\n\tgithub.com/cenkalti/[email protected]+incompatible/retry.go:24\ngithub.com/pingcap/ticdc/pkg/retry.Run\n\tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:30\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).partialRegionFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:215\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).EventFeed.func1.1\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:188\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2019/12/09 21:20:10.845 -05:00] [INFO] [scheduler.go:313] ["stop to run processor"] ["changefeed id"=245b6079-015f-4707-9f18-78bca094b6cf]
[2019/12/09 21:20:10.846 -05:00] [DEBUG] [client.go:228] ["singleEventFeed quit"]
[2019/12/09 21:20:10.846 -05:00] [ERROR] [server.go:80] ["run server"] [error="Error 1298: Unknown or incorrect time zone: 'UTC'\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/ticdc/cdc/sink.(*mysqlSink).Emit\n\tgithub.com/pingcap/ticdc@/cdc/sink/mysql.go:141\ngithub.com/pingcap/ticdc/cdc.(*processor).syncResolved\n\tgithub.com/pingcap/ticdc@/cdc/processor.go:587\ngithub.com/pingcap/ticdc/cdc.(*processor).Run.func3\n\tgithub.com/pingcap/ticdc@/cdc/processor.go:283\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2019/12/09 21:20:10.846 -05:00] [INFO] [client.go:235] ["EventFeed disconnected"] [span="{\"Start\":\"bURETEpvYkxp/3N0AAAAAAAA+QAAAAAAAABs\",\"End\":\"bURETEpvYkxp/3N0AAAAAAAA+QAAAAAAAABt\"}"] [checkpoint=413124368270098433] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).singleEventFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:408\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).partialRegionFeed.func1\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:227\ngithub.com/pingcap/ticdc/pkg/retry.Run.func1\n\tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:31\ngithub.com/cenkalti/backoff.RetryNotify\n\tgithub.com/cenkalti/[email protected]+incompatible/retry.go:37\ngithub.com/cenkalti/backoff.Retry\n\tgithub.com/cenkalti/[email protected]+incompatible/retry.go:24\ngithub.com/pingcap/ticdc/pkg/retry.Run\n\tgithub.com/pingcap/ticdc@/pkg/retry/retry.go:30\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).partialRegionFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:215\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).EventFeed.func1.1\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:188\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2019/12/09 21:20:10.846 -05:00] [DEBUG] [capture_info.go:128] ["watchC from etcd close normally"]
[2019/12/09 21:20:10.846 -05:00] [INFO] [owner.go:372] ["handleWatchCapture quit"]
[2019/12/09 21:20:10.846 -05:00] [DEBUG] [etcd.go:205] ["update subchangefeed info success"] ["changefeed id"=6cdfb9e6-e0ec-4933-bd77-b269946cd685] ["capture id"=a3d0a077-497e-4b4a-a7c3-cb186e9e110d] [modRevision=232] [info="{\"checkpoint-ts\":0,\"resolved-ts\":413124368270098433,\"table-infos\":[{\"id\":45,\"start-ts\":413124328229699584}],\"table-p-lock\":null,\"table-c-lock\":null}"]
[2019/12/09 21:20:10.846 -05:00] [INFO] [processor.go:330] ["Local resolved worker exited"]
Please answer these questions before submitting your issue. Thanks!
start a CDC server
cdc cli capture list returns the server
[2020/03/18 19:44:09.797 +08:00] [INFO] [root.go:47] ["init log"] [file=ticdc_1.log] [level=debug]
[2020/03/18 19:44:09.797 +08:00] [INFO] [version.go:34] ["Welcome to Change Data Capture (CDC)"] [release-version=v4.0.0-beta.2] [git-hash=63b1db95df26ef914bc1f1dc29ddfa4936100ff8] [git-branch=master] [utc-build-time="2020-03-13 09:45:32"] [go-version="go version go1.13 linux/amd64"]
[2020/03/18 19:44:09.797 +08:00] [INFO] [server.go:76] ["creating CDC server"] [pd-addr=http://hw-dt-wms-warp1-tidb01:2379] [status-host=127.0.0.1] [status-port=8301]
[2020/03/18 19:44:09.804 +08:00] [INFO] [capture.go:96] ["creating capture"] [capture-id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 19:44:09.805 +08:00] [INFO] [client.go:134] ["[pd] create pd client with endpoints"] [pd-address="[http://hw-dt-wms-warp1-tidb01:2379]"]
[2020/03/18 19:44:09.812 +08:00] [INFO] [base_client.go:226] ["[pd] update member urls"] [old-urls="[http://hw-dt-wms-warp1-tidb01:2379]"] [new-urls="[http://10.232.0.109:2379,http://10.232.0.166:2379,http://10.232.0.212:2379]"]
[2020/03/18 19:44:09.812 +08:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://10.232.0.212:2379] [old-leader=]
[2020/03/18 19:44:09.812 +08:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6804742633952162675]
[2020/03/18 19:44:09.812 +08:00] [INFO] [http_status.go:54] ["status http server is running"] [addr=127.0.0.1:8301]
[2020/03/18 19:44:09.819 +08:00] [INFO] [manager.go:253] ["get owner"] [ownerID=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 19:44:09.819 +08:00] [INFO] [manager.go:223] ["campaign to be owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 19:44:09.819 +08:00] [DEBUG] [manager.go:269] ["watch owner key"] [key=/tidb/cdc/capture/owner/6aab70e2c63ea25e]
[2020/03/18 19:44:10.317 +08:00] [INFO] [owner.go:1263] ["start to watch processors"]
[2020/03/18 19:44:10.318 +08:00] [INFO] [owner.go:1213] ["monitoring processors"] [key=/tidb/cdc/processor/info] [rev=93442]
[2020/03/18 21:49:11.764 +08:00] [DEBUG] [manager.go:274] ["lost owner role, send retire notification"]
[2020/03/18 21:49:11.764 +08:00] [WARN] [manager.go:229] ["lost owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:49:11.764 +08:00] [INFO] [manager.go:187] ["etcd session is done, creates a new one"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:49:11.765 +08:00] [ERROR] [owner.go:1272] ["watch processor failed"] []
[2020/03/18 21:49:13.448 +08:00] [INFO] [manager.go:253] ["get owner"] [ownerID=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:49:13.448 +08:00] [INFO] [manager.go:223] ["campaign to be owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:49:13.448 +08:00] [DEBUG] [manager.go:269] ["watch owner key"] [key=/tidb/cdc/capture/owner/6aab70e2c63ee261]
[2020/03/18 21:49:13.733 +08:00] [INFO] [owner.go:1263] ["start to watch processors"]
[2020/03/18 21:49:13.734 +08:00] [INFO] [owner.go:1213] ["monitoring processors"] [key=/tidb/cdc/processor/info] [rev=97243]
[2020/03/18 21:51:42.785 +08:00] [DEBUG] [manager.go:274] ["lost owner role, send retire notification"]
[2020/03/18 21:51:42.785 +08:00] [WARN] [manager.go:229] ["lost owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:51:42.785 +08:00] [INFO] [manager.go:187] ["etcd session is done, creates a new one"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:51:42.785 +08:00] [ERROR] [owner.go:1272] ["watch processor failed"] []
[2020/03/18 21:51:55.899 +08:00] [ERROR] [manager.go:215] ["failed to campaign"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: request timed out"]
[2020/03/18 21:52:06.900 +08:00] [ERROR] [manager.go:215] ["failed to campaign"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: request timed out"]
[2020/03/18 21:52:17.901 +08:00] [ERROR] [manager.go:215] ["failed to campaign"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: request timed out"]
[2020/03/18 21:52:28.901 +08:00] [ERROR] [manager.go:215] ["failed to campaign"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: request timed out"]
[2020/03/18 21:52:30.285 +08:00] [INFO] [manager.go:253] ["get owner"] [ownerID=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:52:30.286 +08:00] [INFO] [manager.go:223] ["campaign to be owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:52:30.286 +08:00] [DEBUG] [manager.go:269] ["watch owner key"] [key=/tidb/cdc/capture/owner/6aab70e2c63ee387]
[2020/03/18 21:52:30.410 +08:00] [INFO] [owner.go:1263] ["start to watch processors"]
[2020/03/18 21:52:30.411 +08:00] [INFO] [owner.go:1213] ["monitoring processors"] [key=/tidb/cdc/processor/info] [rev=97424]
[2020/03/18 21:54:06.330 +08:00] [INFO] [manager.go:301] ["watch failed, owner is deleted"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:54:06.330 +08:00] [DEBUG] [manager.go:274] ["lost owner role, send retire notification"]
[2020/03/18 21:54:06.330 +08:00] [WARN] [manager.go:229] ["lost owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:54:06.330 +08:00] [ERROR] [owner.go:1272] ["watch processor failed"] []
[2020/03/18 21:54:06.331 +08:00] [ERROR] [manager.go:215] ["failed to campaign"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: requested lease not found"]
[2020/03/18 21:54:06.333 +08:00] [INFO] [manager.go:207] ["etcd session encounters the error of lease not found, closes it"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462] [error="etcdserver: requested lease not found"]
[2020/03/18 21:54:06.333 +08:00] [INFO] [manager.go:187] ["etcd session is done, creates a new one"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:54:06.340 +08:00] [INFO] [manager.go:253] ["get owner"] [ownerID=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:54:06.340 +08:00] [INFO] [manager.go:223] ["campaign to be owner"] [id=9fa3ce40-00e8-4153-873c-4dfcbf4b3462]
[2020/03/18 21:54:06.340 +08:00] [DEBUG] [manager.go:269] ["watch owner key"] [key=/tidb/cdc/capture/owner/6aab70e2c63ee446]
In the current kv client, we process the kv event Entries in a *cdcpb.Event_Entries_ one by one.
put entry to eventCh -> put a sorter item to sorter ->
put entry to eventCh -> put a sorter item to sorter ->
put entry to eventCh -> put a sorter item to sorter -> ...
So we generate multiple commit event entries with the same commit ts to the puller, plus one resolved ts event generated by the sorter; all of them carry the same commit ts.
kv -> resolve -> kv -> kv
In fact this does not exactly match our design.
Describe alternatives you've considered:
We can remove the sorter mechanism and forward events based entirely on the resolved ts from TiKV.
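A small sketch of an alternative direction: batch all entries that share a commit ts from one *cdcpb.Event_Entries_ into a single item, so a resolved event can no longer be interleaved inside one transaction. The types here are simplified stand-ins for the real cdcpb/puller types:

```go
package kv

// Entry and sorterItem are simplified stand-ins for the real
// cdcpb.Event_Row and puller sorter types.
type Entry struct {
	CommitTs uint64
	Key      []byte
	Value    []byte
}

type sorterItem struct {
	commitTs uint64
	entries  []Entry
}

// groupByCommitTs turns one batch of entries into at most one sorter item
// per commit ts, instead of one item per entry, so the sorter no longer
// emits a resolved event in the middle of entries that belong to the same
// transaction.
func groupByCommitTs(entries []Entry) []sorterItem {
	var items []sorterItem
	for _, e := range entries {
		n := len(items)
		if n > 0 && items[n-1].commitTs == e.CommitTs {
			items[n-1].entries = append(items[n-1].entries, e)
			continue
		}
		items = append(items, sorterItem{commitTs: e.CommitTs, entries: []Entry{e}})
	}
	return items
}
```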
The processor can only handle about 400 txns, while the CPU usage of the server is below 10%.
https://github.com/pingcap/ticdc/blob/d2621b3f0f65f33567fa6bf772b93e8b2aee1128/cdc/processor.go#L712
After avoiding this sleep-then-do-nothing polling style, throughput improves significantly.
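The difference can be illustrated with a small sketch (purely illustrative, not the actual processor code): the first function is the sleep-then-check style, the second waits on a notification channel and only wakes up when there is real progress to handle:

```go
package processor

import (
	"context"
	"time"
)

// pollResolvedTs is the sleep-and-check style hinted at above: it wakes up
// on a fixed interval even when there is nothing to do.
func pollResolvedTs(ctx context.Context, ready func() bool, handle func()) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second):
			if ready() {
				handle()
			}
		}
	}
}

// notifyResolvedTs waits on a notification channel instead, so the goroutine
// runs exactly when new resolved-ts progress arrives.
func notifyResolvedTs(ctx context.Context, notifyCh <-chan struct{}, handle func()) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-notifyCh:
			handle()
		}
	}
}
```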
Support TLS and online reloading of new certificates.
Optional value: 1~5
1 Point for 1 Person/Work Day
Is your feature request related to a problem? Please describe:
The output from cdc -h is not straightforward and does not make it easy for a newbie to start using CDC quickly.
Describe the feature you'd like:
Please include at least the following information:
Is your feature request related to a problem? Please describe:
The block-allow-list config is not straightforward; we should provide a detailed usage document for it.
Is your feature request related to a problem? Please describe:
[coprocessor]
region-max-keys = 1200
region-split-keys = 1000
Describe the feature you'd like:
Currently CDC reuses the region cache lib in TiDB and is able to handle normal region splits. But in the above benchmark scenario, it always fails.
We need to dig into this problem and make the kv client more robust.
We can't run Kafka as we just don't like the legacy of Java.
It would be awesome if you also supported Liftbridge. It has a gRPC interface as well as a NATS interface.
It's basically like Kafka but written in Go.
https://github.com/liftbridge-io/liftbridge
Optional value: 1~5
1 Point for 1 Person/Work Day
The function partialRegionFeed (in cdc/kv/client.go) accepts a region info as a parameter and may reload region info from the region cache before sending a request. So it's possible that the region has changed after a split. As a result, it may get a smaller region after calling regionCache.LocateKey, one that doesn't cover the range held by the parameter regionInfo, so the remaining part of the range will be missing.
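A minimal sketch of one possible fix (span is a simplified stand-in for the real key-range type): after locating the region, compute the part of the requested range that the located region does not cover and schedule new region feeds for it instead of dropping it:

```go
package kv

import "bytes"

// span is a simplified key range; the real type in TiCDC differs.
type span struct {
	Start, End []byte
}

// uncovered returns the parts of want that the located region (got) does not
// cover, so the caller can schedule new region feeds for them instead of
// silently losing data after a region split.
func uncovered(want, got span) []span {
	var rest []span
	if bytes.Compare(got.Start, want.Start) > 0 {
		rest = append(rest, span{Start: want.Start, End: got.Start})
	}
	if bytes.Compare(got.End, want.End) < 0 {
		rest = append(rest, span{Start: got.End, End: want.End})
	}
	return rest
}
```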
topology:
Upstream TiDB: 172.16.5.86
Downstream: MySQL 5.7.28
/data1/deploy1/bin/tidb-server -V
Release Version: v4.0.0-alpha-1148-g5da10ffec
Git Commit Hash: 5da10ffecc280136b2041801b23034c557e41751
Git Branch: HEAD
UTC Build Time: 2019-12-12 03:12:21
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
/data1/deploy1/bin/tikv-server -V
TiKV
Release Version: 4.0.0-alpha
Git Commit Hash: 38579ea3e2ed08dc5bd724b2c0cda82b4588c42f
Git Commit Branch: master
UTC Build Time: 2019-12-09 04:37:17
Rust Version: rustc 1.39.0-nightly (c6e9c76c5 2019-09-04)
/data1/deploy1/bin/tikv-server -V
TiKV
Release Version: 4.0.0-alpha
Git Commit Hash: 38579ea3e2ed08dc5bd724b2c0cda82b4588c42f
Git Commit Branch: master
UTC Build Time: 2019-12-09 04:37:17
Rust Version: rustc 1.39.0-nightly (c6e9c76c5 2019-09-04)
[tidb@localhost tidb-ansible]$ /data1/deploy1/bin/pd-server -V
Release Version: v4.0.0-alpha-197-gbd7b3f46
Git Commit Hash: bd7b3f46eef5dfb8241bcdcea27c68454b2f1f1c
Git Branch: master
UTC Build Time: 2019-12-12 02:16:14
../go-tpc/bin/go-tpc --time=400m tpch --host 172.16.5.86 -P 4000 -T 1 --sf=1 prepare // load data
// get ts
+ mysql -h 172.16.5.86 -uroot -P4000 -e 'drop database if exists tmp_db' // create table
+ mysql -h 172.16.5.86 -uroot -P4000 -e 'create database tmp_db'
./resources/bin/cdc server --pd-endpoints http://172.16.5.86:2379 > cdc_server.log
$ cat cdc_server.log
Error: run server: schema 68 not found
Usage:
cdc server [flags]
Flags:
-h, --help help for server
--pd-endpoints string endpoints of PD, separated by comma (default "http://127.0.0.1:2379")
--status-addr string bind address for http status server (default "127.0.0.1:8300")
Global Flags:
--log-file string log file path (default "cdc.log")
--log-level string log level (etc: debug|info|warn|error) (default "debug")
run server: schema 68 not found
Currently we pass a DSN of the mysql driver, which actually doesn't support other schemas; consider just using pkg/loader from tidb-binlog.
Replace Loader with Lightning
In DM, the loader.Loader struct implements loading data from mydumper output files into TiDB. Since v3.0.3, Lightning supports the TiDB backend, which enables Lightning to do the same thing.
We consider the TiDB backend mode of Lightning the better implementation of the two, because it also handles cases where loader.Loader would fail. So we propose to replace Loader with Lightning:
replace loader.Loader (which is an implementation of the Unit interface) with Lightning;
adjust the related configuration in task.yaml.
Enhancement
Enhance the ability of loading data.
1500
csuzhangxc
Close, Pause and Resume
Status, Error, Type, IsFreshTask
N/A
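For reference, a rough sketch of what the Unit interface mentioned above could look like; the method set follows the names listed here, but the real signatures in DM differ (they use protobuf status types and result channels), so treat this only as an illustration:

```go
package unit

import "context"

// Unit is an illustrative sketch of a DM processing unit; a Lightning-based
// loader would implement this interface so it can replace loader.Loader.
type Unit interface {
	Init(ctx context.Context) error
	Process(ctx context.Context) error
	Close()
	Pause()
	Resume(ctx context.Context) error
	Status() interface{}
	Error() interface{}
	Type() string
	IsFreshTask(ctx context.Context) (bool, error)
}
```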
[2020/03/07 10:53:45.345 -05:00] [INFO] [mysql.go:97] ["execute DDL failed, but error can be ignored"] [query="create database cdc_bench"] [error="Error 1049: Unknown database 'test'"] [errorVerbose="Error 1049: Unknown database 'test'
github.com/pingcap/errors.AddStack
github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.Trace
github.com/pingcap/[email protected]/juju_adaptor.go:15
github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDDL
github.com/pingcap/ticdc@/cdc/sink/mysql.go:109
github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDDLWithMaxRetries.func1
github.com/pingcap/ticdc@/cdc/sink/mysql.go:95
github.com/pingcap/ticdc/pkg/retry.Run.func1
github.com/pingcap/ticdc@/pkg/retry/retry.go:31
github.com/cenkalti/backoff.RetryNotify
github.com/cenkalti/[email protected]+incompatible/retry.go:37
github.com/cenkalti/backoff.Retry
github.com/cenkalti/[email protected]+incompatible/retry.go:24
github.com/pingcap/ticdc/pkg/retry.Run
github.com/pingcap/ticdc@/pkg/retry/retry.go:30
github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).execDDLWithMaxRetries
github.com/pingcap/ticdc@/cdc/sink/mysql.go:94
github.com/pingcap/ticdc/cdc/sink.(*mysqlSink).EmitDDLEvent
github.com/pingcap/ticdc@/cdc/sink/mysql.go:89
github.com/pingcap/ticdc/cdc.(*changeFeed).handleDDL
github.com/pingcap/ticdc@/cdc/owner.go:900
github.com/pingcap/ticdc/cdc.(*ownerImpl).handleDDL
github.com/pingcap/ticdc@/cdc/owner.go:811
github.com/pingcap/ticdc/cdc.(*ownerImpl).run
github.com/pingcap/ticdc@/cdc/owner.go:1118
github.com/pingcap/ticdc/cdc.(*ownerImpl).Run
github.com/pingcap/ticdc@/cdc/owner.go:1076
github.com/pingcap/ticdc/cdc.(*Capture).Start.func1
github.com/pingcap/ticdc@/cdc/capture.go:150
golang.org/x/sync/errgroup.(*Group).Go.func1
golang.org/x/[email protected]/errgroup/errgroup.go:57
runtime.goexit
runtime/asm_amd64.s:1357"]
/ # ./cdc version
Release Version:
Git Commit Hash: 80ee381230d5b7a3181464ad874f9a54c9220184
Git Branch: master
UTC Build Time: 2019-12-13 09:54:47
Go Version: go version go1.13.4 linux/amd64
sh-4.2# ./tidb-server -V
Release Version: v4.0.0-alpha-516-g5466a3c31
Git Commit Hash: 5466a3c31bf4b93fb3a2c595dd6aeac46aca7b8e
Git Branch: HEAD
UTC Build Time: 2019-12-02 09:22:52
GoVersion: go version go1.13.4 linux/amd64
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
sh-4.2# ./tikv-server -V
TiKV
Release Version: 4.0.0-alpha
Git Commit Hash: 56dc6d63ade182289c4ab1e37996746040bc07d6
Git Commit Branch: cdc
UTC Build Time: 2019-11-06 03:29:57
Rust Version: rustc 1.39.0-nightly (c6e9c76c5 2019-09-04)
/ # ./pd-server -V
Release Version: v4.0.0-alpha-200-gf7f643c61
Git Commit Hash: f7f643c6138cc5240d954bfa1a560e3b14bfdc6e
Git Branch: HEAD
UTC Build Time: 2019-12-13 11:42:23
Used test-infra to test cdc and found that the cdc server produced 5 GiB of logs in ten minutes.
log:
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
[2019/12/13 16:05:14.537 +00:00] [ERROR] [client.go:371] ["RPC error"] [error="rpc error: code = Canceled desc = context canceled"]
Is your feature request related to a problem? Please describe:
Run alter table sbtest_pk add primary key(id), then check the downstream status.
Describe the feature you'd like:
Describe alternatives you've considered:
[2020/03/11 23:47:04.772 +08:00] [DEBUG] [schema_storage.go:445] ["handle job: "] ["sql query"="alter table sbtest_pk add primary key(id)"] [job="ID:3484, Type:add primary key, State:synced, SchemaState:public, SchemaID:3413, TableID:3482, RowCount:0, ArgLen:0, start time: 2020-03-11 23:37:14.406 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:415220638759256066"]
[2020/03/11 23:47:04.772 +08:00] [DEBUG] [schema_storage.go:445] ["handle job: "] ["sql query"="drop table sbtest_pk"] [job="ID:3485, Type:drop table, State:synced, SchemaState:none, SchemaID:3413, TableID:3482, RowCount:0, ArgLen:0, start time: 2020-03-11 23:45:02.856 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2020/03/11 23:47:04.772 +08:00] [DEBUG] [schema_storage.go:367] ["drop table success"] [name=sbtest_pk] [id=3482]
[2020/03/11 23:47:04.772 +08:00] [DEBUG] [schema_storage.go:445] ["handle job: "] ["sql query"="CREATE TABLE `sbtest_pk` ( `id` int(11) NOT NULL, `k` int(11) NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '' )"] [job="ID:3487, Type:create table, State:synced, SchemaState:public, SchemaID:3413, TableID:3486, RowCount:0, ArgLen:0, start time: 2020-03-11 23:45:08.256 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2020/03/11 23:47:04.772 +08:00] [DEBUG] [schema_storage.go:383] ["create table success"] [name=cdc_sbtest.sbtest_pk] [id=3486]
Currently every puller creates a client, and every table creates a puller, so we may create many gRPC clients.
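One possible mitigation (a sketch only, assuming pullers can share connections keyed by store address) is to keep a small pool of gRPC client connections instead of dialing one per puller:

```go
package kv

import (
	"sync"

	"google.golang.org/grpc"
)

// connPool shares one gRPC client connection per TiKV store address, so
// that every puller (one per table) does not have to dial its own
// connection.
type connPool struct {
	mu    sync.Mutex
	conns map[string]*grpc.ClientConn
}

func newConnPool() *connPool {
	return &connPool{conns: make(map[string]*grpc.ClientConn)}
}

func (p *connPool) get(addr string) (*grpc.ClientConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if conn, ok := p.conns[addr]; ok {
		return conn, nil
	}
	// Plaintext dial for illustration; a real deployment would pass TLS
	// credentials here.
	conn, err := grpc.Dial(addr, grpc.WithInsecure())
	if err != nil {
		return nil, err
	}
	p.conns[addr] = conn
	return conn, nil
}
```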
In CDC we have the following replication model
1. kv client receives data
2. kv client sends data to puller via an event chan
3. puller adds data to a buffer, sorts data and re-constructs transactions
4. puller sends transactions to tableInfo (managed in a processor) via a txn chan
5. processor pulls all txns from the txn chan of each tableInfo (with txn ts no more than the CDC GlobalResolvedTs)
In the test with a large number of regions, we found replication blocked: the buffer in step 3 and the chan in step 4 were full, and no data was pulled in step 5. This may also be part of the reason for slow replication and low throughput. We should have a better data-forwarding model, for the following considerations:
We can separate this refactor into multiple small changes, including:
so that we can support sinks that don't care about txns.
The sink implementation can reconstruct txns from resolved events internally if needed.
Currently, we stop consuming kv events from TiKV once the rest of the pipeline is slow.
Implement a memory-limited buffer and use it to buffer events from TiKV; we should consume events from TiKV as soon as possible and fail the changefeed if the rest of the pipeline is too slow.
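A minimal sketch of such a memory-limited buffer, under the assumption that events carry a size and the caller fails the changefeed when ErrBufferFull is returned:

```go
package puller

import (
	"context"
	"errors"
	"sync/atomic"
)

// ErrBufferFull is returned when the pipeline downstream of the kv client is
// too slow and buffered events exceed the memory quota; the caller is then
// expected to fail the changefeed instead of stalling the kv client.
var ErrBufferFull = errors.New("changefeed event buffer exceeds memory quota")

// event is a simplified stand-in for a kv event; only its memory footprint
// matters for this sketch.
type event struct {
	size int64
}

// limitBuffer accepts events from the kv client as fast as possible and only
// rejects them once the configured memory quota is exceeded.
type limitBuffer struct {
	ch    chan event
	quota int64
	used  int64 // tracked atomically
}

func newLimitBuffer(quota int64) *limitBuffer {
	// The channel capacity should be large enough that the memory quota,
	// not the channel, is the effective limit.
	return &limitBuffer{ch: make(chan event, 1024), quota: quota}
}

// add either enqueues the event or reports that the memory quota is
// exhausted.
func (b *limitBuffer) add(ctx context.Context, e event) error {
	if atomic.AddInt64(&b.used, e.size) > b.quota {
		atomic.AddInt64(&b.used, -e.size)
		return ErrBufferFull
	}
	select {
	case <-ctx.Done():
		atomic.AddInt64(&b.used, -e.size)
		return ctx.Err()
	case b.ch <- e:
		return nil
	}
}

// get hands one buffered event to the rest of the pipeline.
func (b *limitBuffer) get(ctx context.Context) (event, error) {
	select {
	case <-ctx.Done():
		return event{}, ctx.Err()
	case e := <-b.ch:
		atomic.AddInt64(&b.used, -e.size)
		return e, nil
	}
}
```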
[2020/02/18 19:05:47.839 +08:00] [WARN] [disk.go:56] ["Mkdir temporary file error"] [tmpDir=/var/folders/nw/c0ncybdd6gj2f5w5tmqvk9y40000gn/T/tidb-server-tidb-server] [error="mkdir /var/folders/nw/c0ncybdd6gj2f5w5tmqvk9y40000gn/T/tidb-server-tidb-server: file exists"]
➜ ticdc git:(ana) ✗ fd disk.go ./vendor
vendor/github.com/pingcap/tidb/util/chunk/disk.go
vendor/github.com/shirou/gopsutil/disk/disk.go
The cause: tidb/util/chunk/disk.go initializes a temporary dir in init() (and we start multiple instances of tidb):
func init() {
err := os.RemoveAll(tmpDir) // clean the uncleared temp file during the last run.
if err != nil {
log.Warn("Remove temporary file error", zap.String("tmpDir", tmpDir), zap.Error(err))
}
err = os.Mkdir(tmpDir, 0755)
if err != nil {
log.Warn("Mkdir temporary file error", zap.String("tmpDir", tmpDir), zap.Error(err))
}
}
At about 2020/03/08 05:27:06.141 -04:00 the replication stops advancing, because the resolved ts of one table (sbtest3) does not advance.
CDC is built from #308, and TiKV is built from 5kbpers/tikv@1765a5b.
➜ curl -s http://172.16.5.113:10080/tables/cdc_bench/sbtest3/regions |grep region_id
"region_id": 176,
"region_id": 192,
"region_id": 204,
"region_id": 160,
cdc log: http://139.219.11.38:8000/KJTrQ/issue_321_cdc.log.tar.gz
tikv log: http://139.219.11.38:8000/NK0s0/tikv.log.tar.gz
some abnormal behavior:
[2020/03/08 05:26:56.099 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=160]
but last resolved ts in TiKV is
[2020/03/08 05:29:05.462 -04:00] [INFO] [delegate.rs:279] ["resolved ts updated"] [resolved_ts=415146900050149856] [region_id=160]
[2020/03/08 05:26:56.099 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=176]
[2020/03/08 05:26:56.104 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 176, leader may None\" not_leader { region_id: 176 }))"] [conn_id=Some(ConnID(6))] [downstream_id=Some(DownstreamID(10))] [region_id=176]
make test
which bin/failpoint-ctl >/dev/null 2>&1 || CGO_ENABLED=0 GO111MODULE=on go build -trimpath -o bin/failpoint-ctl github.com/pingcap/failpoint/failpoint-ctl
mkdir -p "/tmp/tidb_cdc_test"
$(echo $(for p in $(go list ./...| grep -vE 'vendor|proto|ticdc\/tests'); do echo ${p#"github.com/pingcap/ticdc/"}|grep -v "github.com/pingcap/ticdc"; done) | xargs bin/failpoint-ctl enable >/dev/null)
ok github.com/pingcap/ticdc 0.070s coverage: 100.0% of statements
{"level":"info","ts":"2020-03-03T17:40:20.472+0800","caller":"embed/etcd.go:117","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:50545"]}
{"level":"info","ts":"2020-03-03T17:40:20.473+0800","caller":"embed/etcd.go:127","msg":"configuring client listeners","listen-client-urls":["http://localhost:50546"]}
{"level":"info","ts":"2020-03-03T17:40:20.474+0800","caller":"embed/etcd.go:299","msg":"starting an etcd server","etcd-version":"3.4.3","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.14","go-os":"darwin","go-arch":"amd64","max-cpu-set":12,"max-cpu-available":12,"member-initialized":false,"name":"default","data-dir":"/var/folders/nw/c0ncybdd6gj2f5w5tmqvk9y40000gn/T/check-2797722976074707835/0","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/folders/nw/c0ncybdd6gj2f5w5tmqvk9y40000gn/T/check-2797722976074707835/0/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:50545"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:50546"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"default=http://localhost:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":false,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":""}
{"level":"info","ts":"2020-03-03T17:40:20.566+0800","caller":"etcdserver/backend.go:79","msg":"opened backend db","path":"/var/folders/nw/c0ncybdd6gj2f5w5tmqvk9y40000gn/T/check-2797722976074707835/0/member/snap/db","took":"90.734245ms"}
fatal error: checkptr: unsafe pointer conversion
goroutine 157 [running]:
runtime.throw(0x6c41cad, 0x23)
/usr/local/go/src/runtime/panic.go:1112 +0x72 fp=0xc0007d97a0 sp=0xc0007d9770 pc=0x40379f2
runtime.checkptrAlignment(0xc0001d9370, 0x6a84ee0, 0x1)
/usr/local/go/src/runtime/checkptr.go:18 +0xb7 fp=0xc0007d97d0 sp=0xc0007d97a0 pc=0x4009617
go.etcd.io/bbolt.(*Bucket).write(0xc0007d9948, 0x0, 0x0, 0x0)
/Users/huangjiahao/go/pkg/mod/go.etcd.io/[email protected]/bucket.go:624 +0x15c fp=0xc0007d9838 sp=0xc0007d97d0 pc=0x59c87bc
go.etcd.io/bbolt.(*Bucket).CreateBucket(0xc000200018, 0x858a188, 0x7, 0x7, 0xc0007d9ba8, 0x5a862c
./resources/bin/br version
Release Version:
Git Commit Hash: 719cac031a89dff89e8c8d3f2c10d988bf401617
Git Branch: master
UTC Build Time: 2019-12-09 03:32:23
Race Enabled: false
../go-tpc/bin/go-tpc --time=400m tpch --host 172.16.5.86 -P 4000 -T 1 --sf=1 prepare
mysql -h 172.16.5.86 -uroot -P4000 -e 'drop database if exists tmp_db'
mysql -h 172.16.5.86 -uroot -P4000 -e 'create database tmp_db'
./resources/bin/cdc server --pd-endpoints http://172.16.5.86:2379 > cdc_server.log
./resources/bin/cdc cli --pd-addr http://172.16.5.86:2379 --start-ts 1 --sink-uri 'root@tcp(127.0.0.1:3306)/test'
kill -9 $(pgrep cdc)
./resources/bin/cdc server --pd-endpoints http://172.16.5.86:2379 > cdc_server.log
./resources/bin/cdc cli --pd-addr http://172.16.5.86:2379 --start-ts 1 --sink-uri 'root@tcp(127.0.0.1:3306)/test'
$ cat cdc_server.log
[2019/12/12 13:49:29.296 +08:00] [WARN] [disk.go:56] ["Mkdir temporary file error"] [tmpDir=/tmp/tidb-server-cdc] [error="mkdir /tmp/tidb-server-cdc: file exists"]
test on pr #308
fatal error: sync: RUnlock of unlocked RWMutex
goroutine 508 [running]:
runtime.throw(0x1e130ef, 0x21)
runtime/panic.go:774 +0x72 fp=0xc0017637f0 sp=0xc0017637c0 pc=0x42f612
sync.throw(0x1e130ef, 0x21)
runtime/panic.go:760 +0x35 fp=0xc001763810 sp=0xc0017637f0 pc=0x42f595
sync.(*RWMutex).rUnlockSlow(0xc0008b64a0, 0xc0bfffffff)
sync/rwmutex.go:80 +0x3f fp=0xc001763838 sp=0xc001763810 pc=0x46f42f
sync.(*RWMutex).RUnlock(...)
sync/rwmutex.go:70
github.com/pingcap/ticdc/cdc/kv.(*CDCClient).receiveFromStream(0xc0006365d0, 0x2128360, 0xc000944000, 0xc0001cef90, 0xc000051fe0, 0x12, 0x5, 0x2145bc0, 0xc0007405a0, 0xc000445860, ...)
github.com/pingcap/ticdc@/cdc/kv/client.go:563 +0x3a4 fp=0xc001763ec8 sp=0xc001763838 pc=0x14391b4
github.com/pingcap/ticdc/cdc/kv.(*CDCClient).dispatchRequest.func1(0xc0008ad768, 0x0)
github.com/pingcap/ticdc@/cdc/kv/client.go:290 +0xc2 fp=0xc001763f58 sp=0xc001763ec8 pc=0x1443612
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc0001cef90, 0xc00016ad20)
golang.org/x/[email protected]/errgroup/errgroup.go:57 +0x64 fp=0xc001763fd0 sp=0xc001763f58 pc=0xe0da34
runtime.goexit()
runtime/asm_amd64.s:1357 +0x1 fp=0xc001763fd8 sp=0xc001763fd0 pc=0x45f131
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/[email protected]/errgroup/errgroup.go:54 +0x66
full stdout log: http://139.219.11.38:8000/suZMH/20200306_1258_cdc_stdout.log
If this tool will be used with TiKV alone in the future, then the name tidb-cdc is not accurate.
Is your feature request related to a problem? Please describe:
Set merge-schedule-limit = 0 in pd.toml to disable region merge.
sysbench --config-file=config oltp_insert --rand-seed=$RANDOM --tables=1 --table-size=8000000 prepare
mysql -h 172.16.5.113 -u root -P 4000 -e "split table cdc_bench.sbtest1 between (0) and (1100000) regions 1000"
mysql -h 172.16.5.113 -u root -P 4000 -e "split table cdc_bench.sbtest1 between (1100000) and (2200000) regions 1000"
mysql -h 172.16.5.113 -u root -P 4000 -e "split table cdc_bench.sbtest1 between (2200000) and (3300000) regions 1000"
mysql -h 172.16.5.113 -u root -P 4000 -e "split table cdc_bench.sbtest1 between (3300000) and (4400000) regions 1000"
➜ grep "cdc register region" tikv.log|wc -l
1024
Describe the feature you'd like:
TiKV EventFeed supports receiving data with regions more than 1024.
At about 2020/03/10 04:28:40.560 -04:00, one TiKV is killed because of OOM. CDC doesn't receive any region data anymore (except for region_id=8). Note: a special TiKV config is used to test frequent region splits:
[coprocessor]
region-max-keys = 3000
region-split-keys = 2500
I have some doubts:
Is your feature request related to a problem? Please describe:
[coprocessor]
region-max-keys = 6000
region-split-keys = 5000
sysbench oltp_write_only --create_secondary=off --rand-seed=$RANDOM --tables=1 --table-size=10000000 prepare
sysbench oltp_write_only --create_secondary=off --rand-seed=$RANDOM --tables=1 --table-size=10000000 run
Describe the feature you'd like:
Besides, we found that the span_frontier takes too much CPU:
profile file: http://139.219.11.38:8000/oitOt/pprof.cdc.samples.cpu.005.pb.gz
This happens very easily.
The GC settings look fine:
...
| tikv_gc_life_time | 10m0s | All versions within life time will not be collected by GC, at least 10m, in Go format. |
| tikv_gc_last_run_time | 20200218-18:51:14 +0800 | The time when last GC starts. (DO NOT EDIT) |
| tikv_gc_safe_point | 20200218-18:41:14 +0800 | All versions after safe point can be accessed. (DO NOT EDIT) |
+-----------------------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
create changefeed ID: 04ab9a26-a5fb-42a4-a89f-7853ebf725e4 detail {"sink-uri":"root@tcp(127.0.0.1:3306)/","opts":{},"create-time":"2020-02-18T18:51:18.754343+08:00","start-ts":414717857956102145,"target-ts":0,"admin-job-type":0,"config":{"filter-case-sensitive":false,"filter-rules":null,"ignore-txn-commit-ts":null}}
You may now debug from another terminal. Press [ENTER] to exit.
Log of the cdc server:
Error: run server: create change feed 04ab9a26-a5fb-42a4-a89f-7853ebf725e4: create schema store failed: [tikv:9001]PD server timeout
Usage:
cdc server [flags]
Flags:
-h, --help help for server
--pd-endpoints string endpoints of PD, separated by comma (default "http://127.0.0.1:2379")
--status-addr string bind address for http status server (default "127.0.0.1:8300")
Global Flags:
--log-file string log file path (default "cdc.log")
--log-level string log level (etc: debug|info|warn|error) (default "debug")
run server: create change feed 04ab9a26-a5fb-42a4-a89f-7853ebf725e4: create schema store failed: [tikv:9001]PD server timeout
+08:00] [INFO] [client.go:134] ["[pd] create pd client with endpoints"] [pd-address="[http://127.0.0.1:2379]"]
[2020/02/18 18:51:17.738 +08:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2379] [old-leader=]
[2020/02/18 18:51:17.738 +08:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6794737329153784617]
[2020/02/18 18:51:17.738 +08:00] [INFO] [http_status.go:54] ["status http server is running"] [addr=0.0.0.0:8300]
[2020/02/18 18:51:17.771 +08:00] [INFO] [manager.go:253] ["get owner"] [ownerID=8db4a77e-bf7d-4a20-bc13-de2975abc096]
Is your feature request related to a problem? Please describe:
We have a simple processor aliveness check: basically we check whether either resolvedTs or checkpointTs has been updated within one minute, which doesn't meet the requirement of RTO < 30s.
In some tests we found that the kv client may block, or for other reasons the resolvedTs and checkpointTs can't be updated in time, which means a stale replication status doesn't always indicate an abnormal processor.
Describe the feature you'd like:
Design a better aliveness check strategy, which satisfies the RTO requirement above.
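One possible direction, sketched under the assumption that each processor can report a heartbeat independently of resolvedTs/checkpointTs progress, so that a stalled resolved ts alone does not count as a dead processor:

```go
package owner

import (
	"sync/atomic"
	"time"
)

// processorState is an illustrative sketch: the processor records a
// heartbeat whenever it is making any kind of progress (reading events,
// flushing, waiting on the upstream), independent of whether resolvedTs
// actually advanced.
type processorState struct {
	lastHeartbeat int64  // unix nanoseconds, updated atomically by the processor
	resolvedTs    uint64 // may stay unchanged even while the processor is healthy
}

// beat is called by the processor on every loop iteration.
func (p *processorState) beat() {
	atomic.StoreInt64(&p.lastHeartbeat, time.Now().UnixNano())
}

// isAlive lets the owner distinguish "slow but alive" (resolvedTs stuck while
// heartbeats keep coming) from "dead" (no heartbeat within the timeout), so a
// check interval well under 30s becomes feasible.
func (p *processorState) isAlive(timeout time.Duration) bool {
	last := time.Unix(0, atomic.LoadInt64(&p.lastHeartbeat))
	return time.Since(last) < timeout
}
```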
Please answer these questions before submitting your issue. Thanks!
./cdc cli --pd-addr=172.16.5.83:2329,172.16.5.84:2329,172.16.5.89:2329 --sink-uri="mysql://root:[email protected]:13307/" --start-ts 0
Error: [pd] failed to get cluster id
start a changefeed
[pd] failed to get cluster id"] [url=http://172.16.5.83:2329,172.16.5.84:2329,172.16.5.89:2329] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address 172.16.5.83:2329,172.16.5.84:2329,172.16.5.89:2329: too many colons in address\" target:172.16.5.83:2329,172.16.5.84:2329,172.16.5.89:2329 status:TRANSIENT_FAILURE"]
Currently, TiCDC cli outputs logs to cdc.log by default; for a command-line tool this is not reasonable and makes it hard for users to debug problems. However, if we log to stdout directly, we may get a lot of noisy logs.
To improve the user experience when using cdc cli, we can:
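As one possible approach (an assumption for illustration, not necessarily what this issue proposed): keep the detailed log in cdc.log, but give the cli a separate logger that only prints warnings and errors to stderr, e.g. with zap:

```go
package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// newCliLogger builds a logger for the cdc cli that prints only warnings and
// errors to stderr, while a separate file logger (not shown) could still keep
// the full debug output.
func newCliLogger() (*zap.Logger, error) {
	cfg := zap.NewProductionConfig()
	cfg.OutputPaths = []string{"stderr"}
	cfg.ErrorOutputPaths = []string{"stderr"}
	cfg.Level = zap.NewAtomicLevelAt(zapcore.WarnLevel)
	return cfg.Build()
}

func main() {
	logger, err := newCliLogger()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()
	logger.Warn("cli would log user-facing problems here")
}
```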
I'm going to bench the performance of CDC. I populate some data into the upstream TiDB cluster and after it's finished, I run cdc cli, and get some info like:
create changefeed detail &{SinkURI:root@tcp(127.0.0.1:3306)/test Opts:map[] CreateTime:2019-12-09 21:11:09.219849281 -0500 EST m=+0.017868850 StartTs:413114580074496000 TargetTs:0 Info:<nil>}
The sync task is running on the server side, so how do I know when the data is synced?
Please answer these questions before submitting your issue. Thanks!
What did you do?
If possible, provide a recipe for reproducing the error.
sh tests/run.sh --debug
Use sysbench to load some data at the upstream.
What did you expect to see?
replication works normally.
What did you see instead?
Replication stops after some time (no more data arrives at the downstream).
tikv.log keeps printing the following (even after stopping the cdc server, so we can be sure there are no more requests to TiKV):
Endless retry inside TiKV?
[2020/02/19 11:48:13.256 +08:00] [WARN] [endpoint.rs:255] ["region not found on incremental scan"] [region_id=48]
[2020/02/19 11:48:13.256 +08:00] [WARN] [endpoint.rs:255] ["region not found on incremental scan"] [region_id=48]
[2020/02/19 11:48:13.256 +08:00] [WARN] [endpoint.rs:255] ["region not found on incremental scan"] [region_id=48]
[2020/02/19 11:48:13.256 +08:00] [WARN] [endpoint.rs:255] ["region not found on incremental scan"] [region_id=48]
[2020/02/19 11:48:13.257 +08:00] [WARN] [endpoint.rs:255] ["region not found on incremental scan"] [region_id=48]
version of tikv: ad59724513ab83461c54c1996f89235301a036d7
The "region not found on incremental scan" log in tikv is filtered.
cdc.tar.gz
issue268.tar.gz
Is your feature request related to a problem? Please describe:
The original TiKV EventFeed API will be changed to a duplex stream. ref:
https://docs.google.com/document/d/1SN3ztOXy2QTlCS1Qp9dUWTBfxowx-nIGpkuCw2ccULM/edit
Describe the feature you'd like:
The kv client has a clear input and output:
Things that need to be done:
This helps test lag & latency with regard to the sink.
Please answer these questions before submitting your issue. Thanks!
We have processors for each changefeed task; a processor is essentially a goroutine. When all the tables processed by a processor are removed, the processor should stop.
Reproduce steps:
Solution:
The processor exits when all its tables are dropped.
The processor keeps running.
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
master
TiCDC version (execute cdc version):
master
TiCDC (TiDB Change Data Capture) is a new distributed incremental replication tool for the TiDB ecosystem. TiCDC is still in development, but it already works properly in the experimental environment. When the TiCDC cluster starts, an owner is elected and the other nodes are called processors. The processors pull change key-value logs from TiKV, assemble the logs into transactions, and output them to the downstream data target. The owner watches the replication progress of the processors and coordinates them to ensure transaction order.
Easy
2100
TiCDC distributed design (Chinese version)