br's Issues

UCP: Stop retrying on certain errors during restore

Description

Currently, BR uses the utils.WithRetry function to retry downloadSST and ingestSST. However, not every error should be retried; a file NotFound error, for example, will never succeed on retry.
We need a way to distinguish such errors and stop retrying.

To improve user experience, we can:

  1. add a way to classify errors in utils.WithRetry (see the sketch below), and
  2. stop retrying immediately when an unretryable error is encountered.
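
The sketch below shows one way this could look. It is only a sketch: the real utils.WithRetry has a different signature, and the error classification here (matching on "NotFound" strings) is an assumption for illustration.

package utils

import (
	"context"
	"strings"
	"time"
)

// RetryableFunc is the unit of work passed to WithRetry.
type RetryableFunc func() error

// isRetryable reports whether an error is worth retrying. A "file not
// found" error from external storage will never succeed on retry, so we
// give up immediately. (String matching is for illustration only.)
func isRetryable(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return !strings.Contains(msg, "NotFound") &&
		!strings.Contains(msg, "no such file or directory")
}

// WithRetry retries fn with a fixed backoff, but returns as soon as fn
// succeeds or fails with an unretryable error.
func WithRetry(ctx context.Context, fn RetryableFunc, attempts int, backoff time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if !isRetryable(err) {
			return err // unretryable: surface it immediately
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
	}
	return err
}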

Score

  • 300

Mentor(s)

Recommended Skills

  • Go programming

Support timestamp parameter on backup

Feature Request

Describe your feature request related problem:

Sometimes users need a specific timestamp to be shared across different backups, for example:

  • two different backups should capture a snapshot at the same timestamp

The current timeago option can't solve this problem.

Describe the feature you'd like:

Support passing an explicit timestamp to the backup command; see the example below.
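
A possible invocation, assuming the flag is named --backupts and accepts a TSO (the flag name and value here are illustrative):

./br backup full --pd 127.0.0.1:2379 --storage "local:///tmp/backup" --backupts 412632649740320770

Running different backup commands with the same --backupts value would then yield snapshots at an identical timestamp.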

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Restore failure

Tried the new BR restore with the command below:

./br restore full --connect "root@tcp(10.233.58.235:4000)/" --pd 10.233.9.221:2379 --storage "local:///var/lib/tikv/backup"

It fails as below (the logs are repeated continuously):

[2019/11/18 11:11:00.796 +00:00] [WARN] [import.go:93] ["download file failed"] [file="name:\"1_2_22_51_write.sst\" sha256:\"3dbe57eaeeeadae7272239064cbb22dd555d39e2b177b9ce08d5604894af938d\" start_key:\"t\\200\\000\\000\\000\\000\\000\\0003\" end_key:\"t\\200\\000\\000\\000\\000\\000\\0004\" start_version:412632649740320770 end_version:412632649740320770 crc64xor:3733601071017700193 total_kvs:100000 total_bytes:3800000 "] [region="id:100 start_key:\"t\\200\\000\\000\\000\\000\\000\\000\\3770_i\\200\\000\\000\\000\\000\\377\\000\\000\\001\\000\\000\\000\\000\\000\\372\" end_key:\"t\\200\\000\\000\\000\\000\\000\\000\\3770_r\\000\\000\\000\\000\\000\\372\" region_epoch:<conf_ver:5 version:24 > peers:<id:101 store_id:1 > peers:<id:102 store_id:43 > peers:<id:103 store_id:38 > "] [scanStartKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000\ufffd0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\ufffd"] [scanEndKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000\ufffd1\u0000\u0000\u0000\u0000\u0000\u0000\u0000\ufffd"] [error="rpc error: code = Unknown desc = CannotReadExternalStorage(\"local:///var/lib/tikv/backup\", \"1_2_22_51_write.sst\", Os { code: 2, kind: NotFound, message: \"No such file or directory\" })"]
[2019/11/18 11:11:00.795 +00:00] [WARN] [import.go:93] ["download file failed"] [file="name:\"4_1005_22_49_default.sst\" sha256:\"f4b4504a40a436bcbcc3115c0505a6a09ac4533c3597e575621e894d3a0ea501\" start_key:\"t\\200\\000\\000\\000\\000\\000\\0001\" end_key:\"t\\200\\000\\000\\000\\000\\000\\0002\" start_version:412632649740320770 end_version:412632649740320770 crc64xor:8739028299443176440 total_kvs:100000 total_bytes:21200000 "] [region="id:92 start_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377-_i\\200\\000\\000\\000\\000\\377\\000\\000\\001\\000\\000\\000\\000\\000\\372\" end_key:\"t\\200\\000\\000\\000\\000\\000\\000\\377-_r\\000\\000\\000\\000\\000\\372\" region_epoch:<conf_ver:5 version:24 > peers:<id:93 store_id:1 > peers:<id:94 store_id:43 > peers:<id:95 store_id:38 > "] [scanStartKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000\ufffd-\u0000\u0000\u0000\u0000\u0000\u0000\u0000\ufffd"] [scanEndKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000\ufffd.\u0000\u0000\u0000\u0000\u0000\u0000\u0000\ufffd"] [error="rpc error: code = Unknown desc = CannotReadExternalStorage(\"local:///var/lib/tikv/backup\", \"4_1005_22_49_default.sst\", Os { code: 2, kind: NotFound, message: \"No such file or directory\" })"]
^C

But I do have backup sst files under the correct directory:

/var/lib/tikv/backup # pwd
/var/lib/tikv/backup
/var/lib/tikv/backup # ls
1_2_22_51_default.sst     4_1005_22_49_default.sst  backupmeta
1_2_22_51_write.sst       4_1005_22_49_write.sst    localtmp

Refine `ExternalStorage` interface.

The ExternalStorage interface forces us to hold the entire file in memory when reading or writing.

// ExternalStorage represents a kind of file system storage
type ExternalStorage interface {
	// Write file to storage
	Write(ctx context.Context, name string, data []byte) error
	// Read storage file
	Read(ctx context.Context, name string) ([]byte, error)
	// FileExists returns true if the file exists
	FileExists(ctx context.Context, name string) (bool, error)
}

Suggest a new interface

// ExternalStorage represents a kind of file system storage
type ExternalStorage interface {
	// NewWriter opens a streaming writer for the named file
	NewWriter(ctx context.Context, name string) (io.WriteCloser, error)
	// NewReader opens a streaming reader for the named file and reports its size
	NewReader(ctx context.Context, name string) (rc io.ReadCloser, size int64, err error)
	// FileExists returns true if the file exists
	FileExists(ctx context.Context, name string) (bool, error)
}
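
To make the proposal concrete, here is a minimal local-filesystem sketch of the streaming interface. It is illustrative only and omits details (path sanitization, fsync, etc.) that a real implementation would need.

package storage

import (
	"context"
	"io"
	"os"
	"path/filepath"
)

// LocalStorage is a filesystem-backed sketch of the proposed interface.
type LocalStorage struct {
	base string // root directory of the backup archive
}

// NewWriter creates (or truncates) the named file and returns a streaming
// writer, so callers never have to buffer the whole file in memory.
func (l *LocalStorage) NewWriter(ctx context.Context, name string) (io.WriteCloser, error) {
	return os.Create(filepath.Join(l.base, name))
}

// NewReader opens the named file and reports its size, which callers can
// use for progress reporting.
func (l *LocalStorage) NewReader(ctx context.Context, name string) (io.ReadCloser, int64, error) {
	f, err := os.Open(filepath.Join(l.base, name))
	if err != nil {
		return nil, 0, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, info.Size(), nil
}

// FileExists returns true if the file exists.
func (l *LocalStorage) FileExists(ctx context.Context, name string) (bool, error) {
	_, err := os.Stat(filepath.Join(l.base, name))
	if os.IsNotExist(err) {
		return false, nil
	}
	return err == nil, err
}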

UCP: output log to /tmp by default, log backupmeta if task fails and always print summary to console

Description

Currently, BR writes a large volume of logs to the console by default; for a command-line tool, this output is mostly noise to users.

To improve user experience, we can:

  1. write logs to /tmp/br.log by default (see the sketch below),
  2. always print a summary to the console, whether the task succeeds or fails,
  3. log the backupmeta if a backup/restore fails.
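
A minimal sketch of step 1, using go.uber.org/zap directly. BR builds its logger through pingcap/log, so treat this purely as an illustration of redirecting log output to a file while keeping the summary on the console.

package main

import (
	"fmt"

	"go.uber.org/zap"
)

func newFileLogger() (*zap.Logger, error) {
	cfg := zap.NewProductionConfig()
	// Send structured logs to a file instead of the console.
	cfg.OutputPaths = []string{"/tmp/br.log"}
	cfg.ErrorOutputPaths = []string{"/tmp/br.log"}
	return cfg.Build()
}

func main() {
	logger, err := newFileLogger()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()
	logger.Info("backup started") // written to /tmp/br.log, not the console
	// The summary is always printed to the console, regardless of log settings.
	fmt.Println("Backup finished; detailed logs are in /tmp/br.log")
}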

Note:

Score

  • 300

Mentor(s)

Recommended Skills

  • Go programming

Backup to common cloud storage

BR support common cloud storage

Overview

Integrate BR with common cloud object storage (S3, GCS, Azure Blob storage etc).

Problem statement

Currently, BR supports local storage, where backup files are written to a local directory. The backup files then need to be collected and copied to every TiKV node, which is difficult to use in practice. A workaround is to mount an NFS-like filesystem on every TiKV node and the BR node, but mounting NFS on every node is also difficult to set up and error-prone.

Object storage is a better fit for this scenario, especially since backing up to S3/GCS is quite common on public clouds.

TODO list

S3 Support (2400 points)

  • Extend current backup protobuf to support S3 compatible (S3, Ceph object storage, Minio, Aliyun OSS) storage (900 points / medium) @tennix pingcap/kvproto#507
  • Store BR metadata to S3 compatible object storage (600 points / easy) @DanielZhangQD #103
  • Store TiKV SST files to S3 compatible object storage (900 points / medium) @yiwu-arbug tikv/tikv#6209

GCS Support (1800 points)

TiDB Operator integration (900 points)

Test (2100 points)

  • Unit test coverage >= 60% (900 points / easy)
  • e2e test (1200 points / medium) #108

Mentors

Recommended skills

  • Go language
  • Rust language
  • Familiar with S3/GCS/Azure blob storage

Clearer behavior if restoring to a cluster that already contains data

Feature Request

Describe your feature request related problem:

Right now, restoring to a cluster to which a backup has already been restored generates a lot of "splitting regions failed" and ["Full Restore"] [progress=0.00%] messages.

br should be able to detect this situation and exit earlier and without so many confusing and repetitive messages.

br.out.txt

Integration test for S3

Feature Request

Describe your feature request related problem:

We need an integration E2E test against S3 storage, to ensure the implementation is correct. cc #89.

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

BR full restore won't rebase `auto_random` counter

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    BR didn't rebase auto_random after restore

    1. set tidb config auto_random to true.
    2. create table t(a int key auto_random); insert into t values (), (), ();
    3. br backup table --db t --table t --pd xxx -s xxx
    4. drop table t
    5. br restore table --db t --table t --pd xxx -s xxx
    6. insert into t values (),(),();

before backup:

MySQL [t]> select a & b'0000011111111111111111111111111' from t.t;
+----------------------------------------+
| a & b'0000011111111111111111111111111' |
+----------------------------------------+
|                                      1 |
|                                      2 |
|                                      3 |
+----------------------------------------+
  2. What did you expect to see?

after restore:

select a & b'0000011111111111111111111111111' from t.t;
+----------------------------------------+
| a & b'0000011111111111111111111111111' |
+----------------------------------------+
|                                      4 |
|                                      5 |
|                                      6 |
|                                      1 |
|                                      2 |
|                                      3 |
+----------------------------------------+
  3. What did you see instead?

after restore:

select a & b'0000011111111111111111111111111' from t.t;
+----------------------------------------+
| a & b'0000011111111111111111111111111' |
+----------------------------------------+
|                                      1 |
|                                      2 |
|                                      3 |
|                                      1 |
|                                      2 |
|                                      3 |
+----------------------------------------+
  4. What version of BR and TiDB/TiKV/PD are you using?
    master

Integrate BR into TiDB - Refactor BR to abstract out TiDB to avoid import cycle

Feature Request

Describe your feature request related problem:

BR is currently a CLI program. To some DBAs, this interface is not as straightforward as managing via SQL directly.

Describe the feature you'd like:

We would like to invoke BR directly from TiDB via SQL. This involves several parts:

  • pingcap/parser needs to recognize the BR-related SQL statements.
  • pingcap/tidb needs to forward the requests to the BR library.
  • pingcap/br (this repository) needs to be refactored a bit so TiDB can import it.

This issue mainly focuses on the last item.

BR itself imports various packages from TiDB. If such TiDB packages depend on BR again, we get an "import cycle". We need to modify BR to avoid those packages. (The cycle is checked at package level, not module level, so go.mod is hopefully irrelevant.)
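
One common way to break such a cycle is dependency inversion: BR exposes a small interface for the pieces of TiDB it needs, and the caller (the BR CLI or TiDB itself) supplies the implementation. The sketch below is illustrative; the names are not BR's actual API.

// Package glue sketches the abstraction: BR code depends only on these
// interfaces, and TiDB provides the concrete implementation when it
// imports BR, so the package-level dependency points one way only.
package glue

import "context"

// Session is the minimal set of operations BR needs from a SQL engine.
type Session interface {
	Execute(ctx context.Context, sql string) error
	Close()
}

// Glue creates sessions for BR without BR importing TiDB's session
// package directly.
type Glue interface {
	CreateSession(ctx context.Context) (Session, error)
}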

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

The whole argument of --storage is stored in backupmeta which may include sensitive data

With the current implementation, the argument of --storage is stored in the backup meta, and it is retrieved and sent to TiKV during restore.
For backup & restore with cloud storage, e.g. AWS S3, this means the access keys are also included in the backup meta, which is insecure and should be avoided.

  • Is it necessary to store the argument in the backup meta at all? The --storage option is already required by each backup, checksum and restore command.
  • It is also insecure to pass the keys in the --storage argument itself; we could instead read the keys or other sensitive data from environment variables and have BR send them to TiKV (see the sketch below).
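
A minimal sketch of the environment-variable approach mentioned above, using the conventional AWS variable names; how the credentials are attached to the TiKV requests is omitted, and the helper itself is hypothetical.

package storage

import (
	"errors"
	"os"
)

// S3Credentials holds the sensitive part of an S3 backend that should
// never be written into backupmeta.
type S3Credentials struct {
	AccessKey string
	SecretKey string
}

// credentialsFromEnv reads the keys from environment variables instead of
// the --storage argument, so they are never persisted with the backup.
func credentialsFromEnv() (S3Credentials, error) {
	cred := S3Credentials{
		AccessKey: os.Getenv("AWS_ACCESS_KEY_ID"),
		SecretKey: os.Getenv("AWS_SECRET_ACCESS_KEY"),
	}
	if cred.AccessKey == "" || cred.SecretKey == "" {
		return S3Credentials{}, errors.New("AWS credentials not found in environment")
	}
	return cred, nil
}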

@kennytm @overvenus @tennix What do you think?

Support black-white list

Feature Request

Describe your feature request related problem:

Currently, BR only supports three modes of backup:

  • Full backup
  • One DB backup
  • One Table backup

We may need to support a more general way of selecting what to back up, such as a black-white list on databases or tables.

Describe the feature you'd like:

  1. Add a black-white list config, similar to DM's black-white list.
  2. Accept simple wildcards on the command line.
  3. Other ideas are welcome.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

br_ddl_full sometimes freezes

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    Run the integration test br_ddl_full.

  2. What did you expect to see?
    The test passes in acceptable time duration.

  3. What did you see instead?
    During the restore process, the progress bar froze at 100% for a very long period (about 1 hour), while BR kept calling DownloadSST and trying to ingest the result.
    Eventually the test passed, but the restore reported a strangely large file size, along with a huge number of KV pairs:

the restore result

Some information that may be useful:
  • The Download RPC was called about 1,660 times in total. For the last SST file, scanning regions showed about 834 regions; grepping the log for "scan regions" and summing the count field gives about 1,600.
  4. What version of BR and TiDB/TiKV/PD are you using?
    The branch Yujuncen/design-doc, but the current master may hit this problem too.

Lots of empty regions after restore

Region size is not updated during BR restore because we forgot to set the Length field of SSTMeta.

br/pkg/restore/util.go

Lines 140 to 149 in 008ec45

return import_sstpb.SSTMeta{
	Uuid:   id,
	CfName: cfName,
	Range: &import_sstpb.Range{
		Start: rangeStart,
		End:   rangeEnd,
	},
	RegionId:    region.GetId(),
	RegionEpoch: region.GetRegionEpoch(),
}

Length means the length of the SST file, so TiKV should record it during backup, as requested in #95.
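
A sketch of what the fix could look like once the file size is recorded: it mirrors the snippet above, with fileLength standing in for whatever field the backup metadata ends up exposing (hypothetical here).

// fileLength is hypothetical: the SST file size recorded by TiKV during
// backup (#95). Propagating it keeps region-size statistics accurate.
sstMeta := import_sstpb.SSTMeta{
	Uuid:   id,
	CfName: cfName,
	Range: &import_sstpb.Range{
		Start: rangeStart,
		End:   rangeEnd,
	},
	Length:      fileLength, // previously left at zero, so region sizes never updated
	RegionId:    region.GetId(),
	RegionEpoch: region.GetRegionEpoch(),
}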

Support TiDB v3.0?

Hi, I know BR only supports TiDB v3.1 and later versions.
Can we support TiDB v3.0?

Properly handle disconnected TiKV stores

Feature Request

Describe your feature request related problem:

Currently BR (especially restore) requires all stores to be online. If a TiKV store is removed, the whole process would fail.

Describe the feature you'd like:

BR should adapt to scaling up and down, especially for online restore.

Describe alternatives you've considered:

None yet.

Teachability, Documentation, Adoption, Migration Strategy:

None yet.

UCP: Add backup retry when a TiKV node goes down

Description

Currently, BR aborts immediately when a TiKV node goes down during backup; we need to handle this error.

To improve user experience, we can:

  1. after push.Backup, check whether the error is retryable (including the case of a TiKV node going down), and
  2. instead of returning the error, fall back to fineGrainedBackup to retry (see the sketch below).
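
A minimal sketch of the fallback control flow; push and fineGrained stand in for the real push.Backup and fineGrainedBackup (which take more arguments), and the error-string checks are illustrative only.

package backup

import (
	"context"
	"strings"
)

// backupFunc abstracts both the coarse push-based backup and the
// fine-grained fallback.
type backupFunc func(ctx context.Context) error

// isRetryableBackupError reports whether the error looks transient, e.g.
// a TiKV store going down, rather than a permanent failure. (The string
// matching is illustrative.)
func isRetryableBackupError(err error) bool {
	msg := err.Error()
	return strings.Contains(msg, "store is down") ||
		strings.Contains(msg, "connection refused")
}

// backupWithFallback runs the push-based backup first and, when it fails
// with a retryable error, falls back to the fine-grained path instead of
// aborting the whole task.
func backupWithFallback(ctx context.Context, push, fineGrained backupFunc) error {
	err := push(ctx)
	if err == nil {
		return nil
	}
	if !isRetryableBackupError(err) {
		return err // permanent errors should still fail fast
	}
	return fineGrained(ctx)
}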

Score

  • 300

Mentor(s)

Recommended Skills

  • Go programming

Support multiple databases / tables

Feature Request

Describe your feature request related problem:

Support backing up multiple tables in a single command without needing the --backupts trick.

Describe the feature you'd like:

A simplified version of #173. We support the following commands to backup/restore multiple databases or tables.

./br backup database --db firstdb --db seconddb 
#                                 ^~~~~~~~~~~~~
./br backup table --???? 'firstdb.firsttable' --???? '`second.db`.`2,tbl`'

Because the table name can contain a ., we need to support quoting by `.

Unresolved question:

  • Should we change the behavior of database --db 'db1,db2'? (It currently backs up a single database named "db1,db2".)
  • Should we change the behavior of database --db '`db`'? (It currently backs up a single database named "`db`".)
  • What should the --???? parameter be named?
  • Should we keep the table --db db1 --table tbl1 form?
  • Do we want to support table --???? 'db1.tbl1,db2.tbl2'?
  • Do we want to support table --db db1 --???? tbl1 --???? tbl2?
  • Do we want to support table --db db1 --db db2 --???? tbl1 --???? tbl2?
  • Should we support ANSI quotes table --???? '"db1,db2"."tbl1,tbl2"'?
  • Should we reject if the inputs aren't proper identifiers table --???? '!@#$.%^&*'?

Describe alternatives you've considered:

Don't support this and go straight to #173.

Teachability, Documentation, Adoption, Migration Strategy:

If we decide to name --???? as --table, we'll need to rewrite all the documentation.

BR should filter out TiFlash nodes

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

Backup from a cluster where some TiFlash nodes are running

  2. What did you expect to see?

Success.

  3. What did you see instead?

Backup failed with an "Unimplemented" error in SendBackup(), on this line:

resp, err := bcli.Recv()

Since TiFlash cannot participate in BR anyway, we should just filter these nodes out.
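
A sketch of the filtering, assuming TiFlash stores are identified by the engine=tiflash store label (how TiFlash nodes are commonly distinguished); the call site that fetches the store list is omitted.

package backup

import (
	"github.com/pingcap/kvproto/pkg/metapb"
)

// skipTiFlash drops stores that carry the engine=tiflash label, so backup
// requests are only sent to real TiKV nodes.
func skipTiFlash(stores []*metapb.Store) []*metapb.Store {
	kvStores := make([]*metapb.Store, 0, len(stores))
	for _, store := range stores {
		isTiFlash := false
		for _, label := range store.GetLabels() {
			if label.GetKey() == "engine" && label.GetValue() == "tiflash" {
				isTiFlash = true
				break
			}
		}
		if !isTiFlash {
			kvStores = append(kvStores, store)
		}
	}
	return kvStores
}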

  4. What version of BR and TiDB/TiKV/PD are you using?

3.1.0-beta.1

BR cannot restore when max-index-length>3072

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

    1. set tidb config max-index-length to 12288;
    2. create table t1 (a varchar(3072) primary key);
    3. br backup full -s local:///home/tidb/tmp/br
    4. drop table t1;
    5. br restore full -s local:///home/tidb/tmp/br
  2. What did you expect to see?
    restore success

  3. What did you see instead?
    Error: [ddl:1071]Specified key was too long; max key length is 3072 bytes

  4. What version of BR and TiDB/TiKV/PD are you using?
    br master
    tidb master

BR should support table routing during restore

Feature Request

Describe your feature request related problem:

A user may already have a table A in the cluster, and table A becomes partially broken.
The user needs to recover table A from a BR backup but does not want to drop the original table A.
So some kind of table routing function is needed.

Describe the feature you'd like:

Table routing during restore.

Describe alternatives you've considered:

Not sure whether we need to introduce a config file to support this function.

Teachability, Documentation, Adoption, Migration Strategy:

Incremental Backup & Restore Project

Incremental Backup & Restore

Abstract

BR only supports full backup & restore for now. In this project we want to support incremental backup & restore, which involves both BR and TiKV.

Problem Statement

Currently, the procedure of backup is:

  1. BR gets a TSO (called BackupTS) from PD
  2. BR distributes tasks to the TiKV nodes
  3. TiKV scans data at BackupTS, generates SSTs, and stores them in external storage

And the procedure of restore is:

  1. BR resets PD's TSO to BackupTS (a no-op if BackupTS <= the current TSO of PD)
  2. BR creates the databases and tables, gets the new table schemas, and constructs key rewrite rules from them (for rewriting the table IDs and index IDs in the keys)
  3. BR distributes DownloadSST tasks to the TiKV nodes
  4. TiKV downloads the given files, rewrites the keys, and saves the result files
  5. BR distributes IngestSST tasks to the TiKV nodes

Several problems arise when we support incremental backup & restore:

  1. All data scanned during an incremental backup lies in (last BackupTS, current BackupTS]; see the sketch after this list. For a more efficient incremental backup, we need a way to skip unnecessary data.
  2. In a full restore there is no old data in the cluster, but in an incremental restore we sometimes restore data into a table that already has records, and those records should be overwritten by the restored data. So we must rewrite the restored data with a TSO from the new cluster, rather than resetting the TSO to the BackupTS.
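
A tiny sketch of the half-open timestamp window from point 1; the function name is illustrative.

// inIncrementalRange reports whether a commit timestamp belongs to the
// incremental backup window (lastBackupTS, currentBackupTS]. Data at
// exactly lastBackupTS was already covered by the previous backup, while
// data at currentBackupTS is included in this one.
func inIncrementalRange(commitTS, lastBackupTS, currentBackupTS uint64) bool {
	return commitTS > lastBackupTS && commitTS <= currentBackupTS
}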

Success Criteria

At least support the basic function of incremental backup & restore.

TODO list

  • TiKV completes DeltaScanner (500 points / medium)
  • TiKV supports backing up data in a specified TS range (500 points / medium)
  • BR supports an incremental backup command (500 points / medium)
  • TiKV supports skipping SSTs based on MVCC properties (600 points / medium)
  • BR and TiKV support rewriting the TS in DownloadSST (600 points / medium)

Score

2700

Mentor(s)

Recommended skills

  • Golang
  • Rust

Support backup/restore VIEW

VIEW is an important feature of TiDB. For now, BR only supports independent views. We need to support more complex views (those depending on other tables and views).

Problem: BR's internal TiDB version must match cluster's TiDB version.

TiDB has a concept called "bootstrap version" (tidb_server_version). It represents the schemas of the system tables. If a TiDB server with a newer bootstrap version (e.g. v41) is started on storage with an older version (e.g. v40), the TiDB server performs a "rolling upgrade" (executes some DDL statements) to migrate the schemas to v41.

This action is also performed when BR uses its own TiDB to bootstrap the session.

dom, err := session.BootstrapSession(storage)

This brings us to the problem:

  • If BR's version < the cluster's TiDB version, BR cannot understand the system schemas and thus won't run.
  • If BR's version > the cluster's TiDB version, BR performs the "rolling upgrade", and the existing TiDB servers can no longer use the cluster unless the user also upgrades TiDB.

Thus the only safe solution is to ensure BR version = cluster TiDB version.

How should our version management cope with this?
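
One possible safeguard, sketched under the assumption that the cluster's bootstrap version can be read from the mysql.tidb system table (variable tidb_server_version) before BR bootstraps its own session; the constant and function names are illustrative.

package version

import (
	"database/sql"
	"fmt"
	"strconv"
)

// brBootstrapVersion is the bootstrap version BR was compiled against
// (illustrative; the real value would come from the vendored TiDB).
const brBootstrapVersion = 41

// checkBootstrapVersion refuses to proceed when the cluster's bootstrap
// version differs from BR's, so BR neither misreads newer system schemas
// nor accidentally rolls the cluster forward.
func checkBootstrapVersion(db *sql.DB) error {
	var value string
	row := db.QueryRow(
		"SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tidb_server_version'")
	if err := row.Scan(&value); err != nil {
		return err
	}
	clusterVersion, err := strconv.Atoi(value)
	if err != nil {
		return err
	}
	if clusterVersion != brBootstrapVersion {
		return fmt.Errorf("bootstrap version mismatch: cluster=%d, br=%d",
			clusterVersion, brBootstrapVersion)
	}
	return nil
}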

This was revealed in #158 (https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/br_ghpr_unit_and_integration_test/detail/br_ghpr_unit_and_integration_test/828/pipeline/) due to pingcap/tidb#14598 renaming the column password (v40) to authentication_string (v41) in the system table mysql.user.

Provide a local server to replace local storage

Feature Request

Describe your feature request related problem:

local:// is confusing because SST files are scattered across multiple machines during backup and must be manually copied together for restore.

An alternative that does not require cloud storage is to automatically set up a local file server; TiKV would then read and write directly on the BR machine.

Describe the feature you'd like:

Since we have already implemented S3 read/write, we could embed a MinIO server into BR and advertise it directly.

We should add a new scheme br:///path/local/backup/archive, which will

  • automatically generate a random and temporary bucket, access key and secret key
  • start a MinIO server over :port, where the port is taken from --advertise-addr (-A)
  • translate to s3://generated_bucket/ when sent downstream to TiKV. The bucket links directly to /path/local/backup/archive
  • and otherwise the same as --s3.provider=minio.

Describe alternatives you've considered:

Besides S3, we could try other protocols like TFTP, NFS, custom HTTP, or custom gRPC. But in all these cases we need to implement more new code.

--s3.endpoint can be auto-generated using get_if_addrs, but the output may not be reliable.

Teachability, Documentation, Adoption, Migration Strategy:

Replace all mentions of local:// by the local S3 server.

Support TLS for security

Feature Request

Describe your feature request related problem:

When TiDB cluster enables TLS connection, BR won't work.

Describe the feature you'd like:

BR needs to support TLS connection for security.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

backup progress is inaccurate

When I do a full backup, it keeps backing up and printing progress=100% for a long time after it first prints progress=100%.

Logs are like this:

[2019/12/20 15:58:26.674 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:26.874 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:27.074 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:27.274 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:27.474 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:27.674 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:27.702 +08:00] [INFO] [client.go:683] ["range backuped"] [StartKey="dIAAAAAAAAFQX3KAAAAAAAZfmw=="] [EndKey="dIAAAAAAAAFQX3KAAAAAAAvJXQ=="]
[2019/12/20 15:58:27.874 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:28.074 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:28.274 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:28.407 +08:00] [INFO] [client.go:683] ["range backuped"] [StartKey="dIAAAAAAAAFQX3KAAAAAAAvJXQ=="] [EndKey="dIAAAAAAAAFQX3L//////////wA="]
[2019/12/20 15:58:28.408 +08:00] [INFO] [client.go:676] ["backup streaming finish"] [StoreID=1]
[2019/12/20 15:58:28.408 +08:00] [INFO] [client.go:379] ["finish backup push down"] [Ok=4]
[2019/12/20 15:58:28.408 +08:00] [INFO] [client.go:392] ["backup time range"] [StartVersion=0] [EndVersion=413356152248860683]
[2019/12/20 15:58:28.408 +08:00] [INFO] [client.go:405] ["backup range finished"] [take=3.237989556s]
[2019/12/20 15:58:28.408 +08:00] [INFO] [client.go:349] ["backup started"] [StartKey="dIAAAAAAAAFQX2mAAAAAAAAAAQA="] [EndKey="dIAAAAAAAAFQX2mAAAAAAAAAAfs="] [RateLimit=0] [Concurrency=4]
[2019/12/20 15:58:28.409 +08:00] [INFO] [push.go:24] ["new backup client"]
[2019/12/20 15:58:28.409 +08:00] [INFO] [client.go:664] ["try backup"] ["backup request"="{\"cluster_id\":6771082893009782418,\"start_key\":\"dIAAAAAAAAFQX2mAAAAAAAAAAQA=\",\"end_key\":\"dIAAAAAAAAFQX2mAAAAAAAAAAfs=\",\"end_version\":413356152248860683,\"concurrency\":4,\"storage_backend\":{\"Backend\":{\"Local\":{\"path\":\"/data2/yilin/bak/full\"}}}}"]
[2019/12/20 15:58:28.474 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:28.674 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:28.874 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:29.074 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:29.274 +08:00] [INFO] [progress.go:100] ["Database backup"] [progress=100.00%]
[2019/12/20 15:58:29.385 +08:00] [INFO] [client.go:683] ["range backuped"] [StartKey="dIAAAAAAAAFQX2mAAAAAAAAAAQA="] [EndKey="dIAAAAAAAAFQX2mAAAAAAAAAAfs="]
[2019/12/20 15:58:29.385 +08:00] [INFO] [client.go:676] ["backup streaming finish"] [StoreID=1]

BR should retry RegionError on BatchSplitRegions

Integration test fails #214 (comment)

pretty printed backtrace

[2020-03-30T09:59:53.436Z] [2020/03/30 17:59:53.217 +08:00] [ERROR] [restore.go:238] ["split regions failed"] [error="split region failed: region=id:3828 start_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\275_r\\000\\000\\000\\000\\000\\372\" end_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\277\\000\\000\\000\\000\\000\\000\\000\\370\" region_epoch:<conf_ver:20 version:1049 > peers:<id:3829 store_id:6 > peers:<id:3830 store_id:1 > peers:<id:3831 store_id:5 > , err=message:\"peer is not leader for region 3828, leader may Some(id: 3831 store_id: 5)\" not_leader:<region_id:3828 leader:<id:3831 store_id:5 > > "] [errorVerbose="split region failed: region=id:3828 start_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\275_r\\000\\000\\000\\000\\000\\372\" end_key:\"t\\200\\000\\000\\000\\000\\000\\003\\377\\277\\000\\000\\000\\000\\000\\000\\000\\370\" region_epoch:<conf_ver:20 version:1049 > peers:<id:3829 store_id:6 > peers:<id:3830 store_id:1 > peers:<id:3831 store_id:5 > , err=message:\"peer is not leader for region 3828, leader may Some(id: 3831 store_id: 5)\" not_leader:<region_id:3828 leader:<id:3831 store_id:5 > > 
github.com/pingcap/br/pkg/restore.(*pdClient).BatchSplitRegions
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/restore/split_client.go:230
github.com/pingcap/br/pkg/restore.(*RegionSplitter).splitAndScatterRegions
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/restore/split.go:316
github.com/pingcap/br/pkg/restore.(*RegionSplitter).Split
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/restore/split.go:118
github.com/pingcap/br/pkg/restore.SplitRanges
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/restore/util.go:344
github.com/pingcap/br/pkg/task.RunRestore
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/task/restore.go:236
github.com/pingcap/br/cmd.runRestoreCommand
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/cmd/restore.go:21
github.com/pingcap/br/cmd.newDbRestoreCommand.func1
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/cmd/restore.go:93
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:914
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
github.com/pingcap/br.main
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/main.go:54
github.com/pingcap/br.TestRunMain.func1
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/main_test.go:39
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357"] [stack="github.com/pingcap/log.Error
	/go/pkg/mod/github.com/pingcap/[email protected]/global.go:42
github.com/pingcap/br/pkg/task.RunRestore
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/task/restore.go:238
github.com/pingcap/br/cmd.runRestoreCommand
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/cmd/restore.go:21
github.com/pingcap/br/cmd.newDbRestoreCommand.func1
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/cmd/restore.go:93
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:914
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
github.com/pingcap/br.main
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/main.go:54
github.com/pingcap/br.TestRunMain.func1
	/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/main_test.go:39"]

resp, err := client.SplitRegion(ctx, &kvrpcpb.SplitRegionRequest{
	Context: &kvrpcpb.Context{
		RegionId:    regionInfo.Region.Id,
		RegionEpoch: regionInfo.Region.RegionEpoch,
		Peer:        peer,
	},
	SplitKeys: keys,
})
if err != nil {
	return nil, err
}
if resp.RegionError != nil {
	return nil, errors.Errorf("split region failed: region=%v, err=%v", regionInfo.Region, resp.RegionError)
}

BR should retry on RegionError:

  • NotLeader
  • RegionNotFound
  • EpochNotMatch
  • ServerIsBusy
  • StaleCommand
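
A sketch of how such a classifier might look, using the errorpb getters for the variants listed above; the surrounding retry loop (with refreshed region and leader information) is omitted.

package restore

import (
	"github.com/pingcap/kvproto/pkg/errorpb"
)

// isRetryableRegionError reports whether a RegionError returned by
// BatchSplitRegions is transient and worth retrying after refreshing the
// region and leader information, rather than a permanent failure.
func isRetryableRegionError(regionErr *errorpb.Error) bool {
	if regionErr == nil {
		return false
	}
	switch {
	case regionErr.GetNotLeader() != nil,
		regionErr.GetRegionNotFound() != nil,
		regionErr.GetEpochNotMatch() != nil,
		regionErr.GetServerIsBusy() != nil,
		regionErr.GetStaleCommand() != nil:
		return true
	default:
		return false
	}
}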

Support restoring to a cluster containing TiFlash

Feature Request

Describe your feature request related problem:

Currently we'll reject BR restore if the cluster contains TiFlash, because TiFlash doesn't support all features of sst_importer yet.

Describe the feature you'd like:

We should be able to skip TiFlash nodes during (full, offline) restore.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

BR restore progress bar is inaccurate

  1. What did you do?
    Performed a restore of a 2 TB dataset; the restore progress bar is inaccurate. After checking the code, we found that the progress is increased by more than len(keys) when a split meets an error; this needs to be fixed.

  2. What did you expect to see?
    An accurate progress bar during restore.

  3. What did you see instead?
    100% progress for a long time.

  4. What version of BR and TiDB/TiKV/PD are you using?
    BR master

BR should output the commitTS after backup finished

Mydumper outputs the commit TS after a backup finishes; it would be good for BR to do the same, since it is useful for knowing the backup position.

		Started dump at: 2019-06-13 10:00:04
		SHOW MASTER STATUS:
			Log: tidb-binlog
			Pos: 409054741514944513
			GTID:

		Finished dump at: 2019-06-13 10:00:04

CI is unstable

The current tests fail in CI almost every time; we need to find out why and stabilize them.

BR cannot perform incremental restore when the backup contains auto_random DDL

  1. What did you do?
    An incremental restore fails when the backup contains auto_random DDL; reproduce it with the following steps:
    1. Full backup
    2. create table ta(a int(11) NOT NULL /*T!30100 AUTO_RANDOM(5) */, PRIMARY KEY (a))
    3. Incremental backup
    4. Full restore
    5. Incremental restore will fail
Invalid auto random: auto_random is an experimental feature, which can only be used when allow-auto-random is enabled. This can be changed in the configuration.
  2. What did you expect to see?
    Incremental restore succeeds.

  3. What did you see instead?

Invalid auto random: auto_random is an experimental feature, which can only be used when allow-auto-random is enabled. This can be changed in the configuration.
  4. What version of BR and TiDB/TiKV/PD are you using?
    br 3.1-rc
    br 4.0-rc

Backup of an empty TiDB cluster should be restorable

Feature Request

Describe your feature request related problem:

When we back up an empty TiDB cluster, it generates an empty folder in the storage bucket, and when I restore it, the restore fails.

Describe the feature you'd like:

A backup of an empty TiDB cluster should be restorable.

Local storage - when an SST file is missing, try to copy from its peer

Feature Request

Describe your feature request related problem:

Currently, if we restore from local:// storage, the entire backup archive must be copied to every TiKV machine. While we do warn about this caveat, not every user follows the instruction correctly. In particular, users testing BR who back up and then immediately restore will hit an error.

Copying also means the actual storage space is multiplied by the number of machines, which is not affordable in some use cases.

Describe the feature you'd like:

Rather than letting users copy the files themselves (which is error-prone), we automatically transfer an SST file from a peer if it is missing locally.

We will add two gRPC APIs to sst_importer (or a new service):

  1. FileExists(local_path_name: string) -> bool
  2. ReadFile(local_path_name: string) -> stream<bytes>

When reading from local:// storage, we will perform the following:

function read(local_path_name: string) -> stream<bytes> {
    if (file_exists(local_path_name)) {
        return open(local_path_name);
    }
    let peers_with_file = get_peers().filter(p => p.FileExists(local_path_name));
    if (peers_with_file.is_empty()) {
        throw FileNotFound(local_path_name);
    }
    return peers_with_file[0].ReadFile(local_path_name);
}

Implementation details:

  • peers_with_file[0] can be replaced by a better balancing strategy.
  • FileExists returning "unimplemented" or any other error should be treated as "false"; therefore this strategy is backward-compatible.
  • The peer list should be retrieved dynamically.

Describe alternatives you've considered:

An alternative is to deprecate local:// entirely and stop mentioning it in the documentation.

Teachability, Documentation, Adoption, Migration Strategy:

We could stop "requiring" users to copy the SSTs to every machine, though doing so would still be recommended to reduce network traffic.

No sst file on one TiKV after full backup

Tried the new BR with a full backup.
Here is the cluster:

[[email protected] tidb-cluster]$kubectl get pod -n dan5
NAME                                     READY   STATUS                       RESTARTS   AGE
dan5-pd-0                                1/1     Running                      0          112m
dan5-pd-1                                1/1     Running                      0          113m
dan5-pd-2                                1/1     Running                      0          114m
dan5-tidb-0                              2/2     Running                      0          42m
dan5-tidb-1                              2/2     Running                      0          43m
dan5-tikv-0                              1/1     Running                      0          111m
dan5-tikv-1                              1/1     Running                      0          112m
dan5-tikv-2                              1/1     Running                      0          112m

Run backup with below command:

./br backup full --pd 10.233.40.168:2379 --storage local:///var/lib/tikv/backup

Check result:

[[email protected] tidb-cluster]$kubectl exec -it dan5-tikv-0 -n dan5 -- sh -c "ls -lartR /var/lib/tikv/backup"
/var/lib/tikv/backup:
total 20324
drwxr-xr-x    7 root     root           102 Nov 18 09:17 ..
drwxr-xr-x    2 root     root             6 Nov 18 09:17 localtmp
-rw-r--r--    1 root     root       1731074 Nov 18 09:17 4_1005_22_49_write.sst
-rw-r--r--    1 root     root      19077269 Nov 18 09:17 4_1005_22_49_default.sst
drwxr-xr-x    3 root     root            84 Nov 18 09:17 .

/var/lib/tikv/backup/localtmp:
total 0
drwxr-xr-x    3 root     root            84 Nov 18 09:17 ..
drwxr-xr-x    2 root     root             6 Nov 18 09:17 .
[[email protected] tidb-cluster]$kubectl exec -it dan5-tikv-2 -n dan5 -- sh -c "ls -lartR /var/lib/tikv/backup"
/var/lib/tikv/backup:
total 20328
drwxr-xr-x    7 root     root           112 Nov 18 08:21 ..
drwxr-xr-x    2 root     root             6 Nov 18 09:17 localtmp
-rw-r--r--    1 root     root          4589 Nov 18 09:17 backupmeta
-rw-r--r--    1 root     root       1726923 Nov 18 09:17 1_2_22_51_write.sst
-rw-r--r--    1 root     root      19076989 Nov 18 09:17 1_2_22_51_default.sst
drwxr-xr-x    3 root     root            96 Nov 18 09:17 .

/var/lib/tikv/backup/localtmp:
total 0
drwxr-xr-x    3 root     root            96 Nov 18 09:17 ..
drwxr-xr-x    2 root     root             6 Nov 18 09:17 .
[[email protected] tidb-cluster]$kubectl exec -it dan5-tikv-1 -n dan5 -- sh -c "ls -lartR /var/lib/tikv/backup"
/var/lib/tikv/backup:
total 0
drwxr-xr-x    2 root     root             6 Nov 18 09:17 localtmp
drwxr-xr-x    7 root     root           102 Nov 18 09:17 ..
drwxr-xr-x    3 root     root            22 Nov 18 09:17 .

/var/lib/tikv/backup/localtmp:
total 0
drwxr-xr-x    3 root     root            22 Nov 18 09:17 ..
drwxr-xr-x    2 root     root             6 Nov 18 09:17 .

However, there are no backup files on tikv-1, even though it holds 20 region leaders out of 22 regions in total:

[[email protected] test]$curl http://10.233.40.168:2379/pd/api/v1/store/5
{
  "store": {
    "id": 5,
    "address": "dan5-tikv-1.dan5-tikv-peer.dan5.svc:20160",
    "labels": [
      {
        "key": "host",
        "value": "172.16.4.249"
      }
    ],
    "version": "3.0.3",
    "state_name": "Up"
  },
  "status": {
    "capacity": "440.9GiB",
    "available": "427.7GiB",
    "leader_count": 20,
    "leader_weight": 1,
    "leader_score": 20,
    "leader_size": 20,
    "region_count": 22,
    "region_weight": 1,
    "region_score": 112,
    "region_size": 112,
    "start_ts": "2019-11-18T08:01:55Z",
    "last_heartbeat_ts": "2019-11-18T09:59:56.491943663Z",
    "uptime": "1h58m1.491943663s"
  }
}

Please help confirm whether this is expected behavior, since by design the SST files should be created by the region leaders.

By the way, checksum fails because BR tries to check files that are on a remote TiKV; I'm not sure whether I ran the command with some options missing:

/var/lib/tikv # ./br meta checksum -s local:///var/lib/tikv/backup/
[2019/11/18 09:47:46.458 +00:00] [INFO] [version.go:22] ["Welcome to Backup & Restore (BR)"]
[2019/11/18 09:47:46.458 +00:00] [INFO] [version.go:23] [BR] [release-version=]
[2019/11/18 09:47:46.458 +00:00] [INFO] [version.go:24] [BR] [git-hash=bbd137c28cefc595ba8a3c8c98db378028a7b553]
[2019/11/18 09:47:46.459 +00:00] [INFO] [version.go:25] [BR] [git-branch=master]
[2019/11/18 09:47:46.459 +00:00] [INFO] [version.go:26] [BR] [utc-build-time="2019-11-18 08:12:51"]
[2019/11/18 09:47:46.459 +00:00] [INFO] [version.go:43] [arguments] [ca=] [cert=] [help=false] [key=] [log-file=] [log-level=info] [pd=127.0.0.1:2379] [slow-log-file=] [status-addr=] [storage=local:///var/lib/tikv/backup/]
[2019/11/18 09:47:46.472 +00:00] [INFO] [meta.go:85] ["file info"] [table=sbtest1] [file=4_1005_22_49_default.sst] [crc64xor=8739028299443176440] [totalKvs=100000] [totalBytes=21200000] [startVersion=412632649740320770] [endVersion=412632649740320770] [startKey=dIAAAAAAAAAx] [endKey=dIAAAAAAAAAy]
Error: open /var/lib/tikv/backup/4_1005_22_49_default.sst: no such file or directory
open /var/lib/tikv/backup/4_1005_22_49_default.sst: no such file or directory
github.com/pingcap/errors.AddStack
        /root/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.Trace
        /root/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15
github.com/pingcap/br/cmd.NewMetaCommand.func2
        /home/dev/br/cmd/meta.go:99
github.com/spf13/cobra.(*Command).execute
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
main.main
        /home/dev/br/main.go:56
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

BR cannot restore to `alter-primary-key = true` server

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

First set up a TiDB with alter-primary-key = true.

Create a table:

create table test.a (a int primary key, b int unique);
insert into test.a values (42, 42);

Run backup

bin/br backup table --db test -t a -s 'local:///tmp/brapk'

Drop the table

drop table test.a;

Restore

bin/br restore table --db test -t a -s 'local:///tmp/brapk'
  2. What did you expect to see?

Restore success and checksum pass

  3. What did you see instead?

An "ranges overlapped" error

[2020/03/27 13:38:22.085 +00:00] [ERROR] [restore.go:225] ["split regions failed"] [error="ranges overlapped: [74800000000000003b5f69800000000000000100 74800000000000003b5f698000000000000001fb],
  4. What version of BR and TiDB/TiKV/PD are you using?

Cluster: 4.0
BR: master

RawKV Backup & Restore Project

RawKV Backup & Restore Project

Abstract

This project focuses on TiKV and BR. We need to:

  • Implement RawKV backup and restore based on the current BR project.

Problem Statement

Now that TiKV is an independent CNCF project, the ability to back up and restore its data has become more important, so we need to implement backup and restore of TiKV data with BR.
RawKV is a TiKV concept that is independent of TiDB. BR already supports backing up transactional (Txn) KV data from a TiDB/TiKV cluster and restoring it to a new cluster.
We can reuse that process to implement RawKV backup and restore, so the main work can be divided into two parts.
Backup:

  • Use the BR backup command to send a backup request to the TiKV cluster.
  • TiKV receives the backup request, backs up the RawKV data, and puts it into the specified storage (local storage for now).

Restore:

  • Use the BR restore command to send a restore request to the TiKV cluster.
  • TiKV restores the data from the specified storage.

Success Criteria

  • Using the BR program, one can back up RawKV data from an existing TiKV cluster and restore it to a new TiKV cluster successfully.

TODO list

  • TiKV uses RegionIterator to scan RawKV data in a given key range. (500 points / medium) detail
  • TiKV restores RawKV data based on the current TiKV restore logic. (500 points / medium) detail
  • BR supports communicating with a TiKV-only cluster. (300 points / medium)
  • BR supports a RawKV restore command. (500 points / medium)
  • BR supports a RawKV backup command. (500 points / medium)

Difficulty

  • Easy

Score

  • 2300

Mentor(s)

Recommended skills

  • Go language
  • Basic understanding of TiKV

References

BR outputs error logs even when backup/restore succeeds

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    Sometimes, after a backup/restore succeeds, BR still logs some error messages.

  2. What did you expect to see?
    Clean logs.

  3. What did you see instead?
    Lots of messages like:

[2020/03/07 16:35:47.768 +08:00] [INFO] [coprocessor.go:855] ["[TIME_COP_PROCESS] resp_time:805.319684ms txnStartTS:415123386115555329 region_id:1231 store_addr:172.16.5.204:20191 kv_process_ms:734 scan_total_write:415965 scan_processed_write:415964 scan_total_data:0 scan_processed_data:0 scan_total_lock:1 scan_processed_lock:0"]
{"level":"warn","ts":"2020-03-07T16:35:50.309+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-ee6aac04-29bd-4a17-a93b-fd32c6aa18cb/172.16.5.204:2379","attempt":30,"error":"rpc error: code = Canceled desc = grpc: the client connection is closing"}
  4. What version of BR and TiDB/TiKV/PD are you using?
    master

BR should provide the total size of one backup

BR should provide the total size of each backup, considering the following cases:

  • Users want to know the size of each backup.
  • The platforms that integrate with BR, e.g. TEP, DBaaS, etc. need to show the size of each backup.

Perhaps each TiKV can return the size of the files it created for a backup to BR, and BR can then calculate the total size across all TiKV nodes and write it to the backup meta.

@kennytm @overvenus @tennix

Online Restore Project

BR support online restore

Overview

Currently, BR only supports offline restore, but some users need online restore.
The goal of this project is to support online restore without a significant
impact on online queries.

Problem Statement

BR switches TiKV to import mode during restore. In this mode, TiKV can
quickly import a large amount of data through RocksDB IngestSST, but it
seriously affects online queries.

During restore, BR splits a large number of small regions and randomly
scatters them across the TiKV nodes. This affects PD scheduling decisions,
which has an unpredictable impact on online queries.

After the restore completes, the entire cluster runs a full compaction,
which heavily impacts IO and CPU.

Before the restore completes, the restored data is inconsistent, but there is
no restriction preventing TiDB from operating on it.

Proposed Solution

Online import scenarios can be roughly divided into two categories:

  1. No new TiKV nodes join the cluster, and restore is performed on the old nodes.
  2. A new TiKV node joins the cluster and performs restore on the new node.

For case 1, we can:

  • Skip setting import mode and control the impact of the import on online
    queries through flow control. Specific strategies:
    • Flow control Upload/Download.
    • Flow Control Ingest SST.
  • After restore is complete, skip Compaction. In addition to optimizing
    RocksDB data structure, Compaction also writes SST table properties,
    so to skip Compaction, TiKV needs:
    • Use TiKV table properties to process SST files during Upload/Download.

For case 2, we can:

  • Let scatter region support scattering regions onto specified TiKV nodes.
  • Let PD better handle scheduling in online restore scenarios.
  • Follow the offline restore process on TiKV.

In both cases, TiDB needs to support making the restored database/table
invisible to the user.

Success Criteria

  • Achieve online restore.
  • No service unavailability during online restore.
  • Write stall cannot occur during online restore.
  • The user cannot operate the recovered database/table during online restore.

Difficulty

  • Medium

Score

  • 4000

TODO list

  • TiKV skips setting import mode during restore. (score: 200)
  • TiKV can flow control Upload/Download SST during restore. (score: 200)
  • TiKV can flow control ingest SST during restore. (score: 200)
  • Upload/Download handle the table properties needed for the SSTs written to TiKV. (score: 300)
  • PD supports scattering regions onto specified TiKV nodes. (score: 800)
  • PD scheduling should not be affected by the restore scenario. (score: 800)
  • TiDB supports hiding the specified database/table. (score: 1000)
  • Add a command in BR to roll back settings (PD and TiKV settings). (score: 500)

Mentor(s)

Recommended skills

  • Go language
  • Basic understanding of TiKV and PD

Time

GanttStart: YYYY-MM-DD
GanttDue: YYYY-MM-DD
