Comments (4)
Yes, clickhouse-backup executes server-side s3:CopyObject / s3:CreateMultipartUpload API calls during backup when an S3 disk exists.

Use clickhouse-backup 2.4.33 and add object_disk_path to the s3 config section in /etc/clickhouse-backup/config.yml.

The clickhouse-backup credentials in the s3 section shall have:
- read access to your original S3 bucket (the one with the original data) during the create_remote operation,
- write access to the destination S3 bucket during restore_remote on the destination cluster,
- read+write access to the backup bucket which you provide in the s3 section of /etc/clickhouse-backup/config.yml.
All three buckets can be the same, but with different paths inside.
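For example, a minimal sketch of the s3 section described above; the keys, bucket name, and paths here are placeholders, not values from this thread:

```yaml
s3:
  access_key: "<key-with-access-to-all-three-buckets>"
  secret_key: "<secret>"
  bucket: "my-backup-bucket"        # backup bucket (needs read+write)
  path: "backup"                    # prefix for backup archives
  object_disk_path: "object_disks"  # prefix where S3-disk objects are server-side copied
  region: "us-east-1"
```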
from clickhouse-backup.
Sorry, I am trying to make it clear. I have set up two ClickHouse clusters on two separate EC2 instances, with identical configurations. I created a database called test on ClickHouse cluster1 and inserted some data into it. The original data is stored in S3. My goal is to back up the data on ClickHouse cluster1 and then restore the database test on ClickHouse cluster2.
Here's an overview of the steps I've followed:
- Run clickhouse-backup create on the first EC2 instance:
  clickhouse-backup create test-backup-0303 --tables=test.events
- Copy the backup files to the second EC2 instance:
  tar -czvf "test_meta.tgz" -C /var/lib/clickhouse/backup/ test-backup-0303/
  tar -czvf "test_data.tgz" -C /var/lib/clickhouse/disks/s3/backup/ test-backup-0303/
  scp test_meta.tgz username@ec2:/root
  scp test_data.tgz username@ec2:/root
- On the second EC2 instance, extract the files to the corresponding directories and run clickhouse-backup list; it will display the backup test-backup-0303:
  tar -xzvf /root/test_meta.tgz -C /var/lib/clickhouse/backup/
  tar -xzvf /root/test_data.tgz -C /var/lib/clickhouse/disks/s3/backup/
- Run clickhouse-backup restore test-backup-0303 on the second EC2 instance.
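The tar/scp copy in the steps above works because -C changes into the parent directory before archiving, so extracting with -C recreates the backup directory at the same relative location. A self-contained sketch of that round-trip (fixed paths under /tmp stand in for the real ClickHouse directories, and the metadata.json content is hypothetical):

```shell
set -e
# Stand-ins for /var/lib/clickhouse/backup/ on the source and destination hosts
src=/tmp/ch_backup_src
dst=/tmp/ch_backup_dst
rm -rf "$src" "$dst"
mkdir -p "$src/backup/test-backup-0303" "$dst"
echo '{"backup_name":"test-backup-0303"}' > "$src/backup/test-backup-0303/metadata.json"
# Pack relative to the parent directory with -C, as in the original commands...
tar -czf /tmp/test_meta.tgz -C "$src/backup" test-backup-0303/
# ...so extraction with -C places test-backup-0303/ directly under the target dir
tar -xzf /tmp/test_meta.tgz -C "$dst"
ls "$dst/test-backup-0303"
```

The same pattern applies to the /var/lib/clickhouse/disks/s3/backup/ archive; if the archive were created from an absolute path instead, extraction would nest an extra directory level and clickhouse-backup list would not find the backup.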
My question is: does the original data stored in S3 get copied to another S3 path when restoring the test database on the second EC2 instance? If yes, how can I prevent copying the S3 data during the restore process on the second EC2 instance? Because I want the two ClickHouse clusters to share the same ClickHouse data.
The clickhouse-backup config is:

```yaml
general:
  remote_storage: none
  max_file_size: 0
  disable_progress_bar: true
  backups_to_keep_local: 0
  backups_to_keep_remote: 0
  log_level: debug
  allow_empty_backups: false
  download_concurrency: 8
  upload_concurrency: 8
  use_resumable_state: true
  restore_schema_on_cluster: ""
  upload_by_part: true
  download_by_part: true
  restore_database_mapping: {}
  retries_on_failure: 3
  retries_pause: 30s
  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
  retriesduration: 30s
  watchduration: 1h0m0s
  fullduration: 24h0m0s
clickhouse:
  username: default
  password: "123456"
  host: 127.0.0.1
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - default.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  backup_mutations: true
  restore_as_attach: false
  check_parts_columns: true
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  config_dir: /etc/clickhouse-server/
  restart_command: systemctl restart clickhouse-server
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  debug: false
```
from clickhouse-backup.
Does your second EC2 instance have the same S3 disk credentials and path in the <storage_configuration> section of clickhouse-server?

If I understand your setup properly, /var/lib/clickhouse/disks/s3/backup/test-backup-0303/ doesn't contain DATA; it actually contains metadata files that reference keys in the S3 bucket which the first EC2 instance uses for its s3 disk.

Could you share SELECT * FROM system.disks and SELECT * FROM system.storage_policies from the first and second EC2 instances?
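For comparing the two instances, narrowing the SELECTs above to a few columns makes the diff easier to read (a sketch; run on both instances and compare the output):

```sql
-- Compare disk definitions between the two instances
SELECT name, path, type FROM system.disks;
-- Compare which disks each storage policy uses
SELECT policy_name, volume_name, disks FROM system.storage_policies;
```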
from clickhouse-backup.
Yes, you are right. The second EC2 instance doesn't contain the data, only the metadata. Now it works well. Thanks.
from clickhouse-backup.