Comments (4)
Yes, clickhouse-backup executes server-side s3:CopyObject / s3:CreateMultipartUpload API calls during backup when an S3 disk exists.

Use clickhouse-backup 2.4.33 and add object_disk_path to the s3 config section in /etc/clickhouse-backup/config.yml.

The clickhouse-backup credentials in the s3 section shall have:
- read access to your original S3 bucket (the one with the original data) during the create_remote operation,
- write access to the destination S3 bucket during restore_remote on the destination cluster,
- read+write access to the backup bucket which you provide in the s3 section of /etc/clickhouse-backup/config.yml.
All three buckets can be the same, but with different paths inside.
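For example, a minimal sketch of the s3 section described above; the keys, bucket name, and paths here are placeholders, not values from this thread:

```yaml
s3:
  access_key: "<key-with-access-to-all-three-buckets>"
  secret_key: "<secret>"
  bucket: "my-backup-bucket"        # backup bucket (needs read+write)
  path: "backup"                    # prefix for backup archives
  object_disk_path: "object_disks"  # prefix where S3-disk objects are server-side copied
  region: "us-east-1"
```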
from clickhouse-backup.
Sorry, I am trying to make it clear. I have set up two ClickHouse clusters on two separate EC2 instances, with identical configurations. I created a database called test on ClickHouse cluster1 and inserted some data into it. The original data is stored in S3. My goal is to back up the data on ClickHouse cluster1 and then restore the database test on ClickHouse cluster2.
Here's an overview of the steps I've followed:
- Run clickhouse-backup create on the first EC2 instance:
  clickhouse-backup create test-backup-0303 --tables=test.events
- Copy the backup files to the second EC2 instance:
  tar -czvf "test_meta.tgz" -C /var/lib/clickhouse/backup/ test-backup-0303/
  tar -czvf "test_data.tgz" -C /var/lib/clickhouse/disks/s3/backup/ test-backup-0303/
  scp test_meta.tgz username@ec2:/root
  scp test_data.tgz username@ec2:/root
- On the second EC2 instance, extract the files to the corresponding directories and run clickhouse-backup list; it will display the backup test-backup-0303:
  tar -xzvf /root/test_meta.tgz -C /var/lib/clickhouse/backup/
  tar -xzvf /root/test_data.tgz -C /var/lib/clickhouse/disks/s3/backup/
- Run clickhouse-backup restore test-backup-0303 on the second EC2 instance.
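The tar/scp copy in the steps above works because -C changes into the parent directory before archiving, so extracting with -C recreates the backup directory at the same relative location. A self-contained sketch of that round-trip (fixed paths under /tmp stand in for the real ClickHouse directories, and the metadata.json content is hypothetical):

```shell
set -e
# Stand-ins for /var/lib/clickhouse/backup/ on the source and destination hosts
src=/tmp/ch_backup_src
dst=/tmp/ch_backup_dst
rm -rf "$src" "$dst"
mkdir -p "$src/backup/test-backup-0303" "$dst"
echo '{"backup_name":"test-backup-0303"}' > "$src/backup/test-backup-0303/metadata.json"
# Pack relative to the parent directory with -C, as in the original commands...
tar -czf /tmp/test_meta.tgz -C "$src/backup" test-backup-0303/
# ...so extraction with -C places test-backup-0303/ directly under the target dir
tar -xzf /tmp/test_meta.tgz -C "$dst"
ls "$dst/test-backup-0303"
```

The same pattern applies to the /var/lib/clickhouse/disks/s3/backup/ archive; if the archive were created from an absolute path instead, extraction would nest an extra directory level and clickhouse-backup list would not find the backup.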
My question is: does the original data stored in S3 get copied to another S3 path when restoring the test database on the second EC2 instance? If yes, how can I prevent copying the S3 data during the restore process on the second EC2 instance? Because I want the two ClickHouse clusters to share the same ClickHouse data.
The clickhouse-backup config is:

```yaml
general:
  remote_storage: none
  max_file_size: 0
  disable_progress_bar: true
  backups_to_keep_local: 0
  backups_to_keep_remote: 0
  log_level: debug
  allow_empty_backups: false
  download_concurrency: 8
  upload_concurrency: 8
  use_resumable_state: true
  restore_schema_on_cluster: ""
  upload_by_part: true
  download_by_part: true
  restore_database_mapping: {}
  retries_on_failure: 3
  retries_pause: 30s
  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
  retriesduration: 30s
  watchduration: 1h0m0s
  fullduration: 24h0m0s
clickhouse:
  username: default
  password: "123456"
  host: 127.0.0.1
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - default.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  backup_mutations: true
  restore_as_attach: false
  check_parts_columns: true
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  config_dir: /etc/clickhouse-server/
  restart_command: systemctl restart clickhouse-server
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  debug: false
```
from clickhouse-backup.
Does your second EC2 instance have the same S3 disk credentials and path in the <storage_configuration> section of clickhouse-server?

If I understand your setup properly, /var/lib/clickhouse/disks/s3/backup/test-backup-0303/ doesn't contain DATA; it actually contains metadata files that reference keys in the S3 bucket which the first EC2 instance uses for its s3 disk.

Could you share SELECT * FROM system.disks and SELECT * FROM system.storage_policies from the first and second EC2 instances?
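For comparing the two instances, narrowing the SELECTs above to a few columns makes the diff easier to read (a sketch; run on both instances and compare the output):

```sql
-- Compare disk definitions between the two instances
SELECT name, path, type FROM system.disks;
-- Compare which disks each storage policy uses
SELECT policy_name, volume_name, disks FROM system.storage_policies;
```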
from clickhouse-backup.
Yes, you are right. The second EC2 instance doesn't contain the data, only the metadata. Now it works well. Thanks.
from clickhouse-backup.