Giter Club home page Giter Club logo

database-migrator's Introduction

Prerequisites

IMPORTANT! Requires Java 8 SDK

Please make sure you have JDK 8 installed:

ubuntu@pc:~$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

Description

This tool used for migrating ThingsBoard setup from PostgreSQL only into Hybrid mode.

You can use this tool in two scenarios.

Scenario #1

This is a scenario when rule engine requires historical data to work properly (e.g. you have rule nodes that are fetching historical data during processing new messages from devices). In this case system downtime requires for the migration.

Scenario #2

This is a scenario when rule engine DOES NOT require historical data to work properly. In this case system downtime not required (in case you are running cluster you can do this migration with zero downtime), or you'll need to restart ThingsBoard service in case of monolith setup. You'll need to upgrade configuration of ThingsBoard to use Cassandra for timeseries instead PostgreSQL - old historical data will be added to Cassandra from PostgreSQL later. In this scenario historical data will not be available after the reconfiguration of ThingsBoard (for instance on dashboards), but you'll see after migration.

Performance

Performance of this tool depends on disk type and instance type (mostly on CPU resources). Here are few benchmarks in general:

  1. Creating Dump of the postgres ts_kv table (if it's size is 100 GB) ~ 1-2 hours
  2. Generation SSTables from dump 100 GB ~ 3-4 hours
  3. 100 GB dump file will be converted into SSTable with size about ~ 20-30 GB

NOTE Recommended instance type in AWS is m5.xlarge or larger. Avoid using instance with burst CPUs.

Tool build instructions:

To build the project execute:

mvn clean compile assembly:single

It will generate single jar file with all required dependencies inside target dir -> database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar.

Prepare required files and run tool

Scenario #1

Cassandra setup

Install Cassandra - in this scenario you can install only single instance.

Using cqlsh create thingsboard keyspace and required tables.

NOTE You can use schema-ts.cql and schema-ts-latest.cql files, that are located in main/resources folder of this project.

Dump data from the Postgres DB to files

Do not use compression if possible because tool can only work with uncompressed files

  • Stop ThingsBoard instance

  • Dump related tables (entities) that used to validate telemetry:

    pg_dump -h localhost -U postgres -d thingsboard --exclude-table=admin_settings \
    --exclude-table=attribute_kv --exclude-table=audit_log --exclude-table=component_discriptor \
    --exclude-table=device_credentials --exclude-table=event --exclude-table=oauth2_client_registration \
    --exclude-table=oauth2_client_registration_info --exclude-table=oauth2_client_registration_template \
    --exclude-table=relation --exclude-table=rule_node_state --exclude-table=tb_schema_settings \
    --exclude-table=user_credentials --exclude-table=ts_kv* --exclude-table=_timescaledb_internal.* > /home/user/dump/related_entities.dmp`
  • Dump ts_kv_dictionary:

    pg_dump -h localhost -U postgres -d thingsboard --table=ts_kv_dictionary > /home/user/dump/ts_kv_dictionary.dmp
  • Dump ts_kv and all partitions:

    pg_dump -h localhost -U postgres -d thingsboard --load-via-partition-root --data-only \
    --table=ts_kv* --table=_timescaledb_internal.* > /home/user/dump/ts_kv_all.dmp  
    • If only part of the data needs to be migrated, it will be necessary to create a custom table in Postgres:

      create table ts_kv_custom as (
        select ts_kv.* 
        from ts_kv 
        join ts_kv_dictionary on ts_kv.key=ts_kv_dictionary.key_id
        join device on device.id=ts_kv.entity_id
        join device_profile on device_profile.id=device.device_profile_id
        where ts_kv_dictionary.key in ('pulseCounter', 'temperature', 'flow')
          and device_profile.name in ('default')
      );
      

      Such tables can be created several times using different queries. The table name must start with ts_kv_custom. Like ts_kv_custom1, ts_kv_custom_tenant_x

      To create a dump, use the command:

      pg_dump -h localhost -U postgres -d thingsboard --load-via-partition-root --data-only \
      --table=ts_kv_custom* > /home/user/dump/ts_kv_all.dmp  

[Optional] Move table dumps to the instance where cassandra will be hosted

Prepare directory structure for SSTables

Tool use 3 different directories for saving SSTables - ts_kv_cf, ts_kv_latest_cf, ts_kv_partitions_cf.

Create 3 empty directories. For example:

/home/user/migration/ts
/home/user/migration/ts_latest
/home/user/migration/ts_partition

Run tool

IMPORTANT! If you run this tool on the remote instance - don't forget to execute this command in screen to avoid unexpected termination

java -jar ./target/database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar \
        -telemetryFrom /home/user/dump/ts_kv_all.dmp \
        -relatedEntities /home/user/dump/related_entities.dmp \
        -dictionary /home/user/dump/ts_kv_dictionary.dmp \
        -latestOut /home/user/migration/ts_latest \
        -tsOut /home/user/migration/ts \
        -partitionsOut /home/user/migration/ts_partition \ 
        -castEnable false \
        -partitioning MONTHS \
        -linesToSkip 0 > /tmp/migration.log &

If you want to migrate just ts_kv without ts_kv_latest, execute next:

java -jar ./target/database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar \
        -telemetryFrom /home/user/dump/ts_kv_all.dmp \
        -relatedEntities /home/user/dump/related_entities.dmp \
        -dictionary /home/user/dump/ts_kv_dictionary.dmp \
        -tsOut /home/user/migration/ts \
        -partitionsOut /home/user/migration/ts_partition \ 
        -castEnable false \
        -partitioning MONTHS \
        -linesToSkip 0 > /tmp/migration.log &

Use your paths for program arguments

Tool execution time depends on DB size, CPU resources and disk throughput.

Loading SSTables into Cassandra

Note that this part works only for single node Cassandra cluster

  • Stop Cassandra
  • Look at /var/lib/cassandra/data/thingsboard and check for names of data folders
  • Copy generated SSTable files into cassandra data dir using next command:
sudo find /home/user/migration/ts -name '*.*' -exec mv {} /var/lib/cassandra/data/thingsboard/ts_kv_cf-0e9aaf00ee5511e9a5fa7d6f489ffd13/ \;
sudo find /home/user/migration/ts_latest -name '*.*' -exec mv {} /var/lib/cassandra/data/thingsboard/ts_kv_latest_cf-161449d0ee5511e9a5fa7d6f489ffd13/ \;
sudo find /home/user/migration/ts_partition -name '*.*' -exec mv {} /var/lib/cassandra/data/thingsboard/ts_kv_partitions_cf-12e8fa80ee5511e9a5fa7d6f489ffd13/ \;

Pay attention! Data folders have similar name ts_kv_cf-0e9aaf00ee5511e9a5fa7d6f489ffd13, but you have to use own

  • Start Cassandra service and trigger compaction

    • Trigger compactions: nodetool compact thingsboard
    • Check compaction status: nodetool compactionstats
  • Switch ThingsBoard into Hybrid Mode

Modify ThingsBoard properties file thingsboard.conf and add next lines:

export DATABASE_TS_TYPE=cassandra
export TS_KV_PARTITIONING=MONTHS
export CASSANDRA_URL=YOUR_CASSANDRA_URL
export CASSANDRA_CLUSTER_NAME=YOUR_CASSANDRA_CLUSTER_NAME
export CASSANDRA_USE_CREDENTIALS=true # false if credentials not required 
export CASSANDRA_USERNAME=YOUR_CASSANDRA_USERNAME
export CASSANDRA_PASSWORD=YOUR_CASSANDRA_PASSWORD

Final steps

Start ThingsBoard instance and verify migration.

Scenario #2

Cassandra setup

Install Cassandra - you can install single cluster, cluster of N nodes. Cluster can be in docker or k8s or bare metal.

Using cqlsh create thingsboard keyspace and required tables.

NOTE You can use schema-ts.cql and schema-ts-latest.cql files, that are located in main/resources folder of this project.

Switch ThingsBoard into Hybrid Mode

Modify ThingsBoard properties file thingsboard.conf and add next lines:

export DATABASE_TS_TYPE=cassandra
export TS_KV_PARTITIONING=MONTHS
export CASSANDRA_URL=YOUR_CASSANDRA_URL
export CASSANDRA_CLUSTER_NAME=YOUR_CASSANDRA_CLUSTER_NAME
export CASSANDRA_USE_CREDENTIALS=true # false if credentials not required 
export CASSANDRA_USERNAME=YOUR_CASSANDRA_USERNAME
export CASSANDRA_PASSWORD=YOUR_CASSANDRA_PASSWORD

Re-start ThingsBoard and verify that new timeseries data written into Cassandra.

Dump data from the Postgres DB to files

Do not use compression if possible because tool can only work with uncompressed files

  • Dump related tables (entities) that used to validate telemetry:

    pg_dump -h localhost -U postgres -d thingsboard --exclude-table=admin_settings \
    --exclude-table=attribute_kv --exclude-table=audit_log --exclude-table=component_discriptor \
    --exclude-table=device_credentials --exclude-table=event --exclude-table=oauth2_client_registration \
    --exclude-table=oauth2_client_registration_info --exclude-table=oauth2_client_registration_template \
    --exclude-table=relation --exclude-table=rule_node_state --exclude-table=tb_schema_settings \
    --exclude-table=user_credentials --exclude-table=ts_kv* --exclude-table=_timescaledb_internal.* > /home/user/dump/related_entities.dmp`
  • Dump ts_kv_dictionary:

    pg_dump -h localhost -U postgres -d thingsboard --table=ts_kv_dictionary > /home/user/dump/ts_kv_dictionary.dmp
  • Dump ts_kv and all partitions:

    pg_dump -h localhost -U postgres -d thingsboard --load-via-partition-root --data-only \
    --table=ts_kv* --table=_timescaledb_internal.* > /home/user/dump/ts_kv_all.dmp  
    • If only part of the data needs to be migrated, it will be necessary to create a custom table in Postgres:

      create table ts_kv_custom as (
        select ts_kv.* 
        from ts_kv 
        join ts_kv_dictionary on ts_kv.key=ts_kv_dictionary.key_id
        join device on device.id=ts_kv.entity_id
        join device_profile on device_profile.id=device.device_profile_id
        where ts_kv_dictionary.key in ('pulseCounter', 'temperature', 'flow')
          and device_profile.name in ('default')
      );
      

      Such tables can be created several times using different queries. The table name must start with ts_kv_custom. Like ts_kv_custom1, ts_kv_custom_tenant_x

      To create a dump, use the command:

      pg_dump -h localhost -U postgres -d thingsboard --load-via-partition-root --data-only \
      --table=ts_kv_custom* > /home/user/dump/ts_kv_all.dmp  

[Optional] Move table dumps to the instance where cassandra will be hosted

Prepare directory structure for SSTables

Tool use 3 different directories for saving SSTables - ts_kv_cf, ts_kv_latest_cf, ts_kv_partitions_cf

Create 3 empty directories.

IMPORTANT directories MUST follow pattern .../KEYSPACE_NAME/COLUMN_FAMILY_NAME/

For example:

/home/user/migration/thingsboard/ts_kv_cf/
/home/user/migration/thingsboard/ts_kv_partitions_cf/
/home/user/migration/thingsboard/ts_kv_latest_cf/

Run tool

IMPORTANT! If you run this tool on the remote instance - don't forget to execute this command in screen to avoid unexpected termination

java -jar ./target/database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar \
        -telemetryFrom /home/user/dump/ts_kv_all.dmp \
        -relatedEntities /home/user/dump/related_entities.dmp \ 
        -dictionary /home/user/dump/ts_kv_dictionary.dmp \
        -latestOut /home/user/migration/thingsboard/ts_kv_latest_cf \
        -tsOut /home/user/migration/thingsboard/ts_kv_cf \
        -partitionsOut /home/user/migration/thingsboard/ts_kv_partitions_cf \
        -castEnable false \
        -partitioning MONTHS \
        -linesToSkip 0 > /tmp/migration.log &

If you want to migrate just ts_kv without ts_kv_latest, execute next:

java -jar ./target/database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar \
        -telemetryFrom /home/user/dump/ts_kv_all.dmp \
        -relatedEntities /home/user/dump/related_entities.dmp \ 
        -dictionary /home/user/dump/ts_kv_dictionary.dmp \
        -tsOut /home/user/migration/thingsboard/ts_kv_cf \
        -partitionsOut /home/user/migration/thingsboard/ts_kv_partitions_cf \
        -castEnable false \
        -partitioning MONTHS \
        -linesToSkip 0 > /tmp/migration.log &

Use your paths for program arguments

Tool execution time depends on DB size, CPU resources and disk throughput.

Loading SSTables into Cassandra

Using sstableloader start loading data into Cassandra:

sstableloader --verbose --nodes CASSANDRA_NODES --username cassandra --password CASSANDRA_PASSWORD /home/user/migration/thingsboard/ts_kv_partitions_cf/

sstableloader --verbose --nodes CASSANDRA_NODES --username cassandra --password CASSANDRA_PASSWORD /home/user/migration/thingsboard/ts_kv_cf/

Final steps

Verify that historical data available in ThingsBoard.

Troubleshooting

Continue migration in case of failure on particular migration line

IMPORTANT: works only in case of Scenario #2

Tool is able to continue creation of SSTables from the particular line. Let's image that tool has stopped at particular line - XXXXXXX:

2021-11-30 13:55:22,648 [main] INFO  o.t.c.t.m.writer.AbstractTbWriter - Lines processed 408000000, castOk 0, castErr 0, skippedLines 68956935
2021-11-30 13:55:25,625 [main] INFO  o.t.c.t.m.writer.AbstractTbWriter - Lines migrated 738000000, castOk 0, castErr 0, skippedLines 68966040
2021-11-30 13:55:32,037 [main] INFO  o.t.c.t.m.writer.AbstractTbWriter - Lines processed 409000000, castOk 0, castErr 0, skippedLines 68975104
2021-11-30 13:55:34,974 [main] INFO  o.t.c.t.m.writer.AbstractTbWriter - Lines migrated 739000000, castOk 0, castErr 0, skippedLines 68984191
2021-11-30 13:55:37,762 [main] INFO  o.t.c.t.m.writer.AbstractTbWriter - Lines processed 410000000, castOk 0, castErr 0, skippedLines 6899318

You'll need to find the latest entry of Lines migrated XXXXXX. You can start migration tool from XXXXXX line.

To use this feature please do next steps:

  • Migrate already created SSTable to the Cassandra by following steps in the guide below for Scenario #2 You can copy already created SSTables into some other folder, and later migrate with sstableloader folder by folder
  • Create new 3 different directories for saving SSTables, or reuse the ones used in the previous step
/home/user/migration/thingsboard/ts_kv_cf/
/home/user/migration/thingsboard/ts_kv_partitions_cf/
/home/user/migration/thingsboard/ts_kv_latest_cf/
  • Start tool according to instructions above, but with an additional parameter linesToSkip:
java -jar ./target/database-migrator-1.0-SNAPSHOT-jar-with-dependencies.jar \
        -telemetryFrom /home/user/dump/ts_kv_all.dmp \
        -relatedEntities /home/user/dump/related_entities.dmp \ 
        -dictionary /home/user/dump/ts_kv_dictionary.dmp \
        -latestOut /home/user/migration/thingsboard/ts_kv_latest_cf \
        -tsOut /home/user/migration/thingsboard/ts_kv_cf \
        -partitionsOut /home/user/migration/thingsboard/ts_kv_partitions_cf \
        -castEnable false \
        -partitioning MONTHS \
        -linesToSkip XXXXXX > /tmp/migration.log &

where XXXXXXX is the number of the line, that script should continue.

  • Continue migration according to the steps above in Scenario #2

database-migrator's People

Contributors

pon0marev avatar volodymyr-babak avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

database-migrator's Issues

Corrupted ts_kv_partitions_cf after migration

I try to migrate to cassandra with scenario 2. All steps working fine. But after load the SSTables, cassandra log says, that /var/lib/cassandra/data/thingsboard/ts_kv_partitions_cf-426feb6087b711eebeceddffc248ce8c/md-3-big-Data.db is corrupted.

So, no historical data on Thingsboard. Any hints?
cassandra_system.log

About changing postgres DB to Cassandra Hybrid

I have previously posted a question about changing postgres DB to Cassandra Hybrid. At that time, it was said that the work could be carried out through the GitHub - thingsboard/database-migrator link, and it proceeded in the way of Scenario #1. After several trials and errors, the file was successfully filed with postgresql.dmp -> migration Folder, moved through the command to the /var/lib/passandra/data/thingsboard folder, and then modified the thingsboard.conf file to boot the thingsboard. However, it was not possible to check the time series data of the terminal on the thingsboard.(If you change it back to the single Postgresql setting, you can see it well) I will attach a picture of the setting. Please answer me.

I used thingsboard version 3.3.4 PE.

As a result of checking the log, the following log was found. You are about to send the log source file together. Can you tell me how to respond?

ERROR o.t.s.d.t.CassandraBaseTimeseriesDao - [DEVICE][01cf3f30-e069-11ec-b067-a1db09533645] Failed to fetch partitions for interval 1664582400000-1664582400000

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.