scality / cloud-migration-tool
A Cloud storage migration command-line tool that uses the Scality Droplet library.
License: BSD 3-Clause "New" or "Revised" License
Currently, the ETA is computed from the bandwidth measured over the last few seconds, to estimate how much data can still be transferred. It does not take into account:
We should find an efficient ETA algorithm, make it testable, run scenarios against it, and replace the current, unreliable one.
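One candidate replacement, sketched below as a testable unit, is an exponentially weighted moving average of throughput, which smooths out short bursts better than a raw last-few-seconds window. All names here (`EwmaEta`, `alpha`) are illustrative, not cloudmig APIs; the sketch assumes the transfer loop can report per-interval byte counts.

```python
import time

class EwmaEta:
    """Hypothetical ETA estimator: smooths throughput with an
    exponentially weighted moving average instead of relying only on
    the last few seconds of bandwidth."""

    def __init__(self, total_bytes, alpha=0.3):
        self.total = total_bytes
        self.done = 0
        self.alpha = alpha   # smoothing factor, 0 < alpha <= 1
        self.rate = None     # smoothed throughput, bytes/second
        self.last_t = None

    def update(self, transferred_bytes, now=None):
        """Record bytes transferred since the previous update."""
        now = time.monotonic() if now is None else now
        if self.last_t is not None and now > self.last_t:
            inst = transferred_bytes / (now - self.last_t)
            self.rate = inst if self.rate is None else (
                self.alpha * inst + (1 - self.alpha) * self.rate)
        self.done += transferred_bytes
        self.last_t = now

    def eta_seconds(self):
        """Estimated seconds remaining, or None before any sample."""
        if not self.rate:
            return None
        return (self.total - self.done) / self.rate
```

Passing an explicit `now` makes the estimator deterministic, which is what makes it easy to run recorded scenarios against it.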
Do not log anything by default in background mode.
An option should exist to tell the tool to run in background and not block the user's terminal during the transfer.
An option should be added to activate parts of the Droplet library's tracing options at will.
Something like OpenBSD's ktrace -t option would be nice (that is: -t ihce, with each letter enabling a specific tracing option).
When in background-mode, the tool should by default output everything to a file.
An option to choose the output file would be convenient.
Currently, cloudmig-view displays the process ID, which is of little interest to any user.
It should display the URIs of the source and the destination.
To do that, the main tool could write a simple one-line file in the same directory as the socket file, to be read by the viewer.
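A minimal sketch of that handshake, with hypothetical helper names (the `.info` suffix and the "source -> destination" line format are assumptions, not an existing cloudmig convention):

```python
import os

def write_migration_info(socket_path, source_uri, dest_uri):
    """Main-tool side: drop a one-line info file next to the socket
    so the viewer can display the migration's URIs."""
    info_path = socket_path + ".info"
    with open(info_path, "w") as f:
        f.write("%s -> %s\n" % (source_uri, dest_uri))
    return info_path

def read_migration_info(socket_path):
    """Viewer side: read the URIs back, tolerating a missing file
    (e.g. a migration started by an older tool version)."""
    try:
        with open(socket_path + ".info") as f:
            return f.readline().strip()
    except FileNotFoundError:
        return None
```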
Currently, libmenu's path is hard-coded in the viewer's CMakeLists instead of being retrieved by a FindPackage call.
We need to write one to ensure we can find it.
Currently, the tool is too rigid in the way it supports the multiple backends from droplet.
For instance, the POSIX Droplet backend cannot be supported because it does not provide any bucket functionality, a feature on which cloudmig's internals rely heavily.
The solution may lie in supporting migration from a directory to a bucket, from a bucket to a directory, and between directories within buckets.
ACLs are still not managed within the application:
Everything is created with the canned ACL PRIVATE.
By default, the application should at least reproduce the source's canned ACL, or fall back to PRIVATE on failure.
Many integer variables are declared as uint64_t and printed with the %llu format.
On 64-bit platforms this should be %lu, so a portable solution is needed (such as the PRIu64 macro from <inttypes.h>) to avoid this problem and make the viewer compile.
The tool should print a final line at the end of the migration reporting the statistics of the whole migration:
Currently, the tool resumes a migration by default.
To avoid the risk of starting two migrations on the same source account at the same time, it is preferable to require an explicit option to force resuming the migration.
By default, it should do nothing if a migration is already ongoing.
Cloudmig will fail if run twice simultaneously in two different shells; only one instance will succeed.
If I add a slight delay (even 0.1 seconds) between the cloudmig runs, everything works fine. The problem occurs specifically when they run simultaneously.
I demonstrated this with the following script and configurations:
https://gist.github.com/mlaurie/8d8011aaeaad03a41e6e
If you don't get the error immediately, try running several times. It will error pretty commonly.
There is one primary error that cloudmig gives:
Unexpected exception: Command 'cloudmig -c /tmp/testConfig1.json' returned non-zero exit status 1
cloudmig:6611:[INFO][Loading Profiles]: Starting...
cloudmig:6611:[INFO][Loading Profiles]: Profiles loaded with success.
cloudmig:6611:[INFO][Loading Status] Starting status loading...
cloudmig:6611:[ERR][Loading Status/Exists] Could not list open status store path(opendir): DPL_ENOENT
cloudmig:6611:[INFO][Creating Status Store] Status Store not found. Creating...
mkdir: File exists
cloudmig:6611:[ERR][Creating Status Store] Could not create store(directory): DPL_EEXIST
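The DPL_ENOENT-then-DPL_EEXIST sequence suggests a check-then-create race: both instances see the status store missing, both try to create it, and the loser aborts on EEXIST. A minimal sketch of the usual fix, treating EEXIST as success (the function name is illustrative, not cloudmig code):

```python
import errno
import os

def ensure_status_store(path):
    """Create the status store directory, tolerating concurrent
    creation by another cloudmig instance: an EEXIST from mkdir
    means the other instance won the race, which is fine."""
    try:
        os.mkdir(path)
        return "created"
    except OSError as e:
        if e.errno == errno.EEXIST:
            return "already-exists"
        raise
```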
Currently, the names in the status files are stored with an exact length.
If that length is a multiple of 4, there is no terminating NUL character.
We should improve the binary status file formats to make manipulating those files easier. (That would probably help clean up the code, too.)
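A robust reader must trust the stored length rather than scan for a NUL. The sketch below illustrates the idea with a length-prefixed, 4-byte-aligned layout; the exact on-disk format (big-endian 4-byte length, NUL padding) is an assumption for illustration, not the real status file format:

```python
import struct

def pack_name(name):
    """Encode a name as a 4-byte big-endian length followed by the
    bytes padded with NULs to a multiple of 4; a name whose length is
    0 mod 4 carries no NUL terminator at all."""
    data = name.encode("utf-8")
    padded = data + b"\0" * (-len(data) % 4)
    return struct.pack(">I", len(data)) + padded

def unpack_name(buf, offset=0):
    """Read a name back using the stored length, never relying on a
    terminating NUL; returns (name, offset_of_next_entry)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    name = buf[start:start + length].decode("utf-8")
    padded_len = length + (-length % 4)
    return name, start + padded_len
```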
The application is to be used (mainly, at least) by system administrators.
Instead of giving them a crude log dumped on standard output, we should log everything that happens through syslog.
Currently, there are 3 verbosity levels:
DEBUG (with verbose option, includes FULL droplet library tracing)
INFO (default)
WARN (quiet mode)
It would be nice to choose whether to activate the droplet library tracing or not, with a more flexible option.
Currently, the viewer can see any ongoing migration, whether it was started by the same user or not.
Setting permissions on the directory containing the socket file should be enough to prevent this behavior and limit the viewer to what it should access.
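A sketch of that restriction: create the socket directory owner-only (mode 0700), so other users' viewers cannot traverse into it. The helper name is illustrative:

```python
import os
import stat

def make_private_socket_dir(path):
    """Create (or reuse) the directory holding the viewer socket and
    restrict it to its owner, so only that user's viewer can attach."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)
    # Return the effective permission bits for verification.
    return stat.S_IMODE(os.stat(path).st_mode)
```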
When the destination isn't directly accessible, cloudmig reports a successful migration even though no files were transferred.
The transfer fails properly if the source is not accessible; the problem only occurs when the destination is not accessible.
Example1:
When cloudmig receives HTTP-403 during posix->RS2 migration, it results in:
cloudmig:3476:[INFO][Migrating] File 'newfile.txt' transfer failed !
cloudmig:3476:[ERR][Migrating] : Could not migrate file newfile.txt
cloudmig:3469:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:3469:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 1024, "done_bytes": 0 }
cloudmig:3469:[INFO]Migration finished with success !
Example2:
When cloudmig receives HTTP-307 during S3->S3 migration, it results in:
cloudmig:2545:[INFO][Migrating] File '100Kfile.txt' transfer failed !
cloudmig:2545:[ERR][Migrating] : Could not migrate file 100Kfile.txt
cloudmig:2538:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:2538:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 102400, "done_bytes": 0 }
cloudmig:2538:[INFO]Migration finished with success !
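In both examples the digest already shows the failure (done_objects and done_bytes stay at 0), so the final status can be derived from those counters instead of being printed unconditionally. A sketch of that check, using the same field names as the digest JSON in the logs:

```python
def migration_succeeded(digest):
    """Decide the final migration status from the status digest
    counters rather than unconditionally reporting success.
    `digest` mirrors the JSON cloudmig logs when uploading it."""
    return (digest["done_objects"] == digest["objects"]
            and digest["done_bytes"] == digest["bytes"])
```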
When using the "delete_source" configuration parameter, the status files are also deleted when the migration completes.
While the migration is running, the status files are created and populated properly. But when the migration completes, those status files are removed as well, even though they live in a different location (posix:/tmp/statusDir1) from the migration-source objects (s3:srcbucket:/).
Expected:
/tmp/statusDir1/cloudmig.s3.amazonaws.com.to.s3.amazonaws.com/
mlz1%3a%2f
mlz1%3a%2f.json
Actual:
/tmp/statusDir1/
config.json
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
A CPack configuration should be added to the CMakeLists so that both installers can be provided.
We need a dummy cloudmig to be able to load and run loadable scenarios in order to test cloudmig-view easily, and quickly. This should be part of an effort to improve testability and stability of the tools.
When migrating data from one backend to another whose feature support differs, we would like to be able to tell cloudmig not to try to migrate some specific kinds of files.
For instance, the S3 backend does not support symlinks, so we would like to skip symlinks when migrating to that backend, suppressing the errors their migration would produce.
We should provide an option allowing an attempt to push the raw ACL to the destination.
On failure, fall back to pushing the ACL into a separate file with the extension .acl.
Thus, the file Test.txt would either have its ACL set, or have a companion file Test.txt.acl containing the ACL's XML.
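The proposed behavior can be sketched as follows; `set_acl` and `write_file` are placeholders for backend operations cloudmig would supply, not droplet APIs, and the function name is hypothetical:

```python
def push_acl(path, acl_xml, set_acl, write_file):
    """Try to set the ACL directly on the destination object; on
    failure, write the ACL's XML into a companion '<name>.acl' file
    instead, as described in the ticket."""
    try:
        set_acl(path, acl_xml)
        return path                    # ACL applied in place
    except Exception:
        sidecar = path + ".acl"
        write_file(sidecar, acl_xml)   # fallback: joint .acl file
        return sidecar
```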
An option should let the user choose which location constraint to apply to the destination buckets to be created.
When a migration is done, it is possible to delete the source files, but nothing is shown to the viewer for the user to visualize how the deletion is progressing.
Currently, whenever a destination bucket name is given, the tool does not check whether the bucket already exists before creating it.
This results in an error while creating the destination bucket in cases where it could simply reuse the existing bucket.
We should provide an option to push the raw ACL's XML into a separate file.
For instance, the file foo.txt would be transferred, and the new file foo.txt.acl would contain the file's ACL XML.
This would be the default fallback when the option described in ticket #8 fails.
There is an error in the computation of a status directory's path and of its temporary status files' paths: the directory created for a specific bucket's status and the directory used in the path computed for the temporary status file end up with different names.
All of this happens in status_bucket.c.
Change the semicolon into a comma for the bucket associations.
Most of the code is planned to be multithreaded eventually, so let's just do that.
An option allowing a configuration file to be given instead of command-line arguments,
written as --config-file=config_file.
It would contain information about:
The done/size offset comparison yields 0 < 0 for directories.
Thus, they are not transferred.
It may be practical to have the possibility to retrieve the output report directly from the destination account/bucket.
Currently, by using libdroplet's vfile and vdir APIs, a limitation arises:
if a file whose name contains a directory-like path is to be transferred, the vfile API checks whether a file with the directory's name exists.
e.g.
The file 'bucket:/directory/foo.txt' requires a file named 'bucket:/directory/' to exist in order to be transferred.
Sadly, that cannot be expected of every provider/account, and it must be possible to transfer a file without those limitations.
Currently, we only put the files inside the destination buckets, without regard for what was present there before.
We should accept an argument telling the tool whether or not to delete the destination's existing content first.
The tool does not properly handle a ^C interruption, possibly corrupting some files (statuses mostly).
We should handle it properly and stop cleanly in that case.
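The standard pattern is to have the SIGINT handler only set a flag, and let the transfer loop check it between files so status files are finalized before exit. A sketch (class and loop names are illustrative):

```python
import signal

class CleanStop:
    """Graceful ^C handling: the signal handler flips a flag instead
    of killing the process mid-write; the main loop decides when it is
    safe to actually stop."""

    def __init__(self):
        self.stopping = False

    def install(self):
        signal.signal(signal.SIGINT, self._handler)

    def _handler(self, signum, frame):
        self.stopping = True   # defer the shutdown to the main loop

def run(tasks, stopper):
    """Process tasks one by one, stopping cleanly between files so
    status files are left in a consistent state."""
    done = []
    for task in tasks:
        if stopper.stopping:
            break
        done.append(task)
    return done
```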
Currently, the tool only migrates an account's content to another.
We need an option to migrate only between two buckets.
For now, at most, the application uses the DPL_VFILE_MD5 flag, which allows checking a file's integrity but doesn't prevent anyone from reading it in transit.
An option to select on-the-fly encryption would be a good idea.
Cannot migrate a single sourceBucket into 2 different destination buckets.
Below is a sample configuration that might be used for this scenario (buckets section is most important here):
{
"cloudmig": {
"buckets": {"srcbucket1:/": "dstbucket1:/", "srcbucket1:/": "dstbucket2:/"},
"create-directories": true
}
}
Expected results:
Actual results:
A workaround is to migrate the same source twice, using two different bucket configurations.
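A likely root cause: the "buckets" section is a JSON object, and JSON objects cannot carry two identical keys, so a parser will silently keep only the last "srcbucket1:/" mapping; the first is lost before cloudmig ever sees it. This is easy to demonstrate (Python's json module shown; libdroplet's JSON parser presumably behaves similarly). Supporting one source with several destinations would therefore need a schema change, e.g. a list of source/destination pairs.

```python
import json

# The "buckets" object from the failing configuration: the key
# "srcbucket1:/" appears twice, which JSON does not really allow, and
# most parsers keep only the last value.
raw = '{"srcbucket1:/": "dstbucket1:/", "srcbucket1:/": "dstbucket2:/"}'
buckets = json.loads(raw)
```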
When doing a "merge"-type migration (merging a source directory into an existing and used target directory), any name-conflicting file might be overwritten without so much as a second thought.
We might want to prevent that by default but add an option to force the overwrite.
Currently, there is no proper documentation about how to use the tool:
Also, some basic configurations might be useful, even a simple configuration file with all the default values written in; it would be almost like launching cloudmig with the minimal number of parameters, but with everything written explicitly.
A resume option is to be added to the tool, but there is no way to ignore the status bucket and force the transfer to restart from scratch.
Such an option may prove useful.
On POSIX, the deletion is successful, but it finishes with a segfault.
On S3, the deletion is unsuccessful and it finishes with a segfault.
On S3, attempting to use the "delete_source" configuration parameter results in failure and a segfault
(possibly due to the leading dot in the hostname?):
cloudmig -c configuration.json
....skipped output here....
cloudmig:7433:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:7426:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:7426:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:7426:[INFO]Migration finished with success !
cloudmig:7426:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:7426:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
Segmentation fault (core dumped)
configuration.json:
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
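The 'Failed to lookup hostname ".s3.amazonaws.com"' error suggests that during deletion a virtual-hosted-style hostname is built from an empty bucket name, yielding an unresolvable name with a leading dot. A minimal illustration of the suspected construction and the guard that should precede any DNS lookup (the function name is hypothetical, not droplet code):

```python
def bucket_host(bucket, endpoint="s3.amazonaws.com"):
    """Build a virtual-hosted-style S3 hostname '<bucket>.<endpoint>'.
    With an empty bucket this would produce the '.s3.amazonaws.com'
    seen in the log, so the empty name is rejected up front."""
    if not bucket:
        raise ValueError("empty bucket name")
    return "%s.%s" % (bucket, endpoint)
```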
Using the delete_source configuration option does not seem to delete the source.
It fails with ' Failed to lookup hostname ".s3.amazonaws.com" '
Error output:
cloudmig:5944:[INFO][Migrating] File '100Kfile.txt' transfer succeeded !
unlink: No such file or directory
cloudmig:5944:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:5937:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:5937:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:5937:[INFO]Migration finished with success !
cloudmig:5937:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:5937:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
cloudmig:5937:[INFO][Deleting Source]: Deletion of the migration's source done.
cloudmig:5937:[STATUS]End of data migration. During this session :
Transfered 1 objects, totaling 1/1 objects.
Transfered 102400 Bytes, totaling 102400/102400 Bytes.
Average transfer speed : 102400 Bytes/s.
Transfer Duration : 0d0h0m1s.
configuration.json
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
Currently, the user can only visualize the transfer of the files.
We want to add the status-building phase to those views, so the user can tell how far along the scan is and make sure nothing is blocked.
An option allowing to give the name of the status bucket to be used would prevent problems caused by the bucket names limitations (255 chars, ...).
A fully understandable and usable manpage and help message are needed for the tool.
The deletion of the source files is currently driven by the main thread, while it could benefit greatly from having all available threads delete files concurrently.
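A sketch of fanning the deletions out over a worker pool; `delete_one` stands in for the backend's per-file delete call, and collecting failures instead of aborting mirrors how per-file transfer errors are already reported:

```python
from concurrent.futures import ThreadPoolExecutor

def delete_source_files(paths, delete_one, workers=10):
    """Delete source files concurrently on a worker pool instead of
    serially on the main thread; failed deletions are collected and
    returned rather than aborting the whole pass."""
    failures = []

    def safe_delete(path):
        try:
            delete_one(path)
        except Exception as exc:
            failures.append((path, exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Drain the iterator so every deletion actually runs.
        list(pool.map(safe_delete, paths))
    return failures
```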
Currently, the source data is automatically removed at the end of the transfer.
It should not be removed by default, but only when the --delete-source option is given to the program.