scality / cloud-migration-tool
A Cloud storage migration command-line tool that uses the Scality Droplet library.
License: BSD 3-Clause "New" or "Revised" License
Currently, the ETA is computed from the bandwidth measured over the last few seconds, to estimate how much data can still be transferred. It does not take into account:
We should find an efficient ETA algorithm, make it testable, run scenarios against it, and replace the current, unreliable one.
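One candidate replacement, sketched below as a testable unit, is an exponentially weighted moving average of throughput, which smooths out short bursts better than a raw last-few-seconds window. All names here (`EwmaEta`, `alpha`) are illustrative, not cloudmig APIs; the sketch assumes the transfer loop can report per-interval byte counts.

```python
import time

class EwmaEta:
    """Hypothetical ETA estimator: smooths throughput with an
    exponentially weighted moving average instead of relying only on
    the last few seconds of bandwidth."""

    def __init__(self, total_bytes, alpha=0.3):
        self.total = total_bytes
        self.done = 0
        self.alpha = alpha   # smoothing factor, 0 < alpha <= 1
        self.rate = None     # smoothed throughput, bytes/second
        self.last_t = None

    def update(self, transferred_bytes, now=None):
        """Record bytes transferred since the previous update."""
        now = time.monotonic() if now is None else now
        if self.last_t is not None and now > self.last_t:
            inst = transferred_bytes / (now - self.last_t)
            self.rate = inst if self.rate is None else (
                self.alpha * inst + (1 - self.alpha) * self.rate)
        self.done += transferred_bytes
        self.last_t = now

    def eta_seconds(self):
        """Estimated seconds remaining, or None before any sample."""
        if not self.rate:
            return None
        return (self.total - self.done) / self.rate
```

Passing an explicit `now` makes the estimator deterministic, which is what makes it easy to run recorded scenarios against it.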
Do not log anything by default in background mode.
An option should exist to tell the tool to run in background and not block the user's terminal during the transfer.
An option should be added to activate parts of the Droplet library's tracing options at will.
Something like OpenBSD's ktrace -t option would be nice (that is: -t ihce, with each letter enabling a specific tracing option).
When in background-mode, the tool should by default output everything to a file.
An option to choose the output file would be convenient.
Currently, cloudmig-view displays the process ID, which is of little interest to any user.
It should display the URIs of the source and the destination.
To do that, the main tool could write a simple one-line file in the same directory as the socket file, to be read by the viewer.
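A minimal sketch of that handshake, with hypothetical helper names (the `.info` suffix and the "source -> destination" line format are assumptions, not an existing cloudmig convention):

```python
import os

def write_migration_info(socket_path, source_uri, dest_uri):
    """Main-tool side: drop a one-line info file next to the socket
    so the viewer can display the migration's URIs."""
    info_path = socket_path + ".info"
    with open(info_path, "w") as f:
        f.write("%s -> %s\n" % (source_uri, dest_uri))
    return info_path

def read_migration_info(socket_path):
    """Viewer side: read the URIs back, tolerating a missing file
    (e.g. a migration started by an older tool version)."""
    try:
        with open(socket_path + ".info") as f:
            return f.readline().strip()
    except FileNotFoundError:
        return None
```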
Currently, libmenu's path is hard-coded in the viewer's CMakeLists instead of being retrieved by a FindPackage call.
We need to write one to ensure we can find it.
Currently, the tool is too rigid in the way it supports the multiple backends from droplet.
For instance, the POSIX Droplet backend cannot be supported because it does not provide any bucket functionality, a feature on which cloudmig's internals rely heavily.
The solution may lie in supporting migration from a directory to a bucket, from a bucket to a directory, and between directories within buckets.
ACLs are still not managed within the application:
Everything is created with the canned ACL PRIVATE.
By default, the application should at least reproduce the source's canned ACL, or fall back to PRIVATE on failure.
Many integer variables are declared as uint64_t and printed with the %llu format.
On 64-bit platforms this should be %lu, so a portable solution is needed (such as the PRIu64 macro from <inttypes.h>) to avoid this problem and make the viewer compile.
The tool should print a final line at the end of the migration reporting the statistics of the whole migration:
Currently, the tool resumes a migration by default.
To avoid the risk of starting two migrations on the same source account at the same time, it is preferable to require an explicit option to force resuming the migration.
By default, it should do nothing if a migration is already ongoing.
Cloudmig will fail if run twice simultaneously in two different shells; only one instance will succeed.
If I add a slight delay (even 0.1 seconds) between the cloudmig runs, everything works fine. The problem occurs specifically when they run simultaneously.
I demonstrated this with the following script and configurations:
https://gist.github.com/mlaurie/8d8011aaeaad03a41e6e
If you don't get the error immediately, try running several times. It will error pretty commonly.
There is one primary error that cloudmig gives:
Unexpected exception: Command 'cloudmig -c /tmp/testConfig1.json' returned non-zero exit status 1
cloudmig:6611:[INFO][Loading Profiles]: Starting...
cloudmig:6611:[INFO][Loading Profiles]: Profiles loaded with success.
cloudmig:6611:[INFO][Loading Status] Starting status loading...
cloudmig:6611:[ERR][Loading Status/Exists] Could not list open status store path(opendir): DPL_ENOENT
cloudmig:6611:[INFO][Creating Status Store] Status Store not found. Creating...
mkdir: File exists
cloudmig:6611:[ERR][Creating Status Store] Could not create store(directory): DPL_EEXIST
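The DPL_ENOENT-then-DPL_EEXIST sequence suggests a check-then-create race: both instances see the status store missing, both try to create it, and the loser aborts on EEXIST. A minimal sketch of the usual fix, treating EEXIST as success (the function name is illustrative, not cloudmig code):

```python
import errno
import os

def ensure_status_store(path):
    """Create the status store directory, tolerating concurrent
    creation by another cloudmig instance: an EEXIST from mkdir
    means the other instance won the race, which is fine."""
    try:
        os.mkdir(path)
        return "created"
    except OSError as e:
        if e.errno == errno.EEXIST:
            return "already-exists"
        raise
```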
Currently, the names in the status files are stored with an exact length.
If that length is a multiple of 4, there is no terminating NUL character.
We should improve the binary status file formats to make manipulating those files easier. (That would probably help clean up the code, too.)
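A robust reader must trust the stored length rather than scan for a NUL. The sketch below illustrates the idea with a length-prefixed, 4-byte-aligned layout; the exact on-disk format (big-endian 4-byte length, NUL padding) is an assumption for illustration, not the real status file format:

```python
import struct

def pack_name(name):
    """Encode a name as a 4-byte big-endian length followed by the
    bytes padded with NULs to a multiple of 4; a name whose length is
    0 mod 4 carries no NUL terminator at all."""
    data = name.encode("utf-8")
    padded = data + b"\0" * (-len(data) % 4)
    return struct.pack(">I", len(data)) + padded

def unpack_name(buf, offset=0):
    """Read a name back using the stored length, never relying on a
    terminating NUL; returns (name, offset_of_next_entry)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    name = buf[start:start + length].decode("utf-8")
    padded_len = length + (-length % 4)
    return name, start + padded_len
```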
The application is to be used (mainly, at least) by system administrators.
Instead of giving them a crude log dumped on standard output, we should log everything that happens through syslog.
Currently, there are 3 verbosity levels:
DEBUG (with verbose option, includes FULL droplet library tracing)
INFO (default)
WARN (quiet mode)
It would be nice to choose whether to activate the droplet library tracing or not, with a more flexible option.
Currently, the viewer can see any ongoing migration, whether it was started by the same user or not.
Setting permissions on the directory containing the socket file should be enough to prevent this behavior and limit the viewer to what it should access.
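A sketch of that restriction: create the socket directory owner-only (mode 0700), so other users' viewers cannot traverse into it. The helper name is illustrative:

```python
import os
import stat

def make_private_socket_dir(path):
    """Create (or reuse) the directory holding the viewer socket and
    restrict it to its owner, so only that user's viewer can attach."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)
    # Return the effective permission bits for verification.
    return stat.S_IMODE(os.stat(path).st_mode)
```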
When the destination isn't directly accessible, cloudmig reports a successful migration even though no files were transferred.
The transfer fails properly if the source is not accessible; the problem only occurs when the destination is not accessible.
Example1:
When cloudmig receives HTTP-403 during posix->RS2 migration, it results in:
cloudmig:3476:[INFO][Migrating] File 'newfile.txt' transfer failed !
cloudmig:3476:[ERR][Migrating] : Could not migrate file newfile.txt
cloudmig:3469:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:3469:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 1024, "done_bytes": 0 }
cloudmig:3469:[INFO]Migration finished with success !
Example2:
When cloudmig receives HTTP-307 during S3->S3 migration, it results in:
cloudmig:2545:[INFO][Migrating] File '100Kfile.txt' transfer failed !
cloudmig:2545:[ERR][Migrating] : Could not migrate file 100Kfile.txt
cloudmig:2538:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:2538:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 102400, "done_bytes": 0 }
cloudmig:2538:[INFO]Migration finished with success !
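In both examples the digest already shows the failure (done_objects and done_bytes stay at 0), so the final status can be derived from those counters instead of being printed unconditionally. A sketch of that check, using the same field names as the digest JSON in the logs:

```python
def migration_succeeded(digest):
    """Decide the final migration status from the status digest
    counters rather than unconditionally reporting success.
    `digest` mirrors the JSON cloudmig logs when uploading it."""
    return (digest["done_objects"] == digest["objects"]
            and digest["done_bytes"] == digest["bytes"])
```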
When using the "delete_source" configuration parameter, the status files are also deleted when the migration completes.
While the migration is running, the status files are created and populated properly. But when the migration completes, those status files are removed as well, even though they live in a different location (posix:/tmp/statusDir1) from the migration-source objects (s3:srcbucket:/).
Expected:
/tmp/statusDir1/cloudmig.s3.amazonaws.com.to.s3.amazonaws.com/
mlz1%3a%2f
mlz1%3a%2f.json
Actual:
/tmp/statusDir1/
config.json
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
A CPack configuration should be added to the CMakeLists so that both installers can be provided.
We need a dummy cloudmig to be able to load and run loadable scenarios in order to test cloudmig-view easily, and quickly. This should be part of an effort to improve testability and stability of the tools.
When migrating data from one backend to another whose feature support differs, we would like to be able to tell cloudmig not to try to migrate some specific kinds of files.
For instance, the S3 backend does not support symlinks, so we would like to skip symlinks when migrating to that backend, suppressing the errors their migration would produce.
We should provide an option allowing an attempt to push the raw ACL to the destination.
On failure, fall back to pushing the ACL into a separate file with the extension .acl.
Thus, the file Test.txt would either have its ACL set, or have a companion file Test.txt.acl containing the ACL's XML.
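The proposed behavior can be sketched as follows; `set_acl` and `write_file` are placeholders for backend operations cloudmig would supply, not droplet APIs, and the function name is hypothetical:

```python
def push_acl(path, acl_xml, set_acl, write_file):
    """Try to set the ACL directly on the destination object; on
    failure, write the ACL's XML into a companion '<name>.acl' file
    instead, as described in the ticket."""
    try:
        set_acl(path, acl_xml)
        return path                    # ACL applied in place
    except Exception:
        sidecar = path + ".acl"
        write_file(sidecar, acl_xml)   # fallback: joint .acl file
        return sidecar
```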
An option should let the user choose which location constraint to apply to the destination buckets to be created.
When a migration is done, it is possible to delete the source files, but nothing is shown to the viewer for the user to visualize how the deletion is progressing.
Currently, whenever a destination bucket name is given, the tool does not check whether the bucket already exists before creating it.
This results in an error while creating the destination bucket in cases where it could simply reuse the existing bucket.
We should provide an option to push the raw ACL's XML into a separate file.
For instance, the file foo.txt would be transferred, and the new file foo.txt.acl would contain the file's ACL XML.
This would be the default fallback when the option described in ticket #8 fails.
There is an error in the computation of a status directory's path and of its temporary status files' paths: the directory created for a specific bucket's status and the directory used in the path computed for the temporary status file end up with different names.
All of this happens in status_bucket.c.
Change the semicolon into a comma for the bucket associations.
Most of the code is planned to be multithreaded eventually, so let's just do that.
An option allowing a configuration file to be given instead of command-line arguments,
written as --config-file=config_file.
It would contain information about:
The done/size offset comparison yields 0 < 0 for directories.
Thus, they are not transferred.
It may be practical to have the possibility to retrieve the output report directly from the destination account/bucket.
Currently, by using libdroplet's vfile and vdir APIs, a limitation arises:
if a file whose name contains a directory-like path is to be transferred, the vfile API checks whether a file with the directory's name exists.
e.g.
The file 'bucket:/directory/foo.txt' requires a file named 'bucket:/directory/' to exist in order to be transferred.
Sadly, that cannot be expected of every provider/account, and it must be possible to transfer a file without those limitations.
Currently, we only put the files inside the destination buckets, without regard for what was present there before.
We should accept an argument telling the tool whether or not to delete the destination's existing content first.
The tool does not properly handle a ^C interruption, possibly corrupting some files (statuses mostly).
We should handle it properly and stop cleanly in that case.
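The standard pattern is to have the SIGINT handler only set a flag, and let the transfer loop check it between files so status files are finalized before exit. A sketch (class and loop names are illustrative):

```python
import signal

class CleanStop:
    """Graceful ^C handling: the signal handler flips a flag instead
    of killing the process mid-write; the main loop decides when it is
    safe to actually stop."""

    def __init__(self):
        self.stopping = False

    def install(self):
        signal.signal(signal.SIGINT, self._handler)

    def _handler(self, signum, frame):
        self.stopping = True   # defer the shutdown to the main loop

def run(tasks, stopper):
    """Process tasks one by one, stopping cleanly between files so
    status files are left in a consistent state."""
    done = []
    for task in tasks:
        if stopper.stopping:
            break
        done.append(task)
    return done
```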
Currently, the tool only migrates an account's content to another.
We need an option to migrate only between two buckets.
For now, at most, the application uses the DPL_VFILE_MD5 flag, which allows checking a file's integrity but doesn't prevent anyone from reading it in transit.
An option to select on-the-fly encryption would be a good idea.
Cannot migrate a single sourceBucket into 2 different destination buckets.
Below is a sample configuration that might be used for this scenario (buckets section is most important here):
{
"cloudmig": {
"buckets": {"srcbucket1:/": "dstbucket1:/", "srcbucket1:/": "dstbucket2:/"},
"create-directories": true
}
}
Expected results:
Actual results:
A workaround is to migrate the same source twice, using two different bucket configurations.
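A likely root cause: the "buckets" section is a JSON object, and JSON objects cannot carry two identical keys, so a parser will silently keep only the last "srcbucket1:/" mapping; the first is lost before cloudmig ever sees it. This is easy to demonstrate (Python's json module shown; libdroplet's JSON parser presumably behaves similarly). Supporting one source with several destinations would therefore need a schema change, e.g. a list of source/destination pairs.

```python
import json

# The "buckets" object from the failing configuration: the key
# "srcbucket1:/" appears twice, which JSON does not really allow, and
# most parsers keep only the last value.
raw = '{"srcbucket1:/": "dstbucket1:/", "srcbucket1:/": "dstbucket2:/"}'
buckets = json.loads(raw)
```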
When doing a "merge"-type migration (merging a source directory into an existing and used target directory), any name-conflicting file might be overwritten without so much as a second thought.
We might want to prevent that by default but add an option to force the overwrite.
Currently, there is no proper documentation about how to use the tool:
Also, some basic configurations might be useful, even a simple configuration file with all the default values written in; it would be almost like launching cloudmig with the minimal number of parameters, but with everything written explicitly.
A resume option is to be added to the tool, but there is no way to ignore the status bucket and force the transfer to restart from scratch.
Such an option may prove useful.
On POSIX, the deletion is successful, but it finishes with a segfault.
On S3, the deletion is unsuccessful and it finishes with a segfault.
On S3, attempting to use the "delete_source" configuration parameter results in failure and a segfault
(possibly due to the leading dot in the hostname?):
cloudmig -c configuration.json
....skipped output here....
cloudmig:7433:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:7426:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:7426:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:7426:[INFO]Migration finished with success !
cloudmig:7426:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:7426:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
Segmentation fault (core dumped)
configuration.json:
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
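The 'Failed to lookup hostname ".s3.amazonaws.com"' error suggests that during deletion a virtual-hosted-style hostname is built from an empty bucket name, yielding an unresolvable name with a leading dot. A minimal illustration of the suspected construction and the guard that should precede any DNS lookup (the function name is hypothetical, not droplet code):

```python
def bucket_host(bucket, endpoint="s3.amazonaws.com"):
    """Build a virtual-hosted-style S3 hostname '<bucket>.<endpoint>'.
    With an empty bucket this would produce the '.s3.amazonaws.com'
    seen in the log, so the empty name is rejected up front."""
    if not bucket:
        raise ValueError("empty bucket name")
    return "%s.%s" % (bucket, endpoint)
```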
Using the delete_source configuration option does not seem to delete the source.
It fails with ' Failed to lookup hostname ".s3.amazonaws.com" '
Error output:
cloudmig:5944:[INFO][Migrating] File '100Kfile.txt' transfer succeeded !
unlink: No such file or directory
cloudmig:5944:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:5937:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:5937:[INFO][Uploading Status Digest] Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:5937:[INFO]Migration finished with success !
cloudmig:5937:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:5937:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
cloudmig:5937:[INFO][Deleting Source]: Deletion of the migration's source done.
cloudmig:5937:[STATUS]End of data migration. During this session :
Transfered 1 objects, totaling 1/1 objects.
Transfered 102400 Bytes, totaling 102400/102400 Bytes.
Average transfer speed : 102400 Bytes/s.
Transfer Duration : 0d0h0m1s.
configuration.json
{
"source": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"destination": {
"backend": "s3",
"aws_region": "us-east-1",
"host": "s3.amazonaws.com",
"access_key": "mykey",
"secret_key": "mysecret"
},
"status": {
"backend": "posix",
"base_path": "/tmp/statusDir1"
},
"cloudmig": {
"buckets": {"srcbucket:/": "dstbucket:/"},
"delete-source": true,
"worker-threads": 10,
"create-directories": true
}
}
Currently, the user can only visualize the transfer of the files.
We want to add the status-building phase to those views, so the user can tell how far along the scan is and make sure nothing is blocked.
An option allowing to give the name of the status bucket to be used would prevent problems caused by the bucket names limitations (255 chars, ...).
A fully understandable and usable manpage and help message are needed for the tool.
The deletion of the source files is currently driven by the main thread, while it could benefit greatly from having all available threads delete files concurrently.
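A sketch of fanning the deletions out over a worker pool; `delete_one` stands in for the backend's per-file delete call, and collecting failures instead of aborting mirrors how per-file transfer errors are already reported:

```python
from concurrent.futures import ThreadPoolExecutor

def delete_source_files(paths, delete_one, workers=10):
    """Delete source files concurrently on a worker pool instead of
    serially on the main thread; failed deletions are collected and
    returned rather than aborting the whole pass."""
    failures = []

    def safe_delete(path):
        try:
            delete_one(path)
        except Exception as exc:
            failures.append((path, exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Drain the iterator so every deletion actually runs.
        list(pool.map(safe_delete, paths))
    return failures
```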
Currently, the source data is automatically removed at the end of the transfer.
It should not be removed by default, but only when the --delete-source option is given to the program.