
s3s3mirror's Introduction

s3s3mirror

A utility for mirroring content from one S3 bucket to another.

Designed to be lightning-fast and highly concurrent, with modest CPU and memory requirements.

An object will be copied if and only if at least one of the following holds true:

  • The object does not exist in the destination bucket.
  • The "sync strategy" triggers (by default uses the Etag sync strategy)
    • Etag Strategy (Default): If the size or Etags don't match between the source and destination bucket.
    • Size Strategy: If the sizes don't match between the source and destination bucket.
    • Size and Last Modified Strategy: If the source and destination objects have a different size, or the source bucket object has a more recent last modified date.
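
The strategy can be selected on the command line (see the Options section below):

s3s3mirror.sh source dest        # default: ETag strategy
s3s3mirror.sh -S source dest     # Size strategy
s3s3mirror.sh -L source dest     # Size and Last-Modified strategy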

When copying, the source metadata and ACL lists are also copied to the destination object.

Note: the 2.1-stable branch supports copying to/from local directories.

Motivation

I started with "s3cmd sync" but found that with buckets containing many thousands of objects, it was incredibly slow to start and consumed massive amounts of memory. So I designed s3s3mirror to start copying immediately with an intelligently chosen "chunk size" and to operate in a highly-threaded, streaming fashion, so memory requirements are much lower.

Running with 100 threads, I found the gating factor to be how fast I could list items from the source bucket (!?!), which makes me wonder whether there is any way to do this faster. I'm sure there must be, but this is pretty damn fast.

AWS Credentials

  • s3s3mirror will first look for credentials in your system environment. If variables named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined, then these will be used.
  • Next, it checks for a ~/.s3cfg file (which you might have for using s3cmd). If present, the access key and secret key are read from there.
  • IAM Roles can be used on EC2 instances by specifying the --iam flag
  • If none of the above is found, it will error out and refuse to run.
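
For example, using environment variables (the values here are placeholders):

export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
s3s3mirror.sh source dest

or via the relevant lines in a ~/.s3cfg file (the format s3cmd writes):

access_key = AKIA...
secret_key = ...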

System Requirements

  • Java 8 or higher

Building

mvn package

Note that s3s3mirror now has a prebuilt jar checked in to GitHub, so you'll only need to build if you've been playing with the source code. The above command requires that Maven 3 is installed.
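
If you do build from source, Maven writes the jar under target/, and it can be run directly; a sketch (the exact jar name varies by version):

java -jar target/s3s3mirror-*.jar [options] <source-bucket> <destination-bucket>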

License

s3s3mirror is available under the Apache 2.0 License.

Usage

s3s3mirror.sh [options] <source-bucket>[/src-prefix/path/...] <destination-bucket>[/dest-prefix/path/...]

Versions

The 1.x branch (currently master) has been in use by the largest number of people and is the most battle-tested.

The 2.x branch supports copying between S3 and any local filesystem. It has seen heavy use and performs well, but is not as widely used as the 1.x branch.

In the near future, the 1.x branch will be split off from master, and the 2.x branch will be merged into master. There are a handful of features on the 1.x branch that have not yet been ported to 2.x. If you can live without them, I encourage you to use the 2.x branch. If you really need them, I encourage you to port them to the 2.x branch, if you have the ability.

Options

-c (--ctime) N                : Only copy objects whose Last-Modified date is younger than this many days.
                                For other time units, use these suffixes: y (years), M (months), d (days), w (weeks),
                                                                          h (hours), m (minutes), s (seconds)
-i (--iam)                    : Attempt to use an IAM Role if invoked on an EC2 instance
-P (--profile) VAL            : Use a specific profile from your credential file (~/.aws/config)
-m (--max-connections) N      : Maximum number of connections to S3 (default 100)
-n (--dry-run)                : Do not actually do anything, but show what would be done (default false)
-r (--max-retries) N          : Maximum number of retries for S3 requests (default 5)
-p (--prefix) VAL             : Only copy objects whose keys start with this prefix
-d (--dest-prefix) VAL        : Destination prefix (replacing the one specified in --prefix, if any)
-e (--endpoint) VAL           : AWS endpoint to use (or set AWS_ENDPOINT in your environment)
-X (--delete-removed)         : Delete objects from the destination bucket if they do not exist in the source bucket
-t (--max-threads) N          : Maximum number of threads (default 100)
-v (--verbose)                : Verbose output (default false)
-z (--proxy) VAL              : host:port of proxy server to use.
                                Defaults to proxy_host and proxy_port defined in ~/.s3cfg,
                                or no proxy if these values are not found in ~/.s3cfg
-u (--upload-part-size) N     : The size (in bytes) of each part uploaded as part of a multipart request
                                for files that are greater than the max allowed file size of 5368709120 bytes (5 GB).
                                Defaults to 4294967296 bytes (4 GB)
-C (--cross-account-copy)     : Copy across AWS accounts. Only resource-based policies are supported (as
                                specified by the AWS documentation) for cross-account copying.
                                Default is false (copying within the same account, preserving ACLs across copies).
                                If this option is active, the owner of the destination bucket will receive full control.
-s (--ssl)                    : Use SSL for all S3 API operations (default false)
-E (--server-side-encryption) : Enable AWS-managed server-side encryption (default false)
-l (--storage-class) VAL      : S3 storage class, "Standard" or "ReducedRedundancy" (default Standard)
-S (--size-only)              : Only take the size of objects into consideration when determining whether a copy is required
-L (--size-and-last-modified) : Use size and last-modified date to determine whether files have changed (like the AWS CLI),
                                ignoring ETags. If -S (--size-only) is also specified, that strategy takes precedence over this one.

Examples

Copy everything from a bucket named "source" to another bucket named "dest"

s3s3mirror.sh source dest

Copy everything from "source" to "dest", but only copy objects created or modified within the past week

s3s3mirror.sh -c 7 source dest
s3s3mirror.sh -c 7d source dest
s3s3mirror.sh -c 1w source dest
s3s3mirror.sh --ctime 1w source dest

Copy everything from "source/foo" to "dest/bar"

s3s3mirror.sh source/foo dest/bar
s3s3mirror.sh -p foo -d bar source dest

Copy everything from "source/foo" to "dest/bar" and delete anything in "dest/bar" that does not exist in "source/foo"

s3s3mirror.sh -X source/foo dest/bar
s3s3mirror.sh --delete-removed source/foo dest/bar
s3s3mirror.sh -p foo -d bar -X source dest
s3s3mirror.sh -p foo -d bar --delete-removed source dest

Copy within a single bucket -- copy everything from "source/foo" to "source/bar"

s3s3mirror.sh source/foo source/bar
s3s3mirror.sh -p foo -d bar source source

BAD IDEA: If copying within a single bucket, do not put the destination below the source

s3s3mirror.sh source/foo source/foo/subfolder
s3s3mirror.sh -p foo -d foo/subfolder source source

This might cause recursion and raise your AWS bill unnecessarily
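
To preview what any of the above commands would do without actually copying anything, add -n (--dry-run), optionally with -v (--verbose):

s3s3mirror.sh -n -v source dest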

If you've enjoyed using s3s3mirror and are looking for a warm-fuzzy feeling, consider dropping a little somethin' into my tip jar

s3s3mirror's People

Contributors

ambled, andrewjhumphrey, anishek, bitsofinfo, cobbzilla, dependabot[bot], elmobp, jackl0phty, jogaco, klamouri, kylec32, lobsterdore, wimnat, winzig


s3s3mirror's Issues

Support for "--delete-removed"

This is awesome. Thank you for building this. I was wondering if you had considered supporting a "--delete-removed" option in the same way rsync and s3cmd do?

I'd be happy to try and take a stab at it, but I'm not sure what the best approach would be from an algorithmic standpoint.

Limit on number of s3 objects read / copied?

I've been trying to copy a huge bucket to another AWS account and have hit this wall every single time at the same read count:

pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - exception listing objects (try #0): com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - exception listing objects (try #1): com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - exception listing objects (try #2): com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - exception listing objects (try #3): com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - exception listing objects (try #4): com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
pool-1-thread-1 ERROR: org.cobbzilla.s3s3mirror.KeyLister - Error in run loop, KeyLister thread now exiting: java.lang.IllegalStateException: Too many errors trying to list objects (maxRetries=5)
main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - mirror: completed
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - stopping S3CopyMaster...
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - S3CopyMaster stopped
Thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats -
--------------------------------------------------------------------
STATS BEGIN
read: 11112700
copied: 0
copy errors: 0
uploaded: 0
upload errors: 0
deleted: 0
delete errors: 0
duration: 1:31:50
read rate: 120994.4613026119/minute
copy+upload rate: 0.0/minute
delete rate: 0.0/minute
bytes copied: 0 bytes
bytes copied: 0 bytes
GET operations: 111132
COPY operations: 0
PUT operations: 0
DELETE operations: 0
STATS END
--------------------------------------------------------------------

Looks like it might be an issue with the bundled Amazon SDK? Not sure if anyone's run into this before. Just want to check whether there is indeed a limit on how many objects can be read/copied in one go. Thanks. I'm using: VERSION=2.0.2

Incorrect content type

I've run into an intermittent issue with content types.

I am copying images between two folders in the same bucket. The images in the source folder have valid content-type metadata, but they do not have file extensions at all. For the most part s3s3mirror copies the images across fine, but I occasionally encounter images that end up with the default content type (octet-stream) in the destination folder, even though the source image has a valid content type.

How does s3s3mirror determine the content type? From S3's metadata? If so, it's strange that it gets it wrong on occasion.

Question/RFE

I ran across this after aws-s3-mount led to s3-bucketloader, which led here.
I like that this explicitly mentions checking ETags.

Questions:

  1. Are multi-part ETag comparisons attempted? (See the sketch after this list.)
    --> I know this is only possible if you know the block size of the original transfer
    --> Have you considered storing something like the canonical ETag or a SHA1 as metadata?

  2. Is User Metadata copied (aka "PUT COPY"), or is it lost?

  3. Is there versioning support? I.e., if Bucket1/File1 has 3 versions, are they all copied or just the latest?

  4. Have you compared this to the recent cross-region bucket copy feature of AWS ?
    --> Great feature but only works on newly created files and cross region.
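
For reference on question 1: a multipart ETag is the MD5 of the concatenated binary MD5 digests of the parts, with "-<part count>" appended, so it can only be reproduced if you know the part size of the original upload. A rough shell sketch (assumes GNU stat and openssl; the 8 MB default part size here is an illustrative guess, not something S3 guarantees):

file="$1"; part_size="${2:-8388608}"                  # part size must match the original upload
size=$(stat -c%s "$file")
parts=$(( (size + part_size - 1) / part_size ))
for (( i=0; i<parts; i++ )); do
  dd if="$file" bs="$part_size" skip="$i" count=1 2>/dev/null | openssl dgst -md5 -binary
done | openssl dgst -md5 | awk -v n="$parts" '{print $NF "-" n}'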

Avoid repeatedly listing the listed blobs in last run

Hi Cobbzilla,

I am not sure how expensive it is to list blobs in S3, but it seems like too much overhead to re-list already-handled keys on every subsequent run of s3s3mirror.
Suppose you have 100 million objects in your source bucket, with 1k objects of growth every day, and you run s3s3mirror once per day. Then you end up re-listing ~100 million extra objects every day after the first run of s3s3mirror.

Do you have plan to improve this?

Thank you.

-Shasha

Bucket does not exist error

When I run this command

s3s3mirror.sh -c 90d s3://companyname.com-webappdocs s3://companyname.com-webappdocs-replication

I get this error
main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - version 2.0.2 starting
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #0): com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 32A860EC2092A184), S3 Extended Request ID: UlRBIKWracQNks5C11jsNVulXwPsASu2dA4WYXEeOQPGrJbr98t1Ku7dcGqEhyfN

The buckets are in different regions but the names are correct. When using the aws sync command I have to specify the region with --region. Is there an equivalent option with s3s3mirror?
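
The closest documented option appears to be -e (--endpoint), or AWS_ENDPOINT in the environment (see the Options section above); a hedged example using the legacy region-specific endpoint name, which may take some experimentation for a cross-region pair:

s3s3mirror.sh -e s3-eu-west-1.amazonaws.com -c 90d source-bucket dest-bucket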

No copy operations performed, even when destination is completely empty

I have a backups bucket in us-west-2 that I want to mirror to another bucket in us-east-1 for disaster recovery purposes. It contains ~25000 files consuming ~2TB.

The key structure is:
YYYYMMDD/CUSTOMER/TYPE/backup_file_name

I'm running s3s3mirror on an Ubuntu 12.04.3 LTS server, using the master branch of s3s3mirror.

I've got s3s3mirror using the credentials file from s3cmd (which I'm attempting to replace with s3s3mirror).

When I run it, here's what I get:

ubuntu@prod-logstash:/opt/s3s3mirror$ ./s3s3mirror.sh source_backups_bucket destination_backups_bucket
pool-1-thread-1 INFO : org.cobbzilla.s3s3mirror.KeyLister - starting...
pool-1-thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats -
--------------------------------------------------------------------
STATS BEGIN
read: 10000
copied: 0
copy errors: 0
duration: 0:00:11
read rate: 50386.29492777964/minute
copy rate: 0.0/minute
bytes copied: 0 bytes
GET operations: 10066
COPY operations: 0
STATS END
--------------------------------------------------------------------

pool-1-thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats -
--------------------------------------------------------------------
STATS BEGIN
read: 20000
copied: 0
copy errors: 0
duration: 0:00:17
read rate: 67613.2521974307/minute
copy rate: 0.0/minute
bytes copied: 0 bytes
GET operations: 19992
COPY operations: 0
STATS END
--------------------------------------------------------------------

pool-1-thread-1 INFO : org.cobbzilla.s3s3mirror.KeyLister - No more keys found in source bucket, exiting
Thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats -
--------------------------------------------------------------------
STATS BEGIN
read: 25655
copied: 0
copy errors: 0
duration: 0:00:20
read rate: 73965.6912209889/minute
copy rate: 0.0/minute
bytes copied: 0 bytes
GET operations: 25912
COPY operations: 0
STATS END
--------------------------------------------------------------------

Nothing copied, even though there are several hundred files in the source that don't exist in the destination. Weird. I thought perhaps it was an issue crossing regions, so I created another bucket in the same region and tried again. Same result: nothing copied, even though the destination is now a completely empty bucket.

What gives?

Doesn't run when in folder with space in name

Tried running from
"/Users/username/Sites/tmp/server scripts/s3s3mirror/"

But kept getting error:
Unable to access jarfile /Users/username/Sites/tmp/server

Renaming "server scripts" to "server_scripts" fixed it.

Authentication Issue

I'm receiving an error using s3s3mirror. I do have the environment variables configured as specified in the README, and I also tried using the ~/.s3cfg file. This is the output I receive

$ ./s3s3mirror.sh <bucket>:<source-folder>/ <bucket>:<destination-folder>/
main INFO : org.cobbzilla.s3s3mirror.MirrorMain - Adding shutdown hook
Exception in thread "main" Status Code: 403, AWS Service: Amazon S3, AWS Request ID: X, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: X
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:583)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:317)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2829)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2801)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:451)
        at org.cobbzilla.s3s3mirror.KeyLister.<init>(KeyLister.java:37)
        at org.cobbzilla.s3s3mirror.MirrorMaster.mirror(MirrorMaster.java:37)
        at org.cobbzilla.s3s3mirror.MirrorMain.run(MirrorMain.java:45)
        at org.cobbzilla.s3s3mirror.MirrorMain.main(MirrorMain.java:28)
Thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats - 
--------------------------------------------------------------------
STATS BEGIN
read: 0
copied: 0
copy errors: 0
duration: 0:00:00
read rate: 0.0/minute
copy rate: 0.0/minute
STATS END 

The bucket is the same for the source and the destination. I saw that someone had written a patch in a previous issue to address intra-bucket copying, so I'm guessing s3s3mirror doesn't support that? The error message I received didn't appear to be related to that, and since the pull request was never issued, I was skeptical of it.

Does this look like the error one would receive for intra-bucket copying, or is there something else going on here? Let me know if there's any additional info I can provide. Thanks!

Pricing information

One thing that would be really useful is if there was a way to have the new STATS section output the running tally on the things that could cost money, broken down the way Amazon breaks them down in the pricing area. Right now it's sort of a black-box to me when I'm running it. I can see that it read 500K objects in our bucket, but I'm not sure precisely how that translates to the different request types. And I'm guessing based on the options passed into the utility (such as -c), that the effect could change based on filtering by metadata allowing you to skip files instead of comparing.

So e.g. if there was additional stats listing the following, it would be very helpful:

PUT, COPY, POST, or LIST Requests
GET and all other Requests
GB copied (probably not costing the user, but still interesting)

Thanks!

Can s3s3mirror accept a list of files to copy?

Let's say I have:

mybucketA/images/1.png
mybucketA/images/a.png
mybucketA/sounds/2.mp3
mybucketA/sounds/b.mp3
mybucketA/videos/3.mp4
mybucketA/videos/c.mp4

Realistically, there are 20,000 files here.

And I run some script whose job is to output the files I need for some other process and it tells me that this is what I need in bucket B:

mybucketB/images/1.png
mybucketB/images/a.png
mybucketB/videos/c.mp4

Is there a way to make s3s3mirror take in a list of these files and copy just those, currently? I'd like to leverage the concurrency here since this was a snap to copy 1.5GB worth of data, over maybe 17,000 files or so.
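
Not via a documented option; a crude workaround is one invocation per key, using each key as its own prefix. A sketch, where files.txt is a hypothetical list of keys (one per line); note this forfeits most of the cross-file concurrency, since each run handles a single key:

while read -r key; do
  s3s3mirror.sh -p "$key" mybucketA mybucketB
done < files.txt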

Bucket with millions of objects doesn't get copied in one run

I'm using s3s3mirror to copy the contents of a bucket that contains 2-3 million objects. I don't know the exact number, but an Elasticsearch index that indexed the contents of the bucket has ~2.6 million documents.

When I first ran s3s3mirror, it only copied about 110,000 objects. It reported no copy errors. I ran it again, and it copied about another 110k. Now it's running again, and still finding more objects to copy.

I don't think the 110k thing is a limitation, because I've copied other buckets with more objects than that. Still, it's peculiar that it keeps finding more objects every time I run it.

I do see the occasional HTTP timeout error as the output is scrolling by. It's thrown from KeyJob, line 52:

pool-1-thread-74 INFO : com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: Connect to [bucket].s3.amazonaws.com/[bucket].s3.amazonaws.com/[S3 IP] timed out
org.apache.http.conn.ConnectTimeoutException: Connect to [bucket].s3.amazonaws.com/[bucket].s3.amazonaws.com/[S3 IP] timed out

No other errors that I can see, and it doesn't report any copy errors in the stats that are printed at the end.

Nothing is jumping out at me in a cursory review of the source code, but I'm looking more closely at your KeyLister and the ObjectListing AWS SDK class to see if that could be the culprit.

Some already copied files are being copied again (E-tag is not the same for multipart)

Steps to reproduce:
Bucket A - 100k+ objects
Bucket B - empty
run s3s3mirror A to B
the log reports that all files have been copied
list both buckets contents with aws s3 ls --recursive, parse key names, make diff, conclude the buckets are the same
run s3s3mirror A to B again
In my specific case, 2730 files (70GB+) ended up being copied again, even though they were the same

STATS BEGIN
read: 118642
copied: 2730
copy errors: 0
deleted: 0
delete errors: 0
duration: 0:00:57
read rate: 122925.97005646791/minute
copy rate: 2828.575869035901/minute
delete rate: 0.0/minute
bytes copied: 73.12731052096933 GB (78519851783 bytes)
GET operations: 124221
COPY operations: 2730
DELETE operations: 0
STATS END

run s3s3mirror A to B again one more time:

STATS BEGIN
read: 118642
copied: 2730
copy errors: 0
deleted: 0
delete errors: 0
duration: 0:01:09
read rate: 102581.20298584891/minute
copy rate: 2360.4346197077557/minute
delete rate: 0.0/minute
bytes copied: 73.12731052096933 GB (78519851783 bytes)
GET operations: 124221
COPY operations: 2730
DELETE operations: 0
STATS END

From the logs, I can see that the same files are being copied again and again.

Update1:
files that ended up being copied again are those with sizes between DEFAULT_PART_SIZE and MAX_SINGLE_REQUEST_UPLOAD_FILE_SIZE.
I modified those values in the code because 4GB was too heavy, and the recommended S3 chunk size is far smaller than that value.
So, if I use 16MB for DEFAULT_PART_SIZE and 32MB for MAX_SINGLE_REQUEST_UPLOAD_FILE_SIZE, files with sizes between these values are copied again.

Update2:
It seems that changing the values back to the defaults does not change the outcome.
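
A possible workaround with the documented options: -S (--size-only) skips the ETag comparison entirely, so objects whose multipart ETags differ but whose sizes match would not be re-copied:

s3s3mirror.sh -S A B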

How well does s3s3mirror scale?

I just stumbled across this script, and it is by far the best tool I have found so far for copying S3 buckets.

My question is how well this tool works on extremely large data sets. For example, I have several million files and thousands of directories I am trying to sync. Do you have any experience with such a large amount of data?

Maybe you can comment or have some tips on how to handle this kind of situation?

The -c option doesn't seem to be working correctly

In order to only copy files added to our bucket in the last 24 hours, I'm running with the parameter: -c 1

Yet I've just added a file to our bucket, and run it again with this command, and no copy operations are happening (the file does not exist in the destination either).

The Last Modified timestamp for the file shows:

Wed Dec 04 16:05:41 GMT-800 2013

And as I write this it's 16:22 GMT-800 (4:22PM Pacific), so the Last Modified is definitely within the last 24 hours.

Log output to file

As far as I can tell the only output is to stdout.

It would be nice if there was a way to log the final results to a log file, maybe a flag that could be set to log to file? How hard would something like that be to do? Love this tool btw.
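
Until such a flag exists, standard shell redirection captures everything to a file:

s3s3mirror.sh source dest > s3s3mirror.log 2>&1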

Support different time units for -c option

@winzig had asked for this as part of another issue. I'm recording it here as a separate issue. Here's what he originally wrote:

Incidentally, it would be awesome if the -c parameter could be specified with an indicator as to the period, and just defaulting to days if none is specified, for backwards compatibility. This would allow us to specify at a different granularity, such as just the last two hours: -c 2h

Multipart upload does not scale

The multipart upload does not scale well.
For example, on a bucket with 120k objects, first 500GB and 119k+ objects get copied in 30 minutes.
There are fewer than 20 files left with sizes over 5GB. Those take forever to copy, because the chunks are not processed in parallel (almost 60 minutes for the remaining 100GB)

Keep getting exceptions trying to move from one bucket to another.

I keep getting the following exceptions.

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method.

I have set env variables for our access key and secret access key that work with aws-cli. But not sure what else could be needed.

./s3s3mirror.sh https:/bucket1NameHere.s3.amazonaws.com/folderName https://bucket2NameHere.s3.amazonaws.com/folderName

Thanks

Mark
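
(Note: per the Usage section above, the arguments are bucket names with optional prefixes, not https:// URLs - e.g. bucket1NameHere/folderName bucket2NameHere/folderName.)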

Don't log full stack traces for operations that will be retried

I'm seeing this error when attempting to copy between two buckets. I see the exception happen a bunch of times consecutively, then it'll stop and everything continues normally. It appears the copy finishes without any issues, as the final stats output reports no errors.

The exception:

pool-1-thread-5 INFO : com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: Connection to http://<BUCKET>.s3.amazonaws.com refused
org.apache.http.conn.HttpHostConnectException: Connection to http://<BUCKET>.s3.amazonaws.com refused
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:641)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:635)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:429)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:291)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3655)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:996)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:974)
        at org.cobbzilla.s3s3mirror.KeyJob.getObjectMetadata(KeyJob.java:33)
        at org.cobbzilla.s3s3mirror.KeyCopyJob.run(KeyCopyJob.java:37)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Cannot assign requested address
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        ... 18 more

The final stats:

--------------------------------------------------------------------
STATS BEGIN
read: 309101
copied: 309101
copy errors: 0
deleted: 0
delete errors: 0
duration: 0:11:23
read rate: 27141.661276207145/minute
copy rate: 27141.661276207145/minute
delete rate: 0.0/minute
bytes copied: 10.982177948579192 GB (11792023782 bytes)
GET operations: 931170
COPY operations: 309102
DELETE operations: 0
STATS END
--------------------------------------------------------------------

Running on a fresh m3.xlarge instance in the same region as both of the buckets.

Estimate the cost of a mirroring job

Would it be possible to look at the contents of the source s3 bucket and estimate the cost of a full mirroring job if the region is supplied?

Is max-threads really using the value from max-connections?

The argument description states: "-t (--max-threads) N : Maximum number of threads (default is same as --max-connections)," yet from tests I ran on Windows, it seems as though this may not be happening.

I tested with a command line of:

s3s3mirror -c 1 -m 1000 source destination

And I was seeing stats updates every 10,000 reads.

I then ran:

s3s3mirror -c 1 -m 1000 -t 1000 source destination

And I was seeing stats updates every 100,000 reads.

If -t defaults to the value passed into -m, as the description states, then it seems like it should have been the same results each time, right?

Unable to build the source code due to missing jars and other dependencies

Respected cobbzilla Team,

I downloaded your s3s3mirror project from GitHub for copying objects from one bucket to another. For various reasons (log customization, etc.) we want the whole source code, but the source code you provided on GitHub is not building properly due to missing jars and other dependencies. Please give some solution.

Unexpectedly returns exit status 0 on error

First, thanks for creating this tool, it's very useful!

I'm using s3s3mirror-2.0.2-SNAPSHOT.jar from the 2.0-stable branch.

Java info:

$ java -version
java version "1.6.0_35"
OpenJDK Runtime Environment (IcedTea6 1.13.7) (6b35-1.13.7-1ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)

If I just run the included s3s3mirror.sh script, giving no arguments, the exit code (after displaying the usage info) is 1, which is fair enough.

However, if I try to use the tool with a deliberately wrong set of AWS keys, the exit code (confusingly) is 0.

The same happens if I just use the jar directly:

$ java -jar s3s3mirror-2.0.2-SNAPSHOT.jar --max-connections 500 --max-threads 500 --endpoint s3.amazonaws.com redacted-source redacted-target
main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - version null starting
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #0): com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Stat
us Code: 403; Error Code: InvalidAccessKeyId; Request ID: 2C95D899071B81BF), S3 Extended Request ID: RYh8tAhvU9TYbeKITTtbCV9Ljl0kNyN8W7yhQCbgHH4vYoxYhDE6KuSDCC4wj+dQ1ck5i6dz53A=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #1): com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Stat
us Code: 403; Error Code: InvalidAccessKeyId; Request ID: 2B74075641A9B698), S3 Extended Request ID: 0TLLVEAYuMWESQOPAB5aXINpBRvhGe86YJ1QsLDrIT5MhNa3c2qPGgc6lIkA9WweCfQ2DG7RtNQ=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #2): com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: C171D99635380E2D), S3 Extended Request ID: FE0HteSnYfuiQz7J83lzyx32O0Yl8yJxCy/3c1L0BACVYlfOPSbYHJhcH+QwkISWrUIhMZS1IzQ=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #3): com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: B185D23FF91A09F4), S3 Extended Request ID: PYvIU1mf/P0BcrmyZxqZFKKaZ0ZbybooUhsSDsczT3/RiUwau5KbIBBoqwy4k/WyN37F/xceGy8=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #4): com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 8C81A8830D6D9DAD), S3 Extended Request ID: ef8SJMsejbKYd9DA+v6VBU8tpyo6zU8IkXsjaNWLX52hCUyE5irRAe81nXfNnMME1zdnoXXWxV4=
Thread-2 ERROR: org.cobbzilla.s3s3mirror.KeyMaster - Unexpected exception in MirrorMaster (org.cobbzilla.s3s3mirror.store.s3.master.S3CopyMaster): java.lang.IllegalStateException: getFirstBatch: error listing: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 8C81A8830D6D9DAD), S3 Extended Request ID: ef8SJMsejbKYd9DA+v6VBU8tpyo6zU8IkXsjaNWLX52hCUyE5irRAe81nXfNnMME1zdnoXXWxV4=
java.lang.IllegalStateException: getFirstBatch: error listing: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 8C81A8830D6D9DAD), S3 Extended Request ID: ef8SJMsejbKYd9DA+v6VBU8tpyo6zU8IkXsjaNWLX52hCUyE5irRAe81nXfNnMME1zdnoXXWxV4=
        at org.cobbzilla.s3s3mirror.KeyLister.getFirstBatch(KeyLister.java:107)
        at org.cobbzilla.s3s3mirror.KeyLister.<init>(KeyLister.java:36)
        at org.cobbzilla.s3s3mirror.KeyMaster.run(KeyMaster.java:80)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 8C81A8830D6D9DAD), S3 Extended Request ID: ef8SJMsejbKYd9DA+v6VBU8tpyo6zU8IkXsjaNWLX52hCUyE5irRAe81nXfNnMME1zdnoXXWxV4=
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1020)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:675)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:429)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:291)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3655)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3607)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:623)
        at org.cobbzilla.s3s3mirror.store.s3.S3FileStore.listObjects(S3FileStore.java:32)
        at org.cobbzilla.s3s3mirror.KeyLister.getFirstBatch(KeyLister.java:94)
        ... 3 more
main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - mirror: completed
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - stopping S3CopyMaster...
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - S3CopyMaster stopped
Thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats - 
--------------------------------------------------------------------
STATS BEGIN
read: 0
copied: 0
copy errors: 0
uploaded: 0
upload errors: 0
deleted: 0
delete errors: 0
duration: 0:00:06
read rate: 0.0/minute
copy+upload rate: 0.0/minute
delete rate: 0.0/minute
bytes copied: 0 bytes
bytes copied: 0 bytes
GET operations: 5
COPY operations: 0
PUT operations: 0
DELETE operations: 0
STATS END
--------------------------------------------------------------------
$ echo $?
0

Would you mind making it exit with 1 when errors are encountered, please? (Or at least non-0?)
Otherwise scripting becomes very hard.

AWS Signature Version 4 algorithm authentication problem

s3s3mirror failed to access S3 bucket in Frankfurt.

From AWS site: "In the China (Beijing), EU (Frankfurt) and Asia Pacific (Seoul) regions, Amazon S3 supports only Signature Version 4. In all other regions, Amazon S3 supports both Signature Version 4 and Signature Version 2."

More information about this issue can be found here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html#specify-signature-version

The error message:

main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - version 1.2.5 starting
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - s3getFirstBatch: error listing (try #0): com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: 3072471487FBF916), S3 Extended Request ID: bM0ur9Ibfl7oTCKolt3h/KBe9DDB/mzCa6rZ8Cah+hYCZC+bPBuNiN+Qoaic/vhOlch6vdK6URg=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - s3getFirstBatch: error listing (try #1): com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: 8E8145B795F73407), S3 Extended Request ID: gpyKykynb71nD7KgknPff9W6WR/9tit7xAf2sMecjtsF4wd9kQLpdhsnJz+kL1shqO+X09sLF3Y=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - s3getFirstBatch: error listing (try #2): com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: C5FD86C5E5EDE39C), S3 Extended Request ID: owvOKhepn4l4YWB23xP7fnwgTwkNQOOzX6mE21mvQCJMwsgWZy/uTLtX75KLrRpILLQ44ohsMKY=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - s3getFirstBatch: error listing (try #3): com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: 2AAD2A3775A0D7E6), S3 Extended Request ID: 2NDIp2Z5EWynnH82wqCxtCdQpvK+GGbMFMQ8B0wu9kljvlMGS8OojumsuCQUjnJlOhApcoAksDE=
Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - s3getFirstBatch: error listing (try #4): com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: F9686BB975A72741), S3 Extended Request ID: 80xvNi3MB7XTU7BpjDY7AZ+OY9eJXasWakIS35tnlevhFct6D9m8tLFtLsjmzbY0+GzqsMZMkfQ=
Thread-2 ERROR: org.cobbzilla.s3s3mirror.KeyMaster - Unexpected exception in MirrorMaster: java.lang.IllegalStateException: s3getFirstBatch: error listing: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: F9686BB975A72741), S3 Extended Request ID: 80xvNi3MB7XTU7BpjDY7AZ+OY9eJXasWakIS35tnlevhFct6D9m8tLFtLsjmzbY0+GzqsMZMkfQ=
java.lang.IllegalStateException: s3getFirstBatch: error listing: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: F9686BB975A72741), S3 Extended Request ID: 80xvNi3MB7XTU7BpjDY7AZ+OY9eJXasWakIS35tnlevhFct6D9m8tLFtLsjmzbY0+GzqsMZMkfQ=
        at org.cobbzilla.s3s3mirror.KeyLister.s3getFirstBatch(KeyLister.java:109)
        at org.cobbzilla.s3s3mirror.KeyLister.<init>(KeyLister.java:37)
        at org.cobbzilla.s3s3mirror.KeyMaster.run(KeyMaster.java:80)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: F9686BB975A72741), S3 Extended Request ID: 80xvNi3MB7XTU7BpjDY7AZ+OY9eJXasWakIS35tnlevhFct6D9m8tLFtLsjmzbY0+GzqsMZMkfQ=
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1020)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:675)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:429)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:291)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3655)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3607)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:623)
        at org.cobbzilla.s3s3mirror.KeyLister.s3getFirstBatch(KeyLister.java:96)
        ... 3 more
main INFO : org.cobbzilla.s3s3mirror.MirrorMaster - mirror: completed
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - stopping CopyMaster...
main INFO : org.cobbzilla.s3s3mirror.KeyMaster - CopyMaster stopped

Thread-1 INFO : org.cobbzilla.s3s3mirror.MirrorStats -

STATS BEGIN
read: 0
copied: 0
copy errors: 0
deleted: 0
delete errors: 0
duration: 0:00:01
read rate: 0.0/minute
copy rate: 0.0/minute
delete rate: 0.0/minute
bytes copied: 0 bytes
GET operations: 5
COPY operations: 0
DELETE operations: 0

STATS END

Other 2 suggestions (logs + errors)

Hello,

It's me again with more suggestions :D
I use s3s3mirror in prod and have to make sure all the data is copied.

Amazon throws errors sometimes and rejects the copy. I assume the S3 bucket thinks you are DDoSing it and rejects the connection, which is normal, since the transfer rate is "too much".

However, this is prod, so I have to make sure no files are missing; in my backup script, I run s3cmd after s3s3mirror has run, and I see lots of data missing.

It would be great if s3s3mirror output the failed files to a logfile somewhere (one that we can define, as with the .s3cfg suggestion); it would be easier to sync them later.
It would be good to make the error log optional, since it can kill the disk (IO, eat all the space, etc.) and kill the machine running s3s3mirror.

Also (I'm not sure how easy/hard this one is to do): when we use s3cmd and it fails, there's an error message like "permission denied", "no such bucket", etc.
It would be easier for people who use the tool to have the same kind of output to troubleshoot with, instead of the angry Java error.
Most likely, Amazon returns an error code, and the tool catches it and throws a Java error because it cannot continue.
Is there somehow a way to print the message Amazon throws? It would be such a time saver...

Cheers!

Murg

Support for mirroring to directory

Presently s3s3mirror will only do a direct bucket-to-bucket mirror; appending a path to the bucket name causes s3s3mirror to throw errors such as the following (even if the directory exists).

pool-1-thread-78 INFO : com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: bucket-backuptest
java.net.UnknownHostException: iss-alfresco-backuptest
        at java.net.InetAddress.getAllByName0(InetAddress.java:1156)
        at java.net.InetAddress.getAllByName(InetAddress.java:1082)
        at java.net.InetAddress.getAllByName(InetAddress.java:1018)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:242)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:130)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:562)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:285)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2829)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:766)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:746)
        at org.cobbzilla.s3s3mirror.KeyJob.shouldTransfer(KeyJob.java:92)
        at org.cobbzilla.s3s3mirror.KeyJob.run(KeyJob.java:32)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
pool-1-thread-63 INFO : org.cobbzilla.s3s3mirror.KeyJob - Error getting metadata for bucket-backuptest/20130716//contentstore.deleted/-system-/2013/5/30/3/49/55119191-1c73-49eb-a69b-7a665b531383.bin (not copying): com.amazonaws.AmazonClientException: Unable to execute HTTP request: bucket-backuptest

Supporting sub-directories would allow s3s3mirror to be scripted for rolling daily backups (a script would copy the latest folder, then s3s3mirror would run against it).

Question about long running s3s3mirror task with --ctime option

Hello,

I am going to run an s3s3mirror task with the --ctime option, say --ctime 20d, that will be running for a week or so due to the huge amount of data that should be copied.
The --ctime 20d option means that s3s3mirror should copy only objects whose Last-Modified date is younger than 20 days. I want to copy objects since May 1. My concern is how s3s3mirror calculates this for the current object: if I run the job on May 20, it should copy all objects since May 1, but after one day of execution, say on May 21, will s3s3mirror continue to take objects since May 1? Or does it recalculate the cutoff for every object, and will it then only copy objects since May 2?

Cross-account copy

Hey there, do you have an example resource policy and command to perform cross account copies that you could add to your readme?

Thanks!
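
On the command side, the documented flag is -C (--cross-account-copy); per the Options section above, only resource-based policies are supported, and the destination-bucket owner receives full control of the copies. A minimal invocation:

s3s3mirror.sh -C source-bucket dest-bucket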

Copy within same bucket, different prefix - patch available

Hi,

This looks very close to what I was looking for... I need to copy within a bucket, with just a change of prefix.

i.e. Copy s3://mybucket/July2013 to s3://mybucket/August2013

I've made a patch that does this:
http://david.dw-perspective.org.uk/tmp/changeprefix.patch

Example (I've just realised that the patch does not patch the documentation - sorry):
sh s3s3mirror.sh -p July2013 -a August2013 mybucket mybucket

Sorry for not knowing how to use GitHub! I also have never touched Java before and still have no idea what Maven is. But thankfully your code is well-enough written that I could work out what to patch by reading the source!

David

Problem with setting proxy

Hi,
In the MirrorMain.java file, when setting the proxy, proxy_port will never be set: after proxy_host is set, options.getHasProxy() becomes true, so the proxy_port branch is skipped.
Here is the code:
else if (!options.getHasProxy() && line.trim().startsWith("proxy_host")) {
    options.setProxyHost(line.substring(line.indexOf("=") + 1).trim());
} else if (!options.getHasProxy() && line.trim().startsWith("proxy_port")) {
    options.setProxyPort(Integer.parseInt(line.substring(line.indexOf("=") + 1).trim()));
}

I modified it as follows:
String proxyHost = null;
Integer proxyPort = null;
while ((line = reader.readLine()) != null) {
    if (line.trim().startsWith("access_key")) {
        options.setAWSAccessKeyId(line.substring(line.indexOf("=") + 1).trim());
    } else if (line.trim().startsWith("secret_key")) {
        options.setAWSSecretKey(line.substring(line.indexOf("=") + 1).trim());
    } else if (!options.getHasProxy() && line.trim().startsWith("proxy_host")) {
        proxyHost = line.substring(line.indexOf("=") + 1).trim();
    } else if (!options.getHasProxy() && line.trim().startsWith("proxy_port")) {
        proxyPort = Integer.parseInt(line.substring(line.indexOf("=") + 1).trim());
    }
}
if (proxyHost != null && proxyPort != null) {
    options.setProxyHost(proxyHost);
    options.setProxyPort(proxyPort);
}

Best,
Shasha

Specifying a custom .s3cfg instead of setting global variable

Hello,

The tool is great but I have a suggestion.

The tool searches for keys in ~/.s3cfg or in system variables.
s3cmd has the -c option to specify the .s3cfg file to use.

If the tool is used with multiple accounts via automation, this is not really practical, since multiple keys must be used from the same server.

This makes launching multiple tasks on different accounts impossible, since either the variable or the .s3cfg file has to be overwritten...

Cheers,

Murg

logging is filled with exceptions

Since there are so many of these stack-trace messages, it is very hard to find the actual status messages.

pool-1-thread-40 ERROR: org.cobbzilla.s3s3mirror.KeyCopyJob - unexpected exception copying (try #0) u/b33d6fa7-3564-42d8-9daf-d86e2e9ac6df.png to: u/b33d6fa7-3564-42d8-9daf-d86e2e9ac6df.png: com.amazonaws.AmazonClientException: Unable to execute HTTP request: Connect to media-dev2.trusper.net.s3.amazonaws.com/media-dev2.trusper.net.s3.amazonaws.com/54.240.252.9 timed out
pool-1-thread-15 INFO : com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: Connect to media.trusper.net.s3.amazonaws.com/media.trusper.net.s3.amazonaws.com/54.240.252.9 timed out
org.apache.http.conn.ConnectTimeoutException: Connect to media.trusper.net.s3.amazonaws.com/media.trusper.net.s3.amazonaws.com/54.240.252.9 timed out
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:122)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:562)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:285)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2829)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:766)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:746)
        at org.cobbzilla.s3s3mirror.KeyJob.getObjectMetadata(KeyJob.java:33)
        at org.cobbzilla.s3s3mirror.KeyCopyJob.run(KeyCopyJob.java:37)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
pool-1-thread-199 INFO : com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: Connect to media.trusper.net.s3.amazonaws.com/media.trusper.net.s3.amazonaws.com/54.240.252.9 timed out
org.apache.http.conn.ConnectTimeoutException: Connect to media.trusper.net.s3.amazonaws.com/media.trusper.net.s3.amazonaws.com/54.240.252.9 timed out
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:122)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:562)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:285)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2829)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:766)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:746)
        at org.cobbzilla.s3s3mirror.KeyJob.getObjectMetadata(KeyJob.java:33)
        at org.cobbzilla.s3s3mirror.KeyCopyJob.run(KeyCopyJob.java:37)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Any way to suppress output?

Is there a flag or any other way to tell s3s3mirror to suppress output when running this as a cron job or other scheduled task?
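
There is no quiet flag in the documented options; for a cron job, standard shell redirection is the usual stopgap:

s3s3mirror.sh source dest > /dev/null 2>&1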

Some useful exceptions are swallowed and end up looking like success (kind of)

I was getting weird output like:

--------------------------------------------------------------------
STATS BEGIN
read: 0
copied: 0
copy errors: 0
duration: 0:00:01
read rate: 0.0/minute
copy rate: 0.0/minute
STATS END 
--------------------------------------------------------------------

with no exceptions on buckets I knew were full, so I checked out KeyJob.java and noticed the big try/finally in run(). I threw this in there to see what was going on:

} catch(Exception e) {
  log.info("!!!!! Exception: " + e.getMessage());
  e.printStackTrace();
}

and realized that this line:

final AccessControlList objectAcl = client.getObjectAcl(options.getSourceBucket(), key);

was throwing a 403 exception as the IAM account I had setup did not specifically have access to read an object in the source bucket's ACL (though it did have list on the bucket). Even in the verbose output it would just say that everything was going to be copied and then not copy anything. Might be helpful not to swallow the exceptions on those other calls.

Copies are not retried when an error is thrown

I'm running a mirror from a VPC instance against two buckets for the first time (destination bucket is empty). During the job I'm seeing multiple messages such as the following in the log:

pool-1-thread-52 ERROR: org.cobbzilla.s3s3mirror.KeyJob - error copying contentstore/2013/5/7/23/6/2c075128-7d82-4038-a192-ba06bd400d26.bin: Status Code: 200, AWS Service: Amazon S3, AWS Request ID: 3A377AA1B1BC8018, AWS Error Code: InternalError, AWS Error Message: We encountered an internal error. Please try again., S3 Extended Request ID: tJDXPxxS1dQ9DqC3z4IeDtLqflTYecbwsXD0idaEF/+vb7rlQeeV3NxmoNN3Wq6Z
pool-1-thread-52 INFO : org.cobbzilla.s3s3mirror.KeyJob - done with contentstore/2013/5/7/23/6/2c075128-7d82-4038-a192-ba06bd400d26.bin

The first message indicates an error while copying, the second message suggests the file was eventually copied. However when I look in the bucket that file is not present.

The mirror is still running, I'll see if it's copied over with a second run.

Mirror to other clouds

How difficult would it be to modify this to copy from S3 to Google Cloud Storage or Rackspace Files?

S3ToLocalTest failing on branch 2.0-stable

mvn clean package shows:

Results :

Tests in error:
testCopyFromBucketWithPrefixToDestWithOtherPrefix(org.cobbzilla.s3s3mirror.S3ToLocalTest): The bucket name parameter must be specified when uploading an object

Environment:

java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

mvn -version
Apache Maven 3.0.5
Maven home: /usr/share/maven
Java version: 1.7.0_75, vendor: Oracle Corporation
Java home: /home/don/apps/jdk1.7.0_75/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.16.0-43-generic", arch: "amd64", family: "unix"

Does not work between buckets in different regions

Everything works when copying from eu-west-1 to eu-west-1 or US Standard to US Standard; however, copying from US Standard to eu-west-1, I'm getting a permission error (below). All source/dest buckets have the same permissions:

pool-1-thread-2 ERROR: org.cobbzilla.s3s3mirror.KeyCopyJob - s3 exception copying (try #0) download1.jpeg to: download1.jpeg: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 29036C2DE77F9CE3), S3 Extended Request ID: QTSW9g8qTN4xItdpVbetLCoK+8E6M/riGpIq9arp/BtgpiFezx3WkhcMYxFJSgJbV8BH1gyEeoE=

Can't copy from bucket TO local

Trying to do something like:

s3s3mirror.sh s3://bucket/folder/ /some/local/path/

Doesn't appear to work. Is there any chance of this becoming a feature?
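
(Note: per the README above, the 2.1-stable branch supports copying to/from local directories.)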

Less verbose output

It would be nice if there was a less verbose output option, I'm thinking something along the lines of a simple display showing you status, e.g. a ticker 1 / 50000, 2 / 50000 that simply updates (in place) as the files are synced.

This way running over SSH on a remote EC2 box, I can keep an eye on it without it having to push so much info to me remotely.

Strange AWS error - missing Date / x-amz-date header / related to known aws-sdk bug?

Hi Cobbzilla,
thanks for the excellent work & utility. While I had no trouble whatsoever using your utility on physical HW, I seem to run into issues with a virtual environment based on OpenStack. When I execute on an HDFS name node to get some S3 files into the local FS before pushing to HDFS, I get the following error from AWS:

Thread-2 WARN : org.cobbzilla.s3s3mirror.KeyLister - getFirstBatch: error listing (try #0): com.amazonaws.services.s3.model.AmazonS3Exception: AWS authentication requires a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 1F67DD15FB259BC8), S3 Extended Request ID: Qpbz4ietwNP+4e/7wj0MitUI5Qn6yRKU/fIYGTxeXMONdcLvRSUZRfS7+e8Ys13lQEoQ2574Hmc=

Might this be related to a known aws-sdk-java bug discussed and resolved here - have you ever come across this issue, and do you know a sensible way to prevent it?

Thx
-Tom

Feature request: Exclude files

Are there any plans to add inclusion/exclusion of files, so that we can sync only files that either match or don't match a supplied regex?
