
node-s3-client's Introduction

High Level Amazon S3 Client

Installation

npm install s3 --save

Features

  • Automatically retry a configurable number of times when S3 returns an error.
  • Includes logic to make multiple requests when there is a 1000 object limit.
  • Ability to set a limit on the maximum parallelization of S3 requests. Retries get pushed to the end of the parallelization queue.
  • Ability to sync a dir to and from S3.
  • Progress reporting.
  • Supports files of any size (up to S3's maximum 5 TB object size limit).
  • Uploads large files quickly using parallel multipart uploads.
  • Uses heuristics to compute multipart ETags client-side to avoid uploading or downloading files unnecessarily.
  • Automatically provides Content-Type for uploads based on file extension.
  • Supports third-party S3-compatible platform services like Ceph.

See also the companion CLI tool which is meant to be a drop-in replacement for s3cmd: s3-cli.

Synopsis

Create a client

var s3 = require('s3');

var client = s3.createClient({
  maxAsyncS3: 20,     // this is the default
  s3RetryCount: 3,    // this is the default
  s3RetryDelay: 1000, // this is the default
  multipartUploadThreshold: 20971520, // this is the default (20 MB)
  multipartUploadSize: 15728640, // this is the default (15 MB)
  s3Options: {
    accessKeyId: "your s3 key",
    secretAccessKey: "your s3 secret",
    region: "your region",
    // endpoint: 's3.yourdomain.com',
    // sslEnabled: false
    // any other options are passed to new AWS.S3()
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
  },
});

Create a client from existing AWS.S3 object

var s3 = require('s3');
var awsS3Client = new AWS.S3(s3Options);
var options = {
  s3Client: awsS3Client,
  // more options available. See API docs below.
};
var client = s3.createClient(options);

Upload a file to S3

var params = {
  localFile: "some/local/file",

  s3Params: {
    Bucket: "s3 bucket name",
    Key: "some/remote/file",
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  },
};
var uploader = client.uploadFile(params);
uploader.on('error', function(err) {
  console.error("unable to upload:", err.stack);
});
uploader.on('progress', function() {
  console.log("progress", uploader.progressMd5Amount,
            uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log("done uploading");
});

Download a file from S3

var params = {
  localFile: "some/local/file",

  s3Params: {
    Bucket: "s3 bucket name",
    Key: "some/remote/file",
    // other options supported by getObject
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
  },
};
var downloader = client.downloadFile(params);
downloader.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function() {
  console.log("done downloading");
});

Sync a directory to S3

var params = {
  localDir: "some/local/dir",
  deleteRemoved: true, // default false, whether to remove s3 objects
                       // that have no corresponding local file.

  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  },
};
var uploader = client.uploadDir(params);
uploader.on('error', function(err) {
  console.error("unable to sync:", err.stack);
});
uploader.on('progress', function() {
  console.log("progress", uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log("done uploading");
});

Tips

  • Consider increasing the socket pool size in the http and https global agents. This will improve bandwidth when using uploadDir and downloadDir functions. For example:

    var http = require('http'), https = require('https');
    http.globalAgent.maxSockets = https.globalAgent.maxSockets = 20;

API Documentation

s3.AWS

This contains a reference to the aws-sdk module. It is a valid use case to use both this module and the lower level aws-sdk module in tandem.

s3.createClient(options)

Creates an S3 client.

options:

  • s3Client - optional, an instance of AWS.S3. Leave blank if you provide s3Options.
  • s3Options - optional. Leave blank if you provide s3Client.
  • maxAsyncS3 - maximum number of simultaneous requests this client will ever have open to S3. Defaults to 20.
  • s3RetryCount - how many times to try an S3 operation before giving up. Default 3.
  • s3RetryDelay - how many milliseconds to wait before retrying an S3 operation. Default 1000.
  • multipartUploadThreshold - if a file is this many bytes or greater, it will be uploaded via a multipart request. Default is 20MB. Minimum is 5MB. Maximum is 5GB.
  • multipartUploadSize - when uploading via multipart, this is the part size. The minimum size is 5MB. The maximum size is 5GB. Default is 15MB. Note that S3 has a maximum of 10000 parts for a multipart upload, so if this value is too small, it will be ignored in favor of the minimum necessary value required to upload the file.
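
As an illustration of the last point, here is a minimal sketch (not the library's actual code) of how the effective part size relates to S3's 10,000-part limit:

// Illustrative only: S3 allows at most 10,000 parts per multipart upload,
// so a part size that is too small for the file must be bumped up to the
// smallest value that can still cover the whole file.
var MAX_PARTS = 10000;

function effectivePartSize(fileSize, configuredPartSize) {
  var minRequired = Math.ceil(fileSize / MAX_PARTS);
  return Math.max(configuredPartSize, minRequired);
}

// For a 200 GiB file the minimum part size is about 20.5 MiB,
// so a configured 15 MiB part size would be ignored.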

s3.getPublicUrl(bucket, key, [bucketLocation])

  • bucket S3 bucket
  • key S3 key
  • bucketLocation string, one of these:
    • "" (default) - US Standard
    • "eu-west-1"
    • "us-west-1"
    • "us-west-2"
    • "ap-southeast-1"
    • "ap-southeast-2"
    • "ap-northeast-1"
    • "sa-east-1"

You can find out your bucket location programmatically by using this API: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getBucketLocation-property

Returns a string which looks like this:

https://s3.amazonaws.com/bucket/key

or maybe this if you are not in US Standard:

https://s3-eu-west-1.amazonaws.com/bucket/key

s3.getPublicUrlHttp(bucket, key)

  • bucket S3 Bucket
  • key S3 Key

Works for any region, and returns a string which looks like this:

http://bucket.s3.amazonaws.com/key
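
A quick usage sketch for both helpers (the bucket, key, and location below are placeholders):

var s3 = require('s3');

// HTTPS URL; pass the bucket location string if the bucket is not in US Standard.
var httpsUrl = s3.getPublicUrl("my-bucket", "some/remote/file", "eu-west-1");
// -> https://s3-eu-west-1.amazonaws.com/my-bucket/some/remote/file

// HTTP URL; works for any region.
var httpUrl = s3.getPublicUrlHttp("my-bucket", "some/remote/file");
// -> http://my-bucket.s3.amazonaws.com/some/remote/file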

client.uploadFile(params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property

params:

  • s3Params: params to pass to AWS SDK putObject.
  • localFile: path to the file on disk you want to upload to S3.
  • (optional) defaultContentType: Unless you explicitly set the ContentType parameter in s3Params, it will be automatically set for you based on the file extension of localFile. If the extension is unrecognized, defaultContentType will be used instead. Defaults to application/octet-stream.

The difference between using AWS SDK putObject and this one:

  • This works with files, not streams or buffers.
  • If the reported MD5 upon upload completion does not match, it retries.
  • If the file size is large enough, uses multipart upload to upload parts in parallel.
  • Retry based on the client's retry settings.
  • Progress reporting.
  • Sets the ContentType based on file extension if you do not provide it.

Returns an EventEmitter with these properties:

  • progressMd5Amount
  • progressAmount
  • progressTotal

And these events:

  • 'error' (err)
  • 'end' (data) - emitted when the file is uploaded successfully
    • data is the same object that you get from putObject in AWS SDK
  • 'progress' - emitted when progressMd5Amount, progressAmount, and progressTotal properties change. Note that it is possible for progress to go backwards when an upload fails and must be retried.
  • 'fileOpened' (fdSlicer) - emitted when localFile has been opened. The file is opened with the fd-slicer module because we might need to read from multiple locations in the file at the same time. fdSlicer is an object for which you can call createReadStream(options). See the fd-slicer README for more information.
  • 'fileClosed' - emitted when localFile has been closed.

And these methods:

  • abort() - call this to stop the upload.
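
For example, a sketch of wiring up the file events and aborting an in-flight upload (client and params are the same as in the Synopsis; the timeout is just an illustration):

var uploader = client.uploadFile(params);

uploader.on('fileOpened', function(fdSlicer) {
  // fdSlicer is an fd-slicer object; fdSlicer.createReadStream({start: x, end: y})
  // can read byte ranges from the already-open file descriptor.
  console.log("local file opened");
});

uploader.on('fileClosed', function() {
  console.log("local file closed");
});

// Stop the upload if it has not finished within some deadline.
var timer = setTimeout(function() {
  uploader.abort();
}, 60 * 1000);

uploader.on('end', function() {
  clearTimeout(timer);
  console.log("done uploading");
});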

client.downloadFile(params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property

params:

  • localFile - the destination path on disk to write the s3 object into
  • s3Params: params to pass to AWS SDK getObject.

The difference between using AWS SDK getObject and this one:

  • This works with a destination file, not a stream or a buffer.
  • If the reported MD5 upon download completion does not match, it retries.
  • Retry based on the client's retry settings.
  • Progress reporting.

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal

And these events:

  • 'error' (err)
  • 'end' - emitted when the file is downloaded successfully
  • 'progress' - emitted when progressAmount and progressTotal properties change.

client.downloadBuffer(s3Params)

http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property

  • s3Params: params to pass to AWS SDK getObject.

The difference between using AWS SDK getObject and this one:

  • This works with a buffer only.
  • If the reported MD5 upon download completion does not match, it retries.
  • Retry based on the client's retry settings.
  • Progress reporting.

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal

And these events:

  • 'error' (err)
  • 'end' (buffer) - emitted when the file is downloaded successfully. buffer is a Buffer containing the object data.
  • 'progress' - emitted when progressAmount and progressTotal properties change.
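
A minimal usage sketch (bucket and key are placeholders):

var downloader = client.downloadBuffer({
  Bucket: "s3 bucket name",
  Key: "some/remote/file",
});
downloader.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function(buffer) {
  // buffer is a Buffer containing the whole object.
  console.log("downloaded", buffer.length, "bytes");
});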

client.downloadStream(s3Params)

http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property

  • s3Params: params to pass to AWS SDK getObject.

The difference between using AWS SDK getObject and this one:

  • This works with a stream only.

If you want retries, progress, or MD5 checking, you must code it yourself.

Returns a ReadableStream with these additional events:

  • 'httpHeaders' (statusCode, headers) - contains the HTTP response headers and status code.
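
For example, a sketch that pipes the object straight to a local file (any writable stream would do, such as an HTTP response):

var fs = require('fs');

var stream = client.downloadStream({
  Bucket: "s3 bucket name",
  Key: "some/remote/file",
});
stream.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
stream.pipe(fs.createWriteStream("some/local/file"));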

client.listObjects(params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property

params:

  • s3Params - params to pass to AWS SDK listObjects.
  • (optional) recursive - true or false whether or not you want to recurse into directories. Default false.

Note that if you set Delimiter in s3Params then you will get a list of objects and folders in the directory you specify. You probably do not want to set recursive to true at the same time as specifying a Delimiter because this will cause a request per directory. If you want all objects that share a prefix, leave the Delimiter option null or undefined.

Be sure that s3Params.Prefix ends with a trailing slash (/) unless you are requesting the top-level listing, in which case s3Params.Prefix should be empty string.

The difference between using AWS SDK listObjects and this one:

  • Retries based on the client's retry settings.
  • Supports recursive directory listing.
  • Makes multiple requests if the number of objects to list is greater than 1000.

Returns an EventEmitter with these properties:

  • progressAmount
  • objectsFound
  • dirsFound

And these events:

  • 'error' (err)
  • 'end' - emitted when done listing and no more 'data' events will be emitted.
  • 'data' (data) - emitted when a batch of objects is found. This is the same as the data object in the AWS SDK.
  • 'progress' - emitted when progressAmount, objectsFound, and dirsFound properties change.

And these methods:

  • abort() - call this to stop the find operation.
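
A usage sketch that collects every key under a prefix (bucket and prefix are placeholders):

var lister = client.listObjects({
  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
  },
  recursive: true,
});
var keys = [];
lister.on('error', function(err) {
  console.error("unable to list:", err.stack);
});
lister.on('data', function(data) {
  (data.Contents || []).forEach(function(object) {
    keys.push(object.Key);
  });
});
lister.on('end', function() {
  console.log("found", keys.length, "objects");
});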

client.deleteObjects(s3Params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#deleteObjects-property

s3Params are the same.

The difference between using AWS SDK deleteObjects and this one:

  • Retry based on the client's retry settings.
  • Make multiple requests if the number of objects you want to delete is greater than 1000.

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal

And these events:

  • 'error' (err)
  • 'end' - emitted when all objects are deleted.
  • 'progress' - emitted when the progressAmount or progressTotal properties change.
  • 'data' (data) - emitted when a request completes. There may be more than one request.
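
A usage sketch (the keys are placeholders; the Delete/Objects shape is the same as in the AWS SDK):

var deleter = client.deleteObjects({
  Bucket: "s3 bucket name",
  Delete: {
    Objects: [
      {Key: "some/remote/file"},
      {Key: "some/other/remote/file"},
    ],
  },
});
deleter.on('error', function(err) {
  console.error("unable to delete:", err.stack);
});
deleter.on('end', function() {
  console.log("done deleting");
});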

client.uploadDir(params)

Syncs an entire directory to S3.

params:

  • localDir - source path on local file system to sync to S3
  • s3Params
    • Prefix (required)
    • Bucket (required)
  • (optional) deleteRemoved - delete s3 objects with no corresponding local file. default false
  • (optional) getS3Params - function which will be called for every file that needs to be uploaded. You can use this to skip some files. See below.
  • (optional) defaultContentType: Unless you explicitly set the ContentType parameter in s3Params, it will be automatically set for you based on the file extension of localFile. If the extension is unrecognized, defaultContentType will be used instead. Defaults to application/octet-stream.
  • (optional) followSymlinks - Set this to false to ignore symlinks. Defaults to true.

function getS3Params(localFile, stat, callback) {
  // call callback like this:
  var err = new Error(...); // only if there is an error
  var s3Params = { // if there is no error
    ContentType: getMimeType(localFile), // just an example
  };
  // pass `null` for `s3Params` if you want to skip uploading this file.
  callback(err, s3Params);
}

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal
  • progressMd5Amount
  • progressMd5Total
  • deleteAmount
  • deleteTotal
  • filesFound
  • objectsFound
  • doneFindingFiles
  • doneFindingObjects
  • doneMd5

And these events:

  • 'error' (err)
  • 'end' - emitted when all files are uploaded
  • 'progress' - emitted when any of the above progress properties change.
  • 'fileUploadStart' (localFilePath, s3Key) - emitted when a file begins uploading.
  • 'fileUploadEnd' (localFilePath, s3Key) - emitted when a file successfully finishes uploading.
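
For example, to log each file as it is synced (a sketch; uploader is the emitter returned by uploadDir, as in the Synopsis):

uploader.on('fileUploadStart', function(localFilePath, s3Key) {
  console.log("started uploading", localFilePath, "to", s3Key);
});
uploader.on('fileUploadEnd', function(localFilePath, s3Key) {
  console.log("finished uploading", localFilePath, "to", s3Key);
});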

uploadDir works like this:

  1. Start listing all S3 objects for the target Prefix. S3 guarantees returned objects to be in sorted order.
  2. Meanwhile, recursively find all files in localDir.
  3. Once all local files are found, we sort them (the same way that S3 sorts).
  4. Next we iterate over the sorted local file list one at a time, computing MD5 sums.
  5. Now S3 object listing and MD5 sum computing are happening in parallel. As each operation progresses we compare both sorted lists side-by-side, iterating over them one at a time, uploading files whose MD5 sums don't match the remote object (or the remote object is missing), and, if deleteRemoved is set, deleting remote objects whose corresponding local files are missing.

client.downloadDir(params)

Syncs an entire directory from S3.

params:

  • localDir - destination directory on local file system to sync to
  • s3Params
    • Prefix (required)
    • Bucket (required)
  • (optional) deleteRemoved - delete local files with no corresponding s3 object. default false
  • (optional) getS3Params - function which will be called for every object that needs to be downloaded. You can use this to skip downloading some objects. See below.
  • (optional) followSymlinks - Set this to false to ignore symlinks. Defaults to true.

function getS3Params(localFile, s3Object, callback) {
  // localFile is the destination path where the object will be written to
  // s3Object is same as one element in the `Contents` array from here:
  // http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property

  // call callback like this:
  var err = new Error(...); // only if there is an error
  var s3Params = { // if there is no error
    VersionId: "abcd", // just an example
  };
  // pass `null` for `s3Params` if you want to skip downloading this object.
  callback(err, s3Params);
}

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal
  • progressMd5Amount
  • progressMd5Total
  • deleteAmount
  • deleteTotal
  • filesFound
  • objectsFound
  • doneFindingFiles
  • doneFindingObjects
  • doneMd5

And these events:

  • 'error' (err)
  • 'end' - emitted when all files are downloaded
  • 'progress' - emitted when any of the progress properties above change
  • 'fileDownloadStart' (localFilePath, s3Key) - emitted when a file begins downloading.
  • 'fileDownloadEnd' (localFilePath, s3Key) - emitted when a file successfully finishes downloading.
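
Since the Synopsis only shows the upload direction, here is an equivalent download sketch (bucket and paths are placeholders):

var params = {
  localDir: "some/local/dir",
  deleteRemoved: true, // default false, whether to remove local files
                       // that have no corresponding s3 object.

  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
  },
};
var downloader = client.downloadDir(params);
downloader.on('error', function(err) {
  console.error("unable to sync:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function() {
  console.log("done downloading");
});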

downloadDir works like this:

  1. Start listing all S3 objects for the target Prefix. S3 guarantees returned objects to be in sorted order.
  2. Meanwhile, recursively find all files in localDir.
  3. Once all local files are found, we sort them (the same way that S3 sorts).
  4. Next we iterate over the sorted local file list one at a time, computing MD5 sums.
  5. Now S3 object listing and MD5 sum computing are happening in parallel. As each operation progresses we compare both sorted lists side-by-side, iterating over them one at a time, downloading objects whose MD5 sums don't match the local file (or the local file is missing), and, if deleteRemoved is set, deleting local files whose corresponding objects are missing.

client.deleteDir(s3Params)

Deletes an entire directory on S3.

s3Params:

  • Bucket
  • Prefix
  • (optional) MFA

Returns an EventEmitter with these properties:

  • progressAmount
  • progressTotal

And these events:

  • 'error' (err)
  • 'end' - emitted when all objects are deleted.
  • 'progress' - emitted when the progressAmount or progressTotal properties change.

deleteDir works like this:

  1. Start listing all objects in a bucket recursively. S3 returns 1000 objects per response.
  2. For each response that comes back with a list of objects in the bucket, immediately send a delete request for all of them.
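
A usage sketch (bucket and prefix are placeholders):

var deleter = client.deleteDir({
  Bucket: "s3 bucket name",
  Prefix: "some/remote/dir/",
});
deleter.on('error', function(err) {
  console.error("unable to delete:", err.stack);
});
deleter.on('progress', function() {
  console.log("progress", deleter.progressAmount, deleter.progressTotal);
});
deleter.on('end', function() {
  console.log("done deleting");
});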

client.copyObject(s3Params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property

s3Params are the same. Don't forget that CopySource must contain the source bucket name as well as the source key name.

The difference between using AWS SDK copyObject and this one:

  • Retry based on the client's retry settings.

Returns an EventEmitter with these events:

  • 'error' (err)
  • 'end' (data)
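
A sketch showing the CopySource format, which joins the source bucket and source key with a slash (all names are placeholders):

var copier = client.copyObject({
  Bucket: "destination-bucket",
  Key: "destination/key",
  CopySource: "source-bucket/source/key",
});
copier.on('error', function(err) {
  console.error("unable to copy:", err.stack);
});
copier.on('end', function(data) {
  console.log("done copying");
});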

client.moveObject(s3Params)

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property

s3Params are the same. Don't forget that CopySource must contain the source bucket name as well as the source key name.

Under the hood, this uses copyObject and then deleteObjects only if the copy succeeded.

Returns an EventEmitter with these events:

  • 'error' (err)
  • 'copySuccess' (data)
  • 'end' (data)
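
Usage mirrors copyObject; the extra 'copySuccess' event fires after the copy succeeds but before the source is deleted (a sketch with placeholder names):

var mover = client.moveObject({
  Bucket: "destination-bucket",
  Key: "destination/key",
  CopySource: "source-bucket/source/key",
});
mover.on('error', function(err) {
  console.error("unable to move:", err.stack);
});
mover.on('copySuccess', function(data) {
  console.log("copy finished; deleting the source object");
});
mover.on('end', function(data) {
  console.log("done moving");
});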

Examples

Check if a file exists in S3

Using the AWS SDK, you can send a HEAD request, which will tell you if a file exists at Key.

See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#headObject-property

var client = require('s3').createClient({ /* options */ });
client.s3.headObject({
  Bucket: 's3 bucket name',
  Key: 'some/remote/file'
}, function(err, data) {
  if (err) {
    // file does not exist (err.statusCode == 404)
    return;
  }
  // file exists
});

Testing

S3_KEY=<valid_s3_key> S3_SECRET=<valid_s3_secret> S3_BUCKET=<valid_s3_bucket> npm test

Tests upload and download large amounts of data to and from S3. The test timeout is set to 40 seconds because Internet connectivity varies wildly.

node-s3-client's Issues

TypeError: Cannot read property 'flowing' of undefined

_stream_readable.js:655
if (ev === 'data' && !this._readableState.flowing)
^
TypeError: Cannot read property 'flowing' of undefined
at Hash.Readable.on (_stream_readable.js:655:44)
at computeOne (/myproject/node_modules/s3/index.js:886:12)
at startComputingMd5Sums (/myproject/node_modules/s3/index.js:866:5)
at /myproject/node_modules/s3/index.js:860:7
at EventEmitter. (/myproject/node_modules/s3/index.js:939:7)
at EventEmitter.emit (events.js:92:17)
at finish (/myproject/node_modules/s3/node_modules/findit/index.js:107:17)
at check (/myproject/node_modules/s3/node_modules/findit/index.js:103:37)
at onstat (/myproject/node_modules/s3/node_modules/findit/index.js:184:13)
at /myproject/node_modules/s3/node_modules/findit/index.js:133:22

New Authentication Signature?

Hey there,

I'm really liking this extension so far. Great job! 👍

However, I'm currently trying to use your module with the new S3 Location, Frankfurt which apparently only supports signatureVersion AWS4-HMAC-SHA256. Is this extension able to use this signature version?

I can't really figure it out :(

Cheers!

Ability to get the filenames being uploaded

This is a great module. Would it be possible to publish the names of the files which were synced with the bucket using uploadDir?

I notice that the uploadFile 'end' event is passed a data argument which is the putObject output. Could we pass an array of putObject objects to the 'end' event of uploadDir?

RequestTimeTooSkewed error

@andrewrk,

I have been using this lib extensively in my project and it works well. However, there is something I can reproduce every time. Here is the case:

  1. Let's say I have a file: test.tar.gz, which is about 190 MB
  2. I need to upload this file to 3 AWS buckets which have different regions. For example:
    bucket_an1/test.tar.gz
    bucket_ew1/test.tar.gz
    bucket_ue1/test.tar.gz

So I use async.parallel to call this S3 lib to upload for me. However, when I have a slow connection, I always encounter this error:

Error: {"message":"The difference between the request time and the current time is too large.","code":"RequestTimeTooSkewed","time":"2014-11-03T06:12:08.476Z","statusCode":403,"retryable":false}RequestTimeTooSkewed: The difference between the request time and the current time is too large.

I could never reproduce this error when I'm at work, which has a much faster connection. I'm wondering if you have any thoughts on what the problem might be. I googled and people said the clock is the issue, but I did confirm my machine's clock is up to date.

For your info, this is my wrapper that use this S3 lib:

'use strict';

var async = require('async');
var dotty = require('dotty');
var fs = require('fs');
var path = require('path');
var s3 = require('s3');
var _ = require('underscore');
var _s = require('underscore.string');

/**********************************************************************/
// PRIVATES - CONSTANTS
/**********************************************************************/

var MAX_RETRY_COUNT = 5;
var MAX_SIMULTANEOUS_REQUESTS = 20;
var PART_UPLOAD_SIZE_IN_MB = 15;
var PART_UPLOAD_THRESHOLD_SIZE_IN_MB = 20;
var RETRY_DELAY_IN_SECOND = 5;

/**********************************************************************/
// PRIVATES - UTILITIES
/**********************************************************************/

function uploadFilePerRegion(options, parameters, sourceFilePath, destinationFileName, progressCallback) {
  return function (callback) {
    var client = s3.createClient({
      maxAsyncS3: MAX_SIMULTANEOUS_REQUESTS,
      multipartUploadSize: PART_UPLOAD_SIZE_IN_MB * 1024 * 1024,
      multipartUploadThreshold: PART_UPLOAD_THRESHOLD_SIZE_IN_MB * 1024 * 1024,
      s3Options: options,
      s3RetryCount: MAX_RETRY_COUNT,
      s3RetryDelay: RETRY_DELAY_IN_SECOND * 1000
    });

    var source = {
      localFile: sourceFilePath,
      s3Params: parameters
    };

    var key = dotty.get(parameters, 'Key');
    key = (key && !_s.isBlank(key)) ? key + '/' : '';
    key += (destinationFileName && !_s.isBlank(destinationFileName)) ? destinationFileName : path.basename(sourceFilePath);

    dotty.put(source, 's3Params.Key', key);

    var uploader = client.uploadFile(source);

    uploader.on('error', callback);

    if (_.isFunction(progressCallback)) {
      uploader.on('progress', function () {
        progressCallback({
          amount: uploader.progressAmount,
          total: uploader.progressTotal
        });
      });
    }

    uploader.on('end', function (data) {
      callback(null, data);
    });
  };
}

/**********************************************************************/
// EXPORTS
/**********************************************************************/

module.exports.uploadFile = function (credentials, parameters, sourceFilePath, destinationFileName, callback, progressCallback) {
  fs.lstat(sourceFilePath, function (err, stats) {
    if (!err && stats.isFile()) {
      var tasks = [];

      _.each(_.keys(credentials), function (region) {
        tasks.push(uploadFilePerRegion(_.extend(_.clone(credentials[region]), {
          region: region
        }), parameters[region], sourceFilePath, destinationFileName, progressCallback));
      });

      async.parallel(tasks, callback);
    } else {
      callback(new Error('Undefined Upload File'));
    }
  });
};

support working with files uploaded via multipart

Is the explicit lack of support for handling files uploaded via multipart just because of the wonkiness of S3's ETags for such files? If so, couldn't you accept an option to ignore the ETags? I am just ignoring the error event on client#downloadFile and everything works fine. I just trigger my end handler on error. It works, but I miss out on gracefully handling any other sort of error.

Support Buffer Input

Are there any plans to support a buffer as data input rather than a file path? I'm utilizing node-s3-client in an API middleware, where I'm passing data from a client directly to S3 and writing to disk for the upload is unnecessary and expensive. Thanks.

Bug when uploading without 'Prefix'

When uploading with an empty (=== "") "Prefix" parameter, there is an extra slash at the start of the remote path, which results in a file in the remote directory with no name.

Maybe the code can be changed this way:

function ensureSep(dir) {
  if (dir === '') return dir; // <----- New line
  return (dir[dir.length - 1] === '/') ? dir : (dir + '/');
}

ability to handle symlinks in uploadDir

Currently symlinks are ignored. Should be an option to follow symlinks.

Tests needed:

  • make sure the file path comes out right
  • if it's a directory symlink make sure the dir makes it into the dir list
  • test a dir with circular symlinks

`deleteRemoved` option with `downloadDir` causes some local directories to disappear entirely

I'm trying to download the content of an entire directory from an S3 bucket. The directory contains two subdirectories, which themselves contain a bunch of files. It essentially looks something like this:

+ some-bucket-name
  |_ test/
      |_ foo/
      |_ bar/

The first time that I call downloadDir() with the deleteRemoved flag set to true, both subdirectories (foo and bar in this example) download to a local directory as expected:

target-dir/
|_ foo/
|_ bar/

However, on each subsequent downloadDir() call, something weird happens: the local target-dir ends up with only one directory left. Sometimes, it's foo, and sometimes, it's bar.

The code for the download is pretty straightforward:

var downloader = client.downloadDir({
  localDir: "target-dir",
  s3Params: {
    Bucket: "some-bucket-name",
    Prefix: "test/"
  },
  deleteRemoved: true
});

I haven't been able to dig into this project's code yet, but is there something that might immediately come to mind?

Stream file directly to client

Is there a way to stream a file directly to the client without saving it on the server? I have also posted a question about this here: http://stackoverflow.com/questions/26312294/streaming-files-directly-to-client-from-amazon-s3-node-js

I have pasted the question below:


I am using sails.js and am trying to stream files from the Amazon s3 server directly to the client.

To connect to S3, I use the s3 Module : https://www.npmjs.org/package/s3 This module provides capabilities like client.downloadFile(params) and client.downloadBuffer(s3Params).

My current code looks like the following:

var view = client.downloadBuffer(params);
view.on('error', function(err) {
    cb({success: 0, message: 'Could not open file.'}, null);
});
view.on('end', function(buffer) {
    cb(null, buffer);
});

I catch this buffer in a controller using:

User.showImage( params , function (err, buffer){
    // this is where I can get the buffer
});

Is it possible to stream this data as an image file? (Using buffer.pipe(res) doesn't work, of course.) Is there something similar that completely avoids saving the file to the server disk first?

The other option, client.downloadFile(params), requires a local path (i.e. a server path in our case).

downloadDir: download to temp filename and rename into place

When downloading S3 files to local files, it's a good idea to download to a temp filename (e.g. prefix with a dot, add randomly generated chars to the end), then rename(2) into place once the download is complete. This is the default rsync behavior unless you turn it off with rsync --inplace. Seeing partial files in their final resting place is rarely a good thing.

Add an inplace option which would switch the behavior back to writing in place.

File descriptors not getting closed for non-multipart uploads

File descriptors don't seem to get closed, and eventually on large upload jobs (1500 or so files) the OS runs out of file descriptors causing the following error:

EMFILE, open '<FILEPATH>'

This seems to only happen for files smaller than MIN_MULTIPART_SIZE and is probably the root cause behind #50.

Will be submitting a pull request shortly which resolves the issue

uploadDir uploader emits `end`-event too early

I just ran into this issue. When uploading a single image file or a file with no content, the uploadDir uploader emits the end event before all uploads are finished.

I wrapped the uploadDir method into a promise and the promise now resolves before all uploads are done. Any ideas why this is happening?

uploadStream

Here's how this could work:

  1. Start by streaming to a temp file.
  2. Once partSize (default 5MB) is reached, if the stream is complete, do a putObject. Otherwise start a multipart upload.
  3. Use fdSlicer.createReadStream({start: x, end: y}) on the still-open temp file for the upload. This gives us low memory usage.
  4. If the upload completes successfully to S3, then we remove the temp file and celebrate.
  5. If S3 fails, then we retry the upload using the temp file without ever having closed it. The retry will know the Content-Length. If all the retries fail or we succeed then we remove the temp file.

Optimization: If Content-Length is provided, we can start streaming to S3 immediately, without waiting to hit partSize.

s3 keys

Hi, am I missing something with this setup?

var client = s3.createClient({
  maxAsyncS3: Infinity,
  s3RetryCount: 3,
  s3RetryDelay: 1000,
  s3Options: {
    accessKeyId: credentials.accessKeyId,
    secretAccessKey: credentials.secretAccessKey,
  },
});

I keep getting

Error: aws "key" required
    at new Client (/home/gnimmelf/workspace/nodejs/fireshop#4/node_modules/s3/node_modules/knox/lib/client.js:197:27)
    at Function.exports.createClient (/home/gnimmelf/workspace/nodejs/fireshop#4/node_modules/s3/node_modules/knox/lib/client.js:927:10)
    at Object.exports.createClient (/home/gnimmelf/workspace/nodejs/fireshop#4/node_modules/s3/index.js:7:22)

The keys are set correctly on my side in this part:

s3Options: {
    accessKeyId: credentials.accessKeyId,
    secretAccessKey: credentials.secretAccessKey,
  }

uploadFile should compute MD5 and send bytes at the same time

Currently this is how uploadFile works:

  1. stat and md5 sum the local file
  2. once both are complete, use the stat.size to populate progressTotal and emit an error if the file is too big
  3. open the local file again and send the stream to s3, using stat.size for content length
  4. if there is an error, or if the etag does not match the md5 sum, try again

Here is how it should work instead:

  1. stat the local file, use the stat.size to populate progressTotal and emit an error if the file is too big
  2. open the file and pipe to both md5sum and sending the stream to s3, using stat.size for content length
  3. if there is an error, or if the etag does not match the md5 sum, try again

This does 2 things:

  • avoids waiting until the md5 is done being calculated before uploading
  • fixes a failure condition when a file is changed between the time that the md5 sum was calculated and the time that it is streamed to S3.

fromAwsSdkS3 ?

In the readme it mentions a factory method called fromAwsSdkS3, which doesn't seem to exist any longer.

Node version

Noob question: am I right in thinking this module requires node 0.10.20 (because of the stream-counter dependency)?

uploadDir delete files on S3

Using uploadDir with deleteRemoved: true, after syncing a local file that matches an existing S3 file but has different content, the file on S3 was removed. I found out it's because localFileCursor and s3ObjectCursor in the checkDoMoreWork function have different values.

Adding an increment of s3ObjectCursor in the uploadLocalFile function solved my problem, but I would like to ask whether it breaks any other part of the code.

Thanks

Success when deleting a non-existent object

When calling client.deleteObjects with object keys that do not exist, execution proceeds normally, as if those objects existed and were actually deleted. Is this by design? I would expect an error event to be emitted.

I understand that the final effect is the same: objects with those keys are not present in the bucket. But, if it's possible, some kind of error, or at least a console warning, would be quite helpful.

Could you at least point this out in the documentation? I was not able to find a notice about removing non-existent objects in the AWS JavaScript SDK documentation, so I thought that yours could be the right place for it, but alas, no luck.

nextTick() ?

I'm sorry for my noob question, but I do wonder whether it's guaranteed that I can catch all the events emitted by the uploader object returned by uploadFile().

The example code attaches the event handlers after the upload has started. Is this where a nextTick() would be needed to guarantee that my event handlers get connected before events are emitted? Or is this implicit because of the call to fs.open inside openFile()?

resp.req could be undefined

resp.req seems to be undefined in some cases.
In my case, I had added the "region" option before hitting the error.
I'm not sure whether that causes it, however.

/path/to/node_modules/s3/index.js:29
      uploader.emit('end', resp.req.url);
                                   ^
TypeError: Cannot read property 'url' of undefined
    at /path/to/node_modules/s3/index.js:29:36
    at ClientRequest.onResponse (/path/to/node_modules/s3/node_modules/knox/lib/client.js:60:7)
    at ClientRequest.EventEmitter.emit (events.js:96:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1582:7)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:111:23)
    at CleartextStream.socketOnData [as ondata] (http.js:1485:20)
    at CleartextStream.CryptoStream._push (tls.js:544:27)
    at SecurePair.cycle (tls.js:898:20)
    at EncryptedStream.CryptoStream.write (tls.js:285:13)
    at Socket.ondata (stream.js:38:26)

Download to buffer

It would be great if we could download files directly to buffers instead of first hitting the file system. I seldom need to store the files in the local file system since they are already persisted in S3.

knox isn't the backend

Just started reading through the code and noticed that knox isn't the backend, even though the description on GitHub says it is... that should be updated.

Stop download/Upload externally

Is there any function by which I can stop downloading/uploading a file externally/abruptly, i.e. the ability to abort the operation?

error: callback called twice

Was uploading a rather large directory and got this:

/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/request.js:34
        throw e;
              ^
Error: callback called twice
    at onCb (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/pend/index.js:32:23)
    at Response.<anonymous> (/home/andy/dev/node-s3-cli/node_modules/s3/index.js:196:9)
    at Request.<anonymous> (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/request.js:347:20)
    at Request.callListeners (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:114:20)
    at Request.emit (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:81:10)
    at Request.emit (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/request.js:578:14)
    at Request.transition (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/request.js:12:12)
    at AcceptorStateMachine.runTo (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:28:10
    at Request.<anonymous> (/home/andy/dev/node-s3-cli/node_modules/s3/node_modules/aws-sdk/lib/request.js:28:9)

Looks like AWS SDK is calling a callback twice that we don't expect :-(

If I can verify that fact, I suppose I will write logic to work around that.

ability to abort operations

  • uploadFile
  • uploadDir
  • deleteObjects
  • listObjects
  • downloadBuffer
  • downloadStream
  • downloadFile
  • downloadDir

Probably going to need a proper queue rather than using pend. Or maybe pend can be extended to support clear() and failFast.

Provide url to uploaded file

Hi @andrewrk

Can you please bring back your previous very useful API feature: the url in the end event? I guess it could look something like this:

var uploader = client.uploadFile(params);
uploader.on('end', function(data) {
  console.log("done uploading. File available at " + data.url);
});

ETXTBSY error when trying to unlink uploaded file

After uploading a file, I use the uploader's "end" event to unlink the local file. I get an ETXTBSY error on the unlink, no matter how long I wait. It would appear that the file is being held open by the uploader. I'm probably doing something wrong, so I apologize for my lack of understanding in advance.

Provide url to uploaded file - region issue

Hi @andrewrk,

there seems to be an issue with getPublicUrl() for buckets located in Europe.
I've read #14 but the URL I get back gives me an Error: PermanentRedirect.
The endpoint should be: BUCKETNAME.s3.amazonaws.com/KEY

I think something like this should work with all bucket-locations.

var parts = {
  protocol: insecure ? 'http:' : 'https:',
  hostname: params.s3Params.Bucket + '.s3.amazonaws.com',
  pathname: '/' + s3.encodeSpecialCharacters(ensureLeadingSlash(params.s3Params.Key))
};

Directory separator in Windows

When executing uploadDir() on Windows, the path separator is '\', so the resulting S3 key also contains '\' instead of '/'. I guess it has to be changed to '/' when the files are actually uploaded.

Error: Hostname/IP doesn't match certificate's altnames

Hi. I'm new on integrating s3 on node. I have followed the tutorials and wrote a code for uploading files to s3. So I'm getting this error:

error: Server Error (500)
error: Error: Hostname/IP doesn't match certificate's altnames
    at SecurePair.<anonymous> (tls.js:1366:23)
    at SecurePair.EventEmitter.emit (events.js:92:17)
    at SecurePair.maybeInitFinished (tls.js:970:10)
    at CleartextStream.read [as _read] (tls.js:463:15)
    at CleartextStream.Readable.read (_stream_readable.js:320:10)
    at EncryptedStream.write [as _write] (tls.js:366:25)
    at doWrite (_stream_writable.js:219:10)
    at writeOrBuffer (_stream_writable.js:209:5)
    at EncryptedStream.Writable.write (_stream_writable.js:180:11)
    at write (_stream_readable.js:583:24)
    at flow (_stream_readable.js:592:7)
    at Socket.pipeOnReadable (_stream_readable.js:624:5) { [NetworkingError: Hostname/IP doesn't match certificate's altnames]
  message: 'Hostname/IP doesn\'t match certificate\'s altnames',
  code: 'NetworkingError',
  region: 'us-west-2',
  hostname: 's3.us-west-2.amazonaws.com',
  retryable: true,
  time: Fri Nov 21 2014 15:38:32 GMT+0400 (AMT) }

I'm in the 'us-west-2' region and my bucket name looks like this: photos.mysite.com. Any suggestions?

Invalid XML on Delete

Debugging shows that there's a Delete being requested against the root of my bucket with the following body -

<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Quiet>true</Quiet></Delete>

Path is "/my-bucket-name/?delete". This makes it seem like we're trying to do a multi-object delete ( http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html ), but the files to delete are not being passed in to the call, so it's generating bogus XML. I'm going to try removing the "deleteRemoved" parameter and see if this goes away.

Here's the stack trace.

unable to upload: MalformedXML: The XML you provided was not well-formed or did not validate against our published schema
  at Request.extractError (.../node_modules/s3/node_modules/aws-sdk/lib/services/s3.js:281:35)
  at Request.callListeners (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:114:20)
  at Request.callListeners (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:115:16)
  at Request.emit (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:81:10)
  at Request.emit (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:593:14)
  at Request.transition (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:24:12)
  at AcceptorStateMachine.runTo (.../node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:14:12)
  at .../node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:26:10
  at Request.<anonymous> (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:31:9)
  at Request.<anonymous> (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:595:12)
  at Request.callListeners (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:90:20)
  at Request.callListeners (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:115:16)
  at Request.emit (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:81:10)
  at Request.emit (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:593:14)
  at Request.transition (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:24:12)
  at AcceptorStateMachine.runTo (.../node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:14:12)
  at .../node_modules/s3/node_modules/aws-sdk/lib/state_machine.js:26:10
  at Request.<anonymous> (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:31:9)
  at Request.<anonymous> (.../node_modules/s3/node_modules/aws-sdk/lib/request.js:595:12)
  at Request.callListeners (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:90:20)
  at callNextListener (.../node_modules/s3/node_modules/aws-sdk/lib/sequential_executor.js:105:18)
  at IncomingMessage.onEnd (.../node_modules/s3/node_modules/aws-sdk/lib/event_listeners.js:171:11)
  at IncomingMessage.EventEmitter.emit (events.js:117:20)
  at _stream_readable.js:919:16
  at process._tickDomainCallback (node.js:463:13)

fileClosed event is never called on uploadFile

I'm uploading an image to my server, creating 3 thumbnails, and uploading those to my S3 bucket. I want to delete the image and the thumbnails from my server, but it looks like something is keeping the files open so I can't delete them.

I added the fileOpened event to check if it was called and it is not.
In my console i see:
fileOpened
END

Am I missing something, or is it an issue?

...

var uploader = s3client.uploadFile(params);

uploader.on('error', function(err) {
    callback(err.stack);
});

uploader.on('end', function() {
         console.log("END");
});

uploader.on('fileClosed', function() {
    console.log("fileClosed");
    fs.unlink(path, function(){
        callback(null,  fileName);
    });
});

uploader.on('fileOpened', function() {
    console.log("fileOpened");
});

Test Retry MultiPart (Question)

I'm looking for a way to test or simulate retrying an upload with this S3 lib, and I'm wondering if you have a way to simulate it?

Thanks

issue with integrating with node server

There seems to be an issue with using this with the http module. I tried the example given in the README and it worked; the whole file was uploaded to the S3 bucket. But when I tried to create a server that accepts a file name as its parameter and then uploads that file to S3, it doesn't work. It gets stuck somewhere (see screenshot below) and the whole file doesn't get uploaded to S3.

[screenshot: stuck progress]

Here's the code that I currently have:

var s3 = require('s3');

var client = s3.createClient({
  maxAsyncS3: 14,
  s3RetryCount: 3,
  s3RetryDelay: 1000,
  s3Options: {
    accessKeyId: "xyz",
    secretAccessKey: "abc",
    region: "us-west-2"
  }
});


var http = require('http');
var qs = require('querystring');


var server = http.createServer(function (request, response) {

  var headers = {};
  headers["Access-Control-Allow-Origin"] = "*";
  headers["Access-Control-Allow-Methods"] = "POST, GET, PUT, DELETE, OPTIONS";
  headers["Access-Control-Allow-Credentials"] = false;
  headers["Access-Control-Max-Age"] = '86400'; // 24 hours
  headers["Access-Control-Allow-Headers"] = "X-Requested-With, X-HTTP-Method-Override, Content-Type, Accept";
  headers["Content-Type"] = "text/plain";
  response.writeHead(200, headers);

  var body = '';
  request.on('data', function (data) {
      body += data;
  });
  request.on('end', function () {
    var data = qs.parse(body);
    console.log('now uploading ' + data.filename);
    var params = {
      localFile: "uploads/my-uploads/" + data.filename,

      s3Params: {
        Bucket: "my-uploads",
        Key: "uploads/my-uploads/" + data.filename,
      },
    };

    var uploader = client.uploadFile(params);
    uploader.on('error', function(err) {
      console.error("unable to upload:", err.stack);
      response.end('aww..');
    });

    uploader.on('progress', function() {
      console.log("progress", uploader.progressMd5Amount,
                uploader.progressAmount, uploader.progressTotal);
    });

    uploader.on('end', function() {
      console.log("done uploading");
      response.end('awesome');
    });

  });

});


server.listen(8124);

Any ideas?

receiving data with busboy

I've been stuck with a problem for far too long now... maybe I am missing something, but my s3-client is not able to receive the data from an async POST.

I know it's not really an issue with the client, but I would really need a hint on how to get the full file path to set it as localFile.
I am using a MEAN stack and I receive some file data via connect-busboy, like filename, encoding, and mimeType, but where would I get the full filePath from? (I would actually run into the same problem by just sending a file via POST from a form action.)
Maybe this is a noob question and I am just missing something; I just can't figure it out.

process silently exits when uploading directory

With this code:

s3 = require('s3');
uuid = require('uuid');
require('graceful-fs');

var client = s3.createClient({
  maxAsyncS3: Infinity,
  s3RetryCount: 3,
  s3RetryDelay: 1000,
  s3Options: {
    accessKeyId: "<key>",
    secretAccessKey: "<secret>",
    // any other options are passed to new AWS.S3()
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
  },
});

var params = {
  localDir: "data",

  s3Params: {
    Bucket: "<bucket-name>",
    Prefix: uuid.v1() + "/"
  },
};
var uploader = client.uploadDir(params);
uploader.on('error', function(err) {
  console.log("unable to sync:", err.stack);
});
uploader.on('progress', function() {
  console.log("progress", uploader.progressAmount, uploader.progressTotal);
  console.log(uploader);
});
uploader.on('end', function() {
  console.log("done uploading");
});

I get a bunch of progress events,

in the progress handler I've got this: console.log(uploader);

The last output I see is:

{ domain: null,
  _events: { error: [Function], progress: [Function], end: [Function] },
  _maxListeners: 10,
  progressTotal: 0,
  progressAmount: 0,
  progressMd5Amount: 14513078,
  progressMd5Total: 14773190,
  objectsFound: 0,
  startedTransfer: false }

And then the process exits. Running on [email protected] [email protected]

That's when installed directly from latest @git

When installed from latest on npm I get this:

$ node s3-up.js 
progress 0 0
{ domain: null,
  _events: { error: [Function], progress: [Function], end: [Function] },
  _maxListeners: 10,
  progressTotal: 0,
  progressAmount: 0,
  objectsFound: 0 }

And then the process exits with no other messages. The scripts do work on a linux machine.
