pivotal-cf / azure-blobstore-resource
A Concourse resource to interact with the Azure blob service.
License: MIT License
We are trying to upload files larger than 195 GB to the blob store, and they fail with a 'block list is over 50,000' error. The 4 MB limit on the chunk size appears to apply only to REST API versions earlier than 2016-05-31; later versions support chunk sizes up to 100 MB. The current service version used by 'Azure/azure-storage-blob-go' is 2018-11-09. Could the chunk size in 'azure-blobstore-resource/azure/client.go' be made configurable, or simply set to 100 MB?
We are using this through Concourse's azure-blobstore resource type, which uses the 'pcfabr/azure-blobstore-resource' Docker image.
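For context on the numbers: with at most 50,000 blocks per blob and 4 MB blocks, uploads cap out right around the ~195 GB threshold reported above. A minimal sketch of the arithmetic (the `minBlockSize` helper is illustrative, not part of the resource):

```go
package main

import "fmt"

// Azure block blobs allow at most 50,000 committed blocks, so for a given
// blob size the block size must be at least ceil(blobSize / 50000) bytes.
const maxBlocks = 50000

// minBlockSize returns the smallest block size (in bytes) that keeps the
// blob within the block count limit.
func minBlockSize(blobSize int64) int64 {
	return (blobSize + maxBlocks - 1) / maxBlocks // ceiling division
}

func main() {
	oldCap := int64(4) << 20 // the 4 MiB per-block cap from REST versions < 2016-05-31
	fmt.Println("max blob size at 4 MiB blocks:", oldCap*maxBlocks) // ~195.3 GiB, the observed ceiling

	blob := int64(300) << 30 // a hypothetical 300 GiB upload
	fmt.Println("needs block size of at least:", minBlockSize(blob)) // larger than the old 4 MiB cap
}
```

Raising the block size to 100 MB would push the per-blob ceiling to roughly 4.75 TiB.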
It appears Microsoft has an SDK dedicated to storing all types of blobs (block blobs, etc.).
See https://github.com/Azure/azure-storage-blob-go
It might be more performant, and it delegates more of the details of the download/upload operations to the SDK, where they likely belong.
See my Gopkg.toml for the recipe.
This brings ginkgo, gomega, and azure-sdk-for-go up to the latest stable versions.
The override of fsnotify is a known issue with dep.
The override of opencensus-proto is due to an incorrect dependency version between two transitive dependencies.
Issue:
When a blob name contains a folder path, extraction can't find the file. The file gets downloaded to the tmp folder.
Example:
object = somefolder/apps/artifact.tar.gz
The file gets downloaded to /tmp/build/23rfwef/artifact.tar.gz, but when the resource goes to extract it, it looks for the file at /tmp/build/23rfwef/somefolder/apps/artifact.tar.gz, which doesn't exist.
What should happen:
The file gets downloaded to /tmp/build/23rfwef/artifact.tar.gz and is extracted from the same place it was downloaded to.
Solution:
azure-blobstore-resource/cmd/in/main.go
Line 77 in e7f086c
blobName should be path.Base(blobName).
err = in.UnpackBlob(filepath.Join(destinationDirectory, path.Base(blobName)))
The S3 Concourse resource supports unpacking blobs (unpack: true) as part of in; would it be possible to do the same here? The code is already written and would just have to be ported to support the Azure blobstore: https://github.com/concourse/s3-resource/blob/master/in/archive.go
Issue:
When a file is named tar.gz instead of tgz, the MIME type comes across as x-gzip instead of gzip. Extraction therefore fails, since there is no case to support this.
What should happen:
If I have a file named artifact.tar.gz, it should be extracted as if it were a gzipped tarball.
Solution:
Add a switch case for x-gzip to do the same thing as gzip.
azure-blobstore-resource/api/in.go
Lines 47 to 49 in e7f086c
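A sketch of the proposed switch case; `unpackCommand` is a hypothetical name, and the resource's actual dispatch lives in api/in.go at the lines referenced above:

```go
package main

import "fmt"

// unpackCommand maps a detected MIME type to an unpack strategy. The fix is
// to treat "application/x-gzip" (reported for .tar.gz names) exactly like
// "application/gzip". The command strings here are illustrative.
func unpackCommand(mimeType string) (string, error) {
	switch mimeType {
	case "application/gzip", "application/x-gzip":
		return "tar xzf", nil
	case "application/zip":
		return "unzip", nil
	default:
		return "", fmt.Errorf("unsupported mime type: %s", mimeType)
	}
}

func main() {
	cmd, err := unpackCommand("application/x-gzip")
	fmt.Println(cmd, err)
}
```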
Not having a snapshot throws the following error:
2018/11/15 02:06:42 failed to copy blob: storage: service returned error: StatusCode=400, ErrorCode=OutOfRangeInput, ErrorMessage=One of the request inputs is out of range.
RequestId:93d35261-801e-010b-6f87-7c0af5000000
Time:2018-11-15T02:06:42.0180762Z, RequestInitiated=Thu, 15 Nov 2018 02:06:41 GMT, RequestId=93d35261-801e-010b-6f87-7c0af5000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
A snapshot shouldn't really be required if I just uploaded a file to be retrieved by Concourse.
Since the changes in February, an upload to the blobstore fails.
We have:
- name: test-image-path
  type: azure-blob
  source:
    container: pipeline
    storage_account_name: ((azure-storage-account-name))
    storage_account_key: ((azure-storage-access-key))
    versioned_file: test-image-path
- task: foobar
- put: test-image-path
  params:
    file: packer_artifacts/vhd_uri
As I can see in the Azure blobstore, it created a blob with the name vhd_uri, but it should have updated the file test-image-path.
Concourse replies with:
2019/02/08 10:34:28 failed to copy blob: storage: service returned error: StatusCode=404, ErrorCode=404 The specified blob does not exist., ErrorMessage=no response body was available for error status code, RequestInitiated=Fri, 08 Feb 2019 10:34:28 GMT, RequestId=32ea0716-801e-007d-2799-bfb435000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
I have a 15 GB blob that I am trying to download, but I keep getting the "failed to copy blob: context deadline exceeded" error. I have successfully downloaded smaller blobs, so I know I am doing it the right way. The failures always seem to happen right around 10 minutes in (give or take 30 seconds).
I have tried bumping the retry_timeout to 60m and setting the block_size to 50 and 100.
resource_types:
- name: azure-blobstore
  type: docker-image
  source:
    repository: pcfabr/azure-blobstore-resource

resources:
- name: pas-product
  type: azure-blobstore
  check_every: 4h
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: tile-downloads
    regexp: srt-(((pas_major_minor_version)))-(.*).pivotal
    block_size: 50
    retry:
      try_timeout: "60m"

jobs:
- name: upload-and-stage-pas
  serial: true
  plan:
  - aggregate:
    - get: pas-product
      params:
        globs:
        - "srt-*"
  - task: test-config-files
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: sandyg1/om-cred-auto
      run:
        path: sh
        args:
        - -ec
        - |
          ls -lah
In trying to download a large file (PAS tile) from the Azure blob store to a Concourse worker, the way Azure handles it results in a fatal error: runtime: out of memory.
Attached is the error seen in Concourse:
error.txt
Hi there! I'm not entirely sure if this is known/expected behaviour and someone can correct my usage of the resource, or if this is actually a bug, but I've noticed that the azure-blobstore-resource seems to be producing duplicate Concourse resource versions for the same blob. This is an issue for me because some of my jobs are set up to trigger: true on new blob versions, but sometimes they get stuck looking for the newest versions that satisfy my passed criteria and don't trigger as expected.
In my pipeline I take in a blob from another source and put it to an Azure blobstore container via the resource, to keep a copy of the blob I can manage myself instead of relying on the original source to be available. Later in the same pipeline I get the blob using the same azure-blobstore-resource instance I put to earlier and do some work with it. When I look at the versions produced by the azure-blobstore-resource, I can see 2 resource versions for the same blob: one has just the path field, and the other has path and version. Both paths are the same and refer to the same blob, but it looks like, because the second resource version has the additional version field, Concourse treats them as 2 separate resource versions.
I had a quick look over the implementation, and it looks like the out script produces new Concourse resource versions with just the path field (I'm using regexes, not blobstore snapshots):
azure-blobstore-resource/cmd/out/main.go
Lines 80 to 84 in 0b0c727
but the check script produces new Concourse resource versions with a path and a version:
azure-blobstore-resource/api/check.go
Lines 152 to 156 in 0b0c727
Unless I'm not using the resource correctly, I would expect to see only 1 resource version for a given blob.
With the latest version, we are seeing upload failures due to certificate issues. Reverting to 0.7.0 resolved the issue.
Output:
2019/12/04 23:28:46 failed to upload blob: -> github.com/Azure/azure-pipeline-go/pipeline.NewError, /go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:154
HTTP request failed
Put https://<REDACTED>.blob.core.windows.net/backup/export/export-2019-12-04T23-26-06+0000.sql.gz?blockid=<REDACTED>&comp=block&timeout=61: x509: certificate signed by unknown authority
Check currently only returns the latest version, even if there is a gap between what Concourse currently knows and what is the latest. We should return every version from current to latest.
Regexp-based checks should not have a snapshot in the version; it is irrelevant.
Would it be possible to support globs/wildcards for resource PUTs?
Currently, in order to PUT a file to this Azure blob store resource, we have to call out the entire filename explicitly. Even the simplest glob/regex fails with an "unable to find file" error.
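A minimal sketch of what glob support for put could look like, mirroring the s3 resource's behaviour of requiring exactly one match; `resolvePutFile` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolvePutFile expands a glob in the put step's `file` param to exactly
// one concrete path, erroring on zero or multiple matches so a typo'd
// pattern fails loudly rather than uploading the wrong file.
func resolvePutFile(pattern string) (string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return "", err
	}
	if len(matches) != 1 {
		return "", fmt.Errorf("pattern %q matched %d files, expected exactly 1", pattern, len(matches))
	}
	return matches[0], nil
}

func main() {
	p, err := resolvePutFile("packer_artifacts/*.vhd")
	fmt.Println(p, err)
}
```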
The way to reproduce:
Create an Azure container with a folder containing more than one file, in addition to files located in the root of the container.
Ex:
<container_name>
With a container file/folder structure as depicted above, the capturing-group regexp no longer works.
Ex: file_in_root_a-(.*).tgz
I had only a limited amount of time to troubleshoot, but it looks like the API call / Go library returns an empty array in this case, so the root cause might reside there.
We are experiencing some inconsistencies when using regexp in the source configuration to retrieve artifacts from our blobstore in Azure.
Below is a snippet of the files in the blobstore:
Below is an example of our configuration:
resources:
- name: platform-automation-image
  type: azure-blobstore
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: ((container))
    regexp: platform-automation-image-(.*).tgz
- name: platform-automation-tasks
  type: azure-blobstore
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: ((container))
    regexp: platform-automation-tasks-(.*).zip
Below is the error we are seeing in Concourse:
When we initially flew the pipeline up, this was working just fine. We ran a couple of jobs multiple times with no issues. Now, all of a sudden, the resource isn't finding the blob anymore for some odd reason. We've deleted the pipeline and re-flown it, but still no luck.
As a workaround, we switched to the versioned_file source parameter, giving the explicit file name. This works; however, we don't want to do this long term, since newer versions of this file will be released.
The block_size param is currently specified in bytes, which isn't the most intuitive way to specify the block size, considering most of the time it's going to be somewhere between 4 MB and 100 MB. The block_size param should allow the user to specify the block size in MB. However, to avoid breaking existing users who specify the block size in bytes, the resource shouldn't simply switch to interpreting the value as MB.
Allow the user to add a unit to block_size, e.g. 10M or 10MB will set the block size to 10 megabytes.
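A sketch of backwards-compatible parsing under that proposal: bare numbers stay bytes (the current behaviour), while an M/MB suffix means megabytes. `parseBlockSize` is a hypothetical helper, not the resource's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBlockSize accepts either a bare byte count ("4194304") or a value
// with an M/MB suffix ("10M", "10MB") meaning mebibytes.
func parseBlockSize(s string) (int64, error) {
	s = strings.TrimSpace(strings.ToUpper(s))
	multiplier := int64(1)
	switch {
	case strings.HasSuffix(s, "MB"):
		s, multiplier = strings.TrimSuffix(s, "MB"), 1<<20
	case strings.HasSuffix(s, "M"):
		s, multiplier = strings.TrimSuffix(s, "M"), 1<<20
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid block_size %q: %w", s, err)
	}
	return n * multiplier, nil
}

func main() {
	for _, v := range []string{"4194304", "10M", "10MB"} {
		n, _ := parseBlockSize(v)
		fmt.Println(v, "=>", n)
	}
}
```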
Specifying a versioned_file blob inside a logical directory currently breaks the get (in) operation.
YAML Specified:
versioned_file: platform-automation/0.0.1-rc.248/platform-automation-0.0.1-rc.248-tasks.zip
Error Output:
2018/11/09 19:50:42 failed to copy blob: open /tmp/build/get/platform-automation/0.0.1-rc.248/platform-automation-0.0.1-rc.248-tasks.zip: no such file or directory
Putting the blob at the container root fixes the error.
This will SIGSEGV on download under some circumstances; I suspect it is due to this issue:
Azure/azure-storage-blob-go#134
I believe this is fixed in 0.7.0 of the upstream. Going to try building a custom fork with the new dependency to validate.
When we had an issue with the GCS resource (the user could not unarchive our Docker image properly), we investigated.
It turned out that the resource was using the Go standard libraries for tar and zip.
The libraries are helpful, but they did not handle edge cases with symlinks.
We made a PR to the GCS resource to have it use the CLIs.
We did try the archiver library in Go, but it could not handle the symlinks.
The PR follows the same patterns as the native s3-resource.
For more feature parity with the s3 resource, the initial state params should be supported by this resource.
Hi,
Our pipelines are blocked because when the last (7th) fractional-second digit of a snapshot timestamp is 0, the resource omits it, whereas Azure expects it.
Example:
- The resource requests terraform.tfstate?snapshot=2019-10-23T14:40:22.186881Z
- Azure stored snapshot.2019-10-23T14:40:22.1868810Z as the snapshot timestamp; please note the last 0, right before the Z, that seems to be added by Azure.
- The resource uses 2019-10-23T14:40:22.186881Z as the timestamp, but Azure expects the last 0 digit to be specified, so a 400 error is returned because the timestamp is deemed invalid by Azure.
The resulting error is:
2019/10/23 15:43:07 failed to copy blob: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=400 Value for one of the query parameters specified in the request URI is invalid., Details: (none)
HEAD https://<redacted>.blob.core.windows.net/terraform/terraform.tfstate?snapshot=2019-10-23T14%3A40%3A22.186881Z&timeout=61
Authorization: REDACTED
User-Agent: [Azure-Storage/0.7 (go1.13.3; linux)]
X-Ms-Client-Request-Id: [2f79966f-02b2-4382-47ba-9f0083c91dca]
X-Ms-Date: [Wed, 23 Oct 2019 15:43:07 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 400 Value for one of the query parameters specified in the request URI is invalid.
Date: [Wed, 23 Oct 2019 15:43:07 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Request-Id: [afa3f0b9-601e-0000-30b8-891dd2000000]
When the last digit of the timestamp is not 0, the resource succeeds at downloading the snapshot. The issue is the same when storing a new version: whenever the last digit is 0 it fails, and whenever it is not 0 it succeeds.
Could you please push a fix as soon as possible? Our pipelines are hitting flaky errors because of this issue.
Best,
Benjamin
Trying to download a 500 GB file from an Azure container, and the get resource fails with:
failed to copy blob: context deadline exceeded
This is with the default block size.
I have gone through the exercise of retrofitting the pipelines from Platform Automation (see http://docs.pivotal.io/platform-automation/v2.1/reference/pipeline.html ) to use the azure-blobstore resource instead of the s3 resource.
In the process, I discovered a difference in behavior between the two resource implementations, which I'd like to describe next.
In the first pipeline (retrieving external dependencies), take for example the resource pas-stemcell. Its regexp specifies a subdirectory pas-stemcell/.
In the subsequent pipeline ("installing ops mgr and tiles"), we see pas-stemcell again, with the same regexp.
When the job named 'upload-stemcells' runs, the task named 'upload-pas-stemcell' fails with a "file not found". When I concourse hijacked into the container, I discovered that the path to the stemcell inside the container was pas-stemcell/pas-stemcell/{stemcellfilename}, i.e. it had a nested subdirectory. Not so when I use the s3 blobstore resource.
I worked around the issue by adding a step to move the file, like so:
- task: move-file-shim
  config:
    platform: linux
    inputs:
    - name: pas-stemcell
    run:
      path: /bin/sh
      args:
      - -c
      - mv pas-stemcell/pas-stemcell/* wellplaced-stemcell/
    outputs:
    - name: wellplaced-stemcell
and then in the subsequent task, I replaced the input_mapping for stemcell with the output wellplaced-stemcell.
I realized a more elegant solution would be to revise the blobstore resource's implementation to match whatever the s3 resource currently does.
I'm not yet familiar enough with Go and with this project to contribute a PR just yet, but in case this is easily captured via a couple of unit tests and easily fixed, at least I can put it on your radar.
Thanks in advance for your consideration.
Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access, you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.
Steps:
Add your repo to the config-production.yml file.
If there are any questions, please reach out to [email protected].
It seems to be possible to get a context deadline exceeded on very large blobs, being able to increase the timeout seems to be a possible mitigation for this issue.
Hi there,
We're trying to use this resource with Azure China, and it fails to check the resource because the base URL of the blobstore is hardcoded, defaulting to blob.core.windows.net. In Azure China, this URL should be blob.core.chinacloudapi.cn.
There should be a way to provide an input to define which Azure cloud you want to use (defaulting to AzureCloud), like in this resource: https://github.com/pivotal-cloudops/azure-blobstore-concourse-resource
The error you get by using this resource against Azure China Cloud is:
resource script '/opt/resource/check []' failed: exit status 1
stderr:
2018/11/05 08:21:26 failed to get latest version: Get https://BLOBSTORENAME.blob.core.windows.net/XXXXXXXX?comp=list&include=snapshots&prefix=FILE.NAME&restype=container: dial tcp: lookup BLOBSTORENAME.blob.core.windows.net on 168.63.129.16:53: no such host
Thanks for your help.
CC: @lakshmantgld @keliangneu
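A sketch of what a configurable endpoint could look like; the `blobServiceURL` helper, its parameter name, and the default are all illustrative, not the resource's actual API:

```go
package main

import "fmt"

// blobServiceURL builds the account's blob endpoint from a configurable
// environment suffix instead of hardcoding blob.core.windows.net.
func blobServiceURL(account, suffix string) string {
	if suffix == "" {
		suffix = "core.windows.net" // AzureCloud default
	}
	return fmt.Sprintf("https://%s.blob.%s", account, suffix)
}

func main() {
	fmt.Println(blobServiceURL("myaccount", ""))                      // public cloud
	fmt.Println(blobServiceURL("myaccount", "core.chinacloudapi.cn")) // Azure China
}
```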
We were using this with the "latest" tag at a customer, and all pipelines started failing because they use this resource with unpack: true for the Platform Automation image and tasks.
Reverting to 0.5.0 made it work again.
Running multiple uploads from multiple jobs in parallel sometimes gives the following error.
We set a block_size of 100 MB. Not sure if it is related to this StackOverflow question: "The specified block list is invalid".
2020/11/05 14:58:06 failed to upload blob: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=InvalidBlockList) =====
Description=The specified block list is invalid.
RequestId:8fdbe99d-d01e-0004-2484-b3c323000000
Time:2020-11-05T14:58:06.4775739Z, Details:
Code: InvalidBlockList
PUT https://xxxxx.blob.core.windows.net/product-tiles/stemcells/[stemcells-ubuntu-xenial,621.90]bosh-stemcell-621.90-azure-hyperv-ubuntu-xenial-go_agent.tgz?comp=blocklist&timeout=61
Authorization: REDACTED
Content-Length: [653]
Content-Type: [application/xml]
User-Agent: [Azure-Storage/0.7 (go1.14.2; linux)]
X-Ms-Blob-Cache-Control: []
X-Ms-Blob-Content-Disposition: []
X-Ms-Blob-Content-Encoding: []
X-Ms-Blob-Content-Language: []
X-Ms-Blob-Content-Type: []
X-Ms-Client-Request-Id: [599d2426-0cbd-4acb-60e5-2d8db6fecfb2]
X-Ms-Date: [Thu, 05 Nov 2020 14:58:06 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 400 The specified block list is invalid.
Content-Length: [221]
Content-Type: [application/xml]
Date: [Thu, 05 Nov 2020 14:58:05 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Error-Code: [InvalidBlockList]
X-Ms-Request-Id: [8fdbe99d-d01e-0004-2484-b3c323000000]
X-Ms-Version: [2018-11-09]
Hi Team,
I am trying to use the azure-blobstore-resource image on the arm64 platform, but it seems the arm64 tag is not available for this image.
I have built the image successfully on a local arm64 machine.
Do you have any plans to release an arm64 image?
It would be very helpful if an arm64-supported tag were available. If interested, I can raise a PR.
Hey there; I'm trying to use the azure-blobstore-resource within a Concourse pipeline, and when the pipeline uploads files over 4 MB I get this error:
2018/08/13 21:28:07 failed to upload blob: storage: service returned error: StatusCode=413, ErrorCode=RequestBodyTooLarge, ErrorMessage=The request body is too large and exceeds the maximum permissible limit.
RequestId:50459769-801e-0017-184c-33f80a000000
Time:2018-08-13T21:28:06.9913482Z, RequestInitiated=Mon, 13 Aug 2018 21:28:06 GMT, RequestId=50459769-801e-0017-184c-33f80a000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
I've confirmed that I'm able to upload files smaller than 4 MB. It looks like some chunking needs to happen to allow for files larger than 4 MB.