pivotal-cf / azure-blobstore-resource
A Concourse resource to interact with the Azure blob service.
License: MIT License
We are trying to upload files larger than 195 GB to the blob store, and they fail with a 'block list is over 50,000' error. The 4 MB limit on the chunk size appears to apply only to REST API versions earlier than 2016-05-31; later versions support chunk sizes up to 100 MB. The current service version used by 'Azure/azure-storage-blob-go' is 2018-11-09. Could the chunk size in 'azure-blobstore-resource/azure/client.go' be made configurable, or simply set to 100 MB?
We are using this through Concourse's azure-blobstore resource type, which uses the 'pcfabr/azure-blobstore-resource' Docker image.
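For context on the numbers: with at most 50,000 blocks per blob and 4 MB blocks, uploads cap out right around the ~195 GB threshold reported above. A minimal sketch of the arithmetic (the `minBlockSize` helper is illustrative, not part of the resource):

```go
package main

import "fmt"

// Azure block blobs allow at most 50,000 committed blocks, so for a given
// blob size the block size must be at least ceil(blobSize / 50000) bytes.
const maxBlocks = 50000

// minBlockSize returns the smallest block size (in bytes) that keeps the
// blob within the block count limit.
func minBlockSize(blobSize int64) int64 {
	return (blobSize + maxBlocks - 1) / maxBlocks // ceiling division
}

func main() {
	oldCap := int64(4) << 20 // the 4 MiB per-block cap from REST versions < 2016-05-31
	fmt.Println("max blob size at 4 MiB blocks:", oldCap*maxBlocks) // ~195.3 GiB, the observed ceiling

	blob := int64(300) << 30 // a hypothetical 300 GiB upload
	fmt.Println("needs block size of at least:", minBlockSize(blob)) // larger than the old 4 MiB cap
}
```

Raising the block size to 100 MB would push the per-blob ceiling to roughly 4.75 TiB.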
It appears Microsoft has an SDK dedicated to storing all types of blobs (block blobs, etc.).
See https://github.com/Azure/azure-storage-blob-go
It might be more performant, and it delegates more of the details of the download/upload operations to the SDK, where they likely belong.
See my Gopkg.toml for the recipe.
This brings ginkgo, gomega, and azure-sdk-for-go up to the latest stable versions.
The override of fsnotify is a known issue with dep.
The override of opencensus-proto is due to an incorrect dependency version between two transitive dependencies.
Issue:
When a blob name contains a folder path, extraction can't find the file. The file gets downloaded to the tmp folder.
Example:
object = somefolder/apps/artifact.tar.gz
The file gets downloaded to /tmp/build/23rfwef/artifact.tar.gz, but when the resource goes to extract it, it looks for the file at /tmp/build/23rfwef/somefolder/apps/artifact.tar.gz, which doesn't exist.
What should happen:
The file gets downloaded to /tmp/build/23rfwef/artifact.tar.gz and is extracted from the same place it was downloaded to.
Solution:
azure-blobstore-resource/cmd/in/main.go
Line 77 in e7f086c
blobName should be path.Base(blobName).
err = in.UnpackBlob(filepath.Join(destinationDirectory, path.Base(blobName)))
The S3 Concourse resource supports unpacking blobs (unpack: true) as part of in; would it be possible to do the same here? The code is already written and would just have to be ported to support the Azure blobstore: https://github.com/concourse/s3-resource/blob/master/in/archive.go
Issue:
When a file is named tar.gz instead of tgz, the MIME type comes across as x-gzip instead of gzip. Extraction therefore fails, since there is no case to support this.
What should happen:
If I have a file named artifact.tar.gz, it should be extracted as if it were a gzipped tarball.
Solution:
Add a switch case for x-gzip to do the same thing as gzip.
azure-blobstore-resource/api/in.go
Lines 47 to 49 in e7f086c
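A sketch of the proposed switch case; `unpackCommand` is a hypothetical name, and the resource's actual dispatch lives in api/in.go at the lines referenced above:

```go
package main

import "fmt"

// unpackCommand maps a detected MIME type to an unpack strategy. The fix is
// to treat "application/x-gzip" (reported for .tar.gz names) exactly like
// "application/gzip". The command strings here are illustrative.
func unpackCommand(mimeType string) (string, error) {
	switch mimeType {
	case "application/gzip", "application/x-gzip":
		return "tar xzf", nil
	case "application/zip":
		return "unzip", nil
	default:
		return "", fmt.Errorf("unsupported mime type: %s", mimeType)
	}
}

func main() {
	cmd, err := unpackCommand("application/x-gzip")
	fmt.Println(cmd, err)
}
```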
Not having a snapshot throws the following error:
2018/11/15 02:06:42 failed to copy blob: storage: service returned error: StatusCode=400, ErrorCode=OutOfRangeInput, ErrorMessage=One of the request inputs is out of range.
RequestId:93d35261-801e-010b-6f87-7c0af5000000
Time:2018-11-15T02:06:42.0180762Z, RequestInitiated=Thu, 15 Nov 2018 02:06:41 GMT, RequestId=93d35261-801e-010b-6f87-7c0af5000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
A snapshot shouldn't really be required if I just uploaded a file to be retrieved by Concourse.
Since the changes in February, an upload to the blobstore fails.
We have:
- name: test-image-path
  type: azure-blob
  source:
    container: pipeline
    storage_account_name: ((azure-storage-account-name))
    storage_account_key: ((azure-storage-access-key))
    versioned_file: test-image-path
- task: foobar
- put: test-image-path
  params:
    file: packer_artifacts/vhd_uri
As I can see in the Azure blobstore, it created a blob with the name vhd_uri, but it should have updated the file test-image-path.
Concourse replies with:
2019/02/08 10:34:28 failed to copy blob: storage: service returned error: StatusCode=404, ErrorCode=404 The specified blob does not exist., ErrorMessage=no response body was available for error status code, RequestInitiated=Fri, 08 Feb 2019 10:34:28 GMT, RequestId=32ea0716-801e-007d-2799-bfb435000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
I have a 15 GB blob that I am trying to download, but I keep getting the "failed to copy blob: context deadline exceeded" error. I have successfully downloaded smaller blobs, so I know I am doing it the right way. The failures always seem to happen right around 10 minutes in (give or take 30 seconds).
I have tried bumping the retry_timeout to 60m and setting the block_size to 50 and 100.
resource_types:
- name: azure-blobstore
  type: docker-image
  source:
    repository: pcfabr/azure-blobstore-resource

resources:
- name: pas-product
  type: azure-blobstore
  check_every: 4h
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: tile-downloads
    regexp: srt-(((pas_major_minor_version)))-(.*).pivotal
    block_size: 50
    retry:
      try_timeout: "60m"

jobs:
- name: upload-and-stage-pas
  serial: true
  plan:
  - aggregate:
    - get: pas-product
      params:
        globs:
        - "srt-*"
  - task: test-config-files
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: sandyg1/om-cred-auto
      run:
        path: sh
        args:
        - -ec
        - |
          ls -lah
In trying to download a large file (PAS tile) from the Azure blob store to a Concourse worker, the way Azure handles it results in a fatal error: runtime: out of memory.
Attached is the error seen in Concourse:
error.txt
Hi there! I'm not entirely sure if this is known/expected behaviour and someone can correct my usage of the resource, or if this is actually a bug, but I've noticed that the azure-blobstore-resource seems to be producing duplicate Concourse resource versions for the same blob. This is an issue for me because some of my jobs are set up to trigger: true on new blob versions, but sometimes they get stuck looking for the newest versions that satisfy my passed criteria and don't trigger as expected.
In my pipeline I take in a blob from another source and put it to an Azure blobstore container via the resource, to keep a copy of the blob I can manage myself instead of relying on the original source to be available. Later in the same pipeline I get the blob using the same azure-blobstore-resource instance I put to earlier and do some work with it. When I look at the versions produced by the azure-blobstore-resource, I can see 2 resource versions for the same blob: one has just the path field, and the other has path and version. Both paths are the same and refer to the same blob, but it looks like, because the second resource version has the additional version field, Concourse treats them as 2 separate resource versions.
I had a quick look over the implementation, and it looks like the out script produces new Concourse resource versions with just the path field (I'm using regexes, not blobstore snapshots):
azure-blobstore-resource/cmd/out/main.go
Lines 80 to 84 in 0b0c727
but the check script produces new Concourse resource versions with a path and a version:
azure-blobstore-resource/api/check.go
Lines 152 to 156 in 0b0c727
Unless I'm not using the resource correctly, I would expect to see only 1 resource version for a given blob.
With the latest version, we are seeing upload failures due to certificate issues. Reverting to 0.7.0 resolved the issue.
Output:
2019/12/04 23:28:46 failed to upload blob: -> github.com/Azure/azure-pipeline-go/pipeline.NewError, /go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:154
HTTP request failed
Put https://<REDACTED>.blob.core.windows.net/backup/export/export-2019-12-04T23-26-06+0000.sql.gz?blockid=<REDACTED>&comp=block&timeout=61: x509: certificate signed by unknown authority
Check currently only returns the latest version, even if there is a gap between what Concourse currently knows and what is the latest. We should return every version from current to latest.
Regexp-based checks should not have a snapshot in the version; it is irrelevant.
Would it be possible to support globs/wildcards for resource PUTs?
Currently, in order to PUT a file to this Azure blob store resource, we have to call out the entire filename explicitly. Even the simplest glob/regex fails with an "unable to find file" error.
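A minimal sketch of what glob support for put could look like, mirroring the s3 resource's behaviour of requiring exactly one match; `resolvePutFile` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolvePutFile expands a glob in the put step's `file` param to exactly
// one concrete path, erroring on zero or multiple matches so a typo'd
// pattern fails loudly rather than uploading the wrong file.
func resolvePutFile(pattern string) (string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return "", err
	}
	if len(matches) != 1 {
		return "", fmt.Errorf("pattern %q matched %d files, expected exactly 1", pattern, len(matches))
	}
	return matches[0], nil
}

func main() {
	p, err := resolvePutFile("packer_artifacts/*.vhd")
	fmt.Println(p, err)
}
```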
The way to reproduce:
Create an Azure container with a folder containing more than one file, in addition to files located in the root of the container.
Ex:
<container_name>
With a container file/folder structure as depicted above, the capturing-group regexp no longer works.
Ex: file_in_root_a-(.*).tgz
I had only a limited amount of time to troubleshoot, but it looks like the API call / Go library returns an empty array in this case, so the root cause might reside there.
We are experiencing some inconsistencies when using regexp in the source configuration to retrieve artifacts from our blobstore in Azure.
Below is a snippet of the files in the blobstore:
Below is an example of our configuration:
resources:
- name: platform-automation-image
  type: azure-blobstore
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: ((container))
    regexp: platform-automation-image-(.*).tgz
- name: platform-automation-tasks
  type: azure-blobstore
  source:
    storage_account_name: ((storage_account_name))
    storage_account_key: ((storage_account_key))
    container: ((container))
    regexp: platform-automation-tasks-(.*).zip
Below is the error we are seeing in Concourse:
When we initially flew the pipeline up, this was working just fine. We ran a couple of jobs multiple times with no issues. Now, all of a sudden, the resource isn't finding the blob anymore for some odd reason. We've deleted the pipeline and re-flown it, but still no luck.
As a workaround, we switched to the versioned_file source parameter, giving the explicit file name. This works; however, we don't want to do this long term, since newer versions of this file will be released.
The block_size param is currently specified in bytes, which isn't the most intuitive way to specify the block size, considering most of the time it's going to be somewhere between 4 MB and 100 MB. The block_size param should allow the user to specify the block size in MB. However, to avoid breaking existing users who specify the block size in bytes, the resource shouldn't simply switch to interpreting the value as MB.
Allow the user to add a unit to block_size, e.g. 10M or 10MB will set the block size to 10 megabytes.
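A sketch of backwards-compatible parsing under that proposal: bare numbers stay bytes (the current behaviour), while an M/MB suffix means megabytes. `parseBlockSize` is a hypothetical helper, not the resource's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBlockSize accepts either a bare byte count ("4194304") or a value
// with an M/MB suffix ("10M", "10MB") meaning mebibytes.
func parseBlockSize(s string) (int64, error) {
	s = strings.TrimSpace(strings.ToUpper(s))
	multiplier := int64(1)
	switch {
	case strings.HasSuffix(s, "MB"):
		s, multiplier = strings.TrimSuffix(s, "MB"), 1<<20
	case strings.HasSuffix(s, "M"):
		s, multiplier = strings.TrimSuffix(s, "M"), 1<<20
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid block_size %q: %w", s, err)
	}
	return n * multiplier, nil
}

func main() {
	for _, v := range []string{"4194304", "10M", "10MB"} {
		n, _ := parseBlockSize(v)
		fmt.Println(v, "=>", n)
	}
}
```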
Specifying a versioned_file blob inside a logical directory currently breaks the get (in) operation.
YAML Specified:
versioned_file: platform-automation/0.0.1-rc.248/platform-automation-0.0.1-rc.248-tasks.zip
Error Output:
2018/11/09 19:50:42 failed to copy blob: open /tmp/build/get/platform-automation/0.0.1-rc.248/platform-automation-0.0.1-rc.248-tasks.zip: no such file or directory
Putting the blob at the container root fixes the error.
This will SIGSEGV on download under some circumstances; I suspect it is due to this issue:
Azure/azure-storage-blob-go#134
I believe this is fixed in 0.7.0 of the upstream. Going to try building a custom fork with the new dependency to validate.
When we had an issue with the GCS resource (the user could not unarchive our Docker image properly), we investigated.
It turned out that the resource was using the Go standard libraries for tar and zip.
The libraries are helpful, but they did not handle edge cases with symlinks.
We made a PR to the GCS resource to have it use the CLIs.
We did try the archiver library in Go, but it could not handle the symlinks.
The PR follows the same patterns as the native s3-resource.
For more feature parity with the s3 resource, the initial state params should be supported by this resource.
Hi,
Our pipelines are blocked because when the last (7th) fractional-second digit of a snapshot timestamp is 0, the resource omits it, whereas Azure expects it.
Example:
- The resource requests terraform.tfstate?snapshot=2019-10-23T14:40:22.186881Z
- Azure stored snapshot.2019-10-23T14:40:22.1868810Z as the snapshot timestamp; please note the last 0, right before the Z, that seems to be added by Azure.
- The resource uses 2019-10-23T14:40:22.186881Z as the timestamp, but Azure expects the last 0 digit to be specified, so a 400 error is returned because the timestamp is deemed invalid by Azure.
The resulting error is:
2019/10/23 15:43:07 failed to copy blob: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=400 Value for one of the query parameters specified in the request URI is invalid., Details: (none)
HEAD https://<redacted>.blob.core.windows.net/terraform/terraform.tfstate?snapshot=2019-10-23T14%3A40%3A22.186881Z&timeout=61
Authorization: REDACTED
User-Agent: [Azure-Storage/0.7 (go1.13.3; linux)]
X-Ms-Client-Request-Id: [2f79966f-02b2-4382-47ba-9f0083c91dca]
X-Ms-Date: [Wed, 23 Oct 2019 15:43:07 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 400 Value for one of the query parameters specified in the request URI is invalid.
Date: [Wed, 23 Oct 2019 15:43:07 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Request-Id: [afa3f0b9-601e-0000-30b8-891dd2000000]
When the last digit of the timestamp is not 0, the resource succeeds at downloading the snapshot. The issue is the same when storing a new version: whenever the last digit is 0 it fails, and whenever it is not 0 it succeeds.
Could you please push a fix as soon as possible? Our pipelines are hitting flaky errors because of this issue.
Best,
Benjamin
Trying to download a 500 GB file from an Azure container, and the get resource fails with:
failed to copy blob: context deadline exceeded
This is with the default block size.
I have gone through the exercise of retrofitting the pipelines from Platform Automation (see http://docs.pivotal.io/platform-automation/v2.1/reference/pipeline.html ) to use the azure-blobstore resource instead of the s3 resource.
In the process, I discovered a difference in behavior between the two resource implementations, which I'd like to describe next.
In the first pipeline (retrieving external dependencies), take for example the resource pas-stemcell. Its regexp specifies a subdirectory pas-stemcell/.
In the subsequent pipeline ("installing ops mgr and tiles"), we see pas-stemcell again, with the same regexp.
When the job named 'upload-stemcells' runs, the task named 'upload-pas-stemcell' fails with a "file not found". When I concourse hijacked into the container, I discovered that the path to the stemcell inside the container was pas-stemcell/pas-stemcell/{stemcellfilename}, i.e. it had a nested subdirectory. Not so when I use the s3 blobstore resource.
I worked around the issue by adding a step to move the file, like so:
- task: move-file-shim
  config:
    platform: linux
    inputs:
    - name: pas-stemcell
    run:
      path: /bin/sh
      args:
      - -c
      - mv pas-stemcell/pas-stemcell/* wellplaced-stemcell/
    outputs:
    - name: wellplaced-stemcell
and then in the subsequent task, I replaced the input_mapping for stemcell with the output wellplaced-stemcell.
I realized a more elegant solution would be to revise the blobstore resource's implementation to match whatever the s3 resource currently does.
I'm not yet familiar enough with Go and with this project to contribute a PR just yet, but in case this is easily captured via a couple of unit tests and easily fixed, at least I can put it on your radar.
Thanks in advance for your consideration.
Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access, you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.
Steps:
Add your repo to the config-production.yml file.
If there are any questions, please reach out to [email protected].
It seems to be possible to get a context deadline exceeded on very large blobs, being able to increase the timeout seems to be a possible mitigation for this issue.
Hi there,
We're trying to use this resource with Azure China, and it fails to check the resource because the base URL of the blobstore is hardcoded, defaulting to blob.core.windows.net. In Azure China, this URL should be blob.core.chinacloudapi.cn.
There should be a way to provide an input to define which Azure cloud you want to use (defaulting to AzureCloud), like in this resource: https://github.com/pivotal-cloudops/azure-blobstore-concourse-resource
The error you get by using this resource against Azure China Cloud is:
resource script '/opt/resource/check []' failed: exit status 1
stderr:
2018/11/05 08:21:26 failed to get latest version: Get https://BLOBSTORENAME.blob.core.windows.net/XXXXXXXX?comp=list&include=snapshots&prefix=FILE.NAME&restype=container: dial tcp: lookup BLOBSTORENAME.blob.core.windows.net on 168.63.129.16:53: no such host
Thanks for your help.
CC: @lakshmantgld @keliangneu
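A sketch of what a configurable endpoint could look like; the `blobServiceURL` helper, its parameter name, and the default are all illustrative, not the resource's actual API:

```go
package main

import "fmt"

// blobServiceURL builds the account's blob endpoint from a configurable
// environment suffix instead of hardcoding blob.core.windows.net.
func blobServiceURL(account, suffix string) string {
	if suffix == "" {
		suffix = "core.windows.net" // AzureCloud default
	}
	return fmt.Sprintf("https://%s.blob.%s", account, suffix)
}

func main() {
	fmt.Println(blobServiceURL("myaccount", ""))                      // public cloud
	fmt.Println(blobServiceURL("myaccount", "core.chinacloudapi.cn")) // Azure China
}
```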
We were using this with the "latest" tag at a customer, and all pipelines started failing because they use this resource with unpack: true for the Platform Automation image and tasks.
Reverting to 0.5.0 made it work again.
Running multiple uploads from multiple jobs in parallel sometimes gives the following error.
We set a block_size of 100 MB. Not sure if it is related to this StackOverflow question: "The specified block list is invalid".
2020/11/05 14:58:06 failed to upload blob: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=InvalidBlockList) =====
Description=The specified block list is invalid.
RequestId:8fdbe99d-d01e-0004-2484-b3c323000000
Time:2020-11-05T14:58:06.4775739Z, Details:
Code: InvalidBlockList
PUT https://xxxxx.blob.core.windows.net/product-tiles/stemcells/[stemcells-ubuntu-xenial,621.90]bosh-stemcell-621.90-azure-hyperv-ubuntu-xenial-go_agent.tgz?comp=blocklist&timeout=61
Authorization: REDACTED
Content-Length: [653]
Content-Type: [application/xml]
User-Agent: [Azure-Storage/0.7 (go1.14.2; linux)]
X-Ms-Blob-Cache-Control: []
X-Ms-Blob-Content-Disposition: []
X-Ms-Blob-Content-Encoding: []
X-Ms-Blob-Content-Language: []
X-Ms-Blob-Content-Type: []
X-Ms-Client-Request-Id: [599d2426-0cbd-4acb-60e5-2d8db6fecfb2]
X-Ms-Date: [Thu, 05 Nov 2020 14:58:06 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 400 The specified block list is invalid.
Content-Length: [221]
Content-Type: [application/xml]
Date: [Thu, 05 Nov 2020 14:58:05 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Error-Code: [InvalidBlockList]
X-Ms-Request-Id: [8fdbe99d-d01e-0004-2484-b3c323000000]
X-Ms-Version: [2018-11-09]
Hi Team,
I am trying to use the azure-blobstore-resource image on the arm64 platform, but it seems the arm64 tag is not available for this image.
I have built the image successfully on a local arm64 machine.
Do you have any plans to release an arm64 image?
It would be very helpful if an arm64-supported tag were available. If interested, I can raise a PR.
Hey there; I'm trying to use the azure-blobstore-resource within a Concourse pipeline, and when the pipeline uploads files over 4 MB I get this error:
2018/08/13 21:28:07 failed to upload blob: storage: service returned error: StatusCode=413, ErrorCode=RequestBodyTooLarge, ErrorMessage=The request body is too large and exceeds the maximum permissible limit.
RequestId:50459769-801e-0017-184c-33f80a000000
Time:2018-08-13T21:28:06.9913482Z, RequestInitiated=Mon, 13 Aug 2018 21:28:06 GMT, RequestId=50459769-801e-0017-184c-33f80a000000, API Version=2016-05-31, QueryParameterName=, QueryParameterValue=
I've confirmed that I'm able to upload files smaller than 4 MB. It looks like some chunking needs to happen to allow for files larger than 4 MB.