Hello @frankyn
Are you able to consistently reproduce this issue?
No, it's rare in fact; it has happened only 2 or 3 times in total.
Could you provide more context on your Dataflow pipeline and the shortest code snippet so we can reproduce the issue?
This issue is random. Reproducing it in my case would mean running a pipeline that reads millions of rows from BigQuery and groups them to create strings; after that I store them in a bucket. I used the native Storage library to do that instead of Dataflow's storage sink IO. The compose request is used to build files whose total number of characters exceeds the Java array size limit (2^31 - 1).
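As an aside, the array cap that forces the compose approach is worth pinning down: Java arrays are indexed by `int`, so the hard limit is 2^31 - 1 elements (a JVM fact, independent of this pipeline):

```java
public class ArrayLimit {
    public static void main(String[] args) {
        // Java arrays are indexed by int, so the maximum length is
        // Integer.MAX_VALUE = 2^31 - 1, not 2^32.
        System.out.println(Integer.MAX_VALUE);   // 2147483647
        System.out.println((1L << 31) - 1);      // 2147483647
    }
}
```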
This code shows how it's done, basically:
- create the target file
- loop to create a temp file
- compose the temp file with target file
- delete temp file
- repeat
```java
public void processElement(@Element KV<String, List<String>> element) {
  String bucketName = bucket.get();
  String eventsPath = element.getKey() + "/" + date.get() + "/events.csv";
  BlobId blobId = BlobId.of(bucketName, eventsPath);
  String tempPath = element.getKey() + "/" + date.get() + "/temp_events.csv";
  BlobId blobIdTemp = BlobId.of(bucketName, tempPath);
  BlobInfo targetBlob = deleteGcsFile(bucketName, eventsPath, blobId);
  createGcsFile(bucketName, eventsPath);
  for (String chunk : element.getValue()) {
    STORAGE.create(
        BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
            .setContentType("text/plain")
            .build(),
        chunk.getBytes());
    ComposeRequest request =
        ComposeRequest.newBuilder()
            .setTarget(targetBlob)
            .addSource(eventsPath)
            .addSource(tempPath)
            .build();
    STORAGE.compose(request);
    STORAGE.delete(blobIdTemp);
  }
}
```
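One hedged refinement of the loop above (an illustration, not the thread's confirmed fix): every chunk reuses the same `temp_events.csv` path, so a concurrent worker processing the same key could create, compose, and delete the temp object under another worker's feet. Giving each chunk a unique temp name sidesteps that. A minimal sketch of such a naming helper, where `tempPath` and its parameters are hypothetical:

```java
import java.util.UUID;

public class TempPaths {
    // Hypothetical helper: build a temp object name that is unique per chunk
    // and per worker, so concurrent workers handling the same key never
    // compose or delete each other's temp objects.
    static String tempPath(String key, String date, int chunkIndex) {
        return key + "/" + date + "/temp_events_" + chunkIndex + "_"
                + UUID.randomUUID() + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(tempPath("customer-42", "2020-01-01", 0));
    }
}
```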
Is the code using a unique path for every new object created before performing a compose operation?
Yes, exactly.
Is there a delete operation in your pipeline?
Yes
The error:

```
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "No such object: my target file",
    "reason" : "notFound"
  } ],
  "message" : "No such object: my target file"
}
```
from java-storage.
Hi Swati,
Can you provide a brief repro snippet?
Below are the code snippets:
How we create the storage client using a service account key:

```java
masterStorageClient = StorageOptions.newBuilder()
    .setCredentials(credentials)
    .setProjectId(GCPMasterProjectId)
    .build()
    .getService();
```
How we try to get the blob:

```java
public String getBlob(String pathWithObjectKey, String bucketName, Storage storageClient)
    throws StorageException {
  logger.info("Retrieving the object with path: " + pathWithObjectKey);
  BlobId blobId = BlobId.of(bucketName, pathWithObjectKey);
  String blobContent = null;
  try {
    Blob blob = storageClient.get(blobId);
    blobContent = new String(blob.getContent(), StandardCharsets.UTF_8);
  } catch (StorageException e) {
    logger.error("Error in fetching the blob object " + e);
    throw e;
  }
  return blobContent;
}
```
The exception occurs on blob.getContent().
That seems to be the correct way to retrieve an object. I just noticed that this error happens intermittently. Can you provide more information to help debug? Thanks!
- How intermittent is it?
- Is the error that occurs at blob.getContent() always a NOT_FOUND, or could it be a NullPointerException or another type of error in some cases?
- Is there another client reading from or writing to the same bucket/blob?
- As per our observation, it happens the first time after a long gap (of maybe 24 hours). After it fails, if we make the call again, it works fine.
- We also observed that the exception occurs when, for the first time, we do the following operations one after the other as part of our use case, within a matter of a few seconds:
  1. create the blob
  2. update the blob
  3. fetch the blob again
  4. update it again
- Yes, the error occurs at blob.getContent(), as is evident in the logs as well.
- When a blob exists in the bucket, we never came across any NullPointerException.
- We did not come across any other exception apart from the one that we have reported already.
- No other client is trying to read/update/delete the blob/bucket.
So is the error happening in step 3, "fetch the blob again"? Can you please describe what is updated during step 2? storage.update() can only update the metadata of a blob, but if the operation isn't finished on the server side yet, fetching the blob again might cause a 404.
Hi @hzyi-google
As mentioned by @swatijain1101, by "updating the blob" we mean that we overwrite the content of the blob by calling storage.create(), not storage.update(). So we are not updating the metadata of the blob; instead, we call storage.create() on the same blob and then call storage.get(). This is where (step 3, as mentioned above) we get the 404.
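If the read really does race the create on rare occasions, a bounded retry around the read is a common client-side mitigation. A minimal sketch under that assumption, where NotFoundException stands in for the 404 StorageException seen in this thread (this is not the library's built-in retry):

```java
import java.util.concurrent.Callable;

public class ReadRetry {
    // Stand-in for the 404 StorageException discussed in this thread.
    static class NotFoundException extends RuntimeException {}

    // Run op, retrying up to `attempts` times on NotFoundException,
    // sleeping `sleepMillis` between tries; rethrow if every attempt fails.
    static <T> T withRetry(Callable<T> op, int attempts, long sleepMillis) throws Exception {
        NotFoundException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (NotFoundException e) {
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new NotFoundException(); // fail twice
            return "content";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " calls"); // content after 3 calls
    }
}
```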
Hi @himanshujain18,
I don't see a storage.put() method in the current version of google-cloud-storage. Which version of the client library are you using?
Hi @hzyi-google, I am sorry for the confusion. I mistakenly mentioned the storage.put() method above. It's actually the storage.create() method that we are using, which is for uploading a blob to the bucket. I have updated my earlier comment.
Hi @himanshujain18,
Thanks for the update. There are a few things I can think of that might be causing the 404:
- Please make sure you are not using an outdated blob when calling blob.getContent(), i.e.:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
blob.getContent();        // this would cause a 404
updatedBlob.getContent(); // the right way
```

- If the size of the contents is quite large, consider using blob.writer() instead of storage.create() for updating.

Edit: the original link was broken
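The blob.writer() suggestion amounts to streaming the payload through a WritableByteChannel instead of handing storage.create() one large byte[]. A sketch of that chunked-write pattern, with an in-memory channel standing in for the GCS WriteChannel (which also implements WritableByteChannel):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChunkedWrite {
    // Write data through the channel in fixed-size chunks, as one would with
    // the WriteChannel returned by blob.writer(); returns total bytes written.
    static long writeInChunks(WritableByteChannel ch, byte[] data, int chunkSize)
            throws IOException {
        long written = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            ByteBuffer buf = ByteBuffer.wrap(data, off, Math.min(chunkSize, data.length - off));
            while (buf.hasRemaining()) {
                written += ch.write(buf);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello compose".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel ch = Channels.newChannel(sink)) {
            System.out.println(writeInChunks(ch, payload, 4)); // 13
        }
        System.out.println(sink.toString("UTF-8"));            // hello compose
    }
}
```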
Hi @hzyi-google,
Thanks for clarifying above. But in our case we are not accessing an outdated blob. Once the two steps are done as mentioned by you:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
```

we perform storage.get() and then blob.getContent() to get the content of the blob. So the flow is like this:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
Blob fetchedBlob = storage.get(blobId); // blobId is the blob id of updatedBlob
fetchedBlob.getContent();               // we got 404 as mentioned in the discussion above
```
Hi @himanshujain18,
In this case, can you try changing the second create to use blob.writer()? I'll check with the server side if that does not help. Thanks!
Hi @himanshujain18,
Is it possible that you have more than one process simultaneously operating on this object? If you were to call storage.get(), and then a second process deleted the object, and then the first process called getContent(), that first process could get a 404.
Is this still happening? We haven't been able to reproduce this on our side with a single client doing the create and then get.
If it is still happening, feel free to reopen.
Hello @hzyi-google @chingor13
I had the same issue recently. What I did is create a file with this:

```java
STORAGE.create(
    BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
        .setContentType("text/plain")
        .build(),
    chunk.getBytes());
```

and then I did a compose request like this:

```java
ComposeRequest request =
    ComposeRequest.newBuilder()
        .setTarget(targetBlob)
        .addSource(eventsPath)
        .addSource(tempPath)
        .build();
STORAGE.compose(request);
```

This code runs in a Dataflow job that processes more than 80K files; I got the 404 error on only 1 file. But apparently there is some retry process, so it did write my file in the end.
Do you have any explanation for why this can happen if the two calls are sequential?
Thanks
@JesseLovelace @frankyn: any help would be appreciated.
Hi @samir- ,
Thanks for raising your question.
General questions:
- Are you able to consistently reproduce this issue?
- Could you provide more context on your Dataflow pipeline and the shortest code snippet so we can reproduce the issue?

Based on what you've provided, it looks like you're concatenating a new blob located at tempPath each time.
- Is the code using a unique path for every new object created before performing a compose operation?
- Is there a delete operation in your pipeline?

```java
STORAGE.create(
    BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
        .setContentType("text/plain")
        .build(),
    chunk.getBytes());
ComposeRequest request =
    ComposeRequest.newBuilder()
        .setTarget(targetBlob)
        .addSource(eventsPath)
        .addSource(tempPath)
        .build();
STORAGE.compose(request);
```
Hello @samir
Sorry for the long delay.
According to the storage consistency spec, read-after-write should never cause a 404 Not Found. After running a program similar to yours for 5 days, I was not able to reproduce your 404.
One can only guess about the reasons that lead to the 404 error. Some of them:
- Someone starts the same service on the same bucket (it might be a nightly build test).
- Someone else who has access to your bucket may delete/move/clean up your data.
- Name collision in a multithreaded environment:

```java
String eventsPath = element.getKey() + "/" + date.get() + "/events.csv";
String tempPath = element.getKey() + "/" + date.get() + "/temp_events.csv";
```

Such names might not be unique; I can well imagine another thread performing operations on an object with the same name.
- Your call createGcsFile(bucketName, eventsPath); does not necessarily create a blob. It could:
  - swallow exceptions instead of throwing them
  - try to create the blob in another thread
  - by mistake not create an object at all

There could be many other reasons, but I'm pretty sure your application is doing something wrong.
I'm facing the same issue. I'm using Spring Batch to do validations (partitioning) and then write the files to GCS. If there are validation error files, they are merged into one consolidated file in the next step. I'm getting the error below even though the file exists in the folder:

```
com.google.cloud.storage.StorageException: Object 6_Error_permuploadinput.csv (generation: 0) not found.
```

Any help is really appreciated.

```java
request.setTarget(
    BlobInfo.newBuilder(bucketName,
        filePath + VALIDATION_CONSOLIDATED_ERROR_FILE_PREFIX + fileName).build());
List<String> filesToMerge = getFileNamesToMerge(jobExecution, fileKey, fileName);
for (String name : filesToMerge) {
  request.addSource(name);
}
Blob compositeObject = gcsStorage.compose(request.build());
```
@dmitry-fa - We identified the root cause. This is an application issue. Apologies for inconvenience.