
Comments (20)

samir- avatar samir- commented on July 28, 2024 1

Hello @frankyn
Are you able to consistently reproduce this issue?

No, in fact it's rare: only 2 or 3 times in total.

Could you provide more context on your Dataflow pipeline and the shortest code snippet so we can reproduce the issue?

This issue is random. Reproducing it in my case would mean running a pipeline that reads millions of rows from BigQuery and groups them to create strings; after that I store them in a bucket. I used the native storage library to do that instead of Dataflow's storage sink IO. The compose request is used to build files whose total number of characters exceeds the Java array size limit (2^31 - 1).

This code shows how it's done, basically:

  • create the target file
  • loop to create a temp file
  • compose the temp file with target file
  • delete temp file
  • repeat

```java
@ProcessElement
public void processElement(@Element KV<String, List<String>> element) {
    String bucketName = bucket.get();
    String eventsPath = element.getKey() + "/" + date.get() + "/events.csv";
    BlobId blobId = BlobId.of(bucketName, eventsPath);
    String tempPath = element.getKey() + "/" + date.get() + "/temp_events.csv";
    BlobId blobIdTemp = BlobId.of(bucketName, tempPath);

    BlobInfo targetBlob = deleteGcsFile(bucketName, eventsPath, blobId);
    createGcsFile(bucketName, eventsPath);

    for (String chunk : element.getValue()) {
        STORAGE.create(
                BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
                        .setContentType("text/plain")
                        .build(),
                chunk.getBytes());

        ComposeRequest request =
                ComposeRequest.newBuilder()
                        .setTarget(targetBlob)
                        .addSource(eventsPath)
                        .addSource(tempPath)
                        .build();
        STORAGE.compose(request);
        STORAGE.delete(blobIdTemp);
    }
}
```
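A note on the array-size ceiling mentioned above: the JVM caps a single array at Integer.MAX_VALUE (2^31 - 1) elements, which is why the output cannot be built as one byte[] and compose() is used to concatenate objects server-side instead. A small illustrative sketch (maxChunks is a hypothetical helper, not part of any library):

```java
// The JVM caps array length at Integer.MAX_VALUE (2^31 - 1), so a single
// byte[] cannot hold the concatenated output; GCS compose() sidesteps the
// cap by concatenating objects server-side.
public class ArrayLimit {
    // Largest number of whole chunks of `chunkSize` bytes that fit in one array.
    static int maxChunks(int chunkSize) {
        return Integer.MAX_VALUE / chunkSize;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);  // 2147483647
        System.out.println(maxChunks(1 << 20)); // 2047 one-MiB chunks at most
    }
}
```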

Is the code using a unique path for every new object created before performing a compose operation?

Yes, exactly.

Is there a delete operation in your pipeline?

Yes

The error:

```
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "No such object: my target file",
    "reason" : "notFound"
  } ],
  "message" : "No such object: my target file"
}
```

from java-storage.

yihanzhen avatar yihanzhen commented on July 28, 2024

Hi Swati,

Can you provide a brief repro snippet?


himanshujain18 avatar himanshujain18 commented on July 28, 2024

Below are the code snippets:

How we create the storage client using a Service Account key:

```java
masterStorageClient = StorageOptions.newBuilder()
        .setCredentials(credentials)
        .setProjectId(GCPMasterProjectId)
        .build()
        .getService();
```

How we try to get the blob:

```java
public String getBlob(String pathWithObjectKey, String bucketName, Storage storageClient)
        throws StorageException {
    logger.info("Retrieving the object with path: " + pathWithObjectKey);
    BlobId blobId = BlobId.of(bucketName, pathWithObjectKey);
    String blobContent = null;
    try {
        Blob blob = storageClient.get(blobId);
        blobContent = new String(blob.getContent(), Charset.forName("UTF-8"));
    } catch (StorageException e) {
        logger.error("Error in fetching the blob object " + e);
        throw e;
    }
    return blobContent;
}
```

The exception occurs on blob.getContent().


yihanzhen avatar yihanzhen commented on July 28, 2024

That seems to be the correct way to retrieve an object. I just noticed that this error happens intermittently. Can you provide more information to help debug? Thanks!

  • How intermittent is it?
  • Is the error that occurs at blob.getContent() always a NOT_FOUND, or is it ever a NullPointerException or another type of error?
  • Is there another client reading from or writing to the same bucket/blob?


swatijain1101 avatar swatijain1101 commented on July 28, 2024
  • As per our observation, it happens for the first time after a long gap (of maybe 24 hours). After it fails, if we make the call again, it works fine.
  • Also, what we observed is that the exception occurs when, for the first time, we do the following operations one after the other (as part of our use case) in a matter of a few seconds:
  1. create the blob
  2. update the blob
  3. fetch the blob again
  4. update it again
  • Yes, the error occurs at blob.getContent() as is evident in the logs as well.

  • When a blob exists in the bucket, we never came across any NullPointerException.

  • We did not come across any other exception apart from the one that we have reported already.

  • No other client is trying to read/update/delete the blob/bucket.


yihanzhen avatar yihanzhen commented on July 28, 2024

So is the error happening at step 3 (fetch the blob again)? Can you please describe what is updated during step 2? storage.update() only updates the metadata of a blob, and if that operation hasn't finished on the server side yet, fetching the blob again might cause a 404.


himanshujain18 avatar himanshujain18 commented on July 28, 2024

Hi @hzyi-google
As mentioned by @swatijain1101 , by "updating the blob" we mean that we overwrite the content of the blob by calling storage.create(), not storage.update(). So we are not updating the metadata of the blob; instead, we call storage.create() on the same blob and then call storage.get(). This is where (step 3, as mentioned above) we get the 404.


yihanzhen avatar yihanzhen commented on July 28, 2024

Hi @himanshujain18 ,
I don't see a storage.put() method in the current version of google-cloud-storage. Which version of the client library are you using?


himanshujain18 avatar himanshujain18 commented on July 28, 2024

Hi @hzyi-google , I am sorry for the confusion. I mistakenly mentioned the storage.put() method above. It's actually the storage.create() method that we are using, which uploads a blob to the bucket. I have updated my earlier comment.


yihanzhen avatar yihanzhen commented on July 28, 2024

Hi @himanshujain18 ,

Thanks for the update. There are a few things I can think of that might be causing the 404:

  • Please make sure you are not using an outdated blob when calling blob.getContent(), i.e.:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
blob.getContent();         // this would cause a 404
updatedBlob.getContent();  // the right way
```

  • If the size of the contents is quite large, consider using blob.writer() instead of storage.create() for updating.
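The chunked-write pattern that blob.writer() enables can be sketched as follows. This runs against an in-memory channel purely for illustration; with GCS, the WritableByteChannel would come from blob.writer() instead of Channels.newChannel, and writeInChunks is a hypothetical helper, not a library method:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Stream the payload through a WritableByteChannel in fixed-size chunks
// instead of materializing one large byte[] for storage.create().
public class ChunkedWrite {
    static int writeInChunks(WritableByteChannel channel, byte[] payload, int chunkSize) {
        int chunks = 0;
        for (int off = 0; off < payload.length; off += chunkSize) {
            int len = Math.min(chunkSize, payload.length - off);
            ByteBuffer buf = ByteBuffer.wrap(payload, off, len);
            try {
                while (buf.hasRemaining()) {
                    channel.write(buf); // a channel may accept fewer bytes per call
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel ch = Channels.newChannel(sink)) {
            int chunks = writeInChunks(ch, "hello chunked world".getBytes(), 5);
            System.out.println(chunks + " chunks, " + sink.size() + " bytes");
        }
    }
}
```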

Edit: the original link was broken


himanshujain18 avatar himanshujain18 commented on July 28, 2024

Hi @hzyi-google ,
Thanks for clarifying. In our case we are not accessing an outdated blob. Once the two steps you mentioned are done:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
```

we perform storage.get() and then blob.getContent() to get the content of the blob. So the flow is:

```java
Blob blob = storage.create(blobInfo);
Blob updatedBlob = storage.create(blobInfo, byteArray);
Blob fetchedBlob = storage.get(blobId);  // blobId is the blob id of updatedBlob
fetchedBlob.getContent();  // we got a 404, as mentioned in the discussion above
```


yihanzhen avatar yihanzhen commented on July 28, 2024

Hi @himanshujain18 ,

In this case can you try changing the second create to use blob.writer()? I'll check with the server side if that does not help. Thanks!


BrandonY avatar BrandonY commented on July 28, 2024

Hi @himanshujain18,

Is it possible that you have more than one process simultaneously operating on this object? If you were to call storage.get(), and then a second process deleted the object, and then the first process called getContent(), that first process could get a 404.
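That interleaving can be made concrete with a toy sketch: a map stands in for the bucket, and getContent is a hypothetical stand-in for blob.getContent(); none of these names come from the client library:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory "object store" illustrating the race: process A gets the
// object's metadata, process B deletes the object, then A's content read 404s.
public class RaceDemo {
    static final ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();

    // Stand-in for blob.getContent(): throws if the object has vanished.
    static byte[] getContent(String name) {
        byte[] data = store.get(name);
        if (data == null) {
            throw new IllegalStateException("404 No such object: " + name);
        }
        return data;
    }

    // Returns true when the get-then-delete interleaving produces a "404".
    static boolean raceProduces404() {
        store.put("events.csv", "data".getBytes());
        boolean metadataGetSucceeded = store.containsKey("events.csv"); // storage.get() succeeds
        store.remove("events.csv");                                     // second process deletes
        try {
            getContent("events.csv");                                   // first process reads
            return false;
        } catch (IllegalStateException expected) {
            return metadataGetSucceeded; // the get succeeded, yet the read 404'd
        }
    }

    public static void main(String[] args) {
        System.out.println(raceProduces404()); // true
    }
}
```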


chingor13 avatar chingor13 commented on July 28, 2024

Is this still happening? We haven't been able to reproduce this on our side with a single client doing the create and then get.

If it is still happening, feel free to reopen.


samir- avatar samir- commented on July 28, 2024

hello @hzyi-google @chingor13
I had the same issue recently. What I did is create a file with this:

```java
STORAGE.create(
        BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
                .setContentType("text/plain")
                .build(),
        chunk.getBytes());
```

and then issue a compose request like this:

```java
ComposeRequest request =
        ComposeRequest.newBuilder()
                .setTarget(targetBlob)
                .addSource(eventsPath)
                .addSource(tempPath)
                .build();
STORAGE.compose(request);
```

This code runs on a Dataflow job that processes > 80K files, and I got the 404 error on only 1 file.
But apparently there is some retry process, so it did write my file in the end.
Do you have any explanation for how this can happen if the two calls are sequential?
Thanks


sduskis avatar sduskis commented on July 28, 2024

@JesseLovelace @frankyn: any help would be appreciated.


frankyn avatar frankyn commented on July 28, 2024

Hi @samir- ,

Thanks for raising your question.

General questions:

  1. Are you able to consistently reproduce this issue?
  2. Could you provide more context on your Dataflow pipeline and the shortest code snippet so we can reproduce the issue?

Based on what you've provided, it looks like you're concatenating a new blob located at tempPath each time.

  1. Is the code using a unique path for every new object created before performing a compose operation?
  2. Is there a delete operation in your pipeline?

```java
STORAGE.create(
        BlobInfo.newBuilder(BlobId.of(bucketName, tempPath))
                .setContentType("text/plain")
                .build(),
        chunk.getBytes());

ComposeRequest request =
        ComposeRequest.newBuilder()
                .setTarget(targetBlob)
                .addSource(eventsPath)
                .addSource(tempPath)
                .build();

STORAGE.compose(request);
```


dmitry-fa avatar dmitry-fa commented on July 28, 2024

Hello @samir-

Sorry for the long delay.

According to the storage consistency spec, read-after-write should never cause a 404 Not Found.
After 5 days of running a program similar to yours, I'm not able to reproduce your 404.
One can only guess at the reasons that lead to the 404 error. Some of them:

  • someone starts the same service on the same bucket (it might be a nightly build test)
  • someone else who has access to your bucket may delete/move/clean up your data
  • a name collision in a multithreaded environment:
    String eventsPath = element.getKey() + "/" + date.get() + "/events.csv";
    String tempPath = element.getKey() + "/" + date.get() + "/temp_events.csv";
    Such names might not be unique; another thread may well perform operations on an object with the same name
  • your call createGcsFile(bucketName, eventsPath); does not really create a blob. It could:
    • swallow exceptions
    • try to create the blob in another thread
    • fail to create the object by mistake

There could be many other reasons, but I'm pretty sure your application is doing something wrong.
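The unique-name point above can be sketched as follows: deriving the temp path from a per-call UUID would guarantee that concurrent workers sharing a key and date never collide. tempPath is a hypothetical helper; key and date mirror element.getKey() and date.get() from the snippet earlier in the thread:

```java
import java.util.UUID;

// Derive the temp path from a per-call UUID so concurrent workers on the
// same key/date never operate on the same object name.
public class UniqueTempPath {
    static String tempPath(String key, String date) {
        return key + "/" + date + "/temp_events-" + UUID.randomUUID() + ".csv";
    }

    public static void main(String[] args) {
        String a = tempPath("customer-1", "2024-07-28");
        String b = tempPath("customer-1", "2024-07-28");
        System.out.println(a.equals(b)); // false: same key and date, distinct paths
    }
}
```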


vali7394 avatar vali7394 commented on July 28, 2024

I'm facing the same issue. I'm using Spring Batch to do validations (with partitioning), then write the files to GCS. If there are validation error files, the next step merges them into one consolidated file. I'm getting the error below even though the file exists in the folder: "com.google.cloud.storage.StorageException: Object 6_Error_permuploadinput.csv (generation: 0) not found." Any help is really appreciated.

```java
request.setTarget(
        BlobInfo.newBuilder(bucketName, filePath + VALIDATION_CONSOLIDATED_ERROR_FILE_PREFIX + fileName).build());
List<String> filesToMerge = getFileNamesToMerge(jobExecution, fileKey, fileName);
for (String name : filesToMerge) {
    request.addSource(name);
}
Blob compositeObject = gcsStorage.compose(request.build());
```


vali7394 avatar vali7394 commented on July 28, 2024

@dmitry-fa - We identified the root cause. This is an application issue. Apologies for the inconvenience.

