Comments (10)
I have found something interesting here.
If I add doOnRequest(n -> log.info("Requested {}", n))
to the returned Flux, I can observe that by default MonoSendMany
in reactor requests 1 element and then 127 more.
The CRT client uses 8MB chunks, so as a Publisher it will push (1+127)*8MB = 1024MB of data to the subscriber.
@GetMapping(path = "/{filekey}")
public Mono<ResponseEntity<Flux<ByteBuffer>>> downloadFile(@PathVariable("filekey") String filekey) {
    GetObjectRequest request = GetObjectRequest.builder()
            .bucket(s3config.getBucket())
            .key(filekey)
            .build();
    return Mono.fromFuture(s3client.getObject(request, AsyncResponseTransformer.toPublisher()))
            .map(response -> {
                checkResult(response.response());
                String filename = getMetadataItem(response.response(), "filename", filekey);
                return ResponseEntity.ok()
                        .header(HttpHeaders.CONTENT_TYPE, response.response().contentType())
                        .header(HttpHeaders.CONTENT_LENGTH, Long.toString(response.response().contentLength()))
                        .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
                        .body(Flux.from(response));
            });
}
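The demand logging mentioned above can be attached right where the body Flux is built (a fragment, assuming reactor-core's doOnRequest operator and an SLF4J log field):

```java
// Logs every request(n) the downstream subscriber (reactor-netty's
// MonoSendMany) issues; with the defaults this prints 1 and then 127.
.body(Flux.from(response)
          .doOnRequest(n -> log.info("Requested {}", n)))
```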
I tried adding limit(3)
to the returned Flux and the CRT publisher does respect it; however, limit()
is not usable here, as it basically ends the stream after the limit is reached.
As I said before, I also tried limitRate(2),
and that one is not respected.
Then I tried wrapping the Flux in a MyFlux, which wraps the Subscriber in a MySubscriber, which wraps the Subscription in a MySubscription, in order to override the requested amount.
public class MySubscription implements Subscription {
    private final Subscription original; // + constructor

    @Override
    public void request(long n) {
        // request() returns void, so no return statement here
        original.request(2);
    }
}
And that does override the requested 127 down to 2; however, it seems to completely break the MonoSendMany logic, and I think it stops publishing anything to the channel/consumer. The consumer blocks (remember, I am using blocking IO for the consumer, so it is completely stuck).
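The stall described above can be reproduced with plain JDK java.util.concurrent.Flow types (a minimal sketch standing in for the Reactive Streams interfaces; the class names are mine): capping request(n) without ever re-requesting means only the capped number of items is ever delivered, and the rest of the stream hangs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class CappedDemandDemo {
    /** Wraps a Subscription and caps every request(n), mirroring MySubscription above. */
    static final class CappingSubscription implements Flow.Subscription {
        private final Flow.Subscription original;
        private final long cap;

        CappingSubscription(Flow.Subscription original, long cap) {
            this.original = original;
            this.cap = cap;
        }

        @Override public void request(long n) {
            original.request(Math.min(n, cap)); // forward less demand than asked for
        }

        @Override public void cancel() {
            original.cancel();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> received = new ArrayList<>();
        CountDownLatch twoDelivered = new CountDownLatch(2);
        SubmissionPublisher<Integer> pub = new SubmissionPublisher<>();
        pub.subscribe(new Flow.Subscriber<Integer>() {
            @Override public void onSubscribe(Flow.Subscription s) {
                // Downstream asks for 127, but the wrapper forwards only 2.
                new CappingSubscription(s, 2).request(127);
            }
            @Override public void onNext(Integer item) {
                received.add(item);
                twoDelivered.countDown();
            }
            @Override public void onError(Throwable t) { }
            @Override public void onComplete() { }
        });
        for (int i = 0; i < 10; i++) {
            pub.submit(i); // 10 items buffered, but only 2 can ever be delivered
        }
        twoDelivered.await(2, TimeUnit.SECONDS);
        Thread.sleep(200); // give the publisher a chance to deliver more (it cannot)
        // Nobody re-requests after the first 2 items, so the stream is stuck:
        System.out.println("received=" + received.size());
        pub.close();
    }
}
```

This is the same failure mode as the blocked consumer above: the demand is consumed once and never replenished.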
Let me check what the s3-netty chunk sizes are and how it handles a request of 127.
from aws-sdk-java-v2.
OK, s3-netty uses 8KB chunks instead of 8MB chunks, which really helps here.
So, here is what I think could be a solution:
- Make reactor-netty request fewer than 128/64 (refill) elements, but those values are hardcoded in static final fields of MonoSend (the constants are defined there, but they are used in MonoSendMany).
- Make the CRT client use smaller download chunks; there is an option for upload chunk sizes, but I am not sure there is one for download.
- Figure out how to make limitRate() work, but CRT doesn't respect it, and MonoSendMany may not like it either, due to internal calculations based on the 128/64 MAX_SIZE/REFILL_SIZE.
Please advise.
Alright, I just tried:
S3AsyncClient.crtBuilder()
        .minimumPartSizeInBytes(1L * 1024 * 1024) // 1 MB
And even though the parameter name is "uploadPartSize" and the javadoc refers to the Amazon Multipart Upload Limits, it does control both download and upload chunk sizes.
So the javadoc is fine, except for the parameter name.
From here, I will let you review this issue to decide whether you want to update the javadoc/parameter names a little, or even split them into separate download and upload chunk sizes.
Also, I will wait for an answer on limitRate().
Hi @VadimKirilchuk, the AWS CRT-based S3 client uses S3 multipart upload and ranged GET (a GetObject request with a Range header) to achieve parallel transfers. The reason the default minimumPartSizeInBytes
is high is that S3 requires upload parts to be at least 5MB. This works for download because ranged GET doesn't have that limitation. Agreed that we should update our documentation to clarify that.
When you invoke GetObject, under the hood the AWS CRT S3 client initially downloads a few parts and buffers them in memory (the buffer size can be configured via https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3CrtAsyncClientBuilder.html#initialReadBufferSizeInBytes(java.lang.Long)); after that, the CRT will send more data only after the existing parts have been consumed. Maybe you can try lowering initialReadBufferSizeInBytes?
Hi @zoewangg,
We can definitely play with that setting, and it may reduce the CRT read buffer, but the problem between reactor-netty and the S3 client is that reactor-netty requests 128 parts to be pushed from the S3 client. Since the part size is 8MB by default, that equals 1024MB per single GetObject request.
We reduced the part size to 1MB, and the overall behavior is more stable now, as it will push at most 128*1 = 128MB before waiting for the next request (REFILL_SIZE).
So, is there a way to limitRate on the AsyncResponseTransformer publisher instead of changing the part size? I think the answer is no, but maybe I am missing something.
It sounds like I should raise another request for reactor-netty, as they have MAX_SIZE/REFILL_SIZE hardcoded, which doesn't play nicely with huge chunks.
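The arithmetic above can be spelled out (a trivial sketch; 128 is reactor-netty's hardcoded MAX_SIZE prefetch, and the part size is the CRT chunk size):

```java
public class PrefetchMath {
    public static void main(String[] args) {
        long maxPrefetch = 128; // reactor-netty MonoSend MAX_SIZE (hardcoded)

        long defaultPartMb = 8; // default CRT part size
        System.out.println("default: " + (maxPrefetch * defaultPartMb) + "MB in flight");

        long reducedPartMb = 1; // minimumPartSizeInBytes lowered to 1MB
        System.out.println("reduced: " + (maxPrefetch * reducedPartMb) + "MB in flight");
    }
}
```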
So, is there a way to limitRate on the AsyncResponseTransformer publisher instead of changing the part size? I think the answer is no, but maybe I am missing something.
Right, not for the AWS CRT-based S3 client. Currently, the chunk size is the same as the part size, so it can only be configured via the part size. There is a plan on the CRT side to support streaming with smaller chunks, but we don't have a timeline at the moment.
Hi, just posting an update on this one. We haven't experimented with initialReadBufferSizeInBytes yet.
However, since we reduced the part size, we transitively reduced initialReadBufferSizeInBytes as well:
8MB part size => 80MB initial buffer
1MB part size => 10MB initial buffer
I am actually thinking about increasing it back to 80MB, or even more.
Reactor-netty was requesting too many parts/chunks, so we fixed that by reducing the part size, but as a result there is less buffering/caching now. I guess we can improve that by increasing buffering/caching on the s3-crt side, using a bigger initialReadBufferSizeInBytes.
Basically, by playing with both initialReadBufferSizeInBytes and minimumPartSizeInBytes we can achieve the desired result.
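The combination described here can be sketched as client configuration (a sketch, assuming only the S3CrtAsyncClientBuilder options named in this thread; the values are illustrative, not recommendations):

```java
S3AsyncClient s3client = S3AsyncClient.crtBuilder()
        // Smaller parts mean reactor-netty's 128-element prefetch
        // pins less memory (128 * 1MB = 128MB instead of 1024MB).
        .minimumPartSizeInBytes(1L * 1024 * 1024)        // 1 MB
        // A larger read buffer restores the caching lost by shrinking parts.
        .initialReadBufferSizeInBytes(80L * 1024 * 1024) // 80 MB
        .build();
```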
I am also still behind on opening an issue against reactor-netty regarding the hardcoded MAX_SIZE/REFILL_SIZE.
I have finally submitted a feature request on the reactor-netty side; let's see what they say.
Reactor-netty feature request
Update: I will be working on a PR for reactor-netty to make the request amount configurable via a property; it is going to take some time. I am not sure if we want to close this one, since there is no big issue with the AWS SDK other than a minor javadoc glitch.
@zoewangg good news, my pull request on the reactor-netty side was merged. Once it is released, it will be possible to keep the default part size and simply reduce the prefetch size on the reactor-netty side by setting reactor.netty.send.maxPrefetchSize to a lower value (the default is 128).
For instance, a value of 32 would make reactor-netty ask the AWS SDK to push up to 32 * 8MB = 256MB into reactor-netty's buffers per request.
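The effect of that property can be sketched as follows (a sketch: in practice the property must be set before any reactor-netty class loads, e.g. via the -D launcher flag shown in the comment; the arithmetic assumes the default 8MB CRT part size):

```java
public class PrefetchConfigDemo {
    public static void main(String[] args) {
        // Equivalent to launching with -Dreactor.netty.send.maxPrefetchSize=32;
        // it must be set before reactor-netty's MonoSend class is initialized.
        System.setProperty("reactor.netty.send.maxPrefetchSize", "32");

        long prefetch = Long.parseLong(System.getProperty("reactor.netty.send.maxPrefetchSize"));
        long partMb = 8; // default CRT part size
        System.out.println("max buffered = " + (prefetch * partMb) + "MB");
    }
}
```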
Once again, I think this issue can be closed unless you want to proceed and update some javadocs or any other docs.