Comments (13)
Looking at a memory dump, it looks like the heap is full of byte[] arrays.
from wave.
Adding a link to the heap dump for further inspection:
https://drive.google.com/file/d/171fDdrrBNqta_anLoewiCa2NO48ZD8wo/view?usp=sharing
@jorgeaguileraseqera can you please have a look at it?
update: for the record, the layer.tar.gzip file is 33170190 bytes. Not sure it's related to the 33'039'969 reported above.
# ls -la pack/layers/
total 126380
drwxr-xr-x 2 root root 4096 Jan 1 1970 .
drwxr-xr-x 3 root root 4096 Jan 1 1970 ..
-rw-r--r-- 1 root root 395 Apr 4 05:45 layer.json
-rw-r--r-- 1 root root 96225280 Apr 4 05:45 layer.tar
-rw-r--r-- 1 root root 33170190 Apr 4 05:45 layer.tar.gzip
Using another tool, there is more interesting info: look at the 22 duplicates of DigestByteArray. I'm also concerned by this.
it's strange, but yes, it seems DigestByteArray is duplicated a lot.
I'll run the main branch with the profiler and try to identify why.
Not sure it's the reason, but up to 0.4.8 there was a memoized cache applied to this method.
It improved the cache retention, but it's still crashing. It looks like the problem now is the Flux streaming.
Thinking more about this, maybe a different approach could prevent the allocation of a lot of byte[] in the heap, using a MappedByteBuffer instead, like this:
import java.nio.ByteBuffer
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// this should be a temporary file, deleted when the stream transfer is complete
def file = new RandomAccessFile("temp.file", "rw")
def ch = file.getChannel()
// the `for` loop stands in for reading from the body input stream
for( int i=0; i<100; i++ ) {
    // this represents some data read by the body input stream
    def rnd = ((i % 10).toString() * 40) + '\n'
    def bytes = rnd.bytes
    // append to the file
    ch.write(ByteBuffer.wrap(bytes))
    // read the same region back via a mapped byte buffer
    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, bytes.size() * i, bytes.size())
    // copy the mapped region out; the print is just for the sake of this example
    // and should be replaced by the flux emit
    def chunk = new byte[buf.remaining()]
    buf.get(chunk)
    print new String(chunk)
}
file.close()
For #28 I'm implementing 2 different (configurable) kinds of storage: S3 and file.
Maybe we can include the FileStorage in this issue and store blobs in a local directory (we can attach a volume), to avoid keeping large byte[] in memory.
Nope, it's a different problem. Here we should use a file-mapped buffer because I think it's the only way to scale the download without exhausting the heap memory, but it only works with local files, so it cannot be used for the S3 store.
As a nice side effect, this should make parallel downloads using ranges easier, as you were suggesting.
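The range idea works because a FileChannel supports positional writes with no shared file pointer, so chunks can land in any order. A minimal Java sketch, where each task stands in for one hypothetical HTTP Range request (the class name and the in-memory blob are illustrative, not Wave code):

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.concurrent.*;

public class RangedWrite {
    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("layer", ".tar");
        byte[] blob = "0123456789ABCDEF".getBytes();  // stand-in for the remote layer
        int chunk = 4;
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (FileChannel ch = FileChannel.open(target, StandardOpenOption.WRITE)) {
            for (int off = 0; off < blob.length; off += chunk) {
                final int start = off;
                // each task plays the role of one range download (bytes=start..start+chunk-1)
                pool.submit(() -> {
                    ByteBuffer part = ByteBuffer.wrap(blob, start, Math.min(chunk, blob.length - start));
                    try {
                        ch.write(part, start);  // positional write at the chunk's own offset
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        // the file is reassembled correctly regardless of write order
        System.out.println(new String(Files.readAllBytes(target)));  // 0123456789ABCDEF
        Files.deleteIfExists(target);
    }
}
```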
are these screenshots from while several requests were being downloaded at the same time?
Because I'm inspecting the application and the number of Flux objects does not increase over time. Maybe we crash when we are trying to serve several images at the same time? In that situation, if I understand your MappedByteBuffer idea, we need first to download the remote image to a temp file and then serve it, right?
Maybe we crash when we are trying to serve some images at the same time?
very likely
if I understand your idea of MappedByte, we need firstly to download the remote image to a temp file and serve it, right?
nearly: the idea is to write the stream to a file and at the same time send the chunks out.
In the snippet above the print
should be replaced with the flux emit.
What remains to be solved is how to delete the temp file when the flux download is completed. Is there a callback that would allow us to do that?
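Reactor's Flux does expose terminal hooks for this (for example doFinally, which fires on completion, error, or cancellation). A stdlib-only sketch of the same cleanup-on-completion pattern, using CompletableFuture in place of the Flux pipeline (class and file names are illustrative):

```java
import java.nio.file.*;
import java.util.concurrent.CompletableFuture;

public class TempCleanup {
    public static void main(String[] args) throws Exception {
        Path temp = Files.createTempFile("layer", ".tar");
        // stand-in for the streaming transfer; in the real code this would be the flux
        CompletableFuture<Void> transfer = CompletableFuture.runAsync(() -> {
            try {
                Files.write(temp, "chunk".getBytes());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        // terminal callback: runs whether the transfer succeeded or failed,
        // analogous to Flux.doFinally
        transfer.whenComplete((v, err) -> {
            try {
                Files.deleteIfExists(temp);
            } catch (Exception ignored) { }
        }).join();
        System.out.println(Files.exists(temp));  // false
    }
}
```

The important property is that the cleanup hook is attached to the terminal signal of the stream, not to any individual chunk, so the file is removed exactly once regardless of how the transfer ends.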
For the record:
after testing the ByteBuffer approach we've realized that when the blob is huge the download breaks.
I've tested using Micronaut's StreamedFile with 10 concurrent clients requesting different images, and performance seems fine and memory stable.
This docker-compose file can be useful for #45:
services:
  a:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  b:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  c:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  d:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  e:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  f:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  g:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
  h:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark
After bringing up the stack you can run commands such as:
docker-compose exec a sh -c "docker pull c10c-92-185-192-73.ngrok.io/quay.io/team-helium/miner:miner-amd64_2022.03.23.1_GA"
Finally, this looks solved. No more duplicate objects