Comments (13)

pditommaso commented on June 11, 2024

Looking at a memory dump, it looks like the heap is full of byte[] arrays.

Screenshot 2022-04-04 at 08 35 41

Screenshot 2022-04-04 at 08 35 49

from wave.

pditommaso commented on June 11, 2024

Adding a link to the heap dump for further inspection:

https://drive.google.com/file/d/171fDdrrBNqta_anLoewiCa2NO48ZD8wo/view?usp=sharing

@jorgeaguileraseqera can you please have a look at it?

Update: for the record, the layer.tar.gzip file is 33,170,190 bytes. Not sure whether it's related to the 33,039,969 bytes reported above.

# ls -la pack/layers/
total 126380
drwxr-xr-x 2 root root     4096 Jan  1  1970 .
drwxr-xr-x 3 root root     4096 Jan  1  1970 ..
-rw-r--r-- 1 root root      395 Apr  4 05:45 layer.json
-rw-r--r-- 1 root root 96225280 Apr  4 05:45 layer.tar
-rw-r--r-- 1 root root 33170190 Apr  4 05:45 layer.tar.gzip

pditommaso commented on June 11, 2024

Using another tool, there is more interesting info:
Screenshot 2022-04-04 at 10 07 39
Screenshot 2022-04-04 at 10 09 30

Note the 22 duplicates of DigestByteArray.

I'm also concerned by this:

jorgeaguileraseqera commented on June 11, 2024

It's strange, but yes, it seems DigestByteArray is duplicated a lot.

I'll run the main branch with the profiler and try to identify why.

image

pditommaso commented on June 11, 2024

Not sure it's the reason, but up to 0.4.8 this method was memoized:

http://github.com/seqeralabs/tower-reg/blob/a1b3430307fe9402c9ccccff5cba79affd906e0c/src/main/groovy/io/seqera/docker/ContainerScanner.groovy#L203-L203
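For context, the memoization mentioned above presumably cached the computed bytes per key so identical payloads are not recomputed and duplicated in the heap. A minimal sketch of that pattern using a ConcurrentHashMap (class, method names and the digest key are illustrative, not the actual ContainerScanner code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MemoizeSketch {
    private static final ConcurrentMap<String, byte[]> CACHE = new ConcurrentHashMap<>();

    // stand-in for the expensive layer/digest computation
    static byte[] computeLayer(String key) {
        System.out.println("computing " + key);
        return key.getBytes();
    }

    static byte[] layer(String key) {
        // compute once per key, reuse thereafter, so identical
        // byte[] payloads are not duplicated across requests
        return CACHE.computeIfAbsent(key, MemoizeSketch::computeLayer);
    }

    public static void main(String[] args) {
        layer("sha256:abc");
        layer("sha256:abc"); // served from the cache, no second "computing" line
        System.out.println("entries=" + CACHE.size());
    }
}
```

Note that computeIfAbsent also prevents two concurrent requests from computing the same layer twice.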

pditommaso commented on June 11, 2024

That improved the cache retention, but it's still crashing. It looks like the problem is now the Flux streaming.

Screenshot 2022-04-04 at 21 02 00

Screenshot 2022-04-04 at 21 03 10

Screenshot 2022-04-04 at 21 04 24

Screenshot 2022-04-04 at 21 05 10

pditommaso commented on June 11, 2024

Thinking more about this, maybe a different approach to prevent allocating a lot of byte[] objects in the heap would consist of using a MappedByteBuffer:

        import java.nio.ByteBuffer
        import java.nio.MappedByteBuffer
        import java.nio.channels.FileChannel

        // this should be a temporary file, to be deleted when the stream transfer is complete
        def file = new RandomAccessFile("temp.file", "rw")
        def ch = file.getChannel()

        // the `for` loop represents reading from the body input stream
        for( int i=0; i<100; i++ ) {
            // this represents some data read from the body input stream
            def rnd = ((i % 10).toString() * 40) + '\n'
            def bytes = rnd.bytes
            // append it to the file
            ch.write(ByteBuffer.wrap(bytes))
            // read the same region back via a mapped byte buffer
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, bytes.length * i, bytes.length)
            byte[] chunk = new byte[buf.remaining()]
            buf.get(chunk)
            // the print is just for the sake of this example;
            // it should be replaced by the flowable emit
            print new String(chunk)
        }

        file.close()

jorgeaguileraseqera commented on June 11, 2024

For #28 I'm implementing 2 different (configurable) kinds of storage: S3 and File.

Maybe we can include the FileStorage in this issue and store blobs in a local directory (we can attach a volume), avoiding the use of large byte[] arrays in memory.

pditommaso commented on June 11, 2024

Nope, it's a different problem. Here we should use a file-mapped buffer, because I think it's the only way to scale the downloads without exhausting the heap memory; however, it only works with local files, therefore it cannot be used for the S3 store.

As a nice side effect, this should make parallel downloads easier, using ranges as you were suggesting.
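The ranged-download idea can be sketched as follows: each worker maps its own non-overlapping region of the same local file and fills it with the bytes of its HTTP Range response, so chunks can be written in parallel. This is only a sketch under that assumption; the chunk size, count and payloads are illustrative:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangedWrite {
    public static void main(String[] args) throws Exception {
        Path blob = Files.createTempFile("blob", ".bin");
        long chunkSize = 4;   // illustrative; a real chunk would be MBs
        int chunks = 3;
        try (FileChannel ch = FileChannel.open(blob,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // each worker maps its own region; mapping with READ_WRITE
            // grows the file as needed, and since the regions do not
            // overlap the writes could proceed concurrently
            for (int i = 0; i < chunks; i++) {
                MappedByteBuffer region = ch.map(
                        FileChannel.MapMode.READ_WRITE, i * chunkSize, chunkSize);
                // stand-in for the bytes of HTTP Range request i
                region.put(("ab" + i + "\n").getBytes());
            }
        }
        System.out.print(new String(Files.readAllBytes(blob)));
        Files.deleteIfExists(blob);
    }
}
```

The key property is that only the mapped pages live outside the heap; no large byte[] is ever allocated.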

jorgeaguileraseqera commented on June 11, 2024

Were these screenshots taken while several requests were being downloaded at the same time?

Because I'm inspecting the application and the number of Flux objects does not increase over time. Maybe we crash when we try to serve several images at the same time? In that situation, if I understand your MappedByteBuffer idea, we first need to download the remote image to a temp file and then serve it, right?

pditommaso commented on June 11, 2024

Maybe we crash when we are trying to serve some images at the same time?

very likely

if I understand your idea of MappedByte, we need firstly to download the remote image to a temp file and serve it, right?

Nearly. The idea is to write the stream to a file and, at the same time, send the chunks out.

In the snippet above the print should be replaced with a Flux emit.

What still needs to be solved is how to delete the temp file when the Flux download is completed. Is there a callback that would allow doing that?

jorgeaguileraseqera commented on June 11, 2024

For the record:

After testing the ByteBuffer approach, we've realized that when the blob is huge the download breaks.

I've tested using Micronaut's StreamedFile with 10 concurrent clients requesting different images; the performance seems fine and the memory stable.
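A plausible reason the memory stays stable: streaming through a small fixed buffer bounds heap usage regardless of the blob size. A minimal stdlib sketch of that idea (buffer size and blob are illustrative; this is not the Micronaut implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    // copy with a small fixed buffer: heap usage stays O(bufSize),
    // no matter how large the blob is
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] blob = new byte[1_000_000]; // stand-in for a layer blob
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(blob), sink, 8192);
        System.out.println("copied=" + copied);
    }
}
```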

This docker-compose file can be useful for #45:

services:
  a:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  b:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  c:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  d:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  e:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  f:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  g:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

  h:
    image: jpetazzo/dind
    privileged: true
    environment:
      - PORT=1234
    volumes:
      - .:/benchmark

After bringing up the stack, you can run commands such as:

docker-compose exec a sh -c "docker pull c10c-92-185-192-73.ngrok.io/quay.io/team-helium/miner:miner-amd64_2022.03.23.1_GA"

pditommaso commented on June 11, 2024

Finally, this looks solved. No more duplicate objects.

Screenshot 2022-04-06 at 23 40 53

Screenshot 2022-04-06 at 23 40 49

Screenshot 2022-04-06 at 23 40 46
