Comments (15)

cmoesel commented on June 20, 2024

BLUF

I think the container that does the build is running out of memory and being killed before it finishes.

I think this can be fixed by increasing the memory available to the container and/or using the --oom-kill-disable option when running the container (recommended only in conjunction with a -m flag to limit total memory).

Details

I've been trying to determine what's going on by attempting to reproduce the issue using Docker containers (since that is how the auto-build works). I tried building a variation of the ig-build image in this repo, but I could not get it to build: the container insisted on installing Ruby 2.1, while jekyll-asciidoc required a later version of Ruby. Downgrading jekyll-asciidoc didn't work either.

In the end, I made an even-more-modified container for testing (see Dockerfile). When I ran the container, I mapped my IG source to /usr/src/input:

docker run -it --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher

This ran for a while but was ultimately killed. Here are the last few lines of the output:

Generating Outputs in /usr/src/ig/output/guide/                                  (00:02:39.0195sec)
Generating Summary Outputs                                                       (00:05:48.0216sec)
Jekyll: Source: /usr/src/ig/temp                                                 (00:05:49.0611sec)
Jekyll: Generating...                                                            (00:05:49.0612sec)
Jekyll: done in 8.509 seconds.                                                   (00:05:58.0123sec)
Checking Output HTML                                                             (00:06:02.0670sec)
found 2946 files
Killed
$

After some research, I learned that

By default, kernel kills processes in a container if an out-of-memory (OOM) error occurs.

Based on that documentation, I changed my command to this:

docker run -it -m 3G --oom-kill-disable --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher

This time, the build succeeded (although it took 21 minutes to complete).

@jmandel or @grahamegrieve -- do you think it's likely that this is the problem? And if so, might it be possible to update the auto-builder to allocate more memory and/or set the --oom-kill-disable flag (as long as you are also limiting memory via a -m flag)?

jmandel commented on June 20, 2024

Thanks for the detailed report! Just to get started, do you know how much memory the job is expected to take? And when it gets killed, is it during the Java build process or during a Jekyll invocation (I was assuming the former)? Also, is the 21-minute build time consistent with what you see when you run locally as a normal process?

Things we'll want to investigate:

  1. Is our default container currently not buildable from scratch?
  2. Can we increase the memory limits in our Kubernetes deployment to fix the errors you're seeing right now?
  3. Can we ensure that proper errors are surfaced when the container runs out of memory? (One possible approach is sketched below.)
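
For item 3, one way to surface this might look roughly like the following wrapper around the build step (a sketch only -- the publisher invocation and flags here are placeholders, not the actual auto-builder code):

# Hypothetical build wrapper; exit status 137 = 128 + SIGKILL,
# which is what the kernel's OOM killer produces.
java -Xmx2g -jar publisher.jar -ig ig.json
status=$?
if [ "$status" -eq 137 ]; then
  echo "Build was killed (exit 137) -- likely out of memory" >&2
fi
exit "$status"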

cmoesel commented on June 20, 2024

I'm not sure how much memory it needs. As noted above, I got it working with -m 3G and --oom-kill-disable -- but since it's not great to disable the OOM killer, I'll experiment to see if there is a value for -m that allows it to complete.

A few odd things to note:

  • The full IG build takes about 5 minutes on my Mac when I don't use a Docker container (vs. 21 minutes when I got it to work in the container). This is with -Xms2g -Xmx2g allocating 2GB to Java.
  • Today I got the Docker build to work in 15 minutes without specifying a memory limit (-m) when running the container -- which is odd, since that failed yesterday.

According to the logs, it looks like it's dying in the Java process, but since I think I've established that 2GB is enough for Java (on my own Mac), I wonder if it's some other process eating up memory and/or not releasing it?

cmoesel commented on June 20, 2024

Based on experimentation, I can keep the Java memory allocation at 2GB, but I need to increase the container allocation to 8GB. In other words, this seems to work:

docker run -it -m 8GB --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher

Is it possible to increase the allocation on the container by that much in the hosted services?

jmandel commented on June 20, 2024

We can certainly make this change. At the same time, I would love to understand more about what's happening with memory. Do you think your Java process is using only two gigabytes? If so, is Jekyll bursting higher?

cmoesel commented on June 20, 2024

I guess I don't know for sure. I'm assuming that since I'm passing in -Xms2g -Xmx2g, Java is staying within those bounds -- otherwise I think we'd see Java throw an OOM exception. That said, it seems suspicious that some other process would be using almost 6GB on top of Java.

I'll try running it again and using docker stats to see what that reports for memory usage and by what processes. (This is probably something I should have done in the first place).
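
For reference, roughly the commands I have in mind (docker stats only reports container-level totals, so for a per-process view I'll use something like top inside the container):

docker stats running-ig-publisher
docker exec -it running-ig-publisher top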

cmoesel commented on June 20, 2024

Here is a very weird wrinkle in this story. As I was trying to profile my container to see memory usage, I noticed that no matter what I passed in via -m, my container never got more than 2GB of memory. I also couldn't reproduce the success I saw this morning with -m 8G.

It turns out that Docker for Mac defaults to an overall 2GB memory max unless you change the configuration in the Docker preferences. So all along, I never got more than 2GB of memory in a container. This means that my success this morning was due to some stroke of luck in which I didn't exceed 2GB! It also means that my actual memory usage is probably much lower than 8GB (phew).
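
As a sanity check, something like this should report how much total memory the Docker daemon actually has available (value is in bytes; the exact field/format may vary by Docker version):

docker info --format '{{.MemTotal}}'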

I've just bumped up my Docker for Mac memory preferences, so I'll try this all again and report back.

cmoesel commented on June 20, 2024

Wow. Actually increasing my container memory made a big difference. Here's how I ran it:

  • Java command (inside container) was still given -Xms2g -Xmx2g arguments
  • docker run command was given 4GB allocation (via -m 4G)

The result is:

  • Published successfully in ~5 minutes (same as on my host without Docker)
  • Jekyll went by quickly (in top), but it looked like it used about 166 MB of memory at its peak
  • The Java process went up to 2.90 GB of memory while checking the output HTML

I was reminded that -Xmx only affects the max heap size; the Java process will use additional memory on top of that (see here).
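
If we want to see where the memory beyond the heap is going, the JVM's Native Memory Tracking can break it down. A sketch -- the publisher arguments and pid are placeholders:

java -XX:NativeMemoryTracking=summary -Xms2g -Xmx2g -jar publisher.jar -ig ig.json
jcmd <publisher-pid> VM.native_memory summary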

To see how close we are getting to the actual Java heap max (2 GB), I tried running it with -Xmx1500m. It ran for a very long time and then finally crashed with java.lang.OutOfMemoryError: Java heap space. This means we may not have much headroom with a 2GB heap.

I did one more run, setting -Xmx2500m and it completed in 4m33s.

@jmandel -- here are my recommendations based on these findings:

  • Within the container, we should run publisher.jar with -Xmx2500m. This gives us a little headroom and allows for faster processing times.
  • When launching the container, we should set the memory limit to 4 GB (-m 4G). Since we're using 2.9 GB in the Java process, I think 3 GB would be cutting it too close. (Both changes are sketched concretely below.)
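
Concretely, that would look something like this (the java line assumes the container's entrypoint invokes publisher.jar directly -- adjust to however the image actually launches it):

java -Xmx2500m -jar publisher.jar -ig ig.json
docker run -it -m 4G --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher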

Let me know what you think -- and when you think we might be able to have these changes deployed.

jmandel commented on June 20, 2024

Thanks! Really appreciate the level of detail and suggestions here -- it's a huge help.

Currently we're assigning 2g to the Java process and 3200Mi as a memory limit for the Kubernetes pod.
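
For context, an illustrative fragment of the pod spec -- only the numbers are the real ones; the container name and the env-var mechanism for passing the heap flags are assumptions:

containers:
- name: ig-builder              # hypothetical name
  env:
  - name: JAVA_OPTS             # assumed mechanism for passing heap flags to the jar
    value: "-Xms2g -Xmx2g"
  resources:
    limits:
      memory: "3200Mi"          # current pod memory limit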

I'll bump these up today per your suggestions and note here when I've done so.

jmandel commented on June 20, 2024

With 4Gi on the pod and -Xms3g -Xmx3g on the Java process, I'm still seeing the Java process exit with status 137 on HL7/us-breastcancer (shortly after Jekyll runs). Now bumping Java memory up to -Xms3750m -Xmx3750m...

jmandel commented on June 20, 2024

This should be working now!

An underlying issue was actually brittleness in looking for the qa.html file -- we expected to find it at the root of the output from the build tool, but here there's a guide directory. Is this an intentional choice? I think we'll need some way to determine how to find the right qa file after a build, if the convention isn't "grab the root-level file".

cmoesel commented on June 20, 2024

Hi @jmandel. Great! I can't wait to try it out! Regarding the guide directory, I'll take a look to see if I can remember how/why we did it that way. If it's not standard practice, I'd like to conform to whatever is.

cmoesel commented on June 20, 2024

Would you be able to find the qa.html by looking at the ig.json file's paths.output? In our ig.json, we have it set to output/guide/, so that seems to indicate where the qa.html file would be.
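
For illustration, the relevant fragment of our ig.json looks roughly like this (other fields omitted):

{
  "paths": {
    "output": "output/guide/"
  }
}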

That said, I just noticed that the IG Publisher doc documents this field with a special disclaimer for auto-build:

"output" : "[output]", // optional; default = "output" (and must be output for auto-build)

Oops.

I can probably change ours to "output". IIRC, I put that second-level folder in there because it's a real pain in the neck to find the index.html file in an output folder containing 2000+ files -- so I put the thousands of files in a sub-folder and have a simple redirecting index.html alongside it (sketched below). Perhaps I can find another way to make it easy for people running the build locally to quickly open the index.
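
The redirect stub I mean is just something like this, sitting at output/index.html next to the guide/ sub-folder (hypothetical -- the real file may differ slightly):

<html>
  <head><meta http-equiv="refresh" content="0; url=guide/index.html"></head>
  <body><a href="guide/index.html">Open the guide</a></body>
</html>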

cmoesel commented on June 20, 2024

@jmandel -- I just pushed to my repo and got my first successfully triggered auto-build! Thank you so very much for trudging through this with me and spending the time to get it working. It is very much appreciated!

jmandel commented on June 20, 2024

Sure thing! Really appreciate your help investigating along the way. Looking ahead, there's no reason the handling of the configuration JSON file needs to be so brittle, and it would be good to tackle that.
