Comments (15)
BLUF
I think the container that does the build is running out of memory and being killed before it finished.
I think this can be fixed by increasing memory available to the container and/or using the --oom-kill-disable
options when running the container (but only recommended in conjunction with a -m
flag to limit total memory).
Details
I've been trying to determine what's going on by attempting to reproduce using Docker containers (since that is how the auto-build works). I tried to do a variation of the ig-build image in this repo, but I could not get it to build. In short, the container insisted on installing ruby 2.1, but jekyll-asciidoc required a later version of ruby. I tried downgrading jekyll-asciidoc, but that didn't work either.
In the end, I ended up making an even-more-modified container for testing (see Dockerfile). When I ran the container, I mapped my IG source to /usr/src/input
:
docker run -it --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher
This ran for a while but ultimately was killed. Here's the last few lines of the output:
Generating Outputs in /usr/src/ig/output/guide/ (00:02:39.0195sec)
Generating Summary Outputs (00:05:48.0216sec)
Jekyll: Source: /usr/src/ig/temp (00:05:49.0611sec)
Jekyll: Generating... (00:05:49.0612sec)
Jekyll: done in 8.509 seconds. (00:05:58.0123sec)
Checking Output HTML (00:06:02.0670sec)
found 2946 files
Killed
$
After some research, I learned that
By default, kernel kills processes in a container if an out-of-memory (OOM) error occurs.
Based on that documentation, I changed my command to this:
docker run -it -m 3G --oom-kill-disable --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher
This time, the build succeeded (although it took 21 minutes to complete).
@jmandel or @grahamegrieve -- do you think that it's likely that this is the problem? And if so, might it be possible to update the auto-builder to allocate more memory and/or set the --oom-kill-disable
flag (as long as you are also limiting memory via a -m
flag)?
from auto-ig-builder.
Thanks for the detailed report! Just to get started, do you know how much memory the job is expected to take? And when it has been killed is it doing the java build process or during a Jekyll invocation (I was assuming the former)? Also is the 21 minute build time consistent with what you see when you run locally using a normal process?
Things we'll want to investigate:
- Is our default to container currently not buildable from scratch?
- Can increase the memory limits in our kubernetes deployment to fix the errors you seeing right now
- Can we ensure that proper errors are generated when the container runs out of memory
from auto-ig-builder.
I'm not sure how much memory it needs. As noted above, I got it working with -m 3G
and --oom-kill-disable
-- but as it's not great to disable the oom killer, I'll experiment to see if there is a value for -m
that allows is to complete.
A few odd things to note:
- The full IG build takes about 5 minutes on my Mac OS X if I don't use a Docker container (vs 21 minutes when I got it to work in the container). This is with
-Xms2g -Xmx2g
allocating 2G to Java. - Today I got the docker build to work in 15m when I didn't specify a memory limit
-m
when running the container (which is odd since yesterday that failed).
According to the logs, it looks like it's dying in the Java process, but since I think I've established that 2GB
is enough for Java (on my own Mac), I wonder if it's some other process eating up memory and/or not releasing it?
from auto-ig-builder.
Based on experimentation, I can keep the Java memory allocation to 2GB
, but I need to increase the container allocation to 8GB
. In other words, this seems to work:
docker run -it -m 8GB --rm -v /path/to/my/guide:/usr/src/input --name running-ig-publisher ig-publisher
Is it possible to increase the allocation on the container by that much in the hosted services?
from auto-ig-builder.
We can certainly make this change. I would love to understand more about what's happening with memory at the same time. Do you think your Java process is using only two gigabytes? In which case is Jekyll bursting higher?
from auto-ig-builder.
I guess I don't know for sure. I'm assuming that since I'm passing in -Xms2g -Xmx2g
that Java is staying within those bounds -- otherwise I think we'd see Java throw an OOM exception. That said, it seems suspicious that some other process would be using almost 6GB on top of Java.
I'll try running it again and using docker stats
to see what that reports for memory usage and by what processes. (This is probably something I should have done in the first place).
from auto-ig-builder.
Here is a very weird wrinkle in this story. As I was trying to profile my container to see memory usage, I noticed that no matter what I pass in via -m
, my container was never getting more than 2GB memory. I also couldn't reproduce the success I saw this morning with -m 8G
.
It turns out that Docker for Mac defaults to an overall 2GB memory max unless you change the configuration in the Docker preferences menu. So all along I have never got more than 2GB
memory in a container. This means that my success this morning was due to some stroke of luck in which I didn't exceed 2GB! It also means that my actual memory usage is probably much lower than 8GB (phew).
I've just bumped up my Docker for Mac memory preferences, so I'll try this all again and report back.
from auto-ig-builder.
Wow. Actually increasing my container memory made a big difference. Here's how I ran it:
- Java command (inside container) was still given
-Xms2g -Xmx2g
arguments docker run
command was given 4GB allocation (via-m 4G
)
The result is:
- Published successfully in ~5 minutes (same as my host without docker)
- Jekyll went by quick (in
top
) but it looked like it used about 166 MB memory at its peak - Java process went up to 2.90 GB memory when checking output HTML
I was reminded that Xmx
only affects max heap size, but the Java process will use additional memory on top of that (see here).
To see how close we are getting to the actual java heap size max (2 GB), I tried running it with -Xmx1500m
. It ran a very long time and then finally crashed with java.lang.OutOfMemoryError: Java heap space
. This means we may not have much headroom with a 2GB heap.
I did one more run, setting -Xmx2500m
and it completed in 4m33s.
@jmandel -- here are my recommendations based on these findings:
- Within the container, we should run the publisher.jar with
-Xmx2500m
. This gives us a little headroom and allows for faster process times. - When launching the container, we should set the memory limit to 4 GB (
-m 4G
). Since we're using 2.9 GB in the java process, I think 3 GB would be cutting it too close.
Let me know what you think -- and when you think we might be able to have these changes deployed.
from auto-ig-builder.
Thanks! Really appreciate the level of detail and suggestions here -- it's a huge help.
Currently we're assigning 2g to the Java process and 3200Mi as a memory limit for the Kubernetes pod.
I'll bump these up today per your suggestions and note here when I've done so.
from auto-ig-builder.
With 4Gi
on the pod and -Xms3g -Xmx3g
on the java process, I'm still seeing the java process with exist status 137 on HL7/us-breastcancer
(shortly after jekyll runs). Now bumping Java memory up to -Xms3750m -Xmx3750m
...
from auto-ig-builder.
This should be working now!
An underlying issue was actually brittleness in looking for the qa.html
file -- we expected to find it as the root of the output from the build tool, but here there's a guide
directory. Is this an intentional choice? I think we'll need some way to determine how to find the right qa file after a build, if the convention isn't "grab the root-level file".
from auto-ig-builder.
Hi @jmandel. Great! I can't wait to try it out! Regarding the guide
directory, I'll take a look to see if I can remember how/why we did it that way. If it's not standard practice, I'd like to conform to whatever is.
from auto-ig-builder.
Would you be able to find the qa.html
by looking at the ig.json
file's paths.output
? In our ig.json
, we have it set to output/guide/
, so that seems to indicate where the qa.html
file would be.
That said, I just noticed now that in the IG Publisher doc, it documents this field w/ a special disclaimer for auto-build:
"output" : "[output]", // optional; default = "output" (and must be output for auto-build)
Oops.
I can probably change ours to "output". IIRC, I think I put that second-level folder in there because it's a real pain in the neck to find the index.html
file in the output folder containing 2000+ files -- so I put the 1000s of files in a sub-folder and have a simple redirecting index.html along-side it. Perhaps I can find another way to make it easy for people running the build locally to quickly open the index.
from auto-ig-builder.
@jmandel -- I just pushed to my repo and got my first successfully triggered auto-build! Thank you so very much for trudging through this with me and spending the time to get it working. It is very much appreciated!
from auto-ig-builder.
Sure thing! Really appreciate your help investigating along the way. Looking ahead, there's no reason that handling of the configuration Json file needs to be so brittle, and it would be good too tackle that.
from auto-ig-builder.
Related Issues (17)
- Can't "docker build" ig-builder
- Overview page showing all built IGs HOT 9
- Query - is branch aware building possible? HOT 6
- Redeploy to reclaim space from deleted branches HOT 1
- Auto build ignored HOT 5
- Document naming requirements for branches, including the fact that slashes are not allowed HOT 1
- Add ability to ignore commits from the gh-pages branch
- I want to remove fhir-org-starter-project from my GCP HOT 1
- GitLab sends via text/plain HOT 5
- Propagate failure if upload is truncated
- Is `master` branch required? HOT 1
- exempt IHE from CI purge HOT 2
- Update to non-deprecated node runtime HOT 1
- Remove container-deploy-trigger
- Performance issues on WSL HOT 3
- Catch and report when git clone fails HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from auto-ig-builder.