Comments (22)

pbauman commented on August 21, 2024

I should mention, this behavior is repeatable. I.e. I can kill the running client and start a new one. First time it runs the job, it has the correct BUILD_ROOT. The next time, it uses the "old" one. Clearly it's being cached somewhere, but not sure where.

from civet.

brianmoose commented on August 21, 2024

I take it you are just running client.py?
I don't think it caches BUILD_ROOT anywhere...where are you setting it? On the command line or in .bashrc? Let me know what your command line is and I can try to reproduce it.

pbauman commented on August 21, 2024

I'm just setting it in the environment: export BUILD_ROOT=<path>

The command I use for the client is:

./client.py --url https://femputer.eng.buffalo.edu:8000 --build-key <build_key> --configs linux-gnu --name pbauman_client --insecure &

brianmoose commented on August 21, 2024

Hmm, I can't seem to reproduce it.
Where are you seeing it using the wrong BUILD_ROOT?
If you are using the scripts from civet_example_recipes then it should display what it thinks the build root is in the header of each step. Is it wrong there as well? Is it right at the beginning (like when you clone/update your repository)? Or after, like when building and it is referencing bad library paths or object files?

Any custom scripts in civet_recipes that could be the culprit?

pbauman commented on August 21, 2024

I see it as an error in the first step of the recipe, in particular when it's calling init_script I believe.

https://femputer.eng.buffalo.edu:8000/job/3/

/tmp/tmpG7CBZ2: line 123: cd: /femptuer/pbauman/civet_build_testing_root: No such file or directory
ERROR: exiting with code 1

I grepped through the scripts and I don't see any of them setting BUILD_ROOT:

[01:06:58][pbauman@femputer:/femputer/pbauman/civet_recipes/scripts][master] $ grep BUILD_ROOT *.sh
cleanup_build_testing_dir.sh:#REQUIRED: BUILD_ROOT
cleanup_build_testing_dir.sh:SUBDIR=$BUILD_ROOT/$FEMPUTER_BUILD_DIRNAME
cleanup_build_testing_dir.sh:cd "$BUILD_ROOT"
functions.sh:# Bad things can happen if BUILD_ROOT is not set
functions.sh:if [ -z "$BUILD_ROOT" ]; then
functions.sh:  echo "You need to set BUILD_ROOT"
functions.sh:  local b="$BUILD_ROOT/"
functions.sh:  local cwd=${p/#$b/BUILD_ROOT/}
functions.sh:  printf "Build Root: $BUILD_ROOT\n"
functions.sh:  export REPO_DIR=$BUILD_ROOT/$APPLICATION_NAME
functions.sh:    cd "$BUILD_ROOT"
make_build_testing_dir.sh:#REQUIRED: BUILD_ROOT
make_build_testing_dir.sh:SUBDIR=$BUILD_ROOT/$FEMPUTER_BUILD_DIRNAME
run_cmd.sh:REPO_DIR=$BUILD_ROOT/$APPLICATION_NAME

I only have one recipe at the moment and it doesn't set BUILD_ROOT.

Could this be cached somehow on the GitHub side? (I'm sure that's a stupid suggestion.)

pbauman commented on August 21, 2024

Annnnnnd now the server isn't responding. Will report back in a few mins.

brianmoose commented on August 21, 2024

OK, I guess you probably want /femputer/pbauman/civet_build_testing_root.
I am not seeing anywhere where this could be cached. The client gets its scripts from the CIVET server and the client should only be accessing BUILD_ROOT from its local environment. I guess it could get overwritten locally but it would have to be in the last step of whatever recipe you are running. Do you set BUILD_ROOT in one of your recipe .cfg files?
Could you send the results of env from the terminal where you are launching the client?

pbauman commented on August 21, 2024

> OK, I guess you probably want /femputer/pbauman/civet_build_testing_root

Correct.

> I am not seeing anywhere where this could be cached. The client gets its scripts from the CIVET server and the client should only be accessing BUILD_ROOT from its local environment.

Does the server maybe cache any of the variables anywhere that would be read by the client?

> I guess it could get overwritten locally but it would have to be in the last step of whatever recipe you are running. Do you set BUILD_ROOT in one of your recipe .cfg files?

No. I only have one .cfg file in recipes. (https://github.com/ubaceslab/civet_recipes/tree/master/recipes)

I've tried the following.

  1. Killed the running client and shut down that terminal.
  2. Killed the running server and shut down that server.
  3. Started a new terminal and started a new server in that terminal.
  4. Started a new terminal and set a valid value of BUILD_ROOT. Started a new client in that terminal.
  5. Invalidated job. It reran with correct BUILD_ROOT up to the point of failure (Configure step, still trying to figure out what's wrong with my shell script that's causing a -15 exit code, no output produced).
  6. Made changes to script, invalidated job.
  7. Now failing because can't cd to old BUILD_ROOT from the terminal that died in step 2 (that had been unset and reset to a valid value before that).
  8. Invalidated again. Now getting exit -15 on Bootstrap step (this is the first time this happened)
  9. Invalidated again. Now getting exit -15 on Build step...
  10. Invalidated again. Now failing because of bad BUILD_ROOT (the misspelled one that should've gone away...)
  11. Invalidated again. Now getting exit -15 on Bootstrap step.

There were no changes to recipe or scripts between steps 7, 8, 9, 10, and 11, just invalidating.

> Could you send the results of env from the terminal where you are launching the client?

This is from the new terminal started in step 4.
env_new.txt

pbauman commented on August 21, 2024

Yeah, I don't even. I'm getting random behavior with each invalidate. Any clues about what error code -15 could be?

The nuclear option is rebooting the server, wiping out the current civet install, and starting from scratch, but it would be nice to try and understand what's causing this behavior.

brianmoose commented on August 21, 2024

Nothing seems wrong with your environment.
-15 means the process was killed by a signal (signal 15, i.e. SIGTERM).
A quick, messy debug option would be to sprinkle echo $BUILD_ROOT through your scripts in various spots to try to track down when it actually changes.
I am going to clone your repo and go through your scripts real quick to see if anything pops out.
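
To illustrate the -15 status mentioned above: Python's subprocess module reports a child killed by signal N as return code -N, while a shell reports the same event as 128+N. The sketch below is not taken from CIVET (which presumably launches steps from Python); it only kills a throwaway sleep process with SIGTERM and decodes the resulting status.

```shell
#!/usr/bin/env bash
# Start a throwaway background process, then terminate it with SIGTERM (signal 15).
sleep 30 &
pid=$!
kill -TERM "$pid"

# `wait` returns the child's exit status; for a death by signal the shell
# encodes it as 128 + the signal number.
status=0
wait "$pid" || status=$?

signal_number=$((status - 128))
# `kill -l <status>` translates such a status back into a signal name.
signal_name=$(kill -l "$status")
echo "status=$status signal=$signal_number name=$signal_name"
```

With bash on Linux this should report status 143, signal 15 (TERM), which corresponds to the -15 that a Python parent would see. It says something sent SIGTERM to the step, but not who sent it.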

pbauman commented on August 21, 2024

I put a print statement at the top of functions.sh and it looks like BUILD_ROOT is corrupted right from the beginning:

/femptuer/pbauman/civet_build_testing_root
Date: Mon Dec 12 16:16:56 EST 2016
Machine: femputer.eng.buffalo.edu
LSB Version:	:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	CentOS
Description:	CentOS release 6.8 (Final)
Release:	6.8
Codename:	Final
Cannot find MOOSE Package version.
Build Root: /femptuer/pbauman/civet_build_testing_root
Trigger: Pull request
Step: Load GRINS Modules (0)

pbauman commented on August 21, 2024

I'm using Lmod. Does Lmod do some weird caching business? Is each step in the recipe starting a new environment?

pbauman commented on August 21, 2024

There must be something funky happening between Lmod and the shell. I'm printing out env at the top of functions.sh and it's (occasionally...) showing the bad BUILD_ROOT value.
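
A slightly more structured version of this kind of tracing might look like the sketch below. The helper name trace_build_root is hypothetical and is not part of CIVET or civet_recipes; it prints BUILD_ROOT together with the process IDs and a check on whether the directory exists, so an intermittent bad value can be tied to a specific process.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CIVET or civet_recipes): print the current
# BUILD_ROOT with enough context to tell which process saw which value.
trace_build_root() {
  local label="$1"
  local state="missing"
  # Flag values that do not point at an existing directory, e.g. a misspelled path.
  if [ -d "${BUILD_ROOT:-}" ]; then
    state="exists"
  fi
  printf '[trace %s] pid=%s ppid=%s BUILD_ROOT=%s (%s)\n' \
    "$label" "$$" "$PPID" "${BUILD_ROOT:-<unset>}" "$state"
}

# Example calls, as they might appear at the top and bottom of a step script.
trace_build_root "start of step"
trace_build_root "end of step"
```

Comparing the ppid values across steps would show whether the steps with the bad value were all launched by the same (possibly stale) client process.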

brianmoose commented on August 21, 2024

I have seen weird Lmod caching behavior with environment variables but it is usually solved by quitting the current terminal and starting again.

I notice that you are loading modules in the scripts. I don't think that will work like you think it should. Each step is run in its own process so it shouldn't carry over (which is why I find it weird that BUILD_ROOT is getting set by a child process).
If your recipes need a certain environment then the environment should be set up the way you want it in the terminal that is going to run the client. That way the environment is inherited from the parent process (the client).
If you need multiple different environments then you can use multiple build configs but then you might want to look at our inl_client.py which is set up to handle multiple different module environments.
Not the cause here probably, but I also noticed that you don't clear out grins-devel before you start everything. You will probably want to do that since you can't rely on the cleanup step always executing.
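
The process-isolation point above can be checked with a small standalone sketch. It does not use CIVET at all: the parent shell stands in for the client, each bash -c call stands in for one step, and the paths are placeholders.

```shell
#!/usr/bin/env bash
# The parent (standing in for the client) exports a value before running any steps.
export BUILD_ROOT="/path/set/in/parent"

# Step 1 runs in its own child shell and changes BUILD_ROOT there.
step1_view=$(bash -c 'export BUILD_ROOT="/path/changed/in/step1"; echo "$BUILD_ROOT"')

# Step 2 is a separate child; it inherits the parent's value, not step 1's change.
step2_view=$(bash -c 'echo "$BUILD_ROOT"')

echo "step 1 saw: $step1_view"
echo "step 2 saw: $step2_view"
echo "parent has: $BUILD_ROOT"
```

Step 2 and the parent both still see the parent's value, which is why a wrong BUILD_ROOT inside a step points at the environment of whichever process launched that step rather than at an earlier step.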

pbauman commented on August 21, 2024

> I have seen weird Lmod caching behavior with environment variables but it is usually solved by quitting the current terminal and starting again.

Yes, this was really perplexing.

> I notice that you are loading modules in the scripts. I don't think that will work like you think it should. Each step is run in its own process so it shouldn't carry over (which is why I find it weird that BUILD_ROOT is getting set by a child process).
> If your recipes need a certain environment then the environment should be set up the way you want it in the terminal that is going to run the client. That way the environment is inherited from the parent process (the client).

Thanks for the info. I'd started inferring this through my trial and error. :)

> If you need multiple different environments then you can use multiple build configs but then you might want to look at our inl_client.py which is set up to handle multiple different module environments.
> Not the cause here probably, but I also noticed that you don't clear out grins-devel before you start everything. You will probably want to do that since you can't rely on the cleanup step always executing.

I'd planned on controlling the environment with the scripts+module system - any pointers to what I should look at in the inl_client.py?

pbauman commented on August 21, 2024

After rebooting the node and restarting the server and client, everything seems to be working now: I've invalidated twice, I'm not getting -15 exit messages or invalid BUILD_ROOT values. Just need to fix LD_LIBRARY_PATH in one of the modules (because PETSc changed stuff again).

I guess this is good, but it would be nice to really understand what happened. Is there any chance that screen would be interfering? I was running the CIVET server and client within a screen session.

brianmoose commented on August 21, 2024

I don't see how screen could have affected it but then I can't really see how Lmod screwed things up so badly. I bet it has something to do with Lmod. It seems to do some magic under the hood that I haven't gotten around to trying to figure out.

Are you going to be using multiple different module configurations? If not then you could use something like what is in civet/client/scripts/control.sh to set up the environment and run the client. We do that on a couple machines that only have one configuration.

Regardless, I will update the wiki on how to set up the inl_client.py
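
As a rough idea of what a launcher that sets up the environment and then runs the client could check first, here is a minimal sketch. It is not the contents of civet/client/scripts/control.sh; the function name require_build_root is hypothetical, and the commented-out launch line simply reuses the client flags quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical launcher sketch; it is NOT the contents of civet/client/scripts/control.sh.

# Refuse to continue unless BUILD_ROOT is set and names an existing directory.
# This would catch a misspelled path before any job runs.
require_build_root() {
  if [ -z "${BUILD_ROOT:-}" ]; then
    echo "BUILD_ROOT is not set" >&2
    return 1
  fi
  if [ ! -d "$BUILD_ROOT" ]; then
    echo "BUILD_ROOT does not exist: $BUILD_ROOT" >&2
    return 1
  fi
  echo "BUILD_ROOT ok: $BUILD_ROOT"
}

# Typical use in a launcher (left commented out; <build_key> is a placeholder):
#
#   export BUILD_ROOT=/femputer/pbauman/civet_build_testing_root
#   require_build_root || exit 1
#   exec ./client.py --url https://femputer.eng.buffalo.edu:8000 \
#     --build-key <build_key> --configs linux-gnu --name pbauman_client --insecure

# Demonstration with a misspelled path like the one in this issue.
BUILD_ROOT=/femptuer/pbauman/civet_build_testing_root
require_build_root || echo "launcher would stop here"
```

Failing fast in the launcher means a typo in BUILD_ROOT shows up once, at client start, instead of as a cd error in the first step of a job.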

pbauman commented on August 21, 2024

> I don't see how screen could have affected it but then I can't really see how Lmod screwed things up so badly. I bet it has something to do with Lmod. It seems to do some magic under the hood that I haven't gotten around to trying to figure out.

OK, hopefully I won't make any more typos. :P Will report in if I ever figure it out (unlikely).

> Are you going to be using multiple different module configurations?

Yes? I guess I'm not quite sure what you mean. I plan to have several recipes that will build with different compiler options on the same PR, e.g. dbg vs. opt libMesh. I'll have different modules for the dbg build vs. the opt build. But I can easily just load the different modules in each of those recipes.

> If not then you could use something like what is in civet/client/scripts/control.sh to set up the environment and run the client. We do that on a couple machines that only have one configuration.

OK, thanks, I'll have a look.

> Regardless, I will update the wiki on how to set up the inl_client.py

Awesome, thanks!

brianmoose commented on August 21, 2024

So you have different modules for debug vs opt? We don't do that much here, we just have different compiler targets or configuration targets. Like linux-gcc, linux-clang, linux-intel, etc. Each one of those loads a different set of modules. But we build libmesh in dbg/opt every time.
For your case you could have a linux-gcc-opt and linux-gcc-dbg and use the inl_client.py to poll for jobs on those two configurations.
However, loading modules in an individual step should work as well. You will just need to make sure the correct modules are loaded in each step (make sure to do a purge first!). Going this route you probably wouldn't need the inl_client.py.
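
A per-step preamble along those lines might look like the sketch below. The function name load_step_modules and the module names are hypothetical placeholders, not the real module files. It assumes the module shell function provided by Lmod is available in the step's shell and skips gracefully if it is not.

```shell
#!/usr/bin/env bash
# Hypothetical per-step preamble; the module names below are placeholders.
load_step_modules() {
  # Lmod provides `module` as a shell function; skip gracefully where it is absent.
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not available; skipping" >&2
    return 0
  fi
  module purge       # start from a clean module environment
  module load "$@"   # then load exactly what this step needs
  module list        # record what is loaded in the step's log
}

# Example call at the top of a debug-build step.
load_step_modules example-compiler example-libmesh-dbg
```

Because each step runs in its own process, this preamble would have to be repeated (or sourced from a shared file) at the top of every step that needs the modules.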

brianmoose commented on August 21, 2024

I updated the wiki for the INL client. Please let me know if there is something that isn't clear.

pbauman commented on August 21, 2024

Sorry I was slow here (travel+end of semester). This is all clear for me now and my server and client are running smoothly so far. Thanks very much for the very quick and helpful comments and updates!

brianmoose commented on August 21, 2024

Great! Let us know if you run into any problems or need something added.
