Comments (9)
Hi Joey,
A recent Spack update is causing this issue in our build script. To get you unstuck until we update the build script, you can add the following line right before this one:
spack config update packages -y
Hopefully that helps! Either way, we can leave this issue open until this is fixed.
-Pier
from lbann.
Okay. Thank you! I'll give that a shot.
How do I start the build after modifying 'build_lbann.sh'? Activate the spack environment then rerun the 'build_lbann.sh' script?
from lbann.
Yep, you should just need to re-run build_lbann.sh the same as you did the first time.
from lbann.
That took the build further, but now 'nccl' is failing to build through spack. Have y'all seen that error before?
from lbann.
Yeah, NVIDIA/nccl#835 is the root cause of that. Short of convincing LC to install modern compilers on the system, the best fix is to force NCCL to build at a lower version. For the LBANN build script, add ^nccl@<version> (where <version> is an older NCCL release) to the end of the spec you pass to the build script.
from lbann.
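For later readers, the spec change described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's documented procedure: `BASE_SPEC`, `NCCL_VERSION`, and `FULL_SPEC` are placeholder names, `+cuda` stands in for whatever spec you already use, and `X.Y.Z` is a placeholder rather than a real version number. The only Spack-specific piece is the `^package@version` form, which constrains a dependency of the package being built. Check the build script's own help output for how it expects the spec to be passed.

```shell
#!/bin/sh
# Sketch: append an NCCL version constraint to an existing Spack spec string.
# All values below are placeholders, not values taken from this thread.

BASE_SPEC="+cuda"        # hypothetical: the spec you already pass to the build script
NCCL_VERSION="X.Y.Z"     # hypothetical: substitute an older NCCL release here

# In a Spack spec, "^pkg@version" pins a dependency to a given version.
FULL_SPEC="${BASE_SPEC} ^nccl@${NCCL_VERSION}"

# Print the result instead of invoking the build, so this is safe to run anywhere.
echo "spec to pass to the build script: ${FULL_SPEC}"
```

Quoting the final spec when you pass it on keeps the shell from splitting or reinterpreting the `^` and `@` characters.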
Haha. The compilers strike again. Thanks a ton!
from lbann.
Thanks again for the help y'all! I got things up and running on lassen. Closing the issue.
from lbann.
@pate7 Great! Just an FYI, NCCL 2.18.3 was recently released, which fixes this. Not sure if Spack has it yet.
from lbann.
Cool. I may take a look at that. Thanks for the heads up!
BTW, spack only has up to version 0.102 of LBANN available. I'm not sure if this is the place to raise that issue or if we need to wait for spack to catch up to v0.103
from lbann.