Comments (21)
It appears to stem from a torch.cudnn package update unrelated to JetPack 2.3, which added a directive to compile in parallel with make -j3. I'll try patching it back to -j1 to reduce memory usage.
from jetson-reinforcement.
You don't need to recompile the L4T kernel; I followed these standard instructions: http://askubuntu.com/a/33703
To tell whether it worked, check that swap memory appears in /proc/meminfo.
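Here is a minimal Python sketch of that /proc/meminfo check. It assumes the standard Linux layout, where swap appears on a line like `SwapTotal:  12582908 kB`. The function names are hypothetical helpers and are not part of jetson-reinforcement or L4T.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:      12582908 kB"
        key, _, rest = line.partition(":")
        if key.strip() == "SwapTotal":
            fields = rest.split()
            return int(fields[0]) if fields else 0
    return 0


def swap_is_enabled(path: str = "/proc/meminfo") -> bool:
    """True if the kernel currently reports any swap space."""
    with open(path) as f:
        return swap_total_kb(f.read()) > 0
```

After enabling swap (e.g. with `swapon`), `swap_is_enabled()` should return True. A SwapTotal of 0 kB means no swap is active.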
@AerialRobotics yeah, I noticed this too and have been meaning to dig into the reason. My first guess is overfitting; I was going to try saving the model when it reaches a 90%+ win rate, then switching to inference.
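One way to sketch the "save at 90%+, then switch to inference" idea is shown below. It assumes each game reports a win or a loss and that the win rate is measured over a rolling window of recent games. `WinRateCheckpointer`, the 100-game window and the 0.9 threshold are illustrative assumptions, not the project's code. Saving the network and disabling training are left to the caller.

```python
from collections import deque


class WinRateCheckpointer:
    """Hypothetical helper: track a rolling win rate and signal once when to save/freeze.

    Assumptions (not from jetson-reinforcement): each episode reports win/loss,
    the rate is computed over the last `window` episodes, and `threshold` is 0.9.
    """

    def __init__(self, window: int = 100, threshold: float = 0.9):
        self.results = deque(maxlen=window)  # 1 for a win, 0 for a loss
        self.threshold = threshold
        self.frozen = False  # once True, the caller should stop training and only run inference

    def win_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def record(self, won: bool) -> bool:
        """Record one episode; return True exactly once, when it's time to save the model."""
        if self.frozen:
            return False
        self.results.append(1 if won else 0)
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.win_rate() >= self.threshold:
            self.frozen = True
            return True
        return False
```

In the training loop, call `record(won)` after each game. When it returns True, save the model and stop updating it, so that later games run in inference only. Requiring a full window, rather than using the cumulative rate, avoids freezing on a lucky early streak.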
That file should have been created during the cmake configuration step. Can you check whether it exists in jetson-reinforcement/build/torch/include?
That file and the TH/THC subfolders aren't there, but everything else is.
I didn't see any error messages during the cmake configuration step. It took a good while, but nothing looked unexpected.
Actually, the root error is earlier, in the cmake phase (typing this in by hand since Firefox crashes on 2.3):
In function 'THByteVector_vectorDispatchInit'
simd.h:64:3 error: impossible constraint in 'asm'
asm volatile ( "cpuid\n\t"
lib/TH/CMakeFiles/TH.dir/build.make:350: recipe for target 'lib/TH/CMakeFiles/TH.dir/THVector.c.o' failed
The later error is likely a result of this failure.
See these open issues on the Torch7 GitHub repo regarding this problem: #762 and #766.
For now, I've checked in a temporary modification to CMakePreBuild.sh that checks out a slightly older commit of the Torch7 repo.
Thanks Dusty,
I removed the build directory and tried again. Compilation now fails with a slew of errors; running make afterwards gives the following error, but that's just a consequence:
.../c/deepQLearner.cpp:16:21 fatal error: THC/THC.h: no such file or directory
The earlier errors are variations of "cc: internal compiler error: Killed (program cc1plus)" (this is after a reboot): lots of programs get killed while it's trying to build. Usually this points to an OS-level issue such as limited RAM.
Firefox is getting killed too. Is there some resource limitation in 2.3 causing all these kills?
Otherwise, to rebuild, all I need to do is remove the build directory, correct?
Thanks -
Hi Dustin, any luck?
OK, I was able to get it building again by patching the cutorch rockspec in commit 62af1a1 to force a single build job (-j1) and by mounting swap (SATA or SD card). One of the cutorch tensor source files was consuming all the memory until the compiler was killed (at the time, the rest of the system was using ~800MB, i.e. the normal range). Attached is the build log of it building again with JetPack 2.3 / Ubuntu 16.04.
log.txt
The only way I was able to get cmake to work correctly was to do what Dustin recommended.
Doing this was a bit tricky, though. It requires modifying the cutorch-scm-1.rockspec file under build/cutorch/rocks and then making it read-only; otherwise it gets overwritten.
The lines I modified were:
Line 27, where I set jopts=1
Line 29, where I set jopts=1
It's likely not the best way to do it, but it got the job done.
Summary of steps:
1. Delete the build directory and recreate it.
2. From the build directory, git clone https://github.com/torch/cutorch
3. Modify cutorch-scm-1.rockspec.
4. Make the file read-only.
5. Run the cmake script.
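Steps 3 and 4 can be sketched in Python as follows. The sketch assumes the build command in build/cutorch/rocks/cutorch-scm-1.rockspec sets the job count through shell-style `jopts=...` lines, as described above. Line numbers (27 and 29) may differ between cutorch revisions, so the sketch matches on the assignment rather than the line number. The helper names are hypothetical.

```python
import os
import re
import stat

# Matches an assignment such as "jopts=$(getconf _NPROCESSORS_CONF)" at the start of a line.
# Assumption: the rockspec's build command sets the job count via a shell variable named jopts.
JOPTS_RE = re.compile(r"^([ \t]*)jopts=.*$", re.MULTILINE)


def force_single_job(rockspec_text: str) -> str:
    """Return the rockspec text with every jopts=... line replaced by jopts=1."""
    return JOPTS_RE.sub(r"\1jopts=1", rockspec_text)


def patch_rockspec(path: str) -> None:
    """Rewrite the rockspec in place, then make it read-only so it isn't overwritten."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(force_single_job(text))
    # Clear all write bits (owner, group, other) while keeping the existing read bits.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

Run `patch_rockspec("build/cutorch/rocks/cutorch-scm-1.rockspec")` after cloning cutorch and before running the cmake script. Because the file ends up read-only, re-running it as a normal user may fail when reopening the file for writing; restore write permission first if you need to re-patch.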
As for Firefox, my understanding is that it doesn't work with JetPack 2.3.
Edit: The latest commit accomplishes the same thing, but for some reason I didn't have to mount swap.
I did verify in the build log that cutorch was "Building on 1 cores" and that the CMake script change had taken effect. However, when compiling (I think) the TensorMathPointwise files, the OOM killer stepped in until I mounted swap.
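To confirm that the kernel's OOM killer, rather than a compiler bug, is what kills cc1plus, you can scan the kernel log for its "Killed process <pid> (<name>)" reports. This is a minimal Python sketch that assumes the log text comes from `dmesg`; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Kernel OOM-killer reports contain "Killed process <pid> (<comm>)" on both older kernels
# ("Killed process 1234 (cc1plus) total-vm:...") and newer ones
# ("Out of memory: Killed process 1234 (cc1plus) ...").
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for every process the OOM killer reports killing."""
    return [(int(pid), comm) for pid, comm in OOM_RE.findall(kernel_log)]
```

For example, `oom_victims(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)`. Reading the kernel log may require root on some systems. Entries for cc1plus or cudafe++ point to memory exhaustion, which swap or fewer parallel jobs should address.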
Since all of the files are self-contained within the build/ directory, torch etc. doesn't need to be compiled from source on each Jetson; the build could be copied around between devices running JetPack 2.3.
Also note that the CMake script currently rolls the torch repo back to a prior commit, due to the issues mentioned in the post above.
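This sketch packs the self-contained build/ directory on one Jetson and unpacks it on another, using Python's standard tarfile module. The function names are hypothetical. One caution from this thread: the pre-built archive only worked when extracted at the path it was built in, so the sketch assumes you extract to the same absolute location on the target device.

```python
import os
import tarfile


def archive_build_dir(build_dir: str, out_path: str) -> None:
    """Pack build_dir into a gzip tarball whose top-level entry is the folder's own name."""
    build_dir = os.path.abspath(build_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname stores "build/..." instead of the full absolute path; symlinks are kept as symlinks.
        tar.add(build_dir, arcname=os.path.basename(build_dir))


def extract_build_archive(archive_path: str, dest_dir: str) -> None:
    """Unpack the tarball into dest_dir (ideally the same repo path used on the build machine)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

For example, run `archive_build_dir("build", "build.tar.gz")` on the build machine, then run `extract_build_archive` on the target with the repo directory as `dest_dir`.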
Thanks guys. I tried it last night and still get kills. I had logged out of the desktop to save memory, but apparently I still need swap space.
Dusty, any issues with setting up swap? Searching, I'm not finding much beyond claims that I'd need to recompile the kernel. Has that changed?
OK, I set up a 32GB SD card as swap and an SSD for building. The box thrashes so badly that it's frozen up, but I was able to get a top screen showing tens or hundreds of cudafe, cc1plus and cudafe++ processes running. This appears to be during compilation of cutorch (e.g. THCTensorMathPointwise.cu). We'll see if it manages to get through.
OK, it got through the cmake! Now running make fails here:
c/deepQLearner.cpp: In member function 'bool deepQLearner::initLua()':
c/deepQLearner.cpp:342:51: error: invalid use of incomplete type 'struct THCState'
printf("[deepRL] cuTorch numDevices: %i\n", THC->numDevices);
note: forward declaration of 'struct THCState'
You guys didn't get this error?
It's because cutorch was updated after my last comment; see cutorch commit 44c5193.
I've now commented out that line in master (32cb67c).
There is also a pre-built archive released here for JetPack 2.3: L4T-R24.2-RC1
It works! Thanks Dusty, neat little program.
I could never get this working on JetPack 2.3. The cmake step works, but running make ends up producing the fatal error "THC/THC.h: No such file or directory". I even tried downloading the pre-built archive 'L4T-R24.2-RC1'. Executing ./deepRL-console hello.lua throws "libluajit.so: cannot open shared object file: No such file or directory". Should I just go back to JetPack 2.2?
OK, I've updated master to build again with the latest Torch changes. If you clone the repo again, it should work. If you still get the THC/THC.h error, please confirm that the build/torch/include/THC/THC.h file is present; if it isn't, the cmake config script may not have completed correctly.
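The header check can be scripted with a short Python sketch. It assumes cmake installs the Torch headers under build/torch/include as described here. THC/THC.h comes from the thread; also requiring TH/TH.h is an assumption based on the TH/THC subfolders mentioned earlier. The function name is hypothetical.

```python
import os
from typing import List, Sequence

# THC/THC.h is named in the thread; treating TH/TH.h as required too is an assumption.
EXPECTED_HEADERS = ("THC/THC.h", "TH/TH.h")


def missing_torch_headers(build_dir: str, expected: Sequence[str] = EXPECTED_HEADERS) -> List[str]:
    """Return the expected headers that are NOT present under <build_dir>/torch/include."""
    include_dir = os.path.join(build_dir, "torch", "include")
    return [h for h in expected if not os.path.isfile(os.path.join(include_dir, h))]
```

If `missing_torch_headers("build")` returns an empty list, the configure step probably installed the headers. Anything listed suggests the cmake config script stopped early, for example because a compiler process was killed.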
Regarding the pre-built archive, does it work if you extract the contents to /home/ubuntu/workspace/jetson-inference?
The pre-built archive worked once I ran it from the directory you specified above; I had been running it from /home/ubuntu/jetson-inference. Running the demo, I noticed that after 400 epochs the win rate dropped dramatically from 0.90 to 0.50. By 1200 it was back to 0.90, but then it dropped sharply again. Once the algorithm learns, why can't it maintain a high win percentage?
The build did not go so well. I couldn't even get cmake to complete, and I started receiving many 'Killed' messages. Please see the screenshot.
@AerialRobotics I think your problem is caused by limited memory. As discussed above, you can add swap; for me, 12GB of swap was enough.