Comments (21)

dusty-nv commented on August 10, 2024 2

The problem appears to come from a torch.cudnn package update unrelated to JetPack 2.3, which added a directive to compile in parallel with make -j3. I will try patching it back to -j1 to reduce memory usage.

from jetson-reinforcement.

dusty-nv commented on August 10, 2024 1

You don't need to recompile the L4T kernel; I followed the standard instructions here: http://askubuntu.com/a/33703

To tell if it's successful, you should see swap memory appear in /proc/meminfo
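
As a rough illustration of that check, here is a minimal Python sketch that reads the SwapTotal field from /proc/meminfo. The function names are made up for this example and are not part of jetson-reinforcement; it only assumes the usual "SwapTotal: <n> kB" line format.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Expected line format: "SwapTotal:       8388604 kB"
            return int(line.split()[1])
    return 0


def swap_is_active(meminfo_path="/proc/meminfo"):
    """Return True if the kernel reports a non-zero amount of swap."""
    with open(meminfo_path) as f:
        return swap_total_kb(f.read()) > 0
```

Calling swap_is_active() on the Jetson after mounting swap should return True if the swap is in use by the kernel.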

DanMcLaughlin commented on August 10, 2024 1

@AerialRobotics yeah I noticed this too, have been meaning to dig into the reason. First guess is overfitting and I was going to try saving the model when it reaches 90%+, then switch to inference.
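
A minimal sketch of that idea in Python, assuming a rolling win rate over a fixed window of episodes. The class name, the window size, and the notion of a "frozen" flag are hypothetical and not taken from the jetson-reinforcement code; the caller would be responsible for actually saving the model and disabling further updates.

```python
from collections import deque


class WinRateGate:
    """Tracks a rolling win rate and flags when training should stop."""

    def __init__(self, threshold=0.90, window=100):
        self.threshold = threshold
        self.results = deque(maxlen=window)
        self.frozen = False  # True once we have switched to inference

    def record(self, won):
        """Record one episode outcome; return True the first time the threshold is hit."""
        if self.frozen:
            return False
        self.results.append(1 if won else 0)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            # Caller should save the model here and stop updating it
            self.frozen = True
            return True
        return False
```

Whether freezing the weights actually avoids the drop in win rate is untested here; this only shows the bookkeeping.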

dusty-nv commented on August 10, 2024

That file should have been created during the cmake configuration step. Can you check if it exists in jetson-reinforcement/build/torch/include?

S4WRXTTCS commented on August 10, 2024

That file and the TH/THC subfolders aren't there, but everything else is.

I didn't see any error messages when I did the cmake configuration step. It did take a good while, but nothing unexpected.

DanMcLaughlin commented on August 10, 2024

Yeah actually the root error is earlier in the cmake phase - typing in by hand since Firefox on 2.3 crashes

In function 'THByteVector_vectorDispatchInit'
simd.h:64:3 error: impossible constraint in 'asm'
asm volatile ( "cpuid\n\t"

lib/TH/CMakeFiles/TH.dir/build.make:350: recipe for target 'lib/TH/CMakeFiles/TH.dir/THVector.c.o' failed

The later error is likely a result of this failure.

dusty-nv commented on August 10, 2024

See these open issues on the Torch7 GitHub repo regarding the problem: 762, 766

For now, I just checked in a (temporary) modification to CMakePreBuild.sh which checks out a slightly older commit of the Torch7 repo.

DanMcLaughlin commented on August 10, 2024

Thanks Dusty,
I removed the build directory and tried again. Compilation now fails with a slew of errors (after running make it gives the following error, but that is just a consequence of the earlier failures)

.../c/deepQLearner.cpp:16:21 fatal error: THC/THC.h: no such file or directory

The earlier errors are variations of "cc: internal compiler error: Killed (program cc1plus)" (this is after a reboot) - lots of killed programs while it's trying to build. Usually these occur from some OS issue like limited RAM.

So hum. I'm also getting a killed Firefox, is there some resource limitation in 2.3 causing all these kills?

Otherwise, to rebuild, all I need to do is remove the build directory, correct?

Thanks -

DanMcLaughlin commented on August 10, 2024

Hi Dustin, any luck?

dusty-nv commented on August 10, 2024

OK, I was able to get it building again by patching the cutorch rockspec in commit 62af1a1 to force a single job (-j1) and by mounting swap (SATA or SD card). One of the cutorch tensor source files was consuming all the memory until the compiler was killed (at the time the system was otherwise consuming ~800MB of memory, i.e. normal range). Attached is the build log of it building again with JetPack 2.3 / Ubuntu 16.04.
log.txt

S4WRXTTCS commented on August 10, 2024

The only way I was able to get cmake to work correctly was to do what Dustin recommended.

Doing this was a bit tricky though. What it requires is modifying the cutorch-scm-1.rockspec file under build/cutorch/rocks, and then you have to make it read only. If you don't make it read only it ends up being over-written.

The lines I modified were

Line 27 where I changed jopts=1
Line 29 where I changed jopts=1

It's likely not the best way to do it, but it got the job done.

Summary of steps:
1. Delete the build directory and recreate it.
2. From the build directory, git clone the following package: https://github.com/torch/cutorch
3. Modify cutorch-scm-1.rockspec.
4. Make the file read-only.
5. Run the cmake script.
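
For anyone who wants to script the rockspec edit, here is a minimal Python sketch of the "modify, then make read-only" steps. It assumes the rockspec sets the job count with a shell-style line beginning with jopts=, as the line changes above suggest; the function names are hypothetical, and the exact contents of cutorch-scm-1.rockspec may differ between cutorch commits.

```python
import os
import re
import stat


def force_single_job(rockspec_text):
    """Rewrite every line that assigns jopts so the build uses one job."""
    return re.sub(r"(?m)^([ \t]*)jopts=.*$", r"\g<1>jopts=1", rockspec_text)


def patch_and_lock(path):
    """Patch the rockspec in place, then clear its write bits."""
    with open(path) as f:
        patched = force_single_job(f.read())
    with open(path, "w") as f:
        f.write(patched)
    # Read-only for owner, group, and others, matching the "make it read only" step
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```

The path would be the cutorch-scm-1.rockspec file under build/cutorch/rocks; run this before the cmake script.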

As for Firefox, my understanding is that it doesn't work with JetPack 2.3.

Edit - The latest commit accomplishes the same thing. But for some reason I didn't have to mount swap.

dusty-nv commented on August 10, 2024

I did verify in the build log that cutorch was "Building on 1 cores" and that the CMake script change had taken effect. However, when compiling what I think were the TensorMathPointwise files, the OOM killer stepped in until I mounted swap.

Since all of the files are self-contained within the build/ directory, torch, etc. doesn't need to be compiled from source for each Jetson and could be copied around for JetPack 2.3.

Also note that the torch repo is currently rolled back to a prior commit in the CMake script due to the issues mentioned above.

DanMcLaughlin commented on August 10, 2024

Thanks guys. Tried it last night and still get kills. I had logged out of the desktop to save memory but apparently I still need swap space.

Dusty, any issues with setting up swap? I'm not finding much in a search other than that I'd need to recompile the kernel, or has that changed now?

DanMcLaughlin commented on August 10, 2024

OK, I got a 32GB SD card for swap and an SSD for building. The box thrashes so badly it's frozen up, but I was able to get a top page which shows some tens or hundreds of cudafe, cc1plus and cudafe++ processes running. This appears to be during compilation of cutorch (e.g. THCTensorMathPointwise.cu). Will see if it manages to get through.

OK it got through the cmake! Now on doing a make it fails here ...

c/deepQLearner.cpp: In member function 'bool deepQLearner::initLua()'
c/deepQLearner.cpp:342:51: error: invalid use of incomplete type 'struct THCState'
   printf("[deepRL] cuTorch numDevices: %i\n", THC->numDevices);

note: forward declaration of 'struct THCState'

you guys didn't get this error?

dusty-nv commented on August 10, 2024

It's because cutorch was updated since my last comment - see cutorch commit 44c5193.

In master, I have now commented out that line (commit 32cb67c).
There is also a pre-built archive released here for JetPack 2.3: L4T-R24.2-RC1

DanMcLaughlin commented on August 10, 2024

It works! Thanks Dusty, neat little program

AerialRobotics commented on August 10, 2024

I could never get this working for JetPack 2.3. The 'cmake' works, but doing 'make' ends up producing the fatal error "THC/THC.h: No such file or directory". I even tried downloading the pre-built archive 'L4T-R24.2-RC1'. Executing ./deepRL-console hello.lua throws "libluajit.so: cannot open shared object file: No such file or directory". Should I just go back to JetPack 2.2?

dusty-nv commented on August 10, 2024

OK, I've updated master to build again with latest Torch changes. If you try cloning the repo again, it should work. If you still get the THC/THC.h error, please confirm that build/torch/include/THC/THC.h file is present, otherwise the cmake config script may not have completed correctly.
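
A quick way to run that check is a small Python sketch like the one below. The function name is hypothetical; it only tests for the header path mentioned above, relative to wherever the repo was cloned.

```python
from pathlib import Path


def missing_torch_headers(repo_root, headers=("THC/THC.h",)):
    """Return the expected headers that are absent under build/torch/include."""
    include_dir = Path(repo_root) / "build" / "torch" / "include"
    return [h for h in headers if not (include_dir / h).is_file()]
```

An empty list means the header is present; a non-empty list suggests the cmake configuration step did not finish installing Torch.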

Regarding the pre-built archive, does it work if you extract the contents as /home/ubuntu/workspace/jetson-inference ?

AerialRobotics commented on August 10, 2024

The pre-built archive worked once it was run from the directory you specified above. I had been running it out of /home/ubuntu/jetson-inference. Running the demo, I noticed that after 400 epochs the win rate dropped sharply from .90 to .50. By 1200 it was back to .90, but then it dropped sharply again. Once the algorithm learns, why can't it maintain a high percentage of wins?

AerialRobotics commented on August 10, 2024

The build did not go so well. Could not get cmake to even complete. Started receiving many 'Killed' messages. Please see screenshot.
output

gwljf commented on August 10, 2024

@AerialRobotics I think the cause of your problem is the memory size. As discussed above, you can add swap. For me, 12GB of swap is OK.
