Comments (5)
Since you appear to be running as root on Ubuntu, can you first make sure that nvidia-smi
is in that user's PATH and produces sensible output when run from the command line? It doesn't look like this command is being found.
I'd also suggest having a look at the items in our installation FAQ: http://neon.nervanasys.com/docs/latest/faq.html
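One quick way to run both checks at once is a small shell helper (a sketch; `check_tool` is just an illustrative name, and the `--help` probe stands in for "produces sensible output"):

```shell
#!/bin/sh
# Verify that a tool is on the current user's PATH and actually runs.
# Substitute the tool you are probing (here: nvidia-smi).
check_tool() {
  command -v "$1" > /dev/null 2>&1 || { echo "$1 not found in PATH"; return 1; }
  "$1" --help > /dev/null 2>&1 && echo "$1 looks OK"
}

check_tool nvidia-smi || true
```

Running this as the same user (and in the same shell) you launch neon from rules out PATH differences between login shells and sudo/root sessions.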
from neon.
Hi scttl, thanks for your reply. You are right that the nvidia-smi command is not found.
I've checked my installation and configuration carefully, and it seems to be a Tegra K1-specific problem.
I am trying to run neon on the NVIDIA Jetson TK1 devkit. NVML is not supported on the Jetson TK1, so the nvidia-smi command is not available even when the CUDA installation is fine.
Is NVML required to run the neon demos, or is there any way to solve my problem without NVML?
PS: I can run Caffe with cuDNN with no problem on the Jetson TK1, so I assume the CUDA installation is all right.
nvidia-smi
is not required to run any of the examples; we just use it as a proxy to validate that the user has the CUDA SDK installed. Provided you were able to install the cudanet Python library OK, for the moment you can work around your issue by editing neon/backends/__init__.py
to replace the line:
gpuflag = (os.system("nvidia-smi > /dev/null 2>&1") == 0)
with
gpuflag = (os.system("nvcc --version > /dev/null 2>&1") == 0)
We made a similar change in the Makefile
a while back, but needed to update this check as well. I've created a fix and will get it merged into master for the next release of neon.
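A slightly more general variant of that check (a sketch, not the actual neon code; `cuda_available` is a hypothetical helper name) would accept either probe, so NVML-less boards like the TK1 still pass while regular desktop GPUs keep using nvidia-smi:

```python
import os

def cuda_available():
    """Return True if either NVML (nvidia-smi) or the CUDA toolkit (nvcc)
    responds; on a Jetson TK1 only the nvcc probe can succeed."""
    probes = ("nvidia-smi", "nvcc --version")
    return any(os.system(cmd + " > /dev/null 2>&1") == 0 for cmd in probes)

gpuflag = cuda_available()
```

Note that nvcc only proves the toolkit is installed, not that a usable device is present, which is why it is a weaker proxy than nvidia-smi.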
Thanks scttl! Editing neon/backends/__init__.py works, but I still can't run this example because another error appears:
root@tegra-ubuntu:/home/hsl/neon# neon --gpu cudanet examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-07-01 04:52:29,170 WARNING:neon - setting log level to: 20
2015-07-01 04:52:31,733 INFO:init - Cudanet backend, RNG seed: None, numerr: None
2015-07-01 04:52:31,735 INFO:mlp - Layers:
ImageDataLayer d0: 3 x (224 x 224) nodes
ConvLayer conv1: 3 x (224 x 224) inputs, 64 x (55 x 55) nodes, RectLin act_fn
PoolingLayer pool1: 64 x (55 x 55) inputs, 64 x (27 x 27) nodes, Linear act_fn
ConvLayer conv2: 64 x (27 x 27) inputs, 192 x (27 x 27) nodes, RectLin act_fn
PoolingLayer pool2: 192 x (27 x 27) inputs, 192 x (13 x 13) nodes, Linear act_fn
ConvLayer conv3: 192 x (13 x 13) inputs, 384 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv4: 384 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv5: 256 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
PoolingLayer pool3: 256 x (13 x 13) inputs, 256 x (6 x 6) nodes, Linear act_fn
FCLayer fc4096a: 9216 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout1: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc4096b: 4096 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout2: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc1000: 4096 inputs, 1000 nodes, Softmax act_fn
CostLayer cost: 1000 nodes, CrossEntropy cost_fn
2015-07-01 04:52:31,738 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,228 INFO:val_init - Generating AutoUniformValGen values of shape (363, 64)
2015-07-01 04:52:32,254 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,340 INFO:val_init - Generating AutoUniformValGen values of shape (1600, 192)
2015-07-01 04:52:32,370 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,432 INFO:val_init - Generating AutoUniformValGen values of shape (1728, 384)
2015-07-01 04:52:32,506 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,552 INFO:val_init - Generating AutoUniformValGen values of shape (3456, 256)
2015-07-01 04:52:32,602 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,639 INFO:val_init - Generating AutoUniformValGen values of shape (2304, 256)
2015-07-01 04:52:32,691 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:32,702 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 9216)
2015-07-01 04:52:34,805 INFO:batch_norm - BatchNormalization set to train mode
2015-07-01 04:52:34,813 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 4096)
2015-07-01 04:52:35,728 INFO:val_init - Generating AutoUniformValGen values of shape (1000, 4096)
Traceback (most recent call last):
File "/usr/local/bin/neon", line 240, in <module>
experiment, result, status = main()
File "/usr/local/bin/neon", line 207, in main
experiment.initialize(backend)
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit_predict_err.py", line 62, in initialize
super(FitPredictErrorExperiment, self).initialize(backend)
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit.py", line 62, in initialize
self.model.initialize(backend)
File "/usr/local/lib/python2.7/dist-packages/neon/models/mlp.py", line 68, in initialize
dtype=self.layers[1].deltas_dtype)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/cc2.py", line 536, in zeros
dtype=dtype)),
MemoryError
Is memory size a problem? The Tegra K1 has 2GB of memory. Or is something else causing this? Any advice on how to find out what happened?
Try reducing your batch size to 32 and see if the problem still
exists. If it then runs, you probably don't have enough memory to train at
mb=128.
Is there any particular reason you are using this system for training? You
would get much better performance from a more standard graphics card.
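To see why 2GB is marginal, here is a back-of-envelope fp32 estimate built from the layer shapes in the log above (a rough sketch: it ignores workspace buffers, parameter gradients, and whatever else shares the TK1's unified memory):

```python
# Per-sample activation counts, taken from the layer shapes logged above.
ACTS = {
    "d0": 3 * 224 * 224, "conv1": 64 * 55 * 55, "pool1": 64 * 27 * 27,
    "conv2": 192 * 27 * 27, "pool2": 192 * 13 * 13, "conv3": 384 * 13 * 13,
    "conv4": 256 * 13 * 13, "conv5": 256 * 13 * 13, "pool3": 256 * 6 * 6,
    "fc4096a": 4096, "dropout1": 4096, "fc4096b": 4096, "dropout2": 4096,
    "fc1000": 1000,
}

def activation_mib(batch_size, bytes_per_float=4):
    """MiB of forward activations for one minibatch (backward deltas
    roughly double this)."""
    return sum(ACTS.values()) * batch_size * bytes_per_float / 2**20

for bs in (128, 32):
    print("batch %3d: ~%.0f MiB of forward activations" % (bs, activation_mib(bs)))
```

At mb=128 that comes to roughly 360 MiB of forward activations (versus roughly 90 MiB at mb=32), before counting the ~61M parameters implied by the val_init shapes (about 230 MiB in fp32) plus their gradients, so the working set fills a shared 2GB quickly.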