nervanasystems / neon
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
Home Page: http://neon.nervanasys.com/docs/latest
License: Apache License 2.0
While running 'neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml', about 50 hours into training, I had to restart my computer. When I ran the same command again to resume, the following error occurred. Here is the log:
ubgpu@ubgpu:~/github/neon/neon$ neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
[sudo] password for ubgpu:
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-05-18 20:44:17,102 WARNING:neon - setting log level to: 20
2015-05-18 20:44:17,233 INFO:gpu - Initialized NervanaGPU with stochastic_round=None
2015-05-18 20:44:17,233 INFO:gpu - Seeding random number generator with: None
2015-05-18 20:44:17,234 INFO:init - NervanaGPU backend, RNG seed: None, numerr: None
2015-05-18 20:44:17,234 INFO:mlp - Layers:
ImageDataLayer d0: 3 x (224 x 224) nodes
ConvLayer conv1: 3 x (224 x 224) inputs, 64 x (55 x 55) nodes, RectLin act_fn
PoolingLayer pool1: 64 x (55 x 55) inputs, 64 x (27 x 27) nodes, Linear act_fn
ConvLayer conv2: 64 x (27 x 27) inputs, 192 x (27 x 27) nodes, RectLin act_fn
PoolingLayer pool2: 192 x (27 x 27) inputs, 192 x (13 x 13) nodes, Linear act_fn
ConvLayer conv3: 192 x (13 x 13) inputs, 384 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv4: 384 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv5: 256 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
PoolingLayer pool3: 256 x (13 x 13) inputs, 256 x (6 x 6) nodes, Linear act_fn
FCLayer fc4096a: 9216 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout1: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc4096b: 4096 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout2: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc1000: 4096 inputs, 1000 nodes, Softmax act_fn
CostLayer cost: 1000 nodes, CrossEntropy cost_fn
2015-05-18 20:44:17,234 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,236 INFO:val_init - Generating AutoUniformValGen values of shape (363, 64)
2015-05-18 20:44:17,237 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,238 INFO:val_init - Generating AutoUniformValGen values of shape (1600, 192)
2015-05-18 20:44:17,244 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,245 INFO:val_init - Generating AutoUniformValGen values of shape (1728, 384)
2015-05-18 20:44:17,255 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,256 INFO:val_init - Generating AutoUniformValGen values of shape (3456, 256)
2015-05-18 20:44:17,269 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,270 INFO:val_init - Generating AutoUniformValGen values of shape (2304, 256)
2015-05-18 20:44:17,279 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,280 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 9216)
2015-05-18 20:44:17,748 INFO:batch_norm - BatchNormalization set to train mode
2015-05-18 20:44:17,749 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 4096)
2015-05-18 20:44:17,959 INFO:val_init - Generating AutoUniformValGen values of shape (1000, 4096)
2015-05-18 20:44:18,016 INFO:fit - Unable to find saved model /home/ubgpu/data/I1K/I1K_alexnet_fp32_model.prm, starting over
2015-05-18 20:44:18,017 INFO:mlp - commencing model fitting
Traceback (most recent call last):
File "/usr/local/bin/neon", line 199, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 168, in main
result = experiment.run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit_predict_err.py", line 97, in run
super(FitPredictErrorExperiment, self).run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit.py", line 99, in run
self.model.fit(self.dataset)
File "/usr/local/lib/python2.7/dist-packages/neon/models/mlp.py", line 141, in fit
self.fprop()
File "/usr/local/lib/python2.7/dist-packages/neon/models/mlp.py", line 81, in fprop
ll.fprop(y)
File "/usr/local/lib/python2.7/dist-packages/neon/layers/layer.py", line 373, in fprop
self.batch_idx)
File "/usr/local/lib/python2.7/dist-packages/neon/datasets/imageset.py", line 314, in get_mini_batch
self.backend.subtract(self.inp_be, self.mean_be, self.inp_be)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/gpu.py", line 643, in subtract
self.ng.subtract(left, right, out=out)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 801, in subtract
def subtract (self, a, b, out=None): return OpTreeNode.build("sub", a, b, out=out)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 915, in build
return OpTreeNode({ "op" : "assign" }, out, node).execute()
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 924, in execute
return call_compound_kernel(_get_rand_state(), _stack)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 835, in call_compound_kernel
kernel = _get_compound_kernel(tuple(type_args))
File "", line 2, in _get_compound_kernel
File "/usr/local/lib/python2.7/dist-packages/pycuda/tools.py", line 423, in context_dependent_memoize
result = func(_args)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 670, in _get_compound_kernel
module = _get_module(template, template_vals)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 313, in _get_module
return SourceModule(code, options=["--use_fast_math" ], keep=False) #,"-G"
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 251, in init
arch, code, cache_dir, include_dirs)
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 241, in compile
return compile_plain(source, options, keep, nvcc, cache_dir)
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 73, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 47, in preprocess_source
result, stdout, stderr = call_capture_output(cmdline, error_on_nonzero=False)
File "/usr/local/lib/python2.7/dist-packages/pytools/prefork.py", line 197, in call_capture_output
return forker[0].call_capture_output(cmdline, cwd, error_on_nonzero)
File "/usr/local/lib/python2.7/dist-packages/pytools/prefork.py", line 54, in call_capture_output
% ( " ".join(cmdline), e))
pytools.prefork.ExecError: error invoking 'nvcc --preprocess --use_fast_math -arch sm_52 -I/usr/local/lib/python2.7/dist-packages/pycuda/cuda /tmp/tmpltSwQ9.cu --compiler-options -P': [Errno 2] No such file or directory
We're starting to embark upon a fairly substantial refactoring of the neon codebase in an effort to make it easier to use, and clean out some of the accumulated cruft.
We'd like to use this issue to get feedback from the broader community, and give a heads up that changes are on their way. What do you like, not like? Some of the things we're initially considering include:
Hi, would you please support RMSProp as a learning rule?
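For readers unfamiliar with the rule being requested, here is a minimal NumPy sketch of the standard RMSProp update. It is not neon code: the function name `rmsprop_step` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update (hypothetical helper, not part of neon).

    `cache` is the running average of squared gradients.
    """
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy usage: minimise f(w) = w^2, whose gradient is 2w.
w = np.array([1.0])
cache = np.zeros_like(w)
for _ in range(500):
    w, cache = rmsprop_step(w, 2.0 * w, cache, lr=0.01)
```

Dividing by the root of the running average keeps the step size roughly independent of the gradient's scale, which is the main appeal of the rule.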
It would be good to be able to load both the coarse and fine labels.
I want a banana!
I realize that shuffling can be an essential part of training a NN. However, some people indicate that "If you are using a not-recurrent NN like a traditional MLP you don't NEED to shuffle dataset especially if you're using a batch learning algorithm".
I wonder whether there's a shuffling step before each epoch in neon. If not, how can I implement it?
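As an illustration of what a per-epoch shuffle could look like outside of neon, here is a small NumPy sketch. The generator `shuffled_minibatches` is a hypothetical helper, and it assumes one record per row, which may not match neon's internal layout.

```python
import numpy as np

def shuffled_minibatches(inputs, targets, batch_size, rng):
    """Yield (inputs, targets) mini-batches in a fresh random order.

    Hypothetical helper; assumes one record per row.
    """
    order = rng.permutation(len(inputs))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # The same index array is applied to both arrays, so pairs stay aligned.
        yield inputs[idx], targets[idx]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
for epoch in range(2):
    # A new permutation is drawn at the start of every epoch.
    for xb, yb in shuffled_minibatches(X, y, batch_size=4, rng=rng):
        pass  # the training step would go here
```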
Related articles:
Any datasets with a number of records that are not an exact multiple of batch_size
end up having the last batch dropped.
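To make the behaviour concrete, the sketch below computes batch boundaries with and without the final partial batch. `batch_bounds` is a hypothetical helper written for this illustration, not a neon function.

```python
def batch_bounds(num_records, batch_size, keep_remainder=True):
    """Return (start, end) index pairs covering the records batch by batch.

    Hypothetical helper. With keep_remainder=False the final short batch is
    dropped, which mirrors the behaviour described above.
    """
    bounds = [(start, min(start + batch_size, num_records))
              for start in range(0, num_records, batch_size)]
    if not keep_remainder and bounds:
        last_start, last_end = bounds[-1]
        if last_end - last_start < batch_size:
            bounds.pop()
    return bounds

# 10 records with batch_size 4: the last 2 records form a short batch.
print(batch_bounds(10, 4))
print(batch_bounds(10, 4, keep_remainder=False))
```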
I only modified the file neon/examples/convnet/mnist-small.yaml, setting batch_size: &bs 1. The log reports:
ValueError: shapes (400,1) and (32,1) not aligned: 1 (dim 1) != 32 (dim 0)
Because intermediate matrices (such as pre-activations) are pre-allocated, the batch size used at training time is automatically expected at test time as well. This creates problems when the two are expected to differ (e.g. training without mini-batches, where the batch size is the number of records).
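One possible workaround, sketched here under stated assumptions, is to zero-pad a smaller batch up to the pre-allocated batch size and remember how many columns are real. `pad_to_batch_size` is a hypothetical helper, and the one-column-per-record layout is an assumption for this example.

```python
import numpy as np

def pad_to_batch_size(batch, batch_size):
    """Zero-pad a (features, n) batch to (features, batch_size).

    Hypothetical helper. Returns the padded array and n, the number of
    valid columns, so that padded outputs can be discarded afterwards.
    """
    n_valid = batch.shape[1]
    if n_valid > batch_size:
        raise ValueError("batch has more records than batch_size")
    padded = np.zeros((batch.shape[0], batch_size), dtype=batch.dtype)
    padded[:, :n_valid] = batch
    return padded, n_valid

padded, n_valid = pad_to_batch_size(np.ones((3, 2)), batch_size=4)
# Only the first n_valid columns of any result computed from `padded` are meaningful.
```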
I got this error when I ran 'make install'. I have a Titan X that is working fine. Below is my setup.cfg. Any help is much appreciated.
sudo make install
No CUDA capable GPU installed. Forcing GPU=0
[neon]
CPU = 1
GPU = nervanagpu
DIST = 0
DEV = 0
I would like to understand whether neon supports data parallelization on multiple GPUs.
I want to use multiple GPUs in order to increase mini-batch size...
The second question is whether MPI parallelization can be used with the --gpu option.
'pythON' in the project description and README.rst should be renamed to 'Python'.
ubgpu@ubgpu:~/github/neon$ . .venv/bin/activate
(.venv)ubgpu@ubgpu:~/github/neon$ sudo make install
[sudo] password for ubgpu:
The directory '/home/ubgpu/.cache/pip/log' or its parent directory is not owned by the current user and the debug log has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubgpu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubgpu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.8.1 in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): PyYAML>=3.11 in /usr/local/lib/python2.7/dist-packages
The directory '/home/ubgpu/.cache/pip/log' or its parent directory is not owned by the current user and the debug log has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubgpu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ubgpu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied (use --upgrade to upgrade): nose>=1.3.0 in /usr/lib/python2.7/dist-packages
Collecting Pillow>=2.5.0
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Downloading Pillow-2.8.1.tar.gz (9.0MB)
100% |████████████████████████████████| 9.0MB 70kB/s
Collecting flake8>=2.2.2
Downloading flake8-2.4.0-py2.py3-none-any.whl
Collecting pep8-naming>=0.2.2
Downloading pep8_naming-0.2.2-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): sphinx>=1.2.2 in /usr/lib/python2.7/dist-packages
Collecting sphinxcontrib-napoleon>=0.2.8
Downloading sphinxcontrib_napoleon-0.3.4-py2.py3-none-any.whl (50kB)
100% |████████████████████████████████| 53kB 276kB/s
Collecting scikit-learn>=0.15.2
Downloading scikit-learn-0.16.1.tar.gz (7.3MB)
100% |████████████████████████████████| 7.3MB 81kB/s
Collecting matplotlib>=1.4.0
Downloading matplotlib-1.4.3.tar.gz (50.4MB)
100% |████████████████████████████████| 50.4MB 14kB/s
Collecting imgworker>=0.2.5 from git+https://github.com/NervanaSystems/imgworker.git#egg=imgworker>=0.2.5
Cloning https://github.com/NervanaSystems/imgworker.git to /tmp/pip-build-b5tfve/imgworker
Collecting cudanet>=0.2.5 from git+https://github.com/NervanaSystems/cuda-convnet2.git#egg=cudanet>=0.2.5
Cloning https://github.com/NervanaSystems/cuda-convnet2.git to /tmp/pip-build-b5tfve/cudanet
Collecting pycuda>=2014.1
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Downloading pycuda-2014.1.tar.gz (1.6MB)
100% |████████████████████████████████| 1.6MB 117kB/s
Complete output from command python setup.py egg_info:
*** WARNING: nvcc not in path.
*************************************************************
*** I have detected that you have not run configure.py.
*************************************************************
*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.
***
*** See README_SETUP.txt for more information.
***
*** If the build does fail, just re-run configure.py with the
*** correct arguments, and then retry. Good luck!
*************************************************************
*** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT
*************************************************************
Continuing in 1 seconds...
Traceback (most recent call last):
File "", line 20, in
File "/tmp/pip-build-b5tfve/pycuda/setup.py", line 216, in
main()
File "/tmp/pip-build-b5tfve/pycuda/setup.py", line 88, in main
conf["CUDA_INC_DIR"] = [join(conf["CUDA_ROOT"], "include")]
File "/usr/lib/python2.7/posixpath.py", line 77, in join
elif path == '' or path.endswith('/'):
AttributeError: 'NoneType' object has no attribute 'endswith'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-b5tfve/pycuda
make: *** [deps_install] Error 1
(.venv)ubgpu@ubgpu:~/github/neon$
I've installed nervanagpu successfully. When I run "neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml", the following error occurs:
dsp@dsp:~/neon$ neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-06-15 22:35:55,300 WARNING:neon - setting log level to: 20
2015-06-15 22:35:55,385 INFO:gpu - Initialized NervanaGPU with stochastic_round=None
2015-06-15 22:35:55,385 INFO:gpu - Seeding random number generator with: None
2015-06-15 22:35:55,386 INFO:init - NervanaGPU backend, RNG seed: None, numerr: None
2015-06-15 22:35:55,386 INFO:mlp - Layers:
ImageDataLayer d0: 3 x (224 x 224) nodes
ConvLayer conv1: 3 x (224 x 224) inputs, 64 x (55 x 55) nodes, RectLin act_fn
PoolingLayer pool1: 64 x (55 x 55) inputs, 64 x (27 x 27) nodes, Linear act_fn
ConvLayer conv2: 64 x (27 x 27) inputs, 192 x (27 x 27) nodes, RectLin act_fn
PoolingLayer pool2: 192 x (27 x 27) inputs, 192 x (13 x 13) nodes, Linear act_fn
ConvLayer conv3: 192 x (13 x 13) inputs, 384 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv4: 384 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv5: 256 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
PoolingLayer pool3: 256 x (13 x 13) inputs, 256 x (6 x 6) nodes, Linear act_fn
FCLayer fc4096a: 9216 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout1: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc4096b: 4096 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout2: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc1000: 4096 inputs, 1000 nodes, Softmax act_fn
CostLayer cost: 1000 nodes, CrossEntropy cost_fn
2015-06-15 22:35:55,386 INFO:batch_norm - BatchNormalization set to train mode
Traceback (most recent call last):
File "/home/dsp/anaconda/bin/neon", line 6, in
exec(compile(open(file).read(), file, 'exec'))
File "/home/dsp/neon/bin/neon", line 240, in
experiment, result, status = main()
File "/home/dsp/neon/bin/neon", line 207, in main
experiment.initialize(backend)
File "/home/dsp/neon/neon/experiments/fit_predict_err.py", line 62, in initialize
super(FitPredictErrorExperiment, self).initialize(backend)
File "/home/dsp/neon/neon/experiments/fit.py", line 62, in initialize
self.model.initialize(backend)
File "/home/dsp/neon/neon/models/mlp.py", line 61, in initialize
ll.initialize(kwargs)
File "/home/dsp/neon/neon/layers/convolutional.py", line 39, in initialize
super(ConvLayer, self).initialize(kwargs)
File "/home/dsp/neon/neon/layers/layer.py", line 479, in initialize
self.bn.initialize(kwargs)
File "/home/dsp/neon/neon/transforms/batch_norm.py", line 90, in initialize
self._xhat = self.backend.zeros(self.in_shape, dtype=self.dtype)
File "/home/dsp/neon/neon/backends/gpu.py", line 582, in zeros
return self.ng.zeros(shape, dtype=dtype)
File "/home/dsp/anaconda/lib/python2.7/site-packages/nervanagpu/nervanagpu.py", line 483, in zeros
name=name, rounding=self.round_mode)._assign(0)
File "/home/dsp/anaconda/lib/python2.7/site-packages/nervanagpu/nervanagpu.py", line 298, in _assign
drv.memset_d32_async(self.gpudata,
AttributeError: 'module' object has no attribute 'memset_d32_async'
Makefile:
define list_includes
endef
David-Laxers-MacBook-Pro:nervanagpu davidlaxer$ sudo !!
sudo make all
The directory '/Users/davidlaxer/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
installing maxas...
/tmp/nervanagpu.XXXXXXXX.waaER91X
Cloning into 'maxas'...
remote: Counting objects: 170, done.
remote: Total 170 (delta 0), reused 0 (delta 0), pack-reused 170
Receiving objects: 100% (170/170), 163.15 KiB | 0 bytes/s, done.
Resolving deltas: 100% (67/67), done.
Checking connectivity... done.
Checking if your kit is complete...
Looks good
Warning: prerequisite Carp 1.29 not found. We have 1.26.
Warning: prerequisite Data::Dumper 2.145 not found. We have 2.13506.
Writing Makefile for MaxAs::MaxAs
Writing MYMETA.yml and MYMETA.json
cp lib/MaxAs/MaxAs.pm blib/lib/MaxAs/MaxAs.pm
cp lib/MaxAs/Cubin.pm blib/lib/MaxAs/Cubin.pm
cp lib/MaxAs/MaxAsGrammar.pm blib/lib/MaxAs/MaxAsGrammar.pm
cp bin/maxas.pl blib/script/maxas.pl
/opt/local/bin/perl5.16 -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/maxas.pl
Manifying blib/man3/MaxAs::MaxAs.3pm
Appending installation info to /opt/local/lib/perl5/5.16.3/darwin-thread-multi-2level/perllocal.pod
sed: illegal option -- r
usage: sed script [-Ealn] [-i extension] [file ...]
sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]
building kernel: hgemm_nn_128x128...
make: maxas.pl: No such file or directory
make: *** [nervanagpu/kernels/cubin/hgemm_nn_128x128.cubin] Error 1
David-Laxers-MacBook-Pro:nervanagpu davidlaxer$ sed -r
sed: illegal option -- r
usage: sed script [-Ealn] [-i extension] [file ...]
sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]
I would like to use my own text data to train a model. I have read the information here: http://neon.nervanasys.com/docs/latest/datasets.html
(1) Unfortunately, I haven't understood how to transform text data into the required format. It seems that datasets should be available via a URL, but my data is on my local disk.
(2) What is the required data format, especially for text data?
Any help is more than welcome. Thanks.
Attempts to save predictions (predictions: ['train', 'test']) or model parameters (serialized_path: "my_model.prm") on either the RNN or LSTM example networks result in a ValueError being raised instead of successful saving:
Traceback (most recent call last):
File "bin/neon", line 240, in <module>
experiment, result, status = main()
File "bin/neon", line 208, in main
result = experiment.run()
File "/home/users/scott/repo/neon/neon/experiments/fit_predict_err.py", line 114, in run
pred_set)
File "/home/users/scott/repo/neon/neon/models/mlp.py", line 241, in predict_fullset
reference[:, start:end] = self.cost_layer.get_reference()
File "/home/users/scott/repo/neon/neon/backends/cpu.py", line 157, in __setitem__
self._tensor[clean_key] = np.reshape(self._clean(value), req_shape)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 217, in reshape
return _wrapit(a, 'reshape', newshape, order=order)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: total size of new array must be unchanged
Add image transformations (rotation, cropping, scaling etc.) to the ImageSet loading pipeline.
I've been making Docker images for deep learning libraries, with CPU-only and CUDA-enabled versions. I've successfully made the base version of neon, but my CUDA version doesn't get built properly. The issue in the Makefile is that nvidia-smi is used to check that a CUDA-enabled GPU is available. Whilst nvidia-smi isn't available in Docker build environments, it has been possible to build libraries with the installed SDK. If a GPU isn't needed for the build, is it possible to change this test to check for something that is available, e.g. nvcc --version?
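The kind of check being proposed can be sketched in Python as follows. This is only an illustration of testing for the compiler rather than for a GPU; `cuda_toolkit_available` is a hypothetical helper and is not how the neon Makefile is written.

```python
import shutil
import subprocess

def cuda_toolkit_available(compiler="nvcc"):
    """Return True if `compiler --version` runs successfully.

    Hypothetical helper: it looks for the CUDA compiler on PATH instead of
    querying a physical GPU, so it can pass inside a Docker build.
    """
    path = shutil.which(compiler)
    if path is None:
        return False
    try:
        result = subprocess.run([path, "--version"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return result.returncode == 0

print("CUDA toolkit found:", cuda_toolkit_available())
```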
I trained an MLP on CIFAR10 and deserialized it in a later script. (That step works fine; I have the correct weights and everything I need.) When I call mlp.predict_fullset, cudanet raises the error:
Traceback (most recent call last):
File "11_merge_predictions.py", line 202, in
mlp.predict_fullset(data, 'test')
File "/home/ubuntu/neon/neon/models/mlp.py", line 245, in predict_fullset
reference[:, start:end] = batch_refs
File "/home/ubuntu/neon/neon/backends/cc2.py", line 265, in setitem
self._tensor.set_col_slice(start, stop, value)
File "/home/ubuntu/cuda-convnet2/cudanet/cudanet.py", line 767, in set_col_slice
raise generate_exception(err_code)
cudanet.cudanet.CUDANetException: Incompatible matrix dimensions.
I am again predicting on the test set of CIFAR10, and the GPU I use is the one on AWS (~4 GB). It is also the same one I used for training. Furthermore, mlp.predict_generator works fine.
Either rectleaky should be implemented here: https://github.com/NervanaSystems/neon/blob/master/neon/backends/gpu.py
or the rectleaky transform shouldn't be included here:
https://github.com/NervanaSystems/neon/blob/master/examples/convnet/synthetic-sanity_check.yaml
Installation requires several configuration options and dependencies, some of which must be built from separate repos. Encapsulating neon within a Docker image would help newcomers get started, provide an isolated environment, and enable deployment portability.
Placeholder for supporting automatic differentiation
Is there a tutorial on how to use your own data? I have some data and I want to link it to a MLP for example. I would appreciate any help here.
I installed neon on my Macbook (no gpu) and tried the simple mnist-small.yaml example and got the following error trace. This is my first time using Neon!
WARNING:neon.util.persist:deserializing object from: examples/mlp/mnist-small.yaml
2015-05-26 21:04:47,289 WARNING:neon - setting log level to: 20
2015-05-26 21:04:47,290 INFO:cpu - Seeding random number generator with: None
2015-05-26 21:04:47,298 INFO:init - CPU backend, RNG seed: None, numerr: None
2015-05-26 21:04:47,298 INFO:mlp - Layers:
DataLayer d0: 784 nodes
FCLayer h0: 784 inputs, 100 nodes, RectLin act_fn
FCLayer output: 100 inputs, 10 nodes, Logistic act_fn
CostLayer cost: 10 nodes, CrossEntropy cost_fn
2015-05-26 21:04:47,299 INFO:val_init - Generating GaussianValGen values of shape (100, 784)
2015-05-26 21:04:47,305 INFO:val_init - Generating GaussianValGen values of shape (10, 100)
2015-05-26 21:04:47,307 INFO:mnist - loading: train-images-idx3-ubyte
2015-05-26 21:04:47,667 INFO:mnist - loading: train-labels-idx1-ubyte
2015-05-26 21:04:47,675 INFO:mnist - loading: t10k-images-idx3-ubyte
2015-05-26 21:04:47,710 INFO:mnist - loading: t10k-labels-idx1-ubyte
2015-05-26 21:04:47,711 WARNING:dataset - Incompatible batch size. Discarding 16 samples...
2015-05-26 21:04:47,764 WARNING:dataset - Incompatible batch size. Discarding 96 samples...
2015-05-26 21:04:48,146 WARNING:dataset - Incompatible batch size. Discarding 16 samples...
2015-05-26 21:04:48,148 WARNING:dataset - Incompatible batch size. Discarding 96 samples...
2015-05-26 21:04:48,155 INFO:mlp - commencing model fitting
Traceback (most recent call last):
File "/usr/local/bin/neon", line 240, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 208, in main
result = experiment.run()
File "/Library/Python/2.7/site-packages/neon/experiments/fit_predict_err.py", line 99, in run
super(FitPredictErrorExperiment, self).run()
File "/Library/Python/2.7/site-packages/neon/experiments/fit.py", line 102, in run
self.model.fit(self.dataset)
File "/Library/Python/2.7/site-packages/neon/models/mlp.py", line 156, in fit
self.backend.add(error, self.cost_layer.get_cost(), error)
File "/Library/Python/2.7/site-packages/neon/layers/layer.py", line 289, in get_cost
scale_by_batchsize=scale_cost)
File "/Library/Python/2.7/site-packages/neon/transforms/cross_entropy.py", line 237, in apply_function
scale_by_batchsize=scale_by_batchsize)
File "/Library/Python/2.7/site-packages/neon/transforms/cross_entropy.py", line 62, in cross_entropy
return backend.sum(temp[0], axes=None, out=result)
File "/Library/Python/2.7/site-packages/neon/backends/cpu.py", line 764, in sum
np.sum(tsr._tensor, axis=axes, out=out._tensor, keepdims=True)
TypeError: sum() got an unexpected keyword argument 'keepdims'
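The `keepdims` keyword was added to `np.sum` in NumPy 1.7, so an older NumPy on this machine is a likely cause, although that is an assumption the log does not confirm. The sketch below shows how the same result can be produced without the keyword; `sum_keepdims` is a hypothetical helper, not neon code, and it does not cover the `out=` argument used in the traceback.

```python
import numpy as np

def sum_keepdims(a, axis=None):
    """Sum like np.sum(a, axis=axis, keepdims=True) without the keyword.

    Hypothetical helper; `axis` may be None or a single integer.
    """
    total = np.sum(a, axis=axis)
    if axis is None:
        # A full reduction keeps every dimension with length 1.
        return np.reshape(total, (1,) * a.ndim)
    return np.expand_dims(total, axis)

a = np.arange(6.0).reshape(2, 3)
print(np.__version__, sum_keepdims(a).shape, sum_keepdims(a, axis=0).shape)
```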
Hi,
I just installed neon on Ubuntu 14 with Python 3.4. Running the following command:
nir@nir-Satellite-Pro-A50-A:~/neon$ neon examples/mlp/mnist-small.yaml
gives this error message:
Traceback (most recent call last):
File "/home/nir/anaconda3/bin/neon", line 240, in
experiment, result, status = main()
File "/home/nir/anaconda3/bin/neon", line 126, in main
experiment = deserialize(args.yaml_file)
File "/home/nir/anaconda3/lib/python3.4/site-packages/neon/util/persist.py", line 183, in deserialize
if not isinstance(load_path, file):
NameError: name 'file' is not defined
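The NameError comes from the `file` builtin, which exists in Python 2 but was removed in Python 3. A version-independent check can be sketched by duck typing, as below. `is_file_like` is a hypothetical helper shown for illustration, not the fix adopted in neon.

```python
import io

def is_file_like(obj):
    """Return True for objects that can be read like an open file.

    Hypothetical helper: checking for a `read` attribute avoids the
    Python 2-only `file` builtin and works on both major versions.
    """
    return hasattr(obj, "read")

print(is_file_like(io.BytesIO(b"data")))
print(is_file_like("examples/mlp/mnist-small.yaml"))
```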
I checked the directory, and this file exists.
Appreciate your assistance
thanks N
ubgpu@ubgpu:~/github/neon/neon$ neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-05-15 22:00:54,319 WARNING:neon - setting log level to: 20
2015-05-15 22:00:54,447 INFO:gpu - Initialized NervanaGPU with stochastic_round=None
2015-05-15 22:00:54,447 INFO:gpu - Seeding random number generator with: None
2015-05-15 22:00:54,448 INFO:init - NervanaGPU backend, RNG seed: None, numerr: None
2015-05-15 22:00:54,449 INFO:mlp - Layers:
ImageDataLayer d0: 3 x (224 x 224) nodes
ConvLayer conv1: 3 x (224 x 224) inputs, 64 x (55 x 55) nodes, RectLin act_fn
PoolingLayer pool1: 64 x (55 x 55) inputs, 64 x (27 x 27) nodes, Linear act_fn
ConvLayer conv2: 64 x (27 x 27) inputs, 192 x (27 x 27) nodes, RectLin act_fn
PoolingLayer pool2: 192 x (27 x 27) inputs, 192 x (13 x 13) nodes, Linear act_fn
ConvLayer conv3: 192 x (13 x 13) inputs, 384 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv4: 384 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv5: 256 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
PoolingLayer pool3: 256 x (13 x 13) inputs, 256 x (6 x 6) nodes, Linear act_fn
FCLayer fc4096a: 9216 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout1: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc4096b: 4096 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout2: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc1000: 4096 inputs, 1000 nodes, Softmax act_fn
CostLayer cost: 1000 nodes, CrossEntropy cost_fn
2015-05-15 22:00:54,449 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,450 INFO:val_init - Generating AutoUniformValGen values of shape (363, 64)
2015-05-15 22:00:54,452 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,453 INFO:val_init - Generating AutoUniformValGen values of shape (1600, 192)
2015-05-15 22:00:54,458 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,459 INFO:val_init - Generating AutoUniformValGen values of shape (1728, 384)
2015-05-15 22:00:54,469 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,470 INFO:val_init - Generating AutoUniformValGen values of shape (3456, 256)
2015-05-15 22:00:54,483 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,484 INFO:val_init - Generating AutoUniformValGen values of shape (2304, 256)
2015-05-15 22:00:54,492 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,493 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 9216)
2015-05-15 22:00:54,964 INFO:batch_norm - BatchNormalization set to train mode
2015-05-15 22:00:54,965 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 4096)
2015-05-15 22:00:55,175 INFO:val_init - Generating AutoUniformValGen values of shape (1000, 4096)
2015-05-15 22:00:55,229 WARNING:imageset - Batch dir cache not found in /home/ubgpu/data/I1K/imageset_batches/dataset_cache.pkl:
Press Y to create, otherwise exit: Y
/usr/local/lib/python2.7/dist-packages/neon/util/batch_writer.py:137: RuntimeWarning: divide by zero encountered in log10
self.val_start = 10 ** int(np.log10(self.ntrain * 10))
Traceback (most recent call last):
File "/usr/local/bin/neon", line 199, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 168, in main
result = experiment.run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit_predict_err.py", line 97, in run
super(FitPredictErrorExperiment, self).run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit.py", line 70, in run
self.dataset.load()
File "/usr/local/lib/python2.7/dist-packages/neon/datasets/imageset.py", line 176, in load
self.bw.run()
File "/usr/local/lib/python2.7/dist-packages/neon/util/batch_writer.py", line 215, in run
self.write_csv_files()
File "/usr/local/lib/python2.7/dist-packages/neon/util/batch_writer.py", line 137, in write_csv_files
self.val_start = 10 ** int(np.log10(self.ntrain * 10))
OverflowError: cannot convert float infinity to integer
ubgpu@ubgpu:~/github/neon/neon$
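The divide-by-zero warning indicates that `self.ntrain` was 0 when the expression ran, so `np.log10` returned negative infinity and `int()` then raised the OverflowError. Why no training images were counted is not visible in the log. The sketch below shows the same expression with an explicit guard; `validation_start_index` is a hypothetical stand-alone function, not the code in batch_writer.py.

```python
import math

def validation_start_index(ntrain):
    """Compute 10 ** int(log10(ntrain * 10)) with a guard for empty sets.

    Hypothetical helper mirroring the expression in the traceback.
    """
    if ntrain <= 0:
        raise ValueError("no training images found; check the dataset path")
    return 10 ** int(math.log10(ntrain * 10))

print(validation_start_index(1281167))
```

Failing early with a descriptive message points at the dataset location rather than at the arithmetic.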
ubgpu@ubgpu:~/github/neon$ neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-05-11 01:13:02,940 WARNING:neon - setting log level to: 20
2015-05-11 01:13:02,945 WARNING:init - nervanagpu not found, can't run via GPU
Traceback (most recent call last):
File "/usr/local/bin/neon", line 199, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 162, in main
device_id=args.device_id)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/init.py", line 157, in gen_backend
raise RuntimeError("Can't find CUDA capable GPU")
RuntimeError: Can't find CUDA capable GPU
ubgpu@ubgpu:~/github/neon$ which nvcc
/usr/local/cuda/bin/nvcc
ubgpu@ubgpu:~/github/neon$
My GPU is a GTX 970!
root@tegra-ubuntu:/home/hsl/neon# neon --gpu cudanet examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp32.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float32'>
2015-06-19 12:47:19,332 WARNING:neon - setting log level to: 20
Traceback (most recent call last):
File "/usr/local/bin/neon", line 240, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 202, in main
device_id=args.device_id)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/init.py", line 157, in gen_backend
raise RuntimeError("Can't find CUDA capable GPU")
RuntimeError: Can't find CUDA capable GPU
I use a Tegra K1 GPU.
On Ubuntu, sudo resets $PATH, so even if CUDA is installed and /usr/local/cuda/bin is in your PATH, the Makefile can't find nvcc.
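To see which PATH a given process really has, a small check like the following may help. It is only a sketch: find_nvcc is a hypothetical helper, and the fallback directory is simply the /usr/local/cuda/bin location mentioned above, which may differ on other installs.

```python
import os
import shutil


def find_nvcc(path_env=None, fallback_dirs=("/usr/local/cuda/bin",)):
    """Return the full path to nvcc, or None if it cannot be found."""
    # Search the PATH this process actually sees; under sudo it may
    # differ from the PATH of the invoking shell.
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = shutil.which("nvcc", path=path_env)
    if found:
        return found
    # Assumed default CUDA location; adjust for your installation.
    for directory in fallback_dirs:
        candidate = shutil.which("nvcc", path=directory)
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    print(find_nvcc())
```

If the script prints a path in a normal shell but only finds the fallback (or nothing) under sudo, the PATH is being reset. One common workaround is to pass it through explicitly, for example sudo env "PATH=$PATH" make install, though whether that fits a given setup is a judgment call.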
As I understand it, metrics are currently only evaluated at the end of a training session. Is there a way to monitor validation accuracy during training, so that I can cancel the session if results don't improve?
A similar question: are there any existing utilities to plot loss and accuracy over a training session?
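Until something built in is available, one workaround is to scrape the training log. The sketch below assumes log lines of the form "epoch: N, training error: X", as printed by the MNIST example elsewhere on this page; it reads training error only (not validation accuracy), and parse_training_errors and has_stalled are hypothetical helpers, not neon utilities.

```python
import re

# Assumed log format, e.g.
# "2015-05-18 14:38:14,773 INFO:mlp - epoch: 3, training error: 0.33237"
EPOCH_LINE = re.compile(r"epoch:\s*(\d+),\s*training error:\s*([0-9.]+)")


def parse_training_errors(lines):
    """Return a list of (epoch, error) pairs found in the log lines."""
    points = []
    for line in lines:
        match = EPOCH_LINE.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def has_stalled(errors, patience=3):
    """True if none of the last `patience` errors beats the best earlier one."""
    if len(errors) <= patience:
        return False
    best_before = min(errors[:-patience])
    return all(err >= best_before for err in errors[-patience:])
```

The (epoch, error) pairs can be handed to any plotting library, and has_stalled gives a crude signal for when to cancel a run by hand.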
When I go to https://sites.google.com/a/nervanasys.com/wiki/algorithms/neon/how-to-write-a-mylearn-model, I get a forbidden access (403) error. I can access the other links under documentation.
Is there a way to define the parallelism in the YAML file? It could be useful, for example, when optimizing hyperparameters and using all the GPUs in the machine. (I am not sure whether hyperopt -n THREADS will spread the tasks across the GPUs.)
I am thinking of adding a new type of dataset.
I followed the steps described at the following link:
http://neon.nervanasys.com/docs/latest/datasets.html#adding-a-new-type-of-dataset
I also wrote a corresponding YAML file, nb.yaml.
After running neon nb.yaml, it shows:
AttributeError: 'module' object has no attribute 'NB'
(my subclass is named NB)
What should I do next?
I am on Windows 8.1. When I do
pip install .
or
make install
using MinGW for compilation, I get this error:
neon/backends/flexpt_dtype.c:410:5: error: initializer element is not constant
PyObject_HEAD_INIT(&PyType_Type)
^
neon/backends/flexpt_dtype.c:410:5: error: (near initialization for 'PyFlexPt_Type.ob_type')
error: command 'd:\\MinGW\\bin\\gcc.exe' failed with exit status 1
Seems to be related to this: http://stackoverflow.com/questions/3025050/error-initializer-element-is-not-constant-when-trying-to-initialize-variable-w
I tried it with both Python 2.7 and 3.4 and keep getting the same error.
Having a way to create dataset splits (e.g., a validation split from the train set) in the YAML would be useful for reporting validation metrics, and for avoiding over-fitting to the test set when using spearmint.
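As an illustration of the kind of split meant here, the following sketch holds out a fraction of the training indices for validation. It is generic Python rather than anything neon provides, and split_indices, val_fraction and seed are made-up names.

```python
import random


def split_indices(num_examples, val_fraction=0.1, seed=0):
    """Shuffle example indices and split them into (train, validation)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(num_examples))
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    num_val = int(num_examples * val_fraction)
    return indices[num_val:], indices[:num_val]
```

Hyperparameters would then be tuned against the validation indices only, leaving the test set untouched until the end.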
While the second phase (training) of
neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
is running, I can start the first phase of
neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp16.yaml
without any problem.
However, if I launch the second phase of
neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp16.yaml
while the second phase of
neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp32.yaml
is still ongoing, it reports this error:
ubgpu@ubgpu:~/github/neon/neon$ neon --gpu nervanagpu examples/convnet/i1k-alexnet-fp16.yaml
WARNING:neon.util.persist:deserializing object from: examples/convnet/i1k-alexnet-fp16.yaml
WARNING:neon.datasets.imageset:Imageset initialized with dtype <type 'numpy.float16'>
2015-05-17 14:50:39,937 WARNING:neon - setting log level to: 20
2015-05-17 14:50:40,856 INFO:gpu - Initialized NervanaGPU with stochastic_round=None
2015-05-17 14:50:40,856 INFO:gpu - Seeding random number generator with: None
2015-05-17 14:50:40,857 INFO:init - NervanaGPU backend, RNG seed: None, numerr: None
2015-05-17 14:50:40,858 INFO:mlp - Layers:
ImageDataLayer d0: 3 x (224 x 224) nodes
ConvLayer conv1: 3 x (224 x 224) inputs, 64 x (55 x 55) nodes, RectLin act_fn
PoolingLayer pool1: 64 x (55 x 55) inputs, 64 x (27 x 27) nodes, Linear act_fn
ConvLayer conv2: 64 x (27 x 27) inputs, 192 x (27 x 27) nodes, RectLin act_fn
PoolingLayer pool2: 192 x (27 x 27) inputs, 192 x (13 x 13) nodes, Linear act_fn
ConvLayer conv3: 192 x (13 x 13) inputs, 384 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv4: 384 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
ConvLayer conv5: 256 x (13 x 13) inputs, 256 x (13 x 13) nodes, RectLin act_fn
PoolingLayer pool3: 256 x (13 x 13) inputs, 256 x (6 x 6) nodes, Linear act_fn
FCLayer fc4096a: 9216 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout1: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc4096b: 4096 inputs, 4096 nodes, RectLin act_fn
DropOutLayer dropout2: 4096 inputs, 4096 nodes, Linear act_fn
FCLayer fc1000: 4096 inputs, 1000 nodes, Softmax act_fn
CostLayer cost: 1000 nodes, CrossEntropy cost_fn
2015-05-17 14:50:40,858 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,860 INFO:val_init - Generating AutoUniformValGen values of shape (363, 64)
2015-05-17 14:50:40,862 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,863 INFO:val_init - Generating AutoUniformValGen values of shape (1600, 192)
2015-05-17 14:50:40,870 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,871 INFO:val_init - Generating AutoUniformValGen values of shape (1728, 384)
2015-05-17 14:50:40,888 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,889 INFO:val_init - Generating AutoUniformValGen values of shape (3456, 256)
2015-05-17 14:50:40,914 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,915 INFO:val_init - Generating AutoUniformValGen values of shape (2304, 256)
2015-05-17 14:50:40,931 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:40,932 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 9216)
2015-05-17 14:50:42,483 INFO:batch_norm - BatchNormalization set to train mode
2015-05-17 14:50:42,484 INFO:val_init - Generating AutoUniformValGen values of shape (4096, 4096)
2015-05-17 14:50:43,188 INFO:val_init - Generating AutoUniformValGen values of shape (1000, 4096)
2015-05-17 14:50:43,391 INFO:fit - Unable to find saved model /home/ubgpu/data/I1K/I1K_alexnet_fp16_model.prm, starting over
2015-05-17 14:50:43,393 INFO:mlp - commencing model fitting
Traceback (most recent call last):
File "/usr/local/bin/neon", line 199, in
experiment, result, status = main()
File "/usr/local/bin/neon", line 168, in main
result = experiment.run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit_predict_err.py", line 97, in run
super(FitPredictErrorExperiment, self).run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit.py", line 99, in run
self.model.fit(self.dataset)
File "/usr/local/lib/python2.7/dist-packages/neon/models/mlp.py", line 141, in fit
self.fprop()
File "/usr/local/lib/python2.7/dist-packages/neon/models/mlp.py", line 81, in fprop
ll.fprop(y)
File "/usr/local/lib/python2.7/dist-packages/neon/layers/layer.py", line 373, in fprop
self.batch_idx)
File "/usr/local/lib/python2.7/dist-packages/neon/datasets/imageset.py", line 314, in get_mini_batch
self.backend.subtract(self.inp_be, self.mean_be, self.inp_be)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/gpu.py", line 643, in subtract
self.ng.subtract(left, right, out=out)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 801, in subtract
def subtract (self, a, b, out=None): return OpTreeNode.build("sub", a, b, out=out)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 915, in build
return OpTreeNode({ "op" : "assign" }, out, node).execute()
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/nervanagpu.py", line 924, in execute
return call_compound_kernel(_get_rand_state(), _stack)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 835, in call_compound_kernel
kernel = _get_compound_kernel(tuple(type_args))
File "", line 2, in _get_compound_kernel
File "/usr/local/lib/python2.7/dist-packages/pycuda/tools.py", line 423, in context_dependent_memoize
result = func(_args)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 670, in _get_compound_kernel
module = _get_module(template, template_vals)
File "/usr/local/lib/python2.7/dist-packages/nervanagpu/float_ew.py", line 313, in _get_module
return SourceModule(code, options=["--use_fast_math" ], keep=False) #,"-G"
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 251, in init
arch, code, cache_dir, include_dirs)
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 241, in compile
return compile_plain(source, options, keep, nvcc, cache_dir)
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 73, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/usr/local/lib/python2.7/dist-packages/pycuda/compiler.py", line 47, in preprocess_source
result, stdout, stderr = call_capture_output(cmdline, error_on_nonzero=False)
File "/usr/local/lib/python2.7/dist-packages/pytools/prefork.py", line 197, in call_capture_output
return forker[0].call_capture_output(cmdline, cwd, error_on_nonzero)
File "/usr/local/lib/python2.7/dist-packages/pytools/prefork.py", line 54, in call_capture_output
% ( " ".join(cmdline), e))
pytools.prefork.ExecError: error invoking 'nvcc --preprocess --use_fast_math -arch sm_52 -I/usr/local/lib/python2.7/dist-packages/pycuda/cuda /tmp/tmpIRAwNd.cu --compiler-options -P': [Errno 2] No such file or directory
ubgpu@ubgpu:~/github/neon/neon$
2015-05-25 01:59:04,478 INFO:mlp - commencing model fitting
2015-05-25 01:59:52,257 INFO:mlp - 0.0 training error: 7063.82959
2015-05-25 01:59:59,646 INFO:mlp - 0.1 training error: 6882.83252---!!!!!!
2015-05-25 02:00:07,203 INFO:mlp - 0.2 training error: 6941.65332
2015-05-25 02:00:14,705 INFO:mlp - 0.3 training error: 6853.07715
2015-05-25 02:00:22,345 INFO:mlp - 0.4 training error: 6767.80225
2015-05-25 02:00:29,725 INFO:mlp - 0.5 training error: 6804.38818
2015-05-25 02:00:40,553 INFO:mlp - 0.6 training error: 6707.24219
2015-05-25 02:00:54,380 INFO:mlp - 0.7 training error: 6803.98682
2015-05-25 02:01:07,099 INFO:mlp - 0.8 training error: 6675.78418
2015-05-25 02:01:20,458 INFO:mlp - 0.9 training error: 6502.79785
2015-05-25 02:01:33,850 INFO:mlp - 0.10 training error: 6715.76221----!!!!!!
2015-05-25 02:01:47,022 INFO:mlp - 0.11 training error: 6435.54492
2015-05-25 02:01:59,376 INFO:mlp - 0.12 training error: 6411.22900
As we can see, the output reads '0.1 training error' near the beginning and '0.10 training error' later, which is ambiguous.
The earlier line should read '0.01 training error'.
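One way to get unambiguous labels is to zero-pad the batch index to the width of the largest index. This is a sketch of the formatting only; progress_label and its arguments are hypothetical and not taken from neon's source.

```python
def progress_label(epoch, batch, num_batches):
    """Format 'epoch.batch' with the batch index zero-padded to a fixed width."""
    # Width of the largest batch index, e.g. 2 digits for 13 batches (0..12).
    width = len(str(num_batches - 1))
    return "%d.%0*d" % (epoch, width, batch)
```

With 13 batches per epoch, batch 1 is labelled 0.01 and batch 10 is labelled 0.10, so the labels sort in order and cannot be confused.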
The fprop() function computes the derivative of the activation function. This is unnecessary during inference.
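A sketch of the idea, using a toy layer rather than neon's actual Layer class: an inference flag lets fprop skip the derivative when no backward pass will follow. The class name, the flag, and the pre_act_deriv attribute are all assumptions made for illustration.

```python
import math


class TanhLayer(object):
    """Toy layer for illustration; not neon's Layer class."""

    def __init__(self):
        self.output = None
        self.pre_act_deriv = None

    def fprop(self, inputs, inference=False):
        self.output = [math.tanh(x) for x in inputs]
        if inference:
            # No backward pass will follow, so skip the derivative.
            self.pre_act_deriv = None
        else:
            # d/dx tanh(x) = 1 - tanh(x)^2, needed later by bprop.
            self.pre_act_deriv = [1.0 - y * y for y in self.output]
        return self.output
```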
The version of neon on the demo box needs to be updated to the most recent version.
Hi,
I am trying to get NEON to use cudanet as its backend. I have already built cudanet from source and installed the libraries into /usr/local/lib (which is in LD_LIBRARY_PATH). But when I try to build neon with gpu=cudanet, it still tries to download cudanet and build it again. That build fails because it cannot find helper_cuda.h, whose location I specified manually when I built cudanet from source myself. So I am wondering how to make NEON recognize an already-built cudanet.
Also, in NEON's Makefile, I found this
ifeq ($(GPU), cudanet)
INSTALL_REQUIRES := $(INSTALL_REQUIRES) \
'git+https://github.com/NervanaSystems/cuda-convnet2.git\#egg=cudanet>=0.2.7' \
'pycuda>=2014.1'
endif
So it seems NEON is trying to check whether cudanet is installed as a module within Python. I checked with pip freeze | grep cudanet and nothing shows up, so I need to install the cudanet module into Python. Any thoughts on how to achieve that?
Many thanks in advance.
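A quick way to check what the interpreter itself can see, independent of pip freeze, is to ask the import machinery directly. This is a sketch in Python 3; it assumes the package would be importable under the name cudanet, as the egg name in the Makefile suggests, and is_importable is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "cudanet" is taken from the egg name in the Makefile snippet above.
    print(is_importable("cudanet"))
```

If this prints False, the shared libraries in /usr/local/lib are present but no Python package has been installed for the interpreter in use.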
I tried to check out this branch and run examples/convnet/mnist-small.yaml, but there are indentation errors in the source:
WARNING:neon.util.persist:deserializing object from: mnist-small.yaml
2015-06-26 11:26:21,774 WARNING:neon - setting log level to: 20
Traceback (most recent call last):
File "/home/zxx/Install/python2.7.9/bin/neon", line 6, in
exec(compile(open(file).read(), file, 'exec'))
File "/home/zxx/neon_dev/bin/neon", line 236, in
experiment, result, status = main()
File "/home/zxx/neon_dev/bin/neon", line 198, in main
device_id=args.device_id)
File "/home/zxx/neon_dev/neon/backends/init.py", line 98, in gen_backend
from neon.backends.cc2 import GPU
File "/home/zxx/neon_dev/neon/backends/cc2.py", line 1367
def update_fc_bias(self, err, out):
^
IndentationError: unindent does not match any outer indentation level
Does that mean master is the only stable branch we can use?
Several objects now have separate initialization() methods that need to be called in a specific order after initial yaml parsing and object construction.
We should re-think how this is done to reduce the amount of coupling now present.
I have a K40, and neon installed successfully.
I tested the installation by running mnist-small.yaml on the GPU; the output is below, and I think the installation is fine.
(venv)↪ /home/gys/ocz/NervanaSys/neon/examples/convnet git:(master) ▸
↪ neon -g cudanet mnist-small.yaml
WARNING:neon.util.persist:deserializing object from: mnist-small.yaml
2015-05-18 14:38:13,823 WARNING:neon - setting log level to: 20
2015-05-18 14:38:14,048 INFO:__init__ - Cudanet backend, RNG seed: None, numerr: None
2015-05-18 14:38:14,049 INFO:mlp - Layers:
DataLayer d0: 784 nodes
ConvLayer layer1: 1 x (28 x 28) inputs, 16 x (24 x 24) nodes, Linear act_fn
PoolingLayer layer2: 16 x (24 x 24) inputs, 16 x (12 x 12) nodes, Linear act_fn
ConvLayer layer3: 16 x (12 x 12) inputs, 32 x (8 x 8) nodes, Linear act_fn
PoolingLayer layer4: 32 x (8 x 8) inputs, 32 x (4 x 4) nodes, Linear act_fn
FCLayer layer5: 512 inputs, 500 nodes, RectLin act_fn
FCLayer output: 500 inputs, 10 nodes, Logistic act_fn
CostLayer cost: 10 nodes, CrossEntropy cost_fn
2015-05-18 14:38:14,051 INFO:val_init - Generating UniformValGen values of shape (25, 16)
2015-05-18 14:38:14,052 INFO:val_init - Generating UniformValGen values of shape (400, 32)
2015-05-18 14:38:14,053 INFO:val_init - Generating UniformValGen values of shape (500, 512)
2015-05-18 14:38:14,057 INFO:val_init - Generating UniformValGen values of shape (10, 500)
2015-05-18 14:38:14,059 INFO:mnist - loading: train-images-idx3-ubyte
2015-05-18 14:38:14,106 INFO:mnist - loading: train-labels-idx1-ubyte
2015-05-18 14:38:14,108 INFO:mnist - loading: t10k-images-idx3-ubyte
2015-05-18 14:38:14,117 INFO:mnist - loading: t10k-labels-idx1-ubyte
2015-05-18 14:38:14,127 WARNING:dataset - Incompatible batch size. Discarding 16 samples...
2015-05-18 14:38:14,158 WARNING:dataset - Incompatible batch size. Discarding 112 samples...
2015-05-18 14:38:14,176 WARNING:dataset - Incompatible batch size. Discarding 16 samples...
2015-05-18 14:38:14,179 WARNING:dataset - Incompatible batch size. Discarding 112 samples...
2015-05-18 14:38:14,182 INFO:mlp - commencing model fitting
2015-05-18 14:38:14,329 INFO:mlp - epoch: 0, training error: 2.67295
2015-05-18 14:38:14,477 INFO:mlp - epoch: 1, training error: 0.73958
2015-05-18 14:38:14,625 INFO:mlp - epoch: 2, training error: 0.44819
2015-05-18 14:38:14,773 INFO:mlp - epoch: 3, training error: 0.33237
2015-05-18 14:38:14,921 INFO:mlp - epoch: 4, training error: 0.25886
2015-05-18 14:38:14,967 INFO:fit_predict_err - test set MisclassPercentage_TOP_1 3.65585
2015-05-18 14:38:14,995 INFO:fit_predict_err - train set MisclassPercentage_TOP_1 2.30978
But when I run the LSTM example with
neon -g cudanet mobydick-lstm-small.yaml
I get this error:
2015-05-18 14:41:31,891 DEBUG:cc2 - Copying to GPU
2015-05-18 14:41:31,949 INFO:mlp - Layers:
DataLayer d0: 128 nodes
RecurrentLSTMLayer recurrent: 128 inputs, 64 nodes, Tanh act_fn
RecurrentOutputLayer output: 64 inputs, 128 nodes, Logistic act_fn
Layer cost: 128 nodes, CrossEntropy cost_fn, utilizing GPU backend
2015-05-18 14:41:31,949 INFO:rnn - DataLayer d0: 128 nodes
2015-05-18 14:41:31,949 INFO:rnn - RecurrentLSTMLayer recurrent: 128 inputs, 64 nodes, Tanh act_fn
2015-05-18 14:41:31,949 INFO:rnn - RecurrentOutputLayer output: 64 inputs, 128 nodes, Logistic act_fn
2015-05-18 14:41:31,949 INFO:rnn - Layer cost: 128 nodes, CrossEntropy cost_fn, utilizing GPU backend
Traceback (most recent call last):
File "/ocz/gys/NervanaSys/neon/venv/bin/neon", line 240, in <module>
experiment, result, status = main()
File "/ocz/gys/NervanaSys/neon/venv/bin/neon", line 208, in main
result = experiment.run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit_predict_err.py", line 98, in run
super(FitPredictErrorExperiment, self).run()
File "/usr/local/lib/python2.7/dist-packages/neon/experiments/fit.py", line 101, in run
self.model.fit(self.dataset)
File "/usr/local/lib/python2.7/dist-packages/neon/models/rnn.py", line 117, in fit
self.grad_checker(numgrad="output")
File "/usr/local/lib/python2.7/dist-packages/neon/models/rnn.py", line 369, in grad_checker
num_target=num_target, num_i=num_i, num_j=num_j)
File "/usr/local/lib/python2.7/dist-packages/neon/models/rnn.py", line 216, in fprop
num_target[num_i, num_j] = (numpy_target + eps)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/cc2.py", line 285, in __setitem__
raise TooSlowToImplementError("arbitrary "
neon.util.error.TooSlowToImplementError: arbitrary indexing
If I run the LSTM example on the CPU, everything works.
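For context, the traceback shows the gradient checker writing to a single element (num_target[num_i, num_j] = numpy_target + eps), which is the operation the cudanet backend refuses. A numerical gradient check generally needs exactly this kind of per-element perturbation. The sketch below is a generic central-difference check on plain Python lists, not neon's grad_checker, and whether neon uses a forward or central difference is not visible in the log.

```python
def numerical_gradient(loss_fn, weights, i, j, eps=1e-4):
    """Central-difference estimate of d(loss)/d(weights[i][j])."""
    original = weights[i][j]
    weights[i][j] = original + eps   # single-element write
    loss_plus = loss_fn(weights)
    weights[i][j] = original - eps
    loss_minus = loss_fn(weights)
    weights[i][j] = original         # restore the unperturbed value
    return (loss_plus - loss_minus) / (2.0 * eps)


def sum_of_squares(weights):
    # Toy loss whose exact gradient is 2 * w for each element.
    return sum(w * w for row in weights for w in row)
```

On a CPU backend such writes are cheap, which would be consistent with the check passing there; on a GPU backend each one may imply a separate device transfer.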
Where the output is real values (R^n)
and the loss function is something like MSE.
email from a user:
I am trying to get involved with text classification. I would like to start with the classic example of movie recommendations, since there are many examples that use different kinds of software to illustrate and solve the problem:
Vowpal Wabbit, scikit-learn, Stanford NLP, etc.
https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews
I would also like to classify the typical sentiment140 dataset, which contains tweet text that can sometimes be noisy.