
Comments (8)

hobofan commented on August 24, 2024

> But it would be a security hole if CUDA doesn't zero memory allocations, so my guess may be completely wrong.

As unintuitive as it seems, that's actually the case: CUDA does not zero memory allocations. That behaviour recently got some more exposure (https://charliehorse55.wordpress.com/2016/01/09/how-nvidia-breaks-chrome-incognito/).

However that shouldn't have any impact on the way Leaf learns.
When the network is created, the weights of Linear and Convolution layers are randomly initialized (See https://github.com/autumnai/leaf/blob/master/src/layers/common/linear.rs#L100), so the initial state of the memory shouldn't really matter. Maybe there is a problem with the filled weights not being synchronized correctly?

Generally I would assume that when one of the examples doesn't learn it's due to bad hyperparameters (batch-size, learning-rate, etc.), but your findings certainly are interesting and I'll look into it.

from leaf-examples.

KodrAus commented on August 24, 2024

I'm getting the same results on my setup:

target/release/leaf-examples mnist linear --batch-size 10
Last sample: Prediction: 2, Target: 3 | Accuracy 1/10 = 10.00%
Last sample: Prediction: 9, Target: 4 | Accuracy 2/20 = 10.00%
Last sample: Prediction: 9, Target: 3 | Accuracy 3/30 = 10.00%
Last sample: Prediction: 9, Target: 1 | Accuracy 4/40 = 10.00%
Last sample: Prediction: 9, Target: 3 | Accuracy 7/50 = 14.00%
Last sample: Prediction: 9, Target: 4 | Accuracy 9/60 = 15.00%
Last sample: Prediction: 9, Target: 1 | Accuracy 9/70 = 12.86%
Last sample: Prediction: 9, Target: 9 | Accuracy 10/80 = 12.50%
Last sample: Prediction: 9, Target: 6 | Accuracy 11/90 = 12.22%
...
CUDA version 7.5.18
rustc 1.9.0-nightly
cudnn v4
Nvidia GTX Titan X

hobofan commented on August 24, 2024

I didn't get around to it on the weekend, but I was able to run it now, and it learned correctly on the first try:

cargo run --release --  mnist linear --batch-size 10 
   Compiling collenchyma v0.0.8
   Compiling collenchyma-nn v0.3.4
   Compiling collenchyma-blas v0.2.0
   Compiling leaf v0.2.0
   Compiling leaf-examples v0.1.0 (file:///home/hobofan/autumn/leaf-examples)
     Running `target/release/leaf-examples mnist linear --batch-size 10`
target/release/leaf-examples: /opt/cuda/lib64/libOpenCL.so.1: no version information available (required by target/release/leaf-examples)
Last sample: Prediction: 2, Target: 3 | Accuracy 1/10 = 10.00%
Last sample: Prediction: 3, Target: 4 | Accuracy 3/20 = 15.00%
Last sample: Prediction: 4, Target: 3 | Accuracy 4/30 = 13.33%
Last sample: Prediction: 1, Target: 1 | Accuracy 7/40 = 17.50%
Last sample: Prediction: 0, Target: 3 | Accuracy 10/50 = 20.00%
Last sample: Prediction: 2, Target: 4 | Accuracy 12/60 = 20.00%
Last sample: Prediction: 9, Target: 1 | Accuracy 15/70 = 21.43%
Last sample: Prediction: 0, Target: 9 | Accuracy 21/80 = 26.25%
Last sample: Prediction: 6, Target: 6 | Accuracy 26/90 = 28.89%
Last sample: Prediction: 0, Target: 5 | Accuracy 29/100 = 29.00%
Last sample: Prediction: 4, Target: 9 | Accuracy 33/110 = 30.00%
Last sample: Prediction: 3, Target: 2 | Accuracy 40/120 = 33.33%
Last sample: Prediction: 1, Target: 3 | Accuracy 46/130 = 35.38%
Last sample: Prediction: 7, Target: 7 | Accuracy 52/140 = 37.14%
Last sample: Prediction: 5, Target: 4 | Accuracy 56/150 = 37.33%
Last sample: Prediction: 7, Target: 8 | Accuracy 63/160 = 39.38%
Last sample: Prediction: 9, Target: 9 | Accuracy 69/170 = 40.59%

Rust 1.7.0-stable
CUDA version 7.5.17
cuDNN v4
NVIDIA GT 750M (2GB RAM)

EDIT:

It also works with my other machine:
Rust 1.5.0-stable
CUDA version 7.5.17
cuDNN v4
NVIDIA Titan X

KodrAus commented on August 24, 2024

Hmm, I'll try using the same CUDA and Rust versions as you and see if it changes my results. Will edit with details.

EDIT: No combination of driver or cuda versions seems to work for me:

Ubuntu 15.10
Rust 1.7.0 Stable

nvidia-361.28 (os)
nvidia-352.79 (prop)
nvidia-352.63 (prop)

MarcoPolo commented on August 24, 2024

Same results here (not learning and always predicting 9)

Machine info:

Rust 1.7 stable
Ubuntu 14.04
CUDA v7.5.17
cuDNN v4
nvidia GTX 680

alexandermorozov commented on August 24, 2024

Yesterday I got it to learn on the first try with the linear net. A second run also worked. Then I switched to conv and it always returned 9. After that, subsequent runs of the linear net returned 9 too.

I've simplified this example a bit by reducing the input dimension to 1 and autogenerating training samples; code is here. It has the same behaviour: sometimes it gets stuck, sometimes it doesn't. The effect doesn't depend on the number of layers or the batch size; I get the same thing with only one linear layer and batch_size=1. In the cases where it gets stuck, the output of the nll layer after the first generation contains some sensible values. On later generations it degrades to all NaNs, even with learning_rate=0, when the values shouldn't change.

I'm currently looking into how to dump intermediate values and weights to find out when they turn to NaNs. I've got a bit more time now, so hopefully I'll figure it out this time.
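
Editor's sketch, not code from the thread: once values are dumped to host memory, a small check like the one below can report where the first NaN or infinity appears in a buffer. The names `first_non_finite` and `sgd_step` are hypothetical and are not part of Leaf or Collenchyma. The second helper illustrates one way weights can change even with learning_rate=0, because under IEEE 754 arithmetic `0.0 * NaN` is still NaN. Whether this is what happens inside Leaf is an assumption that the thread does not confirm.

```rust
/// Returns the index and value of the first NaN or infinite entry, if any.
/// Hypothetical helper for inspecting a dumped weight or activation buffer.
fn first_non_finite(values: &[f32]) -> Option<(usize, f32)> {
    values
        .iter()
        .copied()
        .enumerate()
        .find(|&(_, v)| !v.is_finite())
}

/// Plain SGD update, w <- w - learning_rate * grad.
/// Illustrative only; this is an assumption, not Leaf's actual solver code.
fn sgd_step(weights: &mut [f32], grads: &[f32], learning_rate: f32) {
    for (w, g) in weights.iter_mut().zip(grads) {
        // If *g is NaN, the product is NaN even when learning_rate is 0.0,
        // so the weight is overwritten with NaN.
        *w -= learning_rate * *g;
    }
}
```

Running `first_non_finite` on each layer's output and weights after every iteration would narrow down the first layer and step at which values stop being finite.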

KodrAus commented on August 24, 2024

@alexandermorozov I'm getting the same NaN results as you on your test code, so far I haven't been able to get any nets to learn.

On another note I had to add a build.rs to your test code to get it to link cu* properly on my machine. How have you got cuda set up on your machine?

alexandermorozov commented on August 24, 2024

> I'm getting the same NaN results as you on your test code, so far I haven't been able to get any nets to learn.

You can try starting two tasks simultaneously. It generally works for me: the second task learns more often than not. It's difficult to tell whether the net works as expected, though: half of the neurons might be dead and the net may still learn somewhat.

> On another note I had to add a build.rs to your test code to get it to link cu* properly on my machine. How have you got cuda set up on your machine?

I'm on Debian testing, and the common cuda packages are installed from the distro repos. libcudnn.so* are manually placed in /usr/local/lib, and cudnn.h in /usr/local/include. More importantly, Rust switched its linker from ld to ld.gold about 3 months ago, and ld.gold doesn't search /usr/local/lib by default, so the environment variable should be set like this: export LIBRARY_PATH="/usr/local/lib". If this doesn't help, can you post the error message or the content of your build.rs? It may be better to create another bug to stay on topic here.
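
Editor's sketch, not code from the thread: the build.rs mentioned above was never posted, so the following is only a guess at what such a script might emit. It builds the standard Cargo build-script directives `cargo:rustc-link-search` and `cargo:rustc-link-lib`. The function name `link_directives` is hypothetical, and the directory `/usr/local/lib` and library name `cudnn` are taken from the setup described above.

```rust
/// Builds the Cargo build-script directives needed to link dynamic libraries
/// that live in a directory the linker does not search by default.
/// Hypothetical helper; the build.rs discussed in the thread was never posted.
fn link_directives(search_dir: &str, libs: &[&str]) -> Vec<String> {
    // Tell rustc to add `search_dir` to the native library search path.
    let mut out = vec![format!("cargo:rustc-link-search=native={}", search_dir)];
    for lib in libs {
        // Request dynamic linking against lib<name>.so.
        out.push(format!("cargo:rustc-link-lib=dylib={}", lib));
    }
    out
}
```

In a real build.rs, `main` would print each string returned by `link_directives("/usr/local/lib", &["cudnn"])` on its own line to stdout, which is where Cargo reads these directives from. This is an alternative to exporting LIBRARY_PATH, not something the thread confirms as the fix.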
