
Comments (5)

frederickayala avatar frederickayala commented on July 3, 2024 1

Have you verified that Theano is configured properly? Check your .theanorc; you can confirm whether Theano is using the GPU by inspecting theano.config.device.

http://deeplearning.net/software/theano/library/config.html
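As a minimal sketch of that check: the helper below classifies a device string the way Theano names them ('cpu' by default, 'gpu*' for the old CUDA backend, 'cuda*' or 'opencl*' for the gpuarray backend). The function name backend_for is made up for illustration and is not part of Theano or GRU4Rec.

```python
def backend_for(device):
    """Classify a Theano device string as 'cpu' or 'gpu' (hypothetical helper)."""
    device = device.strip().lower()
    # 'gpu*' is the old CUDA backend; 'cuda*' and 'opencl*' are gpuarray devices.
    if device.startswith(("cuda", "gpu", "opencl")):
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    try:
        import theano
    except Exception as exc:  # Theano missing or not importable here
        print("Could not import Theano:", exc)
    else:
        device = theano.config.device
        print(device, "->", backend_for(device))
```

If this prints 'cpu' while a GPU is installed, the configuration file or THEANO_FLAGS is the first place to look.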

from gru4rec.

hidasib avatar hidasib commented on July 3, 2024

I would be suspicious of those results. Those training times seem extremely low. How much data do you use for training? Do you get any errors?

In practice, training is much faster on GPU than on CPU. There are currently two bottlenecks on GPU, but neither hinders execution enough to make it slower than training on a CPU.


loretoparisi avatar loretoparisi commented on July 3, 2024

@hidasib Do you have specific training times for different configurations/GPU units?

The paper only states that

The running time depends on the parameters and the dataset.
Generally speaking the difference in runtime between the smaller and the larger variant is not too high on a GeForce GTX Titan X GPU and the training of the network can be done in a few hours.
On CPU, the smaller network can be trained in a practically acceptable timeframe.

and

The GRU-based approach has substantial gain over the item-KNN in both evaluation metrics on both datasets, even if the number of units is 100. Increasing the number of units further improves the results for pairwise losses, but the accuracy decreases for cross-entropy...
Although, increasing the number of units increases the training times, we found that it was not too expensive to move from 100 units to 1000 on GPU.

A note on theano specifies that

Using Theano with fixes for the subtensor operators on GPU

I'm running via nvidia-docker on a

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   37C    P8    17W / 125W |      2MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The training process

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                   
  131 root      20   0 32.726g 3.766g  39908 R 399.3 25.6   4357:04 python                                                                                                    

I bet that I'm running on CPU. Printing the Theano configuration attributes will reveal it:

python -c 'import theano; print(theano.config)' | less
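The nvidia-smi table above (0% utilization, 2MiB used) can also be read programmatically while training runs. This is only a sketch: it relies on nvidia-smi's CSV query mode, and the helper names and the idle thresholds are my own assumptions, not anything from GRU4Rec or Theano.

```python
import shutil
import subprocess

# nvidia-smi CSV query: one row per GPU, no header, no unit suffixes.
QUERY = ["nvidia-smi", "--query-gpu=name,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_rows(csv_text):
    """Return (name, util_percent, mem_used_mib) tuples; None where not numeric."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, util, mem = [field.strip() for field in line.rsplit(",", 2)]
        rows.append((name,
                     int(util) if util.isdigit() else None,
                     int(mem) if mem.isdigit() else None))
    return rows


def looks_idle(row, util_threshold=5, mem_threshold_mib=50):
    """Heuristic with assumed thresholds: low utilization and almost no memory."""
    _, util, mem = row
    if util is None or mem is None:
        return False  # unknown readings are not treated as idle
    return util <= util_threshold and mem <= mem_threshold_mib


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(QUERY, capture_output=True, text=True).stdout
        for row in parse_gpu_rows(out):
            print(row, "idle" if looks_idle(row) else "busy")
    else:
        print("nvidia-smi not found on PATH")
```

A GPU that stays idle while the python process sits near 400% CPU is a strong hint that training is running on the CPU.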


loretoparisi avatar loretoparisi commented on July 3, 2024

[UPDATE]

OK, I have figured it out. First, create a .theanorc file in $HOME with this minimal configuration:

[global]
floatX = float32
device = cuda0

[lib]
cnmem = 1

[nvcc]
fastmath = True
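To double-check what a .theanorc like this actually requests before starting Theano, the file can be read with Python's standard configparser. This is a rough sketch: theanorc_settings is a hypothetical helper, and it only inspects the file, so anything set through the THEANO_FLAGS environment variable (which takes precedence) is not reflected.

```python
import configparser
import os

SAMPLE = """\
[global]
floatX = float32
device = cuda0
"""


def theanorc_settings(text):
    """Return the device and floatX requested by .theanorc-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Fall back to Theano's documented defaults when a key is absent.
    return {
        "device": parser.get("global", "device", fallback="cpu"),
        "floatX": parser.get("global", "floatX", fallback="float64"),
    }


if __name__ == "__main__":
    path = os.path.expanduser("~/.theanorc")
    if os.path.exists(path):
        with open(path) as handle:
            print(theanorc_settings(handle.read()))
    else:
        print("No ~/.theanorc found; sample gives", theanorc_settings(SAMPLE))
```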

Then, to check it, run this Python script:

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

and test if the device is detected:

root@d842fc00a358:~/GRU4Rec/examples/rsc15# /root/yes/lib/python3.5/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda0: GRID K520 (0000:00:03.0)
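If the startup log is captured to a file, the "Mapped name ... to device ..." line can be picked out automatically. A small sketch, assuming the message keeps the format shown in the output above; mapped_devices is a hypothetical helper name.

```python
import re

# Matches e.g. "Mapped name None to device cuda0: GRID K520 (0000:00:03.0)"
MAPPED = re.compile(
    r"Mapped name (\S+) to device (\S+): (.+?) \(([0-9A-Fa-f:.]+)\)"
)


def mapped_devices(log_text):
    """Return (device, gpu_name) pairs found in Theano's startup output."""
    return [(m.group(2), m.group(3)) for m in MAPPED.finditer(log_text)]


if __name__ == "__main__":
    sample = "Mapped name None to device cuda0: GRID K520 (0000:00:03.0)"
    print(mapped_devices(sample))
```

An empty list would suggest that no GPU context was created.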

In my case there is a warning about cuDNN, but that depends on the installed version. If the GPU device has been detected (your configuration states device = cuda0), you can restart the training and see what happens. I get a segmentation fault within a few minutes, so it is possibly due to the previous warning...


loretoparisi avatar loretoparisi commented on July 3, 2024

I have reported the segmentation fault to Theano. Training on CPU works, so the fault may be due to something else.

