Hi, Great article and code! I find that the training is not fast


code not faster on GPU about gru4rec HOT 5 CLOSED

hidasib commented on July 3, 2024

code not faster on GPU

from gru4rec.

Comments (5)

frederickayala commented on July 3, 2024

Have you verified that Theano is configured properly? Check your .theanorc; you can confirm whether Theano is using the GPU by inspecting theano.config.device.

http://deeplearning.net/software/theano/library/config.html
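As a rough illustration of that check, here is a minimal sketch that works out which device is configured without importing Theano. It assumes the documented precedence (THEANO_FLAGS overrides .theanorc, and the default device is cpu) and ignores other mechanisms such as the THEANORC variable. The helper name effective_device is hypothetical and not part of Theano.

```python
import configparser
import os


def effective_device(theano_flags, theanorc_text):
    """Approximate the device Theano would be configured with (hypothetical helper)."""
    # THEANO_FLAGS is a comma-separated list of key=value pairs and takes precedence.
    for item in theano_flags.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip() == "device":
            return value.strip()
    # Otherwise fall back to the [global] section of the .theanorc contents.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(theanorc_text)
    if parser.has_option("global", "device"):
        return parser.get("global", "device").strip()
    # Theano falls back to the CPU when nothing is configured.
    return "cpu"


if __name__ == "__main__":
    rc_path = os.path.expanduser("~/.theanorc")
    rc_text = ""
    if os.path.exists(rc_path):
        with open(rc_path) as handle:
            rc_text = handle.read()
    device = effective_device(os.environ.get("THEANO_FLAGS", ""), rc_text)
    print("configured device:", device)
    print("GPU requested" if device.startswith(("cuda", "gpu")) else "CPU only")
```

This only reports what is configured. Whether Theano actually managed to initialise the GPU still has to be confirmed from theano.config.device and the start-up messages.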


hidasib commented on July 3, 2024

I would be suspicious of those results. Those training times seem extremely low. How much data do you use for training? Do you get any errors?

In practice, training is much faster on GPU than on CPU. There are two bottlenecks on GPU at the moment, but neither hinders execution enough to push it below the training speed of a CPU.


loretoparisi commented on July 3, 2024

@hidasib Do you have specific training times for different configurations/GPU units?

The paper only states that

The running time depends on the parameters and the dataset.
Generally speaking the difference in runtime between the smaller and the larger variant is not too high on a GeForce GTX Titan X GPU and the training of the network can be done in a few hours.
On CPU, the smaller network can be trained in a practically acceptable timeframe.

and

The GRU-based approach has substantial gain over the item-KNN in both evaluation metrics on both datasets, even if the number of units is 100. Increasing the number of units further improves the results for pairwise losses, but the accuracy decreases for cross-entropy...
Although, increasing the number of units increases the training times, we found that it was not too expensive to move from 100 units to 1000 on GPU.

A note on Theano specifies that

Using Theano with fixes for the subtensor operators on GPU

I'm running via nvidia-docker on a GRID K520:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   37C    P8    17W / 125W |      2MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The training process

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                   
  131 root      20   0 32.726g 3.766g  39908 R 399.3 25.6   4357:04 python

I bet that I'm running on CPU. Printing the Theano configuration attributes will reveal it:

python -c 'import theano; print(theano.config)' | less
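The nvidia-smi side of this check can be scripted as well. The sketch below queries the name, utilization and memory use of each GPU in CSV form and flags a GPU as idle when both values are near zero, as in the table above. The helper names and the idle thresholds are hypothetical, and it assumes the driver reports numeric values for both fields.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Turn 'name, util, mem' CSV lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name does not break parsing.
        name, util, mem = [field.strip() for field in line.rsplit(",", 2)]
        rows.append({"name": name, "util_percent": int(util), "memory_mib": int(mem)})
    return rows


def looks_idle(row, util_threshold=5, memory_threshold_mib=50):
    """Heuristic with assumed thresholds: almost no load and almost no memory in use."""
    return row["util_percent"] < util_threshold and row["memory_mib"] < memory_threshold_mib


if __name__ == "__main__":
    try:
        output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
    else:
        for row in parse_gpu_rows(output):
            state = "idle" if looks_idle(row) else "busy"
            print("%s: %s" % (row["name"], state))
```

If the GPU still looks idle while the python process sits near 400% CPU, the training is almost certainly running on the CPU.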


loretoparisi commented on July 3, 2024

[UPDATE]

OK, I have figured it out. First create a .theanorc file in $HOME with these minimal configuration attributes:

[global]
floatX = float32
device = cuda0

[lib]
cnmem = 1

[nvcc]
fastmath = True

Then, to check it, run this Python script:

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

and test if the device is detected:

root@d842fc00a358:~/GRU4Rec/examples/rsc15# /root/yes/lib/python3.5/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda0: GRID K520 (0000:00:03.0)

In my case I can see a warning about cuDNN, but this depends on its version. If the GPU device has been detected, since your configuration states device = cuda0, you can restart the training and see what happens. I get a segmentation fault within a few minutes, which is possibly due to the previous warning.
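To read the version number in that log, here is a small sketch that decodes the integer cuDNN reports. It assumes the MAJOR * 1000 + MINOR * 100 + PATCHLEVEL encoding used by cuDNN releases of that generation. The function name is hypothetical.

```python
def decode_cudnn_version(version):
    """Split an integer such as 6021 into (major, minor, patch), assuming
    the MAJOR * 1000 + MINOR * 100 + PATCHLEVEL encoding."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    detected = decode_cudnn_version(6021)
    print("detected cuDNN %d.%d.%d" % detected)
    # The warning recommends 5.1, so compare on (major, minor) only.
    if detected[:2] > (5, 1):
        print("newer than the 5.1 release named in the warning")
```

Under that assumption, 6021 is cuDNN 6.0.21, which is newer than the 5.1 named in the warning.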


loretoparisi commented on July 3, 2024

I have reported the segmentation fault to Theano (https://github.com/Theano/Theano/issues/6202). Since training on CPU works, it may be due to something else.

