
Comments (13)

amiralansary commented on July 1, 2024

I usually get around 3-4 it/sec using the default big architecture on a GTX 1080. You can try a tiny architecture first until you see some decent results, then move to a bigger one. And if you do not need 3D convs, training will be much faster with 2D convs.

You can also monitor performance with TensorBoard by passing the train_log directory. Signs such as the agent playing more games over time, or an increasing number of successful episodes/games, are healthy indicators that the agent is learning.
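For example, assuming the logs are written under a train_log/ directory, TensorBoard can be pointed at it directly:

tensorboard --logdir train_log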

from rl-medical.

amiralansary commented on July 1, 2024

I am using 3D frames, not 2D, so this should be fine with your data if it is 3D too. There are two models in the code, 2D and 3D; both should be working so far.

from rl-medical.

amiralansary commented on July 1, 2024

@crypdick Did you manage to make the code work on your data?

from rl-medical.

crypdick commented on July 1, 2024

@amiralansary I can't be sure. I finally got my network training yesterday evening, and epoch 3 has just started (training is slow: only ~0.8 iterations per second). When I use --task play with the latest model checkpoint, the DQN outputs the exact same Q-value array at every timestep, so the agent always steps in the same direction. In --task train the model is off-policy, so it takes actions randomly.

from rl-medical.

crypdick commented on July 1, 2024

@amiralansary I wanted to double-check something with you. In the network graph, you slice into comb_state to get the current state and the future state. So the minimum FRAME_HISTORY has to be 2, right? I don't understand why you don't just return (pre-state, action, reward, terminal, post-state).

from rl-medical.

amiralansary commented on July 1, 2024

Yes, comb_state is the combination of the current and next state. Both states are used in the graph during training for the DQN loss function:
[image: DQN loss function]

In this case, the size of one comb_state will be (image_shape[0], image_shape[0], image_shape[0], FRAME_HISTORY + 1)
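For reference, here is a minimal sketch of that part of the graph, assuming the layout above where comb_state stacks FRAME_HISTORY + 1 consecutive frames along the last axis; the q_values helper and the NUM_ACTIONS and GAMMA values are placeholders for illustration, not the repository's actual code:

import tensorflow as tf

FRAME_HISTORY = 4
NUM_ACTIONS = 6      # hypothetical action-space size
GAMMA = 0.9

def q_values(state):
    # Stand-in for the real 3D-conv Q-network: flatten and map to one value per action.
    flat = tf.layers.flatten(tf.cast(state, tf.float32))
    return tf.layers.dense(flat, NUM_ACTIONS)

# comb_state stacks FRAME_HISTORY + 1 consecutive frames along the last axis
comb_state = tf.placeholder(tf.uint8, (None, 45, 45, 45, FRAME_HISTORY + 1))
action = tf.placeholder(tf.int64, (None,))
reward = tf.placeholder(tf.float32, (None,))
is_over = tf.placeholder(tf.bool, (None,))

state = comb_state[..., :FRAME_HISTORY]   # current state: frames 0 .. FRAME_HISTORY-1
next_state = comb_state[..., 1:]          # next state:    frames 1 .. FRAME_HISTORY

pred_q = q_values(state)                                   # Q(s, .)
chosen_q = tf.reduce_sum(pred_q * tf.one_hot(action, NUM_ACTIONS), axis=1)

best_next_q = tf.reduce_max(q_values(next_state), axis=1)  # max_a' Q(s', a'); a target network in practice
target = reward + (1.0 - tf.cast(is_over, tf.float32)) * GAMMA * tf.stop_gradient(best_next_q)

loss = tf.losses.huber_loss(target, chosen_q)              # DQN loss on (s, a, r, s')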

from rl-medical.

sunalbert commented on July 1, 2024

Hi @amiralansary, I have a similar question. I want to apply your code to landmark detection on common natural images (with 3 dimensions: H, W, C). How should I modify your code, and where should I pay more attention? (e.g. changing the conv operation from Conv3D to Conv2D, and the image size from (45, 45, 45) to (X, X, 3)).

from rl-medical.

gravity1989 commented on July 1, 2024

Hi @amiralansary, at what point, in terms of the number of epochs, did you start getting reasonable results? Do you have a graph showing the number of successes vs. epochs?

Thanks.

from rl-medical.

amiralansary commented on July 1, 2024

@sunalbert You may need to use Conv2D instead of Conv3D if the input is 2D (check Model2D). You will also have to trace the other processing steps and make sure they handle the inputs and outputs correctly. One last thing: if your input is colour images rather than grayscale, you may end up needing Conv3D anyway, because the channel dimension is already used for storing the frame history.
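As an illustration of why colour input changes things, here is a minimal sketch of the tensor shapes involved; the sizes and variable names are assumptions for illustration, not the repository's actual code:

import numpy as np

FRAME_HISTORY = 4
H, W = 128, 128

# Grayscale 2D input: the frame history occupies the channel axis,
# so a plain 2D convolution over (H, W, FRAME_HISTORY) is enough.
gray_state = np.zeros((H, W, FRAME_HISTORY), dtype=np.uint8)

# Colour 2D input: each frame already has 3 channels, so stacking the
# history adds another axis -> (H, W, 3, FRAME_HISTORY). Handling that
# extra axis pushes you towards 3D convolutions (or towards merging the
# axes, e.g. (H, W, 3 * FRAME_HISTORY), with its own trade-offs).
color_state = np.zeros((H, W, 3, FRAME_HISTORY), dtype=np.uint8)

print(gray_state.shape)   # (128, 128, 4)    -> Conv2D
print(color_state.shape)  # (128, 128, 3, 4) -> Conv3D or reshape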

from rl-medical.

amiralansary commented on July 1, 2024

@gravity1989 Time, or the number of iterations, varies with the complexity of the target landmark. You can visualize the performance of the agent during training using the saved models (with --task play), or use TensorBoard to plot the curves and monitor the performance.

Here are some of the performance curves on training data for a cardiac landmark
[image: training performance curves for a cardiac landmark]

And on the validation data:
[image: validation performance curves]

For the results reported in the paper, training was stopped at 500k iterations for all experiments. But in practice you can keep training until the agent reaches a satisfactory accuracy.
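As a hypothetical example of replaying a saved checkpoint (the script name DQN.py, the checkpoint path, and the --load/--files flags follow common tensorpack example conventions and are assumptions here, so check them against the repository's README):

python DQN.py --task play --load train_log/model-500000 --files image_files.txt landmark_files.txt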

from rl-medical.

gravity1989 commented on July 1, 2024

Thanks, @amiralansary .

from rl-medical.

sunalbert commented on July 1, 2024

Thanks, @amiralansary

from rl-medical.

courins commented on July 1, 2024

@amiralansary @crypdick @gravity1989 @sunalbert

I am getting a very slow training speed.

My current environment is:
Windows 7 x64
Nvidia GeForce GTX 1080 (8 GB)
CUDA 9.0
cuDNN 7.0.5
tensorflow-gpu 1.6.0
tensorpack 0.8.0
gym 0.12.1

I used the example data for training:
\tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\image_files
\tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\landmark_files

Due to GPU memory limits, I used the parameter:
BATCH_SIZE = 24

And the following GPU and CPU settings:
mem_fraction = 0.8
# conf = tf.ConfigProto(log_device_placement=True)
conf = tf.ConfigProto()
# conf.allow_soft_placement = True
conf.intra_op_parallelism_threads = 6
conf.inter_op_parallelism_threads = 6
conf.gpu_options.per_process_gpu_memory_fraction = mem_fraction
conf.gpu_options.allow_growth = True

To exclude the effect of data loading, I used FakeData:

dataflow = FakeData(
    [[BATCH_SIZE, 45, 45, 45, 5], [BATCH_SIZE], [BATCH_SIZE], [BATCH_SIZE]],
    size=1000, random=False,
    dtype=['uint8', 'float32', 'int8', 'bool'])

And a minimal training configuration:

return TrainConfig(
    data=QueueInput(dataflow),
    model=Model(),
    callbacks=[],
    steps_per_epoch=10,
    max_epoch=1000,
    session_config=conf,
)

The training speed is 28 seconds per iteration.

Even after I reduced the model complexity (by commenting out Conv3D and MaxPooling3D layers):

with argscope(Conv3D, nl=PReLU.symbolic_function, use_bias=True):
    # core layers of the network
    conv = (LinearWrap(image)
            .Conv3D('conv0', out_channel=32,
                    kernel_shape=[5, 5, 5], stride=[1, 1, 1])
            .MaxPooling3D('pool0', 16)
            # .Conv3D('conv1', out_channel=32,
            #         kernel_shape=[5, 5, 5], stride=[1, 1, 1])
            # .MaxPooling3D('pool1', 2)
            # .Conv3D('conv2', out_channel=64,
            #         kernel_shape=[4, 4, 4], stride=[1, 1, 1])
            # .MaxPooling3D('pool2', 2)
            # .Conv3D('conv3', out_channel=64,
            #         kernel_shape=[3, 3, 3], stride=[1, 1, 1])
            )

The training speed is still 22 seconds per iteration.

That is roughly 100x slower than your reported training speed (around 3-4 it/sec using the default big architecture on a GTX 1080).

I would like to know why, and I would appreciate any suggestions for reducing the training time.
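One thing worth ruling out here (a sketch, not something verified against this particular setup) is whether the 3D convolutions are actually running on the GPU rather than silently falling back to the CPU; this reuses the log_device_placement option that is already commented out in the config above:

import tensorflow as tf

# If the GTX 1080 does not appear in the device-placement log, the 3D convs
# are most likely running on the CPU, which by itself can explain a ~100x slowdown.
print(tf.test.is_gpu_available())  # should print True when CUDA/cuDNN are picked up

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    image = tf.random_normal([1, 45, 45, 45, 1])   # (batch, D, H, W, C)
    kernel = tf.random_normal([5, 5, 5, 1, 32])    # (kD, kH, kW, in_ch, out_ch)
    out = tf.nn.conv3d(image, kernel, strides=[1, 1, 1, 1, 1], padding='SAME')
    sess.run(out)                                  # placement is logged to stderr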

from rl-medical.
