Comments (11)

chivee commented on May 8, 2024

Can you try setting deviceId to 0 rather than "auto" to select the GPU manually?

from cntk.

frankseide commented on May 8, 2024

I have seen this before, but I thought we fixed it. Will have a look tomorrow morning.

Thanks for reporting it!

such87 commented on May 8, 2024

Yes, setting deviceId=0 solves it for MNIST.

I also tried running CIFAR-10 with this command:

$ cntk configFile=01_Conv.config configName=01_Conv deviceId=0

Now I get this error:

Validating --> conv1_act.y = RectifiedLinear(conv1_act.p[32 x 32 x 32 x 1 x *]) -> [32 x 32 x 32 x 1 x *]
Validating --> pool1 = MaxPooling(conv1_act.y[32 x 32 x 32 x 1 x *])
[CALL STACK]
/scratch-shared/mch/scratch/dipsank/CUDNN/CNTK/bin/../lib/libcntkmath.so ( Microsoft::MSR::CNTK::DebugUtil::PrintCallStack() + 0xb4 ) [0x7fd7b9bdfd44]
cntk ( void Microsoft::MSR::CNTK::ThrowFormatted<std::invalid_argument>(char const*, ...) + 0xc0 ) [0x5366d0]
cntk ( Microsoft::MSR::CNTK::PoolingNodeBase::Validate(bool) + 0x325 ) [0x5a47b5]
cntk ( Microsoft::MSR::CNTK::MaxPoolingNode::Validate(bool) + 0x14 ) [0x5a48d4]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::ValidateNodes(std::list<std::shared_ptr<Microsoft::MSR::CNTK::ComputationNodeBase>, std::allocator<std::shared_ptr<Microsoft::MSR::CNTK::ComputationNodeBase> > >, bool, unsigned long&) + 0x372 ) [0x6c0252]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::ValidateSubNetwork(std::shared_ptr<Microsoft::MSR::CNTK::ComputationNodeBase> const&) + 0x205 ) [0x6c0b35]
cntk ( Microsoft::MSR::CNTK::ComputationNetwork::CompileNetwork() + 0x21f ) [0x6c35af]
cntk ( Microsoft::MSR::CNTK::NDLBuilder::LoadFromConfig(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) + 0x1de ) [0x5961ee]
cntk ( std::_Function_handler<std::shared_ptr<Microsoft::MSR::CNTK::ComputationNetwork> (int), void DoTrain<Microsoft::MSR::CNTK::ConfigParameters, float>(Microsoft::MSR::CNTK::ConfigParameters const&)::{lambda(int)#2}>::_M_invoke(std::_Any_data const&, int) + 0x7f ) [0x7627ef]
cntk ( Microsoft::MSR::CNTK::SGD::Train(std::function<std::shared_ptr<Microsoft::MSR::CNTK::ComputationNetwork> (int)>, int, Microsoft::MSR::CNTK::IDataReader*, Microsoft::MSR::CNTK::IDataReader*, bool) + 0x4c8 ) [0x74b538]
cntk ( void DoTrain<Microsoft::MSR::CNTK::ConfigParameters, float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x21a ) [0x76134a]
cntk ( void DoCommands(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x7a4 ) [0x5926e4]
cntk ( wmainOldCNTKConfig(int, wchar_t**) + 0xaa1 ) [0x52a941]
cntk ( wmain1(int, wchar_t**) + 0x62 ) [0x52b0f2]
cntk ( main + 0xcc ) [0x51e06c]
/lib64/libc.so.6 ( __libc_start_main + 0xfd ) [0x344e61ed5d]
cntk ( ) [0x521b09]
EXCEPTION occurred: Convolution operation currently only supports 1D or 2D convolution on 3D tensors.

It works fine with configFile=02_BatchNormConv.config.

If I do not set deviceId for CIFAR-10, I get the same error reported above.

such87 commented on May 8, 2024

Is it possible to run on multiple GPUs using only cntk?
Or do I have to use MPI to launch cntk on multiple GPUs?

frankseide commented on May 8, 2024

mpiexec is necessary to launch multi-GPU jobs.

(You can run independent jobs of course.)

Thanks,

Frank

such87 commented on May 8, 2024

If I run it like this:

mpiexec -n 4 cntk Config/config_file deviceId=0

Then every process runs on GPU 0.
My platform has 4 GPUs, and I want each process to select
a unique GPU: proc0 selects GPU 0, proc1 selects GPU 1, and so on.

The code fails when deviceId is not set (which I guess defaults to auto).
It produces the error I first reported.

Is this a known issue?
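
One way to give each process its own GPU, sketched under assumptions and not taken from CNTK itself: launch a small wrapper script under mpiexec that reads the MPI launcher's node-local rank and passes it to cntk as deviceId. The script name run_on_local_rank.sh and the NUM_GPUS variable are hypothetical. OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI and MPICH's Hydra launcher sets MPI_LOCALRANKID, but other launchers may use different names, so check your MPI documentation. This sidesteps deviceId=auto rather than fixing it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (run_on_local_rank.sh): map each MPI process's
# node-local rank to a GPU id and pass it to cntk as deviceId.
# Assumes Open MPI (OMPI_COMM_WORLD_LOCAL_RANK) or MPICH/Hydra
# (MPI_LOCALRANKID); falls back to rank 0 when neither is set.

local_rank() {
  echo "${OMPI_COMM_WORLD_LOCAL_RANK:-${MPI_LOCALRANKID:-0}}"
}

# If NUM_GPUS (hypothetical) is set, wrap ranks around so that more
# processes than GPUs on one node still get a valid device id.
device_for_rank() {
  local rank
  rank=$(local_rank)
  if [ -n "${NUM_GPUS:-}" ]; then
    echo $(( rank % NUM_GPUS ))
  else
    echo "$rank"
  fi
}

if [ "$#" -gt 0 ]; then
  # Forward all cntk arguments and append the per-process deviceId.
  exec cntk "$@" deviceId="$(device_for_rank)"
else
  # No arguments: only report which device this process would use.
  echo "local rank $(local_rank) -> deviceId=$(device_for_rank)"
fi
```

Usage, e.g.: mpiexec -n 4 ./run_on_local_rank.sh configFile=01_Conv.config configName=01_Conv. Run without arguments, the script only prints the device it would pick, which can help check the rank-to-GPU mapping before a real job.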

frankseide commented on May 8, 2024

EXCEPTION occurred: DeviceFromConfig: unexpected failure

This may be caused by not being able to write to /var/lock. I have made the error message more specific (the change will take a while to land).

frankseide commented on May 8, 2024

Could you check whether you can write to /var/lock? E.g.

echo test > /var/lock/test.txt

/var/lock is used to implement a global lock through the file system. If you do not have write access to it on your system, could you try to make it writable for you? If that is not possible, a stopgap would be to manually edit CrossProcessMutex.h, change /var/lock to /tmp or a similar writable directory, and recompile. Making this lock location configurable is on our to-do list.
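
To illustrate the idea of a global lock implemented through the file system, here is a minimal shell sketch using the util-linux flock utility. It is not CNTK's CrossProcessMutex code; the function names pick_lock_dir and with_global_lock and the LOCK_DIR_FALLBACK variable are hypothetical, and the fallback to /tmp mirrors the stopgap suggested above.

```shell
#!/usr/bin/env bash
# Sketch of a cross-process lock via the file system (not CNTK's code).
# Requires the util-linux `flock` command.

# Pick a writable lock directory: /var/lock if possible, otherwise a
# fallback (LOCK_DIR_FALLBACK is hypothetical, defaulting to /tmp).
pick_lock_dir() {
  local dir
  for dir in /var/lock "${LOCK_DIR_FALLBACK:-/tmp}"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "$dir"
      return 0
    fi
  done
  echo "no writable lock directory found" >&2
  return 1
}

# Run a command while holding an exclusive lock on <dir>/<name>.lock.
# Any other process locking the same name blocks until this one finishes.
with_global_lock() {
  local name=$1 dir
  shift
  dir=$(pick_lock_dir) || return 1
  (
    flock -x 9 || exit 1   # block until the exclusive lock is acquired
    "$@"
  ) 9>"$dir/$name.lock"
}
```

For example, with_global_lock gpu_select echo "holding the lock" prints the message while holding the lock; a second process calling it with the same name waits until the first one exits. If pick_lock_dir skips /var/lock, that is the same symptom as the write-access problem described above.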

such87 commented on May 8, 2024

I tried changing /var/lock to a local directory.
It doesn't help; the same error occurs.

saonim commented on May 8, 2024

I get the same error even when I add deviceId=0 or deviceId=1. Note that I have two GPUs on my system. Has this been resolved?

mahilleb-msft commented on May 8, 2024

The original issue was solved; closing this.

@such87: can you retry the CIFAR-10 example? There have been some fixes in the meantime that should address this. If it doesn't run, please open a new issue. Thank you!
@such87, @saonim: for MPI execution, can you also try with the latest changes and open a new issue if it's still failing? Thank you!

For the lock file location, there is already #62 to track it.