Comments (7)

jperez999 commented on June 3, 2024

Hello @Arnie0426, can you give us a little more information about the particular AMI you were using and your instance settings (storage amount, instance type)? This will help us recreate the issue. Currently, using the Deep Learning AMI available from Amazon with a P3 instance, we are able to run the merlin container on AWS successfully.

from merlin.

Arnie0426 commented on June 3, 2024

Hi @jperez999, I was trying to use the 1.18-v20210125 AMI on P3.8xlarge instances. Are there certain AMIs not supported by Merlin?

jperez999 commented on June 3, 2024

@Arnie0426
I have been trying to repro your case and have been having a hard time. When I try to create an instance on AWS with the AMI 1.18-v20210125, I see three options:

amazon-eks-node-1.18-v20210125 - ami-0b9bf042ba04abfa6
amazon-eks-gpu-node-1.18-v20210125 - ami-0f9f34596af53c9b8
amazon-eks-arm64-node-1.18-v20210125 - ami-0fb66d99b073e5c27

Are any of these the ones you used? (the one I tested, "amazon-eks-gpu-node", did not work because of a docker service failure)
Can you check if you have nvidia-docker installed on that AMI?
It would be best if I could repro your environment, to really drill down on the problem.
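
One rough way to answer the nvidia-docker question on the instance itself is to check which of the relevant executables are on the `PATH`. The sketch below is only illustrative: the list of candidate names is an assumption (different install methods expose different binaries), and finding a binary does not prove that the Docker service is configured to use it.

```python
import shutil

# Candidate executable names are assumptions for illustration; a given AMI
# may ship only some of them, or none.
CANDIDATES = (
    "docker",
    "nvidia-docker",
    "nvidia-container-runtime",
    "nvidia-container-cli",
    "nvidia-smi",
)


def find_gpu_container_tools(names=CANDIDATES):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_gpu_container_tools().items():
        print(f"{name}: {path or 'not found'}")
```

Pasting the printed list into the thread would show at a glance which pieces are missing from the image.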

Arnie0426 commented on June 3, 2024

Other docker images work fine, and yes, I am using the eks-gpu-node-1.18.

Would you also be able to tell me which AMI you used when you were able to successfully run the merlin container? Was it the newest one?

jperez999 commented on June 3, 2024

The image I tested on was the Deep Learning AMI (Amazon Linux 2) Version 37.0

For your image, you may need to install nvidia-docker; I didn't see it on the 'eks-gpu-node-1.18' image. To install it, follow this guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker.

benfred commented on June 3, 2024

@Arnie0426 did installing nvidia-docker resolve the issue for you?

benfred commented on June 3, 2024

We're basing our containers off the DLFW now, which includes user-mode CUDA drivers. As of the 21.09 containers, you should be able to run them on a CUDA 11 host without issue, but definitely re-open if you hit this issue again.
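
To check whether a host counts as a "CUDA 11 host", one option is to read the CUDA version that `nvidia-smi` reports in its banner. The sketch below parses text that has already been captured from `nvidia-smi`; the helper names and the sample banner line are hypothetical, and it assumes the banner contains a `CUDA Version: X.Y` field, which may not hold for every driver release.

```python
import re


def cuda_major_from_smi(text):
    """Return the major CUDA version found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return int(match.group(1)) if match else None


def host_meets_cuda(text, required_major=11):
    """True if the reported CUDA major version is at least required_major."""
    major = cuda_major_from_smi(text)
    return major is not None and major >= required_major


if __name__ == "__main__":
    # Hypothetical banner line, for illustration only.
    sample = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
    print(host_meets_cuda(sample))
```

On a real instance, the `text` argument would come from the captured output of `nvidia-smi`, for example via `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.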
