Comments (7)
Hello @Arnie0426, can you give us a little more information about the particular AMI you were using and your instance settings (storage amount, instance type)? This will help us recreate the issue. Currently, using the Deep Learning AMI available from Amazon with a P3 instance, we are able to run the Merlin container on AWS successfully.
from merlin.
Hi @jperez999, I was trying to use the 1.18-v20210125 AMI on P3.8xlarge instances. Are there certain AMIs not supported by Merlin?
from merlin.
@Arnie0426
I have been trying to reproduce your case, but I am having a hard time. When I try to create an instance on AWS with the AMI 1.18-v20210125, I see three options:
amazon-eks-node-1.18-v20210125 - ami-0b9bf042ba04abfa6
amazon-eks-gpu-node-1.18-v20210125 - ami-0f9f34596af53c9b8
amazon-eks-arm64-node-1.18-v20210125 - ami-0fb66d99b073e5c27
Is one of these the one you used? (The one I tested, "amazon-eks-gpu-node", did not work because of a Docker service failure.)
Can you check if you have nvidia-docker installed on that AMI?
It would be best if I could repro your environment, to really drill down on the problem.
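One rough way to answer the nvidia-docker question above is to look for the relevant binaries on the instance's PATH. The following is a minimal sketch, not an official check: the list of binary names is an assumption based on what the NVIDIA container packages commonly install, and the helper names are hypothetical.

```python
import shutil
from typing import Callable, Dict, Optional, Sequence

# Assumed binary names shipped by the NVIDIA container packages.
# Adjust this list for the packages actually used on your AMI.
CANDIDATE_BINARIES = (
    "nvidia-docker",
    "nvidia-container-runtime",
    "nvidia-container-cli",
)


def find_nvidia_container_tools(
    names: Sequence[str] = CANDIDATE_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each binary name to its path on PATH, or None if it is missing."""
    return {name: which(name) for name in names}


def has_nvidia_container_support(found: Dict[str, Optional[str]]) -> bool:
    """Return True if at least one of the candidate binaries was found."""
    return any(path is not None for path in found.values())


if __name__ == "__main__":
    found = find_nvidia_container_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    print("NVIDIA container tooling present:", has_nvidia_container_support(found))
```

Finding a binary only shows that the package is installed; it does not prove that the Docker daemon is configured to use the NVIDIA runtime.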
from merlin.
Other docker images work fine, and yes, I am using the eks-gpu-node-1.18.
Would you also be able to tell me which AMI you used when you were able to successfully run the merlin container? Was it the newest one?
from merlin.
The image I tested on was the Deep Learning AMI (Amazon Linux 2) Version 37.0.
For your image, you may need to install nvidia-docker; I didn't see it on the 'eks-gpu-node-1.18' image. To install it, follow this guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
from merlin.
@Arnie0426 did installing nvidia-docker resolve the issue for you?
from merlin.
We're basing our containers on the DLFW now, which includes user-mode CUDA drivers. As of the 21.09 containers, you should be able to run our containers on a CUDA 11 host without issue, but definitely re-open if you hit this issue again.
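To see whether a host counts as a CUDA 11 host, one option is to read the "CUDA Version" field that `nvidia-smi` prints in its header, which reports the highest CUDA version the installed driver supports. This is a sketch under that assumption; the function names and the (11, 0) threshold are illustrative and not part of Merlin.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi and parse its output; None if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_version(result.stdout)


def meets_minimum(
    version: Optional[Tuple[int, int]], minimum: Tuple[int, int] = (11, 0)
) -> bool:
    """Return True if a version was detected and is at least `minimum`."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    version = host_cuda_version()
    print("Detected CUDA version:", version)
    print("Meets CUDA 11 minimum:", meets_minimum(version))
```

If `nvidia-smi` is absent or exits with an error, the sketch reports no version rather than guessing, which usually points to a missing or broken driver install.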
from merlin.