Deep Learning Container Setup and Usage Guide

This guide provides instructions for setting up and using Podman containers for running deep learning applications with PyTorch and NVIDIA GPUs.

Setup Instructions

  1. Project Folder:
    • Rename your project folder to my_project.
  2. Environment Variables:
    • Open the .env/.argfile file in the root directory.
    • Set your project name as an environment variable (e.g., PROJECT_NAME=my_project).
    • Set the Jupyter Lab port (e.g., JUPYTER_PORT=8000).
    • Configure cluster settings (MASTER_PORT, MASTER_ADDR, WORLD_SIZE, NODE_RANK).
    • Set NCCL environment variables.
  3. Requirements File:
    • Add any necessary pip dependencies to the requirements.txt file.
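
As an illustration of step 2, a .env file might look like the sketch below. Only the variable names PROJECT_NAME, JUPYTER_PORT, MASTER_PORT, MASTER_ADDR, WORLD_SIZE, and NODE_RANK come from this guide. Every value is a placeholder, and the two NCCL variables shown (NCCL_DEBUG and NCCL_SOCKET_IFNAME) are assumptions about which NCCL settings a given cluster needs.

```shell
# Example .env (all values are placeholders; adjust them for your cluster)

# Project name from step 1
PROJECT_NAME=my_project

# Port on which Jupyter Lab is served
JUPYTER_PORT=8000

# Cluster settings: address and port of the master node (placeholder values)
MASTER_ADDR=192.0.2.10
MASTER_PORT=29500

# Placeholder; check whether your scripts expect a node count or a process count
WORLD_SIZE=2

# Rank of this node: 0 on the master node, a different value on each other node
NODE_RANK=0

# NCCL settings (assumed examples; which ones you need depends on your network)
NCCL_DEBUG=INFO
NCCL_SOCKET_IFNAME=eth0
```

Comments are kept on their own lines so that the file stays valid both when sourced by a shell and when read by tools that treat everything after the = sign as the value.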

Usage

  • Starting the Container:
    • Run bash build.sh to build and start the container using Podman.
  • Accessing Jupyter Lab:
    • Connect to Jupyter Lab through http://<ip-address>:<JUPYTER_PORT>/?token=<token>
  • Direct File Execution:
    • To run a file such as a Python script directly from the terminal, use a command like the following:
      • ( source .env && podman exec -w /workspace/my_project $PROJECT_NAME-$NODE_RANK conda run --live-stream -n accelerate accelerate launch my-project.py --arg1 ../path/to/data )
    • This command sources your environment variables from .env and runs the specified script with accelerate launch inside the Podman container named $PROJECT_NAME-$NODE_RANK.
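
The direct-execution command above can be wrapped in a small helper so that the container name is assembled in one place. This is a sketch and not part of the repository: the function name run_in_container and the DRY_RUN switch are hypothetical. It assumes, as in the command above, that the container is named $PROJECT_NAME-$NODE_RANK, that the project lives at /workspace/my_project inside the container, and that a conda environment called accelerate exists.

```shell
# Hypothetical helper: run a script inside the project container.
# Assumes PROJECT_NAME and NODE_RANK are already set (for example via `source .env`).
run_in_container() {
    # Prepend the podman/conda/accelerate prefix to the caller's arguments.
    # The :? expansions abort with a message if a variable is unset or empty.
    set -- podman exec -w /workspace/my_project \
        "${PROJECT_NAME:?PROJECT_NAME is not set}-${NODE_RANK:?NODE_RANK is not set}" \
        conda run --live-stream -n accelerate \
        accelerate launch "$@"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the assembled command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With the variables from .env loaded, a call such as run_in_container my-project.py --arg1 ../path/to/data runs the script. Setting DRY_RUN=1 first prints the full podman command, which is a convenient way to check the container name before launching a long job.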

Synchronization between Nodes

  • Synchronization between Nodes with Optional File Execution:
    • The sync folder contains a script for synchronizing your working directory with remote nodes, essential for training on a cluster.
    • The script supports start and stop actions for synchronizing and managing containers on remote nodes.
    • sync/sync.sh also accepts an optional fourth argument: the path of a file (script or notebook) in the project directory, which is then executed.
    • Starting Synchronization and Containers:
      • Usage: bash sync/sync.sh <local_absolute_path> <remote_relative_path> start [optional_file_path].
      • For example, to start synchronization and execute a script: bash sync/sync.sh ~/my_project .sync/my_project start /scripts/my-script.py.
    • Stopping Remote Containers:
      • Usage: bash sync/sync.sh <local_absolute_path> <remote_relative_path> stop.
      • For example: bash sync/sync.sh ~/my_project .sync/my_project stop.
    • Configuring Sync Settings:
      • Update the sync/config.json file to include your own nodes, their SSH access details, and keys. Be sure to replace node1, node2, etc., with your actual node details.
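
The argument convention of sync/sync.sh described above can be checked before anything is sent to remote nodes. The function below is a hypothetical sketch, not the repository's script: the name check_sync_args is invented for illustration, it synchronizes nothing, and rejecting a file path together with stop is an assumption based on the usage lines above.

```shell
# Hypothetical pre-flight check mirroring the usage line:
#   sync.sh <local_absolute_path> <remote_relative_path> start|stop [optional_file_path]
check_sync_args() {
    local_path=$1
    remote_path=$2
    action=$3
    file_path=${4:-}

    # The local path must be absolute (the shell expands ~ before this runs).
    case "$local_path" in
        /*) ;;
        *) echo "local path must be absolute: $local_path" >&2; return 1 ;;
    esac

    # The remote path must be present and relative.
    case "$remote_path" in
        /*) echo "remote path must be relative: $remote_path" >&2; return 1 ;;
        "") echo "remote path is required" >&2; return 1 ;;
    esac

    # Only start and stop are valid actions; a file path is assumed to apply to start only.
    case "$action" in
        start) ;;
        stop)
            if [ -n "$file_path" ]; then
                echo "a file path is only accepted with start" >&2
                return 1
            fi ;;
        *) echo "action must be start or stop: $action" >&2; return 1 ;;
    esac
    return 0
}
```

A wrapper could call check_sync_args "$@" && bash sync/sync.sh "$@" so that a mistyped action or a swapped path fails locally instead of partway through a multi-node run.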
