Comments (18)

elezar commented on August 11, 2024

@MKrupauskas would switching to /libnvidia-container/debian10/amd64 as the source of truth for the package be a solution on your end?

Our intent with the official documentation was to make downloading the repository list file work across different distributions, while the .list files would locally refer to the lowest compatible distribution for a given package flavor.

In the Debian case, this is debian10. The motivation for the changes that are causing the breakages is called out in NVIDIA/nvidia-container-toolkit#89 (comment).
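
As a rough illustration of the workaround discussed here, the sketch below rewrites any `/debianNN/` path segment in a source list to `/debian10/`. The function name `pin_debian10` and the `debian11` input line are hypothetical and not taken from the official instructions; it assumes the repository layout discussed at this point in the thread.

```shell
# Rewrite any /debianNN/ path segment in an apt source list to /debian10/.
# Reads the list on stdin and writes the rewritten list to stdout.
# Assumption: the distribution appears as its own path segment in the URL.
pin_debian10() {
  sed -E 's#/debian[0-9]+/#/debian10/#g'
}

# Example with a hypothetical list entry (single quotes keep $(ARCH) literal):
# printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/debian11/$(ARCH) /' | pin_debian10
```

Lines that do not contain a `/debianNN/` segment pass through unchanged.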

from libnvidia-container.

jonathanjsimon commented on August 11, 2024

All these user complaints would be solved with a symlink 😉

elezar commented on August 11, 2024

@jonathanjsimon it's not quite as simple as that. A symlink duplicates the contents of the target folder at the link location when publishing these repos through GitHub Pages. The reason this optimisation was performed was that the resultant artifact is already too large, causing the Pages deployment to fail, which means that new packages are not available.

We are aware that there may be ways to increase the timeout using custom pages deployments. If you have experience in how to do this, suggestions are welcome.

MKrupauskas commented on August 11, 2024

While we did work around the issue by pointing our source list to Debian 10, the solution isn't ideal. If the only issue is the artifact size and build timeouts, I think we should address that for the sake of having a Debian repo that matches the repo standard and user expectations.

Could you share some logs showing what exactly times out if we correctly symlink the distribution directories? Looking at the GitHub Actions docs, the steps themselves shouldn't time out for 360 minutes if the default isn't overridden: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes

elezar commented on August 11, 2024

@MKrupauskas I have made the symlink changes to my personal mirror elezar@98ee43d.

The GitHub actions deploying this is here:

A previous action shows the archive size warning:

The following is an example of a deployment that failed due to a timeout, although this was using the "Deploy from branch" pages deployment and not an explicit workflow as we are using now.

elezar commented on August 11, 2024

We have updated our repository structure and installation instructions to make use of generic Debian packages. The distribution name no longer affects the instructions.

Please see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html and reopen this issue if there are still problems.

HenriWahl commented on August 11, 2024

Hi there,
when using tools like apt-mirror or apt-mirror2, the Packages file is always empty after being downloaded from https://nvidia.github.io/libnvidia-container/stable/deb/amd64/Packages, although the same URL works in a browser. Do you have any idea where to search for a solution?

elezar commented on August 11, 2024

@HenriWahl I don't know what apt-mirror expects. This is the file tree as deployed to GitHub pages: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64

If there is additional metadata required by the tooling, we could consider adding it.

HenriWahl commented on August 11, 2024

@elezar I am not sure what is missing; it looks good to me.
The only hint I have is that it works with https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64, so maybe there is some difference.

Edit: yes, there are some differences:

Edit 2: I found that this is an older problem: NVIDIA/nvidia-docker#730

elezar commented on August 11, 2024

Those are useful pointers. I will spend some time investigating this.

elezar commented on August 11, 2024

I have just tried the following in a clean Ubuntu container:

  1. Installed apt-mirror
  2. Edited /etc/apt/mirror.list to only reference:
deb https://nvidia.github.io/libnvidia-container/experimental/deb/amd64 /
  3. When running apt-mirror I then saw:
Processing indexes: [Psh: 1: xz: not found
]
  4. I then installed xz-utils:
apt-get install -y xz-utils
  5. When I ran apt-mirror again, the repo was mirrored:
$ ls /var/spool/apt-mirror/mirror/
nvidia.github.io
  6. And in the folders themselves:
$ ls /var/spool/apt-mirror/mirror/nvidia.github.io/libnvidia-container/experimental/deb/amd64/
Packages                                           libnvidia-container-tools_1.15.0~rc.3-1_amd64.deb  nvidia-container-toolkit-base_1.14.0~rc.2-1_amd64.deb
Packages.xz                                        libnvidia-container1-dbg_1.14.0~rc.2-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.14.0~rc.2-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.1-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.1-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.2-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.3-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.2-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.3-1_amd64.deb   nvidia-container-toolkit_1.14.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.3-1_amd64.deb    libnvidia-container1_1.14.0~rc.2-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.1-1_amd64.deb
libnvidia-container-tools_1.14.0~rc.2-1_amd64.deb  libnvidia-container1_1.15.0~rc.1-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.2-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.1-1_amd64.deb  libnvidia-container1_1.15.0~rc.2-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.3-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.2-1_amd64.deb  libnvidia-container1_1.15.0~rc.3-1_amd64.deb

Could you confirm that xz-utils is installed on your system?
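
One way to catch this class of problem before a mirror run is a small preflight check that the decompression tool is on the PATH. This is only a sketch: `require_cmds` is a hypothetical helper name, and the only dependency it is meant to cover is the `xz` command reported missing in the output above.

```shell
# Return 0 if every named command is on PATH.
# Otherwise report each missing command on stderr and return 1.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only start the mirror run when xz is available.
# require_cmds xz && apt-mirror
```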

HenriWahl commented on August 11, 2024

Hi @elezar - thanks for your investigations!

I can confirm that my apt-mirror image did NOT have the package xz-utils installed but now it works WITH it!

Great job! 👍
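
For anyone hitting the same symptom, the following sketch distinguishes an index that was never downloaded from one that was downloaded but apparently not decompressed. It is a heuristic based only on what this thread reports (an empty Packages file while xz was missing); `check_index` and its three output labels are hypothetical names.

```shell
# Classify the state of a mirrored flat-repository directory ($1):
#   ok                 Packages exists and is non-empty
#   decompress-failed  Packages is empty or missing, but Packages.xz has content
#   no-index           neither file has content
check_index() {
  if [ -s "$1/Packages" ]; then
    echo ok
  elif [ -s "$1/Packages.xz" ]; then
    echo decompress-failed
  else
    echo no-index
  fi
}

# Usage (path as in the listing earlier in the thread):
# check_index /var/spool/apt-mirror/mirror/nvidia.github.io/libnvidia-container/experimental/deb/amd64
```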

HenriWahl commented on August 11, 2024

@elezar one thing is left: now the apt command on a client cries that there is no Release file.

I see it is even missing at https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64.

elezar commented on August 11, 2024

From the following documentation: https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format it is unclear whether a Release file is actually required. It seems that either InRelease or Release must be specified.

Can you give more information on what apt commands you're using and what the errors are?

HenriWahl commented on August 11, 2024

After an apt update I get this:

Ign:5 https://mirror-apt.local/nvidia-container-toolkit-jammy  InRelease
Ign:6 https://mirror-apt.local/nvidia-cuda-jammy  InRelease
Err:7 https://mirror-apt.local/nvidia-container-toolkit-jammy  Release
  404  Not Found [IP: 10.10.10.10 443]
Hit:8 https://mirror-apt.local/nvidia-cuda-jammy  Release
Reading package lists... Done
E: The repository 'https://mirror-apt.local/nvidia-container-toolkit-jammy  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.

InRelease and Release are both tried. Meanwhile I found that neither of them exists in my local mirror, just as in your listing above.
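
A quick way to confirm which of the two files a mirrored directory actually contains is sketched below. The helper name `release_files` is hypothetical; it only checks for the file names apt is seen requesting in the output above.

```shell
# Print the release metadata files present in a flat repository directory ($1).
# Exit status is 0 if InRelease or Release exists, 1 if neither does.
release_files() {
  found=1
  for f in InRelease Release; do
    if [ -f "$1/$f" ]; then
      echo "$f"
      found=0
    fi
  done
  return "$found"
}

# Usage: release_files /path/to/mirrored/repo || echo "no release metadata" >&2
```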

elezar commented on August 11, 2024

Does:

sudo apt-get update --allow-insecure-repositories

work as expected?

HenriWahl commented on August 11, 2024

Yes it does.

The problem seems to be caused by apt-mirror, according to apt-mirror/apt-mirror#156. It seems to miss this file on flat repositories. I will look for it or an alternative next week. Thanks for your commitment!

elezar commented on August 11, 2024

I think you can get around this by marking the local mirror as trusted or ensuring that the public key for our repos is also downloaded. For example, as per our documentation https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Note that the lines effectively look like:

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /

in this case, and you would need to set up something similar for your mirrors.
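
As a sketch of the two options mentioned above, the helper below prints a flat-repository source line for a mirror, using `signed-by` when a keyring path is given and `trusted=yes` otherwise. The function name `mirror_source_line` is hypothetical, and the mirror URL in the usage line is the one from the apt output earlier in the thread. Whether `signed-by` is enough depends on the mirror actually carrying signed release metadata, which this thread leaves open; `trusted=yes` disables signature verification for that source.

```shell
# Print an apt source line for a flat repository.
#   $1  mirror URL
#   $2  optional path to a keyring file
# With a keyring the line uses signed-by; without one it falls back to
# trusted=yes, which makes apt skip signature checks for this source.
mirror_source_line() {
  if [ -n "${2:-}" ]; then
    printf 'deb [signed-by=%s] %s /\n' "$2" "$1"
  else
    printf 'deb [trusted=yes] %s /\n' "$1"
  fi
}

# Usage:
# mirror_source_line https://mirror-apt.local/nvidia-container-toolkit-jammy \
#   /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```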
