JADE HPC Facility Documentation

This is the source for the JADE HPC facility user guide. It is written in reStructuredText (rst). For a guide to the rst file format, see this document.

How to Contribute

To contribute to this documentation, first fork it on GitHub and clone it to your machine. See Fork a Repo for the GitHub documentation on this process.

Once you have the git repository on your computer, you will need to install sphinx and sphinx_rtd_theme to build the documentation. The quickest way is shown here; see Building the documentation for the full steps.

pip install sphinx sphinx_rtd_theme

Once you have made your changes and updated your fork on GitHub, you will need to open a pull request. All changes to the repository should be made through pull requests, including those made by people with direct push access.

Building the documentation

  1. Install Python on your machine

  2. Install sphinx:

    pip install sphinx
    

    or

    conda install sphinx
    
  3. Install sphinx_rtd_theme:

    pip install sphinx_rtd_theme
    

    or

    conda install sphinx_rtd_theme
    
  4. To build the HTML documentation run:

    make html
    

    Or if you don't have the make utility installed on your machine then build with sphinx directly:

    sphinx-build . ./html
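
    The choice in step 4 can be scripted. The following is a minimal sketch, not part of this repository: choose_build_cmd is a hypothetical helper name, and it only prints the command that would be run, assuming you are in the directory containing the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: pick the build command described in step 4.
# Prefer `make html`; fall back to calling sphinx-build directly
# when the make utility is not installed.
choose_build_cmd() {
    if command -v make >/dev/null 2>&1; then
        echo "make html"
    else
        echo "sphinx-build . ./html"
    fi
}

# Print the chosen command without running it.
choose_build_cmd
```

    To run the chosen command rather than print it, use eval "$(choose_build_cmd)".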
    

Continuous build and serve

The package sphinx-autobuild provides a watcher that automatically rebuilds the site as files are modified. To use it, install (in addition to the Sphinx packages) with the following:

pip install sphinx-autobuild

To start the autobuild process, run:

sphinx-autobuild . ./html

The application also serves up the site at port 8000 by default at http://localhost:8000.

Making Changes to the Documentation

The documentation consists of a series of reStructuredText files, which have the .rst extension. Sphinx automatically converts these files to HTML and combines them into the web version of the documentation. When editing the files, it is important to follow rst syntax.

If there are any errors in your changes, the build will fail and the documentation will not update. You can test your build locally by running make html. The easiest way to learn what files should look like is to read the rst files already in the repository.

Submitting Changes and Making Contributions

Contributions should be made by forking the documentation site repo (this repo) and submitting a pull request. Pull requests will be merged by an Admin after review.

docs's People

Contributors

andygittings, charliejhadley, dialpuri, felix-salfelder, jimmadge, kpu, mcduta, mondus, mozhgan-kch, talesa, twinkarma, willfurnass, wizofe

docs's Issues

CUDA_VISIBLE_DEVICES environment variable not passed in to Docker containers

The CUDA_VISIBLE_DEVICES variable is used to limit which GPUs a user is able to utilise, and is essential when multiple users share a single node.

When NOT using containers the environment variables are set:

srun --partition=big --gres=gpu:2 printenv | grep CUDA_VISIBLE_DEVICES

CUDA_VISIBLE_DEVICES=2,3

but when using a Docker container, e.g. Caffe, the same command does not show the environment variable:

srun --gres=gpu:2 --pty  /jmain01/apps/docker/caffe 17.04

printenv | grep CUDA_VISIBLE_DEVICES
# Nothing is shown

I've also tested using deviceQuery from the CUDA sample code, and it shows that 8 devices are detected instead of just the 2 that were requested.
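
To compare the two cases quickly, the number of indices in the variable can be counted. This is a minimal sketch only: count_visible_gpus is a hypothetical helper, not part of JADE, Slurm, or CUDA, and it assumes the value is a plain comma-separated list such as the 2,3 shown above.

```shell
#!/bin/sh
# Hypothetical helper: count the GPU indices listed in a
# CUDA_VISIBLE_DEVICES-style value (comma-separated list).
# Prints 0 when the value is empty or unset.
count_visible_gpus() {
    value="$1"
    if [ -z "$value" ]; then
        echo 0
        return
    fi
    # Split on commas and count the resulting fields.
    echo "$value" | awk -F',' '{ print NF }'
}

# Report the count for the current environment.
count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}"
```

Run under srun outside a container, this should report the number of GPUs requested. Inside the container it would report 0 if the variable is missing, as described above.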

Use of HTTP proxy documentation appears wrong

software/git.rst says you need to use a proxy to use git on JADE but this doesn't appear to be true, and certainly the http_proxy environment variables don't appear to be set on the login nodes. Might also be useful to provide links to the git documentation on creating personal access tokens:

https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls
https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token

`containers` command docs

On the http://docs.jade.ac.uk/en/latest/jade/containers.html#containers page, it says to run

root@dgj223:~# containers

I'm not sure where the # comes from. I think it might be more readable to just say

~ containers or containers.

On another note, the containers command doesn't seem to work in an interactive session (e.g. after connecting to a worker node with srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --gres=gpu:1 --partition=devel --pty bash -i). Why is that?

Also, the list of containers seems to be out of date.

Documentation parts to add

  • How much storage does a normal JADE account have?
  • Where/how to request more storage
  • Where your storage is mounted
  • Other types of storage that can be used, e.g. /tmp, /scratch?
  • How to use the DGX-1's SSD as temporary/fast storage

File system quotas

Hi,

Are there space usage limits on JADE? It'd be nice to have this documented somewhere. Using lfs quota -u $USER /jmain01/data/path/to/my/data yields a quota of 0k so I assume I have 'unlimited' storage space

Outdated instructions for obtaining a ServiceNow account

In jade/getting-account.rst, it is currently stated that the address to go to is https://stfc.service-now.com/hartreecentre?id=index, and to use "reset password" to be able to request a new password. However, after having spoken to Rebecca Mason from the Hartree Centre Support, the actual address to go to now is https://stfc.service-now.com/hcssp, and a new account can be made by simply registering the normal way (via https://stfc.service-now.com/hcssp?id=csm_registration - I was assigned a User ID and password the same day).

Updating this might save some future new users some confusion - thank you :)

Portforward to compute node

How do I forward a port from the compute node running srun to the login node, and on to my local computer, so that I can use a debugger?

I tried ssh user@computenode but get this error message:

ssh_exchange_identification: read: Connection reset by peer

Missing documentation items

  • Access
    • Generation of SSH keys
  • File storage
    • Local ssd /raid/local_scratch/[your username]
    • DDN NVMe storage, how to get access

Query about accelerating I/O

Hi!
I am pretraining transformers. However, every time I load the model from a checkpoint, it spends more than 16 hours skipping the pretraining steps already completed (in other words, on pure I/O). I store the data file in '/jmain02/home/J2ADxxx/[project]/[user]/XXX/XXX'. Is there any other path on JADE2 where I could store data to accelerate I/O? If so, is there a space quota?
