
floresv299 avatar floresv299 commented on August 17, 2024 2

Aha,

I have fixed the error and successfully built the code. The problem was that I did not specify the target architecture. Adding the following fixed it (here, Volta is the NVIDIA GPU generation):

-DAMReX_CUDA_ARCH=Volta
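
For reference, the flag goes on the CMake configure line, roughly like this (a sketch; the exact set of -DWarpX_* options depends on your build):

cmake -S . -B build -DWarpX_COMPUTE=CUDA -DAMReX_CUDA_ARCH=Volta
cmake --build build -j 12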


floresv299 avatar floresv299 commented on August 17, 2024 2

Hi Axel,

It's going well, and I hope you are doing well too! Sure thing, here are the outputs for runs with one and two nodes.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:V100:4

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:V100:4

The simulations run successfully; however, the following error arises:
"Multiple GPUs are visible to each MPI rank, but the number of GPUs per socket or node has not been provided.
This may lead to incorrect or suboptimal rank-to-GPU mapping.!"

I have attached the files below. One quick note I've noticed: I'm using openPMD for analysis with h5 as the backend. I was getting an error when I specified the backend as h5; it turned out that when I was building, the build options showed HDF5: OFF. I tried to specify the path to HDF5 along with its libraries, but no luck. Then I changed the version from module load hdf5/1.14.0/gcc.11.2.0-openmpi.4.1.2 (what we had) to module load hdf5/1.13.1/gcc.11.2.0-openmpi.4.1.2. That did the trick, and I now get h5 files for all geometries.

Warpx_run_1_node_output.txt

Warpx_run_1_node_err.txt

Warpx_run_2_node_output.txt

Warpx_run_2_node_err.txt


ax3l avatar ax3l commented on August 17, 2024 2

Hi @floresv299,

Thanks a lot for testing! Perfect, I reflected both your HDF5 module findings and an additional set of hints to address the GPU visibility issue you found in #4021.

Can you please update the WarpX profile with the changes in there?
Please re-login after the changes, activate the WarpX profile, remove the build directory and recompile WarpX to test the changes. Let me know what you find! :)


floresv299 avatar floresv299 commented on August 17, 2024 2

Axel,

The code now ran on 2 nodes. No errors were detected! Here is the output. Thank you once again for all the help. A lab partner of mine was able to download and run the code with the current documentation.
8_MPI_pross_out.txt


floresv299 avatar floresv299 commented on August 17, 2024 1


floresv299 avatar floresv299 commented on August 17, 2024 1

Hi Axel,

Apologies, I did not notice that the documents were not posted. A full-flavored install sounds great, thank you!

HPC3_FBPIC_batch.pdf
HPC3_EPOCH_job.pdf
Warpx_on_HPC3_UCI.pdf


floresv299 avatar floresv299 commented on August 17, 2024 1

Thank you!

Yes, I see both of them: lapack/3.9.0, plus OpenBLAS/0.3.6, OpenBLAS/0.3.19, and OpenBLAS/0.3.21.

(screenshot of the module listing attached)


ax3l avatar ax3l commented on August 17, 2024 1

Ok, then let us try the latest OpenBLAS/0.3.21 to start with.

But let's build WarpX with the above module environment first and let me know if that works; then we'll do the advanced features.

cmake -S . -B build -DWarpX_COMPUTE=CUDA -DWarpX_QED_TABLE_GEN=ON -DWarpX_PSATD=ON -DWarpX_DIMS="1;2;3"
cmake --build build -j 12

cmake -S . -B build_rz -DWarpX_COMPUTE=CUDA -DWarpX_QED_TABLE_GEN=ON -DWarpX_PSATD=OFF -DWarpX_DIMS="RZ"
cmake --build build_rz -j 12
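
To capture a log you can post, the build step can be piped through tee, e.g. (build.log is just an illustrative file name):

cmake --build build -j 12 2>&1 | tee build.log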

@floresv299 does this work? Can you post the build logs when running this? :)


ax3l avatar ax3l commented on August 17, 2024 1

Hi @floresv299 , can you post the output of

scontrol show partition=gpu
scontrol show partition=free-gpu
scontrol show partition=gpu-debug

here? I am trying to document how many GPU nodes there are in HPC3 and the respective homepage section is just shy of that detail for users.
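
If the scontrol output is long, a compact per-partition summary would also do, using standard sinfo format flags (partition names assumed as above):

sinfo -p gpu,free-gpu,gpu-debug -o "%P %D %G"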


floresv299 avatar floresv299 commented on August 17, 2024 1

Hi @floresv299 , can you post the output of

scontrol show partition=gpu
scontrol show partition=free-gpu
scontrol show partition=gpu-debug

here? I am trying to document how many GPU nodes there are in HPC3 and the respective homepage section is just shy of that detail for users.

Sure thing, Axel. Screenshots of the gpu, free-gpu, and gpu-debug partition outputs are attached.


floresv299 avatar floresv299 commented on August 17, 2024 1

Hi @floresv299, can you please run

srun -p free-gpu --gres=gpu:V100:4 nvidia-smi

and post the output here? I cannot find how much memory each V100 GPU has and this will show us. What I found so far, there are four V100 GPUs per GPU node (nice!) :)

In terms of GPU memory, here is the output (screenshot of the nvidia-smi output attached).


floresv299 avatar floresv299 commented on August 17, 2024 1

Awesome, great news! I'll start a pull request (#4010) documenting the workflow and adding RZ geometry with PSATD field solver support, too.

Just checking, I think if you set

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# compiler environment hints
export CXX=$(which g++)
export CC=$(which gcc)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

and rm -rf build before rebuilding, then the variable AMREX_CUDA_ARCH should have the same effect for you. Can you confirm?

Axel, I can confirm that after adding the above lines in your warpx.profile, the build is successful with no errors.


ax3l avatar ax3l commented on August 17, 2024 1

Hi @floresv299, how is it going?

Can you post again the output of a one-node and a two-node example run with the latest instructions, to check all is ready to go? :)


floresv299 avatar floresv299 commented on August 17, 2024 1

@floresv299,

Thanks for testing! Can you try again compiling with cmake ... ... -DGPUS_PER_SOCKET=2 -DGPUS_PER_NODE=4 and see if that changes it?

Axel, I just ran another simulation with these flags. I can confirm that this works! No errors found, running with:
#SBATCH --nodes=1
#SBATCH -J WarpX
#SBATCH -p free-gpu
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:V100:4
#SBATCH --cpus-per-task=10


ax3l avatar ax3l commented on August 17, 2024

Hi @floresv299,

Thank you for reaching out! We are glad to help you get started.

The best way to run on institutional clusters is to use their modules for the base dependencies, e.g., compilers, MPI, and any other dependency we can find, and to add manually only what is missing.

In general, all (potential and optional) dependencies of WarpX are listed here:
https://warpx.readthedocs.io/en/latest/install/dependencies.html#install-dependencies

I would recommend starting with the following, which you should get via module load on your HPC system:

  • compiler
  • CMake
  • MPI

Let's then get into a few details of your cluster:

  • Is there public documentation for the machine you are referring to? Can you provide us with a link so we can guide you?
    • Is this a CPU or GPU machine?
  • Do you already run other codes on the machine?
    • Are there job scripts that you already use?
  • For WarpX, which features would you like to use?
    • GPU acceleration?
    • 1D/2D/3D and/or RZ geometry?
    • Advanced field solvers (PSATD)?

Based on this, we can best guide you through the next steps.

Just to share upfront: we document a lot of clusters around the world in our manual's HPC section. We can start from one that is close to yours and even add your workflow there as well if you like.
https://warpx.readthedocs.io/en/latest/install/hpc.html


ax3l avatar ax3l commented on August 17, 2024

Hi @floresv299, how is it going? :)


ax3l avatar ax3l commented on August 17, 2024

Hi @floresv299 ,

Thanks for the details! Your cluster looks like a Linux-based, Module-provided, Slurm-scheduled system, so we can build instructions similar to Lawrencium (LBNL), adjusting exact module names and versions as needed.

The job scripts did not attach, unfortunately. Can you post them in the GitHub issue?

As for features, that sounds like we will just do a full-flavored install, including pseudo-spectral field solvers and all geometries :)


ax3l avatar ax3l commented on August 17, 2024

there are some dependencies that I'm not sure what they do or how to get them

So, getting started on this, we first need to find the software we need in the cluster's module system.

Since I have no access to that specific system, can you save the output of module avail in a text file and post it here? That way, we can pick what we like and compile the extras we might need.


ax3l avatar ax3l commented on August 17, 2024

Awesome, thanks a lot for the details.

So, I think for a GPU build, module-wise we will need:

# required dependencies
module load cmake/3.22.1
module load gcc/11.2.0
module load cuda/11.7.1
module load openmpi/4.1.2/gcc.11.2.0

# optional: for QED support with detailed tables
module load boost/1.78.0/gcc.11.2.0

# optional: for openPMD and PSATD+RZ support
module load hdf5/1.14.0/gcc.11.2.0-openmpi.4.1.2

# optional: CCache
#module load ccache  # missing

# optional: for Python bindings
module load python/3.10.2

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# compiler environment hints
export CXX=$(which g++)
export CC=$(which gcc)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
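
Assuming you save the block above as, e.g., $HOME/hpc3_gpu_warpx.profile, you would activate it on each login with:

source $HOME/hpc3_gpu_warpx.profile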

This will already let you build all of WarpX, except for the RZ geometry with PSATD field solvers.

The only things we need to build ourselves (we can write a script; a rough sketch follows this list) are:

  • c-blosc (for compressed I/O)
  • ADIOS2 (for faster I/O than HDF5)
  • BLAS++ & LAPACK++ for RZ Geometry with PSATD field solver
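
A rough sketch of such a build script, assuming sources are unpacked under $HOME/src and installed to $HOME/sw/hpc3 (paths, versions, and any extra options are illustrative, not the final script):

# c-blosc (compressed I/O)
cmake -S $HOME/src/c-blosc -B $HOME/src/c-blosc-build -DCMAKE_INSTALL_PREFIX=$HOME/sw/hpc3/c-blosc
cmake --build $HOME/src/c-blosc-build --target install -j 8

# ADIOS2 (faster parallel I/O)
cmake -S $HOME/src/adios2 -B $HOME/src/adios2-build -DCMAKE_INSTALL_PREFIX=$HOME/sw/hpc3/adios2
cmake --build $HOME/src/adios2-build --target install -j 8

# BLAS++ and LAPACK++ (for RZ geometry with the PSATD solver)
cmake -S $HOME/src/blaspp -B $HOME/src/blaspp-build -DCMAKE_INSTALL_PREFIX=$HOME/sw/hpc3/blaspp
cmake --build $HOME/src/blaspp-build --target install -j 8
cmake -S $HOME/src/lapackpp -B $HOME/src/lapackpp-build -DCMAKE_INSTALL_PREFIX=$HOME/sw/hpc3/lapackpp
cmake --build $HOME/src/lapackpp-build --target install -j 8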

Do you see an "OpenBLAS" or "LAPACK" module somewhere in module av once you have loaded the modules listed above?


floresv299 avatar floresv299 commented on August 17, 2024

Dear Axel,

I've been trying to build WarpX for the past few days; apologies for the delay. I've run into the following warnings and errors and was hoping you would know how to go about them, thank you. I will attach the CMake error logs, but essentially both the RZ and 1/2/3D builds give the same error.

These are the warnings:

warning #811-D: const variable "info" requires an initializer -- class "amrex::LPInfo" has no user-provided default constructor
warning #20014-D: calling a __host__ function from a __host__ __device__ function is not allowed
warning #177-D: function "amrex::<unnamed>::IFRTag::box" was declared but never referenced
tmpxft_002e0402_00000000-6_AMReX.compute_86.cudafe1.cpp:(.text+0x2418): failed to convert GOTPCREL relocation against '_ZN5amrex6system13error_handlerE'; relink with --no-relax

Errors

for RZ

gmake[2]: *** [CMakeFiles/app_rz.dir/build.make:388: bin/warpx.rz.MPI.CUDA.DP.PDP.OPMD.QED] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1092: CMakeFiles/app_rz.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

for 1/2/3D

collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/app_3d.dir/build.make:404: bin/warpx.3d.MPI.CUDA.DP.PDP.OPMD.PSATD.QED] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1369: CMakeFiles/app_3d.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake: *** [Makefile:136: all] Error 2
building_error_cart.txt
Building_errors_RZ.txt
CMake_error_log_cart.txt
Cmake_Error_log_Rz.txt


ax3l avatar ax3l commented on August 17, 2024

Awesome, great news! I'll start a pull request (#4010) documenting the workflow and adding RZ geometry with PSATD field solver support, too.

Just checking, I think if you set

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# compiler environment hints
export CXX=$(which g++)
export CC=$(which gcc)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

and rm -rf build before rebuilding, then the variable AMREX_CUDA_ARCH should have the same effect for you. Can you confirm?
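
One optional way to double-check after configuring (assuming the architecture value lands in the CMake cache, which may vary by version):

grep -i cuda_arch build/CMakeCache.txt

It should list the 7.0 / 70 architecture if the variable was picked up.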


ax3l avatar ax3l commented on August 17, 2024

Hi @floresv299, can you please run

srun -p free-gpu --gres=gpu:V100:4 nvidia-smi

and post the output here? I cannot find how much memory each V100 GPU has and this will show us. What I found so far, there are four V100 GPUs per GPU node (nice!) :)
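
If pasting the full table is awkward, a compact query would also work (standard nvidia-smi query flags, srun options as above):

srun -p free-gpu --gres=gpu:V100:4 nvidia-smi --query-gpu=name,memory.total --format=csv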


ax3l avatar ax3l commented on August 17, 2024

Awesome, thanks a lot!


ax3l avatar ax3l commented on August 17, 2024

Splendid, @floresv299!

To complete #4010, can you please try whether the job script (hpc3_gpu.sbatch) I posted in there works? Can you post the output files output.txt, WarpX.o*, and WarpX.e*?


floresv299 avatar floresv299 commented on August 17, 2024

Axel,

I seem to have hit an error: OPAL ERROR: Unreachable in file pmix3x_client.c. I'm running on 1 GPU to begin with. Here is what I get from the WarpX.e file; the rest are blank.
batcherror.txt


floresv299 avatar floresv299 commented on August 17, 2024

I'm not sure what the problem was, but I managed to work around it. Looking through the HPC3 documentation for OpenMP jobs, srun is not used there. I have attached the sbatch script and the output. The WarpX.e and WarpX.o files are empty. The outputs include a folder of diagnostics and a file called warpx_used_inputs. All the data seems to be here.

warpx_batch.txt

Warpx_out.txt


floresv299 avatar floresv299 commented on August 17, 2024

Here are the sbatch script and output for multiple MPI processes.

Warpx_Sbatch_multi_MPI.txt

Warpx_out_8_MPI_processe.txt


ax3l avatar ax3l commented on August 17, 2024

Thanks for testing! 🙏

I am looking at these docs:

right now.

Good point: for this system, mpirun -np $SLURM_NTASKS seems to be the documented way for multi-GPU jobs, so I have now updated that. In your test above, I am a bit concerned that 8 MPI ranks were started but only 2 GPUs were used. The ratio should be 1:1. I think the issue was that you only requested #SBATCH --gres=gpu:V100:1 and not #SBATCH --gres=gpu:V100:4 per node, leaving 3 out of 4 GPUs unused on the two nodes requested (#SBATCH --nodes=2).

Please leave #SBATCH --ntasks=... unset - it will be automatically calculated, so that you can request your jobs in multiples of full nodes and only have to change a single line, #SBATCH --nodes=..., to scale the job size up and down :)

Can you try again to run first on one and then on two nodes, with the latest template that I pushed to #4010?
hpc3_gpu.sbatch.txt

Notes

In case you logged out of the system between compiling and submission, it is important to run the source $HOME/hpc3_gpu_warpx.profile line again, so all modules are correctly loaded for running.

This comment in the docs seems to suggest we actually might need to use some other flags, but let's try without first.

The comment here on GPU jobs does not apply to us:

#SBATCH --gres=gpu:V100:1         # specify 1 GPU of type V100

GPU number should be set to 1. Nearly 100% of applications on the cluster will use only 1 GPU. None of Perl, Python, R-based applications need multi-GPU. Very few applications can use multiple GPUs in P2P (peer-2-peer) mode. These applications need to be specially designed and compiled with very specific flags and options to be able to use multi-GPU acceleration. A few examples of applications that can do P2P are Amber and NAMD.

Like Amber and NAMD, WarpX does use all GPUs in parallel if we run 4 MPI ranks (or multiples thereof) on the system :)

If you want to run truly small simulations with WarpX that use only a single GPU for the whole job, set this:

#SBATCH --gres=gpu:V100:1
#SBATCH --ntasks-per-node=1

This will leave three quarters of the GPUs and three quarters of the CPU cores of a node unused.
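
Putting the pieces above together, a minimal single-GPU job script could look like this (a sketch assembled from the settings in this thread; the maintained template is hpc3_gpu.sbatch in #4010, and the executable path and input file name are illustrative):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH -J WarpX
#SBATCH -p free-gpu
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:V100:1
#SBATCH --cpus-per-task=10
#SBATCH -o WarpX.o%j
#SBATCH -e WarpX.e%j

# re-activate the module environment for the run
source $HOME/hpc3_gpu_warpx.profile

# one MPI rank per GPU; here a single rank on a single GPU
mpirun -np ${SLURM_NTASKS} ./warpx.3d.MPI.CUDA.DP.PDP.OPMD.PSATD.QED inputs_3d > output.txt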


ax3l avatar ax3l commented on August 17, 2024

I merged this first documentation workflow in with #4010, so it is rendered at https://warpx.readthedocs.io/en/latest/install/hpc/hpc3.html

We will do follow-up changes based on your tests and HPC support response for potential updates to the job script :)


floresv299 avatar floresv299 commented on August 17, 2024

Axel,

I made the changes to the WarpX profile, removed the build directory, and recompiled WarpX. Unfortunately, the problem seems to persist; the same error is shown: "Multiple GPUs are visible to each MPI rank, but the number of GPUs per socket or node has not been provided. This may lead to incorrect or suboptimal rank-to-GPU mapping.!"


ax3l avatar ax3l commented on August 17, 2024

@floresv299,

Thanks for testing! Can you try again compiling with cmake ... ... -DGPUS_PER_SOCKET=2 -DGPUS_PER_NODE=4 and see if that changes it?
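
Concretely, appended to the configure line from before, that might look like this (a sketch; keep whatever other options you are already using):

cmake -S . -B build -DWarpX_COMPUTE=CUDA -DWarpX_QED_TABLE_GEN=ON -DWarpX_PSATD=ON -DWarpX_DIMS="1;2;3" -DGPUS_PER_SOCKET=2 -DGPUS_PER_NODE=4
cmake --build build -j 12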


ax3l avatar ax3l commented on August 17, 2024

Hi @floresv299, I pushed another update to #4021 - can you try the new job script template and post the output? :)


ax3l avatar ax3l commented on August 17, 2024

This is awesome! Everything looks good to me!

One hint: I think the input file lacks a line diag1.format = openpmd and thus shows a warning about the backend line not being used.
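
For reference, the relevant input lines would then read something like this (the openpmd_backend key name follows the WarpX openPMD diagnostics options; please double-check against the docs):

diag1.format = openpmd
diag1.openpmd_backend = h5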

We are all set, glad that you tested with a colleague that the instructions work for them as well 🎉

If you like, please feel free to close this issue and open new ones if you have any questions, suggestions or want to share your results in the future. Happy computing!!


floresv299 avatar floresv299 commented on August 17, 2024

Thank you very much, Axel, you and your team are amazing! I really appreciate all the help.

