
Comments (10)

joaofrancafisica avatar joaofrancafisica commented on June 24, 2024

Job submission script:

#!/bin/bash
##
## ntasks = number of cores
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=35
#
## Job name. Shown in the output of the 'squeue' command.
## Naming the job is recommended, but not required.
#SBATCH --job-name=J1018

echo -e "\n## Job started at $(date +'%d-%m-%Y %T') #####################\n"

## The input and output file names are based on the job name.
## Note that this naming scheme is not mandatory.

## Job information printed to the output file.
echo -e "\n## Active jobs for $USER: \n"
squeue -a -u $USER
echo -e "\n## Execution node for this job: $(hostname -s) \n"
echo -e "\n## Number of tasks for this job: $SLURM_NTASKS \n"

## Run the software
module load python
python pipeline.py

echo -e "\n## Job finished at $(date +'%d-%m-%Y %T') ###################"
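One thing worth checking with a script like this: the `number_of_cores` passed to the search should match what SLURM actually allocated via `--cpus-per-task`. A minimal sketch of reading the allocation from inside `pipeline.py`, assuming the standard `SLURM_CPUS_PER_TASK` environment variable (set by SLURM inside a job; the fallback to 1 is for running outside SLURM):

```python
import os

# Read the core count SLURM actually allocated for this task, so the
# search's number_of_cores stays in sync with the #SBATCH directives.
n_cores = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
print(n_cores)
```

This avoids oversubscribing the node (e.g. requesting 35 cores in Python while SLURM only granted fewer).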

from pyautolens.

Jammy2211 avatar Jammy2211 commented on June 24, 2024

Can you tell me what version of dynesty and autofit you are on?

pip show dynesty
pip show autofit


Jammy2211 avatar Jammy2211 commented on June 24, 2024

Also the Python version.

I honestly have no idea; parallelization is a nightmare.

For Emcee, did you follow the 'Multiprocessing' or 'MPI' example?


joaofrancafisica avatar joaofrancafisica commented on June 24, 2024

Thanks for your prompt answer!

Yeah, sure, here it is:

Python 3.9.12
Dynesty 1.0.1
AutoFit 2022.07.11.1

I tried the 'Multiprocessing' one. Do you think it could work using MPI?
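For reference, the 'Multiprocessing' pattern being discussed boils down to mapping the log-probability over all walker positions with a `multiprocessing.Pool` each step. A stdlib-only sketch of that pattern (the `log_likelihood` here is a toy stand-in, not the real lens-model likelihood):

```python
from multiprocessing import Pool

def log_likelihood(theta):
    # Toy stand-in for an expensive lens-model likelihood evaluation.
    return -0.5 * sum(t * t for t in theta)

if __name__ == "__main__":
    # One position per walker; a sampler's pool= interface maps the
    # log-probability over all walkers in parallel at each step.
    walkers = [[0.1 * i, -0.1 * i] for i in range(8)]
    with Pool(processes=4) as pool:
        logps = pool.map(log_likelihood, walkers)
    print(logps)
```

Note the speed-up is bounded by how expensive each likelihood call is relative to the inter-process communication, which is the crux of the rest of this thread.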


Jammy2211 avatar Jammy2211 commented on June 24, 2024

Dynesty supports multiprocessing; I was checking whether it only worked with MPI.

Can you check the behaviour of an autofit Emcee fit, by changing:

search_0 = af.DynestyStatic(path_prefix = './', # Prefix path of our results
                            name = 'source_parametric', # Name of the dataset
                            unique_tag = '0839_[1]', # Unique tag for this dataset
                            nlive = 250,
                            number_of_cores=35)

to:

search_0 = af.Emcee(path_prefix = './', # Prefix path of our results
                    name = 'source_parametric', # Name of the dataset
                    unique_tag = '0839_[1]', # Unique tag for this dataset
                    number_of_cores=35)

This will tell me whether it is a dynesty-specific issue or an autofit-specific issue.


joaofrancafisica avatar joaofrancafisica commented on June 24, 2024

It seems to work with Emcee although the gain in performance was not what I was expecting:

My laptop 1 core: ~1000 seconds
Cluster 1 core: ~1000 seconds
Cluster 35 cores: ~500 seconds

Also, during sampling, the CPU usage was only about 20%.


Jammy2211 avatar Jammy2211 commented on June 24, 2024

I will have a think.

It may be worth profiling 4 and 8 cores (for both Emcee and dynesty). With too many cores, inter-process communication can overwhelm the speed-up and actually slow things down.
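Profiling a few core counts can be as simple as timing the same fit end-to-end for each. A hypothetical sketch (`run_fit` is a placeholder for launching the search with a given `number_of_cores`, not a real autofit API):

```python
import time

def profile_cores(run_fit, core_counts=(1, 4, 8)):
    # Time the same fit at each core count; the sweet spot is wherever
    # wall time stops dropping (or starts rising again).
    timings = {}
    for n in core_counts:
        start = time.perf_counter()
        run_fit(n)
        timings[n] = time.perf_counter() - start
    return timings

# Dummy fit that just sleeps, to show the shape of the output:
print(profile_cores(lambda n: time.sleep(0.01)))
```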


joaofrancafisica avatar joaofrancafisica commented on June 24, 2024

I think you are right. I tried to run with n_cores=4 and I was able to see the processes. It wasn't as fast as my laptop (694/825 sec), but I will make sure I have the same package versions. Thank you so much!


Jammy2211 avatar Jammy2211 commented on June 24, 2024

I think parallelization is probably working ok then; it just is not giving a huge speed-up.

This is common for the runs we do, with a ~5x speed-up across 25 cores (and often diminishing returns with more cores). Unfortunately, packages like dynesty don't parallelize particularly efficiently.
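For intuition, a ~5x speed-up on 25 cores is consistent with Amdahl's law if roughly 83% of the runtime parallelizes. A quick back-of-envelope check (generic arithmetic, not a measurement of dynesty itself):

```python
def parallel_fraction(speedup, n_cores):
    # Invert Amdahl's law, S = 1 / ((1 - p) + p / N), for the parallel
    # fraction p given an observed speed-up S on N cores.
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)

p = parallel_fraction(5.0, 25)
print(round(p, 3))                # ~0.833: about 83% of the runtime parallelizes

# With p fixed, the speed-up ceiling as N grows is 1 / (1 - p).
print(round(1.0 / (1.0 - p), 1))  # ~6.0: the ceiling, no matter how many cores
```

This is why adding more cores past a certain point buys almost nothing: the serial ~17% dominates.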

Your laptop being faster could be because its hardware runs PyAutoLens faster natively than the supercomputer -- this is not all that uncommon.


joaofrancafisica avatar joaofrancafisica commented on June 24, 2024

Yeah. I noticed that the numba version on the cluster was a bit different from the one on my laptop, so I decided to export the environment. For low core counts, parallelization is working fine, at roughly the same speed. I will try a higher core count again, but it's unfortunate that it does not parallelize efficiently.

