brutishguy / pyfriends

Python implementation of the paper "PyFriends: The First Fully Generalized Friends-of-Friends Extragalactic Galaxy Group Finder", using a Friends-of-Friends (FoF) algorithm for galaxy group detection, augmented by graph theory approaches.


pyfriends's Introduction

pyfriends

A detailed description of the algorithm can be found in the paper named above, which is available on arXiv.

Installation

Clone the repository with Git (on Windows, Git Bash for Windows provides a suitable Git client):

git clone https://github.com/BrutishGuy/pyfriends.git

Data

Example data has been included in the ./data/ folder of this repository. It follows from Macri et al.

Execution

To execute the code, first edit the config.txt file to set the parameters for the run. Reasonable defaults are already set.

A detailed explanation of these parameters will follow.

To run the code, execute the file Py2Friends.py from the command line or your favourite editor. Make sure your working directory is the repository directory, so that config.txt is in your working directory. Then run

python ./src/Py2Friends.py

For any issues or feature requests, please log an issue on this GitHub repository.


pyfriends's Issues

"here" link in readme is not linked to anything

For Windows users, the "click here" link doesn't do anything.

Update config.txt to include v0i and v0f.

This includes making sure they are read in properly in Funcs.py as well as in the main program.

Vanilla FoF

Need to add a vanilla FoF mode so that users who don't want to trust the graph theory can use the traditional HG82 version of the FoF algorithm. This mode would drop two features:

no updating of positions by averaging
no multiple runs (not needed, since removing the position averaging ensures all runs are identical)
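
As an illustration of the idea, here is a minimal sketch of a single-pass friends-of-friends search with no position averaging. It is not the repository's implementation: the function name `vanilla_fof`, the Cartesian input, and the single Euclidean `linking_length` are all assumptions made for brevity, whereas HG82 links galaxies using a projected separation and a line-of-sight velocity difference.

```python
from collections import deque
from math import dist


def vanilla_fof(positions, linking_length):
    """Group points with a single-pass friends-of-friends search.

    positions: list of coordinate tuples (hypothetical Cartesian input).
    Returns a list of groups, each a sorted list of point indices.
    Positions are never averaged or updated, so repeated runs are identical.
    """
    unvisited = set(range(len(positions)))
    groups = []
    while unvisited:
        seed = min(unvisited)            # deterministic starting point
        unvisited.remove(seed)
        group, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # Friends are all unvisited points within the linking length.
            friends = [j for j in unvisited
                       if dist(positions[current], positions[j]) <= linking_length]
            for j in friends:
                unvisited.remove(j)
                group.append(j)
                queue.append(j)
        groups.append(sorted(group))
    return groups
```

Because nothing in the search is random and positions are fixed, calling it twice on the same input returns the same groups, which is why multiple runs would be unnecessary in this mode.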

Parallelization of FoF function

Really we only need to parallelize the "FoF" function over the number of runs taking place. Maybe this can even be a user option? Multiprocessing = True or something like that.

I don't know what the maximum number of parallel processes is, but we should just use that.

This really only affects users who want to run the algorithm many times and average over the runs using graph theory, which may not be everyone, but the Vanilla FoF algorithm should be included for people who want to use just that.

So let's parallelize the FoF function in Py2Friends.py. That should bring 100 runs down to 10 minutes or so, which it is now in any case. Plus, the Graph Theory step is already pretty fast after being vectorized properly and should be able to average over all the runs in about 30 - 90 seconds.
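
One way this could look is sketched below, using the standard library's `multiprocessing.Pool` to spread independent runs over worker processes. The function `run_fof` is a hypothetical stand-in that returns a fake number, not the real FoF function, and the `processes` argument is an assumed name for the user option discussed above.

```python
import random
from multiprocessing import Pool


def run_fof(seed):
    """Hypothetical stand-in for one FoF run; returns a fake group count."""
    rng = random.Random(seed)            # per-run seed keeps runs reproducible
    return rng.randint(100, 200)


def run_many(n_runs, processes=None):
    """Run n_runs independent trials, in parallel unless processes == 1."""
    seeds = range(n_runs)
    if processes == 1:
        # Serial fallback for users who switch multiprocessing off.
        return [run_fof(seed) for seed in seeds]
    with Pool(processes=processes) as pool:
        # map preserves input order, so results line up with seeds.
        return pool.map(run_fof, seeds)
```

With `processes=None`, `Pool` starts one worker per CPU reported by `os.cpu_count()`, which covers the "use the maximum" case. When used from a script, the call to `run_many` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.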

Syntax error Py2Friends.py

There are two instances of <> in the Py2Friends.py program that cause a syntax error immediately under Python 3.

<> is the Python 2 "not equal" operator, so the fixes are most likely
if len(checklimit)<>0: should be if len(checklimit)!=0:
while list(friends_after) <> list(friends_before) and iterations<20: should be while list(friends_after) != list(friends_before) and iterations<20:.
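
For reference, `<>` was Python 2's spelling of "not equal" and was removed in Python 3, while `!=` behaves the same way in both versions. The sketch below shows how the two lines read with `!=`. The values assigned to the variables are hypothetical, and the loop body is a stand-in rather than the program's real logic.

```python
# Hypothetical values standing in for the variables named in the issue.
checklimit = [3, 7]
friends_before = [1, 2]
friends_after = [1, 2, 5]
iterations = 0
has_limit = False

if len(checklimit) != 0:                 # Python 2: len(checklimit) <> 0
    has_limit = True

# Python 2: list(friends_after) <> list(friends_before)
while list(friends_after) != list(friends_before) and iterations < 20:
    friends_before = list(friends_after)     # stand-in for one more iteration
    iterations += 1
```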

Include Flagging and Correcting Faux Connections

Need to flag problematic cases where two large groups are found as a single group because of a small number of connections between the two.

We have done this previously by post-processing the output files, but this should be part of the main program so that it is independent of the output.
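
A minimal sketch of the simplest case, where exactly one connection joins two large groups, is given below. It treats a group as an undirected graph and flags every edge whose removal splits the group into two parts that are both at least `min_size` members large. The function names, the edge-list input, and the `min_size` threshold are assumptions, and handling "a small number" of connections greater than one would need a more general minimum-cut approach.

```python
from collections import deque


def reachable(adjacency, start, skip_edge):
    """Return the set of nodes reachable from start without using skip_edge."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if {node, neighbour} == set(skip_edge) or neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append(neighbour)
    return seen


def flag_faux_connections(edges, min_size):
    """Flag edges that are the only link between two large sub-groups."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    flagged = []
    for a, b in edges:
        side_a = reachable(adjacency, a, (a, b))
        if b in side_a:
            continue                     # another path exists, not a lone link
        side_b = reachable(adjacency, b, (a, b))
        if len(side_a) >= min_size and len(side_b) >= min_size:
            flagged.append((a, b))
    return flagged
```

This brute-force version repeats a graph search for every edge, so it only suits small groups. It is meant to show the flagging criterion, not to be fast.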

Need Python3 upgrade

Need Documentation

We need documentation for users

The initial algorithm takes far too long overall.

So far, for 45 000 galaxies the algorithm takes about 10-15 minutes to run 10 times. Ideally at least 100 runs would be done, meaning about 100-150 minutes.

Need to speed up the overall run time of the actual FoF algorithm.

The obvious solution is to parallelize the different trials, but I think we can speed up the whole thing before we parallelize and then future-proof using parallelization.

For 45 000 galaxies, the time taken should be under 2 minutes for 100 runs. This would mean future large surveys with hundreds of thousands of galaxies would be handled well.

I don't want to use C :'(

Can we get some scripts <3 thanks :)

Param File Generator.

I think the best way to get the param file implemented in the package is to create a param file generator that writes a param file template, which the user can then update.

The param file can be generated blank, or alternatively we can include some default values, in particular for specific types of surveys, with Schechter parameters included.
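
A rough sketch of such a generator follows. The "key = value" file layout, the function name `write_param_template`, and the parameter names `input_file` and `n_runs` are assumptions for illustration. Only v0i and v0f are names that appear in the issues above, and no real default values are implied.

```python
from pathlib import Path

# Hypothetical parameter names; only v0i and v0f come from the issues above.
# Values are left blank so the user fills them in.
DEFAULT_PARAMS = {
    "input_file": "",
    "v0i": "",
    "v0f": "",
    "n_runs": "",
}


def write_param_template(path, overrides=None):
    """Write a 'key = value' template, blank unless overrides are given."""
    params = dict(DEFAULT_PARAMS)
    params.update(overrides or {})       # e.g. survey-specific defaults
    lines = ["# pyfriends parameter file (generated template)"]
    lines += [f"{key} = {value}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return params
```

Survey-specific presets, including Schechter parameters, could then be plain dictionaries passed in as `overrides`.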

Output files should be managed better than just being in the main script

The main body of the program, which runs the algorithm, is actually very short. Writing the numerous outputs takes up the most space by far. Perhaps we could use a class with .totxt() .tocsv() .toIAPUC() .toPartiView() etc. Any time someone requests a new output type, we should be able to just add it to the class.

For now those three should do well.
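
As a sketch of how such a class might be laid out, the example below implements only `.totxt()` and `.tocsv()`. The class name `GroupOutput` and the column names `id` and `n_members` are hypothetical, and the methods return strings rather than writing files so that the example stays small.

```python
import csv
import io


class GroupOutput:
    """Hypothetical container for group results, one method per format."""

    def __init__(self, groups):
        # groups: list of dicts with hypothetical keys "id" and "n_members".
        self.groups = groups
        self.columns = ["id", "n_members"]

    def totxt(self):
        """Return whitespace-separated text, one group per line."""
        rows = [" ".join(self.columns)]
        rows += [" ".join(str(g[c]) for c in self.columns) for g in self.groups]
        return "\n".join(rows) + "\n"

    def tocsv(self):
        """Return the same table as CSV text."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self.columns,
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.groups)
        return buffer.getvalue()
```

Adding a new output type then means adding one more method to the class, which keeps the main script free of format-specific code.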