lsstdesc / BlendingToolKit
Tools to create blend catalogs, produce training samples and implement blending metrics
Home Page: https://lsstdesc.org/BlendingToolKit/index.html
License: MIT License
Once #87 is merged, could someone help me double-check the survey parameters in survey.py? I marked the fields I'm especially concerned about with a #TODO. Thanks!
The generator creates blend lists according to a given strategy. Proposed strategies include simple
Scarlet has recently updated their ExtendedSource so that symmetric and monotonic are no longer keyword arguments. I removed them for now so that this code can run, but I'm not sure how to add them back.
At present the script takes around 1 second to generate a 120x120 6-band image of a galaxy blend, along with its isolated objects (6 bands each) and a PSF image. This isn't sufficient for running large analyses. Possible solutions include multithreading, parallelizing some of the code, and caching images.
While sprinting on BTK, I have spotted various places in the code that could be refactored (galsim, ...). One of them is btk_input.py, which is currently outdated. This script should be incorporated into the module and made available to the user upon install as a command-line script through an entry_point in setup.py.
Extension of #4 to implement a friends-of-friends algorithm for creating blend_list in create_blend_generator. Currently, realistic blends can only be created by identifying groups in a pre-run WLD catalog, as shown in notebooks/custom_sampling_function.ipynb. A fast on-the-fly identification of neighbors in a group would eliminate the need to run the WLD catalog beforehand.
The values for the mean sky level and the exposure time for the Roman observatory are placeholders for now; we should investigate at some point to find the correct values.
We discussed using a GalSim WCS instead of Astropy. Here is a simple function I have (from this code) to create a CelestialWCS with GalSim (no optical distortion here):
import galsim

def get_wcs(world_position=(0, 0), pixel_scale=0.187, img_shape=(51, 51)):
    """Give a basic WCS as a galsim object.

    Parameters
    ----------
    world_position: tuple
        World position at the center of the postage stamp (ra, dec) in degrees.
    pixel_scale: float
        Pixel scale of the image in arcsec/pixel.
    img_shape: tuple
        Final size of the postage stamp (nx, ny).

    Returns
    -------
    wcs: galsim.fitswcs.GSFitsWCS
        WCS for the postage stamp.
    """
    # Pixel origin at the stamp center, world origin at the requested sky position.
    tot_origin = galsim.PositionD(img_shape[0] / 2, img_shape[1] / 2)
    tot_world_origin = galsim.CelestialCoord(world_position[0] * galsim.degrees,
                                             world_position[1] * galsim.degrees)
    # Local affine transform (pure pixel scale), projected onto the sky with TAN.
    affine = galsim.AffineTransform(pixel_scale, 0, 0, pixel_scale, origin=tot_origin)
    wcs = galsim.TanWCS(affine, world_origin=tot_world_origin)
    return wcs
The reason why it is done this way, and not using PixelScale or ShearWCS, is to have a celestial projection, which might be preferable in some cases. This should give you the equivalent of what you get with astropy.
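To make the mapping concrete, here is a dependency-free sketch of the pixel-to-sky mapping that this WCS encodes near the stamp center. It is only a first-order, flat-sky approximation of the tangent-plane projection that galsim.TanWCS performs exactly; the function name and defaults are illustrative.

```python
import math

def pixel_to_world_approx(px, py, world_position=(0.0, 0.0),
                          pixel_scale=0.187, img_shape=(51, 51)):
    """Flat-sky approximation of the mapping encoded by get_wcs above.

    Returns (ra, dec) in degrees for pixel (px, py). The dec offset is a
    plain scaled offset; the ra offset is divided by cos(dec0) to account
    for the convergence of meridians away from the equator.
    """
    ra0, dec0 = world_position
    # Pixel offsets from the stamp center, converted from arcsec to degrees.
    du = (px - img_shape[0] / 2) * pixel_scale / 3600.0
    dv = (py - img_shape[1] / 2) * pixel_scale / 3600.0
    dec = dec0 + dv
    ra = ra0 + du / math.cos(math.radians(dec0))
    return ra, dec
```

Near the stamp center this agrees with the exact TAN projection to well below a milliarcsecond, which is why the affine + TanWCS construction is a natural fit.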
We wish to implement the possibility of getting simulated images from several different surveys, possibly with different pixel scales, in order to allow joint analysis of the data.
At present the input parameters for blend generation are decided by a class in config.py and are not saved to disk. It would be useful to have the configuration options used for a particular BTK run live in a small human-readable text file (e.g., yaml) that can be tracked with git for reproducible results. Reproducing results obtained by the various BTK steps would be a real added value.
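As a sketch of what this could look like: the issue suggests yaml, but the same round-trip pattern works with any human-readable format. The example below uses the stdlib json module to stay dependency-free, and the field names are hypothetical, not BTK's actual configuration schema.

```python
import json

# Hypothetical run parameters; field names are illustrative, not BTK's API.
run_config = {
    "catalog": "OneDegSq.fits",
    "max_number": 2,
    "stamp_size": 24.0,  # arcsec
    "seed": 0,
}

def save_config(cfg, path):
    """Write the configuration as human-readable text that can be tracked in git."""
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2, sort_keys=True)

def load_config(path):
    """Read back the exact parameters used for a previous BTK run."""
    with open(path) as f:
        return json.load(f)
```

With PyYAML installed, yaml.safe_dump/yaml.safe_load would give the same round trip in yaml syntax.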
So far the introductory notebook makes use of numpy.random.seed(), but because many generators are involved, we should probably use something more scalable, like passing around a random generator such as numpy.random.RandomState().
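A minimal sketch of the proposed pattern, assuming numpy is available; the generator name is illustrative:

```python
import numpy as np

def blend_generator(catalog_size, batch_size, rng):
    """Yield batches of random catalog indices, drawing all randomness
    from the RandomState passed in rather than the global numpy seed."""
    while True:
        yield rng.choice(catalog_size, size=batch_size, replace=False)

# Two pipelines built from identically seeded RandomState objects stay in
# sync, without any call to np.random.seed() and without interfering with
# other generators.
gen_a = blend_generator(100, 4, np.random.RandomState(42))
gen_b = blend_generator(100, 4, np.random.RandomState(42))
```

Because each generator owns its RandomState, adding or removing one generator does not perturb the random streams of the others, which is the scalability problem with a single global seed.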
Images can be generated on the fly and cached using the tf.data Dataset API; this will be useful for feeding Deep Learning or other TensorFlow-based algorithms.
Here is an example of how to do that with GalSim:
https://gist.github.com/EiffL/f7d06252b90581d3b00d01ea466257e6
We would like to be able to provide such tools as part of the user interface.
At the moment the user interface seems somewhat heavy. A user has to define a BlendGenerator, an ObservingGenerator, and a DrawBlendGenerator, which all have pretty similar names and pipe into each other. And this is on top of the catalog, sampling_functions, Cutout, and obs_conditions. This makes a LOT of objects whose roles I am not completely clear about. At the very least, we should come up with better, much more explicit names for these. While I understand that these classes give us some flexibility and allow the user to quickly code custom classes, I am convinced that this can be simplified.
In an upcoming PR, I propose to rethink the name of these classes at the very least and to think of ways of simplifying the API.
At present create_blend_generator makes a random selection from the input catalog and makes a blend with the defined sampling function. While this is useful for generating blend scenes on the fly during the training stage, it is not suited for testing on a predefined test set, since every call returns a random selection and the same blends cannot be reproduced across multiple calls of create_blend_generator. It would be useful to have a function in utils.py that returns groups within a certain range of group ids from a pre-run WLD catalog. The generator would then exit upon returning all the entries.
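A minimal sketch of such a deterministic generator (the function name and the 'group_id' field are illustrative, not BTK's actual API):

```python
def sequential_blend_generator(table, group_ids):
    """Yield the rows of a pre-run catalog one group at a time, in order.

    `table` is any sequence of records with a 'group_id' field (iterating an
    astropy Table row by row works the same way). Unlike the random sampling
    in create_blend_generator, this generator is exhausted once every
    requested group has been returned, so repeated runs over the same
    group_ids see exactly the same blends.
    """
    for gid in group_ids:
        group = [row for row in table if row["group_id"] == gid]
        if group:
            yield group
```

Exhaustion (StopIteration) is the natural exit condition the issue asks for: downstream code simply iterates until the test set runs out.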
Right now, the whole "compute_metrics" part can be confusing to users; I believe it needs some working over to make it more consistent with the rest of the code and more understandable for new users.
Some issues: the MetricsParams class looks a bit like a "MetricsGenerator", yet the final product is the run function, which is not ultimately a generator.
The install of btk works fine, but then lmfit is required to run it, so adding these two options to the install instructions would help:
pip install lmfit
conda install -c conda-forge lmfit
Opening this issue to remind us that we would like to incorporate models from https://github.com/McWilliamsCenter/galsim_hub. Essentially, it uses the same information as the catalogs being added in PR #52, so it will be easier to add this change afterwards.
Quick note: these models use neural networks to produce the galaxies in batches, which is a bit different from the current structure in the drawing generator (producing one galaxy at a time using Galsim), so some refactoring might be required.
[In progress]
Several users have expressed interest in adding more realistic galaxy images to BTK; this requires changing the pipeline (perhaps) significantly.
Ideas:
- Use the scarlet image generation pipeline.
- Refactor blend_catalog and draw_blends.py in the associated WLD into a class, so the user can choose how images are generated (from WLD/galsim or through their provided postage stamps).
Datasets:
Add a script that computes Intersection over Union (IoU) for the segmentation map produced by an algorithm. Estimating the segmentation map should be done in measure, and the IoU computation in compute_metrics.
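For reference, the pixel-wise IoU itself is a short computation. This sketch assumes numpy and boolean segmentation maps; matching each predicted segment to the right true object is a separate step that would live in measure.

```python
import numpy as np

def segmentation_iou(seg_pred, seg_true):
    """Intersection over Union of two boolean segmentation maps.

    Maps are compared pixel-wise; an empty union returns 0.0 so the
    metric is defined even when both maps are blank.
    """
    pred = np.asarray(seg_pred, dtype=bool)
    true = np.asarray(seg_true, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, true).sum() / union
```

For multi-object blends, compute_metrics would call this once per (predicted, true) segment pair after matching.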
Given that there are certain steps (e.g. pre-commit) to ensure that a PR passes the CI checks, I think I will make a small contributing document as a readme file (called CONTRIBUTING.md).
This is for the (near?) future, but I think it would be beneficial to remove the Args that are used throughout the code (class Simulation_params), as @herjy suggested to me once. I can see two benefits:
We want functionality to add lensing shear to blends. We can make use of the GalSim implementation. Based on our discussion in our BTK telecon, we need to take care of the following things.
So far the Survey objects contain information about a single instrument. In the case of Euclid, for instance, there are two instruments with different spatial (and spectral, but that's beside the point) resolutions. We can handle this as a separate Survey or come up with a nice way of integrating Instruments into Surveys.
Caveats:
API developed during Hack Day at CMU:
- btk.get_input_catalog creates an astropy table containing properties of single galaxies (bulge+disk+AGN).
- btk.create_blend_generator creates a generator of lists of blends. Blends are lists of objects (position + properties) within a single postage stamp image.
- btk.create_obscond_generator creates a generator of observing conditions. This contains the PSF, exposure time, survey filters, and rotations of the field.
- btk.draw_blends generates the images using GalSim (blend + list of component images).
- btk.metrics implements a bunch of metrics to assess the performance of different algorithms.
Parts of these tasks are already implemented in the WeakLensingDeblending package and simply need some re-organization to provide a flexible training sample generator, fast enough to run on the fly.
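The chained-generator design above can be sketched with plain Python generators. All names, signatures, and the output layout below are stand-ins for illustration, not the actual implementations:

```python
import random

def create_blend_generator(catalog, max_number, rng):
    """Yield lists of catalog entries; each list is one blend."""
    while True:
        n = rng.randint(1, max_number)  # blend size, inclusive of max_number
        yield rng.sample(catalog, n)

def draw_blends(blend_gen):
    """Consume blends and yield one result dict per blend; a real
    implementation would render images with GalSim at this stage."""
    for blend in blend_gen:
        yield {"blend_list": blend, "n_sources": len(blend)}

# Hypothetical catalog of 20 single-galaxy entries.
catalog = [{"id": i, "flux": 10.0 * i} for i in range(20)]
pipeline = draw_blends(create_blend_generator(catalog, 3, random.Random(0)))
batch = next(pipeline)
```

Each stage only consumes the previous generator, so stages can be swapped independently, which is the flexibility the API sketch is after.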
np.random.seed seems like it can introduce system dependencies in tests. One idea is to have a catalog with only 1 or 2 entries, so the generator has to return the same image every time.
Following the discussion at the beginning of the sprint week, we think it could be interesting to generate DC2-like images. François mentioned that it might be easier to generate those from GalSim instead of going through the butler, since DC2 images are initially generated with this package. Here is what I think about the two options:
Generating images with GalSim might be easier, as we already have morphology and color parameters from the DC2 extragalactic catalogs. However, we might lack complexity, at least in the PSF models for example.
Going through the butler to generate stamps is not very difficult but requires a bit of time to match observed and extragalactic catalogs if we want to access "truth" properties of observed objects. Moreover, if we want images of noiseless isolated galaxies, we need to generate them using GalSim and information from the extragalactic catalog...
We probably need to brainstorm a bit more about this to go further.
I find that the unit tests are very redundant and hard to reproduce; in particular, some of them are breaking even though the images and the corresponding code are still the same. It's really hard to figure out why, though.
In general, I'm not sure if we should include scarlet/sep/lsst in the unit tests. I feel that the correctness of the images produced should be independent of the algorithms used for detection.
I think the testing suite needs an overhaul in terms of deciding whether each test should actually be included. Here is a list of things that I'm debating whether we should include in the testing suite at all:
- whether detection works with various algorithms like sep/scarlet/lsst
- group sampling functions from WeakLensingDeblending (these seem like an additional feature to define realistic groups, but maybe we should move away from depending on descwl and define our own groups, as suggested in #16)
- most of the config tests are broken; these need to be revived at the same time as #48 (they are more broken now than before because we now define our own Survey objects instead of a string)
- tests should not depend on the band order, which can be easily switched (the architecture in general should use dictionaries for bands?)
The doc for BTK is largely outdated and incomplete; it needs some working over.
Check whether a given survey is appropriate for a catalog (potentially can be added in DrawBlendsGenerator as a class attribute)
check certain mandatory column names are present in the given catalog and fail with a nice exception if they are not
Make all the sigma_X immutable containers rather than numpy arrays (in obs_conditions.py). Although then you can't multiply them, so you would need to convert them to np arrays later.
Maybe create some Filter object for each Survey, or something else that accounts for this functionality. @aboucaud and @herjy had some discussion:
why not also define a Filter or Bandpass the same way you define a Survey, with its associated psf_scale, mean_sky_level, exp_time and zero_point?
I'm rolling back on the Filter objects. We will have to make a distinction between the g band from HSC and the g band from Rubin, for instance, and that will lead to the creation of a lot of objects. Maybe there is a right way to do this, but I suggest putting it in a different PR then.
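One possible shape for such objects, sketched with frozen dataclasses; the field names follow the quantities quoted in the discussion above, and every value below is a placeholder, not a real survey parameter:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    """Per-band quantities, as suggested in the discussion above."""
    name: str
    psf_scale: float
    mean_sky_level: float
    exp_time: float
    zero_point: float

@dataclass(frozen=True)
class Survey:
    """A survey owns its filters, so HSC's g band and Rubin's g band
    remain distinct objects without any global registry of bands."""
    name: str
    pixel_scale: float
    filters: tuple  # tuple of Filter, keeping the Survey immutable

# Placeholder numbers for illustration only.
rubin = Survey("Rubin", 0.2, (Filter("g", 0.8, 400.0, 30.0, 28.0),))
```

Nesting Filter inside Survey sidesteps the object-explosion worry: filters are created once per survey definition rather than shared across surveys.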
Do we need CosmosCatalog at all, or all these Catalog objects? Related comments:
Sadly for galsim, yes: there are two catalogs that contain slightly different information. Though we were thinking of sort of dropping the galsim catalog. It is not actually necessary at this point, and we could use the default catalog to draw coordinates and fluxes as was done before. Galsim catalogs only matter for the draw_single part. But my opinion of Catalog is shifting, and I actually don't think that we need to build a CosmosCatalog here, at least not unless we want to draw HST images. If we want to generate Rubin images, we should rather use the WLD catalog. To generate Rubin images, you actually want a Rubin-compatible catalog with the magnitudes in the Rubin bands; that's what matters.
test_measure in test_input.py is broken. I suspect it is because the code changed and deprecated the original results with the current np.seed, but I'm not completely sure. Skipping it for now since all other tests pass; I will try to resolve it soon.
This issue is meant to be a place for brainstorming about various metrics, which will then be implemented in btk.metrics.
low-level metrics:
high-level metrics involving measurements:
During the last months, we have mainly focused on image generation, which is only one part of BTK (albeit an important one). I believe the next big pull request should be about the measurement part, which has not been updated for a while (but should still be functional).
I'm putting here the up-to-date flowchart of BTK to show how it works.
The user is supposed to create a subclass of MeasurementParams, with custom get_deblended_images and make_measurement methods (though make_measurement is basically unused in the default implementation), which should return respectively the deblended images and the measurements on the blended images. They give this to the MeasureGenerator, as well as a DrawBlendsGenerator, which when called generates the images and executes the two methods on them (and returns all of it).
For the metrics (actually measuring the performance of the algorithm), the user should create a subclass of MetricsParams, implementing several functions such as get_detections, get_segmentation, get_flux..., which are used to recover the results from the MeasureGenerator (which is given to the MetricsParams). The MetricsParams is then given to the run function, which uses a bunch of utility functions to evaluate the performance.
As you can see, it lacks a bit of symmetry. I think this part definitely needs some reworking to get something people can actually understand and use. Please put your ideas under this issue if you have any.
What I think: the generator structure seems appropriate to me; I am ok with giving the DrawBlendsGenerator to the MeasureGenerator. I believe there should also be a MetricsGenerator receiving the MeasureGenerator. The MeasurementParams seems a bit heavy to me; basically it's there to gather several functions made by the users to run on the generated data. I think this could be achieved using a namedtuple (to keep consistency with other parts of BTK), with several "special" attributes corresponding to specific measurements which will be used for the metrics part (e.g. deblended images, detections, segmentation, ...), where users would provide functions they made, and possibly an additional attribute if users want to run some other measurements on the data not covered by the metrics part. EDIT: I realized that this would not be a very good way to do it, as often the same algorithm gives both segmentation, deblended images, centers, or whatever. Instead, maybe the user should provide a unique function returning the results as a dictionary with specific keys?
In the same fashion, the MetricsParams is too heavy, and actually I do not think it really is useful (the fact that it takes in a MeasureGenerator is also very weird).
Anyway, there is some work to be done to sort out the metrics part as there are a lot of functions in it right now, and it is unclear to me right now which are useful and which are not.
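A tiny sketch of the single-function, dict-of-results convention proposed in the EDIT above; the function name, the keys, and the returned values are all hypothetical:

```python
def sep_like_measure(batch):
    """Hypothetical user-provided measure function: one callable returns
    every product its algorithm produces, under agreed-upon keys. A real
    implementation would run e.g. sep or scarlet on batch["blend_images"];
    a key mapped to None simply means 'not measured by this algorithm'."""
    return {
        "deblended_images": [],        # this algorithm does not deblend
        "detections": [(12.3, 45.6)],  # (x, y) pixel centroids, made up here
        "segmentation": None,          # nor does it segment
    }

results = sep_like_measure({"blend_images": None})
```

Because one call returns everything, algorithms that jointly produce segmentation, deblended images, and centers fit naturally, which is exactly the problem the namedtuple-of-functions design ran into.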
At the moment BTK has functionality to load a catalog and apply selection functions that restrict the sample of galaxies used to generate patches.
My question is: do we need BTK to support this functionality? While adapting a new format for drawing images (from the cosmos galsim images), I realised that the Catalog handling and the selection functions were rather the responsibility of the user, and I do not see the point of BTK supporting this. BTK could very happily work if it were provided a catalog that is already filtered by any means the user deems necessary.
Instead, here we go through the trouble of building an interface for applying selection functions to catalogs, which requires some uniformisation and probably more steps for a user who wants to use a specific custom catalog than if these functionalities weren't there. Plus, it imposes that we maintain a flexible interface. That is not a problem in itself, but it should be well motivated.
This matter is open for debate, and I'm happy to be shown what I might be missing.
pipenv somehow broke recently in one of the PRs (#105). I will try another way of caching next (poetry has worked well for another project of mine).
Comment from Alexandre:
the way to programmatically run BTK without going into too much detail is through btk_input.py, which is currently outdated. This script should be incorporated into the module and made available to the user upon install as a command-line script through an entry_point in setup.py
As discussed with @herjy and @thuiop we will start working towards adding multi-resolution capabilities in BTK, a first step is to enable WCS compatibility in single resolution images.
A small change to get us started is to add WCS information, using the code from @herjy, to the output dictionary of the draw_blend_generator.
Here are the steps from what I can see:
1. Add code to create wcs information in create_observing_generator.py. You can use the galsim image in this line to get the true_center, which seems to be required to run mk_wcs (see line).
2. Store the wcs information for each galaxy in the blend in a list that you return in run_single_band and run_mini_batch, and then as part of the dictionary of generate (all these functions are in draw_blends.py).
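The steps above could produce output shaped roughly like this; the function names follow the issue, but the signatures and data are stand-ins, not BTK's actual code:

```python
def run_single_band(blend, wcs):
    """Sketch: return the per-band image together with the WCS so it can
    propagate upward to the output dictionary."""
    image = [[0.0]]  # stand-in for the rendered galsim image
    return {"image": image, "wcs": wcs}

def generate(blends, wcs):
    """Sketch of the top-level generator: each yielded batch now carries
    the WCS alongside the images."""
    for blend in blends:
        band_results = [run_single_band(blend, wcs)]
        yield {
            "blend_images": [r["image"] for r in band_results],
            "wcs": wcs,  # one WCS per stamp; a per-galaxy list also works
        }

batch = next(generate([["gal_0"]], wcs="TAN-WCS-placeholder"))
```

The point is only the plumbing: the WCS created in step 1 rides along through run_single_band into the dictionary that generate yields.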
@thuiop can you open a PR to address this github issue? It would be great if @herjy can review this PR alongside me.
@thuiop feel free to add more comments to this github issue if you have any questions or encounter any problems along the way.
We want to add functionality to import realistic PSFs. We should first brainstorm how to implement this. I think we can start with the following points:
- Which GalSim PSF to use, since GalSim also has realistic PSFs such as the Kolmogorov profile and the Zernike aberration model.
- There is a Cutout subclass for WLD and one for COSMOS real images. Should we separately implement the realistic PSF capability for these subclasses, or can we share it among subclasses? I am not familiar with WLD, but it also uses galsim, so it might be possible to go with the latter.
Following PR #52, the shifts and fluxes of galaxies generated with cosmos are not handled. This requires writing the proper code for draw_single and checking how this interfaces with draw_blend. To do imminently.
We should change the branch name to the new more standard convention, here are some instructions: https://github.com/github/renaming
create_observing_generator creates a generator of characteristics of a given observation for a given survey. The generator outputs exposure time, PSF (fixed for now?), filters, and rotation of the stamp. It could be adapted from WeakLensingDeblending.
This function creates an astropy table containing information that is useful to generate postage stamp images with appropriate distributions of shapes, colors, fluxes, etc. Typical input catalogs will be CatSim generated catalogs or the DC2 catalog, in which case this will use GCR.
Following #87, Scarlet and the LSST Stack will no longer be dependencies. Instead, we will create notebooks/documentation that specifically explain how to use these tools with BTK.
Make a function, btk.measure, that takes a draw_blends generator as input, draws blended images, and then runs a measurement algorithm on them to output results. These results would then be analyzed using btk.metrics to gauge the performance of the algorithm. The measurement algorithm will be a user-input function.
For reference, a function to perform measurement with the LSST science pipeline can be found in utils.py.
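A minimal sketch of how such a btk.measure could wrap the generator and a user function; the names and the dict layout are assumptions, not BTK's actual API:

```python
def measure(draw_blends_generator, measure_function, n_batches):
    """Pull batches from the drawing generator and apply the user-supplied
    measurement algorithm to each, yielding both so btk.metrics can later
    compare measurements against the truth stored in the batch."""
    for _ in range(n_batches):
        batch = next(draw_blends_generator)
        yield {"batch": batch, "measurement": measure_function(batch)}

def count_sources(batch):
    """Trivial stand-in user algorithm: report how many sources were drawn."""
    return len(batch["blend_list"])

# Stand-in for a draw_blends generator: two pre-made batches.
draws = iter([{"blend_list": ["a", "b"]}, {"blend_list": ["c"]}])
results = list(measure(draws, count_sources, n_batches=2))
```

Keeping the algorithm as a plain callable means any detector (sep, scarlet, the LSST stack) can be plugged in without subclassing.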
As of now, the available surveys are taken from the all_surveys variable in obs_conditions.py; users cannot give a new survey as an input, even if they use custom obs conditions and a custom blend generator.
In the most recent PR I had to disable the following commands for the most recent version of the Stack to work (see line):
config1.plugins.names.add('ext_shapeHSM_HsmShapeRegauss')
config1.plugins.names.add('ext_shapeHSM_HsmSourceMoments')
config1.plugins.names.add('ext_shapeHSM_HsmPsfMoments')
What should these lines be replaced with?