lsstdesc / BlendingToolKit
Tools to create blend catalogs, produce training samples and implement blending metrics
Home Page: https://lsstdesc.org/BlendingToolKit/index.html
License: MIT License
Once #87 is merged, could someone help me double-check the survey parameters in survey.py? I marked the fields I'm especially concerned about with a #TODO. Thanks!
The generator creates blend lists according to a given strategy. Proposed strategies include simple
Scarlet has recently updated their ExtendedSource so that symmetric and monotonic are no longer keyword arguments. I removed them for now so that this code can run, but I'm not sure how to add them back.
At present the script takes around 1 second to generate a 120x120 6-band image of a galaxy blend, along with its isolated objects (6 bands each) and a PSF image. This isn't sufficient for running large analyses. Possible solutions include multithreading, parallelizing some of the code, and caching images.
While sprinting on BTK, I have spotted various places in the code that could be refactored (galsim, ...). One of them is btk_input.py, which is currently outdated. This script should be incorporated into the module and made available to the user upon install as a command-line script through an entry_point in setup.py.
Extension of #4 to implement a friends-of-friends algorithm for creating blend_list in create_blend_generator. Currently, realistic blends can only be created by identifying groups in a pre-run WLD catalog, as shown in notebooks/custom_sampling_function.ipynb. A fast on-the-fly identification of neighbors in a group would eliminate the need to run the WLD catalog beforehand.
The values for the mean sky level and the exposure time for the Roman observatory are placeholders for now; we should investigate at some point to find the correct values.
We discussed using a GalSim WCS instead of Astropy. Here is a simple function I have (from this code) to create a CelestialWCS with GalSim (no optical distortion here):
import galsim

def get_wcs(world_position=(0, 0), pixel_scale=0.187, img_shape=(51, 51)):
    """Give a basic WCS as a galsim object.

    Parameters
    ----------
    world_position: tuple
        World position at the center of the postage stamp (ra, dec) in degrees.
    pixel_scale: float
        Pixel scale of the image in arcsec/pixel.
    img_shape: tuple
        Final size of the postage stamp (nx, ny).

    Returns
    -------
    wcs: galsim.fitswcs.GSFitsWCS
        WCS for the postage stamp.
    """
    # Pixel origin at the stamp center, world origin at the requested sky position.
    tot_origin = galsim.PositionD(img_shape[0] / 2, img_shape[1] / 2)
    tot_world_origin = galsim.CelestialCoord(world_position[0] * galsim.degrees,
                                             world_position[1] * galsim.degrees)
    # Local affine transform (pure pixel scale), projected onto the sky with TAN.
    affine = galsim.AffineTransform(pixel_scale, 0, 0, pixel_scale, origin=tot_origin)
    wcs = galsim.TanWCS(affine, world_origin=tot_world_origin)
    return wcs
The reason why it is done this way, and not using PixelScale or ShearWCS, is to have a celestial projection, which might be preferable in some cases. This should give you the equivalent of what you get with astropy.
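To make the mapping concrete, here is a dependency-free sketch of the pixel-to-sky mapping that this WCS encodes near the stamp center. It is only a first-order, flat-sky approximation of the tangent-plane projection that galsim.TanWCS performs exactly; the function name and defaults are illustrative.

```python
import math

def pixel_to_world_approx(px, py, world_position=(0.0, 0.0),
                          pixel_scale=0.187, img_shape=(51, 51)):
    """Flat-sky approximation of the mapping encoded by get_wcs above.

    Returns (ra, dec) in degrees for pixel (px, py). The dec offset is a
    plain scaled offset; the ra offset is divided by cos(dec0) to account
    for the convergence of meridians away from the equator.
    """
    ra0, dec0 = world_position
    # Pixel offsets from the stamp center, converted from arcsec to degrees.
    du = (px - img_shape[0] / 2) * pixel_scale / 3600.0
    dv = (py - img_shape[1] / 2) * pixel_scale / 3600.0
    dec = dec0 + dv
    ra = ra0 + du / math.cos(math.radians(dec0))
    return ra, dec
```

Near the stamp center this agrees with the exact TAN projection to well below a milliarcsecond, which is why the affine + TanWCS construction is a natural fit.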
We wish to implement the possibility of getting simulated images from several different surveys, possibly with different pixel scales, in order to allow joint analysis of the data.
At present the input parameters for blend generation are decided by a class in config.py and are not saved to disk. It would be useful to have the configuration options used for a particular BTK run live in a small human-readable text file (e.g., yaml) that can be tracked with git for reproducible results. Reproducing results obtained by the various BTK steps would be a real added value.
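As a sketch of what this could look like: the issue suggests yaml, but the same round-trip pattern works with any human-readable format. The example below uses the stdlib json module to stay dependency-free, and the field names are hypothetical, not BTK's actual configuration schema.

```python
import json

# Hypothetical run parameters; field names are illustrative, not BTK's API.
run_config = {
    "catalog": "OneDegSq.fits",
    "max_number": 2,
    "stamp_size": 24.0,  # arcsec
    "seed": 0,
}

def save_config(cfg, path):
    """Write the configuration as human-readable text that can be tracked in git."""
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2, sort_keys=True)

def load_config(path):
    """Read back the exact parameters used for a previous BTK run."""
    with open(path) as f:
        return json.load(f)
```

With PyYAML installed, yaml.safe_dump/yaml.safe_load would give the same round trip in yaml syntax.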
So far the introductory notebook makes use of numpy.random.seed(), but because many generators are involved, we should probably use something more scalable, like passing around a random generator such as numpy.random.RandomState().
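A minimal sketch of the proposed pattern, assuming numpy is available; the generator name is illustrative:

```python
import numpy as np

def blend_generator(catalog_size, batch_size, rng):
    """Yield batches of random catalog indices, drawing all randomness
    from the RandomState passed in rather than the global numpy seed."""
    while True:
        yield rng.choice(catalog_size, size=batch_size, replace=False)

# Two pipelines built from identically seeded RandomState objects stay in
# sync, without any call to np.random.seed() and without interfering with
# other generators.
gen_a = blend_generator(100, 4, np.random.RandomState(42))
gen_b = blend_generator(100, 4, np.random.RandomState(42))
```

Because each generator owns its RandomState, adding or removing one generator does not perturb the random streams of the others, which is the scalability problem with a single global seed.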
Images can be generated on the fly and cached using the tf.data Dataset API; this will be useful for feeding Deep Learning or other TensorFlow-based algorithms.
Here is an example of how to do that with GalSim:
https://gist.github.com/EiffL/f7d06252b90581d3b00d01ea466257e6
We would like to be able to provide such tools as part of the user interface.
At the moment the user interface seems somewhat heavy. A user has to define a BlendGenerator, an ObservingGenerator, and a DrawBlendGenerator, which all have pretty similar names and pipe into each other. And this is on top of the catalog, sampling_functions, Cutout, and obs_conditions. This makes a LOT of objects whose roles I am not completely clear about. At the very least, we should come up with better, much more explicit names for these. While I understand that these classes give us some flexibility and allow the user to quickly code custom classes, I am convinced that this can be simplified.
In an upcoming PR, I propose to rethink the name of these classes at the very least and to think of ways of simplifying the API.
At present create_blend_generator makes a random selection from the input catalog and makes a blend with the defined sampling function. While this is useful for generating blend scenes on the fly during the training stage, it is not suited for testing on a predefined test set, since every call returns a random selection and the same blends cannot be reproduced across multiple calls of create_blend_generator. It would be useful to have a function in utils.py that returns groups within a certain range of group ids from a pre-run WLD catalog. The generator would then exit upon returning all the entries.
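A minimal sketch of such a deterministic generator (the function name and the 'group_id' field are illustrative, not BTK's actual API):

```python
def sequential_blend_generator(table, group_ids):
    """Yield the rows of a pre-run catalog one group at a time, in order.

    `table` is any sequence of records with a 'group_id' field (iterating an
    astropy Table row by row works the same way). Unlike the random sampling
    in create_blend_generator, this generator is exhausted once every
    requested group has been returned, so repeated runs over the same
    group_ids see exactly the same blends.
    """
    for gid in group_ids:
        group = [row for row in table if row["group_id"] == gid]
        if group:
            yield group
```

Exhaustion (StopIteration) is the natural exit condition the issue asks for: downstream code simply iterates until the test set runs out.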
Right now, the whole "compute_metrics" part can be confusing to users; I believe it needs some working over to make it more consistent with the rest of the code and more understandable for new users.
Some issues: the MetricsParams class looks a bit like a "MetricsGenerator", yet the final product is the run function, which is not ultimately a generator.
The install of btk works fine, but then lmfit is required to run it, so adding these two options to the install instructions would help:
pip install lmfit
conda install -c conda-forge lmfit
Opening this issue to remind us that we would like to incorporate models from https://github.com/McWilliamsCenter/galsim_hub. Essentially, it uses the same information as the catalogs being added in PR #52, so it will be easier to add this change afterwards.
Quick note: these models use neural networks to produce the galaxies in batches, which is a bit different from the current structure in the drawing generator (producing one galaxy at a time using Galsim), so some refactoring might be required.
[In progress]
Several users have expressed interest in adding more realistic galaxy images to BTK; this requires changing the pipeline (perhaps) significantly.
Ideas:
- Use the scarlet image generation pipeline.
- Refactor blend_catalog and draw_blends.py in the associated WLD into a class, so the user can choose how images are generated (from WLD/galsim or through their provided postage stamps).
Datasets:
Add a script that computes Intersection over Union (IoU) for the segmentation map produced by an algorithm. Estimating the segmentation map should be done in measure, and the IoU computation in compute_metrics.
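For reference, the pixel-wise IoU itself is a short computation. This sketch assumes numpy and boolean segmentation maps; matching each predicted segment to the right true object is a separate step that would live in measure.

```python
import numpy as np

def segmentation_iou(seg_pred, seg_true):
    """Intersection over Union of two boolean segmentation maps.

    Maps are compared pixel-wise; an empty union returns 0.0 so the
    metric is defined even when both maps are blank.
    """
    pred = np.asarray(seg_pred, dtype=bool)
    true = np.asarray(seg_true, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, true).sum() / union
```

For multi-object blends, compute_metrics would call this once per (predicted, true) segment pair after matching.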
Given that there are certain steps (e.g. pre-commit) to ensure that a PR passes the CI checks, I think I will make a small contributing document as a readme file (called CONTRIBUTING.md).
This is for the (near?) future, but I think it would be beneficial to remove the Args that are used throughout the code (class Simulation_params), as @herjy suggested to me once. I can see two benefits:
We want functionality to add lensing shear to blends. We can make use of the GalSim implementation. Based on our discussion in our BTK telecon, we need to take care of the following things.
So far the Survey objects contain information about a single instrument. In the case of Euclid, for instance, there are two instruments with different spatial (and spectral, but that's beside the point) resolutions. We can handle this as a separate Survey or come up with a nice way of integrating Instruments into Surveys.
Caveats:
API developed during Hack Day at CMU:
- btk.get_input_catalog creates an astropy table containing properties of single galaxies (bulge+disk+AGN).
- btk.create_blend_generator creates a generator of lists of blends. Blends are lists of objects (position + properties) within a single postage stamp image.
- btk.create_obscond_generator creates a generator of observing conditions. This contains the PSF, exposure time, survey filters, and rotations of the field.
- btk.draw_blends generates the images using GalSim (blend + list of component images).
- btk.metrics implements a bunch of metrics to assess the performance of different algorithms.
Parts of these tasks are already implemented in the WeakLensingDeblending package and simply need some re-organization to provide a flexible training sample generator, fast enough to run on the fly.
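The chained-generator design above can be sketched with plain Python generators. All names, signatures, and the output layout below are stand-ins for illustration, not the actual implementations:

```python
import random

def create_blend_generator(catalog, max_number, rng):
    """Yield lists of catalog entries; each list is one blend."""
    while True:
        n = rng.randint(1, max_number)  # blend size, inclusive of max_number
        yield rng.sample(catalog, n)

def draw_blends(blend_gen):
    """Consume blends and yield one result dict per blend; a real
    implementation would render images with GalSim at this stage."""
    for blend in blend_gen:
        yield {"blend_list": blend, "n_sources": len(blend)}

# Hypothetical catalog of 20 single-galaxy entries.
catalog = [{"id": i, "flux": 10.0 * i} for i in range(20)]
pipeline = draw_blends(create_blend_generator(catalog, 3, random.Random(0)))
batch = next(pipeline)
```

Each stage only consumes the previous generator, so stages can be swapped independently, which is the flexibility the API sketch is after.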
np.random.seed seems like it can introduce system dependencies in tests. One idea is to have a catalog with only 1 or 2 entries, so the generator has to return the same image every time.
Following the discussion at the beginning of the sprint week, we think it could be interesting to generate DC2-like images. François mentioned that it might be easier to generate those from GalSim instead of going through the butler, since DC2 images are initially generated with this package. Here is what I think about the two options:
Generating images with GalSim might be easier, as we already have morphology and color parameters from the DC2 extragalactic catalogs. However, we might lack complexity, at least in the PSF models for example.
Going through the butler to generate stamps is not very difficult but requires a bit of time to match observed and extragalactic catalogs if we want to access "truth" properties of observed objects. Moreover, if we want images of noiseless isolated galaxies, we need to generate them using GalSim and information from the extragalactic catalog...
We probably need to brainstorm a bit more about this to go further.
I find that the unit tests are very redundant and hard to reproduce; in particular, some of them are breaking even though the images and the corresponding code are still the same. It's really hard to figure out why, though.
In general, I'm not sure if we should include scarlet/sep/lsst in the unit tests. I feel that the correctness of the images produced should be independent of the algorithms used for detection.
I think the testing suite needs an overhaul in terms of deciding whether each test should actually be included. Here is a list of things that I'm debating whether we should include in the testing suite at all:
- whether detection works with various algorithms like sep/scarlet/lsst
- group sampling functions from WeakLensingDeblending (these seem like an additional feature to define realistic groups, but maybe we should move away from depending on descwl and define our own groups, as suggested in #16)
- most of the config tests are broken; these need to be revived at the same time as #48 (they are more broken now than before because we now define our own Survey objects instead of a string)
- tests should not depend on the band order, which can be easily switched (the architecture in general should use dictionaries for bands?)
The doc for BTK is largely outdated and incomplete; it needs some working over.
Check whether a given survey is appropriate for a catalog (potentially can be added in DrawBlendsGenerator as a class attribute)
check certain mandatory column names are present in the given catalog and fail with a nice exception if they are not
Make all the sigma_X immutable containers rather than numpy arrays (in obs_conditions.py). Although then you can't multiply them, so you would need to convert them to np arrays later.
Maybe create some Filter object for each Survey, or something else that accounts for this functionality. @aboucaud and @herjy had some discussion:
why not also define a Filter or Bandpass the same way you define a Survey, with its associated psf_scale, mean_sky_level, exp_time and zero_point?
I'm rolling back on the Filter objects. We will have to make a distinction between the g band from HSC and the g band from Rubin, for instance, and that will lead to the creation of a lot of objects. Maybe there is a right way to do this, but I suggest putting it in a different PR then.
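One possible shape for such objects, sketched with frozen dataclasses; the field names follow the quantities quoted in the discussion above, and every value below is a placeholder, not a real survey parameter:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    """Per-band quantities, as suggested in the discussion above."""
    name: str
    psf_scale: float
    mean_sky_level: float
    exp_time: float
    zero_point: float

@dataclass(frozen=True)
class Survey:
    """A survey owns its filters, so HSC's g band and Rubin's g band
    remain distinct objects without any global registry of bands."""
    name: str
    pixel_scale: float
    filters: tuple  # tuple of Filter, keeping the Survey immutable

# Placeholder numbers for illustration only.
rubin = Survey("Rubin", 0.2, (Filter("g", 0.8, 400.0, 30.0, 28.0),))
```

Nesting Filter inside Survey sidesteps the object-explosion worry: filters are created once per survey definition rather than shared across surveys.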
Do we need CosmosCatalog at all, or all these Catalog objects? Related comments:
Sadly for galsim, yes: there are two catalogs that contain slightly different information. Though we were thinking of sort of dropping the galsim catalog. It is not actually necessary at this point, and we could use the default catalog to draw coordinates and fluxes as was done before. Galsim catalogs only matter for the draw_single part. But my opinion of Catalog is shifting, and I actually don't think that we need to build a CosmosCatalog here, at least not unless we want to draw HST images. If we want to generate Rubin images, we should rather use the WLD catalog. To generate Rubin images, you actually want a Rubin-compatible catalog with the magnitudes in the Rubin bands; that's what matters.
test_measure in test_input.py is broken. I suspect it is because the code changed and deprecated the original results with the current np.seed, but I'm not completely sure. Skipping it for now since all other tests pass; I will try to resolve it soon.
This issue is meant to be a place for brainstorming about various metrics, which will then be implemented in btk.metrics.
low-level metrics:
high-level metrics involving measurements:
During the last months, we have mainly focused on image generation, which is only one part of BTK (albeit an important one). I believe the next big pull request should be about the measurement part, which has not been updated for a while (but should still be functional).
I'm putting here the up-to-date flowchart of BTK to show how it works.
The user is supposed to create a subclass of MeasurementParams, with custom get_deblended_images and make_measurement methods (though make_measurement is basically unused in the default implementation), which should return respectively the deblended images and the measurements on the blended images. They give this to the MeasureGenerator, as well as a DrawBlendsGenerator, which when called generates the images and executes the two methods on them (and returns all of it).
For the metrics (actually measuring the performance of the algorithm), the user should create a subclass of MetricsParams, implementing several functions such as get_detections, get_segmentation, get_flux..., which are used to recover the results from the MeasureGenerator (which is given to the MetricsParams). The MetricsParams is then given to the run function, which uses a bunch of utility functions to evaluate the performance.
As you can see, it lacks a bit of symmetry. I think this part definitely needs some reworking to get something people can actually understand and use. Please put your ideas under this issue if you have any.
What I think: the generator structure seems appropriate to me; I am ok with giving the DrawBlendsGenerator to the MeasureGenerator. I believe there should also be a MetricsGenerator receiving the MeasureGenerator. The MeasurementParams seems a bit heavy to me; basically it's there to gather several functions made by the users to run on the generated data. I think this could be achieved using a namedtuple (to keep consistency with other parts of BTK), with several "special" attributes corresponding to specific measurements which will be used for the metrics part (e.g. deblended images, detections, segmentation, ...), where users would provide functions they made, and possibly an additional attribute if users want to run some other measurements on the data not covered by the metrics part. EDIT: I realized that this would not be a very good way to do it, as often the same algorithm gives both segmentation, deblended images, centers, or whatever. Instead, maybe the user should provide a unique function returning the results as a dictionary with specific keys?
In the same fashion, the MetricsParams is too heavy, and actually I do not think it really is useful (the fact that it takes in a MeasureGenerator is also very weird).
Anyway, there is some work to be done to sort out the metrics part as there are a lot of functions in it right now, and it is unclear to me right now which are useful and which are not.
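A tiny sketch of the single-function, dict-of-results convention proposed in the EDIT above; the function name, the keys, and the returned values are all hypothetical:

```python
def sep_like_measure(batch):
    """Hypothetical user-provided measure function: one callable returns
    every product its algorithm produces, under agreed-upon keys. A real
    implementation would run e.g. sep or scarlet on batch["blend_images"];
    a key mapped to None simply means 'not measured by this algorithm'."""
    return {
        "deblended_images": [],        # this algorithm does not deblend
        "detections": [(12.3, 45.6)],  # (x, y) pixel centroids, made up here
        "segmentation": None,          # nor does it segment
    }

results = sep_like_measure({"blend_images": None})
```

Because one call returns everything, algorithms that jointly produce segmentation, deblended images, and centers fit naturally, which is exactly the problem the namedtuple-of-functions design ran into.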
At the moment BTK has functionality to load a catalog and apply selection functions that restrict the sample of galaxies used to generate patches.
My question is: do we need BTK to support this functionality? While adapting a new format for drawing images (from the cosmos galsim images), I realised that the Catalog handling and the selection functions were rather the responsibility of the user, and I do not see the point of BTK supporting this. BTK could very happily work if it were provided a catalog that is already filtered by any means the user deems necessary.
Instead, here we go through the trouble of building an interface for applying selection functions to catalogs, which requires some uniformisation and probably more steps for a user who wants to use a specific custom catalog than if these functionalities weren't there. Plus, it imposes that we maintain a flexible interface. That is not a problem in itself, but it should be well motivated.
This matter is open for debate, and I'm happy to be shown what I might be missing.
pipenv somehow broke recently in one of the PRs (#105). I will try another way of caching next (poetry has worked well for another project of mine).
Comment from Alexandre:
the way to programmatically run BTK without going into too much detail is through btk_input.py, which is currently outdated. This script should be incorporated into the module and made available to the user upon install as a command-line script through an entry_point in setup.py
As discussed with @herjy and @thuiop we will start working towards adding multi-resolution capabilities in BTK, a first step is to enable WCS compatibility in single resolution images.
A small change to get us started is to add WCS information, using the code from @herjy, to the output dictionary of the draw_blend_generator.
Here are the steps from what I can see:
1. Add code to create wcs information in create_observing_generator.py. You can use the galsim image in this line to get the true_center, which seems to be required to run mk_wcs (see line).
2. Store the wcs information for each galaxy in the blend in a list that you return in run_single_band and run_mini_batch, and then as part of the dictionary of generate (all these functions are in draw_blends.py).
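The steps above could produce output shaped roughly like this; the function names follow the issue, but the signatures and data are stand-ins, not BTK's actual code:

```python
def run_single_band(blend, wcs):
    """Sketch: return the per-band image together with the WCS so it can
    propagate upward to the output dictionary."""
    image = [[0.0]]  # stand-in for the rendered galsim image
    return {"image": image, "wcs": wcs}

def generate(blends, wcs):
    """Sketch of the top-level generator: each yielded batch now carries
    the WCS alongside the images."""
    for blend in blends:
        band_results = [run_single_band(blend, wcs)]
        yield {
            "blend_images": [r["image"] for r in band_results],
            "wcs": wcs,  # one WCS per stamp; a per-galaxy list also works
        }

batch = next(generate([["gal_0"]], wcs="TAN-WCS-placeholder"))
```

The point is only the plumbing: the WCS created in step 1 rides along through run_single_band into the dictionary that generate yields.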
@thuiop can you open a PR to address this github issue? It would be great if @herjy can review this PR alongside me.
@thuiop feel free to add more comments to this github issue if you have any questions or encounter any problems along the way.
We want to add functionality to import realistic PSFs. We should first brainstorm how to implement this. I think we can start with the following points:
- Which GalSim PSF to use, since GalSim also has realistic PSFs such as the Kolmogorov profile and the Zernike aberration model.
- There is a Cutout subclass for WLD and one for COSMOS real images. Should we separately implement the realistic PSF capability for these subclasses, or can we share it among subclasses? I am not familiar with WLD, but it also uses galsim, so it might be possible to go with the latter.
Following PR #52, the shifts and fluxes of galaxies generated with cosmos are not handled. This requires writing the proper code for draw_single and checking how this interfaces with draw_blend. To do imminently.
We should change the branch name to the new more standard convention, here are some instructions: https://github.com/github/renaming
create_observing_generator creates a generator of characteristics of a given observation for a given survey. The generator outputs exposure time, PSF (fixed for now?), filters, and rotation of the stamp. It could be adapted from WeakLensingDeblending.
This function creates an astropy table containing information that is useful to generate postage stamp images with appropriate distributions of shapes, colors, fluxes, etc. Typical input catalogs will be CatSim generated catalogs or the DC2 catalog, in which case this will use GCR.
Following #87, Scarlet and the LSST Stack will no longer be dependencies. Instead, we will create notebooks/documentation that specifically explain how to use these tools with BTK.
Make a function, btk.measure, that takes a draw_blends generator as input, draws blended images, and then runs a measurement algorithm on them to output results. These results would then be analyzed using btk.metrics to gauge the performance of the algorithm. The measurement algorithm will be a user-input function.
For reference, a function to perform measurement with the LSST science pipeline can be found in utils.py.
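A minimal sketch of how such a btk.measure could wrap the generator and a user function; the names and the dict layout are assumptions, not BTK's actual API:

```python
def measure(draw_blends_generator, measure_function, n_batches):
    """Pull batches from the drawing generator and apply the user-supplied
    measurement algorithm to each, yielding both so btk.metrics can later
    compare measurements against the truth stored in the batch."""
    for _ in range(n_batches):
        batch = next(draw_blends_generator)
        yield {"batch": batch, "measurement": measure_function(batch)}

def count_sources(batch):
    """Trivial stand-in user algorithm: report how many sources were drawn."""
    return len(batch["blend_list"])

# Stand-in for a draw_blends generator: two pre-made batches.
draws = iter([{"blend_list": ["a", "b"]}, {"blend_list": ["c"]}])
results = list(measure(draws, count_sources, n_batches=2))
```

Keeping the algorithm as a plain callable means any detector (sep, scarlet, the LSST stack) can be plugged in without subclassing.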
As of now, the available surveys are taken from the all_surveys variable in obs_conditions.py; users cannot give a new survey as an input, even if they use custom obs conditions and a custom blend generator.
In the most recent PR I had to disable the following commands for the most recent version of the Stack to work (see line):
config1.plugins.names.add('ext_shapeHSM_HsmShapeRegauss')
config1.plugins.names.add('ext_shapeHSM_HsmSourceMoments')
config1.plugins.names.add('ext_shapeHSM_HsmPsfMoments')
What should these lines be replaced with?