flows's Issues

Instrument class

  • Split each instrument into its own file.
  • Add a color-term interface to each Instrument file, failing gracefully when one doesn't exist. (This will be used by the upcoming color-terms code.)
  • Initial guesses for the PSF should be overridden by the Instrument class instance, and this should be used in PSFBuilder or Photometry/PhotometryManager.
  • Add documentation for creating a new instrument.
  • Update docstrings.

tendrils integration:

  • Tendrils requires Python 3.10 due to annotations, so either move flows to 3.10 or move tendrils to 3.9-compatible annotations.
  • The tns.py tests need the bot credentials locally. These tests should be written.
  • tendrils should copy all sections of a config so that the pipeline can also use it to copy database secrets.
  • Add a test that fails when the TNS config isn't properly set up.
  • After this, remove the redundant internal tns.py.
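For the annotations point, a minimal sketch of what 3.9-compatible annotations look like; the function and values below are made up for illustration, not from tendrils:

```python
# With this future import, annotations are stored as strings and evaluated
# lazily, so 3.10-style `X | Y` unions in signatures import cleanly on 3.9.
from __future__ import annotations


def get_filters(names: list[str] | None = None) -> dict[str, float]:
    """Return a dummy filter -> wavelength mapping (illustrative only)."""
    defaults = {"u": 354.3, "g": 477.0, "r": 623.1}
    if names is None:
        return defaults
    return {name: defaults[name] for name in names if name in defaults}
```

This avoids bumping the minimum Python version just for type-hint syntax.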

Implement Reproducible Imagematch Wrapper

This issue is to keep track of implementing a reproducible wrapper for imagematch. I will use the config file to keep track of this and all of this should fit into the run_imagematch.py script.

Currently, my plan is to make the run_imagematch function more agnostic and general. Doing checks on the config should be abstracted away to another function, and a final config file should be produced that gets fed into run_imagematch. I have all this in a Jupyter notebook, so I'll move this over to flows now.

The end goal is that we should be able to simply pass around these config files and the template and science images, and get the same subtracted image as a user submitted one.

I'll also be using the forked version of imagematch for this, although Chris might merge this back into official imagematch down the line.

For now, we agreed on making the config file have the name of the science image, and some sort of increment to keep track of different versions.
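A rough sketch of the agnostic config handling described above; the function and key names here are assumptions for illustration, not the actual run_imagematch.py interface:

```python
# Illustrative sketch only: validate a user config, then produce the final,
# reproducible config named after the science image plus a version increment.
def finalize_config(user_config: dict, science_image: str, version: int = 1) -> dict:
    """Validate a user config and return the final config fed to run_imagematch."""
    required = {"template", "kernel_size"}  # hypothetical required keys
    missing = required - user_config.keys()
    if missing:
        raise ValueError(f"Missing config keys: {sorted(missing)}")
    final = dict(user_config)
    # Name the config after the science image with a version counter, as agreed.
    final["name"] = f"{science_image}.imagematch.v{version:03d}"
    final["science_image"] = science_image
    return final
```

The check-and-finalize step stays abstracted away from the actual subtraction call, so the same final config always reproduces the same subtracted image.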

Refactor load_image

load_image now contains sites/instruments, as well as the FlowsImage dataclass. This should all be refactored out of load_image.

Change to supporting or requiring black for code style.

Is your feature request related to a problem? Please describe.
Implementing a consistent code format and avoiding code format related reviews and delays is a chore.

Describe the solution you'd like
To avoid needless code style reviews and edits and manual changes, I propose we move to using python black, as it is now considered stable and is becoming widely used. Black is a highly opinionated code formatter, which will help us have consistent code formatting.

More info: https://black.readthedocs.io/en/stable/

It has its own CI workflow that integrates with GitHub, so we can easily add it in and get uniformly formatted code for the whole project. It also has its own test suite and is widely used. Great!

Describe alternatives you've considered

The alternative is to keep things as they are and have vastly different code formats across the same file, or to spend time manually editing each contributor's code formatting habits. There are also other alternatives, but none as widely used and supported as black.

Additional context
It's very easy to set up your IDE to use black, or to set up a pre-commit hook, e.g. in GitKraken, to format your code after you write it. We should also use a GitHub action that checks against black (not sure if it can auto-suggest changes and a new commit like some other GitHub actions can).

One problem is that moving to black means using spaces instead of tabs, so the project files need to be changed, and the editor config. I have opened PR #42 to address this.

Improve bad star rejection

The current bad star rejection based on R^2 testing in #20 is very hacky. This should be improved to both use a better goodness of fit measure as well as to remove the unnecessary looping, unclear expectations, etc.

We could move to using an AIC/BIC criterion. Alternatively or in addition, we can make the minimum number of references (and possibly the rejection criterion, whether AIC/BIC/R^2) user-supplied instead of hard-coded (passed in via run_photometry.py), with sensible but conservative defaults.
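As a sketch of what an AIC/BIC goodness-of-fit criterion could look like, assuming Gaussian errors; the residuals and number of free parameters k would come from the PSF fit:

```python
import numpy as np

# AIC/BIC from the residual sum of squares under a Gaussian-error assumption.
# Lower is better; BIC penalizes extra parameters more strongly for large n.
def aic_bic(residuals: np.ndarray, k: int) -> "tuple[float, float]":
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic
```

Comparing AIC/BIC with and without a given reference star would replace the current R^2 loop with a single ranked rejection pass.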

update lightcurve ecsv file headers

Describe the bug
As of some version (possibly astropy 4.3), ECSV now requires string instead of str in the metadata headers. Wherever these are generated should be updated. There's a closed bug on astropy where objects saved using a table were converted to str, and subsequently could not be read, since table.Table now requires string (or other formats) and gives an error on str.

To Reproduce
Steps to reproduce the behavior:

  1. import flows api
  2. api.get_lightcurve() with any targetid.
  3. Astropy throws an error (tested using Astropy 4.3.1), lightcurve is not read into a table and temporary file is deleted after code exit due to error.

Expected behavior
I should get the lightcurve of the target in an astropy table.

Fix
I made a PR #36 that works around the issue for downloading the lightcurves for now. We should update the generator of the .ecsv files so that this PR can be reverted.
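A minimal sketch of the str -> string rewrite on the generator side; the exact header layout assumed here is illustrative, so check the real .ecsv files before relying on it:

```python
# Hypothetical sketch: rewrite `datatype: str` to `datatype: string` in the
# ECSV comment header before the file is written or handed to astropy.
def patch_ecsv_header(text: str) -> str:
    fixed_lines = []
    for line in text.splitlines():
        # Only touch header comment lines; skip lines already using "string".
        if line.startswith("#") and "datatype: str" in line and "string" not in line:
            line = line.replace("datatype: str", "datatype: string")
        fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

Once the generator emits valid headers, the download-side workaround in PR #36 can be reverted.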

Catalog generation/pointing

Problem: As flows has evolved, the pipeline has become more of a general SN photometry and candidate tracking one than the initial limited scope one for VLT + supporting observations we envisioned. Given that scope, how does pointing and OB generation fit in? How do we currently do our cuts, and how should we expose this to users to create their own local sequence lists from our catalogs? This might need to be merged with #22 to make sense.

Proposal:

Initially, we decided to use the most restrictive instrumental set-up to create the reference list via its pointing criteria, meaning using the (VLT) NIR to set the conditions for the optical. But with the use of color terms, this should be unnecessary. Ultimately, we probably want to allow a unique reference list for each instrument and target combination based on catalog reference star colors, magnitudes, and positions.

For now, the default should be to use the NIR instrument to set the criteria for the target, and to use that reference list for the optical as well, as is currently done. However, many objects won't have VLT data and need different (less restrictive) criteria. So we need a way to enter criteria for each object and, failing that, use a conservative default. I know we currently have some dummy defaults.

But ideally, we should test using a separate cut-off for each target/instrument pairing. This would need to go into the catalogs script. When fetching the catalog, we would also feed in the current instrument, and it would search for a different set of criteria, and fallback to the default for the object if it doesn't exist. Something like that.

The querying of the catalog around the object is currently wide enough to accommodate basically any professional ground based telescope that would be used with FLOWS, so I think we can leave that in place as is. But the creation of the final reference list from that initial circular region needs to be customized for each instrument.
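The lookup-with-fallback described above could be sketched like this; the keys and default values are illustrative assumptions, not the catalogs script's actual ones:

```python
# Conservative global defaults (placeholder values for illustration).
DEFAULT_CRITERIA = {"max_mag": 20.0, "min_mag": 12.0, "max_color": 1.5}

def get_reference_criteria(criteria_table: dict, target: str, instrument: str) -> dict:
    """Return cuts for (target, instrument), falling back first to the
    target-level default, then to the conservative global default."""
    return (
        criteria_table.get((target, instrument))
        or criteria_table.get((target, None))
        or DEFAULT_CRITERIA
    )
```

When fetching the catalog, the current instrument would be fed in alongside the target, and the final reference list built from the wide circular query using whichever criteria this returns.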

DAOFIND detection issues

We find many spurious star detections in bad images, especially in the NIR but also in some shallow optical images. We need to reject these bad (non)stars.

Some ideas we brainstormed:

  • Look at DAOFIND parameters again.
  • Use the initial 2D gaussian fit to ensure a good match in case of bad daofind result.
  • Possibly segment out daofind criteria per instrument or per filter.

I think the initial Gaussian fits we do to get the FWHM are probably underutilized; they can probably help us solve this. For example, looking at the 2D Gaussian fit parameters and doing some basic consistency checks could get rid of 99% of the spurious detections.
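A sketch of what such consistency checks on the per-star 2D Gaussian fits might look like; the parameter names and thresholds are assumptions, not tuned values:

```python
# Reject a detection unless its fitted shape is consistent with a real star.
def is_star_like(fit_fwhm_x: float, fit_fwhm_y: float, image_fwhm: float,
                 amplitude: float, noise_sigma: float) -> bool:
    if amplitude < 5.0 * noise_sigma:      # too faint to trust the fit
        return False
    ratio = fit_fwhm_x / fit_fwhm_y
    if not (0.5 < ratio < 2.0):            # wildly elongated -> likely artifact
        return False
    mean_fwhm = 0.5 * (fit_fwhm_x + fit_fwhm_y)
    # Require rough agreement with the image-wide FWHM estimate.
    return 0.5 * image_fwhm < mean_fwhm < 2.0 * image_fwhm
```

Applying a filter like this after DAOFIND (or per instrument/filter) would cut the spurious NIR detections without changing the detection step itself.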

Bug report (from LLuis)

Bug report attached. I tracked this down to sep, but properly fixing it is not straightforward (we would need to monkeypatch sep).

2022-05-04 13:00:04 - INFO - Placing output in '/flows/photometry_workdir/2020hvf/14683'
2022-05-04 13:00:04 - INFO - Load image '/flows/archive/2020hvf/SN2020hvf_20210420_H.fits.gz'
2022-05-04 13:00:08 - ERROR - Photometry failed
Traceback (most recent call last):
File "run_photometry.py", line 49, in process_fileid
photfile = photometry(
File "/usr/users/kasoc/flows/flows/flows/photometry.py", line 188, in photometry
objects = sep.extract(image.image - sep_background,
File "sep.pyx", line 737, in sep.extract
File "sep.pyx", line 289, in sep._assert_ok
Exception: internal pixel buffer full: The limit of 300000 active object pixels over the detection threshold was reached. Check that the image is background subtracted and the detection threshold is not too low. If you need to increase the limit, use set_extract_pixstack.
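Short of monkeypatching, one option is to catch this error and grow the pixel stack. sep does expose set_extract_pixstack; the wrapper below takes the extract call and the setter as arguments so it can be tested without sep installed (e.g. extract=lambda: sep.extract(data - bkg, thresh), set_pixstack=sep.set_extract_pixstack):

```python
# Retry sep.extract with a doubled active-pixel stack when the hard-coded
# "internal pixel buffer full" limit is hit, up to a sanity ceiling.
def extract_with_growing_pixstack(extract, set_pixstack,
                                  start=300_000, max_size=3_000_000):
    size = start
    while True:
        set_pixstack(size)
        try:
            return extract()
        except Exception as err:
            if "pixel buffer full" not in str(err) or size >= max_size:
                raise  # a different sep failure, or we've grown too far
            size *= 2  # double the pixel stack and retry
```

A too-low detection threshold would still be the real culprit in images like the one above, so the ceiling keeps this from masking bad inputs.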

Increase test coverage

Increase test coverage especially of our main photometry. Check OUTPUT as well on some images.

Add option to save and re-use recalculated WCS

This can cut down on memory requirements per thread by over 80%. It would also allow an initial run with much larger memory and timeout resources to save the WCS, then subsequent re-runs using the improved WCS.

The WCS needs to save the pipeline version it was created with to ensure it's re-created appropriately when running with a new version.

This should be saved to the database as well.
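A minimal sketch of the versioned caching idea; the file format and keys here are assumptions, and the real implementation would write to the database:

```python
import json

def save_wcs(path: str, wcs_header: dict, pipeline_version: str) -> None:
    """Persist a recalculated WCS header together with the pipeline version."""
    with open(path, "w") as fh:
        json.dump({"pipeline_version": pipeline_version, "wcs": wcs_header}, fh)

def load_wcs(path: str, current_version: str):
    """Return the cached WCS header, or None if it was created by a different
    pipeline version and must be recalculated."""
    with open(path) as fh:
        payload = json.load(fh)
    if payload["pipeline_version"] != current_version:
        return None
    return payload["wcs"]
```

An initial high-memory run would populate these entries; subsequent re-runs load them and skip the expensive recalculation.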

Candidate Marshal Alpha

Let's use this issue to keep track of progress on the candidate marshal.

  • Candidate page
  • Marshal table view
  • Upload API
  • Auto-population
  • Speed

Comments:

  • All seem happy with the candidate page.
  • The table view stamp should show a more zoomed in image, perhaps 3-5 arcmins instead of 20.
  • Upload seems to work well. Perhaps we should use a smaller separation than 1 arcmin to reject nearby candidates.
  • We can use #11 to auto-populate from TNS.
  • Any hacks to speed-up fetching the ZTF lightcurves, visibility plots, etc?

Make it easier to add sites

We should allow adding sites/instruments and modifying filters, FITS keywords, etc., all via GitHub only. I think the best way to do this is to change the sites to .py files, so that one can open a PR for a new site. However, in order to have sites available to the API, there should be a helper script that goes through each site and writes the info out into some file. The flows database should read and use that file, and the flows API calls to sites.php should then grab this info from there.

This will help us quickly make changes to sites if we need to, for example pulling in new header info with differing header keywords for each instrument when developing a new addition to the code, without having to get this added for each instrument behind the scenes via a request to Rasmus. That would quickly become annoying, especially if we ended up not needing the changes in the end.

what do you think @rhandberg ?

Add lightcurve generation script

LC generation has been removed from run_plotlc.py. It will now be in its own script. Ideally, this script uses the API so no local LC files are needed.

Too many tests per release

I want to move to making more frequent releases, but there are 5 rounds of the same CI tests being run per release with the current workflow. Need to cut down on that using GitHub actions.

Downloading catalogs: CasJobs memory error

When downloading the REFCAT2 catalog in a crowded field (e.g. 2020oij), the CasJobs query will fail with a memory-limit error when downloading the full 24 arcmin cone in one go. We need to be able to catch this and break the query down into several smaller ones.

Add release workflow to README

When devel is ready, from @rhandberg:

  1. Update the VERSION file with the new version-name (master-vX.Y.Z), push the change up.
  2. Ensure the tests pass on GitHub.
  3. Merge the devel branch into the master branch.
  4. Create tag on new master branch (vX.Y.Z).
  5. Merge the new master branch back into the devel branch. This ensures that the tag is in the history of the devel branch.
  6. Once CI/CD tests pass on tagged master branch, go to GitHub releases to review new release.

Get even better star rejection by interpolating over nans

Currently, NaNs in the image can break the star rejection algorithm due to astropy.fitting limitations (underlying scipy.optimize limitations), where NaNs cause an early exit close to the initial guess.

As of astropy 5.1, NaNs raise an exception. However, when we try to mask them and do the fit, the fit is not very successful and can fail.

Use scipy.interp2d to interpolate the missing values if only a small fraction of pixels are NaN; otherwise, reject the star.
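A sketch of that policy; note it uses scipy.interpolate.griddata rather than interp2d, since interp2d doesn't accept scattered valid pixels and is deprecated in recent SciPy. The 5% threshold is an illustrative assumption:

```python
import numpy as np
from scipy.interpolate import griddata

def fill_nans_or_reject(cutout: np.ndarray, max_nan_fraction: float = 0.05):
    """Interpolate over isolated NaNs; return None (reject the star) if too
    large a fraction of the cutout is missing."""
    mask = np.isnan(cutout)
    if not mask.any():
        return cutout
    if mask.mean() > max_nan_fraction:
        return None  # too many missing pixels: reject this star
    yy, xx = np.indices(cutout.shape)
    filled = cutout.copy()
    # Linear interpolation from the valid pixels onto the NaN positions.
    filled[mask] = griddata(
        (yy[~mask], xx[~mask]), cutout[~mask], (yy[mask], xx[mask]),
        method="linear",
    )
    return filled
```

Running this before the astropy fit keeps scipy.optimize from exiting early on NaNs while still dropping genuinely unusable stars.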

User supplied sequence/catalog

This is a feature requested by Max. Easiest way to implement seems to be to add a way to define a local sequence by supplementing the api.catalogs call in flows/photometry.py with a local/custom catalog.

The catalog would need ra, dec, epoch, frame, proper motions, and magnitudes in relevant bands. What else? I guess we can generate a unique custom starid for each. It would also need to match the astropy.table formatting of flows' native catalog.

Remove support for py 3.6

Support for py 3.6 needs to be dropped; it's missing many core features like static typing. Ideally I actually want to move to 3.9+ for the pipeline part, but we should be able to support 3.7/3.8 by using some libraries like typing or from __future__ imports, as necessary.

As a note, astropy v5.0+ only supports python 3.8+, so we should at least get parity there soon.

TODO

  • Remove py 3.6 from GitHub Actions.
  • Update tests.
  • Remove any references to py 3.6 in docs, tests, etc.
  • Decide on whether to remove py 3.7 as well.

Color Terms

We need to add color terms post-photometry to calculate natural-system photometry. This means keeping track of the color terms used for each image in some way. The simplest way is of course to append the color terms to the eventual file header.

This also means obtaining color terms for each instrument in the flows pipeline and keeping those updated and ensuring that the right color terms go with the right images, given that color terms change with time for most instruments. That's a significant burden and we need to discuss how to make it feasible.

Make it easy to add or modify instruments

Currently it's difficult to add instruments or modify them if their headers change.

There should be 1 place and a standard set of inputs for each instrument so this is easy to do in the future.

Remove API scripts, add pointer to SNflows/tendrils, continue re-organizing code.

https://github.com/SNflows/tendrils is the new API endpoint.

Now a discussion should be had about which API endpoints should actually be in there (e.g., should photometry be its own separate module? Should TNS/Alerce/etc. querying get some functionality there? I think so).

After we decide, then the ones from here should be removed, and a pointer should be added to README and onto the wiki/website.

Finally, there are a bunch of scripts here that are not really part of the flows photometry pipeline, but are more helper scripts for e.g. populating the marshal. I propose that they should be moved to either flows-tools, or a separate module. The end goal is to isolate only the photometry and related pipeline scripts here, make it a version LOCKED module, and then release it as stable V1.0. Until then we should go to v0.10, etc.

  • Move any remaining API-like scripts to tendrils (or decide where to put them, not in pipeline)
  • Decide on modules for tendrils (should photometry API be its own?, TNS? Alerce?)
  • Remove API scripts from flows pipeline, refactor code to reflect change.
  • Move non-pipeline scripts to a separate repo or sub-module (recommend repo so we can easily make a pipeline package from the remaining and LOCK all dependencies). Maybe a marshal repo, or into flows-tools. Such as run_querytns, run_ztf, plot_lc, etc.
  • Make V1.0 of photometry pipeline with as few dependencies as possible and with locked versions. Release as package.

Astropy >= 4.2 will cause us to drop support for Python 3.6.

This will cause us to have to drop support for Python 3.6. This will cause problems for some of the systems currently running the automatic parts of the code, until we can get them upgraded. What is the reason for the strict "astropy >= 4.2" requirement?

Originally posted by @rhandberg in #26 (comment)

Let's keep track of this issue here. My nearest-neighbor WCS correction backup requires a function in Astropy 4.2. In general, it's good to have access to later Astropy versions, as it leaves us more flexible to implement ready-made solutions and functions from them in the pipeline. However, it's also possible to just copy the one piece of functionality we need and not require Astropy 4.2.

Astropy 4.2 may also be needed for latest version of Photutils (not sure, we should check), which does fix some bugs, I don't know if they are relevant or not.

However, let's hold off this workflow breaking update for now.

Change TNS query string to a dictionary

Change the TNS query string to a dictionary with a base URL string and a data dict. We also need to know whether null/empty keys can be ignored, and whether the dictionary can be submitted in any order by requests.

This is an enhancement on #11 and is a post-alpha step for #12. It's not needed to be done to get the basic system working.
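A sketch of the base-URL-plus-dict layout; the endpoint and parameter names below are illustrative, not the real TNS API, and whether the server accepts any key order is exactly the open question above:

```python
from urllib.parse import urlencode

# Hypothetical base URL for illustration only.
TNS_BASE_URL = "https://www.wis-tns.org/api/get/search"

def build_tns_query(params: dict) -> str:
    """Build a query URL from a data dict, dropping null/empty keys instead
    of sending them (assuming the server ignores them either way)."""
    clean = {k: v for k, v in params.items() if v not in (None, "")}
    return f"{TNS_BASE_URL}?{urlencode(clean)}"
```

With requests, the same clean dict could instead be passed as data= or params= rather than pre-encoded.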

Zeropoint/calibration error bars

I'm creating an issue to track it here based on our discussions from Ebeltoft. When calculating the zero point error, we should really use the standard error of the mean instead of just the standard deviation.

I've also added a bootstrap method for doing the zero point fitting that should be more robust to the kinds of errors we saw. I'll make a pull request when I've tested it and added all this in.
Meanwhile, here is the link to the feature branch.
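A sketch of the two error estimates; clipping and weighting are omitted for brevity, and this is not the feature-branch code:

```python
import numpy as np

def zeropoint_with_error(zp_per_star: np.ndarray, n_boot: int = 1000,
                         seed: int = 0):
    """Mean zero point with (a) the standard error of the mean and
    (b) a simple bootstrap error on the mean."""
    zp = float(np.mean(zp_per_star))
    # Standard error of the mean, not the raw standard deviation:
    sem = float(np.std(zp_per_star, ddof=1) / np.sqrt(zp_per_star.size))
    rng = np.random.default_rng(seed)
    boot = rng.choice(zp_per_star, size=(n_boot, zp_per_star.size), replace=True)
    boot_err = float(np.std(boot.mean(axis=1)))
    return zp, sem, boot_err
```

The bootstrap estimate should track the SEM for well-behaved fields but be more robust to the pathological cases we saw.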

WCS checking, SEP, possible WCS corrections.

It would be great if flows did a basic WCS check, and even tried to correct for it if the photometry failed.

I think we should update run_photometry.py to try to automatically correct the WCS if the photometry fails. In that case, the new WCS would also need to be added to the image header. We shouldn't do this unless the photometry fails, and we should still require users to submit images with correct WCS. But sometimes it's obviously just a little bit wrong, and it would be possible to deal with this automatically.

I propose we use some combination of sep and astroalign/reproject for this. The problem with daofind is that it needs the FWHM estimate, and it does not perform well on the variable backgrounds one gets from stacked and dithered NIR images. sep.extract can deal with this using matched filtering and should be quite fast.

If we use sep to do the source detection, then we can calculate offsets and reproject the image using e.g. astroalign, and save a copy or adjust the WCS. Using sep's matched-filtering algorithm has the additional benefit that we can then check those sources using our forced 2D Gaussian fitting, clean bad stars, etc., and get a very good FWHM estimate. At that point, we remove the daofind steps from the pipeline, make it possible to fix the WCS when it is a bit off, and keep our bad star rejection in place, without adding a lot of overhead.

Moving to sep for the initial source extraction has one issue: the underlying Cython code can sometimes fail ungracefully (for example, running out of hard-coded memory allocation or dividing by zero), which we would have to catch. Anyway, I think it's worth testing out because it can give us better bad star rejection and make it possible to do a basic WCS check, and even a basic WCS correction, on all images by default.
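As a toy sketch of the offset step only, with the source matching and the full transform via astroalign/reproject elided; the names are illustrative:

```python
import numpy as np

def estimate_offset(detected_xy: np.ndarray, catalog_xy: np.ndarray) -> np.ndarray:
    """Estimate a constant pixel shift between detected sources (e.g. from
    sep.extract) and their matched catalog positions.
    The median is robust to a few mismatched pairs."""
    return np.median(catalog_xy - detected_xy, axis=0)
```

A shift well above the pixel scale would flag a bad WCS; the full correction would then hand the matched lists to astroalign for a proper transform.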

Refactor photometry.py

Photometry.py should be refactored to take in the new FlowsImage dataclass, and be re-written in an OOP format.

Key things to initially abstract out:

  • Zero point and field calibration
  • Output I/O
  • File I/O
  • Template subtraction
  • WCS checking/recalculation
  • Reference star instantiation and cleaning
  • (Including Star detection and Catalog matching)

Issues to tackle:

  • Add in WCS calculation via astrometry.net api.
  • Handle proper motion
  • Dealing with local archive.
  • Add a basic try/except for sep errors (or try to move to photutils for this).
  • Uncertainty estimation from the background RMS.
  • Initial guesses for the PSF should be saved in FlowsImage and be overridden by the Instrument class instance. (Alternatively, both Image and Instrument instances should be used by the Photometry class.)

Uploading revised versions of an image

We need to allow the user to upload a revised version of an image already in the inbox. This situation may happen, for example, if a corrupted file is uploaded to the inbox.

The relevant code is here. It should instead either:

  1. upload the new version and increment some identifier, or
  2. possibly overwrite the current version after a confirmation prompt.
if os.path.exists(newpath):
    logger.error("Already exists")
    if uploadlogid:
        db.cursor.execute("UPDATE flows.uploadlog SET status='Already exists: file name' WHERE logid=%s;", [uploadlogid])
        db.conn.commit()
        continue

There's also the related issue of adding in new version of files already in the archive. For that, I definitely prefer option 1 over 2, for reproducibility.
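A sketch of option 1, an increment-never-overwrite naming scheme; the name.vNNN.ext convention is an assumption. The existence check is injectable so the logic can be tested without touching the filesystem:

```python
import os

def versioned_path(path: str, exists=os.path.exists) -> str:
    """Return `path` if free, otherwise the next available versioned name,
    e.g. img.fits -> img.v002.fits -> img.v003.fits."""
    if not exists(path):
        return path
    root, ext = os.path.splitext(path)
    version = 2
    while exists(f"{root}.v{version:03d}{ext}"):
        version += 1
    return f"{root}.v{version:03d}{ext}"
```

The upload handler would then store the chosen name in flows.uploadlog instead of bailing out with "Already exists".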

Investigate fitter and recentering_boxsize parameters in epsf_builder

Currently called like so in flows/flows/photometry.py (lines 110 to 118 at 01baac6):

logger.info("Building ePSF...")
builder = self.epsf_builder(
    oversampling=1, shape=1 * self.star_size,
    fitter=EPSFFitter(fit_boxsize=max(int(np.round(1.5 * self.fwhm)), 5)),
    recentering_boxsize=max(int(np.round(2 * self.fwhm)), 5),
    norm_radius=max(self.fwhm, 5), maxiters=100,
    progress_bar=multiprocessing.parent_process() is None and logger.getEffectiveLevel() <= 20
)
epsf, stars = builder(stars)

In practice this means 5x5 maximum, which may be unsuitable for some NIR images!
The recentering_boxsize is also the same.

The maximum was put in place to guard against bad FWHMs creating a poor ePSF, which is far more common than needing a larger boxsize.

As step 1, this maximum should be based on the typical FWHM per instrument, and instruments with poor seeing or large FWHM or small pixels should have this increased!

Step 2, Investigate whether a max that is less than the star size is even needed.

Step 3, There should be a better way to set the max than statically per instrument. (Perhaps just use historical data from pipeline to update this dynamically), or something else to ensure we have a good FWHM.

Change BasicPSFPhotometry fit shape to be a much smaller value

See : astropy/photutils#1130.

New value should likely be based on the FWHM.

The difference in error as a function of the fit_size needs to be investigated. If systematically different, then try to also figure out why.

The fit size doesn't have to be the whole star cutout; it can be just the central pixels, enough to fit a PSF to (so more than just the central pixel, but far less than the cutout).
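A sketch of deriving the fit shape from the FWHM; the 1.5x factor and the floor of 5 are illustrative assumptions mirroring the ePSF-builder style above:

```python
import numpy as np

def fit_shape_from_fwhm(fwhm: float, factor: float = 1.5, minimum: int = 5) -> int:
    """Pick a small, odd fit shape scaled to the FWHM instead of using the
    full star cutout."""
    size = max(int(np.round(factor * fwhm)), minimum)
    return size if size % 2 == 1 else size + 1  # PSF fitting wants odd shapes
```

Comparing photometric errors as a function of this size (versus the full cutout) is the investigation the issue asks for.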

Add transformation from 2MASS system to each telescope natural system

[Issue moved from old repo. Originally posted by @lgalbany ]

As discussed on Nov 4th in Barcelona, it would be good to have already included in the pipeline the transformation between the catalogued 2MASS stars catalogs to the system of each of the telescopes, so then the SN gets calibrated directly in the natural system of the telescope.

This basically means that color terms are included and J/H magnitudes from 2MASS are transformed to CAHA/NOT/NTT/CMO/... before being used to calibrate the SN.
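A sketch of the linear transformation involved; the coefficients are placeholders, not measured color terms for any telescope:

```python
def to_natural_J(J_2mass: float, H_2mass: float,
                 color_term: float = 0.05, zeropoint_offset: float = 0.0) -> float:
    """Transform a 2MASS J magnitude to a telescope's natural system via a
    linear color term: J_nat = J_2MASS + ct * (J - H) + offset.
    Both ct and the offset are per-instrument and time-dependent."""
    return J_2mass + color_term * (J_2mass - H_2mass) + zeropoint_offset
```

The per-telescope coefficients would come from the color-terms work tracked in the Color Terms issue above.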

Lock pipeline dependencies to sensible semversions.

The photometry pipeline needs to be cleaned of the API and other parts, and its dependencies should be locked to sensible but not-too-old semversions. For example, it's currently not possible to pip install astropy 4.1 on newer Python versions because of an incompatibility with the Jinja2 version in its build dependencies. If we don't have a reason for staying on 4.1, we should test moving up, at least to 4.3, and later to 5.0. Astropy versions are technically LTS-supported for 2 years.

However, we should also lock the dependencies, make only 1 version of python a required CI check (because this will be a pipeline/app that is locked and using that version). We can over time bump that python version (we don't have to lock the python version though). We can continue testing against py3.9 and py3.10, in preparation for moving to them, but don't need to require it to be working.

  • astropy v4.3x? 5.0x? Upgrading to 4.2x-4.3x would mean that older-version ECSV tables cannot be read. 5.0x has this solved; perhaps this can be backported, or we can upgrade to 5.0. Not sure if that breaks anything or changes results (especially for photutils). Needs testing for 5.0 due to modeling changes.
  • numpy (latest?): 1.22+ some issues with building on certain Macs, maybe need to keep to >1.19.1 <1.22
  • matplotlib (latest?), staying at 3.3 for now but need to move.
  • scikit-image (newer supported version?) => 0.14.2
  • scipy (latest?): 1.6.0+, latest seems fine.
  • Photutils (V1.3 ideally, but start with V1.2):
    Looks like Photutils can be moved to V1.2 with zero changes, or to V1.3 by making sure to catch the possible ValueErrors from misconfigured initial guesses. This is fine on astropy 4.0x+ but needs a bump to scipy 1.6.0, scikit-learn 0.19, scikit-image 0.14.2, gwcs 0.12.
  • Bottleneck (latest should be fine).

Add master log/data-monitoring script

There needs to be a master log script for tracking ongoing errors and creating bug reports from them (or marking certain FIDs as bad images, asking users for a V2 if possible, or ignoring these in future runs).

The log should also keep track of the photometry/mag/err per FID per minor release.

Obviously this information is in the database so would be better to query it.

@rhandberg Can the script that populates the Status page on the webpage be used for this?
