astrohuntsman / huntsman-drp
Imaging pipeline tools and flow.
License: MIT License
This is not necessary:
huntsman-drp/src/huntsman/drp/ingestor.py
Line 298 in 4fc544b
It appears there are some duplicates in the raw exposure table causing problems during calexp production:
e.g.:
2021-04-29 01:46:42.144 | ERROR | huntsman.drp.services.base:_wrap_process_func:58 - Exception while processing {'dateObs': '2020-09-15', 'filename': '/data/nifi/huntsman_priv/images/fields/Frb200914/1919420013090900/20200915T120511/20200915T121504.fits.fz', 'visit': 20200915121605502, 'field': 'FRB200914', 'dataType': 'science', 'taiObs': '2020-09-15T12:16:05.502(UTC)', 'ccd': 8, 'expId': 'PAN000_1919420013090900_20200915T121504', 'ccdTemp': -0.5, 'expTime': 60.0, 'filter': 'g_band'}
RuntimeError('Multiple matches found for document in <huntsman.drp.collection.RawExposureCollection object at 0x7f80ba8b7b50>: {}.')
These need to be removed from the table and we need to figure out how they were inserted in the first place.
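As a starting point for the cleanup, duplicates can be found by counting occurrences of whatever field is supposed to uniquely identify a raw exposure. The sketch below is a minimal, hedged illustration over plain dicts, assuming `filename` is the unique key; the real collection lives in mongodb, where the equivalent would be a `$group`/`$match` aggregation.

```python
from collections import Counter

def find_duplicate_keys(documents, key="filename"):
    """Return key values that appear in more than one document.

    `documents` is an iterable of metadata dicts; `key` is the field
    assumed to uniquely identify a raw exposure.
    """
    counts = Counter(doc[key] for doc in documents if key in doc)
    return sorted(k for k, n in counts.items() if n > 1)

# Synthetic example documents (not real pipeline metadata).
docs = [
    {"filename": "a.fits.fz", "expId": "1"},
    {"filename": "b.fits.fz", "expId": "2"},
    {"filename": "a.fits.fz", "expId": "1"},  # accidental re-insert
]
duplicates = find_duplicate_keys(docs)
```

In mongodb the same check could be done server-side with an aggregation grouping on the key and matching `count > 1`, which avoids pulling the whole table into memory.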
The DRP will produce more metrics that we'll use for another screening phase (following #68) to identify only science-ready data for final processing.
These will be computed for each file if the metrics are not already in the database.
Some example metrics and plots are available here: https://dmtn-008.lsst.io/
The error below occurs frequently but appears harmless:
Exception ignored in: <function SqlRegistry.__del__ at 0x7f383996db90>
Traceback (most recent call last):
File "/opt/lsst/software/stack/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_persistence/21.0.0+48431fa087/python/lsst/daf/persistence/registries.py", line 317, in __del__
self.conn.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139878645864192 and this is thread id 139880111126336.
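The `ProgrammingError` arises because `sqlite3` connection objects may only be used from the thread that created them. This is not a fix for the stack's `SqlRegistry`, just a minimal sketch of the constraint and one standard workaround: keeping one connection per thread via `threading.local`.

```python
import sqlite3
import threading

_local = threading.local()

def get_connection(db_path=":memory:"):
    """Return a per-thread sqlite3 connection.

    sqlite3 objects must be used in the thread that created them, so we
    lazily open one connection per thread instead of sharing a single one.
    """
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(db_path)
    return _local.conn

results = []

def worker():
    conn = get_connection()
    results.append(conn.execute("SELECT 1").fetchone()[0])

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since the error is raised from `__del__` during garbage collection, it is indeed likely harmless here, but a per-thread (or explicitly closed) connection would silence it.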
.fits / .fits.fz files in the mongodb raw data table.
RawDataTable: automatically remove query results that haven't been certified.
docker-compose file.
We currently have some old metadata entries that are inconsistent with the current metadata parsing procedures, or that are simply erroneous (flats being recorded as datatype science, etc.). We will also need this functionality to repopulate the metadata table when/if changes are made down the line.
The LSST stack seems to be taking ages (>10 min) per file. Perhaps we need to limit the number of sources used to make the PSF models.
An environment or a standard set of data, plus monitoring of that data, that allows everyone to understand each subsystem (e.g. PSF wings characterisation) as well as how they combine.
We should move our python requirements, which are currently hard-coded in the dockerfiles, into a requirements.txt.
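A minimal sketch of the change (the package names are placeholders, not the project's actual dependency list):

```dockerfile
# Before: dependencies hard-coded in the Dockerfile, e.g.
# RUN pip install numpy astropy pymongo

# After: keep them in a requirements.txt next to the Dockerfile
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```

This also lets developers install the same pinned versions locally with `pip install -r requirements.txt`, without rebuilding the image.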
This is too complicated:
quality:
  raw:
    science:
      get_wcs:
        has_wcs: true
    flat:
      quality:
        rawexp:
          flipped_asymmetry:
            flip_asymm_h:
              less_than: 100
            flip_asymm_v:
              less_than: 60
          clipped_stats:
            well_fullfrac:
              greater_than: 0.13
              less_than: 0.19
We should remove the "rawexp" subheading and get rid of function names (e.g. clipped_stats), just keeping metric names.
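One possible flattened layout, keeping only metric names under each data type (a sketch of the proposal, assuming the metric names and thresholds from the snippet above; not the final schema):

```yaml
quality:
  raw:
    science:
      has_wcs: true
    flat:
      flip_asymm_h:
        less_than: 100
      flip_asymm_v:
        less_than: 60
      well_fullfrac:
        greater_than: 0.13
        less_than: 0.19
```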
We need to store logs from huntsman-drp and the LSST stack.
And then:
As @AnthonyHorton mentioned, due to changing ghost patterns across dithers, it might be better to reject subtle ghosts via a median combine. A mean might still make sense at a dither position (even if we randomise it a little), since it maximises S/N of the source (and ghost, so we can then reject it via median).
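The reasoning above can be seen numerically: a ghost that appears in only one (or a minority) of the dithered frames survives a mean combine but is rejected by a median. A minimal synthetic illustration, assuming numpy and toy pixel values:

```python
import numpy as np

# Five dithered "exposures" of the same 2x2 sky patch, nominal level 100.
frames = np.full((5, 2, 2), 100.0)
frames[2, 0, 0] = 1000.0  # a bright ghost present only in frame 2

mean_combine = frames.mean(axis=0)      # ghost leaks into the stack
median_combine = np.median(frames, axis=0)  # ghost rejected
```

The mean-combined pixel containing the ghost comes out at 280 instead of 100, while the median stays at 100, which is the motivation for median-combining across dithers even though the mean maximises S/N when no ghost is present.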
Currently no logs are produced during metadata acquisition. This makes it difficult to know how the script is doing.
Code:
https://github.com/AstroHuntsman/huntsman-drp/blob/develop/scripts/quality/get_raw_quality.py
Need to create a separate screening service that will process raw files and populate a quality table with some simple metrics/quality flags, including:
- wcs
- asymmetry score / vignetting flag
- median/mean/std etc.
- corrupt files / asymmetry
- out of focus / bad focus
- camera shutter failures
- truncated readout
- fpack failures
The service will require the following:
- astrometry.net
- huntsman-drp
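The simple statistical metrics in the list above could be produced by something like the sketch below. The function name, flag names, and saturation threshold are illustrative assumptions, not the pipeline's actual schema; it only shows the shape of a per-file quality document.

```python
import statistics

def screen_exposure(pixels, saturation=65535):
    """Compute simple screening metrics for one exposure.

    `pixels` is a flat list of pixel values; the metric names are
    placeholders for whatever the quality table ends up storing.
    """
    return {
        "median": statistics.median(pixels),
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),
        # Fraction of saturated pixels: a cheap shutter/readout sanity flag.
        "saturated_frac": sum(p >= saturation for p in pixels) / len(pixels),
    }

doc = screen_exposure([10, 10, 10, 65535])
```

Each resulting dict would then be inserted into the quality table keyed by filename, alongside the WCS and asymmetry flags.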
It seems the TAP refcat queries are not thread-safe. It would be good to have a refcat docker server service that uses a lock to make this thread-safe.
First, it would be good to verify that the queries are actually not thread-safe...
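If the queries do turn out not to be thread-safe, the server-side fix could be as simple as serialising them behind a process-wide lock. A hedged sketch, where `query_func` is a placeholder for the real TAP query:

```python
import threading

_refcat_lock = threading.Lock()

def query_refcat_safe(query_func, *args, **kwargs):
    """Serialise reference-catalogue queries with a process-wide lock.

    `query_func` stands in for the (possibly non-thread-safe) TAP query;
    the lock guarantees only one query runs at a time.
    """
    with _refcat_lock:
        return query_func(*args, **kwargs)

calls = []

def fake_query(name):
    calls.append(name)
    return f"result-{name}"

out = query_refcat_safe(fake_query, "gaia")
```

This trades concurrency for safety; if the lock becomes a bottleneck, the docker service could instead queue requests and run them on a single worker thread.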
Currently, dates are ingested into the mongodb in a format unrecognised by pymongo / mongo. This means we have to implement date queries ourselves. This can (and should) be avoided by parsing the date correctly from the FITS header during metadata extraction by NiFi.
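pymongo stores native `datetime` objects as BSON dates, which makes server-side date queries work. A minimal parsing sketch, assuming the `'...(UTC)'` suffix format seen in the `taiObs` strings in the logs above (the real extraction would happen in NiFi):

```python
from datetime import datetime, timezone

def parse_date_obs(date_str):
    """Parse a FITS-style date like '2020-09-15T12:16:05.502(UTC)' into a
    timezone-aware datetime, which pymongo stores as a queryable BSON date.
    """
    date_str = date_str.replace("(UTC)", "")
    return datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )

dt = parse_date_obs("2020-09-15T12:16:05.502(UTC)")
```

Once dates are stored this way, range queries like `{"taiObs": {"$gte": start, "$lt": end}}` work directly in mongo without any custom date handling.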
Tests are slow. We should think about how to make them faster without reducing their efficacy.
metah should have minimal tools; the DRP should have the tools.
At the moment we have to query for specific values, but in the future we might want to query by a range of exposure times, or even something like airmass/zeropoint etc.
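Range queries map directly onto mongodb's `$gte`/`$lte` operators, so a small helper could build them for any numeric field. A sketch (field names like `expTime` assumed from the metadata shown earlier; the helper itself is hypothetical):

```python
def build_range_query(field, minimum=None, maximum=None):
    """Build a mongodb range-query document for a numeric metadata field,
    e.g. exposure times between 30 s and 120 s inclusive.
    """
    conditions = {}
    if minimum is not None:
        conditions["$gte"] = minimum
    if maximum is not None:
        conditions["$lte"] = maximum
    return {field: conditions}

query = build_range_query("expTime", minimum=30, maximum=120)
# Passed straight to collection.find(query) in pymongo.
```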
We need a process to screen data for common data failures (vignetting / truncated readouts etc.). It should complete a report on each file and add it to a data quality table in the metadata database.
Update this issue with any data quality issues that should be screened for.
Idea so far: MissingCalibError, MissingBiasError, etc.
Figure out why LSST calibs take so long to make
Currently, new files that raise errors during archiving do not get added to the raw file data table - they may even get lost forever depending on how NiFi handles the situation. We should catch and log all errors during this step, and add a flag to the raw data table indicating whether archiving was successful.
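One way to guarantee a record exists even when archiving fails is to wrap the archive step and always write the flag. A hedged sketch where `archive_func` and the list-backed `table` stand in for the real ingestor and mongodb collection:

```python
import logging

logger = logging.getLogger("ingestor")

def ingest_file(filename, archive_func, table):
    """Archive one file, recording success or failure in the raw data table.

    `archive_func` and `table` are placeholders for the real pipeline
    objects; the key point is that the record is written either way.
    """
    record = {"filename": filename}
    try:
        archive_func(filename)
        record["archive_success"] = True
    except Exception:
        logger.exception("Archiving failed for %s", filename)
        record["archive_success"] = False
    table.append(record)  # file is never silently lost
    return record

table = []

def bad_archive(filename):
    raise IOError("disk full")

rec = ingest_file("x.fits.fz", bad_archive, table)
```

A follow-up service could then periodically query for `archive_success: false` entries and retry them.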
Currently the screener service will calculate metrics for any file that fails the screen_success()
check. When we add a new metric, screen_success()
will also be updated to check the new metric; if an entry does not have the new metric, it will fail the screen-success check and have a new set of metrics calculated.
If a metric function gets updated, there needs to be a way to determine that the entries in the database contain old metric values and need to be recalculated. One way of doing this is to include the date of screening with each metric value, which screen_success
would then use to determine whether the metric was calculated with the most recent version of the metric function... so we somehow need to record when metric functions were last updated?
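An alternative to comparing dates is to store an explicit version number with each metric value and bump it whenever the metric function changes. A sketch of one possible design (the schema and names are assumptions, not the pipeline's actual layout):

```python
# Bumped by hand whenever the corresponding metric function changes.
METRIC_VERSIONS = {"clipped_stats": 2, "flipped_asymmetry": 1}

def needs_rescreen(entry):
    """Return True if any metric is missing or was computed with an
    out-of-date version of its metric function.

    `entry` maps metric name -> {"value": ..., "version": ...};
    this schema is illustrative only.
    """
    for name, current in METRIC_VERSIONS.items():
        stored = entry.get(name)
        if stored is None or stored.get("version", 0) < current:
            return True
    return False

stale = needs_rescreen({"clipped_stats": {"value": 0.15, "version": 1}})
fresh = needs_rescreen({
    "clipped_stats": {"value": 0.15, "version": 2},
    "flipped_asymmetry": {"value": 40, "version": 1},
})
```

Version numbers avoid the ambiguity of dates (a redeploy does not change when a function last changed) at the cost of remembering to bump them, which could be enforced in code review.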
Lots of files are not readable or fail during metric calculation - need to figure out what to do with these files
Implement an improved way of getting calib metadata in huntsman.drp.butler.query_calib_metadata
that actually uses the butler, rather than directly querying the calib repository, if this is even possible. Maybe @fergusL knows?
Update: I asked the stack club on slack, but no responses.
It will be easier to code and maintain the database if they are kept in different tables.
These are only as accurate as our pointing (so not accurate at all) and should be based on the WCS fit in the FITS header instead.
This will also fix some errors during calexp creation where these keys are missing from the FITS header.