huntsman-drp's People

Contributors

danjampro, fergusl, lspitler

huntsman-drp's Issues

Remove duplicates from RawExposureTable

It appears there are some duplicates in the raw exposure table causing problems during calexp production:

e.g.:

2021-04-29 01:46:42.144 | ERROR    | huntsman.drp.services.base:_wrap_process_func:58 - Exception while processing {'dateObs': '2020-09-15', 'filename': '/data/nifi/huntsman_priv/images/fields/Frb200914/1919420013090900/20200915T120511/20200915T121504.fits.fz', 'visit': 20200915121605502, 'field': 'FRB200914', 'dataType': 'science', 'taiObs': '2020-09-15T12:16:05.502(UTC)', 'ccd': 8, 'expId': 'PAN000_1919420013090900_20200915T121504', 'ccdTemp': -0.5, 'expTime': 60.0, 'filter': 'g_band'}

RuntimeError('Multiple matches found for document in <huntsman.drp.collection.RawExposureCollection object at 0x7f80ba8b7b50>: {}.')

These need to be removed from the table and we need to figure out how they were inserted in the first place.
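
A minimal sketch of how the duplicates could be found and removed with a pymongo aggregation, assuming the raw exposure collection is an ordinary MongoDB collection in which filename should be unique (the connection details and collection names here are illustrative, not the real deployment):

from pymongo import MongoClient

# Illustrative connection; point this at the real raw exposure collection.
collection = MongoClient("localhost", 27017)["huntsman"]["raw_data"]

# Group documents by filename and flag any filename that appears more than once.
pipeline = [
    {"$group": {"_id": "$filename", "count": {"$sum": 1}, "ids": {"$push": "$_id"}}},
    {"$match": {"count": {"$gt": 1}}},
]

for dup in collection.aggregate(pipeline):
    # Keep the first document and delete the rest.
    for object_id in dup["ids"][1:]:
        collection.delete_one({"_id": object_id})
    print(f"Removed {dup['count'] - 1} duplicate(s) of {dup['_id']}")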

Second-stage screening metrics

The DRP will produce more metrics that we'll use for another screening phase (following #68) to identify only science-ready data for final processing.

These will be computed for each file if the metrics are not already in the database.

Some example metrics and plots are available here: https://dmtn-008.lsst.io/

Investigate SQLite threading error

The error below occurs frequently but appears harmless:

Exception ignored in: <function SqlRegistry.__del__ at 0x7f383996db90>
Traceback (most recent call last):
  File "/opt/lsst/software/stack/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_persistence/21.0.0+48431fa087/python/lsst/daf/persistence/registries.py", line 317, in __del__
    self.conn.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139878645864192 and this is thread id 139880111126336.
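
The message is a consequence of SQLite's thread affinity: a connection may only be used (and closed) from the thread that created it, and here the registry's __del__ runs in a different thread. A minimal illustration of the constraint, keeping one connection per thread with threading.local (generic Python, not huntsman-drp or LSST stack code):

import sqlite3
import threading

# One connection per thread avoids the "created in a thread" ProgrammingError.
_local = threading.local()

def get_connection(path="registry.sqlite3"):
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(path)
    return _local.conn

def worker():
    conn = get_connection()
    conn.execute("SELECT 1")
    conn.close()  # Closed in the same thread that created it.

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()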

Create .fz health check scripts

  • New script to drop duplicate .fits/.fits.fz files in the MongoDB raw data table (see the sketch after this list).
  • Keep the .fits files.
  • Script to insert a flag into the raw data table once a file has been certified by the script.
  • RawDataTable to automatically remove query results that haven't been certified.
  • Add the janitor script as a Docker service to the main docker-compose file.
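
A minimal sketch of the duplicate-dropping part of the janitor script, assuming the raw data table is a MongoDB collection with a filename field (connection details and collection names are illustrative):

from pymongo import MongoClient

collection = MongoClient("localhost", 27017)["huntsman"]["raw_data"]

# For every .fits.fz entry, drop it if an uncompressed .fits counterpart exists,
# so that the .fits file is the one that is kept.
for doc in collection.find({"filename": {"$regex": r"\.fits\.fz$"}}):
    fits_name = doc["filename"][: -len(".fz")]
    if collection.count_documents({"filename": fits_name}) > 0:
        collection.delete_one({"_id": doc["_id"]})
        print(f"Dropped compressed duplicate: {doc['filename']}")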

Need script to regenerate metadata table

Currently we have some old metadata entries that are inconsistent with the current metadata parsing procedures, or are just erroneous (flats being recorded with datatype science, etc.). We will also need this functionality to repopulate the metadata table when/if changes are made down the line.
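
A minimal sketch of what the regeneration script could look like, assuming the metadata is re-parsed from the FITS headers with astropy and upserted back into a MongoDB collection keyed on filename; parse_header is a stand-in for whatever the current parsing procedure is, and the header keywords are assumptions:

import glob

from astropy.io import fits
from pymongo import MongoClient

collection = MongoClient("localhost", 27017)["huntsman"]["raw_data"]

def parse_header(header):
    # Stand-in for the current metadata parsing procedure.
    return {"dataType": header.get("IMAGETYP", "unknown"),
            "expTime": header.get("EXPTIME"),
            "filter": header.get("FILTER")}

# Re-parse every archived file and upsert the fresh metadata.
for filename in glob.glob("/data/nifi/huntsman_priv/images/**/*.fits*", recursive=True):
    with fits.open(filename) as hdulist:
        metadata = parse_header(hdulist[0].header)
    collection.update_one({"filename": filename}, {"$set": metadata}, upsert=True)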

Set up a test suite for LSB pipelines

An environment or a standard set of data, plus monitoring of the data, that allows everyone to understand each subsystem (e.g. PSF wing characterisation) as well as the subsystems combined.

Implement version control

  • Add an update date for each metric in the quality control database.
  • Nested dictionary with "date" and "value" keys (see the sketch after this list).
  • Regular backup of the quality data table.
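
A minimal sketch of the proposed nested structure, in which each metric in the quality control database becomes a sub-document carrying both the value and the date it was (re)computed (field and metric names are illustrative):

from datetime import datetime, timezone

quality_doc = {
    "filename": "20200915T121504.fits.fz",
    "metrics": {
        "flip_asymm_h": {"value": 42.1, "date": datetime.now(timezone.utc)},
        "well_fullfrac": {"value": 0.15, "date": datetime.now(timezone.utc)},
    },
}

def update_metric(doc, name, value):
    # Overwrite a metric, recording when it was last updated.
    doc["metrics"][name] = {"value": value, "date": datetime.now(timezone.utc)}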

Simplify quality metric implementation

This is too complicated:

quality:
  raw:
    science:
      get_wcs:
        has_wcs: true
    flat:
      quality:
        rawexp:
          flipped_asymmetry:
            flip_asymm_h:
              less_than: 100
            flip_asymm_v:
              less_than: 60
          clipped_stats:
            well_fullfrac:
              greater_than: 0.13
              less_than: 0.19

We should remove the "rawexp" subheading and get rid of function names (e.g. clipped_stats), just keeping metric names.
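
A minimal sketch of what the simplified layout could look like once the "rawexp" subheading and the function names are gone, together with a tiny threshold check; the structure is a proposal, not the current implementation:

import yaml

# Proposed simplified layout: datatype -> metric name -> thresholds.
simplified = yaml.safe_load("""
quality:
  raw:
    science:
      has_wcs: true
    flat:
      flip_asymm_h:
        less_than: 100
      flip_asymm_v:
        less_than: 60
      well_fullfrac:
        greater_than: 0.13
        less_than: 0.19
""")

def passes(value, thresholds):
    # Check a single metric value against its configured thresholds.
    if "less_than" in thresholds and not value < thresholds["less_than"]:
        return False
    if "greater_than" in thresholds and not value > thresholds["greater_than"]:
        return False
    return True

flat_config = simplified["quality"]["raw"]["flat"]
print(passes(0.15, flat_config["well_fullfrac"]))  # True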

Store logs

We need to store logs from huntsman-drp and the LSST stack.
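
The log format in the first issue suggests huntsman-drp uses loguru; if so, persisting its logs could be as simple as adding a file sink, as in this sketch (the path and rotation policy are assumptions). Capturing the LSST stack's own logging would need a separate handler; this only covers the huntsman-drp side.

from loguru import logger

# Write all huntsman-drp logs to a rotating file in addition to stderr.
logger.add("/var/log/huntsman/drp.log", rotation="50 MB", retention="30 days",
           level="DEBUG", enqueue=True)

logger.info("Log sink configured")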

Sky subtraction

  • First pass: initial plane fit and subtraction; second pass: aggressive masking (see the sketch below).
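
A minimal sketch of one reading of the two-pass idea, using a least-squares plane fit with a simple sigma-clipped mask on the second pass (numpy only; the threshold is illustrative):

import numpy as np

def fit_plane(image, mask=None):
    # Least-squares fit of a plane a*x + b*y + c to the unmasked pixels.
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    if mask is None:
        mask = np.ones_like(image, dtype=bool)
    A = np.column_stack([xx[mask], yy[mask], np.ones(mask.sum())])
    coeffs, *_ = np.linalg.lstsq(A, image[mask], rcond=None)
    return coeffs[0] * xx + coeffs[1] * yy + coeffs[2]

def subtract_sky(image, sigma=2.0):
    # First pass: fit and subtract an initial plane using all pixels.
    residual = image - fit_plane(image)
    # Second pass: aggressively mask bright residuals (sources) and re-fit.
    mask = residual < sigma * np.std(residual)
    return image - fit_plane(image, mask=mask)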

Consider means for dwell stacks, medians for dither stacks

As @AnthonyHorton mentioned, due to changing ghost patterns across dithers, it might be better to reject subtle ghosts via a median combine. A mean might still make sense at each dither position (even if we randomise it a little), since it maximises the S/N of the source (and of the ghost, which we can then reject via the median).
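
A minimal numpy sketch of that scheme: mean-combine the exposures taken at each dither position, then median-combine across positions to reject the ghosts (array shapes are illustrative):

import numpy as np

def combine(stacks):
    # stacks: one array per dither position, each of shape (n_exposures, ny, nx).
    # Mean within a dither position maximises S/N of the source (and the ghost).
    per_dither = [np.mean(s, axis=0) for s in stacks]
    # Median across dither positions rejects ghosts that move between dithers.
    return np.median(np.stack(per_dither), axis=0)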

Screening docker service

Need to create a separate screening service that will process raw files and populate a quality table with some simple metrics/quality flags, including:
- WCS
- asymmetry score / vignetting flag
- median/mean/std etc.
- corrupt files / asymmetry
- out of focus / bad focus
- camera shutter failures
- truncated readout
- fpack failures

The service will require the following:
- astrometry.net
- huntsman-drp

Related to #65, #48, #56.

Make TAP refcat service

It seems the TAP refcat queries are not thread-safe. It would be good to have a refcat Docker server service that uses a lock to make this thread-safe.

First, it would be good to verify that the queries are in fact not thread-safe...
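
A minimal sketch of the lock idea, serialising access to the (possibly non-thread-safe) TAP query with a threading.Lock; query_refcat is a stand-in for the real query code, not an existing function:

import threading

_refcat_lock = threading.Lock()

def query_refcat(ra, dec, radius):
    # Stand-in for the real TAP reference catalogue query.
    raise NotImplementedError

def query_refcat_safe(ra, dec, radius):
    # Serialise access so concurrent callers cannot interleave TAP requests.
    with _refcat_lock:
        return query_refcat(ra, dec, radius)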

Let mongo handle date queries

Currently, dates are ingested into MongoDB in a format unrecognised by pymongo/mongo, which means we have to implement date queries ourselves. This can (and should) be avoided by parsing the date correctly from the FITS header during metadata extraction by NiFi.
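
If the dates were stored as real datetime objects, pymongo could handle the range queries natively; a minimal sketch (the taiObs field name follows the log message in the first issue, and the connection details are illustrative):

from datetime import datetime
from pymongo import MongoClient

collection = MongoClient("localhost", 27017)["huntsman"]["raw_data"]

# Stored as a datetime, the field can be queried with ordinary comparison operators.
collection.insert_one({"filename": "example.fits", "taiObs": datetime(2020, 9, 15, 12, 16, 5)})

cursor = collection.find({"taiObs": {"$gte": datetime(2020, 9, 15),
                                     "$lt": datetime(2020, 9, 16)}})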

Speed up tests

Tests are slow. We should think about how to make them faster without reducing their efficacy.

Data screening procedure

Need a process to screen data for common data failures (vignetting/truncated readouts etc.). It should produce a report on each file and add it to a data quality table in the metadata database.
Update this issue with any data quality issues that should be screened for.

  • out of focus/bad focus
  • zeropoint monitoring
  • dome vignetting
  • camera shutter failures
  • truncated readout
  • fpack failures

Add archive success flag

Currently, new files that raise errors during archiving do not get added to the raw file data table - they may even get lost forever depending on how NiFi handles the situation. We should catch and log all errors during this step, and add a flag to the raw data table indicating whether archiving was successful.
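
A minimal sketch of the flagging logic, assuming an archive_file function that may raise and a MongoDB raw data collection keyed on filename (both are stand-ins, not existing huntsman-drp code):

import logging

from pymongo import MongoClient

log = logging.getLogger(__name__)
collection = MongoClient("localhost", 27017)["huntsman"]["raw_data"]

def archive_file(filename):
    # Stand-in for the real archiving step.
    raise NotImplementedError

def archive_with_flag(filename):
    try:
        archive_file(filename)
        success = True
    except Exception:
        log.exception("Error archiving %s", filename)
        success = False
    # Record the outcome so failed files remain visible in the raw data table.
    collection.update_one({"filename": filename},
                          {"$set": {"archive_success": success}}, upsert=True)
    return success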

Make sure database metric values are up to date

Currently the screener service will calculate metrics for any file that fails the screen_success() check. When we add a new metric, screen_success() will also be updated to check the new metric; if an entry does not have the new metric, it will fail the check and have a new set of metrics calculated.

If a metric function gets updated, there needs to be a way to determine that the entries in the database contain old metric values and need to be recalculated. One way of doing this is to include the date of screening with each metric value, which screen_success() could then use to determine whether the metric was calculated with the most recent version of the metric function... so we somehow need to record when metric functions were last updated?
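
One way to implement the date-of-screening idea, sketched below: store the computation date alongside each metric value and compare it with a per-metric "last updated" date kept in code. The METRIC_UPDATED registry and the functions here are hypothetical, not part of huntsman-drp:

from datetime import datetime, timezone

# Hypothetical registry: when each metric function was last changed.
METRIC_UPDATED = {
    "has_wcs": datetime(2021, 3, 1, tzinfo=timezone.utc),
    "well_fullfrac": datetime(2021, 4, 15, tzinfo=timezone.utc),
}

def metric_is_current(entry, name):
    # entry[name] is expected to look like {"value": ..., "date": datetime}.
    metric = entry.get(name)
    if metric is None:
        return False  # Metric missing entirely: needs to be calculated.
    return metric["date"] >= METRIC_UPDATED[name]

def screen_success(entry):
    # Recalculate whenever any metric is missing or older than its function.
    return all(metric_is_current(entry, name) for name in METRIC_UPDATED)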

Screener logs not working

For some reason the screener logs are not working - see screenshot. Logs are not written to file either.
(Screenshot attached to the original issue, 2021-04-01 15:33:33.)

What to do with dud files?

Lots of files are not readable or fail during metric calculation - we need to figure out what to do with these files.

Figure out the right way to query calib metadata using butler

Implement an improved way of getting calib metadata in huntsman.drp.butler.query_calib_metadata that actually uses the butler, rather than directly querying the calib repository (if that is even possible). Maybe @fergusL knows?

Update: I asked the Stack Club on Slack, but got no responses.
