astrohuntsman / huntsman-drp
Imaging pipeline tools and flow.
License: MIT License
This is not necessary:
huntsman-drp/src/huntsman/drp/ingestor.py
Line 298 in 4fc544b
It appears there are some duplicates in the raw exposure table causing problems during calexp production:
e.g.:
2021-04-29 01:46:42.144 | ERROR | huntsman.drp.services.base:_wrap_process_func:58 - Exception while processing {'dateObs': '2020-09-15', 'filename': '/data/nifi/huntsman_priv/images/fields/Frb200914/1919420013090900/20200915T120511/20200915T121504.fits.fz', 'visit': 20200915121605502, 'field': 'FRB200914', 'dataType': 'science', 'taiObs': '2020-09-15T12:16:05.502(UTC)', 'ccd': 8, 'expId': 'PAN000_1919420013090900_20200915T121504', 'ccdTemp': -0.5, 'expTime': 60.0, 'filter': 'g_band'}
RuntimeError('Multiple matches found for document in <huntsman.drp.collection.RawExposureCollection object at 0x7f80ba8b7b50>: {}.')
These need to be removed from the table and we need to figure out how they were inserted in the first place.
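As a starting point for the cleanup, duplicates can be found by counting occurrences of whatever field is supposed to uniquely identify a raw exposure. The sketch below is a minimal, hedged illustration over plain dicts, assuming `filename` is the unique key; the real collection lives in mongodb, where the equivalent would be a `$group`/`$match` aggregation.

```python
from collections import Counter

def find_duplicate_keys(documents, key="filename"):
    """Return key values that appear in more than one document.

    `documents` is an iterable of metadata dicts; `key` is the field
    assumed to uniquely identify a raw exposure.
    """
    counts = Counter(doc[key] for doc in documents if key in doc)
    return sorted(k for k, n in counts.items() if n > 1)

# Synthetic example documents (not real pipeline metadata).
docs = [
    {"filename": "a.fits.fz", "expId": "1"},
    {"filename": "b.fits.fz", "expId": "2"},
    {"filename": "a.fits.fz", "expId": "1"},  # accidental re-insert
]
duplicates = find_duplicate_keys(docs)
```

In mongodb the same check could be done server-side with an aggregation grouping on the key and matching `count > 1`, which avoids pulling the whole table into memory.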
The DRP will produce more metrics that we'll use for another screening phase (following #68) to identify only science-ready data for final processing.
These will be computed for each file if the metrics are not already in the database.
Some example metrics and plots are available here: https://dmtn-008.lsst.io/
The error below occurs frequently but appears harmless:
Exception ignored in: <function SqlRegistry.__del__ at 0x7f383996db90>
Traceback (most recent call last):
File "/opt/lsst/software/stack/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_persistence/21.0.0+48431fa087/python/lsst/daf/persistence/registries.py", line 317, in __del__
self.conn.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139878645864192 and this is thread id 139880111126336.
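The `ProgrammingError` arises because `sqlite3` connection objects may only be used from the thread that created them. This is not a fix for the stack's `SqlRegistry`, just a minimal sketch of the constraint and one standard workaround: keeping one connection per thread via `threading.local`.

```python
import sqlite3
import threading

_local = threading.local()

def get_connection(db_path=":memory:"):
    """Return a per-thread sqlite3 connection.

    sqlite3 objects must be used in the thread that created them, so we
    lazily open one connection per thread instead of sharing a single one.
    """
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(db_path)
    return _local.conn

results = []

def worker():
    conn = get_connection()
    results.append(conn.execute("SELECT 1").fetchone()[0])

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since the error is raised from `__del__` during garbage collection, it is indeed likely harmless here, but a per-thread (or explicitly closed) connection would silence it.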
.fits / .fits.fz files in the mongodb raw data table.
RawDataTable: automatically remove query results that haven't been certified.
docker-compose file.
We currently have some old metadata entries that are inconsistent with the current metadata parsing procedures, or that are simply erroneous (flats being recorded as datatype science, etc.). We will also need this functionality to repopulate the metadata table when/if changes are made down the line.
The LSST stack seems to be taking ages (>10 min) per file. Perhaps we need to limit the number of sources used to make the PSF models.
An environment or a standard set of data, plus monitoring of that data, that allows everyone to understand each subsystem (e.g. PSF wings characterisation) as well as how they combine.
We should move our python requirements, which are currently hard-coded in the dockerfiles, into a requirements.txt.
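A minimal sketch of the change (the package names are placeholders, not the project's actual dependency list):

```dockerfile
# Before: dependencies hard-coded in the Dockerfile, e.g.
# RUN pip install numpy astropy pymongo

# After: keep them in a requirements.txt next to the Dockerfile
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```

This also lets developers install the same pinned versions locally with `pip install -r requirements.txt`, without rebuilding the image.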
This is too complicated:
quality:
  raw:
    science:
      get_wcs:
        has_wcs: true
    flat:
      quality:
        rawexp:
          flipped_asymmetry:
            flip_asymm_h:
              less_than: 100
            flip_asymm_v:
              less_than: 60
          clipped_stats:
            well_fullfrac:
              greater_than: 0.13
              less_than: 0.19
We should remove the "rawexp" subheading and get rid of function names (e.g. clipped_stats), just keeping metric names.
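One possible flattened layout, keeping only metric names under each data type (a sketch of the proposal, assuming the metric names and thresholds from the snippet above; not the final schema):

```yaml
quality:
  raw:
    science:
      has_wcs: true
    flat:
      flip_asymm_h:
        less_than: 100
      flip_asymm_v:
        less_than: 60
      well_fullfrac:
        greater_than: 0.13
        less_than: 0.19
```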
We need to store logs from huntsman-drp and the LSST stack.
And then:
As @AnthonyHorton mentioned, due to changing ghost patterns across dithers, it might be better to reject subtle ghosts via a median combine. A mean might still make sense at a dither position (even if we randomise it a little), since it maximises S/N of the source (and ghost, so we can then reject it via median).
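The reasoning above can be seen numerically: a ghost that appears in only one (or a minority) of the dithered frames survives a mean combine but is rejected by a median. A minimal synthetic illustration, assuming numpy and toy pixel values:

```python
import numpy as np

# Five dithered "exposures" of the same 2x2 sky patch, nominal level 100.
frames = np.full((5, 2, 2), 100.0)
frames[2, 0, 0] = 1000.0  # a bright ghost present only in frame 2

mean_combine = frames.mean(axis=0)      # ghost leaks into the stack
median_combine = np.median(frames, axis=0)  # ghost rejected
```

The mean-combined pixel containing the ghost comes out at 280 instead of 100, while the median stays at 100, which is the motivation for median-combining across dithers even though the mean maximises S/N when no ghost is present.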
Currently no logs are produced during metadata acquisition. This makes it difficult to know how the script is doing.
Code:
https://github.com/AstroHuntsman/huntsman-drp/blob/develop/scripts/quality/get_raw_quality.py
Need to create a separate screening service that will process raw files and populate a quality table with some simple metrics/quality flags, including:
- wcs
- asymmetry score / vignetting flag
- median/mean/std etc.
- corrupt files / asymmetry
- out of focus / bad focus
- camera shutter failures
- truncated readout
- fpack failures
The service will require the following:
- astrometry.net
- huntsman-drp
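The simple statistical metrics in the list above could be produced by something like the sketch below. The function name, flag names, and saturation threshold are illustrative assumptions, not the pipeline's actual schema; it only shows the shape of a per-file quality document.

```python
import statistics

def screen_exposure(pixels, saturation=65535):
    """Compute simple screening metrics for one exposure.

    `pixels` is a flat list of pixel values; the metric names are
    placeholders for whatever the quality table ends up storing.
    """
    return {
        "median": statistics.median(pixels),
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),
        # Fraction of saturated pixels: a cheap shutter/readout sanity flag.
        "saturated_frac": sum(p >= saturation for p in pixels) / len(pixels),
    }

doc = screen_exposure([10, 10, 10, 65535])
```

Each resulting dict would then be inserted into the quality table keyed by filename, alongside the WCS and asymmetry flags.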
It seems the TAP refcat queries are not thread-safe. It would be good to have a refcat docker server service that uses a lock to make this thread-safe.
First, it would be good to verify that the queries are actually not thread-safe...
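If the queries do turn out not to be thread-safe, the server-side fix could be as simple as serialising them behind a process-wide lock. A hedged sketch, where `query_func` is a placeholder for the real TAP query:

```python
import threading

_refcat_lock = threading.Lock()

def query_refcat_safe(query_func, *args, **kwargs):
    """Serialise reference-catalogue queries with a process-wide lock.

    `query_func` stands in for the (possibly non-thread-safe) TAP query;
    the lock guarantees only one query runs at a time.
    """
    with _refcat_lock:
        return query_func(*args, **kwargs)

calls = []

def fake_query(name):
    calls.append(name)
    return f"result-{name}"

out = query_refcat_safe(fake_query, "gaia")
```

This trades concurrency for safety; if the lock becomes a bottleneck, the docker service could instead queue requests and run them on a single worker thread.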
Currently, dates are ingested into the mongodb in a format unrecognised by pymongo / mongo. This means we have to implement date queries ourselves. This can (and should) be avoided by parsing the date correctly from the FITS header during metadata extraction by NiFi.
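pymongo stores native `datetime` objects as BSON dates, which makes server-side date queries work. A minimal parsing sketch, assuming the `'...(UTC)'` suffix format seen in the `taiObs` strings in the logs above (the real extraction would happen in NiFi):

```python
from datetime import datetime, timezone

def parse_date_obs(date_str):
    """Parse a FITS-style date like '2020-09-15T12:16:05.502(UTC)' into a
    timezone-aware datetime, which pymongo stores as a queryable BSON date.
    """
    date_str = date_str.replace("(UTC)", "")
    return datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )

dt = parse_date_obs("2020-09-15T12:16:05.502(UTC)")
```

Once dates are stored this way, range queries like `{"taiObs": {"$gte": start, "$lt": end}}` work directly in mongo without any custom date handling.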
Tests are slow. We should think about how to make them faster without reducing their efficacy.
metah should have minimal tools; the DRP should have the tools.
At the moment we have to query for specific values, but in the future we might want to query by a range of exposure times, or even something like airmass/zeropoint etc.
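Range queries map directly onto mongodb's `$gte`/`$lte` operators, so a small helper could build them for any numeric field. A sketch (field names like `expTime` assumed from the metadata shown earlier; the helper itself is hypothetical):

```python
def build_range_query(field, minimum=None, maximum=None):
    """Build a mongodb range-query document for a numeric metadata field,
    e.g. exposure times between 30 s and 120 s inclusive.
    """
    conditions = {}
    if minimum is not None:
        conditions["$gte"] = minimum
    if maximum is not None:
        conditions["$lte"] = maximum
    return {field: conditions}

query = build_range_query("expTime", minimum=30, maximum=120)
# Passed straight to collection.find(query) in pymongo.
```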
We need a process to screen data for common data failures (vignetting / truncated readouts etc.). It should complete a report on each file and add it to a data quality table in the metadata database.
Update this issue with any data quality issues that should be screened for.
Idea so far: MissingCalibError, MissingBiasError, etc.
Figure out why LSST calibs take so long to make
Currently, new files that raise errors during archiving do not get added to the raw file data table - they may even get lost forever depending on how NiFi handles the situation. We should catch and log all errors during this step, and add a flag to the raw data table indicating whether archiving was successful.
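One way to guarantee a record exists even when archiving fails is to wrap the archive step and always write the flag. A hedged sketch where `archive_func` and the list-backed `table` stand in for the real ingestor and mongodb collection:

```python
import logging

logger = logging.getLogger("ingestor")

def ingest_file(filename, archive_func, table):
    """Archive one file, recording success or failure in the raw data table.

    `archive_func` and `table` are placeholders for the real pipeline
    objects; the key point is that the record is written either way.
    """
    record = {"filename": filename}
    try:
        archive_func(filename)
        record["archive_success"] = True
    except Exception:
        logger.exception("Archiving failed for %s", filename)
        record["archive_success"] = False
    table.append(record)  # file is never silently lost
    return record

table = []

def bad_archive(filename):
    raise IOError("disk full")

rec = ingest_file("x.fits.fz", bad_archive, table)
```

A follow-up service could then periodically query for `archive_success: false` entries and retry them.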
Currently the screener service will calculate metrics for any file that fails the screen_success()
check. When we add a new metric, screen_success()
will also be updated to check the new metric; if an entry does not have the new metric, it will fail the screen-success check and have a new set of metrics calculated.
If a metric function gets updated, there needs to be a way to determine that the entries in the database contain old metric values and need to be recalculated. One way of doing this is to include the date of screening with each metric value, which screen_success
would then use to determine whether the metric was calculated with the most recent version of the metric function... so we somehow need to record when metric functions were last updated?
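An alternative to comparing dates is to store an explicit version number with each metric value and bump it whenever the metric function changes. A sketch of one possible design (the schema and names are assumptions, not the pipeline's actual layout):

```python
# Bumped by hand whenever the corresponding metric function changes.
METRIC_VERSIONS = {"clipped_stats": 2, "flipped_asymmetry": 1}

def needs_rescreen(entry):
    """Return True if any metric is missing or was computed with an
    out-of-date version of its metric function.

    `entry` maps metric name -> {"value": ..., "version": ...};
    this schema is illustrative only.
    """
    for name, current in METRIC_VERSIONS.items():
        stored = entry.get(name)
        if stored is None or stored.get("version", 0) < current:
            return True
    return False

stale = needs_rescreen({"clipped_stats": {"value": 0.15, "version": 1}})
fresh = needs_rescreen({
    "clipped_stats": {"value": 0.15, "version": 2},
    "flipped_asymmetry": {"value": 40, "version": 1},
})
```

Version numbers avoid the ambiguity of dates (a redeploy does not change when a function last changed) at the cost of remembering to bump them, which could be enforced in code review.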
Lots of files are not readable or fail during metric calculation - need to figure out what to do with these files
Implement an improved way of getting calib metadata in huntsman.drp.butler.query_calib_metadata
that actually uses the butler, rather than directly querying the calib repository, if this is even possible. Maybe @fergusL knows?
Update: I asked the stack club on slack, but no responses.
It will be easier to code and maintain the database if they are kept in different tables.
These are only as accurate as our pointing (so not accurate at all) and should be based on the WCS fit in the FITS header instead.
This will also fix some errors during calexp creation where these keys are missing from the FITS header.