
mirar's People

Contributors

broulston, dependabot[bot], github-actions[bot], jamiesoon, robertdstein, saarahhall, sukishore12, virajkaram

mirar's Issues

Missing requirements

Extra python requirements were introduced in #39, but these are not yet reflected in requirements.txt/setup.py.

At a minimum, confluent_kafka is now required; I haven't investigated further yet.
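
As a minimal sketch, the fix could look something like this in setup.py (the layout and lack of a version pin are assumptions; the real constraint should match whatever #39 introduced):

    # setup.py (excerpt): hypothetical addition, pin as appropriate
    from setuptools import setup, find_packages

    setup(
        name="winterdrp",
        packages=find_packages(),
        install_requires=[
            # ... existing requirements ...
            "confluent-kafka",  # PyPI package providing the confluent_kafka module
        ],
    )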

Image not being closed properly in zogy

/Users/robertstein/Code/winterdrp/winterdrp/processors/zogy/py_zogy.py:53: ResourceWarning: unclosed file <_io.FileIO name='/Users/robertstein/Data/summer/20220815/subtract/SUMMER_20220816_042349_Camera0.resamp.resamp.fits.scaled' mode='rb' closefd=True>
  N = fits.open(Nf)[0].data
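
A context manager would avoid the leaked file handle; a minimal sketch of the fix (Nf as in the snippet above):

    # py_zogy.py sketch: open the file in a context manager so it is always closed
    from astropy.io import fits

    with fits.open(Nf) as hdul:
        N = hdul[0].data.copy()  # copy the array so it remains valid after the file closes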

Nan/zero in swarp

To quote @virajkaram:

"Swarp sets sets masked pixels to zero when it resamples, but the other processors only masks nans. This affects the subtractions"

Right now we mask zeros when loading raw images. Maybe we should make this self-contained and handle it within the Swarp processor instead.
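
One self-contained option would be to convert the zeros back to NaNs immediately after resampling, inside the Swarp processor itself; a minimal sketch (the function name and call site are illustrative):

    import numpy as np
    from astropy.io import fits

    def remask_swarp_zeros(resampled_path: str) -> None:
        """Replace the exact zeros Swarp writes for masked pixels with NaNs."""
        with fits.open(resampled_path, mode="update") as hdul:
            data = hdul[0].data
            data[data == 0.0] = np.nan  # restore the NaN masking convention used elsewhere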

Merge imsub back into wirc

The BasePipeline is already set up to enable different running modes, which can be selected via the command line. So everything in imsub should eventually be merged back into the wirc directory/pipeline.

Check quality of pipeline

Right now there is a unit test, so we check that the code is consistent up to photometric calibration. We do not yet check whether the reduction is consistently good or consistently bad.

Systematic Error Handling

Following standard Python practice:

All errors should be raised and then handled, rather than relying on passing around processing status numbers etc.

We need to systematically raise errors, handle them, and then be able to summarise them. In production, you would want a nightly email summary tracking which images were/were not successfully processed.

So far this has been partially addressed by #47 and #48, but there are still missing pieces.

Ideally, all errors would be raised by the code itself. In any case, we should track which errors were not raised by the code, so we can prioritise fixing them, leaving aside errors related to e.g. image issues which are understood and unavoidable.
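
As a sketch of the direction (the class and function names here are hypothetical, not the actual #47/#48 implementation): a base exception that known failure modes subclass, caught per image and collected for a nightly summary.

    # hypothetical sketch, not the actual implementation
    class ProcessorError(Exception):
        """Base class for errors deliberately raised by the pipeline."""


    def process_image(image, process_fn, error_log: list) -> None:
        """Run process_fn(image), recording known vs unexpected errors for a nightly summary."""
        try:
            process_fn(image)
        except ProcessorError as err:
            error_log.append((image, err, True))   # known error, raised by the code itself
        except Exception as err:
            error_log.append((image, err, False))  # unexpected error: prioritise fixing these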

Reorganise Database Processor

We want to reorganise the Database Processors into:

  • DBHandler
  • BaseDBImporter
  • BaseDBExporter
  • ImageDBImporter
  • ImageDBExporter
  • DataframeDBImporter
  • DataframeDBExporter

This will be needed for much downstream functionality, including Reference Image generation and Candidate naming.

This should be done on the db branch.
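
A rough scaffolding sketch of how that split might look (the inheritance shown is an assumption; the real db-branch classes may differ):

    # scaffolding only; the inheritance structure is assumed
    class DBHandler:
        """Shared connection and credential handling."""


    class BaseDBImporter(DBHandler):
        """Pulls rows from a database table into the pipeline."""


    class BaseDBExporter(DBHandler):
        """Writes pipeline products into a database table."""


    class ImageDBImporter(BaseDBImporter):
        """Imports image metadata."""


    class ImageDBExporter(BaseDBExporter):
        """Exports image metadata."""


    class DataframeDBImporter(BaseDBImporter):
        """Imports candidate dataframes."""


    class DataframeDBExporter(BaseDBExporter):
        """Exports candidate dataframes."""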

Test database

Much of the database creation is not tested. Let's change that!

Processing candidates to Fritz

If Fritz is down, we run into the issue that candidates cannot be processed/annotations cannot be updated. To ensure this doesn't cause the pipeline to break:

  • Once the candidates table is in the database, add an is_fritz_processed field. Query everything that hasn't been submitted to Fritz and feed it to the SendToFritz processor
  • If Fritz is down, set the is_fritz_processed field to False for that candidate, wait 30 seconds (Fritz goes down in roughly 30 second stretches), then continue
  • Successful Fritz processing/updating sets the is_fritz_processed field to True
  • Query is_fritz_processed again for False entries after going through all candidates
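
A rough sketch of that retry loop (the helper functions stand in for real database and Fritz calls):

    import time

    def submit_unprocessed_candidates(get_unprocessed, send_to_fritz, mark_processed):
        """Sketch of the retry loop; the three callables are placeholders."""
        for candidate in get_unprocessed():          # WHERE is_fritz_processed = false
            try:
                send_to_fritz(candidate)             # SendToFritz processor call
                mark_processed(candidate, True)      # set is_fritz_processed = True
            except Exception:
                mark_processed(candidate, False)     # leave it for the next pass
                time.sleep(30)                       # wait out a ~30 second Fritz outage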

Set up unit tests for summer

We now have a framework for running unit tests using data from a private Github repo. We can now adopt the mantra "test everything" without worrying about making data public (though that's my strong preference where possible). Anyway, we can actually run the summer unit tests with the CI as well as WIRC, so we should set that up.

Similar to #38, and similarly blocked by #22.

Slack Reports

It would be nice to let the monitor send slack reports...
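
For example, a minimal sketch using a Slack incoming webhook (the webhook URL and message content are placeholders):

    import requests

    def send_slack_report(webhook_url: str, text: str) -> None:
        """Post a short report to a Slack incoming webhook."""
        response = requests.post(webhook_url, json={"text": text}, timeout=10)
        response.raise_for_status()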

Integrate with Github Actions

We want to run the tests on Github Actions. However, we first need a way to get test data.

We could:

  • Get permission for publishing some limited set of test data (I vote for this in the spirit of open source)
  • Set up some download function, perhaps with secret github keys, to download the data

We also need to install sextractor etc. on Github, which may or may not be possible. Otherwise, Docker? This needs investigation.
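
For the download-function option above, a minimal sketch reading a token from the environment (the secret name, URL, and auth scheme are placeholders):

    import os
    import requests

    def download_test_data(url: str, destination: str) -> None:
        """Fetch test data using a token stored as a CI secret (sketch only)."""
        token = os.environ["TEST_DATA_TOKEN"]  # hypothetical secret name
        response = requests.get(
            url, headers={"Authorization": f"token {token}"}, timeout=60
        )
        response.raise_for_status()
        with open(destination, "wb") as f:
            f.write(response.content)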

Wiki Page Development

  • Include a broad overview of pipeline functioning (batching, etc.)
  • An overview of the processors
  • How to run and create unit tests
  • #19

Add magdifflim field to df

magdifflim is a field used by photometry creation (sections that need it are currently being skipped). Once added, the make_photometry method in SendToFritz.py can be updated.

Update SUMMER test ZP values

The Summer data reduction pipeline currently uses a 30 arcminute radius to query catalog sources for astrometry and photometric calibration. The field of view of the camera is ~15 arcmin on a side, so we really only need 7.5 arcmin radius searches, which makes the queries faster. This changes the zeropoints slightly (by 0.001), but results in the CI tests failing.

Set up unit tests with WIRC

We got the green light for setting up unit tests with WIRC data. This would involve making public the minimum number of WIRC images needed for testing the pipeline. We should identify which images are needed:

  • A target where we have an image to use as a reference for subtraction
  • A set of flats/darks
  • A block of dithers for a WIRC stack
  • Ideally use published data -> Select old target

Topic datetime for data processing

When sending avro packets to IPAC, we currently send the packets to a topic name based on the current UTC time. This allows for testing and, down the line, for reprocessing of data.

We need to review whether this is the right decision for the topic naming.
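
A minimal sketch of the scheme as described (the topic prefix and timestamp format are assumptions):

    from datetime import datetime, timezone

    def make_topic_name(prefix: str = "winter") -> str:
        """Build a per-run topic name from the current UTC time."""
        return f"{prefix}_{datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')}"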

Unit tests broken

The unit tests which run successfully on main do NOT run on the imsub branch. We should not merge until we understand the discrepancy.

pycache files added to repo

Looks like a bunch of random pycache files that should be untracked were added to the repo with PR #84. Is it okay to delete them?

ImageRejector

I want a processor that works like an anti-ImageSelector. It should remove images if they have header keys matching particular values. I'll use it to eliminate focus images, AND to select images which have not been processed/entered into a database before.
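
A simplified sketch of what that could look like (the interface and batch structure here are assumptions, not the real winterdrp processor API):

    # simplified sketch; not the real winterdrp processor API
    class ImageRejector:
        """Drop images whose header value for a given key matches any listed value."""

        def __init__(self, *targets: tuple):
            self.targets = targets  # e.g. ("OBSTYPE", ["focus"])

        def apply(self, images: list, headers: list):
            keep = [
                i for i, header in enumerate(headers)
                if not any(header.get(key) in values for key, values in self.targets)
            ]
            return [images[i] for i in keep], [headers[i] for i in keep]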

Weird Problem with the Sextractor Module

When trying to reduce data, I am running into a weird error:

Error for processor winterdrp.processors.astromatic.sextractor.sextractor at 2022-08-30 12:10:28.784147 (local time): 
   File "/Users/robertstein/Code/winterdrp/winterdrp/processors/base_processor.py", line 127, in base_apply
    batch = self.apply(batch)
  File "/Users/robertstein/Code/winterdrp/winterdrp/processors/base_processor.py", line 199, in apply
    images, headers = self._apply_to_images(images, headers)
  File "/Users/robertstein/Code/winterdrp/winterdrp/processors/astromatic/sextractor/sextractor.py", line 179, in _apply_to_images
    header[sextractor_checkimg_keys[checkimg_type]] = checkimage_name[ind]
KeyError: 'NONE' 
  This error affected the following files: ['SUMMER_20220824_204552_Camera0.resamp.fits'] 
This error was not a known error raised by winterdrp. 

Beyond the typos, why would the code be trying to get an entry marked "None"?

Hardcoded paths need to be replaced

I think there are some paths which were hard-coded relative to the winterdrp directory, rather than as absolute paths. The upshot of this is that you can only run the code if you are in the winterdrp directory, rather than in any directory as expected.

The solution is to replace any relative path with one referencing the absolute path of the code directory, which can be obtained with e.g. pathlib or os.path.abspath(__file__).

The specific error I get is:

FileNotFoundError: [Errno 2] No such file or directory: 'winterdrp/pipelines/wirc_imsub/wirc_imsub_files/schema/candidates.sql'

but I suspect there are others.
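
A minimal sketch of the pathlib approach (the module layout is assumed):

    from pathlib import Path

    # absolute base directory of the winterdrp package, independent of the working directory
    # (assumes this snippet lives in a module inside the winterdrp package)
    winter_code_dir = Path(__file__).parent.resolve()

    candidates_schema = (
        winter_code_dir / "pipelines/wirc_imsub/wirc_imsub_files/schema/candidates.sql"
    )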

Less verbose log

After successfully running the code on a live night of ~300 images, we have logs of 4.6 MB. That feels too big to send as an email attachment every day (admittedly it was in debug mode). We should consider whether to reduce some of the text output (e.g. those jumbo astroquery tables).
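
One option would be to cap the verbosity of the chattiest third-party loggers while keeping our own at DEBUG; a minimal sketch:

    import logging

    # keep winterdrp at DEBUG, but silence jumbo third-party output (e.g. astroquery tables)
    logging.getLogger("winterdrp").setLevel(logging.DEBUG)
    logging.getLogger("astroquery").setLevel(logging.WARNING)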

Update docs

Should include:

  • Add install instructions
  • Explain unit tests
  • Links to contributing
  • Requirements
  • Author list

Database Duplicate

We need to add different options for handling the entry of duplicated data into a database. The likely options are fail, replace, or skip, with the default being fail.
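
For Postgres, the three behaviours map naturally onto the insert statement; a hedged sketch of how an exporter might build it (the function and column names are illustrative):

    # illustrative sketch of mapping a duplicate policy onto Postgres INSERT clauses
    def duplicate_clause(policy: str, key_column: str, columns: list) -> str:
        if policy == "fail":
            return ""  # plain INSERT: a duplicate key raises an error
        if policy == "skip":
            return f"ON CONFLICT ({key_column}) DO NOTHING"
        if policy == "replace":
            updates = ", ".join(f"{col} = EXCLUDED.{col}" for col in columns)
            return f"ON CONFLICT ({key_column}) DO UPDATE SET {updates}"
        raise ValueError(f"Unknown duplicate policy: {policy}")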

Create DatabaseQueryProcessor

We need a DB query processor that queries a database table and returns a dataframe. It would be used before the SendToFritz processor, for candidates that still need to be processed by Fritz (is_fritz_processed). #41
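
A minimal sketch of the query step with pandas (the connection handling, table, and column names are placeholders):

    import pandas as pd

    def query_unprocessed_candidates(connection) -> pd.DataFrame:
        """Return candidates not yet sent to Fritz as a dataframe."""
        return pd.read_sql(
            "SELECT * FROM candidates WHERE is_fritz_processed = false",
            connection,
        )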

Name creation for candidates

We need to query the database for a candidate name; if it does not exist, sequentially assign an id/name based on a predefined naming scheme. If it does exist, update it and/or include it in the prev_cand field.

Requires #16
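
One common scheme is a year prefix plus a sequential base-26 letter suffix; a hedged sketch (the prefix and format are assumptions, not a decided convention):

    def candidate_suffix(count: int, length: int = 3) -> str:
        """Map a running integer to letters: 0 -> 'aaa', 1 -> 'aab', 25 -> 'aaz', 26 -> 'aba'."""
        letters = ""
        while count > 0:
            letters = chr(ord("a") + count % 26) + letters
            count //= 26
        return letters.rjust(length, "a")


    def candidate_name(year: int, count: int, prefix: str = "WNTR") -> str:
        """e.g. candidate_name(2022, 0) -> 'WNTR22aaa'; prefix and format are assumptions."""
        return f"{prefix}{year % 100:02d}{candidate_suffix(count)}"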

MultiProcess Processor

I would love to have a MultiProcess Processor, which would split batches into different python processes. Basically, you could take 8 batches, make the flats first, then run each batch on N different CPUs for a factor-N speedup.
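
A minimal sketch of the idea with the standard library (the per-batch function stands in for the real processor chain):

    from multiprocessing import Pool

    def process_batches_in_parallel(batches: list, process_batch, n_cpu: int = 8):
        """Run a per-batch processing function across N worker processes."""
        with Pool(processes=n_cpu) as pool:
            return pool.map(process_batch, batches)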
