
kbmod's Issues

Investigate Access of Pixel Data

There are a few options for accessing the actual pixel data:

  • Leave the images on disk and access them outside the database
  • Ingest the images as BLOBs
  • Ingest each pixel as a row in the database

The last option might be problematic. There are ~10^12 pixels, so ingest (plus index building and clustering) could take a substantial amount of time (CPU-years). A parallel database might help here.
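A back-of-the-envelope sketch of the ingest cost follows. The 2e4 rows/s load rate is the figure quoted in the "Reassess Database Structure" issue. The perfectly linear speed-up across parallel loaders is an optimistic assumption, and index building and clustering are not counted.

```python
SECONDS_PER_DAY = 86_400


def ingest_days(n_rows, rows_per_second, n_parallel=1):
    """Wall-clock days to load n_rows.

    Assumes ideal linear scaling across n_parallel independent loaders
    (optimistic) and ignores index building and clustering.
    """
    if rows_per_second <= 0 or n_parallel < 1:
        raise ValueError("rate must be positive and n_parallel >= 1")
    return n_rows / (rows_per_second * n_parallel) / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1e12 pixels, one row per pixel, at the quoted 2e4 rows/s.
    for n in (1, 10, 100):
        print(f"{n:>4} loaders: {ingest_days(1e12, 2e4, n):8.1f} days")
```

A single loader needs 5e7 seconds, or roughly 580 days. That is why a parallel database, or not ingesting pixels at all, is worth considering.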

I will start this investigation by looking at how to turn the (ra, decl) of a Trajectory/Image intersection into a pixel address within the image. This can be done in one of two places:

  • In the database using our UDF
  • Outside the database using the WCS transformation that comes with the .fits file
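The sketch below shows the math that either route has to perform. It assumes the image's WCS is a plain gnomonic (TAN) projection described only by the CRVAL, CRPIX and CD keywords, with no SIP or other distortion terms. Real survey headers may carry distortion terms; astropy's `WCS.all_world2pix` handles those. The function names are illustrative and not part of kbmod.

```python
import math


def world_to_pixel(ra_deg, dec_deg, crval, crpix, cd):
    """TAN projection: sky (ra, dec) in degrees -> 1-based FITS pixel (x, y).

    crval = (CRVAL1, CRVAL2) in degrees, crpix = (CRPIX1, CRPIX2),
    cd = ((CD1_1, CD1_2), (CD2_1, CD2_2)) in degrees/pixel.
    Assumes no distortion terms.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(crval[0]), math.radians(crval[1])
    dra = ra - ra0
    cos_c = math.sin(dec0) * math.sin(dec) + math.cos(dec0) * math.cos(dec) * math.cos(dra)
    if cos_c <= 0:
        raise ValueError("position is 90 degrees or more from the tangent point")
    # Standard coordinates (xi, eta) on the tangent plane, in degrees.
    xi = math.degrees(math.cos(dec) * math.sin(dra) / cos_c)
    eta = math.degrees((math.cos(dec0) * math.sin(dec)
                        - math.sin(dec0) * math.cos(dec) * math.cos(dra)) / cos_c)
    # Invert the 2x2 CD matrix: intermediate world coords -> pixel offsets.
    (a, b), (c, d) = cd
    det = a * d - b * c
    dx = (d * xi - b * eta) / det
    dy = (-c * xi + a * eta) / det
    return crpix[0] + dx, crpix[1] + dy


def pixel_to_world(x, y, crval, crpix, cd):
    """Inverse of world_to_pixel: 1-based pixel (x, y) -> (ra, dec) in degrees."""
    (a, b), (c, d) = cd
    dx, dy = x - crpix[0], y - crpix[1]
    xi = math.radians(a * dx + b * dy)
    eta = math.radians(c * dx + d * dy)
    dec0 = math.radians(crval[1])
    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = math.radians(crval[0]) + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0), math.hypot(xi, denom))
    return math.degrees(ra) % 360.0, math.degrees(dec)
```

The same few trigonometric lines are what a compiled UDF, or a rewrite in a procedural SQL language, would need to reproduce per image. The outside-the-database route would instead read CRVAL/CRPIX/CD from the .fits header.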

PostgreSQL implementation of the UDF for radec, image -> pixel

If we use RDS:

  • RDS does not let us use compiled code.
  • We would therefore have to rewrite the UDF in PL/pgSQL (PostgreSQL's procedural language; PL/SQL is Oracle's).

Actually, we probably want to kill the RDS instance and set up a self-managed PostgreSQL on an EC2 VM instead. That means we will need to:

  1. Move from RDS to EC2
    • Kill the RDS instance
    • Select an EC2 instance type that will work with PostgreSQL
    • Set up the VM and the database, including PostGIS.
  2. Get a version of the UDF that compiles. @bholt looked at the code and thinks we may be able to extract the core routines from WCSLIB, then get that version working with PostgreSQL.

Reassess Database Structure

We'd like to re-think the structure of the database tables.

In this version (v2 of the tables) we have an ImageSet table whose rows correspond to sets of images that are spatially and temporally connected. For SDSS this will be a (run, camcol, filter) strip; for LSST, e.g., it would be a focal-plane exposure. For each ImageSet we'll build a coarse bounding box of its total spatial extent and a matching time window. These will be used in a coarse initial pruning step: which ImageSets does a given trajectory intersect on a given night?
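The coarse pruning step could look roughly like the sketch below. Several things here are assumptions for illustration and are not part of the v2 design:

  • The `ImageSet` and `Trajectory` classes and their field names are hypothetical.
  • Motion is taken to be linear in RA/Dec, with no cos(dec) correction.
  • Boxes are assumed not to wrap around RA = 0/360.

```python
from dataclasses import dataclass


@dataclass
class ImageSet:
    """Hypothetical ImageSet row: coarse sky bbox (degrees) plus time window (MJD)."""
    name: str
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float
    t_start: float
    t_end: float


@dataclass
class Trajectory:
    """Hypothetical linear trajectory: position at t0 plus rates in deg/day."""
    ra0: float
    dec0: float
    t0: float
    v_ra: float
    v_dec: float

    def position(self, t):
        dt = t - self.t0
        return self.ra0 + self.v_ra * dt, self.dec0 + self.v_dec * dt


def candidate_image_sets(traj, image_sets):
    """Keep ImageSets whose bbox overlaps the box swept by the trajectory
    during that ImageSet's time window.

    With linear motion the swept path is a segment, so testing the segment's
    bounding box never drops a true hit; it can keep false positives, which
    the finer per-Image step is meant to remove.
    """
    hits = []
    for s in image_sets:
        ra_a, dec_a = traj.position(s.t_start)
        ra_b, dec_b = traj.position(s.t_end)
        overlaps = (min(ra_a, ra_b) <= s.ra_max and max(ra_a, ra_b) >= s.ra_min
                    and min(dec_a, dec_b) <= s.dec_max and max(dec_a, dec_b) >= s.dec_min)
        if overlaps:
            hits.append(s)
    return hits
```

The test is deliberately conservative. It only has to be cheap and free of false negatives, because exact intersection is checked later against the per-Image boxes.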

We'll have a second table, Image, whose rows correspond to individual images identified by their position within the ImageSet: for SDSS this is the field, for LSST the CCD number. Each Image will also have its own bbox and trange, used to find where a given sky position intersects a specific pixel of silicon.
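A minimal sketch of the two-table layout and the two-level lookup follows. It uses SQLite through Python's standard `sqlite3` module only so the example is self-contained. The table and column names are hypothetical. On PostgreSQL/PostGIS one would more likely store the boxes as geometries with a spatial index rather than plain `BETWEEN` tests.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ImageSet (
    imageset_id INTEGER PRIMARY KEY,
    label TEXT,                 -- e.g. an SDSS (run, camcol, filter) strip
    ra_min REAL, ra_max REAL, dec_min REAL, dec_max REAL,
    t_start REAL, t_end REAL
);
CREATE TABLE Image (
    image_id INTEGER PRIMARY KEY,
    imageset_id INTEGER REFERENCES ImageSet(imageset_id),
    position INTEGER,           -- SDSS field or LSST CCD number
    ra_min REAL, ra_max REAL, dec_min REAL, dec_max REAL,
    t_start REAL, t_end REAL
);
"""

# Coarse ImageSet bbox/time window first, then the finer per-Image bbox/trange.
LOOKUP = """
SELECT i.image_id, i.imageset_id, i.position
FROM ImageSet s JOIN Image i ON i.imageset_id = s.imageset_id
WHERE :ra BETWEEN s.ra_min AND s.ra_max
  AND :dec BETWEEN s.dec_min AND s.dec_max
  AND :t BETWEEN s.t_start AND s.t_end
  AND :ra BETWEEN i.ra_min AND i.ra_max
  AND :dec BETWEEN i.dec_min AND i.dec_max
  AND :t BETWEEN i.t_start AND i.t_end
"""


def find_images(conn, ra, dec, t):
    """Return (image_id, imageset_id, position) rows covering (ra, dec) at time t."""
    return conn.execute(LOOKUP, {"ra": ra, "dec": dec, "t": t}).fetchall()
```

Once an Image row is found, the (ra, dec) → pixel step only has to consult that one image's WCS.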

It's an open question whether we ingest the pixels into their own Pixel table or leave them on disk as FITS files. If the latter, we will have to ensure efficient access to them, i.e., know which bytes of a file hold the value of a given pixel. If the former, ingest is the problem. Loading 1e12 pixels into a database at 2e4 pixels per second takes 5e7 seconds, or ~600 days, and building a covering index would take about as long. We could partition the pixel tables on, e.g., ImageSet or Image, but we would then need N filesystems for parallel ingest, where N is the degree to which we fragment the currently monolithic v1 Pixel table. And we would still have to merge these tables back together.
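For the leave-on-disk option, finding the bytes of a pixel is simple arithmetic in the uncompressed case, as the sketch below shows. A FITS header is a run of 80-character cards padded to 2880-byte blocks. The data that follows is stored big-endian with NAXIS1 varying fastest, so a single `seek` reaches any pixel.

The sketch assumes an uncompressed 2-D image in the primary HDU. It ignores tile-compressed files and images stored in extensions. `read_pixel` and its helpers are illustrative names, not an existing API; astropy's `fits.open(..., memmap=True)` is the off-the-shelf alternative. `x` and `y` are 0-based, so subtract 1 from 1-based FITS pixel coordinates.

```python
import struct

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters

# BITPIX -> struct format code; FITS data is big-endian.
BITPIX_FORMAT = {8: "B", 16: "h", 32: "i", 64: "q", -32: "f", -64: "d"}


def read_primary_header(f):
    """Parse the primary header; return (keywords, header_length_in_bytes)."""
    keywords = {}
    nbytes = 0
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        nbytes += BLOCK
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            if card[:8].strip() == "END":
                # Data starts at the next block boundary, i.e. right after this block.
                return keywords, nbytes
            if card[8:10] == "= ":
                # Good enough for the numeric keywords used below; string values ignored.
                keywords[card[:8].strip()] = card[10:].split("/")[0].strip()


def read_pixel(path, x, y):
    """Read one pixel (0-based column x, row y) by seeking directly to it."""
    with open(path, "rb") as f:
        hdr, header_len = read_primary_header(f)
        if int(hdr["NAXIS"]) != 2:
            raise ValueError("expected a 2-D primary image")
        bitpix = int(hdr["BITPIX"])
        naxis1, naxis2 = int(hdr["NAXIS1"]), int(hdr["NAXIS2"])
        if not (0 <= x < naxis1 and 0 <= y < naxis2):
            raise IndexError("pixel outside image")
        width = abs(bitpix) // 8
        # Rows of NAXIS1 pixels are stored one after another.
        f.seek(header_len + (y * naxis1 + x) * width)
        (raw,) = struct.unpack(">" + BITPIX_FORMAT[bitpix], f.read(width))
        # Apply the optional linear scaling to physical values.
        return float(hdr.get("BZERO", 0.0)) + float(hdr.get("BSCALE", 1.0)) * raw
```

Because the offset depends only on the header length, BITPIX and NAXIS1, these three values could be stored per Image in the database. A pixel lookup would then cost one seek and one small read, without touching the header.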

Speeding up UDF query

Brandon and I have come up with a couple of ways to speed up the UDF query. This issue will capture that work.
