oin-meta-generator's Introduction

Metadata Generator for OIN

The Open Imagery Network standardizes on a single format for imagery. This small CLI tool generates a JSON string of the standard metadata for a given OIN geo image.

Dependencies

You must have GDAL installed.

Installation

  • $ npm install

Usage

Usage: oin-meta-generator [args] <file>

Options:
  -u, --uuid                 Source UUID
  -t, --title                Source title
  -a, --acquisition-start    Acquisition start date
  -A, --acquisition-end      Acquisition end date
  -p, --provider             Provider / owner
  -P, --platform             Imagery platform (satellite, aircraft, UAV, etc.)
  -c, --contact              Data provider contact info
  -U, --uploaded-at          Date uploaded
  -m, --additional-metadata  Additional metadata (sensor=WV3, etc.)
  --help, -h                 Show help                                 [boolean]
  --version, -V              Show version number                       [boolean]

Sample:

$ oin-meta-generator \
  -u "http://oam-uploader.s3.amazonaws.com/uploads/2015-08-18/55d3b052f885a1bb0221434b/scene/0/scene-0-image-0-NE1_50M_SR.tif" \
   -t "Natural Earth Image" \
   -a "2015-04-01T00:00:00.000Z" \
   -A "2015-04-30T00:00:00.000Z" \
   --platform "satellite" \
   --provider "Natural Earth" \
   -c "Ziggy,[email protected]" \
   -m "sensor=Some Algorithm" \
   -m "thumbnail=http://oam-uploader.s3.amazonaws.com/uploads/2015-08-18/55d3b052f885a1bb0221434b/scene/0/scene-0-image-0-NE1_50M_SR.tif.thumb.png" \
   -m "license=CC-BY 4.0" \
   -m "tags=tropical, paradise" \
    NE1_50M_SR.tif | jq .
{
  "uuid": "http://oam-uploader.s3.amazonaws.com/uploads/2015-08-18/55d3b052f885a1bb0221434b/scene/0/scene-0-image-0-NE1_50M_SR.tif",
  "title": "Natural Earth Image",
  "platform": "satellite",
  "provider": "Natural Earth",
  "contact": "Ziggy,[email protected]",
  "properties": {
    "sensor": "Some Algorithm",
    "thumbnail": "http://oam-uploader.s3.amazonaws.com/uploads/2015-08-18/55d3b052f885a1bb0221434b/scene/0/scene-0-image-0-NE1_50M_SR.tif.thumb.png",
    "license": "CC-BY 4.0",
    "tags": "tropical, paradise"
  },
  "acquisition_start": "2015-04-01T00:00:00.000Z",
  "acquisition_end": "2015-04-30T00:00:00.000Z",
  "file_size": 1149210,
  "projection": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4326\"]]",
  "gsd": 0.03333333333333333,
  "bbox": [
    128.99999999999997,
    29.000000000000004,
    146,
    54
  ],
  "footprint": "POLYGON((128.99999999999997 54,146 54,146 29.000000000000004,128.99999999999997 29.000000000000004,128.99999999999997 54))"
}
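As the sample suggests, each repeated -m "key=value" argument ends up as one entry under "properties". A minimal sketch of that mapping follows. The helper name parseAdditionalMetadata is hypothetical and this is not necessarily how the tool itself parses its arguments.

```javascript
// Hypothetical helper (not taken from the tool's source): turn one or more
// "key=value" strings, as passed via -m, into a single properties object.
function parseAdditionalMetadata(pairs) {
  const properties = {};
  // Accept a single string or an array of strings.
  for (const pair of [].concat(pairs || [])) {
    const idx = pair.indexOf('=');
    if (idx === -1) continue; // skip entries without a "=" separator
    const key = pair.slice(0, idx).trim();
    // Split on the first "=" only, so values that contain "=" are kept whole.
    properties[key] = pair.slice(idx + 1);
  }
  return properties;
}

console.log(parseAdditionalMetadata(['sensor=Some Algorithm', 'license=CC-BY 4.0']));
```

Splitting on the first "=" only is a deliberate choice here, since thumbnail URLs and similar values may themselves contain that character.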

Testing

Run npm test

oin-meta-generator's People

Contributors

jflasher, mojodna, smit1678, olafveerman, tombh, lossyrob

oin-meta-generator's Issues

Publish to npm

oam-dynamic-tiler depends on this and currently uses a GitHub reference. It would be great to refer to a version and pull from npm instead.

Metadata generation doesn't respect nested S3 folders

The generated metadata files in meta drop any folder prefix information they may have had on S3. This is a problem when multiple folders contain imagery with the same filenames (since the last one to be written wins).
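One way to avoid the collision is to carry the image's folder prefix into the metadata key. The sketch below assumes a hypothetical naming scheme (a meta/ root plus a _meta.json suffix) and a hypothetical helper name, metaKeyFor; the tool's actual layout may differ.

```javascript
const path = require('node:path');

// Hypothetical naming scheme: <metaRoot>/<original prefix>/<name>_meta.json
// Keeping the prefix means two images with the same filename in different
// folders no longer map to the same metadata key.
function metaKeyFor(imageKey, metaRoot = 'meta') {
  // S3 keys always use "/" as the separator, so use the POSIX flavour of path.
  const { dir, name } = path.posix.parse(imageKey);
  return path.posix.join(metaRoot, dir, `${name}_meta.json`);
}

console.log(metaKeyFor('2015/a/scene.tif'));
console.log(metaKeyFor('2015/b/scene.tif'));
```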

Seems to be some magic wall at 548

I've been running this a bunch of times now and seem to be running into a magic wall at 548 files. As soon as it tries to process that many, I get the error below. Could the s3 module have some limit internally causing this?

/Users/flasher/workspace/oin-meta-generator/node_modules/s3/node_modules/aws-sdk/lib/request.js:32
          throw err;
                ^
TypeError: Cannot read property 'addListener' of undefined
    at Object.exports.execFile (child_process.js:781:15)
    at Object.exports.exec (child_process.js:642:18)
    at Object.module.exports.remote (/Users/flasher/workspace/oin-meta-generator/node_modules/gdalinfo-json/index.js:86:17)
    at generateMeta (/Users/flasher/workspace/oin-meta-generator/index.js:112:12)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:65:7)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:74:5)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:74:5)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:74:5)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:74:5)
    at iterator (/Users/flasher/workspace/oin-meta-generator/index.js:74:5)

AWS_SECRET_KEY_ID vs. AWS_ACCESS_KEY_ID

AWS_ACCESS_KEY_ID appears to be the de facto environment variable used for this purpose (and may already be set in some people's environments). AWS_SECRET_KEY_ID is used here instead.
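A backwards-compatible fix could prefer the conventional name and fall back to the one currently used. This is only a sketch; resolveAccessKeyId is a hypothetical helper, not part of the tool.

```javascript
// Hypothetical helper: prefer the conventional AWS_ACCESS_KEY_ID and fall back
// to the AWS_SECRET_KEY_ID name this tool currently reads.
function resolveAccessKeyId(env = process.env) {
  // Returns undefined when neither variable is set, leaving the caller free
  // to decide whether that is an error.
  return env.AWS_ACCESS_KEY_ID || env.AWS_SECRET_KEY_ID || undefined;
}

console.log(resolveAccessKeyId({ AWS_SECRET_KEY_ID: 'legacy-value' }));
```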

bbox and footprint in meters

An error popped up after processing new imagery on production. Metadata is generated fine (example), but when the catalog indexes the new image, MongoDB throws an error:

message: 'exception: Can\'t extract geo keys: { _id: ObjectId(\'58fffebeb0eae7f3b143c1a8\'), uuid: "http://oin-hotosm.s3.amazonaws.com/58fff92190a38300103c37c8/0/0cb3651f-d796-4e6a-ab58-ecaf12569d0b.tif", geojson: { bbox: [ 318567.4056300001, 7842361.890440002, 318843.0565999987, 7842607.837750001 ], coordinates: [ [ [ 318567.40563, 7842607.83775 ], [ 318843.056599999, 7842607.83775 ], [ 318843.056599999, 7842361.89044 ], [ 318567.40563, 7842361.89044 ], [ 318567.40563, 7842607.83775 ] ] ], type: "Polygon" }, meta_uri: "http://oin-hotosm.s3.amazonaws.com/58fff92190a38300103c37c8/0/0cb3651f-d796-4e6a-ab58-ecaf12569d0b_meta.json", footprint: "POLYGON ((318567.40563 7842607.83775,318843.056599999 7842607.83775,318843.056599999 7842361.89044,318567.40563 7842361.89044,318567.40563 7842607.83775))", bbox: [ 318567.4056300001, 7842361.890440002, 318843.0565999987, 7842607.837750001 ], gsd: 0.02512999999987835, projection: "PROJCS["WGS 84 / UTM zone 59S",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","63...", file_size: 24653540, uploaded_at: new Date(1493170469325), acquisition_end: new Date(1428188820000), acquisition_start: new Date(1428185220000), properties: { license: "CC-BY 4.0", sensor: "Unknown", wmts: "http://tiles.openaerialmap.org/58fff92190a38300103c37c8/0/0cb3651f-d796-4e6a-ab58-ecaf12569d0b/wmts", tms: "http://tiles.openaerialmap.org/58fff92190a38300103c37c8/0/0cb3651f-d796-4e6a-ab58-ecaf12569d0b/{z}/{x}/{y}.png", thumbnail: "http://oin-hotosm.s3.amazonaws.com/58fff92190a38300103c37c8/0/0cb3651f-d796-4e6a-ab58-ecaf12569d0b_thumb.png" }, contact: "Keiko Saito,[email protected]", provider: "Government of Vanuatu", platform: "uav", title: "Tanna Letobom Ortho 05042015_1007", __v: 0 } 
 longitude/latitude is out of bounds, lng: 318567 lat: 7.84261e+06',

It seems that for these images the bbox and footprint are expressed in meters (the image is in a projected UTM coordinate system), whereas MongoDB expects geographic coordinates: longitude and latitude.

cc @tombh
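Until the generator reprojects such images, a cheap guard could reject a bbox that cannot be longitude/latitude before it reaches the catalog. The sketch below is an assumption about how such a check might look; isLonLatBbox is a hypothetical name, and it only validates ranges without reprojecting anything.

```javascript
// Hypothetical guard: true only if bbox is [minX, minY, maxX, maxY] with
// values inside the valid longitude/latitude ranges.
function isLonLatBbox(bbox) {
  if (!Array.isArray(bbox) || bbox.length !== 4 || !bbox.every(Number.isFinite)) {
    return false;
  }
  const [minX, minY, maxX, maxY] = bbox;
  const lonOk = (x) => x >= -180 && x <= 180;
  const latOk = (y) => y >= -90 && y <= 90;
  return lonOk(minX) && lonOk(maxX) && latOk(minY) && latOk(maxY);
}

// The bbox from the README sample (degrees).
console.log(isLonLatBbox([128.99999999999997, 29.000000000000004, 146, 54]));
// The bbox from the error above (UTM zone 59S, meters).
console.log(isLonLatBbox([318567.4056300001, 7842361.890440002, 318843.0565999987, 7842607.837750001]));
```

A range check cannot catch every projected bbox (small projected coordinates would pass), so it complements rather than replaces reading the projection itself.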

Overly aggressive dependency on AWS credentials in one's environment

The AWS SDK supports multiple credential sources and transparently uses them. This means that if I have $HOME/.aws/credentials or $HOME/.aws-credentials-master configured, it will pull values from them if none are available in the environment.

This check should be less aggressive: https://github.com/openimagerynetwork/oin-meta-generator/blob/master/index.js#L20-L26

(I use require-env for this purpose, and assign values to constants with env.require("WHATEVS") early on to get the same crash-first behavior.)

Standalone app or part of upload form?

Wondering if this is meant as a standalone app or rather as part of an upload Web form?

From what I understand of the scenario described, the image data already exists in some bucket but is missing its metadata JSON files. This tool would allow extracting basic gdalinfo metadata plus letting the user add any OIN-specific values.

Footprint format

Until now we've been using a simplified expression of footprint that provides little more than what bbox already offers. @mojodna has already implemented a more comprehensive format in oam-dynamic-tiler's tiler-prep stage. As he describes, it is:

non-rectangular and shows the image's footprint after it's been masked out, with nodata and otherwise [...] this samples at 1% (?) to keep the GeoJSON size down, since the end result is stair-stepped no matter what (because it's polygonizing pixels)

This is clearly a better solution. However, we need to formalise both the necessity and optimal format. Therefore:

  1. What purpose does the footprint serve? Does it serve any purposes beyond oam-browser's use? Can the purposes be served by the bounding box or a centroid? Or perhaps having a detailed footprint means that a bounding box field is superfluous?
  2. Currently the footprint payload is non-trivially large: for example, for a 325 MB image, this footprint is 24 KB. Of course, this can be reduced by down-sampling, simplifying polygons and using topojson. However, we need to define certain expectations. Can a single OIN metadata payload function reasonably as intended if it is, say, over 5 KB? If not, can the footprint be useful with a limit of, say, 3 KB? If not, should the footprint be kept separately?

Follows on from discussion in #26.
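For reference, the simplified footprint in the README sample is just the bbox corners written as a closed WKT ring, which is why it adds so little. The sketch below reproduces that output; bboxToWkt is a hypothetical name, and the corner order is inferred from the sample rather than from the tool's source.

```javascript
// Hypothetical helper: express a [minX, minY, maxX, maxY] bbox as a WKT
// polygon, in the corner order seen in the README sample (start top-left,
// go clockwise, close the ring).
function bboxToWkt([minX, minY, maxX, maxY]) {
  const ring = [
    [minX, maxY],
    [maxX, maxY],
    [maxX, minY],
    [minX, minY],
    [minX, maxY], // repeat the first point to close the ring
  ];
  return `POLYGON((${ring.map(([x, y]) => `${x} ${y}`).join(',')}))`;
}

console.log(bboxToWkt([128.99999999999997, 29.000000000000004, 146, 54]));
```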

Configurable prefix

In #10, I started to explain the layout of my terrain buckets. In addition to the 4326/ prefix, I also have a 3857/ prefix that contains derived data (for tiling purposes), both reprojected and resampled per zoom. Since that's not intended directly for publishing purposes, I'd like to limit the prefix that the tool uses to list data to be processed.
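S3's list operations accept a Prefix parameter, so the restriction could be applied server-side. The sketch below shows the equivalent client-side filter over already-listed keys; keysUnderPrefix and the example keys are hypothetical, and the configuration option that would supply the prefix does not exist yet.

```javascript
// Hypothetical helper: keep only the keys that live under the configured
// prefix. An empty prefix keeps everything, matching the current behaviour.
function keysUnderPrefix(keys, prefix = '') {
  return keys.filter((key) => key.startsWith(prefix));
}

const keys = ['4326/srtm_01_02.tiff', '3857/10/srtm_01_02.tiff', 'source/srtm_01_02.zip'];
console.log(keysUnderPrefix(keys, '4326/'));
```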

Expected bucket layout?

I have a bucket containing a bunch of zipped TIFFs (DEMs, in this case), all contained within a source/ prefix. For testing purposes, one of them is uncompressed: 4326/srtm_01_02.tiff

When configuring oin-meta-generator to point at the bucket in question (s3://cgiar-csi-srtm.openterrain.org), it logs the following messages but doesn't do anything further (I'm using the default config.json with the assumption that I'll fill in correct values later):

$ npm start

> [email protected] start /Users/seth/src/openimagerynetwork/oin-meta-generator
> node index.js

Successfully connected to S3 bucket, now retrieving data.

What should I expect to see / have produced for me?
