locationtech / geowave

GeoWave provides geospatial and temporal indexing on top of Accumulo, HBase, BigTable, Cassandra, Kudu, Redis, RocksDB, and DynamoDB.

License: Apache License 2.0

Shell 0.82% Java 94.90% Puppet 0.07% Scheme 0.17% FreeMarker 0.02% Gnuplot 0.47% Python 3.43% Dockerfile 0.03% ANTLR 0.08%
Topics: geowave, java, accumulo, geospatial-data, hbase, geoserver, cassandra, dynamodb, kudu, redis

geowave's People

Contributors: akash-peri, andrewdmanning, ashish217, binderparty, blastarr, bmendell, carolyntang, chengyanz, chesleytan, chrisbennight, ckras34, dannyqiu, datasedai, dcy2003, emacthecav, ewilson-radblue, gsoyka, jdgarrett, jprochaz, jwileczek, jwomeara, mawhitby, mcarrier7, meislerj, rfecher, richard3d, rwgdrummer, scottevil, spohnan, srinivasreddyv2


geowave's Issues

Implement get feature by id index

Features should be able to be looked up directly by feature ID, i.e., a secondary index mapping feature IDs to row IDs.

Probably needed for
#16
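The mapping above can be sketched with a sorted table keyed by feature ID, so a lookup costs one seek into the secondary table followed by one seek into the primary index. This is a minimal stdlib sketch; the class and method names are illustrative, not GeoWave's actual API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a secondary index: a sorted table keyed by feature ID
// whose values are row IDs in the primary (spatial) index.
public class FeatureIdIndexSketch {
    // feature ID -> row ID in the primary index
    private final Map<String, String> featureIdToRowId = new TreeMap<>();

    public void put(String featureId, String rowId) {
        featureIdToRowId.put(featureId, rowId);
    }

    // Resolve a feature ID to its primary row ID, or null if absent.
    public String lookup(String featureId) {
        return featureIdToRowId.get(featureId);
    }

    public static void main(String[] args) {
        FeatureIdIndexSketch idx = new FeatureIdIndexSketch();
        idx.put("feature-42", "row-00af");
        System.out.println(idx.lookup("feature-42")); // prints row-00af
    }
}
```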

Create Travis build matrix to test multiple configurations

Specify multiple build configurations for Hadoop, Accumulo, GeoTools, and GeoServer versions, and have Travis run the different combinations.

Detail this build matrix in the README, and include status icons.

Geospatial benchmark utility

Develop a spatial (and possibly temporal) benchmark utility that is generally applicable to any system that stores, indexes, and retrieves spatial content. This will most likely be done by keeping it separate from GeoWave and utilizing GeoTools' data store abstraction.

Kernel Density process dies when encountering empty points

14/06/27 18:33:30 INFO mapred.JobClient: Task Id : attempt_201406170050_0010_m_000128_2, Status : FAILED
java.lang.IllegalStateException: getY called on empty Point
        at com.vividsolutions.jts.geom.Point.getY(Point.java:131)
        at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.incrementLevelStore(GaussianCellMapper.java:157)
        at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.map(GaussianCellMapper.java:144)
        at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.map(GaussianCellMapper.java:32)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)

additional support for gridded/raster datasets

  • create a data adapter, index, and a data store specifically to handle raster (generally gridded) datasets
    • the data store and index should be able to support a reduced-resolution pyramid and the ability to query for a particular resolution
    • the adapter should be able to support different tile sizes for persistence, but the data store should just provide query by geometric bounding box and resolution, not by tile directly

3D Geometries are stored as 2D

The geometryToBinary function in the GeometryUtils class uses the default constructor for the WKBWriter. This defaults to an output coordinate dimension of 2, so any z values will be ignored.

Zookeeper connection pool / fault handling for geoserver plugin

The current GeoServer plugin dies (Tomcat needs to be restarted) if the ZooKeeper cluster goes down and is then restarted (due to persistent ZooKeeper connections).

Some sort of connection pooling might also need to be enabled, as there is a limit to the number of simultaneous ZooKeeper connections we want (due to internal collection synchronization).

Ensure uzaygezen library usage respects end of range as exclusive

Currently, GeoWave's usage of the uzaygezen library expects both the beginning and end of the resultant range to be inclusive, but on further inspection the end of the range is exclusive. This can potentially result in one extra row ID, which will be filtered out by the fine-grained constraints filtering, but it should still be fixed for accurate usage.
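The normalization described above amounts to decrementing the exclusive end before building the row-ID scan range. A minimal sketch, with illustrative names (not GeoWave's actual API):

```java
// Sketch of converting a range whose end is exclusive (as uzaygezen returns)
// into the inclusive form that GeoWave's row-ID scans expect.
public class RangeEndFix {
    // Convert [begin, endExclusive) to the last value actually covered.
    public static long toInclusiveEnd(long endExclusive) {
        return endExclusive - 1;
    }

    public static void main(String[] args) {
        // If uzaygezen reports [8, 16), the last row ID covered is 15.
        System.out.println(toInclusiveEnd(16)); // prints 15
    }
}
```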

NPE when the Coordinate method for a geometry returns null

Integration tests failing:

  • Issue is in mil.nga.giat.geowave.store.GeometryUtils::geometryToBinary
  • Coordinate will be null when the geometry is empty (no points).
  • In this case it doesn't really matter how we initialize the WKBWriter, as there will be no points to serialize, so just default the dimension to 2 when the geometry is empty.
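The fallback rule above can be isolated into a small pure function. A hedged sketch; the method name and signature are illustrative, not GeoWave's actual code:

```java
// Sketch of the dimension-selection logic: fall back to an output dimension
// of 2 when the geometry is empty (no coordinates to inspect), otherwise
// honor the geometry's coordinate dimension so z values survive serialization.
public class WkbDimensionSketch {
    public static int chooseOutputDimension(boolean geometryIsEmpty, int coordinateDimension) {
        // An empty geometry has no coordinate whose dimension we could read.
        return geometryIsEmpty ? 2 : coordinateDimension;
    }
}
```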

Accumulo namespace support

Add Accumulo namespace support where required (or validate and document whether a namespace.tablename prefix is needed).

Add locality group caching to address performance issues.

Checking the existence of a locality group is causing a noticeable slowdown during ingest. Logic needs to be added to cache locality groups so that we can avoid the overhead with Accumulo. Cached locality groups should be dropped after a specified period of time.
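The caching idea above can be sketched as a map from locality group name to the time it was last verified, with entries considered stale after a TTL so changes in Accumulo are eventually re-checked. Class and method names are illustrative assumptions:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a locality-group cache: remember which groups have been verified,
// and expire entries after a TTL so the Accumulo round trip is only paid
// occasionally rather than on every ingest.
public class LocalityGroupCache {
    private final long ttlMillis;
    private final ConcurrentHashMap<String, Long> verifiedAt = new ConcurrentHashMap<>();

    public LocalityGroupCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // True if this locality group was verified within the TTL; the caller
    // only checks Accumulo when this returns false.
    public boolean isFresh(String localityGroup, long nowMillis) {
        Long t = verifiedAt.get(localityGroup);
        return t != null && nowMillis - t <= ttlMillis;
    }

    public void markVerified(String localityGroup, long nowMillis) {
        verifiedAt.put(localityGroup, nowMillis);
    }
}
```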

Iterator classloader hack breaks with hdfs URI prefix

Caused by: java.net.MalformedURLException: invalid url: hdfs://:8020/accumulo/lib/geowave-gt-0.8.0-SNAPSHOT-accumulo-singlejar.jar!/ (java.net.MalformedURLException: unknown protocol: hdfs)
at java.net.URL.&lt;init&gt;(URL.java:619)
at java.net.URL.&lt;init&gt;(URL.java:482)
at java.net.URL.&lt;init&gt;(URL.java:431)
at mil.nga.giat.geowave.gt.query.CqlQueryFilterIterator.initClassLoader(CqlQueryFilterIterator.java:163)
at mil.nga.giat.geowave.gt.query.CqlQueryFilterIterator.init(CqlQueryFilterIterator.java:186)
at org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:243)

(This is where, in a static initializer, we create a new classloader instance with the jars and attach it to the parent VFS classloader. This hack is required to get SPI injection working in iterator stacks.)
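The failure happens because java.net.URL rejects any scheme with no registered stream handler ("unknown protocol: hdfs"), while java.net.URI parses arbitrary schemes. A hedged sketch of one direction for a fix: inspect the scheme via URI first, and only hand the jar location to a URL-based classloader when the JVM understands the scheme (the policy method below is a hypothetical, not GeoWave's actual code):

```java
import java.net.URI;

// URL.new("hdfs://...") throws MalformedURLException; URI parses any scheme.
// Sketch: classify the location before constructing a java.net.URL for it.
public class HdfsUriSketch {
    public static String scheme(String location) {
        return URI.create(location).getScheme();
    }

    // Hypothetical policy: anything that isn't file/http(s) should be
    // resolved through the Hadoop FileSystem API rather than java.net.URL.
    public static boolean needsHadoopResolution(String location) {
        String s = scheme(location);
        return !("file".equals(s) || "http".equals(s) || "https".equals(s));
    }
}
```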

Create GeoGig backend

GeoGit is a DVCS for geospatial data. It adds the ability to track provenance and history, and to perform diffs on different data sets.
Ultimately it would be ideal to store and track this information for any feature stored in GeoWave; as a first step, implementing a GeoGit backend on Accumulo (leveraging GeoWave where possible) is desired.

GeoGit has a pluggable data store implementation: it looks like there are concepts of a GraphDatabase, an ObjectDatabase, and a StagingDatabase that need to be implemented. The MongoDB backend provides a good example of implementing all three (see [3]).

The GraphDB implementation can be done from scratch, but it looks like there are canned implementations that leverage the Blueprints API. There's a project ([4]) that implements the Blueprints API on Accumulo which may speed this up (the current state of that project, in terms of stability and quality, is unknown).

Note that further investigation is needed to determine to what extent GeoWave and GeoGit objects can be co-mingled. It would be ideal not to duplicate any data when not strictly needed. It might also be desirable to keep historical data (diffs, versions) in a separate table to keep "current state" queries quick. Working out the appropriate direction here would be done in conjunction with this task.

There are two groups on the GeoGit dev list working on an HBase object store (possibly migrating to something else, e.g. Ceph) as well as spatial indexing. Currently most of this seems pretty rudimentary, but it might be worth keeping an eye on.

[1] GeoGit: https://github.com/boundlessgeo/GeoGit
[2] DevDocs: https://github.com/boundlessgeo/GeoGit/blob/master/doc/technical/source/developers.rst
[3] MongoDB implementation: https://github.com/boundlessgeo/GeoGit/tree/master/src/storage/mongo
[4] Accumulo-Blueprints project: https://github.com/mikelieberman/blueprints-accumulo-graph
[5] Group working NoSQL object database (Hbase - now Ceph?): http://geogitobjdb.blogspot.com/
[6] GeoGit spatial index discussion: https://groups.google.com/a/boundlessgeo.com/forum/#!searchin/geogit/spatial$20index/geogit/9yVQAFL4n4I/VZDFrCsh3kgJ
[7] GeoGit discussion group: https://groups.google.com/a/boundlessgeo.com/forum/#!forum/geogit

Create web front end for easy upload of data

Create a web page / back-end service that exposes the vector file ingest, as well as other ingesters (GPX, etc.), to allow web-based submission of GIS data (think GeoJSON, shapefiles, etc.). The ingest service should be usable directly as well (i.e., it shouldn't require the web page).

This should also interact with the geoserver api to automate, or at least simplify, the publishing of data stores and layers.

Mark JAI as provided

JAI is getting bundled as a dependency, but when it isn't installed and the native libraries aren't found, it causes an exception (which prevents GeoServer from loading in some cases, depending on the plugins installed).

The libraries should already be on the classpath ($JRE_HOME/lib/ext), so there is no need to bundle them.
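In Maven terms, the fix is to mark the dependency with the provided scope so it is compiled against but excluded from the bundled artifact. A sketch only; the exact groupId/artifactId/version used by the actual POM are assumptions:

```xml
<dependency>
  <groupId>javax.media</groupId>
  <artifactId>jai_core</artifactId>
  <version>1.1.3</version>
  <!-- Supplied by $JRE_HOME/lib/ext at runtime; do not bundle it -->
  <scope>provided</scope>
</dependency>
```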

Basic Utility functions

Create a geowave-utils project with main methods for convenience functions:

  • set splits on a table
    • based on quantile distribution and fixed number of splits
    • based on equal interval distribution and fixed number of splits
    • based on fixed number of rows per split
  • get all geowave namespaces
  • set locality groups per column family (data adapter) or clear all locality groups
  • get # of entries per data adapter in an index
  • get # of entries per index
  • get # of entries per namespace
  • list adapters per namespace
  • list indices per namespace
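The first two split strategies above can be sketched over a sorted sample of row keys (modeled here as longs). Method names are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of two table-split strategies: quantile (each tablet holds roughly
// equal numbers of rows) and equal interval (the key range is divided into
// equal-width pieces regardless of data density).
public class SplitSketch {
    // Quantile: pick split points from a sorted sample of actual keys.
    public static List<Long> quantileSplits(long[] sortedKeys, int numSplits) {
        List<Long> splits = new ArrayList<>();
        for (int i = 1; i <= numSplits; i++) {
            int idx = (int) ((long) i * sortedKeys.length / (numSplits + 1));
            splits.add(sortedKeys[idx]);
        }
        return splits;
    }

    // Equal interval: divide [min, max] itself into equal-width pieces.
    public static List<Long> equalIntervalSplits(long min, long max, int numSplits) {
        List<Long> splits = new ArrayList<>();
        long width = (max - min) / (numSplits + 1);
        for (int i = 1; i <= numSplits; i++) {
            splits.add(min + width * i);
        }
        return splits;
    }
}
```

A quantile split adapts to skewed data, while an equal-interval split is cheap to compute without sampling; the fixed-rows-per-split variant follows the same pattern with numSplits derived from the total row count.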

Implement a generalized MapReduce ingest process to use for GPX point and line ingest

  • Add a utility to stage files to HDFS (recurse files in a directory matching a given file filter)
  • Implement GeoWaveAccumuloOutputFormat to easily write data to GeoWave in Accumulo
  • Implement a general purpose mapper and reducer that can use a file reader interface and an aggregation strategy to persist OGC Features in GeoWave
  • Provide a concrete implementation of this generalized process for GPX points and lines

Add a module that can perform end to end system integration testing

The test will ingest reasonably large point and line temporal datasets within default spatial and spatial-temporal indices and test that query results match expected results to give a good indication that the entire system works as expected. This can be useful for verifying a system is set up correctly and for functional regression testing as new features are added.

WFS-T support

Finish implementing required geotools datastore methods for WFS-T functionality.
Probably has a dependency on #17

Output streams sometimes closed twice in geowave iterators

The only apparent impact currently is log spam, but it needs to be fixed.

19 Jun 06:24:32 WARN [transport.TIOStreamTransport] - Error closing output stream.
java.io.IOException: The stream is closed
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:115)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
        at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
        at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.close(ThriftTransportPool.java:289)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool.returnTransport(ThriftTransportPool.java:570)
        at org.apache.accumulo.core.util.ThriftUtil.returnClient(ThriftUtil.java:115)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:693)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:361)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:745)

Integrate Continuous Integration (probably Travis CI)

Ideally we will use Travis CI to run integration and unit tests prior to merging any changes into master. It would also be nice to automatically build javadocs to be committed to our gh-pages branch for living documentation.

Create pointcloud (LAS probably) ingester

Load point cloud data (3D / 3D + temporal) into GeoWave.

Investigate access efficiency:

  • Load each point as an individual point (vector)
  • Store a volumetric space and index it by the volume (tiling in 3 dimensions)

Access mechanism: look to #14 as one of the potential use cases

Create geowave-examples project

As a first cut, geowave-examples should:

  • Show an easy way of programmatically ingesting feature data from shapefiles and GeoJSON
  • Show an easy way of programmatically ingesting geospatial data generated in code
  • Show an easy way to programmatically query data and export to a shapefile and GeoJSON
  • Demonstrate how to ingest and query data using the mapreduce input/output formats
  • Demonstrate how to ingest data for supported types using the ingest framework
    • This will likely be more in the supporting documentation as there is no code required
  • Demonstrate how to write a new plugin for a simple format type

The documentation in the gh-pages branch should be updated to explain the above as well.

Add GeoServer subpanel to support configuring visibility options

Current visibility options are determined by a GEOWAVE_VISIBILITY attribute of a feature.
Each layer should define its own visibility criteria.

The visibility metadata should be maintained in zookeeper.
The visibility metadata should be associated with each adapter (typename).
The metadata includes the attribute name and the parser. Currently, there is only one: JsonDefinitionColumnVisibilityManagement. These options are provided to the FeatureDataAdapter in its constructor, as called by GeoWaveGTDataStore#createSchema.
Somehow, the GeoWaveGTDataStore must compile and maintain the metadata. The visibility page accesses the data through the data store.

Of note: each GT data store instance is associated with a workspace and has its own set of associated layers. At the moment, namespace issues can be resolved by giving layers unique names. However, the developer should consider the possibility that two data store instances have layers with the same name. I do not think this is likely or realistic; thus, metadata could simply be indexed by typeName.

REST API for service access to GeoWave datastores

Some operations to support are:

  • list GeoWave namespaces that exist (geowave-utils #43)
  • ingest by:
    • upload file (#4)
    • ingest from a filesystem accessible by the server
    • allow additional attributes to be associated with each feature (GeoTools ingest type only)
  • GeoServer facades with default GeoWave configuration to
    • publish data stores
    • publish layers
    • get/set styles
    • enable GeoWebCache
    • list GeoWave data stores, with the ZooKeepers, Accumulo instance, and namespace of each
    • list all GeoWave layers, and list layers by namespace
  • analytics services to follow

GeoTools data store utilizing existing spatial-temporal index

Currently, our GeoTools data store will create a SpatialQuery object for all queries against any index. In particular, we want to be able to utilize the spatial-temporal index when both spatial and temporal bounds are given, but this could be generally useful for querying by any property/dimension; note that an index cannot be used if bounds are not provided for each of its indexed fields.
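The selection rule above (prefer an index only when every dimension it requires is constrained) can be sketched as follows; the enum and method names are illustrative, not GeoWave's actual API:

```java
import java.util.Set;

// Sketch of index selection: pick the spatial-temporal index only when the
// query constrains both dimensions; otherwise fall back to the spatial index.
public class IndexChooser {
    public enum Dim { SPATIAL, TEMPORAL }

    public static String chooseIndex(Set<Dim> constrainedDims) {
        if (constrainedDims.contains(Dim.SPATIAL)
                && constrainedDims.contains(Dim.TEMPORAL)) {
            return "spatial-temporal";
        }
        return "spatial";
    }
}
```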

Create Mapnik data source

Develop a Mapnik GeoWave datasource plugin:

  • This will enable mapnik to render tiles directly from a geowave datastore. [1][2]
  • Mapnik is behind most of the OSM infrastructure - enabling this brings in lots of great features
  • Initial target would be 3.x unless something comes up [3]

Decision point on RPC (Thrift, etc.) vs. in-process (probably Jace [4]).
See #14; the same technique applies there.

ref:
[1] https://github.com/mapnik/mapnik/wiki/PluginArchitecture
[2] https://github.com/mapnik/mapnik/wiki/DevelopingPlugins
[3] https://github.com/mapnik/mapnik/issues?milestone=15&state=open
[4] https://code.google.com/p/jace/wiki/Overview

Options for AccumuloDataStore

Add options that can be provided to AccumuloDataStore and GeoWaveDataStore to change certain behaviors, with reasonable defaults (the current behavior) when no options are specified. A few examples to start: enable/disable persisting a data adapter in the metadata table if it doesn't exist, enable/disable persisting an index in the metadata table if it doesn't exist, and enable/disable creating an index table if it doesn't exist (probably just throwing an error when it doesn't).

Another option, whose default behavior is not currently implemented, is to enable/disable automatically adding a locality group for each new column family (data adapter ID) within an index table. The default should be to create a locality group per column family, since this is the most typical access pattern.
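The options described above fit naturally into a chainable options object whose no-arg state reproduces current behavior. A sketch with hypothetical field names, not the actual AccumuloDataStore API:

```java
// Sketch of an options object: every flag defaults to the current behavior,
// and setters return this so callers can chain overrides.
public class AccumuloOptionsSketch {
    private boolean persistAdapter = true;      // persist adapter metadata if missing
    private boolean persistIndex = true;        // persist index metadata if missing
    private boolean createTable = true;         // create index table if missing
    private boolean useLocalityGroups = true;   // locality group per column family

    public AccumuloOptionsSketch setCreateTable(boolean createTable) {
        this.createTable = createTable;
        return this;
    }

    public boolean isCreateTable() {
        return createTable;
    }

    public boolean isUseLocalityGroups() {
        return useLocalityGroups;
    }
}
```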
