locationtech / geowave
GeoWave provides geospatial and temporal indexing on top of Accumulo, HBase, BigTable, Cassandra, Kudu, Redis, RocksDB, and DynamoDB.
License: Apache License 2.0
Features should be able to be looked up directly by feature ID - i.e. a secondary index mapping feature IDs to row IDs.
Probably needed for
#16
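A minimal in-memory sketch of the idea (class and method names are hypothetical, not GeoWave API): a side table keyed by feature ID whose values are the primary-index row IDs, so a feature can be fetched by ID without scanning the spatially ordered table.

```java
// Hypothetical sketch (not GeoWave API): a secondary index mapping
// feature IDs to the row IDs under which features are stored in the
// primary, SFC-ordered index, so a feature can be fetched by ID
// without a full scan.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FeatureIdIndex {
    private final Map<String, String> featureIdToRowId = new ConcurrentHashMap<>();

    // Record the mapping when a feature is written to the primary index.
    public void put(String featureId, String rowId) {
        featureIdToRowId.put(featureId, rowId);
    }

    // Direct lookup by feature ID; returns null when the feature is unknown.
    public String lookup(String featureId) {
        return featureIdToRowId.get(featureId);
    }
}
```

In a real deployment this map would itself be a table in the key-value store, populated at ingest time.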
Specify multiple build configurations for hadoop versions, accumulo versions, geotools versions, and geoserver versions, and have travis run the different combinations.
In the readme detail this build matrix, and include status icons.
Develop a spatial (and consider temporal) benchmark utility that can be generally applicable to run against any system that stores, indexes, and retrieves spatial content. Likely this will most generally be done by keeping it separate from GeoWave and utilizing GeoTools' data store abstraction.
14/06/27 18:33:30 INFO mapred.JobClient: Task Id : attempt_201406170050_0010_m_000128_2, Status : FAILED
java.lang.IllegalStateException: getY called on empty Point
at com.vividsolutions.jts.geom.Point.getY(Point.java:131)
at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.incrementLevelStore(GaussianCellMapper.java:157)
at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.map(GaussianCellMapper.java:144)
at mil.nga.giat.geowave.analytics.mapreduce.kde.GaussianCellMapper.map(GaussianCellMapper.java:32)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
The geometryToBinary function in the GeometryUtils class is using the default constructor for the WKBWriter. This defaults to an output coordinate dimension of 2, so any z values will be ignored.
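A sketch of the fix, assuming the JTS `WKBWriter(int outputDimension)` constructor: passing 3 preserves z ordinates that the default two-dimensional writer drops. The class and method below are illustrative, not the actual GeometryUtils code.

```java
import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Point;
import com.vividsolutions.jts.io.WKBWriter;

public class WkbDimensionDemo {
    // Serialize a 3D point with the given output dimension and return the
    // WKB length: 21 bytes with dimension 2 (z dropped), 29 with dimension 3.
    public static int wkbLength(int outputDimension) {
        Point p = new GeometryFactory().createPoint(new Coordinate(1.0, 2.0, 3.0));
        // new WKBWriter() defaults to an output dimension of 2 and silently
        // drops z; geometryToBinary should pass 3 when z values matter.
        return new WKBWriter(outputDimension).write(p).length;
    }
}
```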
All our data is stored in EPSG:4326 internally - data sources that aren't in this projection need to be normalized. Currently some input methods (such as the vector file ingest) don't do this. Since the CRS is known, we should normalize data not in 4326 in a centralized place (so each ingest process doesn't have to replicate the code - and also to ensure data consistency).
The current geoserver plugin dies (tomcat needs to be restarted) if the zookeeper cluster goes down and is then restarted (due to persistent zookeeper connections).
Some sort of connection pooling might also need to be enabled as there's a limit to the number of simultaneous zookeeper connections we want (due to internal collection synchronization).
Currently, GeoWave's usage of the uzaygezen library expects the resultant beginning and end of range to be inclusive, but on further inspection the end of the range is exclusive. This can result in one extra row ID, which will be filtered out by the fine-grained constraints filtering, but it should still be fixed for accurate usage.
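One way to correct for the exclusive range end is to decrement it as an unsigned big-endian key before building the row-ID range. A self-contained sketch (the method name is illustrative; the real fix would live where uzaygezen ranges are translated into row-ID ranges):

```java
import java.util.Arrays;

public class RangeEnd {
    // Convert an exclusive range end into an inclusive one by decrementing
    // the key as an unsigned big-endian integer, borrowing across bytes.
    public static byte[] toInclusive(byte[] exclusiveEnd) {
        byte[] end = Arrays.copyOf(exclusiveEnd, exclusiveEnd.length);
        for (int i = end.length - 1; i >= 0; i--) {
            if (end[i] != 0) {
                end[i]--;
                return end;
            }
            end[i] = (byte) 0xFF; // borrow from the next higher-order byte
        }
        throw new IllegalArgumentException("cannot decrement an all-zero key");
    }
}
```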
Integration tests failing:
Add accumulo namespace support where required (or validate and document if namespace.tablename prefix is needed)
Checking the existence of a locality group is causing a noticeable slowdown during ingest. Logic needs to be added to cache locality groups so that we can avoid the overhead with Accumulo. Cached locality groups should be dropped after a specified period of time.
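A sketch of such a cache, assuming a simple time-to-live policy (all names are illustrative; the real existence check would go through Accumulo's TableOperations API):

```java
import java.util.concurrent.ConcurrentHashMap;

public class LocalityGroupCache {
    // Caches when each locality group was last verified so ingest can skip
    // the Accumulo round trip until the cache entry expires.
    private final long ttlMillis;
    private final ConcurrentHashMap<String, Long> checkedAt = new ConcurrentHashMap<>();

    public LocalityGroupCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // True if a recent check exists and the expensive round trip can be skipped.
    public boolean isFresh(String localityGroup, long nowMillis) {
        Long at = checkedAt.get(localityGroup);
        return at != null && nowMillis - at <= ttlMillis;
    }

    // Record that the locality group was verified (or created) at this time.
    public void markChecked(String localityGroup, long nowMillis) {
        checkedAt.put(localityGroup, nowMillis);
    }
}
```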
Please place some explanatory and useful graphics within the placeholder sections.
Caused by: java.net.MalformedURLException: invalid url: hdfs://:8020/accumulo/lib/geowave-gt-0.8.0-SNAPSHOT-accumulo-singlejar.jar!/ (java.net.MalformedURLException: unknown protocol: hdfs)
at java.net.URL.&lt;init&gt;(URL.java:619)
at java.net.URL.&lt;init&gt;(URL.java:482)
at java.net.URL.&lt;init&gt;(URL.java:431)
at mil.nga.giat.geowave.gt.query.CqlQueryFilterIterator.initClassLoader(CqlQueryFilterIterator.java:163)
at mil.nga.giat.geowave.gt.query.CqlQueryFilterIterator.init(CqlQueryFilterIterator.java:186)
at org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:243)
(This is where, in a static instance, we create a new classloader instance with the jars and attach it to the parent VFS classloader. This hack is required to get SPI injection working in iterator stacks.)
GeoGit is a DVCS for geospatial data. It adds the ability to track provenance and history, and to perform diffs on different data sets.
Ultimately it would be ideal to be able to store and track this information for any feature stored in GeoWave - as a first step toward this, implementing a GeoGit backend in Accumulo (leveraging GeoWave where possible) is desired.
GeoGit has a pluggable data store implementation - it looks like there's a concept of a GraphDatabase, an ObjectDatabase, and a StagingDatabase that need to be implemented. The MongoDB backend provides a good example of implementing all three of these (see [3]).
The GraphDB implementation can be done from scratch, but it looks like there are canned implementations that leverage the Blueprints API. There's a project ([4]) that implements a Blueprints API on Accumulo which may speed this up (the state of this project - stability, quality, etc. - is currently unknown).
Note that further investigation is needed to determine to what extent GeoWave and GeoGit objects can be co-mingled. It would be ideal not to duplicate any data when not strictly needed. It might also be desirable to keep historical data (diffs, versions) in a separate table to keep "current state" queries quick. Working out the appropriate direction here would be done in conjunction with this task.
There are two groups on the GeoGit dev list working on an HBase (may be migrating to something else - Ceph?) object store as well as spatial indexing - currently most of this work seems pretty rudimentary, but it might be worth keeping an eye on.
[1] GeoGit: https://github.com/boundlessgeo/GeoGit
[2] DevDocs: https://github.com/boundlessgeo/GeoGit/blob/master/doc/technical/source/developers.rst
[3] MongoDB implementation: https://github.com/boundlessgeo/GeoGit/tree/master/src/storage/mongo
[4] Accumulo-Blueprints project: https://github.com/mikelieberman/blueprints-accumulo-graph
[5] Group working NoSQL object database (Hbase - now Ceph?): http://geogitobjdb.blogspot.com/
[6] GeoGit spatial index discussion: https://groups.google.com/a/boundlessgeo.com/forum/#!searchin/geogit/spatial$20index/geogit/9yVQAFL4n4I/VZDFrCsh3kgJ
[7] GeoGit discussion group: https://groups.google.com/a/boundlessgeo.com/forum/#!forum/geogit
Create a web page / back-end service that exposes the vector file ingest, as well as other ingesters (GPX, etc.), to allow web-based submission of GIS data (think GeoJSON, shapefile, etc.). The ingest service should be usable directly as well (i.e. it shouldn't require the web page).
This should also interact with the geoserver api to automate, or at least simplify, the publishing of data stores and layers.
The GeoLife dataset (see the ingester in the current geolife branch; data at http://research.microsoft.com/en-us/projects/geolife/ ) has longitude values that go up to 400. Dateline wrapping isn't handled properly for these values.
We should handle this on the parsing side, as the "meaning" of EPSG:4326 values outside the -180/+180 range is undefined, or rather defined by the convention of the data set.
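For reference, wrapping out-of-range longitudes into [-180, 180) is simple arithmetic, assuming plain dateline wrapping is the intended convention (which, per the note above, would need confirming against the data set):

```java
public class LongitudeNormalizer {
    // Wrap an arbitrary longitude into the conventional EPSG:4326 range
    // [-180, 180); e.g. the GeoLife value 400 wraps to 40.
    public static double normalize(double lon) {
        return ((lon + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
    }
}
```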
JAI is getting bundled as a dependency, but if it isn't installed it causes an exception when the native libraries aren't found (causing GeoServer not to load in some cases, depending on the plugins installed).
The libraries should already be on the classpath ($JRE_HOME/lib/ext), so there is no need to bundle them.
create a geowave-utils project with main methods to do convenience functions:
This will allow us to only multiplex SFC query ranges across the tiers that actually have data.
The integration tests fill the output log with errors regarding a non-closed shapefile.
servlet or servlet-api is being included in the shaded tomcat jar, which causes tomcat to refuse to load the geoserver plugin (and geoserver as a result).
The geowave-ingest project has a class, VectorFileIngest, which ingests supported geotools datastore formats into geowave.
The test will ingest reasonably large point and line temporal datasets within default spatial and spatial-temporal indices and test that query results match expected results to give a good indication that the entire system works as expected. This can be useful for verifying a system is set up correctly and for functional regression testing as new features are added.
Finish implementing required geotools datastore methods for WFS-T functionality.
Probably has a dependency on #17
This is a feature that would be nice to have in the future. The hope is that we will be able to leverage some of the work done for #17
Write a PDAL plugin (read/write) that allows persistence and query of point clouds in GeoWave.
See #13 - use the same technique chosen there (RPC vs. JNI) to bridge the PDAL C++ interface with Java.
[1] https://github.com/PDAL/PDAL
[2] http://www.pdal.io/docs.html
[3] http://osgeo-org.1560.x6.nabble.com/pdal-Feedback-on-driver-development-td4680397.html
The only apparent impact currently is log spam, but it needs to be fixed.
19 Jun 06:24:32 WARN [transport.TIOStreamTransport] - Error closing output stream.
java.io.IOException: The stream is closed
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:115)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.close(ThriftTransportPool.java:289)
at org.apache.accumulo.core.client.impl.ThriftTransportPool.returnTransport(ThriftTransportPool.java:570)
at org.apache.accumulo.core.util.ThriftUtil.returnClient(ThriftUtil.java:115)
at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:693)
at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:361)
at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:745)
The Data Adapter and the Index configuration are persisted within the metadata as values that are only intended for internal system use, but they are not given a visibility.
Ideally we will use Travis CI to run integration and unit tests prior to merging any changes into the master. It will also be nice to automatically build javadocs to be committed to our gh-pages branch for living documentation.
There are a few deprecated methods in 1.5 that are dropped in 1.6 (InputFormatBase methods, etc.). The default is to support 1.5.x and 1.6.x, so bias method choice toward that.
This is particularly for a performance improvement for GeoServer's "Compute from data" in the "Edit Layer" page.
It's published now -
http://search.maven.org/#browse|1859425327
Here's the commit that got wiped out from the latest ingest framework:
bc795d5
Load point clouds (3D / 3D + temporal) into GeoWave.
Investigation on access efficiency
Access mechanism: look to #14 as one of the potential use cases
In 1.6, Accumulo has a namespace option.
In GeoWave we use the term namespace to refer to a dataset.
Ensure that wherever a namespace value is required or displayed we are clear about which type of namespace it is.
At first cut geowave examples should
The documentation in the gh-pages branch should be updated to explain the above as well.
Current visibility options are determined by a GEOWAVE_VISIBILITY attribute of a feature.
Each layer should define its own visibility criteria.
The visibility metadata should be maintained in zookeeper.
The visibility metadata should be associated with each adapter (typename).
The metadata includes the attribute name and the parser. Currently, there is only one: JsonDefinitionColumnVisibilityManagement. These options are provided to the FeatureDataAdaptor in its constructor as called by GeoWaveGTDataStore#createSchema.
Somehow, the GeoWaveGTDataStore must compile and maintain the metadata. The visibility page accesses the data through the data store.
Of note: each GT Data Store instance is associated with a workspace and has its own set of associated layers. At the moment, namespace issues can be resolved by having layers with unique names. However, the developer should consider the possibility that two data store instances have layers with the same name. I do not think this is possible or realistic. Thus, metadata could simply be indexed by typeName.
Some operations to support are:
Currently, our GeoTools data store will create a SpatialQuery object for all queries against any index. In particular we want to be able to utilize the spatial-temporal index if both spatial and temporal bounds are given, but this could be generally useful for querying by any property/dimension, and an index will not work if some bounds are not provided for an indexed field.
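The decision described above can be sketched as a simple rule (names are illustrative, not the GeoTools/GeoWave API):

```java
public class IndexChooser {
    public enum Index { SPATIAL, SPATIAL_TEMPORAL }

    // Prefer the spatial-temporal index only when both kinds of bounds are
    // present; an index cannot be used when some of its dimensions are
    // unconstrained, so otherwise fall back to the purely spatial index.
    public static Index choose(boolean hasSpatialBounds, boolean hasTemporalBounds) {
        if (hasSpatialBounds && hasTemporalBounds) {
            return Index.SPATIAL_TEMPORAL;
        }
        return Index.SPATIAL;
    }
}
```

The general version of this would score each registered index by how many of its dimensions the query constrains.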
Develop a mapnik geowave datasource plugin:
Decision point on RPC (Thrift, etc.) vs. in-process (probably Jace [4])
See #14 - same technique here applies there
ref:
[1] https://github.com/mapnik/mapnik/wiki/PluginArchitecture
[2] https://github.com/mapnik/mapnik/wiki/DevelopingPlugins
[3] https://github.com/mapnik/mapnik/issues?milestone=15&state=open
[4] https://code.google.com/p/jace/wiki/Overview
Add options that can be provided to AccumuloDataStore and GeoWaveDataStore to change some behaviors but have reasonable defaults be the current behavior if no options are specified. A few examples to start out would be to enable/disable persisting a data adapter in the metadata table if it doesn't exist, enable/disable persisting an index in the metadata table if it doesn't exist, and enable/disable creating an index table if it doesn't exist (probably just throwing an error if it doesn't exist).
Another option in which the default behavior is currently not implemented is to enable/disable automatically adding a locality group for a new column family (data adapter ID) within an index table. It seems the default behavior would be to create a locality group for each column family because this is the most typical access pattern.
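A sketch of what such an options object might look like, with defaults preserving current behavior (all names are hypothetical, not the actual AccumuloDataStore/GeoWaveDataStore API):

```java
public class DataStoreOptions {
    // Defaults preserve current behavior when no options are specified.
    private boolean persistAdapter = true;
    private boolean persistIndex = true;
    private boolean createTable = true;
    private boolean localityGroupPerAdapter = true;

    // Fluent setters so callers can override only what they need.
    public DataStoreOptions persistAdapter(boolean v) { persistAdapter = v; return this; }
    public DataStoreOptions persistIndex(boolean v) { persistIndex = v; return this; }
    public DataStoreOptions createTable(boolean v) { createTable = v; return this; }
    public DataStoreOptions localityGroupPerAdapter(boolean v) { localityGroupPerAdapter = v; return this; }

    public boolean isPersistAdapter() { return persistAdapter; }
    public boolean isPersistIndex() { return persistIndex; }
    public boolean isCreateTable() { return createTable; }
    public boolean isLocalityGroupPerAdapter() { return localityGroupPerAdapter; }
}
```

With createTable disabled, the data store would throw an error on a missing index table rather than creating it.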