Giter Club home page Giter Club logo

lmarabi / st-hadoop Goto Github PK

View Code? Open in Web Editor NEW
20.0 4.0 6.0 128.48 MB

ST-Hadoop is an open-source MapReduce extension of Hadoop designed specially to analyze your spatio-temporal data efficiently

Home Page: http://st-hadoop.cs.umn.edu/

License: Other

Java 86.00% Shell 0.06% CSS 0.43% HTML 7.93% JavaScript 3.22% PigLatin 0.54% Ruby 1.16% PHP 0.65%
mapreduce spatio-temproal spatiotemporal temporal spatial spatial-data-analysis spatio-temporal indexing 3d spatial-index

st-hadoop's Introduction

ST-Hadoop

ST-Hadoop is an extension to SpatialHadoop that provides efficient processing of Spatio-temporal data using MapReduce Hadoop framework. It provides an extendable spatio-temporal data types to be used in MapReduce jobs including STpoint and Interval. ST-Hadoop employs hierarchical multi-resolution spatio-temporal index structure within Hadoop Distributed File System (HDFS) as a mean of efficient retrieval of spatio-temporal data. The fact that time and space are crucial to the query processing influences our design to use two-level indexing of temporal and then spatial as a second level indexing. It also adds a lower-level of spatial indices in HDFS such as Grid file, R-tree, R+-tree, Quad-tree, and other. Some new InputFormats and RecordReaders are also provided to allow reading and processing spatio-temporal indexes efficiently in MapReduce jobs. ST-Hadoop shipped with two basic spatio-temporal operations that are implemented as efficient MapReduce jobs (i.e., Range query and join) which access ST-Hadoop spatio-temporal index. Developers can implement various spatio-temporal operations that run efficiently using ST-Hadoop data types, operations, and indexes.

How it works

ST-Hadoop is used in the same way as Hadoop. Data files are first loaded into HDFS and indexed using the stmanager command which builds the hierarchical spatio-temporal index structure of your choice of a specified resolution and lower level of a spatial index over the input file. Once the file is indexed, you can execute any of the spatio-temporal operations provided in ST-Hadoop such as range query and spatio-temporal similarity join. New operations will be added occasionally.

Install

ST-Hadoop is packaged as a single jar file which contains all the required classes to run. All operations including building the index can be accessed through this jar file and it gets automatically distributed to all slave nodes by the Hadoop framework. In addition the spatial-site.xml configuration file needs to be placed in the conf directory of your Hadoop installation. This allows you to configure the cluster accordingly.

Examples

Please visit ST-Hadoop website there are few examples of how to use ST-Hadoop operations.

Operation operator Example
spatio-temporal index stmanager ./hadoop jar st-hadoop-uber.jar stmanager /geolife_raw_data.txt /index_geolife time:year shape:edu.umn.cs.sthadoop.trajectory.GeolifeTrajectory sindex:str -no-local -overwrite
spatio-temporal range query strangequery ./hadoop jar st-hadoop-uber.jar strangequery /index_geolife /rq_result shape:edu.umn.cs.sthadoop.trajectory.GeolifeTrajectory rect:39.9111716,116.5956883,39.9119983,116.606835 interval:2008-05-01,2008-05-31 time:year -overwrite
Spatio-temporal DTW KNN dtwknn ./hadoop jar st-hadoop-uber.jar dtwknn /index_geolife /knndtw_result shape:edu.umn.cs.sthadoop.trajectory.GeolifeTrajectory interval:2008-05-01,2008-05-31 time:year k:20 traj:39.9119983,116.606835x39.9119783,116.6065483x39.9119599,116.6062649x39.9119416,116.6059899x39.9119233,116.6057282x39.9118999,116.6054783x39.9118849,116.6052366x39.9118666,116.6050099x39.91185,116.604775x39.9118299,116.604525x39.9118049,116.6042649x39.91177,116.6040166x39.9117516,116.6037583x39.9117349,116.6035066x39.9117199,116.6032666x39.9117083,116.6030232x39.9117,116.6027566x39.91128,116.5969383x39.9112583,116.5966766x39.9112383,116.5964232x39.9112149,116.5961699x39.9111933,116.5959249x39.9111716,116.5956883 -overwrite -no-local

Compile

Advanced users and contributors might like to compile ST-Hadoop on their own machines. ST-Hadoop can be compiled via Maven. First, you need to grab your own version of the source code. You can do this through git. The source code resides on github. To clone the repository, run the following command

git clone https://github.com/lmarabi/st-hadoop.git

If you do not want to use git, you can still download it as a single archive provided by github.

Once you download the source code, you need to make sure you have any and Ivy installed on your system. Please check the installation guide of Maven if you do not have it installed.

To compile ST-Hadoop, navigate to the source code and run the command:

mvn compile

This will automatically retrieve all dependencies and compile the source code.

To build a redistribution package, run the command:

mvn assembly:assembly

This Maven command will package all classes of ST-Hadoop along with the dependent jars not included in Hadoop into an archive. This archive can be used to install ST-Hadoop on any existing Hadoop cluster.

st-hadoop's People

Contributors

lmarabi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.