Metanome

The Metanome project is a joint project between the Hasso-Plattner-Institut (HPI) and the Qatar Computing Research Institute (QCRI). Metanome provides a fresh view on data profiling by developing and integrating efficient algorithms into a common tool, expanding on the functionality of data profiling, and addressing performance and scalability issues for Big Data. A vision of the project appears in SIGMOD Record: "Data Profiling Revisited".

The Metanome tool is supplied under the Apache License. You can use and extend the tool to develop your own profiling algorithms. The profiling algorithms provided on our download page are copyrighted by HPI. You are free to use and distribute them for research purposes.

The Metanome platform itself is a backend service that communicates over an HTTP REST API endpoint. We provide a Metanome Frontend, maintained in a separate repository, that can be used to interact with the Metanome platform.

Building Metanome Locally

Metanome is a Java Maven project. To build the sources, the following development tools are needed:

  1. Java JDK 1.7 or later
  2. Maven 3.1.0
  3. Git

Make sure that all three are on your system's PATH variable when running the build.
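A quick way to confirm this before starting the build is to ask the shell whether it can resolve each command. The following is a minimal sketch; `check_tools` is a hypothetical helper name used only for this example and is not part of Metanome.

```shell
#!/bin/sh
# check_tools is a hypothetical helper for this example, not part of Metanome.
# It prints the names of the given commands that are not found on PATH
# and returns non-zero if any of them are missing.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing the list.
    echo "${missing# }"
    [ -z "$missing" ]
}

if report=$(check_tools java mvn git); then
    echo "java, mvn and git are all on PATH"
else
    echo "Missing from PATH: $report" >&2
fi
```

This only checks that the commands can be found. To compare the installed versions with the requirements listed above, run `java -version` and `mvn -v`.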

Pull Metanome Frontend Submodule

Before executing the build, you have to clone the Metanome Frontend into the project:

git submodule init
git submodule update

Build Metanome

Metanome can be built by executing:

mvn -T 1C clean install

Metanome can be packaged together with a Tomcat web server, some test data, and some test algorithms. To speed up builds, this package is not created in the default Maven profile. The deployment package can be created by executing the build with the deployment-local profile:

mvn verify -P deployment-local

or by executing package on the deployment project directly (if Metanome has not been installed, dependencies will be retrieved online):

mvn -f deployment/pom.xml package

To start the Metanome frontend you then have to execute the following steps in the deployment folder:

  1. Unzip deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip
  2. Go into the unzipped folder and start the run script, either run.sh or run.bat (Windows systems)
  3. Open a browser at http://localhost:8080/
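On a Unix-like system, the steps above can be sketched as a small script. It assumes that it is run from the repository root, that `unzip` is installed, and that `run.sh` sits somewhere inside the extracted package. The `metanome-package` directory name and the `find_run_script` helper are hypothetical choices for this example.

```shell
#!/bin/sh
# PACKAGE is the zip name given in the steps above.
# TARGET_DIR is a hypothetical extraction directory chosen for this example.
PACKAGE="deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip"
TARGET_DIR="metanome-package"

# Print the path of the first run.sh found below the given directory.
find_run_script() {
    find "$1" -type f -name run.sh | head -n 1
}

if [ -f "$PACKAGE" ]; then
    # Step 1: unzip the deployment package.
    unzip -q "$PACKAGE" -d "$TARGET_DIR"
    # Step 2: start the run script from the folder that contains it.
    run_script=$(find_run_script "$TARGET_DIR")
    (cd "$(dirname "$run_script")" && sh ./run.sh)
    # Step 3: open http://localhost:8080/ in a browser.
else
    echo "Package not found: $PACKAGE (build the deployment package first)" >&2
fi
```

If the package has not been built yet, the script only prints a hint and does nothing else.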

Deploy Metanome Remotely

It is possible to deploy Metanome using PaaS providers such as Amazon Beanstalk, Heroku, or Google App Engine. We provide additional configs and documentation on how to deploy Metanome on these in the GitHub wiki.

Developing a profiling algorithm for Metanome

If you want to build your own profiling algorithm for the Metanome tool, the best way to get started is our Skeleton Project. It contains an algorithm frame and a test runner project, with which you can run and test your code (without a running Metanome tool instance). For more details, check out the contained README.txt file.

Downloads

All Metanome releases can be found on the Metanome releases page.

Current profiling algorithms are available at the Algorithm releases page.

Documentation

Documentation of the Metanome tool, as well as information for algorithm developers and contributors to the project, can be found in the GitHub wiki.

Development

The Metanome modules are continuously deployed to Sonatype and can be used by adding the following repository to your pom.xml:

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>

Git Commit Hooks

The project uses license-maintainer as a pre-commit Git hook to keep the license information in all Java, XML and Python files up to date. To use it, execute the ./add_hooks.sh shell script, which creates a pre-commit hook symlink to the license-maintainer script.
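To check whether the hook is in place after running the script, you can test for the symlink. The sketch below assumes the hook lives at Git's standard location, `.git/hooks/pre-commit`. `has_precommit_symlink` is a hypothetical helper name for this example.

```shell
#!/bin/sh
# has_precommit_symlink is a hypothetical helper for this example.
# It succeeds if the repository at the given path has a pre-commit hook
# that is a symbolic link (assumed location: .git/hooks/pre-commit).
has_precommit_symlink() {
    [ -L "$1/.git/hooks/pre-commit" ]
}

if has_precommit_symlink .; then
    echo "pre-commit hook installed -> $(readlink .git/hooks/pre-commit)"
else
    echo "pre-commit hook missing; run ./add_hooks.sh from the repository root"
fi
```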

Coding style

The project follows the Google style guide; please make sure that all contributions adhere to the correct format. Formatting settings for common IDEs can be found at http://code.google.com/p/google-styleguide/. All files should contain the Apache copyright header, which can be found in the COPYRIGHT_HEADER file.

Contributors

carlambroselli, claudia-exeler, dacry, fatschi, jakob-zwiener, jens-ehrlich, lsgd, mandy-roick, maxifischer, pmlanger, tabergma, thorsten-papenbrock, timdraeger, vincents, xchrdw, xkr47