amundsen-custom's Introduction

Installation

Bootstrap a default version of Amundsen using Docker

The following instructions are for setting up a version of Amundsen using Docker, backed by Neo4j and Elasticsearch.

  1. Create a private fork of this repo.
  2. Clone your fork of this repo and its submodules by running:
    git clone --recursive [email protected]:stemma-ai/amundsen-custom.git
  3. Install Docker and docker-compose. Allocate at least 3GB of memory to Docker.
  4. Enter the cloned directory and run:
    docker-compose up --abort-on-container-exit
  5. Ingest static sample data into Neo4j:
    • In a separate terminal window, cd to the databuilder/upstream submodule.
    • The sample_data_loader.py Python script in the example/scripts/ directory uses the Elasticsearch client, pyhocon, and other libraries. Install the dependencies in a virtual environment and run the script with the commands below:
     python3 -m venv venv
     source venv/bin/activate
     pip3 install --upgrade pip
     pip3 install -r requirements.txt
     python3 setup.py install
     python3 example/scripts/sample_data_loader.py
  6. View the UI at http://localhost:5000 and search for test; it should return some results.

Verify setup

  1. You can verify that dummy data has been ingested into Neo4j by visiting http://localhost:7474/browser/ and running MATCH (n:Table) RETURN n LIMIT 25 in the query box. You should see two tables:
    1. hive.test_schema.test_table1
    2. hive.test_schema.test_table2
  2. You can verify that the data has been loaded into the metadata service by visiting:
    1. http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
    2. http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2
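
The two table detail URLs above appear to follow one pattern. The sketch below rebuilds them from their parts, which may help when checking other tables after loading your own data. It assumes the path layout is /table_detail/<cluster>/<database>/<schema>/<table> and that "gold" is the cluster segment; both are inferred from the URLs above, and table_detail_url is a hypothetical helper, not part of Amundsen.

```python
from urllib.parse import quote

# Assumption: the UI from the installation steps listens on this address.
BASE_URL = "http://localhost:5000"


def table_detail_url(database, schema, table, cluster="gold", base_url=BASE_URL):
    """Build a table detail URL (hypothetical helper, path layout is assumed)."""
    parts = [cluster, database, schema, table]
    # Percent-encode each segment so unusual names cannot break the path.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base_url}/table_detail/{path}"


if __name__ == "__main__":
    # The two sample tables referenced in the verification steps.
    for database, table in [("hive", "test_table1"), ("dynamo", "test_table2")]:
        print(table_detail_url(database, "test_schema", table))
```

Running the script prints the two URLs listed in step 2; opening each in a browser should show the table page if the sample data loaded.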

Troubleshooting

  1. If the Docker container doesn't have enough virtual memory for Elasticsearch, es_amundsen will fail with the error es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

    1. Increase vm.max_map_count on the host machine. On Linux, that means changing the setting on your own machine. On Mac, that means changing the Docker for Mac configuration. See these detailed instructions from Elastic.
    2. Re-run docker-compose
  2. If docker-compose stops with an org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment message, then es_amundsen cannot write to .local/elasticsearch. There is a file share mount established between the Docker container and your host machine, so run this in your terminal:

    1. chown -R 1000:1000 .local/elasticsearch
    2. Re-run docker-compose
  3. If the ES container crashes with Docker error 137 on the first call from the website (http://localhost:5000/), you are probably using the default Docker engine memory allocation of 2GB. The minimum needed for all the containers to run with the loaded sample data is 3GB. To raise the limit, go to Docker -> Preferences -> Resources -> Advanced, increase the Memory, then restart the Docker engine.

  4. Check whether all 5 Amundsen-related containers are running with docker ps. Can you connect to the Neo4j UI at http://localhost:7474/browser/ and to the raw ES API at http://localhost:9200? Do the Docker logs reveal any notable issues?

  5. Report the issue on this repo. The standard instructions should Just Work for everyone, and we'll gladly help get your install working!
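
The checks in steps 1 and 4 above can be sketched as a small diagnostic script. This is only an illustration using the Python standard library: the function names are hypothetical, the endpoints are the ones named in this README, and reading /proc/sys/vm/max_map_count only works on a Linux host (on Mac the setting lives inside the Docker VM, so the function returns None there).

```python
import http.client
import urllib.error
import urllib.request
from pathlib import Path

# Minimum quoted in the Elasticsearch error message from step 1.
REQUIRED_MAP_COUNT = 262144

# Endpoints named in this README; the labels are only for display.
ENDPOINTS = {
    "Amundsen UI": "http://localhost:5000",
    "Neo4j browser": "http://localhost:7474/browser/",
    "Elasticsearch API": "http://localhost:9200",
}


def map_count_ok(value, required=REQUIRED_MAP_COUNT):
    """Return True if a vm.max_map_count value meets the required minimum."""
    return int(value) >= required


def read_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the host's vm.max_map_count, or None if the file is absent."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


def probe(url, timeout=3.0):
    """Return True if the URL answers an HTTP request at all, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def report():
    """Print one line per check; intended to be run on the Docker host."""
    value = read_map_count()
    if value is None:
        print("vm.max_map_count: not readable on this host")
    else:
        status = "ok" if map_count_ok(value) else "too low"
        print(f"vm.max_map_count = {value} ({status})")
    for label, url in ENDPOINTS.items():
        state = "reachable" if probe(url) else "unreachable"
        print(f"{label}: {url} is {state}")
```

Calling report() after docker-compose has started gives a quick summary to include when reporting an issue. It does not replace docker ps or the container logs, which still show whether a container exited and why.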
