
arlas-exploration-stack's Introduction

Gisaïa

Gisaïa develops ARLAS, an open source platform for the geo-analytical exploration of huge volumes of spatio-temporal data.

To get started with ARLAS Exploration, you can run the full software stack on your computer with the ARLAS Exploration stack project. Three tutorials are also available for loading data into ARLAS: one with bird tracking data, one with vessel tracking data (also called AIS data), and one with pollutant data.

If you need to process large volumes of geotracked asset data, our open source library ARLAS PROC ML can be very useful.

arlas-exploration-stack's People

Contributors

elouankeryell-even, mbarbet, mohamedhamougisaia, qucmgisaia, sebbousquet, sylvaingaudan

arlas-exploration-stack's Issues

Add an nginx container for serving the services and WUIs

As the user of the ARLAS Exploration stack,
I want to have a single HTTP endpoint for adding dashboards, browsing dashboards and exploring my data
So that I have a seamless user interface: I do not have to switch between HTTP servers depending on what I do.

Automatic tests

  • run stack
  • initialize stack
  • get collection
  • get WUI configuration

Mount persistence directory in $HOME/.arlas

Currently, the stack stores its persistent data in /tmp.
Today, I had a power outage. My computer restarted, and the content of /tmp was therefore removed: I lost all my configurations. A local directory that survives reboots, such as $HOME/.arlas, would be a better default.
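A minimal sketch of the proposed default, assuming the stack script resolves the persistence directory in a single place. ARLAS_PERSISTENCE_DIR and ensure_persistence_dir are hypothetical names, not existing settings of the stack:

```shell
# Hypothetical variable name: lets users override the default location,
# which falls back to $HOME/.arlas instead of /tmp.
ARLAS_PERSISTENCE_DIR="${ARLAS_PERSISTENCE_DIR:-$HOME/.arlas}"

# Create the persistence directory if it does not exist yet, then print its path.
ensure_persistence_dir() {
  mkdir -p "$ARLAS_PERSISTENCE_DIR" && printf '%s\n' "$ARLAS_PERSISTENCE_DIR"
}
```

The printed path could then be bind-mounted into the services in place of the current /tmp location.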

Make initializer work with a copy of initialization files, to prevent ownership issues

Issue

The initializer loads initialization files from the host filesystem through a bind mount. The initialization process running in the initializer container is executed as user root. The initializer modifies certain initialization files, namely server/collection.json. As a result, at the end of the initialization process, the modified initialization files have their ownership changed to root on the host file system, which is not acceptable.

$ find data_samples/ais-danmark/ -type f -user root
data_samples/ais-danmark/server/collection.json

Fix

  • [initialization container] Change the destination of the initialization directory's bind-mount, from /initialization to /initialization/original
  • [initialization container] Make the initialization directory's bind-mount read-only, to ensure the initialization container cannot modify the original files on the host
  • [initialization container] In the container, at the beginning of the initialization process, copy /initialization/original to /initialization/copy
  • [initialization container] In the container, now make the initialization process work with /initialization/copy
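The copy step could look like the following sketch. It assumes the /initialization/original and /initialization/copy paths from the fix above; prepare_initialization_copy is a hypothetical helper name:

```shell
# Paths from the proposed fix: the host directory is bind-mounted read-only
# at /initialization/original; the process then works on /initialization/copy.
original_dir="/initialization/original"
copy_dir="/initialization/copy"

# Replace the working copy with a fresh copy of the original files.
# The copy is owned by the container user, and the host files stay untouched.
prepare_initialization_copy() {
  local src="${1:-$original_dir}"
  local dst="${2:-$copy_dir}"
  rm -rf "$dst"
  mkdir -p "$dst"
  cp -R "$src/." "$dst/"
}
```

On the host side, the bind mount can be made read-only with Docker's readonly mount option, e.g. `--mount type=bind,src="$PWD/data_samples/ais-danmark",dst=/initialization/original,readonly`. Since the copy lives in the container's own filesystem, files modified there by the root process never reach the host.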

Process to test fix

# [Initial state] Verify there are no root owned files
find data_samples/ais-danmark/ -type f -user root

# Perform an initialization
./ARLAS-Exploration-stack.bash up
docker run \
  -e elasticsearch_index=ais-danmark \
  -e server_collection_name=ais-danmark \
  -i \
  --mount dst="/initialization/original",src="$PWD/data_samples/ais-danmark",type=bind \
  --mount type=volume,src=default_wui-configuration,dst=/wui-configuration \
  --name arlas-exploration-stack-initializer \
  --net arlas \
  --rm \
  -t \
  gisaia/arlas-exploration-stack-initializer
./ARLAS-Exploration-stack.bash down

# [Final state] Verify there are no root owned files
find data_samples/ais-danmark/ -type f -user root

docker-compose.yaml: unused port mapping should be removed

Support other logstash input than stdin

For now, the initializer requires a data file, which it pipes into Logstash's stdin. However, users will not necessarily have their data in a file. They may want to provide it in other ways: Kafka, ...

  • make data file optional
  • execute Logstash last, since users may want to run it indefinitely (e.g. continuously pulling data from a Kafka cluster, ...)
  • documentation
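A minimal sketch of making the data file optional, assuming Logstash is launched from a single shell function. LOGSTASH_BIN and run_logstash are hypothetical names; only the `-f` flag (pipeline configuration file) comes from Logstash itself:

```shell
# Hypothetical default path, mirroring the logstash-$logstash_version
# directory extracted by the initializer's Dockerfile.
LOGSTASH_BIN="${LOGSTASH_BIN:-./logstash-${logstash_version}/bin/logstash}"

# Run Logstash with the given pipeline configuration.
#   $1: pipeline configuration file
#   $2: optional data file; when given, it is piped into Logstash's stdin,
#       otherwise the pipeline's own inputs (kafka, ...) provide the data.
run_logstash() {
  local config_file="$1"
  local data_file="${2:-}"
  if [ -n "$data_file" ]; then
    "$LOGSTASH_BIN" -f "$config_file" < "$data_file"
  else
    "$LOGSTASH_BIN" -f "$config_file"
  fi
}
```

Calling run_logstash as the very last step of the initializer means a pipeline that never terminates, such as one continuously consuming from Kafka, does not prevent the other initialization steps from completing.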

ping @mbarbet

Make it work with locally built images

Current behavior of the ARLAS stack: it pulls remote images, so any local image with the same name gets overwritten.

Sometimes, you want to work with a local development image you built yourself, and there should be a means to do so.

I suggest a -p|--no-pull option.
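A possible shape for the -p|--no-pull option, assuming the manager script parses its own arguments and pulls images with docker-compose pull. parse_args, pull_stack_images and the pull_images/action variables are hypothetical:

```shell
# Hypothetical argument parsing for the manager script.
parse_args() {
  pull_images=true
  action=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -p|--no-pull) pull_images=false ;;
      up|down) action="$1" ;;
      *) echo "Unknown argument: $1" >&2; return 1 ;;
    esac
    shift
  done
}

# Pull remote images unless --no-pull was given, so local builds are kept.
pull_stack_images() {
  if [ "$pull_images" = true ]; then
    docker-compose pull
  else
    echo "> Skipping image pull (--no-pull)"
  fi
}
```

Defaulting to pulling keeps the current behavior, so the option only changes anything for users who explicitly ask for it.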

Build fails on old `apt` metadata

$ cd arlas-exploration-stack-initializer && docker build -t gisaia/arlas-exploration-stack-initializer .; cd ..
Sending build context to Docker daemon  9.728kB
Step 1/6 : FROM ubuntu:17.10
17.10: Pulling from library/ubuntu
4ccdce43d1e0: Pull complete 
c95f13c88d92: Pull complete 
82656eee95ad: Pull complete 
78ff727be57a: Pull complete 
448bb314afa5: Pull complete 
Digest: sha256:3b811ac794645dfaa47408f4333ac6e433858ff16908965c68f63d5d315acf94
Status: Downloaded newer image for ubuntu:17.10
 ---> e211a66937c6
Step 2/6 : ENV logstash_version=6.2.3
 ---> Running in 0ba4c1fcfa5a
Removing intermediate container 0ba4c1fcfa5a
 ---> 9a94a387ff7f
Step 3/6 : RUN export logstash_archive="logstash-$logstash_version.tar.gz" &&     apt update &&     apt -y install curl jq openjdk-8-jre-headless &&     curl -o "$logstash_archive" "https://artifacts.elastic.co/downloads/logstash/$logstash_archive" &&     tar xf "$logstash_archive" &&     rm -fr "$logstash_archive"
 ---> Running in 1a5ed66626d6

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Ign:1 http://archive.ubuntu.com/ubuntu artful InRelease
Ign:2 http://archive.ubuntu.com/ubuntu artful-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu artful-backports InRelease
Err:4 http://archive.ubuntu.com/ubuntu artful Release
  404  Not Found [IP: 91.189.88.173 80]
Err:5 http://archive.ubuntu.com/ubuntu artful-updates Release
  404  Not Found [IP: 91.189.88.173 80]
Err:6 http://archive.ubuntu.com/ubuntu artful-backports Release
  404  Not Found [IP: 91.189.88.173 80]
Ign:7 http://security.ubuntu.com/ubuntu artful-security InRelease
Err:8 http://security.ubuntu.com/ubuntu artful-security Release
  404  Not Found [IP: 91.189.91.23 80]
Reading package lists...
E: The repository 'http://archive.ubuntu.com/ubuntu artful Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu artful-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu artful-backports Release' does not have a Release file.
E: The repository 'http://security.ubuntu.com/ubuntu artful-security Release' does not have a Release file.
The command '/bin/sh -c export logstash_archive="logstash-$logstash_version.tar.gz" &&     apt update &&     apt -y install curl jq openjdk-8-jre-headless &&     curl -o "$logstash_archive" "https://artifacts.elastic.co/downloads/logstash/$logstash_archive" &&     tar xf "$logstash_archive" &&     rm -fr "$logstash_archive"' returned a non-zero code: 100
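Ubuntu 17.10 (artful) has reached end of life, and the 404 errors above are consistent with its repositories having moved to old-releases.ubuntu.com. Rebasing the image on a supported release is probably the cleanest fix; if 17.10 must be kept, one workaround is to rewrite the apt sources before updating. use_old_releases_mirror is a hypothetical helper:

```shell
# Rewrite the apt sources of an end-of-life Ubuntu release so that they point
# at old-releases.ubuntu.com (requires GNU sed for in-place editing).
use_old_releases_mirror() {
  local sources_file="${1:-/etc/apt/sources.list}"
  sed -i \
    -e 's|http://archive\.ubuntu\.com|http://old-releases.ubuntu.com|g' \
    -e 's|http://security\.ubuntu\.com|http://old-releases.ubuntu.com|g' \
    "$sources_file"
}
```

In the Dockerfile, this would run at the start of the RUN step, before `apt-get update`. The warning in the log also suggests using apt-get rather than apt in scripts.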

Enable modular initialization

Description

Users might not want to perform the whole initialization process, but only part of it.

Example

@sylvaingaudan has a use case where his data is already in Elasticsearch, and he wants to initialize the ARLAS Exploration stack while skipping the Elasticsearch ingestion step.

Specification

  • Break the initialization process into unitary operations
  • A bash function for each operation
  • Have an environment variable containing the comma-separated list of steps to execute
  • An operation is triggered only if the corresponding value is present in the list of steps to execute
  • If the list of steps to execute is not specified by the user, default to the full initialization (i.e. the current behavior)

Currently defined operations are shown in this diagram.
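A minimal sketch of this specification. It assumes four placeholder step names (the real operations are those in the diagram) and a hypothetical INITIALIZATION_STEPS variable:

```shell
# Hypothetical step names, in execution order.
ordered_steps="create_index ingest_data create_collection configure_wui"

# Hypothetical variable name; when unset, run the full initialization.
steps="${INITIALIZATION_STEPS:-create_index,ingest_data,create_collection,configure_wui}"

# Succeed if step $1 appears (exact match) in the comma-separated list $steps.
step_enabled() {
  case ",$steps," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder operations: each would hold one unitary initialization step.
create_index()      { echo "create_index"; }
ingest_data()       { echo "ingest_data"; }
create_collection() { echo "create_collection"; }
configure_wui()     { echo "configure_wui"; }

# Run the enabled steps, always in the fixed order of $ordered_steps.
run_initialization() {
  for step in $ordered_steps; do
    if step_enabled "$step"; then
      "$step"
    fi
  done
}
```

Steps run in a fixed order whatever order the user lists them in, and names are matched exactly, so `ingest` does not enable `ingest_data`.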

Implement installation of plugins at runtime

Implementation

  • Make arlas-exploration-stack-initializer support a new variable, ARLAS_EXPLORATION_STACK_LOGSTASH_PLUGINS, consisting of a comma-separated list of plugin names. In base.bash:
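if [[ -v ARLAS_EXPLORATION_STACK_LOGSTASH_PLUGINS ]]; then
  # Split the comma-separated list into an array; unlike mapfile with a
  # here-string, read does not keep the trailing newline in the last name
  IFS=',' read -r -a plugins <<< "$ARLAS_EXPLORATION_STACK_LOGSTASH_PLUGINS"
  "./logstash-$logstash_version/bin/logstash-plugin" install "${plugins[@]}"
fi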
  • docs

Fix docker version

At the moment, the version of the Docker package installed in the Manager's image is not pinned.

Consequences:

  • the version of the Docker package installed in the Manager's image depends on when the Docker installation command was run
    • as a result, it is not possible to state a compatibility rule

Example:

  • Environment 1: host's Docker version 18.03; Manager's Docker installed a few months ago (exact date unknown); Manager's Docker version 18.03; result: success.
  • Environment 2: host's Docker version 18.02; Manager's Docker installed 9/2018; Manager's Docker version 18.06; result: failure (Docker server version obsolete compared to the Docker client version). Even though the compatibility rule stated in the documentation was respected (host's Docker version >= 18.02), it didn't work.
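Pinning the package version in the Manager's Dockerfile is the fix this issue asks for. As a complementary safeguard, the manager could fail fast when its Docker client is newer than the host's daemon, assuming that is the failure mode seen in Environment 2. version_le and check_docker_compat are hypothetical names; the comparison relies on GNU sort -V:

```shell
# Succeed if version $1 is lower than or equal to version $2,
# using GNU sort's version ordering (sort -V).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Fail early when the Manager's Docker client is newer than the host's daemon.
check_docker_compat() {
  local client_version server_version
  client_version="$(docker version --format '{{.Client.Version}}')" || return 1
  server_version="$(docker version --format '{{.Server.Version}}')" || return 1
  if ! version_le "$client_version" "$server_version"; then
    echo "Docker client $client_version is newer than Docker server $server_version" >&2
    return 1
  fi
}
```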

Use `CMD` instead of `ENTRYPOINT`

Currently, running the manager with a different command (for debugging purposes) requires overriding the entrypoint:

docker run --entrypoint bash -i --rm -t gisaia/arlas-exploration-stack-manager

It would be better not to require the --entrypoint option:

docker run -i --rm -t gisaia/arlas-exploration-stack-manager bash

This can be achieved by using the CMD directive instead of ENTRYPOINT in the Dockerfile.

Initializer sometimes crashes when run on the AIS sample

Sending Logstash's logs to /logstash-6.2.3/logs which is now configured via log4j2.properties
[2018-09-21T09:30:55,208][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/logstash-6.2.3/modules/netflow/configuration"}
[2018-09-21T09:30:55,243][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/logstash-6.2.3/modules/fb_apache/configuration"}
[2018-09-21T09:30:55,409][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/logstash-6.2.3/data/queue"}
[2018-09-21T09:30:55,415][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/logstash-6.2.3/data/dead_letter_queue"}
[2018-09-21T09:30:56,078][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-09-21T09:30:56,129][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"b54d84bc-ad94-4a33-843e-1dba299fc5dd", :path=>"/logstash-6.2.3/data/uuid"}
[2018-09-21T09:30:56,854][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.3"}
[2018-09-21T09:30:57,524][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-09-21T09:31:04,256][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-09-21T09:31:04,886][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://arlas-exploration-stack-elasticsearch:9200/]}}
[2018-09-21T09:31:04,912][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://arlas-exploration-stack-elasticsearch:9200/, :path=>"/"}
[2018-09-21T09:31:05,170][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://arlas-exploration-stack-elasticsearch:9200/"}
[2018-09-21T09:31:05,279][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-09-21T09:31:05,285][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-09-21T09:31:05,326][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-09-21T09:31:05,353][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-09-21T09:31:05,415][INFO ][logstash.outputs.elasticsearch] Installing elasticsearch template to _template/logstash
[2018-09-21T09:31:05,588][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://arlas-exploration-stack-elasticsearch:9200"]}
[2018-09-21T09:31:05,832][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1997c3e0 run>"}
[2018-09-21T09:31:06,023][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-09-21T09:32:41,178][ERROR][logstash.agent           ] Failed to execute action {:action=>LogStash::PipelineAction::Stop/pipeline_id:main, :exception=>"NoMethodError", :message=>"undefined method `map' for nil:NilClass\nDid you mean?  tap", :backtrace=>["/logstash-6.2.3/logstash-core/lib/logstash/util.rb:40:in `thread_info'", "/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:662:in `block in plugin_threads_info'", "org/jruby/RubyArray.java:2486:in `map'", "/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:662:in `plugin_threads_info'", "/logstash-6.2.3/logstash-core/lib/logstash/pipeline_reporter.rb:66:in `block in to_hash'", "/logstash-6.2.3/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:80:in `inflight_batches'", "/logstash-6.2.3/logstash-core/lib/logstash/pipeline_reporter.rb:56:in `to_hash'", "/logstash-6.2.3/logstash-core/lib/logstash/pipeline_reporter.rb:51:in `snapshot'", "/logstash-6.2.3/logstash-core/lib/logstash/shutdown_watcher.rb:88:in `pipeline_report_snapshot'", "/logstash-6.2.3/logstash-core/lib/logstash/shutdown_watcher.rb:63:in `block in start'", "/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/interval.rb:20:in `interval'", "/logstash-6.2.3/logstash-core/lib/logstash/shutdown_watcher.rb:59:in `start'", "/logstash-6.2.3/logstash-core/lib/logstash/shutdown_watcher.rb:35:in `block in start'"]}

Clean host from existing containers with name conflict

version: 2412586

Running the ARLAS Exploration stack fails if there are existing containers with the same names as the ones deployed by the stack (arlas-server, arlas-wui, elasticsearch).

Happened to @mbarbet & @sebbousquet

How to reproduce

elouan_keryell-even@baume:ARLAS-Exploration-stack$ docker run -d --name arlas-server busybox
c1ff628e5a74c9e3e3efb215c9d775ec0e8664e7ec24d7e15f1192e44e9fe429
elouan_keryell-even@baume:ARLAS-Exploration-stack$ ./ARLAS-Exploration-stack.bash up
> Shutting down containers...
Network arlas is external, skipping

> Creating containers' network...
e4e5b058985a1ed9c17a3c19959c70a154d6144aca4e4004fb190491e9340943

> Pulling containers' images
Pulling arlas-wui     ... done
Pulling elasticsearch ... done
Pulling arlas-server  ... done

> Starting stack...
Creating volume "default_wui-configuration" with default driver
Creating arlas-wui ... 
Creating arlas-server  ... error
Creating elasticsearch ... 
Creating arlas-wui     ... done
ERROR: for arlas-server  Cannot create container for service arlas-server: Conflict. The container name "/arlas-server" is already in use by container "c1ff628eCreating elasticsearch ... done

ERROR: for arlas-server  Cannot create container for service arlas-server: Conflict. The container name "/arlas-server" is already in use by container "c1ff628e5a74c9e3e3efb215c9d775ec0e8664e7ec24d7e15f1192e44e9fe429". You have to remove (or rename) that container to be able to reuse that name.
ERROR: Encountered errors while bringing up the project.

Solutions

  • solution 1: dynamically name containers. If there is already an arlas-server container, name the containers with suffix -1 (arlas-server-1, arlas-wui-1, elasticsearch-1), and so on. This implies that the container suffix (-1, -2, ...) must be passed between subsequent executions of the arlas-exploration-stack-manager, which makes it a stateful app.
    • ❌ it is better to keep the app stateless, so I reject this solution
  • solution 2:
    • rename containers to make it explicit they belong to the ARLAS exploration stack:
      • arlas-exploration-stack-elasticsearch
      • arlas-exploration-stack-server
      • arlas-exploration-stack-wui
    • in function down(), remove existing containers whose names conflict with the stack's.

I'll implement solution 2.
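A minimal sketch of the cleanup from solution 2, assuming down() can call a helper. remove_conflicting_containers is a hypothetical name; the container names are the ones proposed above:

```shell
# Container names proposed in solution 2.
stack_containers="arlas-exploration-stack-elasticsearch arlas-exploration-stack-server arlas-exploration-stack-wui"

# Force-remove any existing container whose name conflicts with the stack's.
remove_conflicting_containers() {
  local name
  for name in $stack_containers; do
    if docker container inspect "$name" > /dev/null 2>&1; then
      echo "> Removing existing container $name"
      docker rm -f "$name" > /dev/null
    fi
  done
}
```

Because the names carry the arlas-exploration-stack- prefix, force-removing them is unlikely to hit unrelated containers such as the user's own arlas-server.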
