arlas-tagger's Introduction

Gisaïa

Gisaïa develops ARLAS, an open source platform for the geo-analytic exploration of huge volumes of spatio-temporal data.

To get started with ARLAS Exploration, you can run the full software stack on your computer with the ARLAS Exploration stack project. Three tutorials are also available for loading data into ARLAS: one with bird tracking data, one with vessel tracking data (also called AIS data), and one with pollutant data.

If you are interested in large-scale processing of geotracked asset data, our open source library ARLAS PROC ML may be useful.

arlas-tagger's People

Contributors

alainbodiguel, dependabot[bot], elouankeryell-even, mbarbet, mohamedhamougisaia

arlas-tagger's Issues

Handle tag request error

DEBUG [2021-03-16 10:52:49,824] io.arlas.tagger.kafka.TagKafkaConsumer: [Consumer clientId=fd6d5fdc-27fe-4831-a9c6-fc13a213b972, groupId=execute_tags_consumer_group] Kafka consumer has been closed
Exception in thread "Thread-18" ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
	at org.elasticsearch.client.RestHighLevelClient.updateByQuery(RestHighLevelClient.java:599)
	at io.arlas.tagger.core.FilteredUpdater.doAction(FilteredUpdater.java:67)
	at io.arlas.tagger.service.UpdateServices.unTag(UpdateServices.java:48)
	at io.arlas.tagger.service.TagExecService.processRecords(TagExecService.java:83)
	at io.arlas.tagger.service.KafkaConsumerRunner.run(KafkaConsumerRunner.java:90)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://690a605d3db34f749f1c7bb57e08e45f.europe-west1.gcp.cloud.es.io:9243], URI [/ml_ais_flow/_update_by_query?slices=auto&requests_per_second=-1&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&max_docs=2147483647&timeout=1m], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}],"suppressed":[{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}]},{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}]}]},"status":400}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
		... 8 more
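
The trace above shows the exception leaving `KafkaConsumerRunner.run` and ending the thread. One possible way to contain such a failure is sketched below, assuming it surfaces as a `RuntimeException` (as the `ElasticsearchStatusException` in the trace does). `SafeRecordProcessor` is a hypothetical helper written for illustration and is not part of arlas-tagger; it only shows the idea of catching per-record errors so that one bad tag request does not stop the remaining records from being processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of arlas-tagger: applies an action to each
// record and collects failures instead of letting one exception end the loop.
class SafeRecordProcessor<T> {
    private final Consumer<T> action;
    private final List<String> errors = new ArrayList<>();

    SafeRecordProcessor(Consumer<T> action) {
        this.action = action;
    }

    // Returns the number of records processed without an exception.
    int processAll(List<T> records) {
        int succeeded = 0;
        for (T record : records) {
            try {
                action.accept(record);
                succeeded++;
            } catch (RuntimeException e) {
                // Keep the error for later reporting and move on to the next record.
                errors.add("Failed to process " + record + ": " + e.getMessage());
            }
        }
        return succeeded;
    }

    // Error messages collected so far, in processing order.
    List<String> errors() {
        return errors;
    }
}
```

In a real consumer loop the collected errors would presumably be logged or reported back to the caller; how that should look in the tagger is left open here.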

Log errors when tagging fails

The following line logs the outcome of the tagging operation at TRACE level:

LOGGER.trace("Tagged {} documents [total={} / {}%] (failed={}) with processtime={}ms", opUpdateResponse.updated, tagUpdateResponse.updated, tagUpdateResponse.progress, tagUpdateResponse.failed, (System.currentTimeMillis() - t0));

However, when tagUpdateResponse.failed is greater than zero, a message should also be logged at ERROR level with the content of tagUpdateResponse.failures, so that the failures can be spotted.

Also, the returned io.arlas.tagger.model.response.UpdateResponse does not contain the failures.
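
A minimal sketch of the requested behaviour follows. It assumes the failures are available as a list of strings, which may not match the real response model, and it uses `java.util.logging` only to stay self-contained (the line quoted above uses `{}`-style placeholders from a different logging API). `TagFailureReporter` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of arlas-tagger.
class TagFailureReporter {
    private static final Logger LOGGER =
            Logger.getLogger(TagFailureReporter.class.getName());

    // Builds the error message, or returns null when nothing failed.
    static String describeFailures(long failed, List<String> failures) {
        if (failed <= 0) {
            return null;
        }
        return "Tagging failed for " + failed + " documents: "
                + String.join("; ", failures);
    }

    // Logs at SEVERE (the java.util.logging counterpart of ERROR) when there
    // are failures. Returns true if a message was logged.
    static boolean report(long failed, List<String> failures) {
        String message = describeFailures(failed, failures);
        if (message == null) {
            return false;
        }
        LOGGER.log(Level.SEVERE, message);
        return true;
    }
}
```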

Migrate arlas-tagger source code from ARLAS-server project

  • move the source code from ARLAS-server project
    • arlas-tagger (adapt the configuration file)
    • arlas-tagger-core
    • arlas-tagger-rest
    • arlas-tagger-tests (adapt the ITs)
  • write/update documentation
  • write/update docker files
  • write release script

Upgrade to java 17 + dependencies update

Upgrade to Java 17, plus the following dependency updates:

  • log4j-core 2.13.2 -> 2.14.1
  • junit 4.13.1 -> 4.13.2
  • hamcrest-core 1.3 -> 2.2
  • rest-assured 3.3.0 -> 4.4.0
  • cyclops 10.0.0-M7 -> 10.4.0
  • kafka 2.1.1 -> 3.0.0

Add configuration by environment variable

As the operator of the tagger with docker or docker compose or k8s
I want to be able to specify environment variables in the container launch instruction
So that the server starts with my preferences

Those should include at least the variables found in the ARLAS Server that are common to the Tagger, plus those specific to the Tagger.
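
As an illustration of the idea, the sketch below reads settings from environment variables and falls back to defaults when a variable is unset or invalid. `EnvConfig` and the variable name `EXAMPLE_TAGGER_PORT` are hypothetical and are not the names used by ARLAS; the actual mechanism in the Tagger may well differ.

```java
import java.util.Map;

// Hypothetical helper, not part of arlas-tagger: resolves settings from a map
// of environment variables, with a default for each one.
class EnvConfig {
    private final Map<String, String> env;

    EnvConfig(Map<String, String> env) {
        this.env = env;
    }

    // Convenience factory backed by the real process environment.
    static EnvConfig fromSystem() {
        return new EnvConfig(System.getenv());
    }

    // Returns the variable's value, or the default if it is unset or blank.
    String get(String name, String defaultValue) {
        String value = env.get(name);
        return (value == null || value.isBlank()) ? defaultValue : value;
    }

    // Same as get, but parsed as an int; falls back to the default when the
    // value is not a valid integer.
    int getInt(String name, int defaultValue) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

With docker, such a variable would typically be passed with `-e EXAMPLE_TAGGER_PORT=9998` in the launch instruction and read through `EnvConfig.fromSystem()`.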

Tag multiple fields in a request

Currently, a tag request modifies one field of the selected documents.

If I want to add multiple tags in multiple fields of the same documents, I have to make multiple requests.
It seems that the documents are reindexed each time, which is very costly.

It would be nice to be able to tag several fields of the same documents in a single request (same filter).

[Tagger] Generate API documentation

As a developer
I want to have the API documentation in the docs directory of the GitHub project
So that I can browse it without the need to import the swagger.json in the Swagger UI

[Tagger] Tag job list Kafka consumer problem

The Kafka consumer used by the endpoint "/status/{collection}/_taglist" is shared across threads, which the consumer API does not support.

WARN  [2019-11-18 16:18:31,890] org.eclipse.jetty.server.HttpChannel: /arlas/tagger/status/collection/_taglist
! java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
! at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2201)
! at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2185)
! at org.apache.kafka.clients.consumer.KafkaConsumer.assignment(KafkaConsumer.java:853)
! at io.arlas.tagger.service.TagExploreService.getTagRefList(TagExploreService.java:51)
! at io.arlas.tagger.rest.tag.TagStatusRESTService.taggingGetList(TagStatusRESTService.java:114)
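
One common way to deal with an object that is not safe for multi-threaded access is to serialize every call to it behind a lock; another is to give each request its own instance. The sketch below illustrates the first option in a generic form. `SerializedAccess` is a hypothetical wrapper, not part of arlas-tagger or of the Kafka client, and whether locking or per-request consumers suits the Tagger better is not settled here.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical wrapper, not part of arlas-tagger: guards a non-thread-safe
// resource so that only one thread at a time can operate on it.
class SerializedAccess<R> {
    private final R resource;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedAccess(R resource) {
        this.resource = resource;
    }

    // Runs the operation while holding the lock and returns its result.
    <T> T call(Function<R, T> operation) {
        lock.lock();
        try {
            return operation.apply(resource);
        } finally {
            // Always release, even if the operation throws.
            lock.unlock();
        }
    }
}
```

The wrapped resource must only ever be reached through `call`; any direct reference kept elsewhere would bypass the lock.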
