An application built to insert unstructured data into a NoSQL database and retrieve it through a full-text index search.
A scheduled task runs to ingest data from an external resource.
- Architectural view
- Quick Start
- Mongo
- RabbitMQ
- Eureka Registry
- Front End
- Ingestor
- Storage
- Scheduler
- Metrics
- Spring Boot 2.7
- Thymeleaf template
- Docker
- Spring Data Mongo
- Eureka Registry
- RabbitMQ
- Prometheus
- Grafana
- EhCache
- Micrometer
Build the root Maven project (./pom.xml) with JDK 17:
set JAVA_HOME=c:\Program Files\Java\jdk-17.0.3.1
c:\apache-maven-3.6.3\bin\mvn clean package -DskipTests
Start up the containers by running
docker-compose up -d
A collection called "information" inside the "test" database is created in MongoDB during Docker container start-up.
The data structure is the following:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "_id": {
      "type": "string"
    },
    "type": {
      "type": "string"
    },
    "payload": {
      "type": "object"
    },
    "dtInsert": {
      "type": "string"
    },
    "_class": {
      "type": "string"
    }
  },
  "required": [
    "_id",
    "type",
    "payload",
    "dtInsert",
    "_class"
  ]
}
The payload node contains whatever data is inserted.
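As an illustration, a stored document could look like the map built below (plain Java; the `_id`, the payload fields, and the `_class` value are hypothetical examples, not values taken from the project):

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class InformationDocumentExample {
    // Sketch of a document in the "information" collection. Field names match
    // the JSON schema above; all concrete values are hypothetical.
    static Map<String, Object> sampleDocument() {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("title", "Example headline");   // payload shape is free-form
        payload.put("description", "Any user or feed data ends up here.");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "62b9f0a1c4e7d12a9b3f0001");              // hypothetical ObjectId string
        doc.put("type", "news");                                  // the "Kind" chosen on insert
        doc.put("payload", payload);
        doc.put("dtInsert", Instant.parse("2022-06-27T10:15:30Z").toString());
        doc.put("_class", "com.example.storage.model.Information"); // hypothetical class name
        return doc;
    }
}
```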
The RabbitMQ message broker decouples communication between the Ingestor/Scheduler services and Storage. The dashboard is available at http://localhost:15672 (username: info, password: news).
A single queue named "notify" is configured and bound to a direct exchange called "notify-exchange". The queue is not durable.
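For reference, this topology can be pre-declared through a RabbitMQ definitions file (a sketch only: the project may declare the queue and binding from Spring code instead, and the routing key is assumed to equal the queue name):

```json
{
  "queues": [
    { "name": "notify", "vhost": "/", "durable": false, "auto_delete": false, "arguments": {} }
  ],
  "exchanges": [
    { "name": "notify-exchange", "vhost": "/", "type": "direct", "durable": true, "auto_delete": false, "arguments": {} }
  ],
  "bindings": [
    { "source": "notify-exchange", "vhost": "/", "destination": "notify", "destination_type": "queue", "routing_key": "notify", "arguments": {} }
  ]
}
```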
Services register themselves with the Discovery Registry in order to discover each other without hard-coding IP addresses and/or ports. The Registry also checks their health status and takes a service offline when it is not available.
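A typical Eureka client configuration for one of the services might look like the fragment below (a sketch: the `registry_area` host name comes from the docker-compose service, port 8761 is the Eureka default, and the lease interval is an assumed value):

```yaml
eureka:
  client:
    service-url:
      # Registry address inside the Docker network (hypothetical port).
      defaultZone: http://registry_area:8761/eureka/
  instance:
    prefer-ip-address: true
    # Frequent heartbeats let the registry take a dead instance offline quickly.
    lease-renewal-interval-in-seconds: 10
```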
The front-end service uses the Spring Boot framework (2.7) and Thymeleaf templates to build an HTTP web application available on port 80.
The application is protected by basic authentication (username: admin, password: password).
The site is based on two pages: on the "Insert" page, a user can add a piece of information with a specified Kind.
On the "Search" page, a user can search for any word within the information ingested into the NoSQL database.
The Ingestor service is not exposed on a public port; it gets data from the FrontEnd and transforms it into a message, which is sent to the RabbitMQ message broker. A retry policy is configured to avoid losing messages.
cache:
  channel:
    #Number of channels to retain in the cache. When "checkout-timeout" > 0, maximum channels per connection.
    size: 2
    #Duration to wait to obtain a channel if the cache size has been reached. If 0, always create a new channel.
    checkout-timeout: 10000
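For context, the fragment above sits under Spring Boot's standard `spring.rabbitmq` prefix; the full path would be:

```yaml
spring:
  rabbitmq:
    cache:
      channel:
        size: 2
        checkout-timeout: 10000
```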
The Storage service is not exposed on a public port and provides two features. First, it works as a listener that gets data from RabbitMQ and stores it into the NoSQL database. It also exposes an endpoint that performs a full-text search on MongoDB.
Data is cached when the full-text search endpoint is called (/information/{word}); {word} is the cache key. All cache entries are evicted when new information is added.
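The caching behaviour can be sketched in plain Java (the real service presumably uses Spring's caching annotations backed by EhCache; this only illustrates the key and eviction semantics described above):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SearchCacheSketch {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // GET /information/{word}: {word} is the cache key; misses fall through to the database.
    List<String> search(String word, Function<String, List<String>> dbLookup) {
        return cache.computeIfAbsent(word, dbLookup);
    }

    // Inserting new information invalidates every cached search result.
    void onInsert() {
        cache.clear();
    }
}
```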
A retry policy is configured to avoid losing messages.
listener:
  simple:
    retry:
      enabled: true
      initial-interval: 3s
      max-attempts: 3
      max-interval: 10s
      multiplier: 2
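These settings give exponential backoff between delivery attempts. A small plain-Java sketch of the resulting delay sequence (mirroring the usual exponential backoff arithmetic: each wait is multiplied by `multiplier` and capped at `max-interval`):

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffExample {
    // Delay before each redelivery; the first attempt is immediate,
    // so max-attempts = 3 yields two waits.
    static List<Long> delaysMillis(long initial, double multiplier, long maxInterval, int maxAttempts) {
        List<Long> delays = new ArrayList<>();
        long next = initial;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            delays.add(next);
            next = Math.min((long) (next * multiplier), maxInterval);
        }
        return delays;
    }
}
```

With the values above (3s initial, multiplier 2, 3 attempts), the listener waits 3s and then 6s before giving up.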
The Scheduler service works as a job executor. It gets news from the BBC feed:
http://newsapi.org/v2/top-headlines?sources=bbc-news&apiKey=9acc642023684f07b46fae89185513ce
Each entry generates a message sent to the RabbitMQ message broker so that it can be stored in the NoSQL database.
The FrontEnd service makes metrics available using Micrometer with the Prometheus adapter. Prometheus scrapes application metrics from the FrontEnd service and makes them available to Grafana.
Prometheus is available at http://localhost:9090
Grafana is available at http://localhost:3000 (user: admin, password: password)
Data is scraped every 40 seconds.
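The scrape above would correspond to a Prometheus job along these lines (a sketch: the job name and target host are hypothetical; `/actuator/prometheus` is the standard Micrometer endpoint, and the 40-second interval matches the text):

```yaml
scrape_configs:
  - job_name: 'frontend'
    scrape_interval: 40s
    metrics_path: '/actuator/prometheus'
    static_configs:
      # Hypothetical service name; the FrontEnd listens on port 80.
      - targets: ['frontend-app:80']
```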
It's possible to achieve HA in different ways:
Inside docker-compose.yml, by duplicating a service (e.g. Ingestor), we get 2 instances of the same service balanced by the registry.
ingestor-instance2-app:
  image: "area/ingestor-app:1.0.0"
  container_name: 'ingestor-instance2-app'
  mem_limit: 256M
  build:
    context: ./Ingestor
    dockerfile: Dockerfile
  ports:
    - '8094:8084'
  environment:
    JAVA_OPTS: -Xmx256m
  depends_on:
    - registry_area
In this way, shutting down one instance will not cause any failure in the application.
With an orchestrator like Docker Swarm installed in the local environment, it's possible to achieve HA by adding a deploy strategy to the docker-compose.yml file. Example:
ingestor-app:
  image: "area/ingestor-app:1.0.0"
  container_name: 'ingestor-app'
  mem_limit: 256M
  deploy:
    mode: replicated
    replicas: 2
  build:
    context: ./Ingestor
    dockerfile: Dockerfile
  ports:
    - '8084:8084'
  environment:
    JAVA_OPTS: -Xmx256m
  depends_on:
    - registry_area
In this way, ingestor-app is deployed with 2 instances.