overdelivery-mgmt

A simplified proof-of-concept project for managing ads over-delivery utilizing Spring Cloud and Apache Kafka. Based on the following article:

https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996

Requirements to build this project

Java 8
Maven

Requirements to run the example

Apache Kafka. See section with binary downloads and the recommended stable version. At the moment of writing the version used is: kafka_2.11-0.10.1.0
A modified version of JSON Data Generator originally provided by [ACES,Inc] (http://acesinc.net/)

Setup Instructions

Extract the kafka_2.11-0.10.1.0.tgz file

tar -xvzf kafka_2.11-0.10.1.0.tgz

Start zookeeper and kafka

kafka-install-dir/bin/zookeeper-server-start.sh kafka-install-dir/conf/zookeeper.properties
kafka-install-dir/bin/kafka-server-start.sh kafka-install-dir/conf/server.properties

Install the Json-Data-Generator

Clone/fork the modified version of JSON Data Generator and follow the instructions provided here

Setup the overdelivery-mgmt repo

Clone or fork the repo

git clone [email protected]:kmandalas/overdelivery-mgmt    
cd overdelivery-mgmt

Create all the topics required by the examples

kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ad-insertion-input
kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic predicted-spend-output
kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic impressions

Running the Infrastructure services

Run each service in different console/terminal. The recommended order is the following:

1. Config Service

 cd <dir>/config/
 mvn spring-boot:run

2. Registry (Service Discovery with Eureka)

 cd <dir>/registry/
 mvn spring-boot:run

3. API Gateway (Zuul)

 cd <dir>/gateway/
 mvn spring-boot:run

Running the Functional services

1. Budget Service

 cd <dir>/budget-service/
 mvn spring-boot:run

2. Inventory Service

 cd <dir>/inventory-service/
 mvn spring-boot:run

Running the kafka consumers and stream

1. predicted-spend-consumer-0

 cd <dir>/predicted-spend-consumer/
 mvn spring-boot:run -Drun.arguments="--partition-no=0"

2. predicted-spend-consumer-1

 cd <dir>/predicted-spend-consumer/
 mvn spring-boot:run -Drun.arguments="--partition-no=1"

3. impressions-consumer-0

 cd <dir>/impressions-consumer/
 mvn spring-boot:run -Drun.arguments="--partition-no=0"

4. impressions-consumer-1

 cd <dir>/impressions-consumer/
 mvn spring-boot:run -Drun.arguments="--partition-no=1"

5. spend-aggregator (kafka stream)

 cd <dir>/spend-aggregator/
 mvn spring-boot:run

Eureka Dashboard

Once you have started all the services, check Eureka dashboard

Keep in mind that the Service Discovery mechanism needs some time after all applications startup. Any service is not available for discovery by clients until: the instance, the Eureka server and the client all have the same metadata in their local cache so it could take 3 heartbeats. Default heartbeat period is 30 seconds.

If everything is started OK, you should have a view similar to the following one:

Initiate event streams

For this paradigm, two (2) streams of events are needed:

insertions
impressions

When a new ad spot (i.e. an opportunity to display an ad) appears in a "website", the "frontend" sends an ad request to ads inventory. Ads inventory then decides whether to show ads for advertiser X based on their remaining budget. If budget is still available, the ads inventory will make an ad insertion (i.e. an ad entry that’s embedded in a user’s app) to the frontend. After the user views the ad, an impression event is sent to the spend system.

Insertions flow

In our case, we simulate the ad requests sent by a website/frontend with a stream of generated by the JSON Data Generator via HTTP POSTs towards the inventory-service. This microservice checks if actual_spend + inflight_spend > daily_budget and if this is false, it sends a message to the ad-insertion-input kafka topic. The message looks like:

{key: adgroupId, value: inflight_spend}, where
- adgroupId = id of the group of ads under same budget constraint.
- inflight_spend = price * impression_rate * action_rate

The configuration for both impression_rate and action_rate is in inventory-service.yml For starters the selected values are global and the same for all advertisers:

0.5 (i.e 50%) for the impression rate
1 (i.e. 100%) for the action rate since we assume that all advertisers are paying by impression

A kafka-stream spend-aggregator using a tumbling window of 10 seconds acts as the "Spend aggregator" and sends the sums to the predicted-spend-output kafka topic.

A kafka consumer predicted-spend-consumer consumes the messages of the predicted-spend-output topic and updates the inflight_spend accordingly.

All budget retrieval and update actions are done via the budget-service. An H2 database is used, with a single table keeping the data for this simple scenario. The data-ownership belongs to the budget-service. In this way it's not possible to bypass API and access persistence data directly. Kafka consumers, inventory-service etc. all interact with the budget-service using Feign in order to retrieve/update data.

In order to access the in-memory database, view the schema etc. go to your H2 console. As jdbc-url, enter: jdbc:h2:mem:budget-db

You may find some sample configuration files for the json data generator within the folder streaming-workflows. Have in mind that the values in these sample files are over-simplistic. Even in a near real-world simulation scenario, multiple streams may need to be started with different eventFrequency / varyEventFrequency / varyRepeatFrequency parameters etc.

In order to start the streams, first copy the json config files to json generator conf directory:

cp streaming-workflows/* <dir>/json-data-generator-1.2.0/conf

Then, begin by starting the insertions event stream:

java -jar json-data-generator-1.3.1-SNAPSHOT.jar insertions-config.json

Impressions flow

Continue by initiating the impressions event stream:

java -jar json-data-generator-1.3.1-SNAPSHOT.jar impressions-config.json

This time the generator sends the messages directly to the impressions kafka topic. Since this topic is partitioned by adGroupId and the number of consumers is equal to the number of the partitions (in our scenario this number is: 2), we guarantee that there will not be concurent modifications of the budget of a single advertiser.

An impressions-consumer instance, consumes the messages of the impressions topic and updates the actual_spend accordingly (by calling the budget-service via Feign).

Observe the in-flight VS actual spend actuation

A single html page is used to display a "live" dual-series chart for a single advertiser. This mini Spring Boot web-app resides in the gateway module. It uses web-sockets to send periodically to the client pairs of actual spend and in-flight spend for a single advertiser (for the moment the one with adGroupId: 101).

In order to view the page navigate your browser to:

http://localhost:4000

Todo:

Enable hystrix for inventory-service and budget-service
Enable turbine with the monitoring app
Add integration tests for inventory-service and budget-service
Enable security with OAuth2
Enable the aodm-common module in order to avoid code duplicating shared DTOs among the microservices
Enrich the README.md with information about the selected kafka partitioning scheme and provide the list of known issues and limitations of the current project as-is

Feedback welcome

I try add things and improve this project during my... "free" time. I would greatly appreciate your help. Feel free to contact me with any questions/corrections and suggestions.

raultld / overdelivery-mgmt Goto Github PK