
clojurex-2018's Introduction

Learning how to design automatically updating AI with Apache Kafka and Deeplearning4J.

This is the codebase that made up the bulk of development to support my talk at Strata Conference London 2018. It comes as is and should be classed as a work in progress.

Requirements

In order to run this system you require the following components:

  • Kafka (I'm using the supplied Zookeeper distribution).
  • MySQL
  • Leiningen (for building the Clojure projects)
  • Maven (for building the Java projects)
  • Java 1.8 (parts of the system use Clojure 1.9, but there's no need to download it; Leiningen takes care of that for you)

Breakdown of the project

The project uses a mixture of Java (for the model creation) and Clojure (for Kafka Streams and the HTTP API). The directory structure of this project is broken down as follows:

config - Kafka Connect configurations: one for the event_topic persistence and the other to save the training data ready for model training.

crontab - Scheduled jobs for model creation.

db - A schema for the MySQL database which holds information on the training and model accuracy and another table to hold the slope/intercept of a simple linear regression model.

messages - Simple JSON messages for the cronjob to send to the event stream.

projects - The bulk of the code is here, in four projects: model builds (dlj4.mlp), Kafka Streaming applications (kafka.stream.events and kafka.stream.prediction) and a very basic HTTP API (prediction.http.api).

scripts - Shell scripts to create the required Kafka topics, environment variables and event trigger for the cron job.

slides - Slides from the talk will be added once the talk has taken place on 24th May.

General order of build

Before you start, change the username/password values to those of a user on your MySQL database.

  1. Create a directory to store persisted events.
  2. Create a directory to save training data.
  3. Create a directory for the generated models.
  4. Start Zookeeper and Kafka.
  5. Run create-topics.sh to create the required topics.
  6. Start Kafka Connect and add the two Connect configurations.
  7. Run lein uberjar on the Kafka Streaming apps and the HTTP API.
  8. Run mvn package to create the model builders.
  9. Run the streaming applications:
PROFILE=local java -jar path/to/jar/kafka-stream-events.jar

and

PROFILE=local java -jar path/to/jar/kafka-stream-prediction.jar

Event Messages

There are two types of events: training events and build events.

{"type":"command", "payload":"build_mlp"}

and

{"type":"training", "payload":"3,4,5,6"}

Training events are based on a four-column piece of CSV data. If you want to accommodate other layouts then you will need to modify the model builds and the streaming applications. Right now there's nothing dynamic but I'd like to work on that in the future.
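To make the four-column layout concrete, here is a minimal sketch of how a training payload could be split into three feature values and one class label. The TrainingPayload class and its parse method are hypothetical names made up for illustration and are not taken from the projects directory. Treating every column as a double is also an assumption.

```java
import java.util.Arrays;

// Hypothetical helper (not part of this repository): splits a training
// payload such as "3,4,5,6" into three features and one class label.
final class TrainingPayload {
    final double[] features; // first three columns
    final double label;      // fourth column, the class to learn

    private TrainingPayload(double[] features, double label) {
        this.features = features;
        this.label = label;
    }

    static TrainingPayload parse(String csv) {
        String[] cols = csv.trim().split(",");
        if (cols.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 columns, got " + cols.length);
        }
        double[] features = new double[3];
        for (int i = 0; i < 3; i++) {
            // Double.parseDouble throws NumberFormatException on bad input.
            features[i] = Double.parseDouble(cols[i].trim());
        }
        double label = Double.parseDouble(cols[3].trim());
        return new TrainingPayload(features, label);
    }

    public static void main(String[] args) {
        TrainingPayload p = parse("3,4,5,6");
        System.out.println(Arrays.toString(p.features) + " -> " + p.label);
    }
}
```

Supporting a different number of columns would mean changing the hard-coded 4 and 3 here, which mirrors the kind of change the model builds and streaming applications would need.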

Send all events to the event_topic and they will be persisted and processed by the streaming app.

Predictions

Once models are built you can make predictions through Kafka by sending a message to the prediction_request_topic and watching the results come back through the prediction_response_topic.

JSON payloads look like this:

{"model":"mlp", "payload":"3,4,5"}

Note that the fourth column is missing compared to the training data: it is the class you are predicting. The Kafka Streaming app takes care of the parsing and preparation to make a prediction.
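As a rough illustration of the request format, the sketch below builds a prediction message and checks that the payload carries exactly three numeric columns. PredictionRequest and toJson are hypothetical names, not code from this repository, and the string is assembled by hand on the assumption that the model name contains no characters that need JSON escaping.

```java
// Hypothetical helper (not part of this repository): builds the JSON
// message for a prediction request from a three-column CSV payload.
final class PredictionRequest {

    static String toJson(String model, String featuresCsv) {
        String payload = featuresCsv.trim();
        String[] cols = payload.split(",");
        if (cols.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 feature columns, got " + cols.length);
        }
        for (String col : cols) {
            // Validation only: throws NumberFormatException if not numeric.
            Double.parseDouble(col.trim());
        }
        // Assumes model and payload need no JSON escaping.
        return "{\"model\":\"" + model + "\", \"payload\":\"" + payload + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("mlp", "3,4,5"));
    }
}
```

The resulting string is what you would send to the prediction_request_topic with whichever Kafka producer you prefer. The producer itself is left out here.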

Crontab

There are two flavors of cron job. The first is a direct call to the executable that creates the model. The alternative, which is preferable, is to send an event to the event_topic and let the stream pick up the event and process it. This means the event stream is preserved via Kafka Connect and can be replayed.

Notes

This is a work in progress to prove out my thoughts for the Strata and ClojureX talks.

There are plenty of improvements that could be made and I'll go through those in my talk... so no emails just yet :)
