originrose / cortex

Machine learning in Clojure

License: Eclipse Public License 1.0

Java 0.23% Clojure 93.46% Cuda 1.98% Shell 0.27% C++ 2.82% Python 1.19% Makefile 0.06%

cortex's Issues

[Enhancement -design question] Visualizing parameter updates as training progresses

Hi all,

Other neural network implementations have tools that aid visualization and debugging of a network while it is being trained. Specifically, I've found that the TensorBoard UI (from TensorFlow) is quite capable.

The toolkit should be capable of turning on (or off) instrumentation for different parameters, such as

gradient for weights
raw weight values
different evaluation metrics (e.g. F-measure or accuracy) on train and/or validation sets.

To enable the same kind of functionality in Cortex, there are a few design choices:

1. Create a built-in visualization tool that can display important metrics.
2. Log event data (to file or other output streams) in a standard format. Downstream consumers can then transform and visualize this in any third-party tool.

To me, the second approach is preferred for two reasons:

it decouples the core library from the visualization requirements. (additionally, training on image/video/text datasets will require visualizing actual image or video samples as training progresses)
it can leverage existing toolkits (such as Tensorboard)

I would like to hear your thoughts on this topic

Thanks!
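The second approach could look roughly like the sketch below: each training event is written as one line of EDN to any writer, and a downstream consumer parses the lines back into maps. All names here (`log-event!`, `epoch-event`, `enabled-metrics`) are hypothetical and not part of cortex; EDN is only one possible "standard format".

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(defn log-event!
  "Append one event map as a single EDN line to any java.io.Writer
  (file, socket, StringWriter)."
  [^java.io.Writer out event]
  (.write out (str (pr-str event) "\n"))
  (.flush out))

(defn read-events
  "Parse newline-delimited EDN back into a vector of maps."
  [s]
  (mapv edn/read-string (str/split-lines s)))

;; Instrumentation is switched on or off by editing this set (assumed design).
(def enabled-metrics #{:loss :accuracy})

(defn epoch-event
  "Build the event for one epoch, keeping only the enabled metrics."
  [epoch metrics]
  {:epoch epoch
   :metrics (select-keys metrics enabled-metrics)})

;; Usage: a StringWriter stands in for a log file.
(def sink (java.io.StringWriter.))
(log-event! sink (epoch-event 1 {:loss 0.52 :accuracy 0.81 :weight-norm 3.2}))
(log-event! sink (epoch-event 2 {:loss 0.40 :accuracy 0.86 :weight-norm 3.4}))
```

Because the core library only emits plain data, a converter to TensorBoard's format (or any other tool) can live entirely outside it.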

Defined save-file-format

Cortex will need a defined save format that we believe has a decent chance for forward compatibility. We want something that is very minimal but just a bit vetted because I would like to promise that from 1.0 any data created with cortex will be useable indefinitely with future versions of cortex. This is a heavy burden if it is not treated with some respect.

In the cortex.suite, can we add support for cpu backend in training?

With the following simple patch, I've managed to run http://gigasquidsoftware.com/blog/2016/12/27/deep-learning-in-clojure-with-cortex/ though I'm not sure I've done it correctly. Can we add support for backend selection in suite?

diff --git a/suite/src/cortex/suite/train.clj b/suite/src/cortex/suite/train.clj
index 3a9e9b1..63ca992 100644
--- a/suite/src/cortex/suite/train.clj
+++ b/suite/src/cortex/suite/train.clj
@@ -4,6 +4,7 @@
             [think.compute.nn.train :as train]
             [think.compute.nn.description :as compute-desc]
             [think.compute.nn.cuda-backend :as gpu-compute]
+            [think.compute.nn.cpu-backend :as cpu-compute]
             [think.resource.core :as resource]
             [clojure.java.io :as io]
             [think.compute.optimise :as opt]
@@ -60,6 +61,12 @@ in initial description.  Else returns the initial description"
                                          batch-size))
 
 
+(defn build-cpu-network
+  [network-description batch-size]
+  (compute-desc/build-and-create-network network-description
+                                         (cpu-compute/create-cpu-backend :float)
+                                         batch-size))
+
 (defn backup-trained-network
   [network-filestem]
   (let [network-filename (str network-filestem ".nippy")]
@@ -105,13 +112,15 @@ we continue to train forever.
   [dataset initial-description input-labels output-labels-and-loss
    & {:keys [batch-size epoch-count
              network-filestem best-network-fn
-             optimiser loss-compare-fn]
+             optimiser loss-compare-fn
+             backend]
       :or {batch-size 128
            network-filestem "trained-network"
            optimiser (opt/adam)
            loss-compare-fn (fn [new-loss old-loss]
                              (< (first new-loss)
-                                (first old-loss)))}}]
+                                (first old-loss)))
+           backend :cpu}}]
   (resource/with-resource-context
     (let [network-filename (str network-filestem ".nippy")
           ;;Backup the trained network if we haven't already
@@ -136,7 +145,9 @@ we continue to train forever.
           cv-labels (mapv vec cv-labels)
           best-network-atom (atom network-desc-loss-map)
           network-description (:network-description network-desc-loss-map)
-          network (build-gpu-network network-description batch-size)
+          network (if (= backend :cpu)
+                    (build-cpu-network network-description batch-size)
+                    (build-gpu-network network-description batch-size))
           train-sequence (train/create-train-epoch-sequence network optimiser dataset
                                                             input-labels output-labels-and-loss)
           epoch-processor (partial per-epoch-eval-training-network
@@ -154,16 +165,20 @@ we continue to train forever.
   "Given a single-output network description and a dataset with the keys
 :data and :labels produced set of inferences, answers, and the observations
 used for both along with the original dataset."
-  [dataset network-description & {:keys [batch-size batch-type input-labels output-labels]
+  [dataset network-description & {:keys [batch-size batch-type input-labels output-labels
+                                         backend]
                                   :or {batch-size 128
                                        batch-type :holdout
                                        input-labels [:data]
-                                       output-labels [:labels]}}]
+                                       output-labels [:labels]
+                                       backend :cpu}}]
   (resource/with-resource-context
     (let [[cv-input cv-labels] (ds/batch-sequence->column-groups
                                 dataset batch-size batch-type
                                 [input-labels output-labels])
-          network (build-gpu-network network-description batch-size)
+          network (if (= backend :cpu)
+                    (build-cpu-network network-description batch-size)
+                    (build-gpu-network network-description batch-size))
           inferences (train/run network dataset input-labels :batch-type batch-type)]
       {:dataset dataset
        :labels cv-labels

Fix license/copyright on cortex/dataset and examples/dropout

Subprojects cortex/dataset and examples/dropout still have the standard boilerplate for the license, with

Copyright © 2016 FIXME

This applies to both the README and the project.clj files. While the license on the project files is stated as Eclipse, it'd likely look more official if the copyright wasn't the default.

multi label fancy loss

https://papers.nips.cc/paper/5969-sparse-local-embeddings-for-extreme-multi-label-classification.pdf

Let's get into this, figure out what extensions we need in cortex to make it work well and make it happen.

Readme link to design doc is broken

Spatial Dropout

I've observed in Keras that SpatialDropout2D seems to have a fairly large impact on regularization effectiveness for convolutional layers with less impact on training time. I haven't dug deeply yet, but the paper describing spatial dropout can be found here. The details on Spatial Dropout are in section 3.2.
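For readers unfamiliar with the technique: spatial dropout zeroes entire feature maps rather than individual activations. The sketch below illustrates that idea on plain nested vectors, assuming a channels-first layout and inverted-dropout scaling; it is not cortex or Keras code, and `spatial-dropout` is a hypothetical name.

```clojure
(defn spatial-dropout
  "activations: vector of feature maps, each a vector of row vectors.
  Drops each whole feature map with probability p and scales the
  survivors by 1/(1-p) so the expected activation is unchanged."
  [^java.util.Random rng p activations]
  (let [scale (/ 1.0 (- 1.0 p))]
    (mapv (fn [feature-map]
            ;; One random draw per feature map, not per element.
            (let [factor (if (>= (.nextDouble rng) p) scale 0.0)]
              (mapv (fn [row] (mapv #(* factor %) row)) feature-map)))
          activations)))

;; Usage: three 2x2 feature maps, drop probability 0.5.
(def acts [[[1.0 2.0] [3.0 4.0]]
           [[5.0 6.0] [7.0 8.0]]
           [[9.0 1.0] [2.0 3.0]]])
(def dropped (spatial-dropout (java.util.Random. 42) 0.5 acts))
```

Each map in `dropped` is either all zeros or the original map multiplied by 2.0.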

OOM while building VGG-like net

I am trying to load the pre-trained VGG16 net from keras, add two linear->relu 4096 layers and replace the input layer. When trying to build the description with a batch size of 4, I get the following error:

Exception CUDA Error: out of memory think.compute.cuda-driver/eval26130/fn--26139 (cuda_driver.clj:349)

Example code that reproduces this error for me:

(defn test-vgg
  []
  (-> (concat (->> "resources/vgg16_combined.h5"
                   (keras/load-combined-hdf5-file)
                   (:model)
                   (drop 1)
                   (concat (desc/input 192 192 3)))
              [(desc/linear->relu 4096)
               (desc/dropout 0.7)
               (desc/linear->relu 4096)
               (desc/dropout 0.7)
               (desc/linear->softmax 40)])
      (compute-desc/build-and-create-network (cuda-backend/create-backend :float) 4)))

Full Output:

loading weights/bias for :conv1_1
Reshaping weights for :conv1_1
loading weights/bias for :conv1_2
Reshaping weights for :conv1_2
loading weights/bias for :conv2_1
Reshaping weights for :conv2_1
loading weights/bias for :conv2_2
Reshaping weights for :conv2_2
loading weights/bias for :conv3_1
Reshaping weights for :conv3_1
loading weights/bias for :conv3_2
Reshaping weights for :conv3_2
loading weights/bias for :conv3_3
Reshaping weights for :conv3_3
loading weights/bias for :conv4_1
Reshaping weights for :conv4_1
loading weights/bias for :conv4_2
Reshaping weights for :conv4_2
loading weights/bias for :conv4_3
Reshaping weights for :conv4_3
loading weights/bias for :conv5_1
Reshaping weights for :conv5_1
loading weights/bias for :conv5_2
Reshaping weights for :conv5_2
loading weights/bias for :conv5_3
Reshaping weights for :conv5_3
Using file input data
Reshaping output for: :conv1_1-activation [224 224 64] 3211264 :Activation
Reshaping output for: :conv1_2-activation [224 224 64] 3211264 :Activation
Reshaping output for: :maxpooling2d_6 [112 112 64] 802816 :MaxPooling2D
Reshaping output for: :conv2_1-activation [112 112 128] 1605632 :Activation
Reshaping output for: :conv2_2-activation [112 112 128] 1605632 :Activation
Reshaping output for: :maxpooling2d_7 [56 56 128] 401408 :MaxPooling2D
Reshaping output for: :conv3_1-activation [56 56 256] 802816 :Activation
Reshaping output for: :conv3_2-activation [56 56 256] 802816 :Activation
Reshaping output for: :conv3_3-activation [56 56 256] 802816 :Activation
Reshaping output for: :maxpooling2d_8 [28 28 256] 200704 :MaxPooling2D
Reshaping output for: :conv4_1-activation [28 28 512] 401408 :Activation
Reshaping output for: :conv4_2-activation [28 28 512] 401408 :Activation
Reshaping output for: :conv4_3-activation [28 28 512] 401408 :Activation
Reshaping output for: :maxpooling2d_9 [14 14 512] 100352 :MaxPooling2D
Reshaping output for: :conv5_1-activation [14 14 512] 100352 :Activation
Reshaping output for: :conv5_2-activation [14 14 512] 100352 :Activation
Reshaping output for: :conv5_3-activation [14 14 512] 100352 :Activation
Reshaping output for: :maxpooling2d_10 [7 7 512] 25088 :MaxPooling2D

Exception CUDA Error: out of memory  think.compute.cuda-driver/eval26130/fn--26139 (cuda_driver.clj:349)
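As a rough aid for sizing the problem, the element counts printed in the log can be turned into a lower bound on memory. The sketch assumes 4-byte floats, a batch size of 4, and a 7x7x512 (25088-element) input to the first added linear layer, as printed above. It deliberately ignores gradients, optimizer state, the remaining weights, and any library workspace, so real usage will be higher.

```clojure
;; Per-layer output element counts, copied from the log above.
(def output-sizes
  [3211264 3211264 802816
   1605632 1605632 401408
   802816 802816 802816 200704
   401408 401408 401408 100352
   100352 100352 100352 25088])

(def bytes-per-float 4)
(def batch-size 4)

(defn activation-bytes
  "Bytes needed to hold one forward pass of layer outputs for a batch."
  [sizes batch]
  (* (reduce + sizes) batch bytes-per-float))

(defn mib [n] (/ n 1048576.0))

;; Forward activations of the convolutional stack for one batch.
(def conv-activations (activation-bytes output-sizes batch-size))

;; Weight matrix of the first added linear layer: 25088 inputs x 4096 outputs.
(def first-linear-weights (* 25088 4096 bytes-per-float))
```

Under these assumptions the activations come to roughly 230 MiB and the first linear layer's weights to 392 MiB, before any gradient or optimizer buffers are counted.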

Make adding a new layer easier.

There is a tension between ease of extension for backends and ease of extension for layers.

(1) If things are organized such that each backend is implemented in a single file, then adding a new backend is easy: work from the code for the most similar backend and implement what's there in terms of the new backend. This has the downside that there is no one place to go to see the whole of the implementation of any given layer type.
(2) If things are organized such that each layer is implemented in a single file, then adding a new layer is easy: work from the code of the most similar layer and implement what you need for your new layer type. This has the downside that all the code related to a specific backend is spread throughout the layer files.

In discussion we have decided to prefer (2) above. A machine learner using cortex is more likely to want to implement a new layer than a new backend. In general, we would like to move the implementation of cortex in that direction, simplifying the process of experimenting with new layer types.
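One way the layer-per-file organization could be expressed in Clojure is a multimethod that dispatches on both layer type and backend, so a single file holds every backend's version of one layer. This is only a sketch under that assumption; `forward`, the `:cpu`/`:gpu` keywords, and the layer map shape are hypothetical and not cortex's actual API.

```clojure
(defmulti forward
  "Dispatch on [layer-type backend] so one file can hold every backend's
  implementation of a given layer."
  (fn [backend layer input] [(:type layer) backend]))

;; --- everything below would live in a single relu.clj ---
(defmethod forward [:relu :cpu]
  [_ _ input]
  (mapv #(max 0.0 %) input))

(defmethod forward [:relu :gpu]
  [_ _ input]
  ;; Placeholder: a real GPU backend would launch a kernel here.
  (mapv #(max 0.0 %) input))

;; Unknown layer/backend pairs fail with a descriptive error.
(defmethod forward :default
  [backend layer _]
  (throw (ex-info "No implementation for layer/backend pair"
                  {:layer (:type layer) :backend backend})))
```

Adding a new layer then means adding one file of `defmethod` forms, at the cost that a new backend must touch every layer file.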

Training on master takes longer than 0.5.0

At 0.5.0:

harold@gibson:~/src/cortex$ rm -rf ~/.cortex
harold@gibson:~/src/cortex$ git checkout 00f171f665f2c2778421300d384618728dd454f3
HEAD is now at 00f171f... Release 0.5.0
harold@gibson:~/src/cortex$ cd compute
harold@gibson:~/src/cortex/compute$ time lein test think.compute.nn.train-test

  [snip ...]

Ran 4 tests containing 4 assertions.
0 failures, 0 errors.

real	0m49.554s
user	1m15.276s
sys	3m3.520s

At master

harold@gibson:~/src/cortex$ rm -rf ~/.cortex/
harold@gibson:~/src/cortex$ git checkout master
Previous HEAD position was 00f171f... Release 0.5.0
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
harold@gibson:~/src/cortex$ git pull
Current branch master is up to date.
harold@gibson:~/src/cortex$ time lein test cortex.compute.nn.train-test

  [snip ...]

Ran 4 tests containing 4 assertions.
0 failures, 0 errors.

real	7m50.945s
user	11m48.408s
sys	45m20.232s

Confusion matrix in mnist example fails to update with cortex 0.9

Running the mnist example will fail to update the confusion matrix on the client (browser). I believe this works with 0.5.0 but I haven't specifically tracked it down.

Confusion when dataset item count isn't evenly divisible by batch number

Currently this results in dropping the last partial batch. This is not such a big deal when training, but it can be a very big deal when running inference.

I am not sure this should be solved in the engine, given that the dataset implementation can solve it. But we have had two teams confused by this (albeit on their first day of using cortex), so either an error needs to happen or we need to pad the input. Silently dropping input, especially during inference, is probably not the best answer.
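The three behaviours discussed here (drop, keep or pad, raise an error) map directly onto core Clojure sequence functions. The sketch below uses only `clojure.core`; it does not show how cortex itself batches, and `strict-batches` is a hypothetical helper.

```clojure
(def items (range 10))

;; partition silently drops the trailing partial batch.
(def dropped-tail (partition 4 items))

;; partition-all keeps it as a short batch.
(def short-tail (partition-all 4 items))

;; Supplying a pad collection fills the last batch to full size.
(def padded-tail (partition 4 4 (repeat :pad) items))

(defn strict-batches
  "Alternative: fail loudly instead of dropping data."
  [batch-size coll]
  (let [v (vec coll)]
    (when-not (zero? (rem (count v) batch-size))
      (throw (ex-info "Item count is not divisible by batch size"
                      {:item-count (count v) :batch-size batch-size})))
    (partition batch-size v)))
```

With padding, the consumer must remember how many trailing results to discard; with the strict version, the caller is forced to decide up front.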

@harold @charlesg3 @CalderBot

tests for serialization of parameters

Hi,

I'd like to add tests for serialization of parameter values in a trained network (the code is in cortex/suite/src/cortex/suite/train.clj).
May I presume that the right place to add them would be cortex/suite/test/cortex/suite/train_test.clj? If someone else is working on adding tests for the same, I could help/collaborate.

thanks!

Unable to access s3p://thinktopic.jars/snapshots/

It seems that the repos s3p://thinktopic.jars/snapshots/ and s3p://thinktopic.jars/releases/ are password protected. This causes lein test to not work.

Retrieving thinktopic/cortex/0.2.1-SNAPSHOT/cortex-0.2.1-SNAPSHOT.pom from snapshots
Oct 13, 2016 9:33:21 PM org.jets3t.service.impl.rest.httpclient.RestS3Service performRequest
WARNING: Error Response: GET '/snapshots%2Fthinktopic%2Fcortex%2F0.2.1-SNAPSHOT%2Fcortex-0.2.1-SNAPSHOT.pom' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Date: Thu, 13 Oct 2016 19:33:20 GMT, Content-Type: , User-Agent: JetS3t/0.7.1 (Mac OS X/10.12; x86_64; en; JVM 1.8.0_102), Host: thinktopic.jars.s3.amazonaws.com], Response Headers: [x-amz-request-id: 25A9111D05BA29A6, x-amz-id-2: abnaUX0bYsEzNHkCGICDEaSbmkqJsUdaUUiHWZyADJYJrVmvK3B5sWYGohE+/HoedwVuVZwNiAM=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 13 Oct 2016 19:33:22 GMT, Server: AmazonS3]
Oct 13, 2016 9:33:21 PM org.jets3t.service.impl.rest.httpclient.RestS3Service performRequest
SEVERE: Request Failed.
org.jets3t.service.S3ServiceException: S3 Error Message. GET '/snapshots%2Fthinktopic%2Fcortex%2F0.2.1-SNAPSHOT%2Fcortex-0.2.1-SNAPSHOT.pom' on Host 'thinktopic.jars.s3.amazonaws.com' @ 'Thu, 13 Oct 2016 19:33:22 GMT' -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>25A9111D05BA29A6</RequestId><HostId>abnaUX0bYsEzNHkCGICDEaSbmkqJsUdaUUiHWZyADJYJrVmvK3B5sWYGohE+/HoedwVuVZwNiAM=</HostId></Error>

CUDA Out of Memory Error when building description using first pair of vgg16 conv layers and weights

I am trying to build a network description using the first couple of vgg16 conv+pooling layers, loading pre-trained weights and setting learning attenuation to 0. This generates an out of memory error.

The error can be reproduced as follows:

(def keras-model
  (memoize
   #(keras/load-sidecar-and-verify "/mnt/thinkdrive/cortex-debug/keras/vgg_configuration.json"
                                   "/mnt/thinkdrive/cortex-debug//keras/vgg_16_weights.h5"
                                   "/mnt/thinkdrive/cortex-debug/keras/vgg_16_output.h5")))

;; Reading from pre-trained keras model
(defn create-bottom-layers
  [num-keras-layers]
  (concat (desc/input image-size image-size channels)
          (->> (keras-model)
               (drop 1) ;;drop input layer
               (take num-keras-layers))))

(defn create-top-layers
  [num-classes]
  [(desc/dropout 0.5)
   (desc/linear->relu 128)
   (desc/linear->softmax num-classes)])

;; Combining FC layers with lower level pre-trained keras layers
(defn create-complete-description
  [num-keras-layers num-classes]
  (let [bottom-layers (->> (create-bottom-layers num-keras-layers)
                           (mapv #(assoc % :learning-attenuation 0.0)))]
    (concat bottom-layers (create-top-layers num-classes))))

If you call (create-complete-description 5 2), this will give you a portion of the vgg-net description that contains a set of conv (x2) + pooling layers.

Docker support

#143

Almost there, @charlesg3 had a few more things he wanted to add.

UnsatisfiedLinkError: no hdf5_cpp in java.library.path

I'm getting this error and UnsatisfiedLinkError: no jnihdf5, no jnicuda when trying to use Cortex. I'm on a mac with cudann 8.0 installed.

I see the comment in the README but believe cudann is installed correctly, as it works fine with Keras/Theano/TensorFlow.

I see in #24 there is a mention of another branch with a different dependency but that branch does not seem to be available any longer.

Is there anything I can do to see if my installation is in fact correct? What was the difference in that "cuda-8.0" branch?

Thanks.

Julio

Cortex projects are bound to a specific cuda library version

This has bothered me for some time and there isn't too much I can do about it but here we go:

The runtime dependency on the cuda libraries is not ideal the way it is structured.

If the user does not have cuda installed the entire system now fails.
If the user does not have cudnn installed the entire system now fails.
If the user has a newer (or older) version of cuda installed the entire system fails at startup, regardless of the fact that we aren't using any blisteringly new features of cuda.

What people have done for many years with opengl is they bind to the actual shared library dynamically. They then look for the symbols they need in the shared library and those symbols along with the version of opengl detected (with an API call from the library) then dictates their path forward. They dynamically switch rendering paths depending on the feature set available in opengl and often times the specific hardware features available on the card.

Because the binding is dynamic, the program will still start if opengl isn't present, but will exit with a nice error message. Also, because the binding is dynamic and they search for specific symbols in the shared library, they can have one wrapper library that binds to several versions of opengl and just exposes the symbols it finds.

This is the ideal situation. Currently in cortex, for instance, you have to change the project.clj in order to bind to a different version of cuda, despite the fact that we aren't using any new features in that version, so from a dynamic linking perspective this is unnecessary. This is incidental complexity that will come back to bite at some point.

The right answer here is to use an intermediate library that can do dynamic loading across the different platforms and find the symbols. You then set global pointers to the symbol's address if it is found, or leave them null if it is not (see the GL extension wrangler, GLEW: http://glew.sourceforge.net/).

Then we at least allow the program to decide if cuda is a necessary dependency, and furthermore if particular versions of cuda (and cudnn, npp, cublas) are necessary dependencies.
What is stopping me from going there is the lack of a proper cross-platform build system where I can build a library for at least linux, mac, and windows. That and the time required to actually do this.

There may be a solution in the dynamic linking facilities now present in Java but that path needs to be researched. To do this with javacpp we would need to build a small wrapper library that did the dynamic binding to the shared libraries and the symbols in the shared libraries.

In any case, a best-in-class CUDA development system would not have this issue. I suspect the same type of issue would be present should we decide to put effort into opencl.
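A much coarser version of this idea can be sketched on the JVM alone: probe for a native library at runtime and let the program choose a backend, instead of failing at startup. This only checks whether a whole library loads; it does not do the per-symbol binding described above. The function names and the library names a caller would pass are assumptions for illustration.

```clojure
(defn native-lib-available?
  "True if the JVM can load the named native library from java.library.path."
  [lib-name]
  (try
    (System/loadLibrary lib-name)
    true
    (catch UnsatisfiedLinkError _ false)))

(defn choose-backend
  "Pick :gpu only when every required library loads; otherwise fall back
  to :cpu. The library names passed in are the caller's assumption."
  [required-libs]
  (if (every? native-lib-available? required-libs)
    :gpu
    :cpu))

;; Usage with a deliberately nonexistent library name.
(def backend (choose-backend ["no-such-native-lib-for-this-sketch"]))
```

With a check like this, a missing or mismatched library becomes a decision the program makes rather than a link error at load time.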

suite/train - more per-epoch functional user overrides

https://github.com/thinktopic/cortex/blob/f6951075a83076bec11e6520202e913c0541423a/suite/src/cortex/suite/train.clj#L176

In talking with @charlesg3 there may be an opportunity here for take-while and a user-provided function that could run extra tests, do think.peer things or re-implement the old train-until-error-stabilizes. Interested to hear others' thoughts here.
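A minimal sketch of the take-while idea, assuming training is exposed as a lazy sequence of per-epoch result maps with a `:loss` key. That shape, and the names `train-while` and `loss-improving?`, are hypothetical and not taken from the linked code.

```clojure
(defn train-while
  "epochs: lazy seq of per-epoch result maps.
  continue?: user function from an epoch result to truthy/falsey."
  [continue? epochs]
  (take-while continue? epochs))

(defn loss-improving?
  "Example user predicate: keep going while loss improves on the best
  seen so far by at least min-delta (a train-until-stable rule)."
  [min-delta]
  (let [best (atom Double/POSITIVE_INFINITY)]
    (fn [{:keys [loss]}]
      (let [improved? (< loss (- @best min-delta))]
        (when improved? (reset! best loss))
        improved?))))

;; Usage with simulated epoch results standing in for real training.
(def simulated-epochs
  [{:epoch 1 :loss 1.0}
   {:epoch 2 :loss 0.6}
   {:epoch 3 :loss 0.59}
   {:epoch 4 :loss 0.3}])

(def kept (train-while (loss-improving? 0.05) simulated-epochs))
```

Here training stops at epoch 3, because 0.59 is not at least 0.05 better than 0.6. Since the sequence is lazy, epoch 4 would never be computed in a real run.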

Running a net with `float-array` inputs when the context's datatype is `:double` sadly hits a slow path

@cnuernber --- Fast-path marshalling code for this exists, but is apparently not wired up.

Noting this here, since this can make cortex appear a lot slower than it is. 😄

Importing old networks trained using cortex 0.3* to 0.5*

If you have a saved network that was trained using 0.3*, you can use this function to import it to 0.5* for inference or further training/fine-tuning.

(defn import-older-models
  [network]
  (when-let [network-desc (:network-description network)]
    (-> (mapv (fn [{:keys [type] :as layer}]
                (cond
                  (= type :convolutional) (assoc layer :dimension-op (layers/default-layer-type-dimension-op :convolutional))
                  (= type :max-pooling) (assoc layer :dimension-op (layers/default-layer-type-dimension-op :max-pooling))
                  :else layer))
              network-desc)
      network/build-network
      traverse/auto-bind-io)))

Using gaussian dropout creates an incorrect network description

This incorrect description causes the process to crash during training and serialization

                   RT.java: 1464  clojure.lang.RT/uncheckedLongCast
          convolution.cljc:   66  cortex.nn.impl.layers.convolution$get_padded_strided_dimension/invokeStatic
          convolution.cljc:   63  cortex.nn.impl.layers.convolution$get_padded_strided_dimension/invoke
           description.clj:  239  cortex.nn.description/eval25237/fn
              MultiFn.java:  233  clojure.lang.MultiFn/invoke
           description.clj:  140  cortex.nn.description/recurse-build-desc/fn
             protocols.clj:  167  clojure.core.protocols/fn
             protocols.clj:  124  clojure.core.protocols/fn
             protocols.clj:   19  clojure.core.protocols/fn/G
             protocols.clj:   31  clojure.core.protocols/seq-reduce
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   13  clojure.core.protocols/fn/G
                  core.clj: 6545  clojure.core/reduce
                  core.clj: 6527  clojure.core/reduce
           description.clj:  138  cortex.nn.description/recurse-build-desc
           description.clj:  136  cortex.nn.description/recurse-build-desc
           description.clj:  338  cortex.nn.description/build-full-network-description
           description.clj:  333  cortex.nn.description/build-full-network-description
           description.clj:  111  think.compute.nn.description/build-and-create-network
           description.clj:  109  think.compute.nn.description/build-and-create-network
             inference.clj:   22  cortex.suite.inference/infer-n-observations/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  646  clojure.core/apply
                  core.clj: 1881  clojure.core/with-bindings*
                  core.clj: 1881  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
             inference.clj:   12  cortex.suite.inference/infer-n-observations
             inference.clj:   10  cortex.suite.inference/infer-n-observations
               RestFn.java:  529  clojure.lang.RestFn/invoke
             inference.clj:   37  cortex.suite.inference/classify-one-observation
             inference.clj:   32  cortex.suite.inference/classify-one-observation
               RestFn.java:  500  clojure.lang.RestFn/invoke
            classifier.clj:  137  image-type-classifier.classifier/label-one
            classifier.clj:  129  image-type-classifier.classifier/label-one
                      REPL:   13  image-type-classifier.classifier/eval48777
                      REPL:   13  image-type-classifier.classifier/eval48777
             Compiler.java: 6927  clojure.lang.Compiler/eval
             Compiler.java: 6890  clojure.lang.Compiler/eval
                  core.clj: 3105  clojure.core/eval
                  core.clj: 3101  clojure.core/eval
                  main.clj:  240  clojure.main/repl/read-eval-print/fn
                  main.clj:  240  clojure.main/repl/read-eval-print
                  main.clj:  258  clojure.main/repl/fn
                  main.clj:  258  clojure.main/repl
                  main.clj:  174  clojure.main/repl
               RestFn.java: 1523  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   87  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  646  clojure.core/apply
                  core.clj: 1881  clojure.core/with-bindings*
                  core.clj: 1881  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   85  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:   55  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  222  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
    interruptible_eval.clj:  190  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
                  AFn.java:   22  clojure.lang.AFn/run
   ThreadPoolExecutor.java: 1142  java.util.concurrent.ThreadPoolExecutor/runWorker
   ThreadPoolExecutor.java:  617  java.util.concurrent.ThreadPoolExecutor$Worker/run
               Thread.java:  745  java.lang.Thread/run

`concat` layer example

From this google group post: https://groups.google.com/forum/#!topic/clojure-cortex/AtE-kAQCO8Y

We could have some examples of using concat (and the other layer graph operations) as tests with some .md descriptions in the docs referencing the tests.

The tests could go into experiment, or perhaps into the train tests (or layer tests? but maybe that's too low level).

@charlesg3 @cnuernber --- relevant to your interests.

Is there any documentation available for Cortex?

GPU optimization meditation

https://arxiv.org/abs/1404.5997

automatic differentiation?

Is there any planned support for automatic differentiation? I took CS224D (Deep Learning/NLP) at Stanford and this seemed to be a given in any framework discussions we had on TensorFlow vs Theano vs. etc.

[documentation] Saved network format

Make a document explaining the output of execute/train / compute-binding/save-to-network.

Explain:
Why do these functions return both a network and an optimizer?

What is the structure of the returned network (it's a map with ... keys, etc...)

Maybe: Which graph functions might be useful to call on a saved network? (e.g., for stripping layers, fine-tuning, etc...)

SEGV while training on VGG net

I was training a network which consisted of VGG16 with two 4096 FC layers at the end and feeding the network 192x192 images and got the following error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbf69cd0d50, pid=30098, tid=0x00007fbf4fcfc700
#
# JRE version: OpenJDK Runtime Environment (8.0_102-b14) (build 1.8.0_102-8u102-b14.1-2-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libcuda.so.1+0x1a8d50]
#
# Core dump written. Default location: /home/charles/src/think.cars/core or core.30098
#
# An error report file with more information is saved as:
# /home/charles/src/think.cars/hs_err_pid30098.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Research feeding a net data and pulling data off the net for inference.

The intent of this issue is to characterize the problem while leaving the choice of implementation strategies wide open. If the problem is characterized as to arbitrarily narrow the choice of implementations then it is mis-characterized.

We would like to reduce the time it takes to put data onto the GPU and pull it off. We would also like a set of standard automatic augmentations that can ideally be performed inline with loading the image (crop, flip, translate, scale, rotate, potentially color space transformation). Inline means during training and not as a preprocessing step; we would like our networks to never see the exact same image twice during training.
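To make "inline augmentation" concrete, here is a minimal sketch in plain Clojure. It assumes an image is a vector of row vectors, which is an illustrative representation only and not how cortex stores images. The function names are hypothetical.

```clojure
;; Assumption: an image is a vector of rows, each row a vector of pixel values.

(defn flip-horizontal
  "Mirror each row left to right."
  [image]
  (mapv #(vec (reverse %)) image))

(defn crop
  "Return the h-by-w window whose top-left corner is at (top, left)."
  [image top left h w]
  (mapv #(subvec % left (+ left w))
        (subvec image top (+ top h))))

(defn random-augment
  "Take a random h-by-w crop and, with probability 0.5, flip it horizontally.
  rng is a java.util.Random so a seed can make runs reproducible."
  [^java.util.Random rng image h w]
  (let [rows    (count image)
        cols    (count (first image))
        top     (.nextInt rng (inc (- rows h)))
        left    (.nextInt rng (inc (- cols w)))
        cropped (crop image top left h w)]
    (if (.nextBoolean rng)
      (flip-horizontal cropped)
      cropped)))
```

Calling `random-augment` each time a sample is loaded, rather than once up front, is what keeps the network from seeing the exact same image twice.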

Most GPU-based neural networks tend not to reach full utilization of the GPU, at least in part because getting data onto and off of the GPU effectively throttles training and inference.

Because we have a few people working in networks that are doing image analysis and it seems this will continue for the near future, it would be good to invest some time building out tools and a system to use to do this.

Setting some baselines, assume 10,000 images of 256 by 256. An output of 1000 float/double numbers.

If we can get inline loading of images working on a normal computer (meaning we do not need to write out a specialized file) and get through 10,000 images fast enough, we should be able to avoid writing out a specialized file. So the first step is: can you load 10,000 images in under about 10 seconds on the CPU? Ideally under 5, because we would also like to apply some elementary operations in order to augment datasets, so having another 5 seconds to apply random loss-invariant transformations would be ideal.
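As a rough sanity check on those baselines, the arithmetic can be sketched as follows. The channel count and bytes per value are not stated in the issue, so they are left as parameters; the function names are hypothetical.

```clojure
(def num-images 10000)
(def width 256)
(def height 256)

(defn dataset-bytes
  "Raw size of the whole dataset for a given channel count and value width."
  [channels bytes-per-value]
  (* num-images width height channels bytes-per-value))

(defn per-image-budget-ms
  "Milliseconds available per image if all images must load in total-seconds."
  [total-seconds]
  (/ (* 1000.0 total-seconds) num-images))

;; One byte per pixel, one channel: 655,360,000 bytes (about 0.66 GB).
(dataset-bytes 1 1)
;; Three float channels: 7,864,320,000 bytes (about 7.9 GB).
(dataset-bytes 3 4)
;; 10 s target leaves 1.0 ms per image; 5 s leaves 0.5 ms.
(per-image-budget-ms 10)
```

So the 10-second target means loading roughly 1,000 images per second on the CPU, and the 5-second target doubles that.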

You could also write these images into a memory-mapped file (of bytes or floats) and load the file. There is a solid chance that OpenCV implements the ideal transformations in considerably less time than we can implement them in Java, but there is also a chance that is false.

The worst scenario is to write out a binary file post-transformations. This means that our nets could potentially learn the specific transformations which we certainly do not want.

Then we need to shuffle data onto the GPU into a coalesced buffer for a batch size of, say, 64-128. We then need a similar system to shuffle the 1000 doubles off the GPU to the CPU with the same batch size, and perform some analysis on those vectors (like generating loss, softmax accuracy, etc.).
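The CPU side of the "coalesced buffer" idea can be sketched like this: each batch is packed into one contiguous float array so that it could be handed to the device in a single transfer. The actual host-to-GPU copy is out of scope here, and the function names are hypothetical rather than cortex API.

```clojure
(defn coalesce-batch
  "Copy a sequence of equally sized samples (each a seq of numbers) into one
  contiguous float array, sample after sample."
  [samples]
  (let [sample-size (count (first samples))
        ^floats buffer (float-array (* sample-size (count samples)))]
    (doseq [[i sample] (map-indexed vector samples)
            [j value]  (map-indexed vector sample)]
      (aset buffer (int (+ (* i sample-size) j)) (float value)))
    buffer))

(defn batches
  "Split samples into batches of batch-size and coalesce each one.
  The final batch may hold fewer samples."
  [batch-size samples]
  (map coalesce-batch (partition-all batch-size samples)))
```

Because `batches` is lazy, the next batch can be prepared on the CPU while the previous one is in use, which is the kind of overlap this issue is after.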

[documentation] Setup Codox with Travis and publish to cortex.ml

Auto-generated api documentation would be great, and would be a good seed for the cortex.ml site.

Clojure 1.9 support?

Cortex deals with a lot of complex datastructures that we could use some help with. I would like to start a discussion on some reasonable strategy to make these datastructures a bit less opaque which means at the very least an informal schema with some definitions. If we go there then why not use spec and think through the datastructures and their implications a bit?
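As a starting point for that discussion, a spec for a network description might look something like the sketch below. It assumes Clojure 1.9 or later for `clojure.spec.alpha`. The keys `:type` and `:output-size` and the overall shape are hypothetical and are not the actual cortex data structures.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical keys for illustration; not the real cortex layer format.
(s/def ::type keyword?)
(s/def ::output-size pos-int?)

;; A layer must name its type; the output size is optional.
(s/def ::layer (s/keys :req-un [::type] :opt-un [::output-size]))

;; A network description is a non-empty vector of layers.
(s/def ::network-description
  (s/coll-of ::layer :kind vector? :min-count 1))

(defn valid-description?
  "True when desc conforms to the sketch spec above."
  [desc]
  (s/valid? ::network-description desc))
```

Even an informal spec like this gives `s/explain` something to report when a description is malformed, which already makes the structures less opaque.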

This is marked for 1.0 because this process will help greatly with the longevity of the software.

Sulong for native interop?

(posted similar message at uncomplicate/neanderthal#21)

Sulong (https://github.com/graalvm/sulong) will run on unmodified JVM in Java 9. It is the basis for native extension interop for JRuby/Truffle. Full paper at http://ssw.jku.at/General/Staff/ManuelRigger/VMIL16.pdf

They make several claims regarding their FFI: 1) 0-overhead calls into native, 2) inlining of native calls. This is in addition to their primary claim of interpreting LLVM bitcode at speeds comparable to gcc -O3.

I'm curious what people think for use cases like Cortex. It seems like a path to get progressive optimization down to the fastest achievable speeds.

The basic idea would be to develop a Truffle interpreter for Cortex primitives, starting with matrix computation but building up to the differentiable graph. The interpreter could specialize down into whatever combination of computation engines is available, down to generating LLVM code (check out their impl of the LLVM AST, it's pretty clean) and/or calling into native stuff.

cortex/local-install.sh is not in the repo

The readme says make sure to run cortex/local-install.sh before running an example, but that script doesn't exist in the repo.

Has this script been replaced by just lein install? If yes, we could update the readme. If not, we could dig it up from an older commit.

versioned models

Models at rest on disk should have a version key so that we can detect forward/backward compatibility issues automatically.
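A minimal sketch of what that could look like, assuming a saved model is a plain map. The `:version` key follows the proposal above; the version number and function names are hypothetical.

```clojure
;; Hypothetical current format version.
(def current-model-version 2)

(defn tag-version
  "Stamp a model map with the current format version before saving."
  [model]
  (assoc model :version current-model-version))

(defn check-version
  "Return the model if this code can read it, otherwise throw with details."
  [model]
  (let [v (:version model)]
    (cond
      (nil? v)
      (throw (ex-info "Model has no :version key" {:keys (keys model)}))

      (> v current-model-version)
      (throw (ex-info "Model was saved by a newer format version"
                      {:found v :supported current-model-version}))

      :else model)))
```

Older versions pass through here; a real implementation would likely dispatch on `:version` to run migrations before returning the model.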

@cnuernber @charlesg3 --- relevant to your interests.

Is `experiment` too general of a name?

Maybe cortex-experiment is a better name?

https://github.com/thinktopic/cortex/pull/141/files/242793ae249606c2f708e20baff15e607b273934#diff-e5046d343c37d85d4d736af438324a36R2

[example/documentation] Experimenting with new loss functions

A machine learner working with cortex reads about a loss function in a paper and wants to experiment with something similar.

A document and/or example explaining how to do this with cortex would be super-useful.

For example, this loss function:

From page 4 of the yolo paper. What is the correct way to do something like this in cortex?

That is just a single example, feel free to add others here.

Mistakenly deleted repository

@mikera, @rosejn

In a colossal bonehead move, instead of deleting a branch I deleted the cortex repository. This means we lost our issues.

If anyone has work in a branch, please reinstate the branch, and I am sorry. We did lose our issues permanently, along with a lot of good papers and some discussion.

Adam with Nesterov Momentum

See reference paper and Keras implementation.

Nesterov Adam should converge faster and behave better than the basic Adam optimizer for many cases.
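For orientation, here is a sketch of one Nesterov Adam update for a single scalar parameter. It uses a simplified form of the rule with a constant `beta1`; the reference paper and the Keras implementation additionally use a momentum schedule, which is omitted here. The default hyperparameters and names are assumptions, and this is not cortex code.

```clojure
(defn nadam-step
  "One simplified Nadam update. state is {:theta :m :v :t}; grad is the
  gradient of the loss with respect to theta at the current point."
  [{:keys [theta m v t]} grad
   {:keys [lr beta1 beta2 eps]
    :or   {lr 0.002 beta1 0.9 beta2 0.999 eps 1e-8}}]
  (let [t-next (inc t)
        ;; Exponential moving averages of the gradient and its square.
        m-next (+ (* beta1 m) (* (- 1.0 beta1) grad))
        v-next (+ (* beta2 v) (* (- 1.0 beta2) grad grad))
        ;; Bias corrections, as in Adam.
        bias1  (- 1.0 (Math/pow beta1 t-next))
        bias2  (- 1.0 (Math/pow beta2 t-next))
        m-hat  (/ m-next bias1)
        v-hat  (/ v-next bias2)
        ;; Nesterov look-ahead: blend corrected momentum with the current gradient.
        look   (+ (* beta1 m-hat) (/ (* (- 1.0 beta1) grad) bias1))]
    {:theta (- theta (/ (* lr look) (+ (Math/sqrt v-hat) eps)))
     :m m-next :v v-next :t t-next}))

(defn minimize-square
  "Run n Nadam steps on f(x) = x^2 starting from x0 and return the final x."
  [x0 n opts]
  (:theta
   (reduce (fn [state _] (nadam-step state (* 2.0 (:theta state)) opts))
           {:theta x0 :m 0.0 :v 0.0 :t 0}
           (range n))))
```

The only difference from plain Adam is the `look` term, which replaces the bias-corrected momentum in the final update.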

As Adam is sufficient for most training, this is lower priority for now.

Resetting loss "high score"?

I'm training a net now and it has a certain loss score which is saved with the net. This loss score is from a randomly generated holdout dataset.

When I start a new training session, presumably a new holdout dataset is created, perhaps giving a higher base error score. Even though the net improves its score with the new holdout, the improved net is never saved since the score may never reach that of the old holdout dataset.

It might be a good idea to generate a new high score when there is a new holdout dataset, or have a way to explicitly reset the high score.
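One way to express that, as a sketch: keep an identifier for the holdout set next to the best loss, and treat the stored score as stale whenever the identifier changes. The state shape and the names below are hypothetical, not how cortex currently tracks the score.

```clojure
(defn update-best
  "state is {:best-loss number-or-nil, :holdout-id any}.
  Returns the new state with :save? set to whether the net should be saved.
  A best loss measured on a different holdout set is ignored."
  [{:keys [best-loss holdout-id] :as state} current-holdout-id loss]
  (let [stale?   (not= holdout-id current-holdout-id)
        baseline (when-not stale? best-loss)]
    (if (or (nil? baseline) (< loss baseline))
      (assoc state
             :best-loss loss
             :holdout-id current-holdout-id
             :save? true)
      (assoc state :save? false))))
```

An explicit reset is then just starting from `{:best-loss nil}`.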

Is the Makefile in the mnist example necessary?

https://github.com/thinktopic/cortex/pull/141/files/242793ae249606c2f708e20baff15e607b273934#diff-84e11a2455b6087ed768a14fa211e4e5R84

Maybe a smarter shell script alone is good enough?

Error trying to run a trained network

When trying to run (net/run network data) I get the following error. This seems like I am using the wrong type, however network in the code is the result of a (compute-desc/build-and-create-network ...) call.

expected: nil                                                                                                                                 
  actual: java.lang.IllegalArgumentException: No implementation of method: :calc of protocol: #'cortex.nn.protocols/PModule found for class: think.compute.nn.layers.LayerList
 at clojure.core$_cache_protocol_fn.invokeStatic (core_deftype.clj:568)                                                                       
    clojure.core$_cache_protocol_fn.invoke (core_deftype.clj:560)                                                                             
    cortex.nn.protocols$eval20813$fn__20825$G__20802__20832.invoke (protocols.cljc:9)                                                         
    cortex.nn.core$calc.invokeStatic (core.cljc:22)                                                                                           
    cortex.nn.core$calc.invoke (core.cljc:19)                                                                                                 
    cortex.nn.network$run$fn__41321.invoke (network.cljc:18)                                                                                  
    clojure.core$mapv$fn__6953.invoke (core.clj:6627)                                                                                         
    clojure.lang.PersistentVector.reduce (PersistentVector.java:341)                                                                          
    clojure.core$reduce.invokeStatic (core.clj:6544)                                                                                          
    clojure.core$mapv.invokeStatic (core.clj:6618)                                                                                            
    clojure.core$mapv.invoke (core.clj:6618)                                                                                                  
    cortex.nn.network$run.invokeStatic (network.cljc:17)                                                                                      
    cortex.nn.network$run.invoke (network.cljc:15)                                                                                            
    think.cars.counting$scene_patches__GT_image_features.invokeStatic (counting.clj:111)                                                      
    think.cars.counting$scene_patches__GT_image_features.invoke (counting.clj:108)                                                            
    think.cars.counting$scene_patch__GT_image_features.invokeStatic (counting.clj:120)                                                        
    think.cars.counting$scene_patch__GT_image_features.invoke (counting.clj:117)                                                              
    think.scene_test$fn__36106.invokeStatic (scene_test.clj:18)                                                                               
    think.scene_test/fn (scene_test.clj:14)                                                                                                   
    clojure.test$test_var$fn__7983.invoke (test.clj:716)                                                                                      
    clojure.test$test_var.invokeStatic (test.clj:716)                                                                                         
    clojure.test$test_var.invoke (test.clj:707)                                                                                               
    think.scene_test$generate_image_features.invokeStatic (scene_test.clj:14)                                                                 
    think.scene_test$generate_image_features.invoke (scene_test.clj:14)                                                                       
    think.scene_test$eval43539.invokeStatic (form-init2125960026649907142.clj:1)

Note that I have also tried to accomplish the same task using the datasets interface, however if I set the training-split to be 0.0 (as this is a dataset for "running", I don't want any samples to be used for training), I get the following error:

ERROR in (generate-image-features) (cuda_backend.clj:540)                                                                                     
Uncaught exception, not in assertion.                                                                                                         
expected: nil                                                                                                                                 
  actual: java.lang.Exception: Cudnn error: CUDNN_STATUS_MAPPING_ERROR                                                                        
 at think.compute.nn.cuda_backend$eval28597$fn__28598.invoke (cuda_backend.clj:540)                                                           
    think.compute.nn.backend$eval27359$fn__27360$G__27348__27373.invoke (backend.clj:157)                                                     
    think.compute.nn.layers.Convolutional.calc (layers.clj:252)                                                                               
    cortex.nn.protocols$eval21429$fn__21434.invoke (protocols.cljc:161)                                                                       
    cortex.nn.protocols$eval21338$fn__21411$G__21321__21418.invoke (protocols.cljc:139)                                                       
    think.compute.nn.layers.LayerList/fn (layers.clj:453)                                                                                     
    think.compute.nn.layers$layer_list_forward$fn__27971.invoke (layers.clj:426)                                                              
    clojure.lang.PersistentVector.reduce (PersistentVector.java:341)                                                                          
    clojure.core$reduce.invokeStatic (core.clj:6544)                                                                                          
    clojure.core$reduce.invoke (core.clj:6527)                                                                                                
    think.compute.nn.layers$layer_list_forward.invokeStatic (layers.clj:425)                                                                  
    think.compute.nn.layers$layer_list_forward.invoke (layers.clj:421)                                                                        
    think.compute.nn.layers.LayerList.multi_calc (layers.clj:452)                                                                             
    think.compute.nn.train$run_config$fn__31846.invoke (train.clj:68)                                                                         
    clojure.core.protocols$fn__6755.invokeStatic (protocols.clj:167)                                                                          
    clojure.core.protocols/fn (protocols.clj:124)                                                                                             
    clojure.core.protocols$fn__6710$G__6705__6719.invoke (protocols.clj:19)                                                                   
    clojure.core.protocols$seq_reduce.invokeStatic (protocols.clj:31)                                                                         
    clojure.core.protocols$fn__6738.invokeStatic (protocols.clj:75)                                                                           
    clojure.core.protocols/fn (protocols.clj:75)                                                                                              
    clojure.core.protocols$fn__6684$G__6679__6697.invoke (protocols.clj:13)                                                                   
    clojure.core$reduce.invokeStatic (core.clj:6545)                                                                                          
    clojure.core$reduce.invoke (core.clj:6527)                                                                                                
    think.compute.nn.train$run_config.invokeStatic (train.clj:66)                                                                             
    think.compute.nn.train$run_config.invoke (train.clj:62)                                                                                   
    think.compute.nn.train$run_and_reshape.invokeStatic (train.clj:94)

Clearly I'm doing something wrong, which looks like I'm giving the wrong types to the train/run and net/run commands, but I'm not sure how exactly to fix this.

Weight normalization upgrade.

https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf

Need someone to get into this, test it, see where it works and where it does not.

no GPU on machine-can't get cortex/examples/suite-classification to run

Hi all,

While executing "lein run" to try the classification example, the imported ns cortex/suite/src/cortex/suite/classification.clj throws the following error:

...
Caused by: java.io.FileNotFoundException: Could not locate think/compute/nn/cuda_backend__init.class or think/compute/nn/cuda_backend.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.

I do not have a GPU on the machine.
Is there a way to switch between the gpu-compute/cuda-backend and the cpu-backend without changing the dependencies/imports? Possibly a cpu/gpu profile could be defined in the project.clj.

thanks!
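On the code side, one possible approach is to load the CUDA namespace lazily and fall back when it is missing, instead of requiring it in the `ns` form. The sketch below only decides which backend to use; the namespace name is taken from the error above, while the function names and the `:cuda`/`:cpu` keywords are hypothetical. A Leiningen profile would still be needed to include or exclude the GPU dependency itself.

```clojure
(defn namespace-available?
  "True when the namespace can be loaded from the classpath.
  Catches Throwable so that missing native libraries also count as unavailable."
  [ns-sym]
  (try
    (require ns-sym)
    true
    (catch Throwable _ false)))

(defn choose-backend
  "Return :cuda when the CUDA backend namespace loads, otherwise :cpu."
  ([] (choose-backend 'think.compute.nn.cuda-backend))
  ([cuda-ns]
   (if (namespace-available? cuda-ns) :cuda :cpu)))
```

With this in place, a machine without a GPU would get `:cpu` from `(choose-backend)` rather than a `FileNotFoundException` at load time.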

Document GPU setup for Cortex for multiple environments

I believe we've tested Cortex setup on Ubuntu 14.04, 16.04, and 16.10. It would be nice to get input and documentation on configuration steps. In addition, I believe @rosejn has gotten Cortex working on an older Nvidia Mac, relevant to an issue raised by @gigasquid on the mailing list.

The older cortex-gpu project used to contain detailed instructions for getting Cortex configured to run on the GPU. The reason this went away is, fortunately, that the setup has been greatly simplified compared to where it was. Even though instructions are simpler now, it would be nice to have these steps documented. I know @charlesg3 has recently set up both an AWS configuration and a local Ubuntu 16.10 desktop.

If anyone wants to point me at resources anyone's used or document steps they've taken here, I can also just take a stab at consolidating everything.

HACKING.md

We should have a HACKING.md file at the root that explains how to get things up and running locally for development. Mostly this would explain how to lein install and depend on SNAPSHOT versions for libraries of interest, but could also grow to include things like extension points for new layers/optimizers, etc...

cf. http://cider.readthedocs.io/en/latest/hacking_on_cider/

Note also: #137

"Existing Framework Comparisons" link broken

Just a minor one.

local-install.sh file not found

The README has a reference to run local-install.sh, but the file was deleted a while ago...
I am not sure how to replicate the file so that it works with the current state of the project.

Memory usage issues (OOM exceptions) with classification confusion matrix design

We use far too much memory during classification and this actually has nothing to do with cortex proper but with the display of the confusion matrix.

The issue is that the confusion matrix stores images and not paths or something like that. Thus it ends up with the entire cross-validation dataset held in memory which is untenable for larger datasets.

In general I think that it is unnecessary to display the entire dataset for the confusion matrix or at least it isn't necessary to show more than like 10 or 20 examples per entry.

In any case, this causes current projects to sometimes run into OOM exceptions so setting up some testing environment for this and working with it a bit to minimize this would help.
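A sketch of the suggested direction: each confusion matrix cell keeps a count plus a capped list of image paths instead of the images themselves. The cap of 10, the cell shape, and the function name are assumptions for illustration.

```clojure
;; Assumed cap, following the "10 or 20 examples per entry" suggestion.
(def max-examples-per-cell 10)

(defn record-prediction
  "Add one result to the matrix. matrix maps actual label -> predicted label
  -> {:n total-count, :examples [up to max-examples-per-cell paths]}."
  [matrix actual predicted path]
  (update-in matrix [actual predicted]
             (fn [cell]
               (let [n        (:n cell 0)
                     examples (:examples cell [])]
                 {:n        (inc n)
                  :examples (if (< (count examples) max-examples-per-cell)
                              (conj examples path)
                              examples)}))))
```

Memory then grows with the number of label pairs rather than with the size of the cross-validation set, and the display can load the few referenced images on demand.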

Make file fails on fresh install and MNIST run

When running train.sh in the suite-classification project I get this:

suite-classification[master] % ./train.sh
rm resources/public/css/app.css
rm: resources/public/css/app.css: No such file or directory
make: *** [resources/public/css/app.css] Error 1
Error: Unable to access jarfile target/classify-example.jar