originrose / cortex
Machine learning in Clojure
License: Eclipse Public License 1.0
Hi all,
Other neural network implementations have tools that aid visualization and debugging of a network while it is being trained. Specifically, I've found that the TensorBoard UI (from TensorFlow) is quite capable.
The toolkit should be capable of turning on (or off) instrumentation for different parameters, such as
To enable the same kind of functionality in Cortex, there are a few design choices:
To me, the second approach is preferred for two reasons:
I would like to hear your thoughts on this topic.
Thanks!
Cortex will need a defined save format that we believe has a decent chance for forward compatibility. We want something that is very minimal but just a bit vetted because I would like to promise that from 1.0 any data created with cortex will be useable indefinitely with future versions of cortex. This is a heavy burden if it is not treated with some respect.
With the following simple patch, I've managed to run http://gigasquidsoftware.com/blog/2016/12/27/deep-learning-in-clojure-with-cortex/, though I'm not sure I've done it correctly. Can we add support for backend selection in suite?
diff --git a/suite/src/cortex/suite/train.clj b/suite/src/cortex/suite/train.clj
index 3a9e9b1..63ca992 100644
--- a/suite/src/cortex/suite/train.clj
+++ b/suite/src/cortex/suite/train.clj
@@ -4,6 +4,7 @@
[think.compute.nn.train :as train]
[think.compute.nn.description :as compute-desc]
[think.compute.nn.cuda-backend :as gpu-compute]
+ [think.compute.nn.cpu-backend :as cpu-compute]
[think.resource.core :as resource]
[clojure.java.io :as io]
[think.compute.optimise :as opt]
@@ -60,6 +61,12 @@ in initial description. Else returns the initial description"
batch-size))
+(defn build-cpu-network
+ [network-description batch-size]
+ (compute-desc/build-and-create-network network-description
+ (cpu-compute/create-cpu-backend :float)
+ batch-size))
+
(defn backup-trained-network
[network-filestem]
(let [network-filename (str network-filestem ".nippy")]
@@ -105,13 +112,15 @@ we continue to train forever.
[dataset initial-description input-labels output-labels-and-loss
& {:keys [batch-size epoch-count
network-filestem best-network-fn
- optimiser loss-compare-fn]
+ optimiser loss-compare-fn
+ backend]
:or {batch-size 128
network-filestem "trained-network"
optimiser (opt/adam)
loss-compare-fn (fn [new-loss old-loss]
(< (first new-loss)
- (first old-loss)))}}]
+ (first old-loss)))
+ backend :cpu}}]
(resource/with-resource-context
(let [network-filename (str network-filestem ".nippy")
;;Backup the trained network if we haven't already
@@ -136,7 +145,9 @@ we continue to train forever.
cv-labels (mapv vec cv-labels)
best-network-atom (atom network-desc-loss-map)
network-description (:network-description network-desc-loss-map)
- network (build-gpu-network network-description batch-size)
+ network (if (= backend :cpu)
+ (build-cpu-network network-description batch-size)
+ (build-gpu-network network-description batch-size))
train-sequence (train/create-train-epoch-sequence network optimiser dataset
input-labels output-labels-and-loss)
epoch-processor (partial per-epoch-eval-training-network
@@ -154,16 +165,20 @@ we continue to train forever.
"Given a single-output network description and a dataset with the keys
:data and :labels produced set of inferences, answers, and the observations
used for both along with the original dataset."
- [dataset network-description & {:keys [batch-size batch-type input-labels output-labels]
+ [dataset network-description & {:keys [batch-size batch-type input-labels output-labels
+ backend]
:or {batch-size 128
batch-type :holdout
input-labels [:data]
- output-labels [:labels]}}]
+ output-labels [:labels]
+ backend :cpu}}]
(resource/with-resource-context
(let [[cv-input cv-labels] (ds/batch-sequence->column-groups
dataset batch-size batch-type
[input-labels output-labels])
- network (build-gpu-network network-description batch-size)
+ network (if (= backend :cpu)
+ (build-cpu-network network-description batch-size)
+ (build-gpu-network network-description batch-size))
inferences (train/run network dataset input-labels :batch-type batch-type)]
{:dataset dataset
:labels cv-labels
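The backend choice in the patch above could also be expressed as a lookup table instead of an `if`. The following is only a sketch: the two builder functions here are hypothetical stand-ins that return plain maps, so the snippet runs without cortex on the classpath, and `backend-builders` and `build-network` are names invented for illustration.

```clojure
;; Hypothetical stand-ins for build-cpu-network / build-gpu-network from the
;; patch. They only return maps so the dispatch can be exercised on its own.
(defn build-cpu-network
  [network-description batch-size]
  {:backend :cpu :description network-description :batch-size batch-size})

(defn build-gpu-network
  [network-description batch-size]
  {:backend :gpu :description network-description :batch-size batch-size})

;; Assumed table from backend keyword to builder function.
(def backend-builders
  {:cpu build-cpu-network
   :gpu build-gpu-network})

(defn build-network
  "Looks up the builder for `backend` and calls it.
  Throws on an unknown keyword instead of silently picking a default."
  [backend network-description batch-size]
  (if-let [builder (get backend-builders backend)]
    (builder network-description batch-size)
    (throw (ex-info "Unknown backend"
                    {:backend backend
                     :known   (set (keys backend-builders))}))))
```

One difference from the `(if (= backend :cpu) ...)` form in the patch: a misspelled keyword such as `:cpuu` raises an error here, whereas the `if` form would fall through to the GPU builder.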
Subprojects cortex/dataset and examples/dropout still have the standard boilerplate for the license, with
Copyright © 2016 FIXME
This applies to both the README and the project.clj files. While the license on the project files is stated as Eclipse, it'd likely look more official if the copyright wasn't the default.
https://papers.nips.cc/paper/5969-sparse-local-embeddings-for-extreme-multi-label-classification.pdf
Let's get into this, figure out what extensions we need in cortex to make it work well and make it happen.
I've observed in Keras that SpatialDropout2D seems to have a fairly large impact on regularization effectiveness for convolutional layers with less impact on training time. I haven't dug deeply yet, but the paper describing spatial dropout can be found here. The details on Spatial Dropout are in section 3.2.
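For readers unfamiliar with the idea, spatial dropout drops whole feature maps (channels) rather than individual activations. Below is a minimal sketch in plain Clojure on nested vectors, not cortex or Keras code. The function name, the `[channel][row][col]` layout, and the inverted-dropout scaling of kept channels by `1/(1-p)` are assumptions made for the illustration.

```clojure
(defn spatial-dropout
  "Training-time spatial dropout over `feature-maps`, a vector of channels
  where each channel is a vector of rows. Each channel is either zeroed
  entirely (with probability drop-prob) or scaled by 1/(1 - drop-prob)."
  [^java.util.Random rng drop-prob feature-maps]
  {:pre [(<= 0.0 drop-prob) (< drop-prob 1.0)]}
  (let [keep-prob (- 1.0 drop-prob)
        scale     (/ 1.0 keep-prob)]
    (mapv (fn [channel]
            ;; One random draw per channel, shared by every pixel in it.
            (let [factor (if (< (.nextDouble rng) keep-prob) scale 0.0)]
              (mapv (fn [row] (mapv #(* factor %) row)) channel)))
          feature-maps)))

;; Usage: 8 channels of 2x2 ones, half of them dropped on average.
(def example-output
  (spatial-dropout (java.util.Random. 42) 0.5
                   (vec (repeat 8 [[1.0 1.0] [1.0 1.0]]))))
```

Every channel in `example-output` is uniform: all `0.0` or all `2.0`. Ordinary dropout would instead mix zeros and non-zeros within one channel.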
I am trying to load the pre-trained VGG16 net from keras, add two linear->relu 4096 layers and replace the input layer. When trying to build the description with a batch size of 4, I get the following error:
Exception CUDA Error: out of memory think.compute.cuda-driver/eval26130/fn--26139 (cuda_driver.clj:349)
Example code that reproduces this error for me:
(defn test-vgg
  []
  (-> (concat (->> "resources/vgg16_combined.h5"
                   (keras/load-combined-hdf5-file)
                   (:model)
                   (drop 1)
                   (concat (desc/input 192 192 3)))
              [(desc/linear->relu 4096)
               (desc/dropout 0.7)
               (desc/linear->relu 4096)
               (desc/dropout 0.7)
               (desc/linear->softmax 40)])
      (compute-desc/build-and-create-network (cuda-backend/create-backend :float) 4)))
Full Output:
loading weights/bias for :conv1_1
Reshaping weights for :conv1_1
loading weights/bias for :conv1_2
Reshaping weights for :conv1_2
loading weights/bias for :conv2_1
Reshaping weights for :conv2_1
loading weights/bias for :conv2_2
Reshaping weights for :conv2_2
loading weights/bias for :conv3_1
Reshaping weights for :conv3_1
loading weights/bias for :conv3_2
Reshaping weights for :conv3_2
loading weights/bias for :conv3_3
Reshaping weights for :conv3_3
loading weights/bias for :conv4_1
Reshaping weights for :conv4_1
loading weights/bias for :conv4_2
Reshaping weights for :conv4_2
loading weights/bias for :conv4_3
Reshaping weights for :conv4_3
loading weights/bias for :conv5_1
Reshaping weights for :conv5_1
loading weights/bias for :conv5_2
Reshaping weights for :conv5_2
loading weights/bias for :conv5_3
Reshaping weights for :conv5_3
Using file input data
Reshaping output for: :conv1_1-activation [224 224 64] 3211264 :Activation
Reshaping output for: :conv1_2-activation [224 224 64] 3211264 :Activation
Reshaping output for: :maxpooling2d_6 [112 112 64] 802816 :MaxPooling2D
Reshaping output for: :conv2_1-activation [112 112 128] 1605632 :Activation
Reshaping output for: :conv2_2-activation [112 112 128] 1605632 :Activation
Reshaping output for: :maxpooling2d_7 [56 56 128] 401408 :MaxPooling2D
Reshaping output for: :conv3_1-activation [56 56 256] 802816 :Activation
Reshaping output for: :conv3_2-activation [56 56 256] 802816 :Activation
Reshaping output for: :conv3_3-activation [56 56 256] 802816 :Activation
Reshaping output for: :maxpooling2d_8 [28 28 256] 200704 :MaxPooling2D
Reshaping output for: :conv4_1-activation [28 28 512] 401408 :Activation
Reshaping output for: :conv4_2-activation [28 28 512] 401408 :Activation
Reshaping output for: :conv4_3-activation [28 28 512] 401408 :Activation
Reshaping output for: :maxpooling2d_9 [14 14 512] 100352 :MaxPooling2D
Reshaping output for: :conv5_1-activation [14 14 512] 100352 :Activation
Reshaping output for: :conv5_2-activation [14 14 512] 100352 :Activation
Reshaping output for: :conv5_3-activation [14 14 512] 100352 :Activation
Reshaping output for: :maxpooling2d_10 [7 7 512] 25088 :MaxPooling2D
Exception CUDA Error: out of memory think.compute.cuda-driver/eval26130/fn--26139 (cuda_driver.clj:349)
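As a rough aid for reasoning about the failure, the element counts printed in the log can be summed to get a lower bound on activation memory. This is only back-of-envelope arithmetic: it covers just the outputs listed in the log (at the 224x224 sizes the log reports), assumes 4-byte floats, and ignores weights, gradients, optimizer state and any cuDNN workspace, so the real footprint would be larger.

```clojure
;; Element counts copied from the "Reshaping output for:" lines above.
(def activation-sizes
  [3211264 3211264 802816
   1605632 1605632 401408
   802816 802816 802816 200704
   401408 401408 401408 100352
   100352 100352 100352 25088])

(defn activation-bytes
  "Bytes needed to hold one copy of every listed output for a whole batch."
  [batch-size bytes-per-element]
  (* batch-size bytes-per-element (reduce + activation-sizes)))

(defn ->mib [n-bytes] (/ n-bytes 1024.0 1024.0))

;; Batch size 4, float32:
(def batch-4-bytes (activation-bytes 4 4))
(def batch-4-mib (->mib batch-4-bytes))
```

Under these assumptions the listed outputs come to 15,077,888 elements per image, or about 230 MiB for a batch of 4. That is a floor, not an estimate of what the backend actually allocates.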
There is a tension between ease of extension across backends and ease of extension across layers:
1. If things are organized such that each backend is implemented in a single file, then adding a new backend is easy; work from the code for the most similar backend and implement what's there in terms of the new backend. This has the downside that there is no one place to go to see the whole of the implementation of any given layer type.
2. If things are organized such that each layer is implemented in a single file, then adding a new layer is easy; work from the code of the most similar layer and implement what you need for your new layer type. This has the downside that all the code related to a specific backend is spread throughout the layer files.
In discussion we have decided to prefer (2) above. A machine learner using cortex is more likely to want to implement a new layer than a new backend. In general, we would like to move the implementation of cortex in that direction, simplifying the process of experimenting with new layer types.
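One way to picture the layer-per-file organization is a multimethod that dispatches on both the backend and the layer type, so that all the methods for one layer can sit together. This is a sketch with invented names (`forward`, `:relu`, `:scale`); it is not how cortex is actually structured, and the `:gpu` method is a stand-in that runs on the CPU.

```clojure
;; Dispatch on the pair [backend layer-type].
(defmulti forward
  (fn [backend layer input] [backend (:type layer)]))

;; --- everything for the :relu layer lives in one place ---
(defmethod forward [:cpu :relu]
  [_ _ input]
  (mapv #(max 0.0 %) input))

(defmethod forward [:gpu :relu]
  [_ _ input]
  ;; Stand-in only: a real GPU method would launch a kernel here.
  (mapv #(max 0.0 %) input))

;; --- a new layer type is one new block; existing code is untouched ---
(defmethod forward [:cpu :scale]
  [_ layer input]
  (mapv #(* (:factor layer) %) input))
```

The trade-off described above shows up directly: `:scale` has no `:gpu` method yet, and finding out which layers a backend supports means looking across every layer's block.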
At 0.5.0:
harold@gibson:~/src/cortex$ rm -rf ~/.cortex
harold@gibson:~/src/cortex$ git checkout 00f171f665f2c2778421300d384618728dd454f3
HEAD is now at 00f171f... Release 0.5.0
harold@gibson:~/src/cortex$ cd compute
harold@gibson:~/src/cortex/compute$ time lein test think.compute.nn.train-test
[snip ...]
Ran 4 tests containing 4 assertions.
0 failures, 0 errors.
real 0m49.554s
user 1m15.276s
sys 3m3.520s
At master:
harold@gibson:~/src/cortex$ rm -rf ~/.cortex/
harold@gibson:~/src/cortex$ git checkout master
Previous HEAD position was 00f171f... Release 0.5.0
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
harold@gibson:~/src/cortex$ git pull
Current branch master is up to date.
harold@gibson:~/src/cortex$ time lein test cortex.compute.nn.train-test
[snip ...]
Ran 4 tests containing 4 assertions.
0 failures, 0 errors.
real 7m50.945s
user 11m48.408s
sys 45m20.232s
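To put a number on the regression, the two `real` times above can be compared directly. The helper below is just a small sketch for parsing the `XmY.YYYs` format printed by `time`; the function name is made up for this note.

```clojure
(defn parse-real-time
  "Parses a `time`-style duration such as \"7m50.945s\" into seconds."
  [s]
  (let [[_ minutes seconds] (re-matches #"(\d+)m([\d.]+)s" s)]
    (+ (* 60 (Long/parseLong minutes))
       (Double/parseDouble seconds))))

(def release-seconds (parse-real-time "0m49.554s")) ; at 0.5.0
(def master-seconds  (parse-real-time "7m50.945s")) ; at master

;; Wall-clock slowdown of master relative to 0.5.0.
(def slowdown (/ master-seconds release-seconds))
```

By this measure the same four tests take roughly 9.5 times longer on master than at the 0.5.0 release.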
Running the mnist example will fail to update the confusion matrix on the client (browser). I believe this works with 0.5.0 but I haven't specifically tracked it down.
Currently this results in dropping the last partial batch. This is not such a big deal when training, but it can be a very large deal when running.
I am not sure this should be solved in the engine, given that the implementation of the dataset can solve it, but we have had two teams confused by this (albeit on their first day of using cortex). Either an error needs to be raised or we need to pad the input; silently dropping the input, especially during inference, is probably not the best answer.
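The three behaviours discussed here (silently dropping, raising an error, padding) can be illustrated with core Clojure sequence functions. This is a sketch of the options on plain sequences, not of how the cortex engine batches data; `padded-batches` and `strict-batches` are hypothetical helper names.

```clojure
(def observations (vec (range 10)))

;; `partition` silently drops the trailing partial batch (8 and 9 vanish).
(def dropped (partition 4 observations))

;; `partition-all` keeps the short final batch.
(def kept (partition-all 4 observations))

(defn padded-batches
  "Pads the final batch with pad-value so every batch has batch-size items."
  [batch-size pad-value coll]
  (partition batch-size batch-size (repeat pad-value) coll))

(defn strict-batches
  "Throws instead of dropping data when coll does not divide evenly."
  [batch-size coll]
  (let [n (count coll)]
    (when-not (zero? (rem n batch-size))
      (throw (ex-info "Observation count is not a multiple of batch size"
                      {:count n :batch-size batch-size})))
    (partition batch-size coll)))
```

With padding, the caller also has to remember how many real observations were in the last batch so the padded outputs can be discarded after inference.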
Hi,
I'd like to add tests for serialization of parameter values in a trained network (the code is in cortex/suite/src/cortex/suite/train.clj).
May I presume that the right place to add them would be in cortex/suite/test/cortex/suite/train_test.clj? If someone else is working on adding tests for the same, I could help/collaborate.
thanks!
It seems that the repos s3p://thinktopic.jars/snapshots/ and s3p://thinktopic.jars/releases/ are password protected. This causes lein test to not work.
Retrieving thinktopic/cortex/0.2.1-SNAPSHOT/cortex-0.2.1-SNAPSHOT.pom from snapshots
Oct 13, 2016 9:33:21 PM org.jets3t.service.impl.rest.httpclient.RestS3Service performRequest
WARNING: Error Response: GET '/snapshots%2Fthinktopic%2Fcortex%2F0.2.1-SNAPSHOT%2Fcortex-0.2.1-SNAPSHOT.pom' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Date: Thu, 13 Oct 2016 19:33:20 GMT, Content-Type: , User-Agent: JetS3t/0.7.1 (Mac OS X/10.12; x86_64; en; JVM 1.8.0_102), Host: thinktopic.jars.s3.amazonaws.com], Response Headers: [x-amz-request-id: 25A9111D05BA29A6, x-amz-id-2: abnaUX0bYsEzNHkCGICDEaSbmkqJsUdaUUiHWZyADJYJrVmvK3B5sWYGohE+/HoedwVuVZwNiAM=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 13 Oct 2016 19:33:22 GMT, Server: AmazonS3]
Oct 13, 2016 9:33:21 PM org.jets3t.service.impl.rest.httpclient.RestS3Service performRequest
SEVERE: Request Failed.
org.jets3t.service.S3ServiceException: S3 Error Message. GET '/snapshots%2Fthinktopic%2Fcortex%2F0.2.1-SNAPSHOT%2Fcortex-0.2.1-SNAPSHOT.pom' on Host 'thinktopic.jars.s3.amazonaws.com' @ 'Thu, 13 Oct 2016 19:33:22 GMT' -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>25A9111D05BA29A6</RequestId><HostId>abnaUX0bYsEzNHkCGICDEaSbmkqJsUdaUUiHWZyADJYJrVmvK3B5sWYGohE+/HoedwVuVZwNiAM=</HostId></Error>
I am trying to build a network description using the first couple of vgg16 conv+pooling layers, loading pre-trained weights and setting learning attenuation to 0. This generates an out of memory error.
The error can be reproduced as such:
(def keras-model
  (memoize
   #(keras/load-sidecar-and-verify "/mnt/thinkdrive/cortex-debug/keras/vgg_configuration.json"
                                   "/mnt/thinkdrive/cortex-debug//keras/vgg_16_weights.h5"
                                   "/mnt/thinkdrive/cortex-debug/keras/vgg_16_output.h5")))

;; Reading from pre-trained keras model
(defn create-bottom-layers
  [num-keras-layers]
  (concat (desc/input image-size image-size channels)
          (->> (keras-model)
               (drop 1) ;;drop input layer
               (take num-keras-layers))))

(defn create-top-layers
  [num-classes]
  [(desc/dropout 0.5)
   (desc/linear->relu 128)
   (desc/linear->softmax num-classes)])

;; Combining FC layers with lower level pre-trained keras layers
(defn create-complete-description
  [num-keras-layers num-classes]
  (let [bottom-layers (->> (create-bottom-layers num-keras-layers)
                           (mapv #(assoc % :learning-attenuation 0.0)))]
    (concat bottom-layers (create-top-layers num-classes))))
If you call (create-complete-description 5 2), this will give you a portion of the vgg-net description that contains a set of conv (x2) + pooling layers.
Almost there, @charlesg3 had a few more things he wanted to add.
I'm getting this error: UnsatisfiedLinkError: no jnihdf5, no jnicuda when trying to use Cortex. I'm on a mac with cudann 8.0 installed.
I see the comment in the README, but I believe cudann is installed correctly, as it works fine with Keras/Theano/TensorFlow.
I see in #24 there is a mention of another branch with a different dependency but that branch does not seem to be available any longer.
Is there anything I can do to see if my installation is in fact correct? What was the difference in that "cuda-8.0" branch?
Thanks.
Julio
This has bothered me for some time and there isn't too much I can do about it but here we go:
The runtime dependency on the cuda libraries is not ideal the way it is structured.
What people have done for many years with opengl is they bind to the actual shared library dynamically. They then look for the symbols they need in the shared library and those symbols along with the version of opengl detected (with an API call from the library) then dictates their path forward. They dynamically switch rendering paths depending on the feature set available in opengl and often times the specific hardware features available on the card.
Because the binding is dynamic, the program will still start if opengl isn't present and can then exit with a nice error message. Also, because the binding is dynamic and they search for specific symbols in the shared library, they can have one wrapper library that binds to several versions of opengl and just exposes the symbols it finds.
This is the ideal situation. Currently in cortex, for instance, you have to change the project.clj in order to bind to a different version of cuda, despite the fact that we aren't using any new features in that version, so from a dynamic linking perspective this is unnecessary. This is incidental complexity that will come back to bite at some point.
The right answer here is to use an intermediate library that can do dynamic loading across the different platforms and find the symbols. You then set a global pointer to the symbol's address if it is found, or leave it null if it is not (see gl wrangler: http://glew.sourceforge.net/).
Then we at least allow the program to decide if cuda is a necessary dependency and, furthermore, if particular versions of cuda (and cudnn, npp, cublas) are necessary dependencies.
What is stopping me from going there is a proper cross platform build system where I can build a library for at least linux, mac, and windows. That and the time required to actually do this.
There may be a solution in the dynamic linking facilities now present in Java but that path needs to be researched. To do this with javacpp we would need to build a small wrapper library that did the dynamic binding to the shared libraries and the symbols in the shared libraries.
In any case, a best-in-class CUDA development system would not have this issue. I suspect the same type of issue would be present should we decide to put effort into opencl.
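A much coarser version of this idea can be tried from the JVM without any wrapper library: probe a list of candidate native libraries at runtime and let the program decide what to do if none load. The sketch below works at the level of whole libraries, not individual symbols as GLEW does, and the library names passed to it are placeholders, not names cortex or javacpp are known to use.

```clojure
(defn try-load-library
  "Attempts to load a native library by name.
  Returns true on success and false if the JVM cannot find or link it."
  [lib-name]
  (try
    (System/loadLibrary lib-name)
    true
    (catch UnsatisfiedLinkError _
      false)))

(defn first-available
  "Returns the first candidate that loads, or nil if none do.
  Stops probing as soon as one succeeds."
  [candidates]
  (some #(when (try-load-library %) %) candidates))

(defn select-backend
  "Falls back to :cpu with no hard failure when no GPU library loads."
  [gpu-candidates]
  (if (first-available gpu-candidates) :gpu :cpu))
```

Because the failure is caught, the program keeps running and can print a helpful message or pick a CPU path, which is the behaviour described above for opengl applications.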
In talking with @charlesg3 there may be an opportunity here for take-while and a user-provided function that could run extra tests, do think.peer things or re-implement the old train-until-error-stabilizes. Interested to hear others' thoughts here.
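To make the take-while suggestion concrete, here is a sketch that treats training as a lazy sequence of epoch results and stops once the loss stops improving. The epoch maps, `simulated-epochs`, and `train-until-stable` are all invented for the illustration; a real training sequence would come from the library.

```clojure
(defn simulated-epochs
  "Infinite lazy sequence of fake epoch results with a decaying loss."
  []
  (map (fn [i] {:epoch i :loss (/ 1.0 (inc i))}) (range)))

(defn train-until-stable
  "Consumes epochs while each one improves the loss by more than
  min-improvement. Returns the epochs that were accepted."
  [min-improvement epochs]
  (->> (partition 2 1 epochs)               ; consecutive (previous, current) pairs
       (take-while (fn [[previous current]]
                     (> (- (:loss previous) (:loss current))
                        min-improvement)))
       (map second)))

;; Losses are 1, 1/2, 1/3, 1/4, 1/5, ... so the improvement from epoch 3
;; to epoch 4 is about 0.05, the first one below the 0.06 threshold.
(def accepted (train-until-stable 0.06 (simulated-epochs)))
```

Since the predicate is an ordinary function, a caller could swap in one that runs extra tests or checks wall-clock time without touching the training loop.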
@cnuernber --- Fast-path marshalling code for this exists, but is apparently not wired up.
Noting this here, since this can make cortex appear a lot slower than it is. 😄
If you have a saved network that was trained using 0.3*, you can use this function to import it to 0.5* for inference or further training/fine-tuning.
(defn import-older-models
  [network]
  (when-let [network-desc (:network-description network)]
    (-> (mapv (fn [{:keys [type] :as layer}]
                (cond
                  (= type :convolutional) (assoc layer :dimension-op (layers/default-layer-type-dimension-op :convolutional))
                  (= type :max-pooling) (assoc layer :dimension-op (layers/default-layer-type-dimension-op :max-pooling))
                  :else layer))
              network-desc)
        network/build-network
        traverse/auto-bind-io)))
This incorrect description causes the process to crash during training and serialization
RT.java: 1464 clojure.lang.RT/uncheckedLongCast
convolution.cljc: 66 cortex.nn.impl.layers.convolution$get_padded_strided_dimension/invokeStatic
convolution.cljc: 63 cortex.nn.impl.layers.convolution$get_padded_strided_dimension/invoke
description.clj: 239 cortex.nn.description/eval25237/fn
MultiFn.java: 233 clojure.lang.MultiFn/invoke
description.clj: 140 cortex.nn.description/recurse-build-desc/fn
protocols.clj: 167 clojure.core.protocols/fn
protocols.clj: 124 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 31 clojure.core.protocols/seq-reduce
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6545 clojure.core/reduce
core.clj: 6527 clojure.core/reduce
description.clj: 138 cortex.nn.description/recurse-build-desc
description.clj: 136 cortex.nn.description/recurse-build-desc
description.clj: 338 cortex.nn.description/build-full-network-description
description.clj: 333 cortex.nn.description/build-full-network-description
description.clj: 111 think.compute.nn.description/build-and-create-network
description.clj: 109 think.compute.nn.description/build-and-create-network
inference.clj: 22 cortex.suite.inference/infer-n-observations/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 646 clojure.core/apply
core.clj: 1881 clojure.core/with-bindings*
core.clj: 1881 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
inference.clj: 12 cortex.suite.inference/infer-n-observations
inference.clj: 10 cortex.suite.inference/infer-n-observations
RestFn.java: 529 clojure.lang.RestFn/invoke
inference.clj: 37 cortex.suite.inference/classify-one-observation
inference.clj: 32 cortex.suite.inference/classify-one-observation
RestFn.java: 500 clojure.lang.RestFn/invoke
classifier.clj: 137 image-type-classifier.classifier/label-one
classifier.clj: 129 image-type-classifier.classifier/label-one
REPL: 13 image-type-classifier.classifier/eval48777
REPL: 13 image-type-classifier.classifier/eval48777
Compiler.java: 6927 clojure.lang.Compiler/eval
Compiler.java: 6890 clojure.lang.Compiler/eval
core.clj: 3105 clojure.core/eval
core.clj: 3101 clojure.core/eval
main.clj: 240 clojure.main/repl/read-eval-print/fn
main.clj: 240 clojure.main/repl/read-eval-print
main.clj: 258 clojure.main/repl/fn
main.clj: 258 clojure.main/repl
main.clj: 174 clojure.main/repl
RestFn.java: 1523 clojure.lang.RestFn/invoke
interruptible_eval.clj: 87 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 646 clojure.core/apply
core.clj: 1881 clojure.core/with-bindings*
core.clj: 1881 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
interruptible_eval.clj: 85 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 55 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 222 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
interruptible_eval.clj: 190 clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
AFn.java: 22 clojure.lang.AFn/run
ThreadPoolExecutor.java: 1142 java.util.concurrent.ThreadPoolExecutor/runWorker
ThreadPoolExecutor.java: 617 java.util.concurrent.ThreadPoolExecutor$Worker/run
Thread.java: 745 java.lang.Thread/run
From this google group post: https://groups.google.com/forum/#!topic/clojure-cortex/AtE-kAQCO8Y
We could have some examples of using concat (and the other layer graph operations) as tests with some .md descriptions in the docs referencing the tests.
The tests could go into experiment, or perhaps into the train tests (or layer tests? but maybe that's too low level).
@charlesg3 @cnuernber --- relevant to your interests.
Is there any planned support for automatic differentiation? I took CS224D (Deep Learning/NLP) at Stanford and this seemed to be a given in any framework discussions we had on TensorFlow vs Theano vs. etc.
Make a document explaining the output of execute/train / compute-binding/save-to-network.
Explain:
Why do these functions return both a network and an optimizer?
What is the structure of the returned network (it's a map with ... keys, etc...)
Maybe: Which graph functions might be useful to call on a saved network? (e.g., for stripping layers, fine-tuning, etc...)
I was training a network which consisted of VGG16 with two 4096 FC layers at the end and feeding the network 192x192 images and got the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fbf69cd0d50, pid=30098, tid=0x00007fbf4fcfc700
#
# JRE version: OpenJDK Runtime Environment (8.0_102-b14) (build 1.8.0_102-8u102-b14.1-2-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libcuda.so.1+0x1a8d50]
#
# Core dump written. Default location: /home/charles/src/think.cars/core or core.30098
#
# An error report file with more information is saved as:
# /home/charles/src/think.cars/hs_err_pid30098.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
The intent of this issue is to characterize the problem while leaving the choice of implementation strategies wide open. If the problem is characterized so as to arbitrarily narrow the choice of implementations, then it is mis-characterized.
We would like to reduce the time it takes to put data onto the gpu and pull it off. We would also like a set of standard automatic augmentations that can be performed, ideally inline with loading the image (crop, flip, translate, scale, rotate, potentially color space transformation). Inline means during training and not as a preprocessing step; we would like our networks to never see the exact same image twice during training.
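As an illustration of what "inline" augmentation could look like, here is a sketch of a random crop plus random horizontal flip applied each time an image is requested. Images are modelled as plain vectors of row vectors, and all function names are hypothetical; a real implementation would work on image buffers and would likely use a native library.

```clojure
(defn flip-horizontal
  "Mirrors each row of the image."
  [image]
  (mapv #(vec (rseq %)) image))

(defn crop
  "Returns the height x width window whose top-left corner is (top, left)."
  [image top left height width]
  (mapv #(subvec % left (+ left width))
        (subvec image top (+ top height))))

(defn random-augment
  "Random square crop of side crop-size, then a coin-flip horizontal mirror.
  Called at load time so each epoch can see a different variant."
  [^java.util.Random rng crop-size image]
  (let [height  (count image)
        width   (count (first image))
        top     (.nextInt rng (inc (- height crop-size)))
        left    (.nextInt rng (inc (- width crop-size)))
        cropped (crop image top left crop-size crop-size)]
    (if (.nextBoolean rng)
      (flip-horizontal cropped)
      cropped)))

;; Usage: a lazy stream of fresh variants of one 3x3 image.
(def source-image [[1 2 3] [4 5 6] [7 8 9]])
(def variants
  (let [rng (java.util.Random. 7)]
    (repeatedly #(random-augment rng 2 source-image))))
```

Because `variants` is generated on demand, nothing is written to disk and the transformations are never baked into a file the network could memorize.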
Most gpu-based neural networks tend to not get to full utilization of the GPU because at least in part getting the data to the GPU and off of the GPU effectively throttles the training/inference.
Because we have a few people working in networks that are doing image analysis and it seems this will continue for the near future, it would be good to invest some time building out tools and a system to use to do this.
Setting some baselines, assume 10,000 images of 256 by 256. An output of 1000 float/double numbers.
If we can get inline loading of images (meaning we do not need to write out a specialized file) working on a normal computer, and we can get through 10,000 images fast enough, we should be able to avoid writing out a specialized file. So the first step is: can you load 10,000 images in under about 10 seconds on the cpu? Ideally under 5, because we would also like to apply some elementary operations to augment datasets, so having another 5 seconds to apply random loss-invariant transformations would be ideal.
You could also write these images into a memory mapped file (of bytes or floats) and load the file. There is a solid chance that opencv implements the ideal transformations in considerably less time than we can implement them in java, but there is also a chance that this is false.
The worst scenario is to write out a binary file post-transformations. This means that our nets could potentially learn the specific transformations which we certainly do not want.
Then we need to shuffle data onto the GPU into a coalesced buffer for a batch size of say 64-128. Then we need a similar system to shuffle the 1000 doubles off the gpu to the cpu with the same batch size, and perform some analysis on those vectors (like generating loss/softmax accuracy, etc).
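The baselines above translate into concrete throughput and buffer sizes. The arithmetic below is a sketch under stated assumptions that are not in the original text: 3 channels per image, one byte per channel when decoded, and 4-byte floats once the batch is on the GPU.

```clojure
(def image-count 10000)

;; 256 x 256 pixels; the 3 channels are an assumption.
(def elements-per-image (* 256 256 3))

(defn required-bytes-per-second
  "Decoded-byte throughput needed to load every image within `seconds`."
  [seconds]
  (/ (* image-count elements-per-image) (double seconds)))

(defn input-batch-bytes
  "Size of one coalesced float32 input batch on the GPU."
  [batch-size]
  (* batch-size elements-per-image 4))

(defn output-batch-bytes
  "Size of one batch of 1000 doubles per image coming back off the GPU."
  [batch-size]
  (* batch-size 1000 8))
```

Under these assumptions, the 10 second target means decoding about 197 MB of pixel data per second (1,000 images per second), a batch of 128 needs a 96 MiB input buffer, and the matching output is about 1 MB, so the input path dominates the transfer cost.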
Auto-generated api documentation would be great, and would be a good seed for the cortex.ml site.
Cortex deals with a lot of complex datastructures that we could use some help with. I would like to start a discussion on some reasonable strategy to make these datastructures a bit less opaque which means at the very least an informal schema with some definitions. If we go there then why not use spec and think through the datastructures and their implications a bit?
This is marked for 1.0 because this process will help greatly with the longevity of the software.
(posted similar message at uncomplicate/neanderthal#21)
Sulong (https://github.com/graalvm/sulong) will run on unmodified JVM in Java 9. It is the basis for native extension interop for JRuby/Truffle. Full paper at http://ssw.jku.at/General/Staff/ManuelRigger/VMIL16.pdf
They make several claims regarding their FFI: 1) 0-overhead calls into native, 2) inlining of native calls. This is in addition to their primary claim of interpreting LLVM bitcode at speeds comparable to gcc -O3.
I'm curious what people think about this for use cases like Cortex. It seems like a path to get progressive optimization down to the fastest achievable speeds.
The basic idea would be to develop a Truffle interpreter for Cortex primitives, starting with matrix computation but building up to the differentiable graph. The interpreter could specialize down into whatever combination of computation engines are available, down to generating LLVM code (check out their impl of the LLVM AST, it's pretty clean) and/or calling into native stuff.
The readme says make sure to run cortex/local-install.sh before running an example, but that script doesn't exist in the repo.
Has this script been replaced by just lein install? If yes, we could update the readme. If not, we could dig it up from an older commit.
Models at rest on disk should have a version key so that we can detect forward/backward compatibility issues automatically.
@cnuernber @charlesg3 --- relevant to your interests.
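A version key of this kind could be as simple as one extra entry in the saved map, checked at load time. The sketch below uses invented names (`:format-version`, `current-format-version`) and treats a missing key as version 0; none of this reflects an actual cortex save format.

```clojure
;; Hypothetical version number of the on-disk format written by this build.
(def current-format-version 2)

(defn tag-version
  "Stamps a network map with the format version before it is saved."
  [network]
  (assoc network :format-version current-format-version))

(defn check-version
  "Classifies a loaded network map. Files written before versioning
  existed have no key and are treated as version 0."
  [network]
  (let [version (get network :format-version 0)]
    (cond
      (= version current-format-version) :current
      (< version current-format-version) :needs-upgrade
      :else                              :too-new)))
```

A loader could then dispatch on the result: run an upgrade function for `:needs-upgrade` and refuse with a clear message for `:too-new`, instead of failing somewhere deep in the build step.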
Maybe cortex-experiment is a better name?
A machine learner working with cortex reads about a loss function in a paper and wants to experiment with something similar.
A document and/or example explaining how to do this with cortex would be super-useful.
For example, this loss function:
From page 4 of the YOLO paper. What is the correct way to do something like this in cortex?
That is just a single example, feel free to add others here.
In a colossal bonehead move instead of deleting a branch I deleted the cortex repository. This means we lost our issues.
If anyone has work in a branch please re-instate the branch and I am sorry. We did lose our issues permanently and a lot of good papers along with some discussion.
See reference paper and Keras implementation.
Nesterov Adam should converge faster and behave better than the basic Adam optimizer for many cases.
As Adam is sufficient for most training, this is lower priority for now.
I'm training a net now and it has a certain loss score which is saved with the net. This loss score is from a randomly generated holdout dataset.
When I start a new training session, presumably a new holdout dataset is created, perhaps giving a higher base error score. Even though the net improves its score with the new holdout, the improved net is never saved since the score may never reach that of the old holdout dataset.
It might be a good idea to generate a new high score when there is a new holdout dataset, or have a way to explicitly reset the high score.
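One possible shape for this is to store an identifier for the holdout set next to the saved loss, and only compare losses when the identifiers match. The sketch below is hypothetical: `:holdout-id` and `should-save?` are invented names, and how such an identifier would be derived (for example from a random seed) is left open.

```clojure
(defn should-save?
  "Decides whether `candidate` should replace the saved `best` result.
  Both are maps with :loss and :holdout-id. A change of holdout set
  resets the baseline, so the first result on a new holdout is saved."
  [best candidate]
  (or (nil? best)
      (not= (:holdout-id best) (:holdout-id candidate))
      (< (:loss candidate) (:loss best))))

;; Saved from an earlier session, scored on holdout :a.
(def saved-best {:loss 0.10 :holdout-id :a})
```

With this rule a new session scored on holdout `:b` starts from its own baseline, so an improving network is saved even if its loss never drops below the 0.10 measured on the old holdout.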
Maybe a smarter shell script alone is good enough?
When trying to run (net/run network data) I get the following error. This seems like I am using the wrong type; however, network in the code is the result of a (compute-desc/build-and-create-network ...) call.
expected: nil
actual: java.lang.IllegalArgumentException: No implementation of method: :calc of protocol: #'cortex.nn.protocols/PModule found for class: think.compute.nn.layers.LayerList
at clojure.core$_cache_protocol_fn.invokeStatic (core_deftype.clj:568)
clojure.core$_cache_protocol_fn.invoke (core_deftype.clj:560)
cortex.nn.protocols$eval20813$fn__20825$G__20802__20832.invoke (protocols.cljc:9)
cortex.nn.core$calc.invokeStatic (core.cljc:22)
cortex.nn.core$calc.invoke (core.cljc:19)
cortex.nn.network$run$fn__41321.invoke (network.cljc:18)
clojure.core$mapv$fn__6953.invoke (core.clj:6627)
clojure.lang.PersistentVector.reduce (PersistentVector.java:341)
clojure.core$reduce.invokeStatic (core.clj:6544)
clojure.core$mapv.invokeStatic (core.clj:6618)
clojure.core$mapv.invoke (core.clj:6618)
cortex.nn.network$run.invokeStatic (network.cljc:17)
cortex.nn.network$run.invoke (network.cljc:15)
think.cars.counting$scene_patches__GT_image_features.invokeStatic (counting.clj:111)
think.cars.counting$scene_patches__GT_image_features.invoke (counting.clj:108)
think.cars.counting$scene_patch__GT_image_features.invokeStatic (counting.clj:120)
think.cars.counting$scene_patch__GT_image_features.invoke (counting.clj:117)
think.scene_test$fn__36106.invokeStatic (scene_test.clj:18)
think.scene_test/fn (scene_test.clj:14)
clojure.test$test_var$fn__7983.invoke (test.clj:716)
clojure.test$test_var.invokeStatic (test.clj:716)
clojure.test$test_var.invoke (test.clj:707)
think.scene_test$generate_image_features.invokeStatic (scene_test.clj:14)
think.scene_test$generate_image_features.invoke (scene_test.clj:14)
think.scene_test$eval43539.invokeStatic (form-init2125960026649907142.clj:1)
Note that I have also tried to accomplish the same task using the datasets interface; however, if I set the training-split to be 0.0 (as this is a dataset for "running", I don't want any samples to be used for training), I get the following error:
ERROR in (generate-image-features) (cuda_backend.clj:540)
Uncaught exception, not in assertion.
expected: nil
actual: java.lang.Exception: Cudnn error: CUDNN_STATUS_MAPPING_ERROR
at think.compute.nn.cuda_backend$eval28597$fn__28598.invoke (cuda_backend.clj:540)
think.compute.nn.backend$eval27359$fn__27360$G__27348__27373.invoke (backend.clj:157)
think.compute.nn.layers.Convolutional.calc (layers.clj:252)
cortex.nn.protocols$eval21429$fn__21434.invoke (protocols.cljc:161)
cortex.nn.protocols$eval21338$fn__21411$G__21321__21418.invoke (protocols.cljc:139)
think.compute.nn.layers.LayerList/fn (layers.clj:453)
think.compute.nn.layers$layer_list_forward$fn__27971.invoke (layers.clj:426)
clojure.lang.PersistentVector.reduce (PersistentVector.java:341)
clojure.core$reduce.invokeStatic (core.clj:6544)
clojure.core$reduce.invoke (core.clj:6527)
think.compute.nn.layers$layer_list_forward.invokeStatic (layers.clj:425)
think.compute.nn.layers$layer_list_forward.invoke (layers.clj:421)
think.compute.nn.layers.LayerList.multi_calc (layers.clj:452)
think.compute.nn.train$run_config$fn__31846.invoke (train.clj:68)
clojure.core.protocols$fn__6755.invokeStatic (protocols.clj:167)
clojure.core.protocols/fn (protocols.clj:124)
clojure.core.protocols$fn__6710$G__6705__6719.invoke (protocols.clj:19)
clojure.core.protocols$seq_reduce.invokeStatic (protocols.clj:31)
clojure.core.protocols$fn__6738.invokeStatic (protocols.clj:75)
clojure.core.protocols/fn (protocols.clj:75)
clojure.core.protocols$fn__6684$G__6679__6697.invoke (protocols.clj:13)
clojure.core$reduce.invokeStatic (core.clj:6545)
clojure.core$reduce.invoke (core.clj:6527)
think.compute.nn.train$run_config.invokeStatic (train.clj:66)
think.compute.nn.train$run_config.invoke (train.clj:62)
think.compute.nn.train$run_and_reshape.invokeStatic (train.clj:94)
Clearly I'm doing something wrong, which looks like I'm giving the wrong types to the train/run and net/run commands, but I'm not sure how exactly to fix this.
Need someone to get into this, test it, see where it works and where it does not.
Hi all,
While executing "lein run" to try the classification example, the imported ns cortex/suite/src/cortex/suite/classification.clj throws the following error:
...
Caused by: java.io.FileNotFoundException: Could not locate think/compute/nn/cuda_backend__init.class or think/compute/nn/cuda_backend.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
I do not have a GPU on the machine.
Is there a way to switch between the gpu-compute/cuda-backend and the cpu-backend without changing the dependencies/imports? Possibly a cpu/gpu profile could be defined in the project.clj.
thanks!
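One possible way to handle the runtime side of this is to load the backend namespace lazily and fall back when it is missing. The sketch below is only an illustration: `try-resolve` and `first-available` are hypothetical helpers that are not part of Cortex, and the CUDA constructor name in the final comment is an assumption (only `create-cpu-backend` appears in the patch quoted earlier on this page).

```clojure
(defn try-resolve
  "Returns the var named by the fully qualified symbol `sym`, or nil when
  its namespace cannot be loaded (for example, a CUDA backend on a machine
  without the GPU libraries on the classpath)."
  [sym]
  (try
    (require (symbol (namespace sym)))
    (resolve sym)
    ;; Throwable rather than Exception: missing native libraries can
    ;; surface as Errors such as UnsatisfiedLinkError.
    (catch Throwable _ nil)))

(defn first-available
  "Returns the first var from `candidates` that can be loaded, or nil."
  [candidates]
  (some try-resolve candidates))

;; Hypothetical usage; the first symbol is an assumed name, not a
;; confirmed Cortex API:
;; (first-available '[think.compute.nn.cuda-backend/create-backend
;;                    think.compute.nn.cpu-backend/create-cpu-backend])
```

This only helps if the calling namespace stops listing the CUDA backend in its static `:require`, since that is the form that fails at load time in the error above. Making the GPU dependency itself optional would still need something like a separate profile in project.clj.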
I believe we've tested Cortex setup on Ubuntu 14.04, 16.04, and 16.10. It would be nice to get input and documentation on configuration steps. In addition, I believe @rosejn has gotten Cortex working on an older Nvidia Mac, relevant to an issue raised by @gigasquid on the mailing list.
The older cortex-gpu project used to contain detailed instructions for getting Cortex configured to run on the GPU. The reason these went away is, fortunately, that the setup has been greatly simplified compared to where it was. Even though the instructions are simpler now, it would be nice to have these steps documented. I know @charlesg3 has recently set up both an AWS configuration and a local Ubuntu 16.10 desktop.
If anyone wants to point me at resources anyone's used or document steps they've taken here, I can also just take a stab at consolidating everything.
We should have a HACKING.md file at the root that explains how to get things up and running locally for development. Mostly this would explain how to lein install and depend on SNAPSHOT versions for libraries of interest, but could also grow to include things like extension points for new layers/optimizers, etc...
cf. http://cider.readthedocs.io/en/latest/hacking_on_cider/
Note also: #137
Just a minor one.
The README has a reference to run local-install.sh, but the file was deleted a while ago...
I am not sure how to replicate the file so that it works with the current state of the project.
We use far too much memory during classification and this actually has nothing to do with cortex proper but with the display of the confusion matrix.
The issue is that the confusion matrix stores images and not paths or something like that. Thus it ends up with the entire cross-validation dataset held in memory which is untenable for larger datasets.
In general I think that it is unnecessary to display the entire dataset for the confusion matrix or at least it isn't necessary to show more than like 10 or 20 examples per entry.
In any case, this causes current projects to sometimes run into OOM exceptions so setting up some testing environment for this and working with it a bit to minimize this would help.
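A minimal sketch of the idea above, assuming each classification result can be represented as a map with `:actual`, `:predicted`, and `:path` keys. These keys and both function names are hypothetical and do not reflect the current suite code. Each cell keeps the full count but stores at most `max-examples` image paths instead of the images themselves.

```clojure
(defn add-example
  "Records one classified example in `matrix`, a map keyed by
  [actual predicted]. Each cell tracks the total count but keeps
  at most `max-examples` image paths."
  [matrix max-examples {:keys [actual predicted path]}]
  (let [k [actual predicted]
        {:keys [total paths]} (get matrix k {:total 0 :paths []})]
    (assoc matrix k {:total (inc total)
                     :paths (if (< (count paths) max-examples)
                              (conj paths path)
                              paths)})))

(defn confusion-matrix
  "Builds the capped confusion matrix from a sequence of results."
  [max-examples results]
  (reduce #(add-example %1 max-examples %2) {} results))
```

With this shape, memory use per cell is bounded by `max-examples` short strings regardless of the size of the cross-validation set, and the display code can load images from disk only for the entries it actually shows.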
When running train.sh in the suite-classification project I get this:
suite-classification[master] % ./train.sh
rm resources/public/css/app.css
rm: resources/public/css/app.css: No such file or directory
make: *** [resources/public/css/app.css] Error 1
Error: Unable to access jarfile target/classify-example.jar