forge's Introduction

Forge: a neural network toolkit for Metal

⚠️ ⚠️ ⚠️ IMPORTANT: I'm no longer maintaining Forge. It uses an older version of the MPSCNN API that is no longer supported by Apple. I also feel that Core ML has largely taken away the need for a library like this. However, neural networks implemented in Metal are still faster than Core ML. If you're looking for very fast implementations of MobileNet V1, MobileNet V2, and SSD for iOS and macOS, check out my new source code library.


Forge is a collection of helper code that makes it a little easier to construct deep neural networks using Apple's MPSCNN framework.

Read the blog post

Geordi likes it!

What does this do?

Features of Forge:

Conversion functions. MPSCNN uses MPSImages and MTLTextures for everything, often using 16-bit floats. But you probably want to work with Swift [Float] arrays. Forge's conversion functions make it easy to work with Metal images and textures.
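
For example, converting between a Swift [Float] array and an MPSImage looks roughly like this. This is a sketch based on the MPSImage extensions from MPSImage+Floats.swift that appear elsewhere in this document; treat the exact parameter labels as assumptions and check the Forge sources for the real API:

import MetalPerformanceShaders

let device = MTLCreateSystemDefaultDevice()!

// [Float] -> MPSImage. Forge's extension converts to the texture's
// pixel format; feature channels are stored in RGBA slices of 4.
var pixels = [Float](repeating: 0.5, count: 2 * 2 * 4)
let image = MPSImage(device: device,
                     numberOfImages: 1,
                     width: 2,
                     height: 2,
                     featureChannels: 3,
                     array: &pixels,
                     count: 2 * 2 * 4)

// MPSImage -> [Float], regardless of whether the texture stores
// 16-bit or 32-bit floats.
let floats = image.toFloatArray()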

Easy layer creation. Reduce the boilerplate when building the layers for your neural network. Forge's domain-specific language makes defining a neural net as simple as:

let input = Input()

let output = input
        --> Resize(width: 28, height: 28)
        --> Convolution(kernel: (5, 5), channels: 20, activation: relu, name: "conv1")
        --> MaxPooling(kernel: (2, 2), stride: (2, 2))
        --> Convolution(kernel: (5, 5), channels: 50, activation: relu, name: "conv2")
        --> MaxPooling(kernel: (2, 2), stride: (2, 2))
        --> Dense(neurons: 320, activation: relu, name: "fc1")
        --> Dense(neurons: 10, name: "fc2")
        --> Softmax()

let model = Model(input: input, output: output)

Custom layers. MPSCNN only supports a limited number of layers, so we've added a few of our own:

  • Depth-wise convolution
  • Transpose channels
  • Deconvolution (coming soon!)

Preprocessing kernels. Often you need to preprocess data before it goes into the neural network. Forge comes with a few handy kernels for this:

  • SubtractMeanColor
  • RGB2Gray
  • RGB2BGR
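
As a rough usage sketch (the initializer parameters and the encode signature shown here are assumptions; check the Forge sources for the real API), a preprocessing kernel is encoded into the same command buffer before the network runs:

// Hypothetical parameter labels -- treat them as assumptions.
let preprocess = SubtractMeanColor(device: device,
                                   red: 123.68, green: 116.78, blue: 103.94,
                                   scale: 255)

// Encode the preprocessing pass first, then feed its output image
// into the first layer of the network.
preprocess.encode(commandBuffer: commandBuffer,
                  sourceTexture: inputTexture,
                  destinationImage: preprocessedImage)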

Custom compute kernels. Many neural networks require custom compute kernels, so Forge provides helpers that make it easy to write and launch your own kernels.
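
For reference, the raw Metal boilerplate that these helpers wrap looks something like the following. This is plain Metal API, not Forge-specific; "myKernel" is a placeholder name for a compute function in your .metal file:

import Metal

let device = MTLCreateSystemDefaultDevice()!
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "myKernel")!)

// Dummy input/output textures so the example is self-contained.
let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba16Float,
                                                    width: 256, height: 256,
                                                    mipmapped: false)
desc.usage = [.shaderRead, .shaderWrite]
let inputTexture = device.makeTexture(descriptor: desc)!
let outputTexture = device.makeTexture(descriptor: desc)!

let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setTexture(inputTexture, index: 0)
encoder.setTexture(outputTexture, index: 1)

// One thread per output pixel, rounded up to whole threadgroups.
let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
let threadgroups = MTLSize(width: (outputTexture.width + 15) / 16,
                           height: (outputTexture.height + 15) / 16,
                           depth: 1)
encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerGroup)
encoder.endEncoding()
commandBuffer.commit()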

Debugging tools. When you implement a neural network in Metal you want to make sure it actually computes the correct thing. Due to the way Metal encodes the data, inspecting the contents of the MTLTexture objects is not always straightforward. Forge can help with this.
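
For example, once a layer has produced an MPSImage you can pull its contents back to the CPU and inspect them. This is a sketch, given some MPSImage called outputImage produced by the network; toFloatArray() and printChannelsForPixel() are the helpers referred to in the issues further down, and the exact parameter labels of the latter are an assumption:

// toFloatArray() converts the (possibly 16-bit) texture contents into
// a plain [Float], taking the RGBA slice layout into account.
let values = outputImage.toFloatArray()
print(values.prefix(20))

// Hypothetical parameter labels -- the real signature may differ.
printChannelsForPixel(x: 0, y: 0, image: outputImage)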

Example projects. Forge comes with a number of pretrained neural networks, such as LeNet-5 on MNIST, Inception-v3 on ImageNet, and MobileNets.

Note: A lot of the code in this library is still experimental and subject to change. Use at your own risk!

iOS 10 and iOS 11 compatibility

Forge supports both iOS 10 and iOS 11.

Forge must be compiled with Xcode 9 and the iOS 11 SDK. (An older version is available under the tag xcode8, but is no longer supported.)

Important changes:

The order of the weights for DepthwiseConvolution layers has changed. It used to be:

[kernelHeight][kernelWidth][channels]

now it is:

[channels][kernelHeight][kernelWidth]

This was done to make this layer compatible with MPS's new depthwise convolution. On iOS 10, Forge uses its own DepthwiseConvolutionKernel; on iOS 11 and later it uses the MPS version (MPSCNNDepthWiseConvolutionDescriptor).
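
If you have existing weight files in the old order, a straightforward (if not particularly fast) way to rearrange them is a plain loop over the three indices, for example:

// Reorder depthwise-convolution weights from
// [kernelHeight][kernelWidth][channels] to [channels][kernelHeight][kernelWidth].
func reorderDepthwiseWeights(_ old: [Float],
                             kernelHeight: Int,
                             kernelWidth: Int,
                             channels: Int) -> [Float] {
    var new = [Float](repeating: 0, count: old.count)
    for h in 0..<kernelHeight {
        for w in 0..<kernelWidth {
            for c in 0..<channels {
                let oldIndex = (h * kernelWidth + w) * channels + c
                let newIndex = (c * kernelHeight + h) * kernelWidth + w
                new[newIndex] = old[oldIndex]
            }
        }
    }
    return new
}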

Note: Forge does not yet take advantage of all the MPS improvements in iOS 11, such as the ability to load batch normalization parameters or loading weights via data sources. This functionality will be added in a future version.

Run the examples!

To see a demo of Forge in action, open Forge.xcworkspace in Xcode and run one of the example apps on your device.

You need at least Xcode 9 and a device with an A8 processor (iPhone 6 or better) running iOS 10 or later. You cannot build for the simulator as it does not support Metal.

The included examples are:

MNIST

This example implements a very basic LeNet-5-style neural network, trained on the MNIST dataset for handwritten digit recognition.

Run the app and point the camera at a handwritten digit (there are some images in the Test Images folder you can use for this) and the app will tell you what digit it is, and how sure it is of this prediction.

MNIST example

The small image in the top-left corner shows what the network sees (this is the output of the preprocessing shader that attempts to increase the contrast between black and white).

There are two targets in this project:

  • MNIST
  • MNIST-DSL

They do the exact same thing, except the first one is written with straight MPSCNN code and the second one uses the Forge DSL and is therefore much easier to read.

Inception-v3

Google's famous Inception network for image classification. Point your phone at some object and the app will give you its top-5 predictions:

Inception example

The Inception example app is based on Apple's sample code but completely rewritten using the DSL. We use their learned parameters. Thanks, Apple!

YOLO

YOLO is an object detection network. It can detect multiple objects in an image and will even tell you where they are!

YOLO example

The example app implements the Tiny YOLO network, which is not as accurate as the full version of YOLO9000 and can detect only 20 different kinds of objects.

YOLO9000: Better, Faster, Stronger by Joseph Redmon and Ali Farhadi (2016).

MobileNets

The MobileNets example app is an implementation of the network architecture from the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

It works like Inception-v3 but is much faster. On the iPhone 6s it runs at 20 FPS with only moderate-to-high energy usage.

Forge uses the pretrained weights from shicai/MobileNet-Caffe.

How to add Forge to your own project

Use Xcode 9 or better.

  1. Copy the Forge folder into your project.
  2. Use File > Add Files to "YourProject" > Forge.xcodeproj to add the Forge project inside your own project.
  3. Drag Products/Forge.framework into the Embedded Binaries section of your project settings.
  4. import Forge in your code.

NOTE: You cannot build for the simulator, only for "Generic iOS Device" or an actual device with arm64 architecture.

How to use Forge

Where are the unit tests?

Run the ForgeTests app on a device.

The reason the tests are in a separate app is that Metal does not work on the simulator and Xcode can't run logic tests on the device. Catch-22.

TODO

Forge is under active development. Here is the list of bugs and upcoming features.

License and credits

Forge is copyright 2016-2017 Matthijs Hollemans and is licensed under the terms of the MIT license.

forge's People

Contributors

hollance, marseel, ozgurshn


forge's Issues

Use Forge with deployment target < 10.3

Hi,

I'm trying to add Forge to my app (deployment target = 9.0).

I get this compiling error :

YOLO.swift:2:8: Module file's minimum deployment target is ios10.3 v10.3: Forge.framework/Modules/Forge.swiftmodule/arm64.swiftmodule

I've tried to set the Framework as Optional and to do the last part of this page but I still get the same error.

I can't change the target of my app.

Any help would be appreciated :)

App Store?

Any thoughts on submitting any of the demos to the App Store? I think offering a ready-to-go mobile version of Inception-v3 / YOLO / MobileNets would be awesome.

code signing blocked mmap()

I get the following error message when I'm running the app on my iPhone:

dyld: Library not loaded: @rpath/Forge.framework/Forge
  Referenced from: /var/containers/Bundle/Application/xxxxxxxxxxxxxxx/Inception.app/Inception
  Reason: no suitable image found.  Did find:
	/private/var/containers/Bundle/Application/xxxxxxxxxxxxxxx/Inception.app/Frameworks/Forge.framework/Forge: code signing blocked mmap() of '/private/var/containers/Bundle/Application/xxxxxxxxxxxxxx/Inception.app/Frameworks/Forge.framework/Forge'
(lldb)

also a warning:

CodeSign /Users/adamszendrei/Library/Developer/Xcode/DerivedData/Forge-gikysnjirgrtmefxffiekblqjkzy/Build/Products/Debug-iphoneos/Inception.app/Frameworks/Forge.framework
    cd /Users/adamszendrei/ObjDetect/Forge/Examples/Inception
    export CODESIGN_ALLOCATE=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/codesign_allocate
    export PATH="/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin:/Applications/Xcode.app/Contents/Developer/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
    
Signing Identity:     "iPhone Developer: Adam Szendrei (762xxxxxxxx)"

    /usr/bin/codesign --force --sign 6xxxxxx --preserve-metadata=identifier,entitlements,flags --timestamp=none /Users/adamszendrei/Library/Developer/Xcode/DerivedData/Forge-gikysnjirgrtmefxffiekblqjkzy/Build/Products/Debug-iphoneos/Inception.app/Frameworks/Forge.framework

Warning: unable to build chain to self-signed root for signer "iPhone Developer: Adam Szendrei (762xxxxxxxx)"

Do you have any idea how I can fix it?

Regarding offset for picking values for bounding box values

Hello,

I am trying to get YOLO working in Windows ML. I first converted the Darknet YOLO v2 tiny model to Keras using the yad2k script, and then used the keras2onnx converter to convert from Keras to ONNX.

So the model is successfully converted to ONNX with output shape NHWC (13 x 13 x 125). Now I have to generate the bounding boxes, for which I tried referring to your code for the OFFSET, but I get an "Array Index: Out of bound" exception. I think this is because you have 128 channels in Swift, while in Windows ML it's just 125.

So, how can I handle this?

Could you please help me on this?

Upload to appstore error

Hey! I love your framework, but I get this error when I try to upload to the App Store:
"Unexpected CFBundleExecutable Key - The bundle at '/Payload/MyApp.app/Forge/Forge/Info.plist' does not contain a bundle executable. If this bundle intentionally does not contain an executable, consider removing the CFBundleExecutable key from its Info.plist and using a CFBundlePackageType of BNDL. If this bundle is part of a third-party framework, consider contacting the developer of the framework for an update to address this issue."
I have tried a lot of things over the past days, but I was unable to fix this error. Do you have any idea what I should do? Thank you very much for your help!

Building own app causes Libmobilegestalt issue

Hi,

I had to make this a new issue since it's unrelated to the previous one. I've built my app using Forge's DSL and converted the collective weights into layer by layer weights. The app builds correctly but when I try to run it, I encounter the following issue:

libMobileGestalt MobileGestaltSupport.m:153: pid 10398 (Labels) does not have sandbox access for frZQaeyWLUvLjeuEK43hmg and IS NOT appropriately entitled

I've tried to track down the bug, but I can't seem to locate it. My development environment is Xcode 8.3 and iOS 10.3. Any pointers in this direction would be appreciated.

Do you have any plan to implement ResNet with Forge?

First of all, your Forge makes me happy. Thank you.
Now I'm trying to implement ResNet using MPS, but I'm stuck at adding the outputs of two conv layers.
Is there an MPS API to support ResNet, or do I have to copy the data of the two conv layers from the GPU to the CPU and push it back to the GPU after computing the add? (I think the latter is a bad idea.) I'd be glad if you could give me a little hint.

Thank you.

Updated iOS and Xcode versions, cannot run any more

Hi, I just updated my iOS version to 11.3, my bad!
So I also updated Xcode to match my iPhone, and I got an error like: "Module compiled with Swift 4.0.3 cannot be imported in Swift 4.1".

And I also got some error in Layer.swift:"Type of expression is ambiguous without more context"
or "'MPSRectNoClip' is only available on iOS 11.0 or newer".

Could you please help me to fix this problem?
Thank you!

Hello World example and extreme example

I understand that MNIST is usually the "hello world" (at least in my Keras learning experience). But sometimes you want to go down a level, to an even simpler neural network, so you can check the weights, understand the flow, etc. Could there be a simple network for learning purposes, covering everything from modeling and training to running?

Thanks in advance for your kind advice.

Unable to archive the app with Forge.

Hey!

Awesome framework, I really like using it. But here's a big issue: I'm unable to archive the app because it throws a whole bunch of unresolved errors. The app works great during development, but archiving fails. :)

Attaching a pic for you.

Do let me know how to fix this.

Thanks!

archive_fail

Great job!

Using MTLBuffer instead of MTLTexture would give much better performance. Subsampling a pixel is slow.

The performance is unexpected

Today I tested the TensorFlow (TF) iOS example on my iPhone 6s. According to the introduction on the TF website and the source code, I know it uses Apple's Accelerate framework. I built protobuf and TF's source code on my Mac, then ran the iOS example. I recorded the time with this code:

tensorflow::Status run_status = tf_session->Run(
        {{input_layer_name, image_tensor}}, {output_layer_name}, {}, &outputs);

and the time is fast, only 90 ms. I know TF's iOS example uses the Google Inception V1 model, and I tested Apple's example which uses the Google Inception V3 model; its time is 120 ms. Is Metal slower than the Accelerate framework? I cannot understand it. I don't think there are enough differences between Inception V1 and V3 to affect performance that much... so how do you explain it?

Error: framework not found Forge for architecture arm64

I need to embed the Forge framework inside our static library. The static library builds successfully for all architectures: arm64, armv7, armv7s.
Xcode gives the following error when the static library is used in a sample app: framework not found Forge for architecture arm64. Linker command failed with exit code 1 (use -v to see invocation)

Update to Xcode 9.4

Hi, two problems come up when updating to Xcode 9.4.
Layers.swift has one error: "Type of expression is ambiguous without more context"

conv = MPSCNNConvolution(device: device,
                             convolutionDescriptor: desc,
                             kernelWeights: weights.pointer,
                             biasTerms: biases?.pointer,
                             flags: .none)

and the file LayerHelpers.swift has the same error on the following lines:

  let layer = MPSCNNConvolution(device: device,
                                convolutionDescriptor: desc,
                                kernelWeights: weightsData.pointer,
                                biasTerms: biasData?.pointer,
                                flags: .none)

Error: the destination image texture is temporary and has a readCount of 0.

When I combine MobileNet with a shortcut connection, I get this error:

/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalImage/MetalImage-100.6/MPSNeuralNetwork/Filters/MPSCNNKernel.mm:729: failed assertion `[MPSCNNConvolution encodeToCommandBuffer:sourceImage:inState:destinationImage:] Error: the destination image texture is temporary and has a readCount of 0.
Its texel storage is probably in use for another texture now.

The net summary is correct, but the error occurs in Model.encode (more precisely, in MPSCNNLayer.encode), and I cannot figure out why. The net definition is something like this:

    let relu = MPSCNNNeuronReLU(device: device, a : 0.0)
    let input = Input(width: 256, height: 512, channels:3)
    let mbv1_conv_1 = input
        --> Resize(width: 256, height: 512)
        --> Convolution(kernel: (3, 3), channels: 16, stride: (2, 2), padding: .same, activation: relu, useBias: true, name: "0")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "1_d")
        --> PointwiseConvolution(channels: 32, stride: (1, 1), activation: relu, useBias: true, name: "1_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (2, 2), activation: nil, useBias: false, name: "2_d")
        --> PointwiseConvolution(channels: 64, stride: (1, 1), activation: relu, useBias: true, name: "2_p")
    
    let mbv1_conv_2 = mbv1_conv_1
        --> DepthwiseConvolution(kernel: (3, 3), stride: (2, 2), activation: nil, useBias: false, name: "3_d")
        --> PointwiseConvolution(channels: 128, stride: (1, 1), activation: relu, useBias: true, name: "3_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "4_d")
        --> PointwiseConvolution(channels: 128, stride: (1, 1), activation: relu, useBias: true, name: "4_p")
    
    let mbv1_conv_3 = mbv1_conv_2
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "5_d")
        --> PointwiseConvolution(channels: 256, stride: (1, 1), activation: relu, useBias: true, name: "5_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "6_d")
        --> PointwiseConvolution(channels: 256, stride: (1, 1), activation: relu, useBias: true, name: "6_p")
    
    let mbv1_maxpool = mbv1_conv_1
        --> MaxPooling(kernel: (2, 2), stride: (2, 2), padding: .valid)
    
    let concat = Concatenate([ mbv1_maxpool, mbv1_conv_2, mbv1_conv_3])
    
    let mbv1_conv4 = concat
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "7_d")
        --> PointwiseConvolution(channels: 32, stride: (1, 1), activation: relu, useBias: true, name: "7_p")

Forge slows down after few hundred frames

Hello,

I am building an app that requires real-time performance. I've run a few tests on the Inception-v3 example and here are the results:
First run:
https://pastebin.com/G3ErxAcA
Second run:
https://pastebin.com/Se0zHifF

For the first ~300 frames the GPU execution time starts at 0.07 s, but later it increases to ~0.085 s.
I thought a possible cause was GPU overheating, but right after the first run I tried running the app again, and on the second run the results are similar: the first few hundred frames are processed much faster than the last ones.

I see that it also depends on the fps:
https://pastebin.com/cmFkFP5P
I use an iPad Pro for testing and I set the fps to 15. It runs smoothly for 300-400 frames and then slows down a lot (even 0.44 s for one or two frames), and then runs faster, ~0.11 s, but still slower than at the beginning.
This experiment is repeatable: it is always really fast -> two frames super slow -> slower than at the beginning.

What causes this? Maybe some problem with resource management in Forge?

In my app the execution time increases from 0.12 s to 0.2 s per execution, which makes my app unusable.

Thanks for help in advance :)

Use more sensible defaults

There are a few places where I think adding some defaults in constructors would be beneficial/sensible.

Stuff like inflightBuffers and kernel in Convolution and Pooling layers could have defaults that would reduce repetition and clean up model construction.

On the flip side, perhaps some people might not notice the default params and it could lead to errors.

Thoughts?

mhh, not running

Hi, I can compile (build succeeds) but the app does not run in the simulator or on my iPhone. After the build success message... nothing.

Supported weights file for Forge

Hi,

I wanted to know if .bin or .dat are the only file types supported for interacting with the Forge framework. Can Forge work with files like .caffemodel or .pb, or with files which do not specify weights and biases for every layer?

If .bin or .dat are the only supported file types, are you aware of any conversion tools to convert from other binary weight file types?

Thanks

What is this anchors list?

In your article explaining how to use the YOLO model, what is this anchors list?
Screenshot from 2024-07-11 12-54-58

[question] TensorFlow Lite

Hi, @hollance san,

Have you ever evaluated TensorFlow Lite on iOS/iPadOS for the GPU (I mean not for the Neural Engine)?

As you pointed out in one of your articles, Core ML is slower than MPSCNN.
So I had expectations for the Metal delegate of TensorFlow Lite and tried it, but I am disappointed by its performance.
If you have some insights, could you share them?

Thanks.

Has Mobilenet-SSD been supported on iOS yet ?

Hi Hollance
I followed the tutorial https://github.com/chuanqi305/MobileNet-SSD
After that I tried to convert my deployed model to Core ML and I got this issue:
[libprotobuf ERROR /Users/sohaibqureshi/github/coremltools/deps/protobuf/src/google/protobuf/text_format.cc:287] Error parsing text-format caffe.NetParameter: 1177:17: Message type "caffe.LayerParameter" has no field named "permute_param". Traceback (most recent call last): File "mobilenet_2_coreml.py", line 23, in <module> class_labels='caffe_model/synset_words.txt') File "/Users/ln160c/Downloads/YOLO-CoreML-MPSNNGraph-master/Convert/coreml/coreml/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 171, in convert predicted_feature_name) File "/Users/ln160c/Downloads/YOLO-CoreML-MPSNNGraph-master/Convert/coreml/coreml/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 230, in _export predicted_feature_name) RuntimeError: Unable to load caffe network Prototxt file: caffe_model/MobileNetSSD_deploy.prototxt

I'm not sure whether Core ML supports MobileNet-SSD or not. Could you take a look?

Why is my Metal shader much slower than MPSCNN?

Hello, I have been following you for a long time. I am also an iOS developer working with deep learning. Your code has helped me a lot, thank you!

Now I have a question about convolution. I have been using MPSCNN to run CNN networks for a long time, for example VGG-Net, ResNet, SqueezeNet and so on. The performance is very good; SqueezeNet only needs 20 ms, so I can use it to recognize images in real time on my iPhone. I am curious why MPSCNN is so fast; I just know it uses Metal and the GPU. So I wanted to write the kernel code myself and compare it to MPSCNN.

I constructed this convolution example:
the input is 3x224x224
the convolution kernel is 64x3x3
the padding is 1
the stride is 1
so the output is 64x224x224
and the data type is float

The MPSCNN code is:

NSDate *start2 = [NSDate date];
    MPSImageDescriptor *desc = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat32 width:224 height:224 featureChannels:3];
    MPSImage *srcImage = [[MPSImage alloc] initWithDevice:self.device imageDescriptor:desc];
    
    MPSImageDescriptor *desc2 = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat32 width:224 height:224 featureChannels:64];
    MPSImage *outImage = [[MPSImage alloc] initWithDevice:self.device imageDescriptor:desc2];
    
    id<MTLCommandBuffer> commandBuffer = [self.commandQueue commandBuffer];

    int co = 4*224*224;
    int kernel_size = 3;
    int pad = 1;
    int stride = 1;
    int count = 64*224*224;
    
    float *buf = new float[co];
    for(int i =0;i<co;i++){
        buf[i] = 1.0;
    }
    
    int weight_count = 3*64*kernel_size*kernel_size;
    float *weight = new float[weight_count];
    for(int i =0;i<weight_count;i++){
        weight[i] = 0.123;
    }

    float *bias = new float[64];
    for(int i =0;i<64;i++){
        bias[i] = 1.23456789;
    }
    MTLRegion region = MTLRegionMake3D(0, 0, 0,224,224,1);
    [srcImage.texture replaceRegion:region mipmapLevel:0 slice:0 withBytes:buf bytesPerRow:srcImage.width*4*sizeof(float) bytesPerImage:0];
    
    MPSCNNConvolutionDescriptor *convdesc = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:kernel_size kernelHeight:kernel_size inputFeatureChannels:3 outputFeatureChannels:64 neuronFilter:nil];
    convdesc.strideInPixelsX = stride;
    convdesc.strideInPixelsY = stride;
    convdesc.groups = 1;
    
    MPSCNNConvolution *conv = [[MPSCNNConvolution alloc] initWithDevice:self.device convolutionDescriptor:convdesc kernelWeights:weight biasTerms:bias flags:MPSCNNConvolutionFlagsNone];
    MPSOffset offset;
    offset.x = 0;
    offset.y = 0;
    offset.z = 0;
    conv.offset = offset;
    
    
    [conv encodeToCommandBuffer:commandBuffer sourceImage:srcImage destinationImage:outImage];
    NSTimeInterval localtime2 = [[NSDate date] timeIntervalSinceDate:start2] * 1000;
    cout << "data init used " << localtime2 << "ms" << endl;
    
    
    NSDate *start = [NSDate date];

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    
    delete [] buf;
    delete [] weight;
    delete [] bias;
    NSTimeInterval localtime = [[NSDate date] timeIntervalSinceDate:start] * 1000;

    cout << "gpu calc used " << localtime << "ms" << endl;

My Metal code is below (because 4 channels are easier to process, I convert the input to 4x224x224):

id <MTLComputePipelineState> pipline = self.pipelineShaderTex;
    
    int co = 4*224*224;
    int kernel_size = 3;
    int pad = 1;
    int stride = 1;
    int count = 64*224*224;
    
    float *buf = new float[co];
    for(int i =0;i<co;i++){
        buf[i] = 1.0;
    }
    
    int weight_count = 4*64*kernel_size*kernel_size;
    float *weight = new float[weight_count];
    for(int i =0;i<weight_count;i++){
        weight[i] = i%4 == 3 ? 0 : 0.123;
    }

    float *bias = new float[64];
    for(int i =0;i<64;i++){
        bias[i] = 1.23456789;
    }
    
    MetalConvolutionParameter param;
    param.count = count;
    param.padSize = pad;
    param.kernelSize = kernel_size;
    param.stride = stride;
    param.inputChannel = 3;
    param.outputChannel = 64;
    param.inputWidth = 224;
    param.inputHeight = 224;
    param.outputWidth = 224;
    param.outputHeight = 224;
    
    MTLTextureDescriptor *indesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:224 height:224 mipmapped:NO];
    indesc.textureType = MTLTextureType2D;
    
    MTLTextureDescriptor *outdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:224 height:224 mipmapped:NO];
    outdesc.textureType = MTLTextureType2DArray;
    outdesc.arrayLength = 64/4;
    
    MTLTextureDescriptor *weightdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:3 height:3 mipmapped:NO];
    weightdesc.textureType = MTLTextureType2DArray;
    weightdesc.arrayLength = 64;

    MTLTextureDescriptor *biasdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:1 height:1 mipmapped:NO];
    biasdesc.textureType = MTLTextureType2DArray;
    biasdesc.arrayLength = 64/4;
    
    if(!self.inTexture){
        self.inTexture = [self.device newTextureWithDescriptor:indesc];
        self.outTexture = [self.device newTextureWithDescriptor:outdesc];
        self.weightTexture = [self.device newTextureWithDescriptor:weightdesc];
        self.biasTexture = [self.device newTextureWithDescriptor:biasdesc];
        
        [self.inTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 224, 224, 1) mipmapLevel:0 slice:0 withBytes:buf bytesPerRow:224*4*sizeof(float) bytesPerImage:0];
        for(int i =0;i<weightdesc.arrayLength;i++){
            [self.weightTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 3, 3, 1) mipmapLevel:0 slice:i withBytes:weight+3*3*4*i bytesPerRow:3*4*sizeof(float) bytesPerImage:0];
            
        }
        for(int i =0;i<biasdesc.arrayLength;i++){
            [self.biasTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 1, 1, 1) mipmapLevel:0 slice:i withBytes:bias+4*i bytesPerRow:1*4*sizeof(float) bytesPerImage:0];
        }
    }
    id<MTLBuffer> parambuffer = [self.device newBufferWithBytes:&param length:sizeof(param) options:MTLResourceCPUCacheModeDefaultCache];

    id<MTLCommandBuffer> commandBuffer = [self.commandQueue commandBuffer];
    id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
    [encoder setComputePipelineState:pipline];
    [encoder setTexture:self.inTexture atIndex:0];
    [encoder setTexture:self.outTexture atIndex:1];
    [encoder setTexture:self.weightTexture atIndex:2];
    [encoder setTexture:self.biasTexture atIndex:3];
    [encoder setBuffer:parambuffer offset:0 atIndex:0];
    
    MTLSize threadsPerGroups = MTLSizeMake(32, 16, 1);
    MTLSize threadGroups = MTLSizeMake((224 + threadsPerGroups.width -1 ) / threadsPerGroups.width,
                                       (224 + threadsPerGroups.height -1 ) / threadsPerGroups.height, 16);
    
    [encoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadsPerGroups];
    [encoder endEncoding];
    
    NSDate *start = [NSDate date];

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    
    delete [] buf;
    delete [] weight;
    delete [] bias;
    NSTimeInterval localtime = [[NSDate date] timeIntervalSinceDate:start] * 1000;
    cout << "Time used " << localtime << "ms" << endl;

And the Metal kernel function is as follows (I do not handle the padding and stride, and the input always reads (0,0); ignore that, I am just testing compute performance):

kernel void convolutionForwardTexture(texture2d<float, access::read> inTexture [[texture(0)]],
                                      texture2d_array<float, access::write> outTexture [[texture(1)]],
                                      texture2d_array<float, access::read> weights [[ texture(2) ]],
                                      texture2d_array<float, access::read> bias [[ texture(3) ]],
                                      const device MetalConvolutionParameter *convolvParams [[ buffer(0) ]],
                                      ushort3 gid [[ thread_position_in_grid ]]){
    if(gid.x>=224||gid.y>=224){
        return;
    }
    
    float total = 0;
    float total2 = 0;
    float total3 = 0;
    float total4 = 0;
    
    float4 k,input;
    int slice = gid.z;
    for(int kh =0;kh<3;kh++){
        for(int kw =0;kw<3;kw++) {
            k = weights.read(uint2(kw,kh),slice*4);
            input = inTexture.read(uint2(0,0));
            total+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+1);
            input = inTexture.read(uint2(0,0));
            total2+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+2);
            input = inTexture.read(uint2(0,0));
            total3+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+3);
            input = inTexture.read(uint2(0,0));
            total4+=dot(k,input);
        }
    }
    
    float4 output = float4(total,total2,total3,total4) + bias.read(uint2(0,0),slice);
    outTexture.write(output,uint2(gid.x,gid.y),gid.z);
    
}

The result is that MPSCNN needs only 10 ms, while my code takes 40 ms. Why is my code so slow? I do not know how MPSCNN does it. Can you give me some help?

How to implement an element-wise layer in Forge

Hi, I want to convert a caffemodel which includes an element-wise (sum) layer; however, there is no implementation in Forge, so I want to write it myself. How can I implement this layer as quickly as possible? Please help me, thanks!

Forge does not support iOS 11.3

I updated my Xcode and iPhone to iOS 11.3. Now Forge does not build successfully. Maybe you could update Forge to support iOS 11.3, because some classes have changed in iOS 11.3.

Results of MPSCNNConvolution

Hi, I have the following float array as an input buffer for an MPSImage:

let buffer4c = [
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
]

From my understanding, this should represent a 2x2x3 tensor whose 4th channel is padded with 1.0. Then I created an MPSImage object using that buffer via the extension method defined in MPSImage+Floats.swift:

inputImg  = MPSImage(device: device,
                                             numberOfImages: 1,
                                             width: 2,
                                             height: 2,
                                             featureChannels: 3,
                                             array: &buffer4c,
                                             count: 2*2*4)

After that, I created a weight buffer whose dimensions are 1x3x2x2 (NCHW). I understand this needs to be converted to NHWC. To make things easier, I set all values in the buffer to 1.0:

nums = [1.0,1.0,1.0, 1.0,1.0,1.0,
                1.0,1.0,1.0, 1.0,1.0,1.0]

The last step is to set up the convolution; here is what I did:

class Conv2d : NeuralNetwork {
    typealias PredictionType = Float16
    
    var inputImg: MPSImage!
    var outputImg: MPSImage!
    var oid = MPSImageDescriptor(channelFormat: .float16, width: 1, height: 1, featureChannels: 1)
    var conv2d: MPSCNNConvolution
    
    init(device: MTLDevice, inflightBuffers: Int) {
        weightsLoader   = { name, count in ParameterLoaderBundle(name: name, count: count, suffix: "_W", ext: "txt") }
        outputImg       = MPSImage(device: device, imageDescriptor: oid)
        conv2d          = convolution(device: device, kernel: (2, 2), inChannels: 3, outChannels: 1, activation: nil, name: "conv", useBias: false)
    }
    
    func encode(commandBuffer: MTLCommandBuffer, texture: MTLTexture, inflightIndex: Int) {
        conv2d.encode(commandBuffer: commandBuffer, sourceImage: inputImg, destinationImage: outputImg)
    }
    func fetchResult(inflightIndex: Int) -> NeuralNetworkResult<Float16> {
        let probabilities = outputImg.toFloatArray()
        print(probabilities)
        return NeuralNetworkResult<Float16>()
    }
}

From my understanding, the result of the convolution should be 4.0 (I also verified this using PyTorch). However, the output was 1.0. I experimented a little bit, and it seems like only the first 4 elements of the image buffer get multiplied by the corresponding weights.

Is there anything that I'm missing here?

`[MPSTemporaryImage prefetchStorageWithCommandBuffer:imageDescriptorList:] Error: the descriptor must be configured with MTLStorageModePrivate'

Hi.
I'm trying to run the YOLO application of this project.
When I try it, this error occurs.

failed assertion `[MPSTemporaryImage prefetchStorageWithCommandBuffer:imageDescriptorList:] Error: the descriptor must be configured with MTLStorageModePrivate'

in YOLO.swift line 69, which calls models.swift line 306.

I edited DataShape.swift line 52,
from

return MPSImageDescriptor(channelFormat: .float16, width: width,
                              height: height, featureChannels: channels) 

to

return MPSImageDescriptor(channelFormat: .float16, width: width,
                              height: height, featureChannels: channels,
                              storageMode: .private) // and MTLStorageMode.private instead of .private

but it doesn't work, giving "Expression type 'MPSImageDescriptor' is ambiguous without more context".
I'm working with Xcode 9 and iOS 11. (Is that the reason?)
What can I do about it?
Thank you.

Relu6 in metal

Could you please write a Swift class for ReLU6 to replace MPSCNNNeuronReLU? I'm new to Metal and want an example of defining new layers other than the various convolutions. Thank you very much!
ReLU6 in TensorFlow is f(x; a) = min(a*min(0, x) + max(0, x), 6)
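
For reference, with a = 0 this reduces to min(max(0, x), 6), i.e. a clamp to the range [0, 6]; in a Metal shader that is simply clamp(x, 0.0h, 6.0h). A minimal Swift sketch of the element-wise function (not part of Forge, just to show the math) is:

// ReLU6 applied element-wise: f(x) = min(max(0, x), 6).
func relu6(_ x: Float) -> Float {
    return min(max(0, x), 6)
}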

Add layer

Is there some way to create an Add layer that takes 2 or more tensors as inputs and returns their sum? (E.g. the Add layer in Keras.)

Using YOLO in ARKit

I tried to use YOLO in ARKit. I have integrated your code.

I call predict in the session delegate:

func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let seekingCM = CMTimeMakeWithSeconds(frame.timestamp, 1000000);
        let timestamp = seekingCM
        let deltaTime = timestamp - lastTimestamp
        if fps == -1 || deltaTime >= CMTimeMake(1, Int32(fps)) {
            lastTimestamp = timestamp
            
            if let texture = convertToMTLTexture(pixelBuffer:frame.capturedImage){
                predict(texture: texture)
            }
            
        }
    }

and I convert the texture from a CVPixelBuffer instead of a CMSampleBuffer:

func convertToMTLTexture(pixelBuffer: CVPixelBuffer?) -> MTLTexture? {
        if let textureCache = textureCache,
            let pixelBuffer = pixelBuffer{

            let width = CVPixelBufferGetWidth(pixelBuffer)
            let height = CVPixelBufferGetHeight(pixelBuffer)
            
            var texture: CVMetalTexture?
            CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                                      pixelBuffer, nil, .bgra8Unorm, width, height, 0, &texture)
            if let texture = texture {
                return CVMetalTextureGetTexture(texture)
            }
        }
        return nil
    }

Because ARKit runs the camera full screen and outputs 1280x720, I changed the height to use a 16:9 ratio:

private func show(predictions: [YOLO.Prediction]) {
        DEBUGLOG(message: predictions.count)

        for i in 0..<boundingBoxes.count {
            if i < predictions.count {
                let prediction = predictions[i]
                
                // The predicted bounding box is in the coordinate space of the input
                // image, which is a square image of 416x416 pixels. We want to show it
                // on the video preview, which is as wide as the screen and has a 4:3
                // aspect ratio. The video preview also may be letterboxed at the top
                // and bottom.
                let width = view.bounds.width
                let height = width * 16 / 9
                let scaleX = width / CGFloat(YOLO.inputWidth)
                let scaleY = height / CGFloat(YOLO.inputHeight)
//                let top = (view.bounds.height - height) / 2
                
                // Translate and scale the rectangle to our own coordinate system.
                var rect = prediction.rect
                rect.origin.x *= scaleX
                rect.origin.y *= scaleY
//                rect.origin.y += top
                rect.size.width *= scaleX
                rect.size.height *= scaleY
                
                // Show the bounding box.
                let label = String(format: "%@ %.1f", labels[prediction.classIndex], prediction.score * 100)
                let color = colors[prediction.classIndex]
                boundingBoxes[i].show(frame: rect, label: label, color: color)
                
            } else {
                boundingBoxes[i].hide()
            }
        }
    }

It runs, but the results are not right.
The same bottle can be recognized in your demo, but not in mine.

What am I missing?
Please help me.

Custom TinyYOLO doesn't work

I have been trying to use Forge to detect a custom object based on the Tiny YOLO model. However, every time I try to run it, I get this error. My custom model has 2 classes; I've tried changing several parameters and it doesn't seem to work.

untitled

The Tiny YOLO model based on VOC that came with your example works fine.

Getting all channels for point

Hi!

Quick question: for the post-processing, is there a version of printChannelsForPixel() that just returns all the channels at a point of an MPSImage instead of printing them? Or, to use the output MPSImage, do I need to write wrapper functions for slicing and indexing?

Thank you so much!

EXC_BAD_ACCESS code = 10

Hi!

I wrote my own CNN in the YOLO demo (currently trying to replicate SqueezeDet). I changed the model layers in the init and have my weights converted to .bin files in the parameters folder. The only other thing I changed was the labels. When I run, I get "EXC_BAD_ACCESS code = 10" on the flags line of:
"conv = MPSCNNConvolution(device: device,
convolutionDescriptor: desc,
kernelWeights: weights.pointer,
biasTerms: biases?.pointer,
flags: .none)"

Photo included below. What could be the issue causing this? Any advice on solving?

Thank you so much!

screen shot 2017-06-20 at 2 33 44 pm
screen shot 2017-06-20 at 5 39 49 pm

Edit: My problem is from this line
let output = fire11Result --> Convolution(kernel: (3, 3), channels: 72, stride: (1,1), activation: nil, name: "conv12") //error is caused by this line

The app runs using fire11Result, the second to last layer, as the output.

Any idea?

Thank you!

Reshape Layer

Maybe I'm missing it in the documentation, but does Forge support a reshape layer?

An issue about a memory leak

I added a view controller as the first VC in the MobileNets demo. Then I present the camera controller from the demo, and then dismiss it. But I find that when I do this, about 10 MB of memory is not released, and every time I present the camera controller another 10 MB is not released. I guess the issue is that every call to createNeuralNetwork increases the memory, but I cannot solve the problem. How can I solve it?

EXC_BAD_ACCESS on release executions

I tried to run the example projects using a release build and they all crash at:

mpscnn = MPSCNNFullyConnected(device: device, convolutionDescriptor: desc, kernelWeights: weights.pointer, biasTerms: biasTerms, flags: .none)

in createCompute of Layers.swift.

Only the MNIST project, which does not use Forge, seems to work.

This means that an app using this library cannot be compiled for App Store, ad hoc, or enterprise distribution.
