
Using Vision in Real Time with ARKit

Manage Vision resources for efficient execution of a Core ML image classifier, and use SpriteKit to display image classifier output in AR.

Overview

This sample app runs an ARKit world-tracking session with content displayed in a SpriteKit view. The app uses the Vision framework to pass camera images to a Core ML classifier model, displaying a label in the corner of the screen to indicate whether the classifier recognizes anything in view of the camera. After the classifier produces a label for the image, the user can tap the screen to place that text in AR world space.

  • Note: The Core ML image classifier model identifies an image's contents as a whole; it doesn't recognize or locate the 3D positions of individual objects. (In fact, the Inceptionv3 model attempts only to identify an entire scene.) When the user taps the screen, the app adds a label at a real-world position corresponding to the tapped point, so how closely a label appears to relate to the object it names depends on where the user taps.

Getting Started

ARKit requires iOS 11.0 and a device with an A9 (or later) processor. ARKit is not available in iOS Simulator. Building the sample code requires Xcode 9.0 or later.

Implement the Vision/Core ML Image Classifier

The sample code's classificationRequest property, classifyCurrentImage() method, and processClassifications(for:error:) method manage:

  • A Core ML image-classifier model, loaded from an mlmodel file bundled with the app using the Swift API that Core ML generates for the model
  • VNCoreMLRequest and VNImageRequestHandler objects for passing image data to the model for evaluation

For more details on using VNImageRequestHandler, VNCoreMLRequest, and image classifier models, see the Classifying Images with Vision and Core ML sample-code project.
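As a point of reference, here is a minimal sketch of what the classificationRequest property might look like, assuming the bundled model is Inceptionv3 (as noted above) and that results are routed to processClassifications(for:error:); the crop option and error handling are illustrative, not necessarily the sample's exact code:

private lazy var classificationRequest: VNCoreMLRequest = {
    do {
        // Core ML generates a Swift class (assumed here to be Inceptionv3) for the bundled mlmodel file.
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model) { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        }
        // Crop the camera image to the square region the classifier expects.
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()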

Run the AR Session and Process Camera Images

The sample ViewController class manages the AR session and displays AR overlay content in a SpriteKit view. ARKit captures video frames from the camera and provides them to the view controller in the session(_:didUpdate:) method, which then calls the classifyCurrentImage() method to run the Vision image classifier.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Do not enqueue other buffers for processing while another Vision task is still running.
    // The camera stream has only a finite amount of buffers available; holding too many buffers for analysis would starve the camera.
    guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
        return
    }
    
    // Retain the image buffer for Vision processing.
    self.currentBuffer = frame.capturedImage
    classifyCurrentImage()
}


Serialize Image Processing for Real-Time Performance

The classifyCurrentImage() method uses the view controller's currentBuffer property to track whether Vision is currently processing an image before starting another Vision task.

// Most computer vision tasks are not rotation-agnostic, so it is important to pass in the orientation of the image with respect to the device.
let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)

let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer!, orientation: orientation)
visionQueue.async {
    do {
        // Release the pixel buffer when done, allowing the next buffer to be processed.
        defer { self.currentBuffer = nil }
        try requestHandler.perform([self.classificationRequest])
    } catch {
        print("Error: Vision request failed with error \"\(error)\"")
    }
}
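The snippet above assumes two supporting pieces that are easy to reconstruct: the visionQueue it dispatches onto, and an initializer that maps the device orientation to a CGImagePropertyOrientation. A rough sketch of both follows; the queue label and the exact orientation mapping are assumptions:

// A serial queue, so only one Vision request runs at a time.
private let visionQueue = DispatchQueue(label: "com.example.ARKitVision.serialVisionQueue")

extension CGImagePropertyOrientation {
    /// Maps the device orientation to the rotation Vision should apply to the
    /// rear camera's pixel buffers, which arrive in landscape-right orientation.
    init(_ deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown: self = .left
        case .landscapeLeft:      self = .up
        case .landscapeRight:     self = .down
        default:                  self = .right
        }
    }
}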


  • Important: Making sure only one buffer is being processed at a time ensures good performance. The camera recycles a finite pool of pixel buffers, so retaining too many buffers for processing could starve the camera and shut down the capture session. Passing multiple buffers to Vision at once would also slow the processing of each image, adding latency and leaving less CPU and GPU time available for rendering AR visualizations.

In addition, the sample app enables the usesCPUOnly setting for its Vision request, freeing the GPU for use in rendering.
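In the classificationRequest sketch above, that setting would be one more line in the request configuration:

// Keep the Vision work on the CPU so the GPU stays available for AR rendering.
request.usesCPUOnly = true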

Visualize Results in AR

The processClassifications(for:error:) method stores the best-match result label produced by the image classifier and displays it in the corner of the screen. The user can then tap in the AR scene to place that label at a real-world position. Placing a label requires two main steps.
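A minimal sketch of how processClassifications(for:error:) might do this, assuming the identifierString property referenced later and a hypothetical statusLabel for the on-screen text:

func processClassifications(for request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation],
        let bestResult = results.first else {
        print("Unable to classify image: \(error?.localizedDescription ?? "no results")")
        return
    }
    // Remember the best-match label so a later tap can place it in world space.
    identifierString = bestResult.identifier
    DispatchQueue.main.async { [weak self] in
        // Show the current classification in the corner of the screen
        // (statusLabel is a stand-in for the sample's actual UI).
        self?.statusLabel.text = bestResult.identifier
    }
}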

First, a tap gesture recognizer fires the placeLabelAtLocation(sender:) action. This method uses the ARKit hitTest(_:types:) method to estimate the 3D real-world position corresponding to the tap, and adds an anchor to the AR session at that position.

@IBAction func placeLabelAtLocation(sender: UITapGestureRecognizer) {
    let hitLocationInView = sender.location(in: sceneView)
    let hitTestResults = sceneView.hitTest(hitLocationInView, types: [.featurePoint, .estimatedHorizontalPlane])
    if let result = hitTestResults.first {
        
        // Add a new anchor at the tap location.
        let anchor = ARAnchor(transform: result.worldTransform)
        sceneView.session.add(anchor: anchor)
        
        // Track anchor ID to associate text with the anchor after ARKit creates a corresponding SKNode.
        anchorLabels[anchor.identifier] = identifierString
    }
}


Next, after ARKit automatically creates a SpriteKit node for the newly added anchor, the view(_:didAdd:for:) delegate method provides content for that node. In this case, the sample TemplateLabelNode class creates a styled text label using the string provided by the image classifier.

func view(_ view: ARSKView, didAdd node: SKNode, for anchor: ARAnchor) {
    guard let labelText = anchorLabels[anchor.identifier] else {
        fatalError("missing expected associated label for anchor")
    }
    let label = TemplateLabelNode(text: labelText)
    node.addChild(label)
}
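The TemplateLabelNode type is defined by the sample project; as a rough idea of what such a node involves, here is a stand-in that draws styled text over a translucent background (fonts, sizes, and colors are illustrative):

class TemplateLabelNode: SKNode {
    init(text: String) {
        super.init()
        let label = SKLabelNode(text: text)
        label.fontName = "HelveticaNeue-Bold"
        label.fontSize = 20
        label.verticalAlignmentMode = .center
        // A translucent rounded rectangle behind the text keeps it legible against the camera feed.
        let background = SKShapeNode(rect: label.frame.insetBy(dx: -10, dy: -10), cornerRadius: 8)
        background.fillColor = UIColor(white: 0, alpha: 0.6)
        background.strokeColor = .clear
        addChild(background)
        addChild(label)
    }

    required init?(coder aDecoder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}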

