Vision API

teleportHQ Vision API is a computer vision API trained specifically to detect atomic UI elements in pictures of hand-drawn wireframes. It uses an architecture based on ResNet-101 for feature extraction and Faster R-CNN for bounding-box proposals.

The machine learning model was built and trained using TensorFlow.

List of elements it can distinguish: paragraph, label, header, button, checkbox, radiobutton, rating, toggle, dropdown, listbox, textarea, textinput, datepicker, stepperinput, slider, progressbar, image, video.

The API is currently in closed alpha, but feel free to contact us if you want early access.

Guideline

We had to settle on some drawing conventions to obtain better results; you can learn more in this blog post.

Using the Vision API

Request

Send all requests to the API endpoint: https://api.vision.teleporthq.io/v2/detection

Request header

Add a Content-Type header with the value application/json and a Teleport-Token header with the token provided by us.

Request body

The body of the request is a JSON object with two keys: image and threshold.

  • image is a required string parameter: the direct URL of a publicly available JPG or PNG image.
  • threshold is an optional parameter with a default value of 0.1. The detection model outputs a confidence score between 0 and 1 for each detection, and the response omits detections whose confidence is lower than this threshold.

Request body example:

{
    "image": "https://i.imgur.com/HzTWzLS.jpg", 
    "threshold": 0.5
}
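As a minimal sketch, the body described above could be assembled and checked before sending. The helper name build_detection_body is hypothetical and not part of the API, and the range check on threshold is an assumption based on scores lying between 0 and 1.

```python
import json


def build_detection_body(image_url, threshold=None):
    """Return the JSON request body as a string.

    Hypothetical helper for illustration; not provided by the API.
    """
    if not isinstance(image_url, str) or not image_url:
        raise ValueError("image must be a non-empty URL string")
    body = {"image": image_url}
    if threshold is not None:
        # Assumption: since scores lie between 0 and 1, a threshold
        # outside that range is treated here as a caller error.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        body["threshold"] = threshold
    return json.dumps(body)
```

Leaving threshold out of the body lets the API apply its default of 0.1.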

Request example

curl \
  -X POST https://api.vision.teleporthq.io/v2/detection \
  -H 'Content-Type: application/json' \
  -H 'Teleport-Token: your_token' \
  -d '{ 
    "image": "https://i.imgur.com/HzTWzLS.jpg",
    "threshold": 0.5 
  }'
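The same request can be sketched in Python using only the standard library. The function names make_detection_request and detect are hypothetical, the timeout is an arbitrary choice, and error handling is left out.

```python
import json
import urllib.request

API_URL = "https://api.vision.teleporthq.io/v2/detection"


def make_detection_request(token, image_url, threshold=0.1):
    """Build (but do not send) the POST request for the detection endpoint."""
    payload = json.dumps({"image": image_url, "threshold": threshold})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Teleport-Token": token,
        },
        method="POST",
    )


def detect(token, image_url, threshold=0.1, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = make_detection_request(token, image_url, threshold)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as detect("your_token", "https://i.imgur.com/HzTWzLS.jpg", 0.5) would mirror the curl command above.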

Response

If your request is valid, you will receive a JSON response with the following structure:

[
    {
        "box": [y, x, height, width],
        "detectionClass": numeric_label,
        "detectionString": string_label,
        "score": confidence_rating
    },
    ...
]

The JSON contains a list of objects, each of which corresponds to a detected atomic UI element in the image sent in the request. Every key appears in every object of the response array.

  • box contains the bounding box surrounding the detected element. x and y are the coordinates of the top-left corner of the box; width and height are its dimensions. All values are normalized to [0, 1], where (0, 0) is the top-left corner of your image and (1, 1) is the bottom-right corner. To get pixel coordinates, multiply x and width by the width of your image, and y and height by the height of your image.
  • detectionClass is the numeric class of the detection.
  • detectionString is the human-readable label of the detection.
  • score represents how confident the model is that the detection is correct. It takes values in [0, 1], where 1 represents 100% confidence.
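The pixel conversion described for box can be sketched as follows. The helper name box_to_pixels and the choice to round to whole pixels are assumptions; the image size has to come from your own copy of the image, since the response does not include it.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [y, x, height, width] box to pixel values.

    Returns (left, top, width, height), each rounded to the nearest pixel.
    """
    # The API orders the box as y, x, height, width.
    y, x, height, width = box
    left = round(x * image_width)
    top = round(y * image_height)
    pixel_width = round(width * image_width)
    pixel_height = round(height * image_height)
    return left, top, pixel_width, pixel_height
```

Note the argument order: the returned tuple puts the horizontal values first, which is the reverse of the order used in box.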

The detectionClass to detectionString mapping is done according to this dictionary:

{
    1: 'paragraph',
    2: 'dropdown',
    3: 'checkbox',
    4: 'radiobutton',
    5: 'rating',
    6: 'toggle',
    7: 'textarea',
    8: 'datepicker',
    9: 'stepperinput',
    10: 'slider',
    11: 'video',
    12: 'label',
    13: 'table',
    14: 'list',
    15: 'header',
    16: 'button',
    17: 'image',
    18: 'linebreak',
    19: 'container',
    20: 'link',
    21: 'textinput'
}
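As an illustration of how this mapping might be used, the sketch below counts detections per label by resolving detectionClass through the dictionary. The names V2_LABELS and count_by_label are hypothetical, and the "unknown" fallback for unmapped classes is an assumption.

```python
from collections import Counter

# Copy of the v2 mapping listed above.
V2_LABELS = {
    1: "paragraph", 2: "dropdown", 3: "checkbox", 4: "radiobutton",
    5: "rating", 6: "toggle", 7: "textarea", 8: "datepicker",
    9: "stepperinput", 10: "slider", 11: "video", 12: "label",
    13: "table", 14: "list", 15: "header", 16: "button",
    17: "image", 18: "linebreak", 19: "container", 20: "link",
    21: "textinput",
}


def count_by_label(detections):
    """Count detections per label, resolving labels from detectionClass."""
    return Counter(
        V2_LABELS.get(d["detectionClass"], "unknown") for d in detections
    )
```

Since each response object already carries detectionString, the lookup is mainly useful when only the numeric class has been stored.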

Response example

Full response here.

[
    {
        "box": [
            0.06640399247407913,
            0.18573421239852905,
            0.0626835897564888,
            0.43779563903808594
        ],
        "detectionClass": 15,
        "detectionString": "header",
        "score": 0.995826005935669
    },
    {
        "box": [
            0.16810636222362518,
            0.18520960211753845,
            0.04797615110874176,
            0.17563629150390625
        ],
        "detectionClass": 16,
        "detectionString": "button",
        "score": 0.9924671053886414
    },
    {
        "box": [
            0.8350381255149841,
            0.5098391771316528,
            0.05998152494430542,
            0.23138082027435303
        ],
        "detectionClass": 16,
        "detectionString": "button",
        "score": 0.9921296238899231
    }
]
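A response like the one above could be post-processed on the client. The sketch below, with the hypothetical helper reading_order, drops detections under a stricter score and sorts the rest top to bottom, then left to right, using the y and x values at the start of each box. Treating that as reading order is an assumption that suits simple, single-column wireframes.

```python
def reading_order(detections, min_score=0.0):
    """Filter detections by score and sort them top to bottom, left to right."""
    kept = [d for d in detections if d["score"] >= min_score]
    # box is [y, x, height, width], so box[0] is the top edge
    # and box[1] is the left edge.
    return sorted(kept, key=lambda d: (d["box"][0], d["box"][1]))
```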

Previous version

The previous version of the API is still available at this endpoint: https://api.vision.teleporthq.io/v1/detection

The detectionClass to detectionString mapping for this previous version is done according to this dictionary:

{
    1: "paragraph",
    2: "label",
    3: "header",
    4: "button",
    5: "checkbox",
    6: "radiobutton",
    7: "rating",
    8: "toggle",
    9: "dropdown",
    10: "listbox",
    11: "textarea",
    12: "textinput",
    13: "datepicker",
    14: "stepperinput",
    15: "slider",
    16: "progressbar",
    17: "image",
    18: "video"
}
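Because the numeric classes differ between the two versions, code that stored v1 classes may need to translate them. The sketch below goes through the label strings and assumes that a label with the same name denotes the same element in both versions, which the documentation does not state explicitly. The names V1_LABELS, V2_LABELS, and v1_class_to_v2 are hypothetical.

```python
# Copies of the two mappings listed in this document.
V1_LABELS = {
    1: "paragraph", 2: "label", 3: "header", 4: "button", 5: "checkbox",
    6: "radiobutton", 7: "rating", 8: "toggle", 9: "dropdown",
    10: "listbox", 11: "textarea", 12: "textinput", 13: "datepicker",
    14: "stepperinput", 15: "slider", 16: "progressbar", 17: "image",
    18: "video",
}
V2_LABELS = {
    1: "paragraph", 2: "dropdown", 3: "checkbox", 4: "radiobutton",
    5: "rating", 6: "toggle", 7: "textarea", 8: "datepicker",
    9: "stepperinput", 10: "slider", 11: "video", 12: "label",
    13: "table", 14: "list", 15: "header", 16: "button",
    17: "image", 18: "linebreak", 19: "container", 20: "link",
    21: "textinput",
}

# Reverse lookup: v2 label string to v2 numeric class.
V2_CLASS_BY_LABEL = {label: cls for cls, label in V2_LABELS.items()}


def v1_class_to_v2(v1_class):
    """Return the v2 class for a v1 class, or None if v2 has no such label."""
    label = V1_LABELS.get(v1_class)
    return V2_CLASS_BY_LABEL.get(label)
```

With these dictionaries, the v1 labels listbox and progressbar have no same-named v2 class, so the function returns None for them.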

How do I get a Teleport-Token?

If you are interested in using this API, feel free to get in touch with us via the following form.

Contributors

utwo, alexpausan, raulincze, dimitrif, murakamikennzo
