Comments (6)

Luxonis-Brandon commented on August 17, 2024

Thank you for documenting this! I was wondering why the boxes are so big...

from depthai-experiments.

VanDavv commented on August 17, 2024

The bounding boxes for the eyes are generated in the UI, so they can be changed. However, they currently reflect what is being sent to the neural network. The network accepts eye images scaled to 60x60, and the landmarks neural network gives me a single point per eye, so padded_point(..., padding=30, ...) produces a correct bounding box that I can send onward.

This is also why the bounding box is sometimes too big or too small: it doesn't take the face dimensions into account.
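To illustrate the fixed-padding behavior described above, here is a minimal sketch. The function name `fixed_eye_box` and its optional `frame_size` clamping are hypothetical and are not the repository's actual `padded_point` implementation; the sketch only shows why a padding of 30 around a single eye point always gives a 60x60 box, regardless of how large the face is in the frame.

```python
def fixed_eye_box(eye_point, padding=30, frame_size=None):
    """Return (x_min, y_min, x_max, y_max) centered on eye_point.

    Hypothetical helper for illustration only. The box side is always
    2 * padding pixels, so padding=30 yields a 60x60 crop no matter
    how near or far the face is.
    """
    x, y = eye_point
    box = [x - padding, y - padding, x + padding, y + padding]
    if frame_size is not None:
        # Assumed behavior: clamp the box to the frame (width, height).
        width, height = frame_size
        box = [
            max(0, min(box[0], width)),
            max(0, min(box[1], height)),
            max(0, min(box[2], width)),
            max(0, min(box[3], height)),
        ]
    return tuple(box)


# The same 60x60 box is produced for a near face and a far face.
print(fixed_eye_box((100, 80)))
```

Because the box size is constant, a face close to the camera gets a crop that covers only part of the eye, while a distant face gets a crop that includes a lot of non-eye area.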

One fix for this, which would be fairly easy to do, is to correlate the face dimensions with the eye size, so that for a given face width/height we can estimate the eye width/height. This way we could draw a better bounding box, and the scaled image for the NN would contain fewer non-eye parts, so it may also improve efficiency.
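A rough sketch of that idea, assuming the face detector returns a box as (x_min, y_min, x_max, y_max) in pixels. The function name `scaled_eye_box` and the `eye_to_face_ratio` value of 0.3 are hypothetical placeholders, not measured values; the ratio would need to be tuned, ideally against how the gaze model's training data was cropped.

```python
def scaled_eye_box(eye_point, face_box, eye_to_face_ratio=0.3):
    """Return a square eye box whose side scales with the face width.

    Hypothetical helper for illustration only.
    eye_point: (x, y) eye landmark in pixels.
    face_box: (x_min, y_min, x_max, y_max) of the detected face.
    eye_to_face_ratio: assumed fraction of the face width covered
        by one eye crop (placeholder value, needs tuning).
    """
    x, y = eye_point
    face_width = face_box[2] - face_box[0]
    # Half the side length of the eye box, at least 1 pixel.
    half_side = max(1, round(face_width * eye_to_face_ratio / 2))
    return (x - half_side, y - half_side, x + half_side, y + half_side)


# A face 200 px wide gets a 60 px eye box; a 400 px face gets 120 px.
print(scaled_eye_box((100, 80), (0, 0, 200, 200)))
print(scaled_eye_box((300, 250), (100, 100, 500, 500)))
```

The resulting crop would still be resized to 60x60 before being sent to the gaze network, but it would cover roughly the same part of the face at any distance.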

Also sharing this here for readability.


 avatar commented on August 17, 2024

The main purpose of drawing the bbox should be debugging.
The gaze model requires a 60x60 px input for the eyes.
A bbox that is too small or too large helps us understand why the model works well or badly (whether we are too close or too far).

To improve the cropped area (and thus get a better bbox for the eyes), we must find a way to adapt the resolution of the cropped face, for example by using multiple resolutions/zoom levels ...

Changing how the bbox is drawn so that it fits better changes the effect and not the cause ...
So I agree with @VanDavv about it.


raymondlo84 commented on August 17, 2024

So am I right that we are currently feeding the wrong 'size' into the NN? I mean, we don't take the face size into account?


 avatar commented on August 17, 2024

Yes, rescaling for the near range (depending on input resolution) and using the zoom feature (not yet implemented / finalized) for far distances should improve the result.


 avatar commented on August 17, 2024

Ideally, we would have access to the training dataset and try to crop the eyes the same way ... ;-)

