Comments (3)

ludovic-carre commented on July 27, 2024

I understand now. The network downsamples the image until the first prediction and then upsamples the feature maps until the last one. So the first prediction is made on the smallest feature map and the last one on the largest, which explains why small objects are detected at the end (more information) and big objects at the beginning. I know you mentioned it in your post, @alexandru-dinu, but I didn't notice it when I read it.
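
To make the sizes concrete, here is a minimal sketch of the feature-map side length at each prediction scale. The strides 32, 16 and 8 are an assumption taken from the standard YOLOv3 layout (the thread only mentions 32 directly), and `grid_sizes` is a hypothetical helper, not a function from the repository.

```python
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Return the feature-map side length for each prediction scale.

    The strides are assumed (standard YOLOv3); the first entry is the
    coarsest scale, the last entry the finest.
    """
    return [input_size // stride for stride in strides]


if __name__ == "__main__":
    for stride, size in zip((32, 16, 8), grid_sizes()):
        print(f"stride {stride:>2}: {size}x{size} grid")
```

With a 416x416 input this gives the 13x13 grid for the first prediction and the 52x52 grid for the last one.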

from pytorch-yolov3.

alexandru-dinu commented on July 27, 2024
  1. For the first yolo layer, at the coarsest scale (13x13 for a 416x416 image), the last 3 anchors are used:
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326

This makes sense: you want this layer to detect large objects in the image. The last yolo layer, at the finest scale (52x52 for a 416x416 image), uses the first 3 anchors (the smallest ones), so it will detect smaller objects in the image.
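
A small sketch of how the mask picks anchors for each yolo layer. The anchor list and the first mask come from the cfg excerpt above; the masks for the second and third layers (`3,4,5` and `0,1,2`) are assumed from the standard YOLOv3 config, and `anchors_for_layer` is a hypothetical helper.

```python
# (width, height) anchor pairs, in the order given in the cfg excerpt.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]

# One mask per yolo layer, coarsest scale first.
# Only the first mask is quoted in the thread; the other two are assumed.
MASKS = [(6, 7, 8), (3, 4, 5), (0, 1, 2)]


def anchors_for_layer(layer_index):
    """Return the anchors selected by the mask of the given yolo layer."""
    return [ANCHORS[i] for i in MASKS[layer_index]]


if __name__ == "__main__":
    for layer in range(len(MASKS)):
        print(f"yolo layer {layer}: {anchors_for_layer(layer)}")
```

The first layer ends up with the three largest anchors and the last layer with the three smallest.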

  2. If I recall correctly, anchors are bounding boxes with (w, h) in [416, 416], so they get normalized by dividing their sizes by 416.
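
As a rough illustration of that normalization, assuming the anchors are given in pixels of a 416x416 network input (as recalled above, not checked against the code), a hypothetical `normalize_anchor` helper could look like this:

```python
INPUT_SIZE = 416  # assumed network input resolution


def normalize_anchor(width, height, input_size=INPUT_SIZE):
    """Scale an anchor given in input pixels to the [0, 1] range."""
    return width / input_size, height / input_size


if __name__ == "__main__":
    w, h = normalize_anchor(116, 90)
    print(f"normalized anchor: ({w:.3f}, {h:.3f})")
```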


ludovic-carre commented on July 27, 2024

But the receptive field of the first yolo layer is smaller than that of the last layer, so it would make sense to me for the first layer to detect the small objects. And because of the stride, doesn't it make more sense that the first layer would detect small objects that the last one might miss, since there is a 32-pixel gap between the last yolo layer and the input image?