YOLO ground truth width and length are not relative to image size but to S #140

Open
@oonisim

Code

dataset.py calculates width_cell and height_cell, which are then stored in the label_matrix tensor.

"""
...
Then to find the width relative to the cell is simply:
width_pixels/cell_pixels, simplification leads to the
formulas below.
"""
width_cell, height_cell = (
    width * self.S,
    height * self.S,
)

Question

Please help me understand why width_cell and height_cell are expressed in units of cells, that is, relative to S instead of the image size.

In my understanding, width and height come from the YOLO Darknet annotation format, where both are relative to the image size and take values between 0 and 1. Suppose width=0.7; then, with S=7, width_cell will be 4.9 cells.
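To make the arithmetic concrete, here is a small sketch of the simplification the docstring refers to (width_pixels/cell_pixels). The helper name width_in_cells and the image sizes 224 and 448 are my own assumptions for illustration; they are not taken from dataset.py.

```python
def width_in_cells(width, image_pixels, S):
    """Return a box width in grid-cell units, going through pixels explicitly.

    Hypothetical helper for illustration only; not part of dataset.py.
    """
    width_pixels = width * image_pixels   # image-relative width -> pixels
    cell_pixels = image_pixels / S        # side length of one grid cell in pixels
    return width_pixels / cell_pixels     # algebraically equal to width * S


S = 7
for image_pixels in (224, 448):
    via_pixels = width_in_cells(0.7, image_pixels, S)
    # The image size cancels out, so both routes agree up to float rounding.
    assert abs(via_pixels - 0.7 * S) < 1e-9
    print(image_pixels, round(via_pixels, 3))
```

The image size cancels, so the result depends only on width and S, and it can exceed 1 whenever the box is wider than a single cell.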

If width_cell and height_cell are used as the ground truth for YOLO v1 training, I would expect them to be relative to the image size, as in the YOLO v1 paper:

Each bounding box consists of 5 predictions: x, y, w, h,
and confidence. The (x, y) coordinates represent the center
of the box relative to the bounds of the grid cell. The width
and height are predicted relative to the whole image.
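For comparison, here is a minimal sketch of the two size conventions side by side. It assumes a Darknet-style label (x, y, w, h), all relative to the image and in [0, 1]. The function names encode_box and cell_size_to_image_size, the cell_relative_size flag, and the way the cell index is computed are my assumptions for illustration, not the repository's actual code.

```python
def encode_box(x, y, w, h, S=7, cell_relative_size=False):
    """Encode one image-relative box for an S x S grid (illustrative sketch)."""
    # Row i and column j of the cell containing the box center,
    # clamped so that x == 1.0 or y == 1.0 still maps to the last cell.
    i = min(int(S * y), S - 1)
    j = min(int(S * x), S - 1)
    # Center offset measured from the top-left corner of that cell.
    x_cell, y_cell = S * x - j, S * y - i
    if cell_relative_size:
        # Size in units of grid cells (the convention this issue asks about).
        w_out, h_out = w * S, h * S
    else:
        # Size relative to the whole image (the convention quoted from the paper).
        w_out, h_out = w, h
    return i, j, x_cell, y_cell, w_out, h_out


def cell_size_to_image_size(w_cells, h_cells, S=7):
    """Convert a cell-relative size back to an image-relative size."""
    return w_cells / S, h_cells / S


paper_style = encode_box(0.5, 0.5, 0.7, 0.2)
cell_style = encode_box(0.5, 0.5, 0.7, 0.2, cell_relative_size=True)
print(paper_style)
print(cell_style)
print(cell_size_to_image_size(cell_style[4], cell_style[5]))
```

Under these assumptions the two encodings differ only by the constant factor S in the size terms, so one can be recovered from the other. What matters is that the loss and the prediction decoding use the same convention as the labels.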
