Custom datasets applying to predict.py issue #1075

deng0326 · 2023-08-14T05:14:26Z

deng0326
Aug 14, 2023

Hi Mike, I am trying to use BoxMOT in the predict.py script of my YOLOv5 model. I read the Custom Dataset tutorial in the README and followed it step by step. However, no matter how I tried to fix the script, no tracks were found throughout the video (the YOLOv5 model's predictions showed several detections). Please have a look at the script below and give me some advice. I appreciate what you've done in providing us with such a useful tool. 🙏🙏

`# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license
"""
Run YOLOv5 segmentation inference on images, videos, directories, streams, etc.

Usage - sources:
    $ python segment/predict.py --weights yolov5s-seg.pt --source 0                               # webcam
                                                                  img.jpg                         # image
                                                                  vid.mp4                         # video
                                                                  screen                          # screenshot
                                                                  path/                           # directory
                                                                  list.txt                        # list of images
                                                                  list.streams                    # list of streams
                                                                  'path/*.jpg'                    # glob
                                                                  'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                                                  'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Usage - formats:
    $ python segment/predict.py --weights yolov5s-seg.pt                 # PyTorch
                                          yolov5s-seg.torchscript        # TorchScript
                                          yolov5s-seg.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                          yolov5s-seg_openvino_model     # OpenVINO
                                          yolov5s-seg.engine             # TensorRT
                                          yolov5s-seg.mlmodel            # CoreML (macOS-only)
                                          yolov5s-seg_saved_model        # TensorFlow SavedModel
                                          yolov5s-seg.pb                 # TensorFlow GraphDef
                                          yolov5s-seg.tflite             # TensorFlow Lite
                                          yolov5s-seg_edgetpu.tflite     # TensorFlow Edge TPU
                                          yolov5s-seg_paddle_model       # PaddlePaddle
"""

import argparse
import os
import platform
import sys
from pathlib import Path
import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[1]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (LOGGER, Profile, check_file, check_img_size, check_imshow, check_requirements, colorstr, cv2,
                           increment_path, non_max_suppression, print_args, scale_boxes, scale_segments,
                           strip_optimizer)
from utils.plots import Annotator, colors, save_one_box
from utils.segment.general import masks2segments, process_mask, process_mask_native
from utils.torch_utils import select_device, smart_inference_mode

#boxmot
import cv2
import numpy as np
from pathlib import Path
from boxmot import DeepOCSORT

tracker = DeepOCSORT(
    model_weights=Path('osnet_x0_25_msmt17.pt'),  # which ReID model to use
    device='cuda:0',
    iou_threshold=0.3,
    fp16=False,
)

@smart_inference_mode()
def run(
        weights=ROOT / '/home/deng/Documents/yolov5/runs/train-seg/exp56/weights/best.pt',  # model.pt path(s)
        source=ROOT / '/home/deng/Documents/yolov8_tracking-master/worm0809videos_und/und_0809_600s_1.mp4',  # file/dir/URL/glob/screen/0(webcam)
        data=ROOT / '/home/deng/Documents/yolov5/data/0626_YOLODataset/dataset.yaml',  # dataset.yaml path
        imgsz=(640, 640),  # inference size (height, width)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project=ROOT / 'runs/predict-seg',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        dnn=False,  # use OpenCV DNN for ONNX inference
        vid_stride=1,  # video frame-rate stride
        retina_masks=False,
):
    source = str(source)
    save_img = not nosave and not source.endswith('.txt')  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))
    webcam = source.isnumeric() or source.endswith('.streams') or (is_url and not is_file)
    screenshot = source.lower().startswith('screen')
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(), Profile(), Profile())
    for path, im, im0s, vid_cap, s in dataset:
        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            pred, proto = model(im, augment=augment, visualize=visualize)[:2]

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det, nm=32)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f'{i}: '
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
            s += '%gx%g ' % im.shape[2:]  # print string
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                if retina_masks:
                    # scale bbox first the crop masks
                    det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()  # rescale boxes to im0 size
                    masks = process_mask_native(proto[i], det[:, 6:], det[:, :4], im0.shape[:2])  # HWC
                else:
                    masks = process_mask(proto[i], det[:, 6:], det[:, :4], im.shape[2:], upsample=True)  # HWC
                    det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()  # rescale boxes to im0 size

                # Segments
                if save_txt:
                    segments = [
                        scale_segments(im0.shape if retina_masks else im.shape[2:], x, im0.shape, normalize=True)
                        for x in reversed(masks2segments(masks))]

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Mask plotting
                annotator.masks(
                    masks,
                    colors=[colors(x, True) for x in det[:, 5]],
                    im_gpu=torch.as_tensor(im0, dtype=torch.float16).to(device).permute(2, 0, 1).flip(0).contiguous() /
                    255 if retina_masks else im[i])

                # id = 0
                dets = np.empty((0,6))
                # print(dets.shape)
                # Write results

                for j, (*xyxy, conf, cls) in enumerate(reversed(det[:, :6])):
                    if save_txt:  # Write to file
                        seg = segments[j].reshape(-1)  # (n,2) to (n*2)
                        line = (cls, *seg, conf) if save_conf else (cls, *seg)  # label format
                        with open(f'{txt_path}.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                        annotator.box_label(xyxy, label, color=colors(c, True))
                        # annotator.draw.polygon(segments[j], outline=colors(c, True), width=3)
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

                    x1y1 = [xyxy[0].item(), xyxy[1].item()]   # Extract x1, y1 values from the list
                    x2y2 = [xyxy[2].item(), xyxy[3].item()]   # Extract x2, y2 values from the list

                    # Create the final array
                    det = np.array([
                        *x1y1,
                        *x2y2,
                        conf.item(),
                        cls.item()
                    ])

                    dets = np.vstack((dets, det))
                print(dets)

                # im_np = im0
                # im_reshaped = np.squeeze(im_np)  # Remove singleton dimension
                # im_np = np.transpose(im_reshaped, (1, 2, 0))
                # print(im_np.shape)

                # Ensure the array is in the correct data type (usually uint8 for images)
                # im_np = im_np.astype(np.uint8)
                # print(im_np)

                # Update tracker with np.ndarray image
                tracks = tracker.update(dets, im0)  # --> (x, y, x, y, id, conf, cls)
                # print(tracks)

                xyxys = tracks[:, 0:4].astype('int')  # float64 to int
                ids = tracks[:, 4].astype('int')  # float64 to int
                confs = tracks[:, 5]
                clss = tracks[:, 6]

                color = (0, 0, 255)  # BGR
                thickness = 2
                fontscale = 0.5

                # print bboxes with their associated id, cls and conf
                if tracks.shape[0] != 0:
                    for xyxy, id, conf, cls in zip(xyxys, ids, confs, clss):
                        im0 = cv2.rectangle(
                            im,
                            (xyxy[0], xyxy[1]),
                            (xyxy[2], xyxy[3]),
                            color,
                            thickness
                        )
                        cv2.putText(
                            im,
                            f'id: {id}, conf: {conf}, c: {cls}',
                            (xyxy[0], xyxy[1]-10),
                            cv2.FONT_HERSHEY_SIMPLEX,
                            fontscale,
                            color,
                            thickness
                        )
                else:
                    print('No tracks found!')

            # Stream results (Detection+Track)
            im0 = annotator.result()
            if view_img:
                if platform.system() == 'Linux' and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                if cv2.waitKey(1) == ord('q'):  # 1 millisecond
                    exit()

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

    # Print results
    t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / '/home/deng/Documents/yolov5/runs/train-seg/exp51/weights/best.pt', help='model path(s)')
    parser.add_argument('--source', type=str, default=ROOT / '/home/deng/Documents/yolov8_tracking-master/worm0809videos_und/und_0809_600s_1.mp4', help='file/dir/URL/glob/screen/0(webcam)')
    parser.add_argument('--data', type=str, default=ROOT / '/home/deng/Documents/yolov5/data/0626_YOLODataset/dataset.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/predict-seg', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    parser.add_argument('--vid-stride', type=int, default=1, help='video frame-rate stride')
    parser.add_argument('--retina-masks', action='store_true', help='whether to plot masks in native resolution')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    check_requirements(ROOT / 'requirements.txt', exclude=('tensorboard', 'thop'))
    run(**vars(opt))


if __name__ == '__main__':
    opt = parse_opt()
    main(opt)
`

The error message I received in the terminal is as follows:

`2023-08-14 13:58:49.961 | SUCCESS | boxmot.appearance.reid_model_factory:load_pretrained_weights:207 - Successfully loaded pretrained weights from "osnet_x0_25_msmt17.pt"
2023-08-14 13:58:49.961 | WARNING | boxmot.appearance.reid_model_factory:load_pretrained_weights:211 - The following layers are discarded due to unmatched keys or layer size: ('classifier.weight', 'classifier.bias')
predict_boxmot: weights=/home/deng/Documents/yolov5/runs/train-seg/exp51/weights/best.pt, source=/home/deng/Documents/yolov8_tracking-master/worm0809videos_und/und_0809_600s_1.mp4, data=/home/deng/Documents/yolov5/data/0626_YOLODataset/dataset.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/predict-seg, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1, retina_masks=False
requirements: /home/deng/Documents/requirements.txt not found, check failed.
YOLOv5 🚀 v7.0-181-g3812a1a Python-3.9.0 torch-2.0.0 CUDA:0 (NVIDIA GeForce RTX 3080 Ti, 12045MiB)

Fusing layers...
Model summary: 330 layers, 88249583 parameters, 0 gradients, 264.0 GFLOPs
[[ 606 3 706 212 0.32379 1]
[ 1131 1024 1322 1058 0.64214 1]
[ 73 1355 106 1414 0.77086 0]
[ 421 131 604 195 0.79974 1]
[ 591 201 739 268 0.82485 1]
[ 565 924 642 1001 0.89392 0]
[ 433 473 546 657 0.90461 0]
[ 729 341 825 426 0.90699 0]
[ 740 73 773 233 0.91203 0]
[ 432 475 545 660 0.93087 1]
[ 861 287 1077 373 0.93214 0]
[ 352 327 433 505 0.9368 1]
[ 641 386 751 555 0.94841 0]
[ 497 792 583 995 0.95206 0]]
Traceback (most recent call last):
  File "/home/deng/Documents/yolov5_basic/predict_boxmot.py", line 356, in <module>
    main(opt)
  File "/home/deng/Documents/yolov5_basic/predict_boxmot.py", line 351, in main
    run(**vars(opt))
  File "/home/deng/miniconda3/envs/yolov5_worm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/deng/Documents/yolov5_basic/predict_boxmot.py", line 252, in run
    im0 = cv2.rectangle(
cv2.error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'rectangle'
> Overload resolution failed:
>  - img is not a numpy array, neither a scalar
>  - Expected Ptr<cv::UMat> for argument 'img'
>  - img is not a numpy array, neither a scalar
>  - Expected Ptr<cv::UMat> for argument 'img'
`

Thanks for helping!
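
Going only by the posted script, a likely cause of the `cv2.rectangle` error is that the drawing calls receive `im`, the `torch` tensor fed to the model, whereas OpenCV drawing functions need the NumPy frame `im0`. Separately, the loop rebinds `det` to a one-row array, which shadows the detection tensor that `len(det)` reads later. The sketch below uses a plain NumPy array as a stand-in for the post-NMS tensor (all values are made up) and builds the `(N, 6)` tracker input with one slice instead of a `vstack` loop.

```python
import numpy as np

# Stand-in for one image's post-NMS output: x1, y1, x2, y2, conf, cls,
# followed by mask coefficients (two made-up columns here; the script uses 32).
det = np.array([
    [606.0, 3.0, 706.0, 212.0, 0.32, 1.0, 0.1, -0.2],
    [73.0, 1355.0, 106.0, 1414.0, 0.77, 0.0, 0.4, 0.3],
])

# One slice gives the (N, 6) array passed to the tracker in the script.
# With a torch tensor the equivalent would be det[:, :6].cpu().numpy().
dets = det[:, :6].copy()

# `det` itself is left untouched, so later code such as len(det) still
# sees the full detection set rather than a single row.
print(dets.shape, det.shape)
```

When drawing the tracks afterwards, passing `im0` (the original frame) to `cv2.rectangle` and `cv2.putText` instead of `im` should avoid the "img is not a numpy array" error.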

mikel-brostrom · 2023-08-14T06:16:30Z

mikel-brostrom
Aug 14, 2023
Maintainer

Seems like you are using an old version of this repo. I would strongly recommend starting from a clean slate, as both the requirements and the repo got a complete rework 😄

deng0326 · 2023-08-14T06:55:49Z

deng0326
Aug 14, 2023
Author

Hi Mike, thanks for replying. I fixed the code, and it's now working properly. However, the results are quite poor. Even after I lowered iou_threshold to 0.1 (which should make tracking easier), the tracking success rate is still extremely low. The objects being tracked are worms, which are not in the COCO dataset. I am wondering whether that is why the results are poor, since I am passing model_weights=Path('osnet_x0_25_msmt17.pt') to the tracker.

PS: Would the following error message be the problem? (2023-08-14 13:58:49.961 | WARNING | boxmot.appearance.reid_model_factory:load_pretrained_weights:211 - The following layers are discarded due to unmatched keys or layer size: ('classifier.weight', 'classifier.bias'))

Thanks again!
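
On the warning about discarded layers: going by the message text, the loader keeps only checkpoint entries whose name and shape match the model being built. A classifier head sized for the training set's identity count would fail that check, and a re-ID model used only to produce embeddings does not need that head, so this warning is usually harmless. The sketch below imitates such filtering with plain dictionaries of shapes. The function name, layer names, and sizes are hypothetical and are not read from `osnet_x0_25_msmt17.pt` or from BoxMOT's code.

```python
def split_matching(checkpoint_shapes, model_shapes):
    """Split checkpoint keys into (kept, discarded) by name and shape match."""
    kept, discarded = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            kept.append(name)
        else:
            discarded.append(name)
    return kept, discarded


# Hypothetical checkpoint: backbone layers plus a 1000-identity classifier.
checkpoint = {
    "conv1.weight": (16, 3, 7, 7),
    "fc.weight": (512, 128),
    "classifier.weight": (1000, 512),
    "classifier.bias": (1000,),
}

# Hypothetical model built for feature extraction with a 1-class head.
model = {
    "conv1.weight": (16, 3, 7, 7),
    "fc.weight": (512, 128),
    "classifier.weight": (1, 512),
    "classifier.bias": (1,),
}

kept, discarded = split_matching(checkpoint, model)
print("kept:", kept)
print("discarded:", discarded)
```

Only the two classifier entries end up discarded here; the layers that produce the embedding are all loaded.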

2 replies

deng0326 Aug 14, 2023
Author

PS: How can I train a re-ID model using a custom dataset?

mikel-brostrom Aug 14, 2023
Maintainer

The results largely depend on your video content. If the frame rate is low and/or the objects move very fast, you will have a lot of motion uncertainty. The Kalman filters are not tuned for these cases.
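
To illustrate the frame-rate point: the tracker in the script takes an `iou_threshold`, so association presumably depends at least partly on how much a box overlaps its position in the previous frame. A minimal sketch with made-up numbers (a 100×60 px box moving horizontally at 300 px/s) shows how that overlap shrinks as the frame rate drops. The helper names are illustrative and not part of BoxMOT.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def consecutive_iou(speed_px_per_s, fps, box=(0.0, 0.0, 100.0, 60.0)):
    """IoU between a box and the same box one frame later (horizontal motion)."""
    dx = speed_px_per_s / fps  # displacement between two consecutive frames
    moved = (box[0] + dx, box[1], box[2] + dx, box[3])
    return iou(box, moved)


for fps in (30, 10, 5, 2):
    print(f"{fps:>2} fps -> IoU {consecutive_iou(300.0, fps):.2f}")
```

In this toy setup the overlap at 5 fps is 0.25, already below the `iou_threshold=0.3` used in the script, and at 2 fps the boxes no longer overlap at all.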

deng0326 · 2023-08-15T09:24:23Z

deng0326
Aug 15, 2023
Author

Thanks, Mikel. I assume that the low frame rate of my video is what causes the tracker's poor results. Now I am wondering whether training my own re-ID weights would help improve the tracker (the objects I am tracking are not in the COCO dataset). How can I train a re-ID model using a custom dataset? Thanks again for your help.

1 reply

mikel-brostrom Aug 16, 2023
Maintainer

What will help the most is either tuning the KFs or increasing your frame rate.
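
What tuning a Kalman filter changes can be seen on a toy example. The sketch below is a scalar random-walk Kalman filter written for illustration, not BoxMOT's filter. The noise values `q` and `r` and the 20 px/frame motion are assumptions chosen to mimic a fast object at a low frame rate.

```python
def scalar_kalman(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk motion model.

    q: process noise variance (how much the state may change per frame)
    r: measurement noise variance (how noisy each detection is)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: uncertainty grows by q each frame
        k = p / (p + r)        # Kalman gain: weight given to the measurement
        x = x + k * (z - x)    # update the estimate toward the measurement
        p = (1.0 - k) * p      # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Made-up track: the object moves 20 px between consecutive frames.
positions = [20.0 * t for t in range(1, 11)]

lag_low_q = positions[-1] - scalar_kalman(positions, q=0.01, r=1.0)[-1]
lag_high_q = positions[-1] - scalar_kalman(positions, q=100.0, r=1.0)[-1]

print(f"lag with q=0.01: {lag_low_q:.1f} px")
print(f"lag with q=100:  {lag_high_q:.1f} px")
```

With the small `q` the estimate trails the object by a growing margin, while with the large `q` it stays close to the measurements. Real trackers use a multi-dimensional state that includes velocity, but the trade-off between trusting the motion model and trusting the detections is similar.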
