
Deep SORT

Introduction

A more realtime adaptation of Deep SORT.

Adapted from the official repo of Simple Online and Realtime Tracking with a Deep Association Metric (Deep SORT)

See their paper for more technical information.

Dependencies

requirements.txt gives the default packages required (it installs torch/torchvision to use the default MobileNet embedder); modify accordingly.

Main dependencies are:

  • Python3
  • NumPy, pip install numpy
  • SciPy, pip install scipy
  • cv2, pip install opencv-python
  • (optional) Embedder requires PyTorch & Torchvision (for the default MobileNetV2 embedder) or Tensorflow 2+
    • pip install torch torchvision
    • pip install tensorflow
  • (optional) Additionally, to use the Torchreid embedder, the torchreid Python package needs to be installed. You can follow the installation guide on Torchreid's page. Without using conda, you can simply clone that repository and run python3 -m pip install . from inside the repo.
  • (optional) To use CLIP embedder, pip install git+https://github.com/openai/CLIP.git

Install

  • from PyPI via pip3 install deep-sort-realtime or python3 -m pip install deep-sort-realtime
  • or, clone this repo & install deep-sort-realtime as a python package using pip or as an editable package if you like (-e flag)
cd deep_sort_realtime && pip3 install .
  • or, download .whl file in this repo's releases

Run

Example usage:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5)
bbs = object_detector.detect(frame) 
tracks = tracker.update_tracks(bbs, frame=frame) # bbs expected to be a list of detections, each in tuples of ( [left,top,w,h], confidence, detection_class )
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()
  • To add project-specific logic into the Track class, you can make a subclass of Track and pass it in via the override_track_class argument when instantiating DeepSort (a sketch follows the embedder example below).

  • Example with your own embedder/ReID model:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5)
bbs = object_detector.detect(frame) # your own object detection
object_chips = chipper(frame, bbs) # your own logic to crop frame based on bbox values
embeds = embedder(object_chips) # your own embedder to take in the cropped object chips, and output feature vectors
tracks = tracker.update_tracks(bbs, embeds=embeds) # bbs expected to be a list of detections, each in tuples of ( [left,top,w,h], confidence, detection_class ); no need to give frame, as your chips have already been embedded
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()
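
A minimal sketch of subclassing Track, as mentioned above; the Track import path is an assumption based on this repo's layout:

from deep_sort_realtime.deepsort_tracker import DeepSort
from deep_sort_realtime.deep_sort.track import Track  # assumed import path

class MyTrack(Track):
    # Hypothetical subclass carrying project-specific state
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.my_labels = []  # e.g. per-track annotations for downstream logic

tracker = DeepSort(max_age=5, override_track_class=MyTrack)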

Getting bounding box of original detection

The original Track.to_* methods for retrieving bounding box values return only the Kalman predicted values. However, in some applications, it is better to return the bounding box values of the original detection the track was associated with in the current update round.

Here we added an orig argument to all the Track.to_* methods. If orig is flagged True and the track is associated with a detection in this update round, then the bounding box values returned by the method will be those of the original detection. Otherwise, it will still return the Kalman predicted values.

The orig_strict argument in all the Track.to_* methods is only active when orig is True. Flagging orig_strict=True means the method will output None when there is no original detection associated with the track at the current frame, instead of falling back to the Kalman predicted values.
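
For illustration, a short sketch of how these flags combine, per the description above:

ltrb_pred = track.to_ltrb()  # always the Kalman predicted values
ltrb_orig = track.to_ltrb(orig=True)  # original detection bb if one was associated this round, else Kalman predicted values
ltrb_strict = track.to_ltrb(orig=True, orig_strict=True)  # original detection bb, or None if none was associated this round
if ltrb_strict is None:
    pass  # this track was not associated with a detection at the current frame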

Storing supplementary info of original detection

Supplementary info can be passed into the track from the detection. The Detection class now has an others argument to store this and pass it to the associated track during update. It can be retrieved through the Track.get_det_supplementary method, and passed in through the others argument of DeepSort.update_tracks, which expects a list with the same length as raw_detections. An example of when you will want this is passing in the corresponding instance segmentation masks, to be consumed when iterating through the tracks output; see the sketch below.
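
A minimal sketch of the flow described above, assuming masks is a list of instance segmentation masks aligned one-to-one with the detections:

tracks = tracker.update_tracks(bbs, frame=frame, others=masks)  # len(masks) == len(bbs)
for track in tracks:
    if not track.is_confirmed():
        continue
    mask = track.get_det_supplementary()  # the mask passed in alongside the associated detection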

Polygon support

Other than horizontal bounding boxes, detections can now be given as polygons. We do not track polygon points per se, but merely convert the polygon to its bounding rectangle for tracking. That said, if embedding is enabled, the embedder works on the crop around the bounding rectangle, with area not covered by the polygon masked away.

When instantiating a DeepSort object (as in deepsort_tracker.py), polygon argument should be flagged to True. See DeepSort.update_tracks docstring for details on the polygon format. In polygon mode, the original polygon coordinates are passed to the associated track through the supplementary info.
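
A hedged sketch of polygon mode; consult the DeepSort.update_tracks docstring for the exact format of polygon_detections:

tracker = DeepSort(max_age=5, polygon=True)
tracks = tracker.update_tracks(polygon_detections, frame=frame)  # formatted per the docstring
for track in tracks:
    if not track.is_confirmed():
        continue
    original_polygon = track.get_det_supplementary()  # original polygon coordinates ride along as supplementary info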

Differences from original repo

  • Removed the "academic style" offline processing and reworked it to take in real-time detections and output tracks accordingly.

  • Provides both options: using an in-built appearance feature embedder, or providing your own embeddings during update

  • Added PyTorch MobileNetV2 as the appearance embedder (a TensorFlow embedder is also available now).

  • Added CLIP network from OpenAI as embedder (pytorch).

  • Skips NMS completely in preprocessing detections if nms_max_overlap == 1.0 (which is the default). In the original repo, NMS is still done even if the threshold is set to 1.0 (probably because it was not optimised for speed).

  • Now able to override the Track class with a custom Track class (that inherits from Track class) for custom track logic

  • Takes in today's date, which provides the date for track naming and facilitates a track ID reset every day, preventing overflow and overly large track IDs when the system runs for a long time.

    from datetime import datetime
    today = datetime.now().date()
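    # A sketch: both the DeepSort constructor and update_tracks accept a `today`
    # argument (per the constructor call and update_tracks signature quoted
    # elsewhere on this page), so the date can be supplied like this:
    tracker = DeepSort(max_age=5, today=today)
    tracks = tracker.update_tracks(bbs, frame=frame, today=today)
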
  • Now supports polygon detections. We do not track polygon points per se, but merely convert the polygon to its bounding rectangle for tracking. That said, if embedding is enabled, the embedder works on the crop around the bounding rectangle, with area not covered by the polygon masked away. Read more here.

  • The original Track.to_* methods for retrieving bounding box values return only the Kalman predicted values. In some applications, it is better to return the bounding box values of the original detection the track was associated with in the current round. Added an orig argument which can be flagged True to get that. Read more here.

  • Added a get_det_supplementary method to the Track class, in order to pass detection-related info through the track. Read more here.

  • [As of 2fad967] Supports background masking by giving instance mask to DeepSort.update_tracks. Read more here.

  • Other minor adjustments/optimisation of code.

High-level overview of source files in deep_sort (from original repo)

In package deep_sort is the main tracking code:

  • detection.py: Detection base class.
  • kalman_filter.py: A Kalman filter implementation and concrete parametrization for image space filtering.
  • linear_assignment.py: This module contains code for min cost matching and the matching cascade.
  • iou_matching.py: This module contains the IOU matching metric.
  • nn_matching.py: A module for a nearest neighbor matching metric.
  • track.py: The track class contains single-target track data such as Kalman state, number of hits, misses, hit streak, associated feature vectors, etc.
  • tracker.py: This is the multi-target tracker class.

Test

python3 -m unittest

Appearance Embedding Network

Pytorch Embedder (default)

The default embedder is a PyTorch MobileNetV2 (trained on ImageNet).

For convenience (I know it's not exactly best practice) and since the weights file is quite small, it is included in this GitHub repo and will be installed to your Python environment when you install deep_sort_realtime.

TorchReID

Torchreid is a person re-identification library, supported here; it is especially useful for extracting features of humans. Torchreid will need to be installed (see the dependencies section above). It provides a zoo of models. Select the model type to use, note the model name, and provide it as an argument. Download the corresponding model weights file from the model zoo site and point to the downloaded file. The model 'osnet_ain_x1_0' with domain-generalized training on (MS+D+C) is provided by default, together with the corresponding weights. If embedder='torchreid' when initialising the DeepSort object without specifying embedder_model_name or embedder_wts, it will default to that.

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5, embedder='torchreid')
bbs = object_detector.detect(frame) 
tracks = tracker.update_tracks(bbs, frame=frame) # bbs expected to be a list of detections, each in tuples of ( [left,top,w,h], confidence, detection_class )
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()
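
A sketch of pointing to a different Torchreid model from the model zoo; the model name and weights path below are illustrative placeholders:

tracker = DeepSort(
    max_age=5,
    embedder='torchreid',
    embedder_model_name='osnet_x1_0',  # illustrative: a model name noted from the Torchreid model zoo
    embedder_wts='/path/to/osnet_x1_0_wts.pth',  # illustrative: path to the downloaded weights file
)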

CLIP

CLIP is added as another embedder option due to its proven flexibility and generalisability. Download the CLIP model weights you want with deep_sort_realtime/embedder/weights/download_clip_wts.sh and store the weights in that directory as well, or provide your own CLIP weights through the embedder_wts argument of the DeepSort object.
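
A minimal sketch; the 'clip_ViT-B/16' embedder string appears in an example elsewhere on this page, and the weights path is an illustrative placeholder:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5, embedder='clip_ViT-B/16', embedder_wts='/path/to/clip_wts.pt')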

Tensorflow Embedder

Available now at deep_sort_realtime/embedder/embedder_tf.py, as an alternative to the (default) PyTorch embedder. Tested on Tensorflow 2.3.1. You need to make your own code changes to use it.

The TF MobileNetV2 weights (pretrained on ImageNet) are not available in this GitHub repo (unlike the Torch ones). Download them from this link or run the download script. You may drop the file into deep_sort_realtime/embedder/weights/ before pip installing.

Background Masking

If an instance mask is given during DeepSort.update_tracks and no external appearance embeddings are given, the mask will be used to mask out the background of the corresponding detection crop so that only foreground information goes into the embedder. This reduces background bias.
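
A minimal sketch, assuming masks is a list of instance masks aligned one-to-one with bbs (the instance_masks parameter name is taken from the update_tracks signature quoted in the issues below):

tracks = tracker.update_tracks(bbs, frame=frame, instance_masks=masks)  # background of each crop masked out before embedding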

Example

Example cosine distances between images in ./test/ ("diff": rock vs smallapple, "close": smallapple vs smallapple slightly augmented)

Testing pytorch embedder
close: 0.012196660041809082 vs diff: 0.4409685730934143

Testing Torchreid embedder
Model: osnet_ain_x1_0
- params: 2,193,616
- flops: 978,878,352
Successfully loaded pretrained weights from "/Users/levan/Workspace/deep_sort_realtime/deep_sort_realtime/embedder/weights/osnet_ain_ms_d_c_wtsonly.pth"
close: 0.012312591075897217 vs diff: 0.4590487480163574

deep_sort_realtime's People

Contributors

abewley, ang-zy, dinohubber, jxk20, levan92, nwojke, yhsmiley


deep_sort_realtime's Issues

Update distance metric module (partial fit) only considers latest feature vectors

track.features = [track.features[-1]]

When preparing the list of feature vectors per confirmed track, the track.features list is updated with the latest vector; hence
the features vector history is lost in confirmed tracks. When computing the distance, the new detections are compared to the confirmed tracks with at most: 1. the latest 2 feature vectors (an already confirmed track), or 2. min_hit feature vectors (when the track is confirmed the first time).

Do you think this is intended behavior? The nn_budget parameter is then effectively overridden. Comparing new detections to only the few latest vectors in the confirmed tracks reduces the association performance; however, not doing so will increase the tracking time.

cv2.imshow broken when using tracker

Hello. When I use the realtime deepsort tracker, the program freezes when trying to run cv2.imshow. If I remove the function, the code runs fine. Any help?

Training example

I want to make a custom PyTorch model; is there any example with training code?

Question about embedders

Thanks for sharing your code.
In original DeepSORT implementation, they use Cosine Metric Learning for appearance feature embedding generation.
However, in this repo, you used mobilenet, torchreid and clip as embedder models. I guess these models are pre-trained on a general ImageNet dataset and therefore, provide more "general" embeddings.
Have you compared the embedding (and/or tracking) results between cosine metric learning and the embedder models used here?
Does it make sense to use a CLIP-like embedder for specific objects such as a plant in a greenhouse, or would it be better to train a metric learning model?

Implementation of Deep sort with yolo as an object detection

# %%
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort

# %%
from ultralytics import YOLO

# %%
cap = cv2.VideoCapture('840_iSXIa0hE8Ek_seg.mp4')

# Read the first frame
ret, frame = cap.read()

# %%
model=YOLO('yolov8n.pt')

# %%
results=model(frame)

# %%
results[0].boxes

# %%
import numpy as np

# %%
answers=results[0].to('cpu')

# %%
boxes = answers.boxes
bbs=[]
for box in boxes[2]:
    temp_answer={}
    pred_conf = float(box.conf)
    # print(int(box.cls))
    bbox = np.array(box.xywh[0])
    x1, y1, x2, y2 = float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])
    bbs.append(([x1,y1,x2,y2],pred_conf,int(box.cls)))
    # print(x1,y1,x2,y2)

# %%
bbs

# %%
tracker = DeepSort(max_age=2,n_init=1,embedder="clip_ViT-B/16")

# %%
tracks=tracker.update_tracks(bbs,frame=frame)

# %%
for track in tracks:
    print(track.is_confirmed())
    if not track.is_confirmed():
        print("yo")
        continue
    track_id = track.track_id
    print(track_id)
    ltrb = track.to_ltrb()
    tlwh = track.to_tlwh()
    tlbr = track.to_tlbr()
    xyah = track.to_xyah()
    print(ltrb)

# %%

I was able to get the bounding boxes properly and even put them in the required format, yet I keep getting track.is_confirmed() as False. Is there something I am missing?

Failing to implement deep-sort-realtime with YOLOv8

Hi there,
I'm trying to implement tracking for a specific use case, but first I was just trying to make tracking work with yolov8 on a video. Here's the whole code:

import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

object_tracker = DeepSort(max_age=5,
                          n_init=2,
                          nms_max_overlap=1.0,
                          max_cosine_distance=0.3,
                          nn_budget=None,
                          override_track_class=None,
                          embedder="mobilenet",
                          half=True,
                          bgr=True,
                          embedder_gpu=True,
                          embedder_model_name=None,
                          embedder_wts=None,
                          polygon=False,
                          today=None)

cap = cv2.VideoCapture("Videos/ASMR KITTY CRUNCH Dry Food Feast _ Extreme Cat Eating Sounds _  고양이 + 먹방.mp4")
model = YOLO("Yolo-Weights/yolov8s.pt")
classes = model.names
print(classes)
while True:
    success, image = cap.read()
    if success:

        results = model(image)

        for result in results:
            detections = []
            boxes = result.boxes
            for box in boxes:
                detection = []
                print("Only one Box:", box.boxes.cpu().numpy())
                r = box.boxes.cpu().numpy()[0]
                x1, y1, x2, y2 = r[:4]
                w, h = x2 - x1, y2 - y1
                coordinates = list((int(x1), int(x2), int(w), int(h)))
                conf = r[4]
                clsId = int(r[5])
                cls = classes[clsId]
                detection.extend((coordinates, conf, cls))
                detection = tuple(detection)
                detections.append(detection)
                print("detection: ", detection)

                print("r: ", r)
            print("detections: ", detections)
        tracks = object_tracker.update_tracks(detections, frame=image)

        cv2.imshow("Image", image)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

I'm quite confused, because I'm sure of the input format I give to the tracker, which is like this: detections: [([495, 794, 299, 167], 0.9417956, 'bowl'), ([111, 921, 810, 597], 0.9322504, 'cat'), ([75, 1126, 1050, 92], 0.33967713, 'dining table')]
I think this is the right format of input we should give. I also tried to initialize the tracker with only the max_age hyperparameter, but I still encounter the same error:

[error screenshot]

please any help !!!!

Matrix contains invalid numeric entries

I have been following a tutorial from this website Deepsort I however I am facing the following error

File "D:\Drive D\LearningML\Work\Experience.Digital\PeopleCounter\test_video.py", line 32, in
tracks = object_tracker.update_tracks(detections, frame=image)
File "C:\Users\91876\anaconda3\lib\site-packages\deep_sort_realtime\deepsort_tracker.py", line 228, in update_tracks
self.tracker.update(detections, today=today)
File "C:\Users\91876\anaconda3\lib\site-packages\deep_sort_realtime\deep_sort\tracker.py", line 97, in update
matches, unmatched_tracks, unmatched_detections = self._match(detections)
File "C:\Users\91876\anaconda3\lib\site-packages\deep_sort_realtime\deep_sort\tracker.py", line 151, in _match
) = linear_assignment.matching_cascade(
File "C:\Users\91876\anaconda3\lib\site-packages\deep_sort_realtime\deep_sort\linear_assignment.py", line 147, in matching_cascade
matches_l, _, unmatched_detections = min_cost_matching(
File "C:\Users\91876\anaconda3\lib\site-packages\deep_sort_realtime\deep_sort\linear_assignment.py", line 65, in min_cost_matching
indices = np.vstack(linear_sum_assignment(cost_matrix)).T
ValueError: matrix contains invalid numeric entries

Clarification on how tracking works

The detector I am using is quite slow, so I am wondering if this can fill in the frame gaps between detections.

To clarify, given an initial set of object detections, can I call tracker.update_tracks with the initial stale bboxes but with the new frame to track objects? Or am I supposed to use the Kalman-predicted bboxes?

Thanks for making this library btw.

ValueError: shapes not aligned (when running deep sort with yolov8 object detector)

I tried yolov8 object detection , and deep sort object tracking to track vehicles , using the Nicolai Nielsen tutorials.I got this below error.

Trace of the error:

ValueError Traceback (most recent call last)
in <cell line: 26>()
69 # Update tracker with bounding boxes
70 print(list1)
---> 71 tracks = object_tracker.update_tracks(list1,frame)
72
73 # Process each tracked object

8 frames
/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deepsort_tracker.py in update_tracks(self, raw_detections, embeds, frame, today, others, instance_masks)
226 # Update tracker.
227 self.tracker.predict()
--> 228 self.tracker.update(detections, today=today)
229
230 return self.tracker.tracks

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/tracker.py in update(self, detections, today)
95
96 # Run matching cascade.
---> 97 matches, unmatched_tracks, unmatched_detections = self._match(detections)
98
99 # Update track set.

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/tracker.py in _match(self, detections)
149 unmatched_tracks_a,
150 unmatched_detections,
--> 151 ) = linear_assignment.matching_cascade(
152 gated_metric,
153 self.metric.matching_threshold,

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/linear_assignment.py in matching_cascade(distance_metric, max_distance, cascade_depth, tracks, detections, track_indices, detection_indices)
145 continue
146
--> 147 matches_l, _, unmatched_detections = min_cost_matching(
148 distance_metric,
149 max_distance,

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/linear_assignment.py in min_cost_matching(distance_metric, max_distance, tracks, detections, track_indices, detection_indices)
60 return [], track_indices, detection_indices # Nothing to match.
61
---> 62 cost_matrix = distance_metric(tracks, detections, track_indices, detection_indices)
63 cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
64 # indices = linear_assignment(cost_matrix)

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/tracker.py in gated_metric(tracks, dets, track_indices, detection_indices)
131 features = np.array([dets[i].feature for i in detection_indices])
132 targets = np.array([tracks[i].track_id for i in track_indices])
--> 133 cost_matrix = self.metric.distance(features, targets)
134 cost_matrix = linear_assignment.gate_cost_matrix(
135 self.kf, cost_matrix, tracks, dets, track_indices, detection_indices, only_position=self.gating_only_position

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/nn_matching.py in distance(self, features, targets)
172 cost_matrix = np.zeros((len(targets), len(features)))
173 for i, target in enumerate(targets):
--> 174 cost_matrix[i, :] = self._metric(self.samples[target], features)
175 return cost_matrix

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/nn_matching.py in _nn_cosine_distance(x, y)
93
94 """
---> 95 distances = _cosine_distance(x, y)
96 return distances.min(axis=0)
97

/usr/local/lib/python3.10/dist-packages/deep_sort_realtime/deep_sort/nn_matching.py in _cosine_distance(a, b, data_is_normalized)
52 a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
53 b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
---> 54 return 1.0 - np.dot(a, b.T)
55
56

ValueError: shapes (2,1280,3) and (3,1280,2) not aligned: 3 (dim 2) != 1280 (dim 1)

code snippet:

while True:
    ret, frame = cap.read()
    print("frame:", frame.shape)
    if not ret:
        break
    count += 1
    # Perform object detection on the frame (assuming you have the model defined somewhere)
    results = model.predict(frame, save=True, conf=0.25)
    annotated = results[0].plot()
    cv2_imshow(annotated)
    z = results[0].boxes
    list1 = []

    cords = z.xywhn.cpu().numpy()
    confs = list(z.conf)
    cls_ids = z.cls
    for i in range(len(cords)):
        x1 = cords[i][0]
        y1 = cords[i][1]
        w = cords[i][2]
        h = cords[i][3]
        c = (confs[i].cpu()).item()
        d = str(cls_ids[i].item())
        list1.append(([x1, y1, w, h], c, d))  # ([left,top,w,h], confidence, class_name)
    # Update tracker with bounding boxes
    print(list1)
    # here I am getting the error
    tracks = object_tracker.update_tracks(list1, frame)
    # Process each tracked object
    print(tracks)
    for track in tracks:
        if not track.is_confirmed():
            continue
        tk_id = int(track.track_id)
        ltrb = track.to_ltrb()
        print(ltrb)
        x3, y3, x4, y4 = ltrb
        cx = int(x3 + x4) // 2
        cy = int(y3 + y4) // 2
Deep sort Initialisation:

from deep_sort_realtime.deepsort_tracker import DeepSort

object_tracker = DeepSort(max_age=5,
                          n_init=2,
                          nms_max_overlap=0.9,
                          max_cosine_distance=0.3,
                          nn_budget=None,
                          override_track_class=None,
                          embedder="mobilenet",
                          half=True,
                          bgr=True,
                          embedder_gpu=True,
                          embedder_model_name=None,
                          embedder_wts=None,
                          polygon=False,
                          today=None)

I am new to object tracking; please provide a solution or any article related to this error. Input image size is (720, 1280, 3). Thanks in advance.

Identity switch

Hi, I am using Yolo5 (Person class) with DeepSortRealtime. Tracking is happening, but the identity of person(s) switches even when a really short occlusion occurs (between person-person or person-object). I have checked the result with the CLIP embedder and the max_age parameter, with no luck. Can anyone provide some suggestions on minimizing the identity switches?

A more in-depth understanding of the matching step

First of all, I would like to say thank you for making this implementation of the DeepSORT algorithm.

I'd like to use DeepSORT to track objects for a few frames since our detector is slow. I saw a new commit to the repo which allows calling update_tracks() without detections. My guess is that, in that case, the algorithm predicts the bounding box from previous Kalman filter estimations (by calling predict()) and doesn't run the matching cascade (the update() function doesn't do anything as there are no detections), so each track simply follows the Kalman estimation until a new detection is found. When that happens, the algorithm runs the Kalman filter again and matches tracks with detections. Is that right?

Another issue I had with the original paper is the use of the Mahalanobis distance. From what I understood, the algorithm runs the matching cascade using both the Mahalanobis distance and the deep appearance metric obtained from the pretrained CNN, then calculates IoU for matching the tracks left unmatched by the previous step. Is that right?

Thank you for your help.

tracked object last location

Hi, I'm trying to find the last location of a tracked object on an IP camera, by adding the object ID and the tracked location taken from the to_tlbr() function, then grouping locations by ID and accessing the last location of the object for each ID. Because the object leaves and the last location is always updated, I cannot handle this problem. Could you guide me please, I'm lost :)
Thanks for your time..

yolo detections with your deepsort trackers

I am using YOLO.
I've got the detections:

            detections = darknet.detect_image(network, class_names, darknet_image, thresh=args.thresh)
            for label, confidence, bbox in detections:

                  left, top, right, bottom = bbox2points(bbox)
                  if big_area_bboxs(left, top, right, bottom, 608, 608) == 0:
                      continue
                  confidence_scores.append(confidence)
                  bboxs.append([left, top, bbox[2], bbox[3]])

then work with the tracker:

        trackers = deepsort.update_tracks(bboxs, frame=frame_resized)
        bb_dict_curr_frame = {}
        print("trackers outside the loop: ", trackers)
        for tracker in trackers:
            print("tracker inside the loop: ", tracker)
            id_num = str(tracker.track_id)
            print("ID num: ", id_num)
            track_bb = tracker.to_tlbr()

And I got this error:

Error is:

raw_detections = [d for d in raw_detections if d[0][2] > 0 and d[0][3] > 0]
TypeError: 'int' object is not subscriptable

Is there something I missed?
Or does the tracker need arguments in another shape?

custom max_iou_distance

The max_iou_distance parameter in the Tracker initializer is always 0.7, since the DeepSort instance sets self.tracker without specifying an explicit value for that parameter; hence the default value is used.

Could someone make max_iou_distance customizable through an additional DeepSort init parameter, just like max_age, n_init and so on?

This change would be very useful if deep_sort_realtime is imported after having installed it using setup.py

Thank you very much

np.float deprecated

In the latest version of deep_sort_realtime (1.3.1), within deepsort/detection.py there is a use of np.float, which was deprecated in numpy version 1.20. I saw that the master branch has already addressed this issue. Is there a release planned in the near future?

Thanks!

Typo in example

tracker = DeepSort(max_age=30, nn_budget=70, override_track_class=None)
bbs = object_detector.detect(frame)
tracks = **trackers**.update_tracks(bbs, frame=frame)

should be:

tracker = DeepSort(max_age=30, nn_budget=70, override_track_class=None)
bbs = object_detector.detect(frame)
tracks = **tracker**.update_tracks(bbs, frame=frame)

trackers is not defined.

Order of 'track.to_ltrb()' values

I am using the tracker as follows:


tracker = DeepSort(max_age=5)

while (capture.isOpened()):
    ret, frame = capture.read()
    if not ret:
        break
    else:
        result = model.detect([frame], verbose=0)[0] # MRCNN detector
        number_of_detections = len(result['rois'])
        
        # making bbs structure as required by the tracker
        bbs = []
        for i in range(number_of_detections):
            y1, x1, y2, x2 = result['rois'][i]
            scr = result['scores'][i]
            cls = result['class_ids'][i]
            a_tuple = ([x1, y1, x2, y2], scr, cls)
            bbs.append(a_tuple)
        


        if len(bbs) > 0:
            tracks = tracker.update_tracks(bbs, frame=frame)
            for track in tracks:
                if not track.is_confirmed():
                    continue
                track_id = track.track_id
                ltrb = track.to_ltrb(orig=True)
                x_1, y_1, x_2, y_2 = ltrb
                frame = cv2.rectangle(frame, (int(x_1), int(y_1)), (int(x_2), int(y_2)), (0,0,255), 2)
                frame = cv2.putText(frame, track_id, (int(x_1), int(y_1)), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0,0,255), thickness)
        resized = cv2.resize(frame, (width - 10, height - 10))
        output.write(resized) # writing video frames to disk
capture.release()
output.release()

By manually debugging ltrb = track.to_ltrb() I can see that it is returning bbox coordinates. Can you please tell me the order of the bbox coordinates (i.e., is each of them in x1, y1, x2, y2 order)? And why are the tracked bboxes larger than the original ones?

Is any training required for my dataset? Because I have car, truck, bike.

Hi, Firstly, thanks for this wonderful package. I have multiple questions.

  1. Do I need to train the deep sort model for my dataset, or can I directly use your package?

  2. I have a question, in your example, you have mentioned only the bbox

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=30, nn_budget=70, override_track_class=None)
bbs = object_detector.detect(frame)
tracks = tracker.update_tracks(bbs, frame=frame) # here you have mentioned only the bbox
for track in tracks:
   track_id = track.track_id
   ltrb = track.to_ltrb()

but in the issue, you asked the questioner to add all the arguments like bbox, conf score, label; could you please explain it to us a bit more clearly?
#11 (comment)

  3. Another question: is it mandatory to pass the bbox in the [left,top,w,h] format? Because my output from the model is [xmin, ymin, xmax, ymax]. Do you want me to convert the coordinates?
    ( [left,top,w,h] , confidence, detection_class)

Is it possible for you to provide a practical demo?

Multiple Images or Cameras

Hello, I have a question: how do I implement deep sort with multiple images or cameras with different views? The readme doesn't have a configuration for multiple images or cameras with different views, only single images or cameras. Thanks

Deep sort remembering bad state

I have a flask API that takes a base64 image from the client, converts it to JPEG and does yolov5 recognition on it. Its purpose is to detect humans/bottles. I was testing on a bottle first to see how it would behave.

saveImage = ''

data = request.get_json()
base64_str = data.get('image', '')
if not base64_str:
    return jsonify({'error': 'Invalid or missing image data.'}), 400

try:
    image_data = base64.b64decode(base64_str)
    image = Image.open(BytesIO(image_data))
 
    results = model(image)

    results.ims 
    rezultati = results.render()

    frame1 = np.ones((1080, 1088, 3), dtype=np.uint8) * 255

    bounding_boxes = results.xyxy[0].cpu().numpy()
    converted_detections2 = []

    bounding_box_deepsort = []
    
    for detection in results.xyxy[0].cpu().numpy():  # Iterate through all detections in the first image
        # Each detection contains [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, confidence, class_id = detection
        #print(f"Bounding box coordinates: x1={x1}, y1={y1}, x2={x2}, y2={y2}")
        arrayCords = [x1, y1, x2-x1, y2-y1]
        print(detection)
        bounding_box_deepsort.append((arrayCords, confidence, class_id))
    
    print(bounding_box_deepsort)
    #print(bounding_boxes)
    
    img_base64 = ''
    newImage = ''
    for img in results.ims:
        buffered = BytesIO()
        img_base64 = Image.fromarray(img)
        img_base64.save(buffered, format="JPEG")
        saveImage = base64.b64encode(buffered.getvalue()).decode('utf-8')
        buffered.seek(0)
        newImage = Image.open(buffered)
            
    tracks = tracker.update_tracks(
        bounding_box_deepsort, frame=frame1
    )
    draw = ImageDraw.Draw(newImage)
            
    for track in tracks:
        tlwh = track.to_tlwh()

        #print("id: ", track.track_id)
        #print("tracks: ", tlwh)
        x1, y1, w, h = tlwh
        x2, y2 = x1 + w, y1 + h
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)

    buffered = BytesIO()
    newImage.save(buffered, format="JPEG")
    saveImage = base64.b64encode(buffered.getvalue()).decode('utf-8')
    
    print("---------------------------------------")
except Exception as e:
    print(str(e))
    print(jsonify({'error': f'An error occurred while processing the image: {str(e)}'}))
    return jsonify({'error': f'An error occurred while processing the image: {str(e)}'}), 500

return jsonify({'image': saveImage}), 200

The way I utilize the tracker is as follows:

GPU = torch.cuda.is_available() and not os.environ.get("USE_CPU")
embedder = "mobilenet"
embeds = None
today = datetime.now().date()
tracker = DeepSort(max_age=30)

The issue is that I place a bottle in front of me and the bounding boxes I get are as follows:

[screenshot: bottle detected with bounding box]

After I move my camera away from the bottle, it still shows the bounding box, like the following:

[screenshot: bounding box persists with the bottle out of view]

This also happens when the camera looks at our street: at one point, humans become like ghosts, and the DeepSort tracker shows a bounding box despite them not being there. For example, when a truck passes them and it loses focus for a second, DeepSort shows 2 objects (1 real human and 1 "ghost").
I don't understand why this is happening. Is this something settings-wise, and how can I fix it?

I want to say that my camera is running at 2-3 FPS, if this matters at all.

Best regards

Tracking with one-time object detection (One-Shot object detection and tracking)

How to detect an object only once and track it in other frames?

while video1.isOpened():
    ret1, frame1 = video1.read()
    if not ret1 or cv2.waitKey(60)==27:
        break

    if FlagDetect:
        boxes= PersonDetection(frame1)
        FlagDetect=False
        tracks = tracker.update_tracks(boxes, frame=frame1)

    for track in tracks:
        ltrb= track.to_ltrb()
        frame1 = cv2.rectangle(frame1, (int(ltrb[0]), int(ltrb[1])), (int(ltrb[2]), int(ltrb[3])), (255,0,0), 5)

    cv2.imshow('video1', frame1)

My code is not working properly!

Make the cuda optional please

It would be nice to have an option to disable cuda for local gpu-free tests.
Thank you.

P.S. Half precision is not supported in the CPU version; that must be set to False in the CPU version.

custom embedder max-batch-size

The max_batch_size parameter in the Embedder initializer is always 16, as set by DeepSort.__init__ when Mobilenet or CLIP* is selected as embedder.

Could someone make max_batch_size customizable through an additional DeepSort init parameter, just like max_age, n_init and so on?

Thank you very much

index out of range

update_tracks
assert len(raw_detections[0][0])==4
IndexError: list index out of range

Track.to_ltwh() method doesn't seem to provide the result I would expect

I am currently using this package to implement DeepSORT tracking over YOLOv5 inferred bounding boxes, and I've found myself struggling a little bit with the results of the Track.to_* methods. I can understand, from the comments ("returns LIES", lol), that the Track.to_ltbr() method shouldn't be used, as the result is not what one would expect from its name. However, I also have some problems with Track.to_tlwh(): the first two values returned don't seem to be the top-left x and y coordinates, but rather the centre of the box. In fact, visualizing the results obtained, the tracking actually matches the object in the image when the top-left corner of the cv rectangle method is set according to my assumptions, while using the values returned by Track.to_tlwh() without any further computation does not work properly. Is it possible that the method is not actually returning what one would expect? Or am I misunderstanding something?

Thanks for the work you're doing providing us such a useful tool :)

Bug with the detection file

Hello,
I have an error when calling update_tracks
Here is the error returned:
AttributeError: module 'numpy' has no attribute 'float'

It is located in the detection.py file at line 35.

I use numpy version 1.24.1 and python version 3.9

Thank you in advance

Cuda version

I have a pre-built version of opencv-contrib that runs on the GPU and it's already working. However, I preferred using deep_sort_realtime instead of Deepsort itself.

But I noticed that deep_sort_realtime uses opencv-python instead of opencv, which makes my script run on my CPU instead of my GPU.

Below follows a simple script that is already working, however only on my CPU.

Is there a way to build opencv-python for GPU like the official package?

import cv2
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

# Load YOLO
net = cv2.dnn.readNet("yolov4.weights", "yolov4.cfg")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
classes = []
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Initialize Deep SORT
deepsort = DeepSort(max_age=30)

cap = cv2.VideoCapture('./p3.mp4')  # Use 0 for webcam or replace with video file path

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape

    # Detecting objects with YOLO
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    class_ids = []
    confidences = []
    boxes = []

    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5 and class_id == 0:  # Check if the detected class is 'person'
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    if len(indexes) > 0 and isinstance(indexes, np.ndarray):
        detections = []
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            conf = confidences[i]
            detection_class = class_ids[i]
            detections.append(([x, y, w, h], conf, detection_class))

        # Update tracker with filtered boxes
        tracks = deepsort.update_tracks(detections, frame=frame)

        for track in tracks:
            if not track.is_confirmed():
                continue
            bbox = track.to_tlbr()
            track_id = track.track_id
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (0, 255, 0), 2)
            cv2.putText(frame, f'ID: {track_id}', (int(bbox[0]), int(bbox[1]) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("Camera", frame)

    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Sample Code

Thank you for providing this library.
Please provide some sample code for using this library for object tracking on video.
I am still working hard with this library.

It seems more like a question than a problem. Real-time processing is not achieved.

Hi,
First of all, thank you for sharing the code.
I am trying to use this library in ROS.
Below is my code, but real-time processing is not achieved.

import os

import cv2
import rospy
import numpy as np
from cv_bridge import CvBridge, CvBridgeError
from sensor_msgs.msg import Image
from darknet_ros_msgs.msg import BoundingBoxes
from deep_sort_realtime.deepsort_tracker import DeepSort
from datetime import datetime

today = datetime.now().date()
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

class ImageProcessor:
    def __init__(self, max_age, embedder):
        self.bridge = CvBridge()
        self.tracker = DeepSort(max_age=max_age, embedder=embedder)
        self.current_frame = None  # Variable to store the most recent frame

        # Initialize publishers
        #self.pub = rospy.Publisher('online_targets', String, queue_size=10)
        self.pub_image = rospy.Publisher('processed_image', Image, queue_size=1)

        # Initialize subscribers
        rospy.Subscriber("/darknet_ros/bounding_boxes", BoundingBoxes, self.bounding_box_callback)
        rospy.Subscriber("/normalized_flipped_image", Image, self.image_callback)

    def image_callback(self, data):
        try:
            cv_image = self.bridge.imgmsg_to_cv2(data, "bgr8")
        except CvBridgeError as e:
            print(e)
        self.current_frame = cv_image  # Update the current frame

    def bounding_box_callback(self, data):
        # Transform bounding box data to the required format
        dets = []

        for i, box in enumerate(data.bounding_boxes):
            width = box.xmax - box.xmin
            height = box.ymax - box.ymin
            det = [(box.xmin, box.ymin, width, height), box.probability, box.id]
            dets.append(det)

        # Provide the current frame to the update_tracks method
        online_targets = self.tracker.update_tracks(dets, frame=self.current_frame)

        for track in online_targets:
            if not track.is_confirmed():
                continue
            track_id = track.track_id

            ltrb = track.to_ltrb().astype(int)  # Convert to integer
            cv2.rectangle(self.current_frame, (ltrb[0], ltrb[1]), (ltrb[2], ltrb[3]), (0, 255, 0), 2)  # Draw bounding box
            cv2.putText(self.current_frame, str(track_id), (ltrb[0], ltrb[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)  # Put track_id on top of bounding box

        # Convert the image with bounding boxes to a ROS image message and publish
        ros_image = self.bridge.cv2_to_imgmsg(self.current_frame, "bgr8")
        self.pub_image.publish(ros_image)

        # Here you might need to convert online_targets to a suitable type for publishing
        #self.pub.publish(online_targets)

def main(max_age, embedder):
    rospy.init_node('image_processor', anonymous=True)
    ip = ImageProcessor(max_age, embedder)
    try:
        rospy.spin()
    except KeyboardInterrupt:
        print("Shutting down")

if __name__ == '__main__':
    # Initialize your max_age here
    max_age = 30
    embedder = "torchreid"  # clip_ViT-B/16, torchreid
    main(max_age, embedder)

Add versions to the dependencies, please

Having specific versions for each requirement would be nice. It would be great to have a requirements.txt on the repo.

Currently I'm having problems finding a suitable version of Pytorch for the repo.

Question about embeds

Do all forms of tracking require embeds?

I am asking because I am looking at your example below:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5)
bbs = object_detector.detect(frame) 
tracks = tracker.update_tracks(bbs, frame=frame) # bbs expected to be a list of detections, each in tuples of ( [left,top,w,h], confidence, detection_class )
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()

Looking at your code, I see all frames being funneled to generate_embeds(); however, I don't see where the frame is given to the tracker, similar to the above example.
