Add node for object detection using YOLO OBB inference #45
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@ Coverage Diff @@
##            main     #45   +/-   ##
=====================================
  Coverage   0.00%   0.00%
=====================================
  Files          3       3
  Lines        144     144
  Branches      14      14
=====================================
  Misses       144     144
```

Flags with carried forward coverage won't be shown. Click here to find out more.
Pull request overview
This pull request introduces a new ROS 2 package for object detection using YOLO Oriented Bounding Box (OBB) models. The package subscribes to image topics, runs OBB inference, and publishes both structured detections with rotation information and annotated images.
Changes:
- Added `yolo_obb_object_detection` package with node implementation for OBB-based object detection
- Implemented utility functions for OBB detection and visualization with fallback to axis-aligned boxes
- Configured launch files, parameters, and build system for ROS 2 integration
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 11 comments.
Summary per file:
| File | Description |
|---|---|
| yolo_obb_object_detection/yolo_obb_object_detection/yolo_utils.py | Utility functions for loading YOLO models and processing frames with OBB detection |
| yolo_obb_object_detection/yolo_obb_object_detection/yolo_obb_object_detection_node.py | Main ROS 2 node that subscribes to images and publishes OBB detections and annotated outputs |
| yolo_obb_object_detection/requirements.txt | Python package dependencies (numpy and ultralytics) |
| yolo_obb_object_detection/package.xml | ROS 2 package metadata and dependencies |
| yolo_obb_object_detection/launch/yolo_obb_object_detection.launch.py | Launch file with device validation and parameter loading |
| yolo_obb_object_detection/config/yolo_obb_object_detection_params.yaml | Default parameters for model path, confidence threshold, and topics |
| yolo_obb_object_detection/README.md | Package documentation with usage instructions |
| yolo_obb_object_detection/CMakeLists.txt | CMake build configuration for the ROS 2 package |
| yolo_obb_object_detection/model/.gitkeep | Placeholder for model directory |
| yolo_obb_object_detection/yolo_obb_object_detection/__init__.py | Python package initialization |
```python
        )

        r = results[0]
        annot = r.plot()
```
The node uses YOLO's built-in r.plot() method to generate the annotated image (line 83), ignoring the custom annotation logic implemented in yolo_utils._draw_obb() and yolo_utils.process_frame(). This makes the custom drawing functions in yolo_utils.py unused and redundant. Either use the custom drawing logic for consistency with what might be future customization needs, or remove the unused drawing code from yolo_utils.py.
Suggested change:

```diff
-        annot = r.plot()
+        annot = yolo_utils.process_frame(frame, r)
```
```python
ALLOWED_DEVICES = ['cpu', '0', 'cuda:0']


def validate_device(device: str):
    if device not in ALLOWED_DEVICES:
        raise RuntimeError(
            f"Invalid device '{device}'. Choose one of: {', '.join(ALLOWED_DEVICES)}"
        )
```
The device validation in ALLOWED_DEVICES is restrictive and may not cover all valid PyTorch/YOLO device specifications. Valid devices can include 'cuda:1', 'cuda:2', etc. for multi-GPU systems, 'mps' for Apple Silicon, and other formats. Consider either removing this validation to allow any device string, or expanding it to support the full range of valid device specifications (e.g., using a pattern like 'cuda:\d+').
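A more permissive validator along the lines suggested above could use a regular expression. This is a minimal sketch, not the package's actual code; the accepted set (`cpu`, `mps`, bare GPU indices, and `cuda:N`) is an assumption about which device strings the deployment should allow.

```python
import re

# Hypothetical permissive validator: accepts 'cpu', 'mps', a bare GPU
# index like '0', or 'cuda:N' for multi-GPU systems.
_DEVICE_PATTERN = re.compile(r"cpu|mps|\d+|cuda(:\d+)?")


def validate_device(device: str) -> None:
    # fullmatch ensures the whole string matches, not just a prefix
    if not _DEVICE_PATTERN.fullmatch(device):
        raise RuntimeError(
            f"Invalid device '{device}'. Expected 'cpu', 'mps', "
            "a GPU index like '0', or 'cuda:N'."
        )
```

Dropping the validation entirely and letting PyTorch reject unknown devices is also defensible, since the set of valid device strings changes across backends.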
```xml
  <exec_depend>python3-opencv</exec_depend>
  <exec_depend>python3-numpy</exec_depend>
```
The package declares both system package dependencies (python3-opencv, python3-numpy via exec_depend) and specific pip dependencies (numpy==1.24.4 in requirements.txt). This dual dependency declaration could lead to version conflicts or confusion about which versions are actually used. Consider either using only pip dependencies (removing lines 17-18) or only system packages (removing requirements.txt), depending on your deployment strategy. For reference, the similar package 'yolo_object_detection' does not include these system dependencies in package.xml.
Suggested change:

```diff
-  <exec_depend>python3-opencv</exec_depend>
-  <exec_depend>python3-numpy</exec_depend>
```
```diff
@@ -0,0 +1,23 @@
+# yolo_obb_object_detection
+Real-time object detection using YOLOv26. This node subscribes to an image topic, runs inference, and publishes both structured detections and an annotated image with oriented bounding boxes (OBB).
```
The README mentions "YOLOv26" which does not exist. YOLO versions include YOLOv3, YOLOv4, YOLOv5, YOLOv8, YOLOv10, and YOLOv11, but there is no YOLOv26. This appears to be a typo and should likely be "YOLOv8" or the appropriate version being used.
Suggested change:

```diff
-Real-time object detection using YOLOv26. This node subscribes to an image topic, runs inference, and publishes both structured detections and an annotated image with oriented bounding boxes (OBB).
+Real-time object detection using YOLO-based models. This node subscribes to an image topic, runs inference, and publishes both structured detections and an annotated image with oriented bounding boxes (OBB).
```
```python
def process_frame(frame, model, conf, device):
    results = model.predict(frame, conf=conf, verbose=False, device=device)
    detections = []
    annotated = frame.copy()

    for r in results:
        # ---- Prefer OBB outputs if present ----
        if hasattr(r, "obb") and r.obb is not None:
            obb = r.obb
            xywhr = (
                obb.xywhr.cpu().numpy()
                if hasattr(obb.xywhr, "cpu")
                else np.asarray(obb.xywhr)
            )
            confs = (
                obb.conf.cpu().numpy()
                if hasattr(obb.conf, "cpu")
                else np.asarray(obb.conf)
            )
            clss = (
                obb.cls.cpu().numpy()
                if hasattr(obb.cls, "cpu")
                else np.asarray(obb.cls)
            )

            for (cx, cy, w, h, theta), sc, cid in zip(xywhr, confs, clss):
                cid_i = int(cid)
                sc_f = float(sc)
                theta_f = float(theta)

                detections.append(
                    (float(cx), float(cy), float(w), float(h), theta_f, sc_f, cid_i)
                )

                # Draw rotated box + label
                box = _draw_obb(annotated, cx, cy, w, h, theta_f)
                label = f"{model.names.get(cid_i, str(cid_i))} {sc_f:.2f}"

                # Put label near first corner
                x0, y0 = int(box[0][0]), int(box[0][1])
                cv2.putText(
                    annotated,
                    label,
                    (x0, max(y0 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.5,
                    (0, 255, 0),
                    1,
                    cv2.LINE_AA,
                )
            continue

        # ---- Fallback to axis-aligned boxes ----
        if not hasattr(r, "boxes") or r.boxes is None:
            continue

        b = r.boxes
        xyxy = b.xyxy.cpu().numpy() if hasattr(b.xyxy, "cpu") else np.asarray(b.xyxy)
        confs = b.conf.cpu().numpy() if hasattr(b.conf, "cpu") else np.asarray(b.conf)
        clss = b.cls.cpu().numpy() if hasattr(b.cls, "cpu") else np.asarray(b.cls)

        for (x1, y1, x2, y2), sc, cid in zip(xyxy, confs, clss):
            cid_i = int(cid)
            sc_f = float(sc)

            detections.append((float(x1), float(y1), float(x2), float(y2), sc_f, cid_i))

            p1, p2 = (int(x1), int(y1)), (int(x2), int(y2))
            cv2.rectangle(annotated, p1, p2, (0, 255, 0), 2)
            label = f"{model.names.get(cid_i, str(cid_i))} {sc_f:.2f}"
            cv2.putText(
                annotated,
                label,
                (p1[0], max(p1[1] - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.5,
                (0, 255, 0),
                1,
                cv2.LINE_AA,
            )

    return detections, annotated
```
The function process_frame in yolo_utils.py is never used in the main node implementation. The node reimplements the detection logic directly in the on_image method. Either this function should be used to avoid code duplication, or it should be removed if it's not needed. Additionally, the process_frame function returns detections in a different format than what the node needs (tuples vs Detection2D messages), which may explain why it's not being used.
Suggested change:

```diff
-def process_frame(frame, model, conf, device):
-    ...  # entire function body as quoted above
+# Note: previously there was a `process_frame` helper here that duplicated
+# detection logic used directly in the main node. It has been removed to
+# avoid unused, duplicate code and to keep this utility module focused on
+# shared, actually-used helpers such as `_draw_obb`.
```
```python
        det_array.detections.append(det)
else:
    self.get_logger().warn("No OBB output detected — is this an OBB model?")
```
Use get_logger().warning() instead of get_logger().warn(). The warn method is deprecated in Python's logging module and ROS 2's logger interface. The standard method is warning().
Suggested change:

```diff
-    self.get_logger().warn("No OBB output detected — is this an OBB model?")
+    self.get_logger().warning("No OBB output detected — is this an OBB model?")
```
(The same `process_frame` function is quoted again for this comment; the relevant lines are the two `detections.append(...)` calls.)

```python
# OBB branch appends 7-tuples:
detections.append(
    (float(cx), float(cy), float(w), float(h), theta_f, sc_f, cid_i)
)

# Axis-aligned fallback appends 6-tuples:
detections.append((float(x1), float(y1), float(x2), float(y2), sc_f, cid_i))
```
The detections returned by process_frame for OBB (oriented bounding boxes) have a different structure (7 elements: cx, cy, w, h, theta, score, class_id) compared to the fallback axis-aligned boxes (6 elements: x1, y1, x2, y2, score, class_id). This inconsistency will cause issues for any consumer of this function trying to process the returned detections, as they won't know which format to expect. Consider returning a consistent structure or clearly documenting this behavior and providing separate functions for OBB and regular detection.
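One way to return a consistent structure, sketched below under the assumption that a single layout is acceptable: always emit the OBB layout `(cx, cy, w, h, theta, score, class_id)` and convert axis-aligned boxes into it with `theta = 0.0`. The `Detection` dataclass and `from_xyxy` helper are illustrative names, not part of the package.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detection in a single, consistent layout."""
    cx: float
    cy: float
    w: float
    h: float
    theta: float  # rotation in radians; 0.0 for axis-aligned boxes
    score: float
    class_id: int


def from_xyxy(x1, y1, x2, y2, score, class_id):
    """Convert an axis-aligned (x1, y1, x2, y2) box to the common OBB layout."""
    return Detection(
        cx=(x1 + x2) / 2.0,
        cy=(y1 + y2) / 2.0,
        w=x2 - x1,
        h=y2 - y1,
        theta=0.0,
        score=score,
        class_id=class_id,
    )
```

Consumers can then iterate one record type regardless of which branch produced the detection.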
```python
    return YOLO(model_path)
```
The conf parameter passed to load_model is not used within the function. The function only returns YOLO(model_path) without utilizing the confidence threshold. Either remove the unused parameter or apply it to the model configuration if needed.
Suggested change:

```diff
-    return YOLO(model_path)
+    """
+    Load a YOLO model from the given path and optionally set a default
+    confidence threshold.
+
+    Parameters
+    ----------
+    model_path : str
+        Path to the YOLO model weights.
+    conf : float or None
+        Default confidence threshold to apply to the model configuration.
+        If None, the model's internal default is used.
+    """
+    model = YOLO(model_path)
+    # If a confidence threshold is provided, apply it to the model's config
+    if conf is not None:
+        try:
+            # Many Ultralytics YOLO versions support configuring defaults via `overrides`
+            model.overrides["conf"] = conf
+        except Exception:
+            # If for some reason `overrides` is not available or not writable,
+            # silently ignore to avoid breaking existing behavior.
+            pass
+    return model
```
```python
        results = self.model.predict(
            source=frame,
            conf=self.conf,
            device=self.device,
            verbose=False,
        )

        r = results[0]
        annot = r.plot()
```
The yolo_utils module is imported and its load_model function is called, but then the model.predict is called directly in the on_image method rather than using yolo_utils.process_frame. This creates code duplication - the frame processing logic exists in yolo_utils.py but is reimplemented in the node. Consider using the process_frame function from yolo_utils to avoid duplication and maintain consistency.
```python
if r.obb is not None:
    xywhr = r.obb.xywhr.cpu().numpy()
    confs = r.obb.conf.cpu().numpy()
    clss = r.obb.cls.cpu().numpy()

    for (cx, cy, w, h, theta), sc, cid in zip(xywhr, confs, clss):
        det = Detection2D()
        det.header = msg.header

        hyp = ObjectHypothesisWithPose()
        hyp.hypothesis.class_id = str(int(cid))
        hyp.hypothesis.score = float(sc)
        det.results.append(hyp)

        det.bbox = BoundingBox2D()
        det.bbox.center.position.x = float(cx)
        det.bbox.center.position.y = float(cy)
        det.bbox.center.theta = float(theta)  # radians
        det.bbox.size_x = float(w)
        det.bbox.size_y = float(h)

        det_array.detections.append(det)
else:
    self.get_logger().warn("No OBB output detected — is this an OBB model?")
```
Missing error handling for the case when r.obb is None. While line 111 logs a warning when no OBB output is detected, this happens after attempting to publish an empty detection array. If the model is not an OBB model, this will result in publishing empty detections on every frame, which could be misleading to downstream consumers. Consider either raising an error during initialization to fail fast, or handling this case more gracefully (e.g., only warn once, or fall back to regular bounding boxes).
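A warn-once guard is one lightweight way to avoid logging on every frame while still surfacing the problem. This is a minimal sketch with illustrative names (`ObbWarnOnce`, `warn_no_obb`), not the node's actual code; it assumes the ROS 2 logger's `warning()` method.

```python
class ObbWarnOnce:
    """Emit the 'no OBB output' warning only on the first occurrence."""

    def __init__(self, logger):
        self._logger = logger
        self._warned = False

    def warn_no_obb(self):
        # Subsequent frames without OBB output are silently skipped
        if not self._warned:
            self._logger.warning(
                "No OBB output detected — is this an OBB model?"
            )
            self._warned = True
```

The callback would hold one instance and call `warn_no_obb()` in the `else` branch instead of logging directly.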
Removed commented-out color_image_sub_topic line.
Pull request overview
Copilot reviewed 8 out of 10 changed files in this pull request and generated 10 comments.
```python
if r.obb is None:
    raise RuntimeError(
        "Loaded model does not output OBB predictions. "
        "Make sure you are using a YOLO26-OBB model."
    )
```
Model validation occurs during image processing rather than at initialization. If a non-OBB YOLO model is loaded, the node will crash on the first image callback with a RuntimeError. Consider adding validation during model loading (in _load_model method) by running a test inference with a dummy image to fail fast at startup rather than during operation.
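A fail-fast check along these lines could run one dummy inference at startup and verify the result exposes OBB outputs. This is a sketch, not the node's code: `assert_obb_model` is a hypothetical helper name, and it assumes the Ultralytics-style `model.predict(...)` call returning results with an `obb` attribute.

```python
import numpy as np


def assert_obb_model(model, device="cpu"):
    """Run a dummy inference and raise at startup if the model has no OBB head."""
    # Tiny black image; content doesn't matter, only the output structure does
    dummy = np.zeros((64, 64, 3), dtype=np.uint8)
    results = model.predict(dummy, verbose=False, device=device)
    r = results[0]
    if getattr(r, "obb", None) is None:
        raise RuntimeError(
            "Loaded model does not output OBB predictions; "
            "use an OBB-trained YOLO checkpoint."
        )
```

Calling this from the node's model-loading path moves the failure from the first image callback to initialization, where it is easier to diagnose.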
yolo_obb_object_detection/yolo_obb_object_detection/yolo_obb_object_detection_node.py — outdated; comment resolved.
```python
    return box


def process_frame(frame, model, conf, device):
```
The function process_frame lacks a docstring. The existing yolo_object_detection package includes docstrings for this function (see yolo_object_detection/yolo_utils.py:12-18), which explains the function's purpose, parameters, and return values. For consistency with the codebase, add a similar docstring here.
Suggested change:

```diff
 def process_frame(frame, model, conf, device):
+    """
+    Run YOLO-OBB inference on a single frame and return detections and an annotated frame.
+
+    Parameters
+    ----------
+    frame : np.ndarray
+        Input image in BGR format (as read by OpenCV).
+    model : ultralytics.YOLO
+        Loaded YOLO-OBB model used to perform inference.
+    conf : float
+        Confidence threshold for filtering detections.
+    device : str or int
+        Device identifier passed to the model (e.g. "cpu", "cuda", or GPU index).
+
+    Returns
+    -------
+    detections : list[tuple[float, float, float, float, float, float, int]]
+        List of detections, each as (cx, cy, w, h, theta, score, class_id), where
+        (cx, cy) is the box center, (w, h) are width and height, theta is the rotation
+        angle in radians, score is the confidence, and class_id is the class index.
+    annotated : np.ndarray
+        Copy of the input frame with oriented bounding boxes and labels drawn.
+    """
```
Three further comments on yolo_obb_object_detection/yolo_obb_object_detection/yolo_obb_object_detection_node.py — outdated; resolved.
…bject_detection_node.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…bject_detection_node.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduces a ROS 2 node for YOLO OBB-based object detection, publishing rotated bounding boxes and annotated image outputs.