Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions rclcpp/include/rclcpp/experimental/intra_process_manager.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -386,6 +386,39 @@ class IntraProcessManager
std::vector<uint64_t> take_ownership_subscriptions;
};

/// Hash function for rmw_gid_t to enable use in unordered_map
struct rmw_gid_hash
{
std::size_t operator()(const rmw_gid_t & gid) const noexcept
{
// Using the FNV-1a hash algorithm on the gid data
constexpr std::size_t FNV_prime = 1099511628211u;
std::size_t result = 14695981039346656037u;

for (std::size_t i = 0; i < RMW_GID_STORAGE_SIZE; ++i) {
result ^= gid.data[i];
result *= FNV_prime;
}
return result;
}
};

/// Equality comparison for rmw_gid_t to enable use in unordered_map
struct rmw_gid_equal
{
bool operator()(const rmw_gid_t & lhs, const rmw_gid_t & rhs) const noexcept
{
// Compare the data bytes only.
// implementation_identifier pointer comparison is not used here because
// intra-process communication is always within the same process and RMW,
// and pointer comparison is fragile across dynamically loaded components.
Comment on lines +412 to +414
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is new compared to #3068. i think we do not need to do the comparison against RMW identifiers.

return std::equal(
std::begin(lhs.data),
std::end(lhs.data),
std::begin(rhs.data));
}
};
Comment on lines +406 to +420
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rmw_gid_equal uses std::equal, but this header does not include <algorithm>. Relying on transitive includes can break builds depending on include order; add #include <algorithm> in this header to ensure std::equal is declared.

Copilot uses AI. Check for mistakes.

using SubscriptionMap =
std::unordered_map<uint64_t, rclcpp::experimental::SubscriptionIntraProcessBase::WeakPtr>;

Expand All @@ -398,6 +431,16 @@ class IntraProcessManager
using PublisherToSubscriptionIdsMap =
std::unordered_map<uint64_t, SplittedSubscriptions>;

/// Structure to store publisher information in GID lookup map
struct PublisherInfo
{
uint64_t pub_id;
rclcpp::PublisherBase::WeakPtr publisher;
};

using GidToPublisherInfoMap =
std::unordered_map<rmw_gid_t, PublisherInfo, rmw_gid_hash, rmw_gid_equal>;

RCLCPP_PUBLIC
static
uint64_t
Expand Down Expand Up @@ -642,6 +685,8 @@ class IntraProcessManager
PublisherBufferMap publisher_buffers_;

mutable std::shared_timed_mutex mutex_;

GidToPublisherInfoMap gid_to_publisher_info_;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the critical ABI fix.

The original placement shifted mutex_'s offset within the class, causing template methods (inlined in downstream packages like Nav2) to access the wrong memory address for mutex_. The corrupted mutex writes overwrote the hashmap's internal state, producing the SIGSEGV in _M_find_before_node.

};

} // namespace experimental
Expand Down
38 changes: 29 additions & 9 deletions rclcpp/src/rclcpp/intra_process_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,9 @@ IntraProcessManager::add_publisher(
}
}

// Add GID to publisher info mapping for fast lookups (stores both ID and weak_ptr)
gid_to_publisher_info_[publisher->get_gid()] = {pub_id, publisher};

// Initialize the subscriptions storage for this publisher.
pub_to_subs_[pub_id] = SplittedSubscriptions();

Expand Down Expand Up @@ -98,6 +101,24 @@ IntraProcessManager::remove_publisher(uint64_t intra_process_publisher_id)
{
std::unique_lock<std::shared_timed_mutex> lock(mutex_);

// Remove GID to publisher info mapping.
// First try via the publisher's own GID (fast path).
auto pub_it = publishers_.find(intra_process_publisher_id);
if (pub_it != publishers_.end()) {
auto publisher = pub_it->second.lock();
if (publisher) {
gid_to_publisher_info_.erase(publisher->get_gid());
} else {
// Publisher weak_ptr already expired, fall back to linear scan by pub_id.
for (auto git = gid_to_publisher_info_.begin(); git != gid_to_publisher_info_.end(); ++git) {
if (git->second.pub_id == intra_process_publisher_id) {
gid_to_publisher_info_.erase(git);
break;
Comment on lines +104 to +116
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “fast path” in remove_publisher() (pub_it->second.lock() then publisher->get_gid()) is unlikely to succeed when called from PublisherBase::~PublisherBase() because the weak_ptr lock typically fails once the last shared_ptr is gone. That means removals will usually take the O(N) scan anyway. To keep removals O(1) and simplify control flow, consider storing a pub_id→gid (or pub_id→iterator) mapping at insertion time and erasing by pub_id directly.

Copilot uses AI. Check for mistakes.
}
}
Comment on lines +112 to +118
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added fall back clean-up process just in case.

}
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_publisher() only erases gid_to_publisher_info_ when the publisher id exists in publishers_. If remove_publisher() is called with an id that is missing/was already erased, the GID map entry will never be removed, leading to stale entries and unbounded growth. Consider always scanning/erasing by pub_id as a fallback even when pub_it == publishers_.end().

Suggested change
}
}
} else {
// Publisher id not found in publishers_, but there might still be a stale
// entry in gid_to_publisher_info_. Fall back to linear scan by pub_id.
for (auto git = gid_to_publisher_info_.begin(); git != gid_to_publisher_info_.end(); ++git) {
if (git->second.pub_id == intra_process_publisher_id) {
gid_to_publisher_info_.erase(git);
break;
}
}

Copilot uses AI. Check for mistakes.
}

publishers_.erase(intra_process_publisher_id);
publisher_buffers_.erase(intra_process_publisher_id);
pub_to_subs_.erase(intra_process_publisher_id);
Expand All @@ -108,16 +129,15 @@ IntraProcessManager::matches_any_publishers(const rmw_gid_t * id) const
{
std::shared_lock<std::shared_timed_mutex> lock(mutex_);

for (auto & publisher_pair : publishers_) {
auto publisher = publisher_pair.second.lock();
if (!publisher) {
continue;
}
if (*publisher.get() == id) {
return true;
}
// Single O(1) hash map lookup - struct contains both ID and weak_ptr
auto it = gid_to_publisher_info_.find(*id);
if (it == gid_to_publisher_info_.end()) {
return false;
}
return false;

// Verify the publisher still exists by checking the weak_ptr
auto publisher = it->second.publisher.lock();
return publisher != nullptr;
}

size_t
Expand Down
16 changes: 15 additions & 1 deletion rclcpp/test/rclcpp/test_intra_process_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,14 @@ class PublisherBase
explicit PublisherBase(const std::string & topic, const rclcpp::QoS & qos)
: topic_name(topic),
qos_profile(qos)
{}
{
// Initialize a mock GID with unique data based on this pointer
gid_.implementation_identifier = "mock_rmw";
auto ptr_value = reinterpret_cast<std::uintptr_t>(this);
for (size_t i = 0; i < RMW_GID_STORAGE_SIZE; ++i) {
Comment on lines +168 to +169
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock GID initialization shifts ptr_value by (i * 8) for all i < RMW_GID_STORAGE_SIZE. When RMW_GID_STORAGE_SIZE * 8 exceeds the bit width of std::uintptr_t (common), shifting by >= width is undefined behavior. Consider copying bytes from ptr_value only up to sizeof(std::uintptr_t) and zero-filling the remaining GID bytes (or memcpy into gid_.data).

Suggested change
auto ptr_value = reinterpret_cast<std::uintptr_t>(this);
for (size_t i = 0; i < RMW_GID_STORAGE_SIZE; ++i) {
auto ptr_value = reinterpret_cast<std::uintptr_t>(this);
// Zero-fill the GID data to ensure deterministic contents
for (size_t i = 0; i < RMW_GID_STORAGE_SIZE; ++i) {
gid_.data[i] = 0u;
}
// Copy at most sizeof(std::uintptr_t) bytes from the pointer value to avoid
// shifting by an amount greater than or equal to the bit width of uintptr_t.
const size_t ptr_num_bytes = sizeof(std::uintptr_t);
const size_t bytes_to_copy =
(RMW_GID_STORAGE_SIZE < ptr_num_bytes) ? RMW_GID_STORAGE_SIZE : ptr_num_bytes;
for (size_t i = 0; i < bytes_to_copy; ++i) {

Copilot uses AI. Check for mistakes.
gid_.data[i] = static_cast<uint8_t>((ptr_value >> (i * 8)) & 0xFF);
}
}

virtual ~PublisherBase()
{}
Expand Down Expand Up @@ -192,6 +199,12 @@ class PublisherBase
return qos_profile.durability() == rclcpp::DurabilityPolicy::TransientLocal;
}

const rmw_gid_t &
get_gid() const
{
return gid_;
}

bool
operator==([[maybe_unused]] const rmw_gid_t & gid) const
{
Expand All @@ -210,6 +223,7 @@ class PublisherBase
private:
std::string topic_name;
rclcpp::QoS qos_profile;
rmw_gid_t gid_;
};

template<typename T, typename Alloc = std::allocator<void>>
Expand Down