Skip to content

Skip some operations in interior parents computation#4358

Merged
jwpeterson merged 9 commits intolibMesh:develfrom
leonardo-mutti-aks:skip_some_operations_in_interior_parents_computation
Mar 24, 2026
Merged

Skip some operations in interior parents computation#4358
jwpeterson merged 9 commits intolibMesh:develfrom
leonardo-mutti-aks:skip_some_operations_in_interior_parents_computation

Conversation

@leonardo-mutti-aks
Copy link
Contributor

Given that an interior parent p of an element el must satisfy p.dim() == el.dim() + 1,
we can skip some computations in MeshBase::detect_interior_parents():

  1. do not check the interior parents of elements el such that there is no p with p.dim() == el.dim() + 1,
  2. do not fill the node-to-elem map of candidate interior parents p if there is no el with el.dim() == p.dim() - 1.

In models with many elements, especially building the node-to-elem map can be a bottleneck,
so optimization 2 may help easing the computational effort.

@leonardo-mutti-aks
Copy link
Contributor Author

FYI @jwpeterson .

@roystgnr
Copy link
Member

Given that an interior parent p of an element el must satisfy p.dim() == el.dim() + 1,

So ... ex falso?

It's not used as often as for side elements, but we set the interior_parent() to the 3D element when building its 1D edges.

I'd like to see a test case that this speeds up, though; maybe there's some other way to accomplish that.

@leonardo-mutti-aks leonardo-mutti-aks force-pushed the skip_some_operations_in_interior_parents_computation branch from d4a36c8 to 1abd1ce Compare January 30, 2026 10:42
@leonardo-mutti-aks
Copy link
Contributor Author

leonardo-mutti-aks commented Jan 30, 2026

Given that an interior parent p of an element el must satisfy p.dim() == el.dim() + 1,

So ... ex falso?

It's not used as often as for side elements, but we set the interior_parent() to the 3D element when building its 1D edges.

Thanks for pointing this out, I have updated some of the comments in my commits.

I'd like to see a test case that this speeds up, though; maybe there's some other way to accomplish that.

Here are a couple of screenshots of a model we are interested in (static structural).
image
image
image

You can see that it contains many (0D) nodeelems because of all the RBE3 connections plotted in blue.

I profiled the mesh stitching time (14 mins), of which 5:45 mins are spent in MeshBase::detect_interior_parents().
This time in MeshBase::detect_interior_parents() goes down to 1:30 mins with the first two commits.

I also pushed one last commit to be able to completely disable the computation of interior parents,
at least those that are computed from MeshBase::detect_interior_parents(),
since in our project we do not use them. The final total stitching time goes down to around 8 minutes.

@leonardo-mutti-aks leonardo-mutti-aks marked this pull request as ready for review January 30, 2026 14:04
@jwpeterson jwpeterson changed the base branch from master to devel January 30, 2026 14:42
@moosebuild
Copy link

Job Coverage on 1abd1ce : invalidated by @jwpeterson

We had to retarget this PR from master -> devel, I think that may have confused the Coverage job so I'm invalidating it.

@moosebuild
Copy link

moosebuild commented Jan 30, 2026

Job Coverage, step Generate coverage on 5811bf3 wanted to post the following:

Coverage

2ecfa0 #4358 5811bf
Total Total +/- New
Rate 65.46% 65.47% +0.01% 100.00%
Hits 78161 78224 +63 106
Misses 41246 41252 +6 0

Diff coverage report

Full coverage report

This comment will be updated on new commits.

@jwpeterson
Copy link
Member

Given that an interior parent p of an element el must satisfy p.dim() == el.dim() + 1,
So ... ex falso?
It's not used as often as for side elements, but we set the interior_parent() to the 3D element when building its 1D edges.

@roystgnr I think @leonardo-mutti-aks was referring specifically to these lines in MeshBase::detect_interior_parents():

libmesh/src/mesh/mesh_base.C

Lines 1993 to 1999 in 3463d95

for (auto n : make_range(element->n_vertices()))
{
std::vector<dof_id_type> & element_ids = node_to_elem[element->node_id(n)];
for (const auto & eid : element_ids)
if (this->elem_ref(eid).dim() == element->dim()+1)
neighbors[n].insert(eid);

In that function at least, we do require the interior parent that we detect to be exactly one dimension higher. Does this mean that MeshBase::detect_interior_parents() needs an update to handle the "1D edge of a 3D element" that you are talking about?

My other comment here, after reviewing MeshBase::detect_interior_parents() more carefully, is that it's doing a bunch of work for 0D NodeElems that are connected to many 1D Elems. But it's also doing all this work for no reason, because at the moment it is not possible for any attached 1D Elem to be detected as the interior_parent of a NodeElem. The algorithm starts with a candidate interior neighbor on vertex "0" and then requires that "all other" vertices also be attached to the same candidate interior neighbor. But NodeElems don't have any vertices beyond the 1st, so the found_interior_parents flag will remain false.

libmesh/src/mesh/mesh_base.C

Lines 2020 to 2041 in 3463d95

std::set<dof_id_type> & neighbors_0 = neighbors[0];
for (const auto & interior_parent_id : neighbors_0)
{
found_interior_parents = false;
for (auto n : make_range(1u, element->n_vertices()))
{
if (neighbors[n].count(interior_parent_id))
{
found_interior_parents = true;
}
else
{
found_interior_parents = false;
break;
}
}
if (found_interior_parents)
{
element->set_interior_parent(this->elem_ptr(interior_parent_id));
break;
}

In the test case mentioned on this ticket, I verified we spend about 332 of the 802 total seconds in MeshBase::complete_preparation() calling detect_interior_parents().

 -------------------------------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=960.008, Active time=956.136                                                                        |
 -------------------------------------------------------------------------------------------------------------------------------------
| Event                                                  nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                                                   w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                     |
| MeshBase                                                                                                                            |
|   complete_preparation()                               551        102.5232    0.186067    802.3847    1.456234    10.72    83.92    |
|   copy_constraint_rows(mesh)                           11         0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   detect_interior_parents()                            551        331.9532    0.602456    331.9532    0.602456    34.72    34.72    |
|   remove_orphaned_nodes()                              551        350.5475    0.636202    350.5475    0.636202    36.66    36.66    |
|                                                                                                                                     |

So, given that we aren't currently getting useful interior parents for NodeElems, and we are also spending a lot of time doing so, adding an option to skip it completely like in this PR is reasonable, IMO. I also think that MeshBase::detect_interior_parents() should be improved to handle this case, but that is really a separate issue for us, since we don't want (and don't want to spend time computing) that information anyway.

@jwpeterson jwpeterson force-pushed the skip_some_operations_in_interior_parents_computation branch from 1abd1ce to 48a72bb Compare March 23, 2026 15:49
…ugging

* Move MeshBase::Preparation::operator== to .C file to simplify debugging
* Move MeshBase::Preparation::operator bool to .C file to simplify debugging
* Mark MeshBase::Preparation::operator bool() explicit
* Add MeshBase::Preparation::operator!= implemented in terms of operator==
* Move Preparation::operator= to .C file
* Move MeshBase::Preparation initialization code to constructor in .C file
@roystgnr
Copy link
Member

Does this mean that MeshBase::detect_interior_parents() needs an update to handle the "1D edge of a 3D element" that you are talking about?

Ideally, yeah.

In practice, so long as someone is adding edges constructed via build_edge_ptr, that'll set the interior_parent and we won't delete that setting, and we'll propagate it properly through distributed mesh redistribution ... and then if someone writes XdrIO or CheckpointIO we'll just drop that information and we still rely on detect_interior_parents() to reconstruct it and that'll fail too. :-P

at the moment it is not possible for any attached 1D Elem to be detected as the interior_parent of a NodeElem. The algorithm starts with a candidate interior neighbor on vertex "0" and then requires that "all other" vertices also be attached to the same candidate interior neighbor. But NodeElems don't have any vertices beyond the 1st, so the found_interior_parents flag will remain false.

That's painful to see, yeah.

adding an option to skip it completely like in this PR is reasonable, IMO.

You're probably right, I'd just like to know it was the last resort. Would it be possible to programmatically generate some mesh (maybe the complete graph on N vertices, with NodeElem and Edge2?) that makes it easy to replicate the performance problem?

@jwpeterson
Copy link
Member

The MOOSE ARM mac failures appear to be segfaults (Exit Code: -11) but I think they are architecture-specific, and not related to this PR.

optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: ################################################################################
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: Begin tester output
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: 
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: Exit Code: -11
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: 
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: 
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: Tester failed, reason: EXIT CODE -11 != 1
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: 
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: ################################################################################
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: End tester output
optimization/test:optimizationreporter/parameter_mesh_base.errors/errorMissingMeshParam: 

@jwpeterson
Copy link
Member

Would it be possible to programmatically generate some mesh (maybe the complete graph on N vertices, with NodeElem and Edge2?) that makes it easy to replicate the performance problem?

I think I can probably just send you the final stitched mesh from this example if you want it?

@leonardo-mutti-aks found in some earlier exploratory profiling that most of the time is spent in building the "node to elem map", which happens at the beginning of the MeshBase::detect_interior_parents() call.

 -------------------------------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=919.983, Active time=849.961                                                                        |
 -------------------------------------------------------------------------------------------------------------------------------------
| Event                                                  nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                                                   w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                     |
|                                                                                                                                     |
| InteriorParents                                                                                                                     |
|   build_neighbors_for_element                          43140988   11.3779     0.000000    11.4469     0.000000    1.34     1.35     |
|   build_node_to_elem_map                               560        72.2628     0.129041    372.9359    0.665957    8.50     43.88    |
|   find_and_assign_interior_parent                      517660     0.0140      0.000000    0.0140      0.000000    0.00     0.00     |
|   overall                                              562        55.8087     0.099304    475.7988    0.846617    6.57     55.98    |
|   process_single_element                               408969198  15.7404     0.000000    28.0307     0.000000    1.85     3.30     |
|   set_interior_parents_loop                            560        11.1925     0.019987    47.0542     0.084025    1.32     5.54     |
|   single_insert_into_neighbors_for_elem                1352484    0.0435      0.000000    0.0435      0.000000    0.01     0.01     |
|   single_push_into_node_to_elem_map                    2975978588 242.6241    0.000000    242.6241    0.000000    28.55    28.55    |

This profiling is a bit too fine-grained to draw a lot of conclusions from (it was logging every map insertion, for example) but I do think that simply building the node to elem map probably dominates the time spent in MeshBase::detect_interior_parents(). And, if you look at my PerfLog snippet in #4358 (comment), each call to MeshBase::detect_interior_parents() only takes ~0.6s on average... it's just that we are calling it so many times during the stitching process for a large, complicated Mesh that causes it to stand out in the profiling.

So, I think (in addition to making detect_interior_parents() work correctly for NodeElems) the main optimization here would probably involve us "saving" the node to elem map (and probably "merging" node_to_elem maps during our recursive stitching calls), and not micro-optimizations within detect_interior_parents() itself...

@moosebuild
Copy link

Job Test MOOSE ARM mac on 5811bf3 : invalidated by @jwpeterson

Only one ray_tracing test failed, retry to see if it goes away.

@roystgnr
Copy link
Member

I think I can probably just send you the final stitched mesh from this example if you want it?

Yeah, would you?

And, if you look at my PerfLog snippet in #4358 (comment), each call to MeshBase::detect_interior_parents() only takes ~0.6s on average... it's just that we are calling it so many times during the stitching process for a large, complicated Mesh that causes it to stand out in the profiling.

Oh, this is actually something I've been looking at due to other ridiculously-redundant calls from massive stitching operations. Have you tried setting the new (well, 6 month old) prepare_after_stitching parameter to false?

@jwpeterson
Copy link
Member

I think I can probably just send you the final stitched mesh from this example if you want it?

Yeah, would you?

Sent a Google Drive link over IM.

Have you tried setting the new (well, 6 month old) prepare_after_stitching parameter to false?

I have not tried this, but I guess it could probably help us in this case. Unfortunately, we are blocked from updating libmesh at the moment due to the serial-to-parallel restart issue that we discussed over slack previously. I am in the process of testing a workround (0a2d85f), but there seems to sill be some issues with that. It would be good if we could come up with a "real" fix, since I will need to do a libmesh update soon in order to pull in the changes from this PR as well...

@jwpeterson
Copy link
Member

The tests all passed except for that one MOOSE ARM mac test failure that I don't think is related to this PR. Hence, I will go ahead and merge this, since @leonardo-mutti-aks has been waiting patiently since December for me to work on it 😓

@jwpeterson jwpeterson merged commit 88770dc into libMesh:devel Mar 24, 2026
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants