10 changes: 7 additions & 3 deletions CLAUDE.md
@@ -198,16 +198,20 @@ worktree changes, the worktree edits will not be active. Always:
2. `./uw build`
3. Run your code / tests from there

### AI-Assisted Commit Attribution
When committing code developed with AI assistance, end the commit message with:
### AI-Assisted Attribution (Commits and PRs)
When committing code or creating pull requests with AI assistance, end the
message/body with:

```
Underworld development team with AI support from Claude Code
Underworld development team with AI support from [Claude Code](https://claude.com/claude-code)
```

(In commit messages, use the plain-text form without the markdown link.)

**Do NOT use**:
- `Co-Authored-By:` with a noreply email (useless for soliciting responses)
- Generic AI attribution without team context
- Emoji in PR descriptions

---

52 changes: 51 additions & 1 deletion docs/advanced/parallel-computing.md
@@ -18,6 +18,56 @@ Underworld3 uses PETSc for parallel operations, which means **you rarely need to

The main use of `uw.mpi.rank` is for conditional output/visualization.

## MPI + Thread Pools (Oversubscription)

When running with MPI, each rank can also spawn its own BLAS/OpenMP worker
threads. Left uncontrolled, the total number of runnable threads can far
exceed the available cores and performance can degrade severely.

Example: `mpirun -np 8` on a machine where OpenBLAS defaults to `10` threads
per process can create up to `80` compute threads, often far slower than one
thread per rank.
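The arithmetic behind that warning can be sketched in a few lines of shell
(the rank and thread counts are the illustrative ones above, not measured
values):

```bash
# 8 MPI ranks, each spawning a 10-thread OpenBLAS pool (illustrative numbers).
ranks=8
threads_per_rank=10
total=$((ranks * threads_per_rank))
echo "runnable compute threads: $total"   # prints: runnable compute threads: 80

# The cap policy avoids this by pinning each rank to one thread, e.g.
#   OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 mpirun -np 8 python model.py
```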

### Default Underworld3 Policy

Underworld3 now applies MPI-safe defaults (a thread pool size of `1`) unless
the user has explicitly set any of the following variables:

- `OMP_NUM_THREADS`
- `OPENBLAS_NUM_THREADS`
- `MKL_NUM_THREADS`
- `VECLIB_MAXIMUM_THREADS`
- `NUMEXPR_NUM_THREADS`

This happens in two places:

1. `./uw` launcher: sets defaults before Python starts.
2. `underworld3` import path: applies the same defaults for MPI runs if unset.
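The effect of that policy can be sketched in shell; the `cap` helper below is
hypothetical (not part of the launcher) and only mirrors the "set to `1` when
unset or empty, preserve explicit values" rule:

```bash
# Hypothetical helper mirroring the launcher's policy: cap a thread variable
# at 1 only when it is unset or empty, preserving explicit user settings.
cap() {
    var="$1"
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
        eval "export $var=1"
    fi
}

export OMP_NUM_THREADS=4   # explicitly set by the user: must be preserved
unset MKL_NUM_THREADS      # unset: should be capped

cap OMP_NUM_THREADS
cap MKL_NUM_THREADS
echo "$OMP_NUM_THREADS $MKL_NUM_THREADS"   # prints: 4 1
```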

### Runtime Warning

If running with MPI and any of the thread variables above are explicitly set
to values greater than `1`, Underworld3 prints a rank-0 warning about possible
oversubscription.

### User Controls

- Disable automatic thread caps:

```bash
export UW_DISABLE_THREAD_CAPS=1
```

- Suppress warning (keep your explicit thread settings):

```bash
export UW_SUPPRESS_THREAD_WARNING=1
```

### Recommended Practice

For most MPI benchmark and production jobs, keep `1` thread per rank unless
you are deliberately tuning a hybrid MPI+threads configuration.
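Concretely, a flat-MPI job script can export the caps itself, whether or not
the automatic defaults are active (the `run_model.py` name and `-np 8` rank
count are placeholders):

```bash
# Flat MPI launch: one BLAS/OpenMP thread per rank.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
# mpirun -np 8 python run_model.py   # actual launch line, shown for illustration

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"   # prints: OMP_NUM_THREADS=1
```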

## Parallel-Safe Output

### The Problem with Rank Conditionals
@@ -468,4 +518,4 @@ These operations require **ALL ranks** to participate:
4. **Collective operations must run on ALL ranks** - never inside rank conditionals
5. **Test with `mpirun -np N`** to catch issues early

The parallel safety system makes parallel programming in Underworld3 safer and more intuitive - collective operations are evaluated on all ranks automatically, preventing common deadlock scenarios!
The parallel safety system makes parallel programming in Underworld3 safer and more intuitive - collective operations are evaluated on all ranks automatically, preventing common deadlock scenarios!
26 changes: 26 additions & 0 deletions petsc-custom/build-petsc.sh
@@ -57,6 +57,27 @@ clone_petsc() {
    echo "Clone complete."
}

apply_patches() {
    echo "Applying UW3 patches to PETSc..."
    cd "$PETSC_DIR"

    # Fix ghost facet double-counting in boundary residual/integral assembly.
    # Without this, internal boundary natural BCs and BdIntegral produce
    # incorrect results in parallel (shared facets integrated on multiple ranks).
    # Submitted upstream: [TODO: add PETSc MR link when available]
    local patch="${SCRIPT_DIR}/patches/plexfem-ghost-facet-fix.patch"
    if [ -f "$patch" ]; then
        if git apply --check "$patch" 2>/dev/null; then
            git apply "$patch"
            echo "  Applied: plexfem-ghost-facet-fix.patch"
        else
            echo "  Skipped: plexfem-ghost-facet-fix.patch (already applied or conflict)"
        fi
    fi

    echo "Patches complete."
}

configure_petsc() {
    echo "Configuring PETSc with AMR tools..."
    cd "$PETSC_DIR"
@@ -147,6 +168,7 @@ show_help() {
    echo " build Build PETSc"
    echo " test Run PETSc tests"
    echo " petsc4py Build and install petsc4py"
    echo " patch Apply UW3 patches to PETSc source"
    echo " clean Remove PETSc directory"
    echo " help Show this help"
}
@@ -155,6 +177,7 @@ show_help() {
case "${1:-all}" in
    all)
        clone_petsc
        apply_patches
        configure_petsc
        build_petsc
        build_petsc4py
@@ -175,6 +198,9 @@ case "${1:-all}" in
    build)
        build_petsc
        ;;
    patch)
        apply_patches
        ;;
    test)
        test_petsc
        ;;
79 changes: 79 additions & 0 deletions petsc-custom/patches/PETSC_MR_DESCRIPTION.md
@@ -0,0 +1,79 @@
# PETSc Merge Request: Ghost Facet Fix

## Submission details

- **Fork**: https://gitlab.com/lmoresi/petsc
- **Branch**: `fix/plexfem-ghost-facet-boundary-residual`
- **Target**: `petsc/petsc` → `release`
- **Create MR**: https://gitlab.com/lmoresi/petsc/-/merge_requests/new?merge_request%5Bsource_branch%5D=fix%2Fplexfem-ghost-facet-boundary-residual

## Title

DMPlex: filter ghost facets from boundary residual and integral assembly

## Description

### Problem

`DMPlexComputeBdResidual_Internal` and `DMPlexComputeBdIntegral` construct
`facetIS` as all facets at depth `dim-1` but do not exclude ghost facets
(point SF leaves). The volume residual code in `DMPlexComputeResidualByKey`
checks the `"ghost"` label before calling `DMPlexVecSetClosure` (line 5355),
but the boundary residual path has no equivalent check.

For **external** boundaries this is benign: ghost facets on the domain
boundary still have `support[0]` pointing to a valid local cell, and
`DMLocalToGlobal(ADD_VALUES)` correctly resolves ownership. But for
**internal** boundaries — facets in the mesh interior that carry a
`DMLabel` value and a `DM_BC_NATURAL` condition — facets at partition
junctions are present on multiple ranks. Each rank independently
integrates the boundary flux via `PetscFEIntegrateBdResidual`, and
`DMLocalToGlobal(ADD_VALUES)` sums all contributions, causing
double-counting.

The same issue affects `DMPlexComputeBdIntegral`: ghost boundary facets
are included in the per-face integral sum, which is then `MPI_Allreduce`d,
again double-counting shared facets.

### Use case: internal boundaries in geodynamics

[Underworld3](https://github.com/underworldcode/underworld3) is a
Python/PETSc geodynamics framework that uses DMPlex FEM throughout.
Geophysical models commonly require traction (natural) boundary conditions
on **internal surfaces** — for example, a fault plane or a material
interface embedded within the mesh. These are set up by labeling interior
facets with a `DMLabel` and registering `DM_BC_NATURAL` via
`PetscDSAddBoundary` / `PetscWeakFormAddBdResidual`.

This works correctly in serial but produces O(1) errors whenever the
labeled internal boundary is split across partition boundaries, because
shared facets are integrated on multiple ranks.

### Fix

After obtaining `facetIS` from `DMLabelGetStratumIS(depthLabel, dim-1, ...)`,
filter out SF leaves using `ISDifference`. This ensures each boundary facet
is processed by exactly one rank (the owner).

This follows the same pattern already used in `DMPlexComputeBdFaceGeomFVM`
(plexfem.c, around line 1087) which calls `PetscFindInt(face, nleaves, leaves, &loc)`
to skip ghost faces during FVM face flux computation.

### Testing

Tested with Underworld3 Stokes solver using `DM_BC_NATURAL` on an internal
boundary (annular mesh with labeled interior circle). Results match serial
to 10+ significant figures across all processor counts tested (1, 2, 4, 8).
Before the fix, any partition that splits the internal boundary produces
O(1) errors from double-counted boundary contributions.

The patch applies cleanly to all PETSc releases from v3.18.0 through
v3.24.5 — the affected code has been structurally unchanged across these
versions.

## Reference examples

- Stokes Green's function with internal boundary natural BCs:
https://github.com/underworldcode/underworld3/blob/development/docs/examples/fluid_mechanics/advanced/Ex_Stokes_Annulus_Kernels.py
- Cartesian variant:
https://github.com/underworldcode/underworld3/blob/development/docs/examples/fluid_mechanics/advanced/Ex_Stokes_Cartesian_Kernels.py
59 changes: 59 additions & 0 deletions petsc-custom/patches/plexfem-ghost-facet-fix.patch
@@ -0,0 +1,59 @@
diff --git a/src/dm/impls/plex/plexfem.c b/src/dm/impls/plex/plexfem.c
index 0299f569070..26e6432585d 100644
--- a/src/dm/impls/plex/plexfem.c
+++ b/src/dm/impls/plex/plexfem.c
@@ -2874,6 +2874,26 @@ PetscErrorCode DMPlexComputeBdIntegral(DM dm, Vec X, DMLabel label, PetscInt num
PetscCall(DMPlexGetDepthLabel(dm, &depthLabel));
PetscCall(DMGetDimension(dm, &dim));
PetscCall(DMLabelGetStratumIS(depthLabel, dim - 1, &facetIS));
+ /* Filter out ghost facets (SF leaves) so that each boundary facet is only
+ counted on one rank. Without this, shared facets at partition boundaries
+ are integrated on multiple ranks, causing double-counting after MPI sum. */
+ if (facetIS) {
+ PetscSF sf;
+ PetscInt nleaves;
+ const PetscInt *leaves;
+
+ PetscCall(DMGetPointSF(dm, &sf));
+ PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &leaves, NULL));
+ if (nleaves > 0 && leaves) {
+ IS leafIS, ownedFacetIS;
+
+ PetscCall(ISCreateGeneral(PETSC_COMM_SELF, nleaves, leaves, PETSC_USE_POINTER, &leafIS));
+ PetscCall(ISDifference(facetIS, leafIS, &ownedFacetIS));
+ PetscCall(ISDestroy(&leafIS));
+ PetscCall(ISDestroy(&facetIS));
+ facetIS = ownedFacetIS;
+ }
+ }
PetscCall(DMGetLocalSection(dm, &section));
PetscCall(PetscSectionGetNumFields(section, &Nf));
/* Get local solution with boundary values */
@@ -5113,6 +5133,27 @@ static PetscErrorCode DMPlexComputeBdResidual_Internal(DM dm, Vec locX, Vec locX
PetscCall(DMPlexGetDepthLabel(dm, &depthLabel));
PetscCall(DMGetDimension(dm, &dim));
PetscCall(DMLabelGetStratumIS(depthLabel, dim - 1, &facetIS));
+ /* Filter out ghost facets (SF leaves) so that boundary residual contributions
+ from shared facets are only assembled on the owning rank. Without this,
+ internal boundary natural BCs at partition junctions get double-counted
+ because LocalToGlobal with ADD_VALUES sums contributions from all ranks. */
+ if (facetIS) {
+ PetscSF sf;
+ PetscInt nleaves;
+ const PetscInt *leaves;
+
+ PetscCall(DMGetPointSF(dm, &sf));
+ PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &leaves, NULL));
+ if (nleaves > 0 && leaves) {
+ IS leafIS, ownedFacetIS;
+
+ PetscCall(ISCreateGeneral(PETSC_COMM_SELF, nleaves, leaves, PETSC_USE_POINTER, &leafIS));
+ PetscCall(ISDifference(facetIS, leafIS, &ownedFacetIS));
+ PetscCall(ISDestroy(&leafIS));
+ PetscCall(ISDestroy(&facetIS));
+ facetIS = ownedFacetIS;
+ }
+ }
PetscCall(PetscDSGetNumBoundary(prob, &numBd));
for (bd = 0; bd < numBd; ++bd) {
PetscWeakForm wf;
61 changes: 61 additions & 0 deletions src/underworld3/__init__.py
@@ -71,6 +71,8 @@
from mpi4py import MPI # for initialising MPI
import petsc4py as _petsc4py
import sys
import os as _os
import warnings as _warnings

_petsc4py.init(sys.argv)

@@ -88,6 +90,65 @@
    # Handle Sphinx autodoc mocking - PETSc mock objects don't support these operations
    pass


def _parse_thread_env(name: str):
    """Return integer thread count from environment variable, or None."""
    raw = _os.environ.get(name)
    if raw is None or str(raw).strip() == "":
        return None
    try:
        return int(float(raw))
    except (TypeError, ValueError):
        return None


def _apply_mpi_thread_policy():
    """
    Default thread policy for MPI runs.

    Caps common thread pools to 1 unless explicitly set by the user. This
    prevents MPI+BLAS oversubscription (large performance regressions).
    """
    if MPI.COMM_WORLD.Get_size() <= 1:
        return

    if _os.environ.get("UW_DISABLE_THREAD_CAPS", "0").lower() in ("1", "true", "yes", "on"):
        return

    thread_vars = (
        "OMP_NUM_THREADS",
        "OPENBLAS_NUM_THREADS",
        "MKL_NUM_THREADS",
        "VECLIB_MAXIMUM_THREADS",
        "NUMEXPR_NUM_THREADS",
    )

    for var in thread_vars:
        if _os.environ.get(var, "").strip() == "":
            _os.environ[var] = "1"

    if MPI.COMM_WORLD.Get_rank() == 0 and _os.environ.get("UW_SUPPRESS_THREAD_WARNING", "0").lower() not in (
        "1",
        "true",
        "yes",
        "on",
    ):
        oversub = {var: _parse_thread_env(var) for var in thread_vars}
        oversub = {k: v for k, v in oversub.items() if v is not None and v > 1}
        if oversub:
            items = ", ".join(f"{k}={v}" for k, v in oversub.items())
            _warnings.warn(
                "MPI run with thread pools > 1 detected "
                f"({items}). This may cause severe oversubscription. "
                "Set thread counts to 1 per rank, or set UW_SUPPRESS_THREAD_WARNING=1 "
                "if this is intentional.",
                RuntimeWarning,
                stacklevel=2,
            )


_apply_mpi_thread_policy()

# Version is derived from git tags via setuptools_scm
# Priority: 1) _version.py (generated by setuptools_scm at build time)
# 2) importlib.metadata (for installed packages)