Skip to content

Fix in-place tablespace crashes: session lock leak and segment SIGSEGV#1628

Open
yjhjstz wants to merge 3 commits intoapache:mainfrom
yjhjstz:fix/in-place-tablespace-crashes
Open

Fix in-place tablespace crashes: session lock leak and segment SIGSEGV#1628
yjhjstz wants to merge 3 commits intoapache:mainfrom
yjhjstz:fix/in-place-tablespace-crashes

Conversation

@yjhjstz
Copy link
Member

@yjhjstz yjhjstz commented Mar 17, 2026

Summary

  • Fix session lock leak in ALTER DATABASE SET TABLESPACE for utility-mode backends, which caused assertion failure in TAP test 033_replay_tsp_drops.pl
  • Fix segment SIGSEGV when creating in-place tablespace with CREATE TABLESPACE ... LOCATION ''
  • Fix spurious "could not read symbolic link" log messages when dropping in-place tablespaces

Root Cause

Session lock leak (Issue #1626): movedb() acquires a session-level AccessExclusiveLock via MoveDbSessionLockAcquire(). The release in CommitTransaction() only checked Gp_role == GP_ROLE_DISPATCH || IS_SINGLENODE(), missing GP_ROLE_UTILITY. Standalone backends (TAP tests, pg_basebackup recovery) leaked the lock, triggering proc.c:1067 assertion at exit.

Segment SIGSEGV (Issue #1627): When CREATE TABLESPACE ... LOCATION '' is dispatched to segments, serialization converts the empty string to NULL. pstrdup(NULL) crashes the segment process.

Fix

  1. Add Gp_role == GP_ROLE_UTILITY to the session lock release condition in CommitTransaction()
  2. Add NULL guard for stmt->location in CreateTableSpace()
  3. Skip readlink() EINVAL error for in-place tablespace directories (they are directories, not symlinks)

Test Plan

  • src/test/recovery/t/033_replay_tsp_drops.pl passes (previously crashed with assertion failure)
  • CREATE TABLESPACE ... LOCATION '' on MPP cluster does not crash segments

Fixes #1626
Fixes #1627

yjhjstz added 2 commits March 18, 2026 03:17
When CREATE TABLESPACE ... LOCATION '' is dispatched from QD to QE,
the serialization converts the empty string to NULL. On the segment,
pstrdup(stmt->location) crashes with SIGSEGV because stmt->location
is NULL.

Add a NULL guard to treat NULL the same as empty string, preserving
in-place tablespace semantics.

Fixes apache#1627
movedb() acquires a session-level AccessExclusiveLock on the database
via MoveDbSessionLockAcquire(). The release in CommitTransaction()
only checked for GP_ROLE_DISPATCH and IS_SINGLENODE(), missing
GP_ROLE_UTILITY. This caused the lock to leak in standalone backends
(e.g. TAP tests), triggering a proc.c assertion failure at exit:

  FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))")

Add GP_ROLE_UTILITY to the release condition.

Also fix a spurious "could not read symbolic link" log message when
dropping in-place tablespaces: readlink() on a directory returns
EINVAL, which is expected and can be safely skipped.

Fixes apache#1626
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Greenplum in-place tablespace edge cases that could crash backends/segments or emit misleading logs, and addresses a utility-mode session lock leak that caused TAP recovery test failures.

Changes:

  • Release movedb session-level lock at commit time for utility-mode backends to prevent lock leakage/assertion failure at process exit.
  • Prevent QE SIGSEGV by treating a NULL CreateTableSpaceStmt->location as empty string for in-place tablespaces.
  • Suppress expected readlink() EINVAL noise when dropping in-place tablespaces (where pg_tblspc/<oid> is a directory, not a symlink).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/backend/commands/tablespace.c Guard against NULL stmt->location for in-place tablespaces; avoid logging expected readlink(EINVAL) for in-place drop paths.
src/backend/access/transam/xact.c Release movedb session lock on commit in GP_ROLE_UTILITY to prevent session lock leaks in standalone/utility backends.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@yjhjstz yjhjstz force-pushed the fix/in-place-tablespace-crashes branch 4 times, most recently from 56551de to 0039c9b Compare March 17, 2026 23:21
Two fixes:
 - Use TestLib::slurp_file instead of PostgreSQL::Test::Utils::slurp_file
      in adjust_conf(). PostgresNode.pm only imports TestLib, not
      PostgreSQL::Test::Utils, so the latter is undefined and causes
      t/101_restore_point_and_startup_pause to fail.

 - Replace cp with install -m 644 in enable_archiving() archive_command.
      coreutils 8.32 (Rocky 8, Ubuntu 22.04) uses copy_file_range() in cp
      which crashes on Docker overlayfs. install does not use
      copy_file_range(), avoiding the crash.

    Also add ic-recovery test suite to rocky8 and deb CI pipelines.
@yjhjstz yjhjstz force-pushed the fix/in-place-tablespace-crashes branch from 0039c9b to 65670b6 Compare March 17, 2026 23:46
Copy link
Contributor

@reshke reshke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh.. yes, this is clear oversight introduced in recent ported changes.
Somehow this worked in my by-hand hand (I also thought CI was reliable here)

@reshke
Copy link
Contributor

reshke commented Mar 18, 2026

Linking for cross-reference #1585

@reshke
Copy link
Contributor

reshke commented Mar 18, 2026

Lets rebase & merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants