Skip to content

fix(server): fix foreground mode exit code propagation in startup scripts#3044

Open
bitflicker64 wants to merge 1 commit into
apache:masterfrom
bitflicker64:clean-docker-process-supervision
Open

fix(server): fix foreground mode exit code propagation in startup scripts#3044
bitflicker64 wants to merge 1 commit into
apache:masterfrom
bitflicker64:clean-docker-process-supervision

Conversation

@bitflicker64
Copy link
Copy Markdown
Contributor

Fix foreground mode in start-hugegraph.sh (chunk 1 of #3043)

Problem

In foreground mode (-d false), start-hugegraph.sh had a structural
bug: all post-branch logic ran unconditionally after the daemon/foreground
if/else block.

In foreground mode the script blocks at hugegraph-server.sh until Java
exits. After Java exits:

  • PID="$!" captures an empty string (no background job)
  • bin/pid is written with an empty string
  • wait_for_startup fails immediately (empty PID, Java already dead)
  • disown fails (no background jobs)
  • OPEN_MONITOR fires unconditionally — even in foreground mode
  • The script exits 0, losing Java's exit code entirely

This means a Java crash in foreground mode is invisible to the process
supervisor (Docker restart policy, systemd, etc.).

Fix

  • Move all post-branch logic (PID="$!", pid file write, trap,
    wait_for_startup, disown, OPEN_MONITOR) inside the
    DAEMON == "true" branch where it belongs.
  • In the foreground branch: background Java with &, capture $!,
    write the pid file, then wait $PID (so the script blocks while Java
    runs) and exit $? (so Java's exit code propagates out).
  • Gate OPEN_MONITOR inside the daemon branch — monitor setup after a
    blocking foreground call makes no sense.

Also included

  • Fix restserver.url in rest-server.properties to include the
    http:// scheme prefix (was 127.0.0.1:8080, now
    http://127.0.0.1:8080). The check_port and URL parsing in
    util.sh expect a full URL.

Tests

New script: hugegraph-server/hugegraph-dist/src/assembly/travis/test-start-hugegraph.sh

Test Unmodified code After fix
Test 1 — daemon mode regression PASS PASS
Test 2 — foreground mode blocks PASS PASS
Test 3 — exit code propagates (137 on SIGKILL) FAIL PASS
Test 4 — -m true registers cron in daemon mode PASS PASS

Wired into server-ci.yml after the Compile step, rocksdb backend only
(foreground mode behavior is backend-independent).

What this does NOT include

Docker entrypoint changes and runtime restart validation (kill Java
inside container → container exits → Docker restarts it) are in the
follow-up PR for chunks 4–8.

Relates to #3043 (chunk 1 of 3 startup script fixes)

The previous implementation captured $! after the daemon/foreground
if/else block. The script blocked at hugegraph-server.sh until Java
exited, then $! was empty, the pid file got an empty string, and the
script exited 0, losing Java's exit code entirely.

Fix: move all post-branch logic inside the DAEMON == "true" branch.
In the foreground branch, background Java with &, capture $!, write the
pid file, then wait $PID and exit $? so Java's exit code propagates out.
Gate OPEN_MONITOR inside the daemon branch since monitor setup after a
blocking foreground call makes no sense.

Also fix restserver.url in rest-server.properties to include the http://
scheme prefix, which was causing URL parsing issues in util.sh.

Add test-start-hugegraph.sh to verify baseline and post-fix behavior,
and wire it into server-ci.yml for the RocksDB backend.

related to : apache#3043
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working ci-cd Build or deploy tests Add or improve test cases labels May 31, 2026
@bitflicker64 bitflicker64 changed the title Fix docker process supervision fix(server): fix foreground mode exit code propagation in startup scripts May 31, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.47%. Comparing base (b9a3dd9) to head (1f68598).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3044      +/-   ##
============================================
- Coverage     35.96%   34.47%   -1.49%     
  Complexity      338      338              
============================================
  Files           803      803              
  Lines         68053    68053              
  Branches       8907     8907              
============================================
- Hits          24472    23462    -1010     
- Misses        40955    42025    +1070     
+ Partials       2626     2566      -60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci-cd Build or deploy size:L This PR changes 100-499 lines, ignoring generated files. tests Add or improve test cases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant