diff --git a/content/cn/docs/guides/hugegraph-docker-cluster.md b/content/cn/docs/guides/hugegraph-docker-cluster.md
index a3f6578b6..9a39da52d 100644
--- a/content/cn/docs/guides/hugegraph-docker-cluster.md
+++ b/content/cn/docs/guides/hugegraph-docker-cluster.md
@@ -132,3 +132,32 @@ curl http://localhost:8620/v1/partitions # 分区分配
 3. **分区分配未完成**:检查 `curl http://localhost:8620/v1/stores`:3 个 Store 必须都显示 `"state":"Up"` 才能完成分区分配
 4. **连接被拒**:确保 `HG_*` 环境变量使用容器主机名(`pd0`、`store0`),而非 `127.0.0.1`
+
+**查看运行时日志**:使用 `docker logs CONTAINER`(如 `docker logs hg-pd0`)可直接查看日志,无需进入容器。
+
+## 容器监控与健康检查
+
+> **版本说明**:本节描述的行为**不包含在 `1.7.0` 镜像中**。请使用 `HUGEGRAPH_VERSION=latest` 或等待下一个发布版本。
+
+### 进程监控模型
+
+此前,三个 Docker 入口脚本均以 `tail -f /dev/null` 结尾,即使 Java 进程崩溃,容器仍会保持运行状态。由于容器从未退出,Docker 的 `restart: unless-stopped` 策略也不会触发。
+
+现在,入口脚本直接监控 Java 进程:
+
+- **PD 和 Store 容器**:入口脚本向启动脚本传入 `-d false` 参数,启动脚本通过 `exec` 直接替换为 Java 进程。容器进程即为 Java 进程:当 Java 退出(崩溃或正常关闭)时,容器立即退出,Docker 的重启策略随即触发。
+- **Server 容器**:入口脚本使用 `tail --pid=$PID -f /dev/null` 阻塞,直到 Java 退出。`SIGTERM`/`SIGINT` 信号陷阱会将 `docker stop` 信号转发给 Java 并等待其正常关闭(退出码 0)。若 Java 崩溃,入口脚本以退出码 1 退出,从而触发重启策略。
+- 所有镜像中的 PID 1 均为 `dumb-init`,负责将 Docker 信号转发给入口脚本进程。
+
+### 健康检查端点
+
+所有四个 Docker 镜像现已内置 `HEALTHCHECK` 指令。`docker ps` 将显示真实的健康状态。在 90 秒的启动期内,检查失败不计入统计;此后,连续三次失败将把容器标记为 `unhealthy`。
+
+| 镜像 | 健康检查端点 | 端口 | 参数 |
+|------|-------------|------|------|
+| `hugegraph/hugegraph`(server) | `GET /versions` | 8080 | `--interval=15s --timeout=10s --start-period=90s --retries=3` |
+| `hugegraph/hugegraph-hstore` | `GET /versions` | 8080 | 同上 |
+| `hugegraph/hugegraph-pd` | `GET /v1/health` | 8620 | 同上 |
+| `hugegraph/hugegraph-store` | `GET /v1/health` | 8520 | 同上 |
+
+> **注意**:`start-hugegraph.sh` 中的 `-m true` 标志(基于 cron 的监控)仅适用于虚拟机/裸机部署,Docker 镜像中未安装也不使用该功能。Docker 用户应依赖内置的 `HEALTHCHECK` 和 Docker 重启策略。
diff --git a/content/cn/docs/quickstart/hugegraph/hugegraph-hstore.md b/content/cn/docs/quickstart/hugegraph/hugegraph-hstore.md
index d07fdfe2a..79a0ddfc5 100644
--- a/content/cn/docs/quickstart/hugegraph/hugegraph-hstore.md
+++ b/content/cn/docs/quickstart/hugegraph/hugegraph-hstore.md
@@ -157,6 +157,11 @@ logging:
 ./bin/start-hugegraph-store.sh
 ```
+启动脚本支持 `-d` 参数控制守护进程模式:
+
+- `-d true`(默认):以后台守护进程方式运行,脚本立即返回。
+- `-d false`:以前台模式运行:脚本通过 `exec` 替换为 Java 进程,容器/进程管理器的进程即为 Java 本身。在 Docker 或进程管理器(systemd、supervisord)下运行时请使用此参数,以便在崩溃时自动检测并重启服务。
+
 启动成功后,可以在 `logs/hugegraph-store-server.log` 中看到类似以下的日志:
 ```
diff --git a/content/cn/docs/quickstart/hugegraph/hugegraph-pd.md b/content/cn/docs/quickstart/hugegraph/hugegraph-pd.md
index 6602cbd94..032e8993c 100644
--- a/content/cn/docs/quickstart/hugegraph/hugegraph-pd.md
+++ b/content/cn/docs/quickstart/hugegraph/hugegraph-pd.md
@@ -163,6 +163,11 @@ partition:
 ./bin/start-hugegraph-pd.sh
 ```
+启动脚本支持 `-d` 参数控制守护进程模式:
+
+- `-d true`(默认):以后台守护进程方式运行,脚本立即返回。
+- `-d false`:以前台模式运行:脚本通过 `exec` 替换为 Java 进程,容器/进程管理器的进程即为 Java 本身。在 Docker 或进程管理器(systemd、supervisord)下运行时请使用此参数,以便在崩溃时自动检测并重启服务。
+
 启动成功后,可以在 `logs/hugegraph-pd-stdout.log` 中看到类似以下的日志:
 ```
diff --git a/content/en/docs/guides/hugegraph-docker-cluster.md b/content/en/docs/guides/hugegraph-docker-cluster.md
index 6b742f6ae..da5d6c843 100644
--- a/content/en/docs/guides/hugegraph-docker-cluster.md
+++ b/content/en/docs/guides/hugegraph-docker-cluster.md
@@ -134,3 +134,30 @@ curl http://localhost:8620/v1/partitions # Partition assignment
 4. **Connection refused**: Ensure `HG_*` environment variables use container hostnames (`pd0`, `store0`) instead of `127.0.0.1`.
 **Viewing runtime logs**: Use `docker logs CONTAINER` (e.g. `docker logs hg-pd0`) to view logs directly without exec-ing into the container.
+
+## Container Supervision & Health Checks
+
+> **Version note**: This behavior is **not present in the `1.7.0` images**. Use `HUGEGRAPH_VERSION=latest` or wait for the next release tag.
+
+### Process Supervision Model
+
+Previously, all three Docker entrypoints ended with `tail -f /dev/null`, which kept the container running even if the Java process crashed. Docker's `restart: unless-stopped` policy never fired because the container never exited.
+
+The entrypoints now supervise Java directly:
+
+- **PD and Store containers**: the entrypoint passes `-d false` to the startup script, which `exec`s Java directly. The container process IS the Java process: when Java exits (crash or clean shutdown), the container exits immediately and Docker's restart policy fires.
+- **Server container**: the entrypoint uses `tail --pid=$PID -f /dev/null` to block until Java exits. A `SIGTERM`/`SIGINT` trap forwards `docker stop` signals to Java, waits for a clean shutdown, and exits 0. If Java crashes, the entrypoint exits 1 so the restart policy fires.
+- `dumb-init` (PID 1 in all images) forwards signals from Docker to the entrypoint process.
+
+### Health Check Endpoints
+
+All four Docker images now include a `HEALTHCHECK` instruction. `docker ps` shows real health status. During the 90-second start period, failed checks do not count. After that, three consecutive failures mark the container as `unhealthy`.
+
+| Image | Health endpoint | Port | Parameters |
+|-------|-----------------|------|------------|
+| `hugegraph/hugegraph` (server) | `GET /versions` | 8080 | `--interval=15s --timeout=10s --start-period=90s --retries=3` |
+| `hugegraph/hugegraph-hstore` | `GET /versions` | 8080 | same |
+| `hugegraph/hugegraph-pd` | `GET /v1/health` | 8620 | same |
+| `hugegraph/hugegraph-store` | `GET /v1/health` | 8520 | same |
+
+> **Note**: The `-m true` flag (cron-based monitor) in `start-hugegraph.sh` is for VM/bare-metal deployments only. It is not installed or used in Docker images. Docker users should rely on the built-in `HEALTHCHECK` and Docker's restart policy instead.
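Reviewer sketch (not part of the patch): the Server-container behavior described in the hunk above, forwarding stop signals and exiting 0 on a requested stop but 1 on a crash, could look roughly like the function below. This is a minimal illustration under stated assumptions, not the actual entrypoint. The name `supervise` and the variables `CHILD_PID` and `STOP_REQUESTED` are hypothetical, and the shell builtin `wait` is swapped in for `tail --pid=$PID -f /dev/null` because `wait` returns as soon as a trapped signal arrives.

```shell
#!/usr/bin/env bash
# Hypothetical helper; nothing here is copied from the HugeGraph entrypoints.
# supervise CMD [ARGS...]
#   Runs CMD as a background child, forwards SIGTERM/SIGINT to it, and
#   returns 0 after a clean exit or a requested stop, 1 otherwise.
supervise() {
    "$@" &
    CHILD_PID=$!
    STOP_REQUESTED=0

    # On `docker stop` (SIGTERM) or Ctrl-C (SIGINT), pass the signal to the child.
    trap 'STOP_REQUESTED=1; kill -TERM "$CHILD_PID" 2>/dev/null' TERM INT

    # `wait` returns early with a status above 128 when a trapped signal arrives.
    local status=0
    wait "$CHILD_PID" || status=$?

    if [ "$STOP_REQUESTED" -eq 1 ]; then
        # A stop was requested: let the child finish shutting down, then report success.
        wait "$CHILD_PID" 2>/dev/null || true
        trap - TERM INT
        return 0
    fi

    trap - TERM INT
    # Any non-zero child status is reported as 1 so a restart policy can react.
    [ "$status" -eq 0 ] && return 0
    return 1
}
```

An entrypoint built on this idea would end with something like `supervise <java command>` followed by `exit $?`, so the container's exit code reflects whether the process stopped on request or crashed.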
diff --git a/content/en/docs/quickstart/hugegraph/hugegraph-hstore.md b/content/en/docs/quickstart/hugegraph/hugegraph-hstore.md
index 9c357de63..ea2bf1bb8 100644
--- a/content/en/docs/quickstart/hugegraph/hugegraph-hstore.md
+++ b/content/en/docs/quickstart/hugegraph/hugegraph-hstore.md
@@ -157,6 +157,11 @@ Ensure that the PD service is already started, then in the Store installation di
 ./bin/start-hugegraph-store.sh
 ```
+The startup script supports a `-d` flag to control daemon mode:
+
+- `-d true` (default): run as a background daemon; the script returns immediately.
+- `-d false`: run in foreground. The script `exec`s Java, so the container/supervisor process IS Java. Use this when running under Docker or a process supervisor (systemd, supervisord) so crashes are detected and the service is restarted automatically.
+
 After successful startup, you can see logs similar to the following in `logs/hugegraph-store-server.log`:
 ```
diff --git a/content/en/docs/quickstart/hugegraph/hugegraph-pd.md b/content/en/docs/quickstart/hugegraph/hugegraph-pd.md
index 9f30bd794..5065f6db7 100644
--- a/content/en/docs/quickstart/hugegraph/hugegraph-pd.md
+++ b/content/en/docs/quickstart/hugegraph/hugegraph-pd.md
@@ -164,6 +164,11 @@ In the PD installation directory, execute:
 ./bin/start-hugegraph-pd.sh
 ```
+The startup script supports a `-d` flag to control daemon mode:
+
+- `-d true` (default): run as a background daemon; the script returns immediately.
+- `-d false`: run in foreground. The script `exec`s Java, so the container/supervisor process IS Java. Use this when running under Docker or a process supervisor (systemd, supervisord) so crashes are detected and the service is restarted automatically.
+
 After successful startup, you can see logs similar to the following in `logs/hugegraph-pd-stdout.log`:
 ```
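Reviewer sketch (not part of the patch): the claim that with `-d false` the supervised process "IS Java" rests on `exec` replacing the current process without changing its PID. The snippet below is only an illustration of that mechanism. Plain `sh` processes stand in for the startup script and for Java, and the function names `show_exec_pids` and `exec_keeps_pid` are hypothetical.

```shell
#!/bin/sh
# Illustration only: `sh` stands in for both the startup script and Java.

# Prints two PIDs: the launcher's own, then the one reported after `exec`.
show_exec_pids() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Succeeds when both PIDs match, i.e. `exec` replaced the launcher in place
# instead of starting a separate child process.
exec_keeps_pid() {
    pids=$(show_exec_pids)
    before=$(printf '%s\n' "$pids" | sed -n '1p')
    after=$(printf '%s\n' "$pids" | sed -n '2p')
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

Because the PID is unchanged, a supervisor such as Docker or systemd that tracks the launcher's PID sees the replacement program's exit status directly, which is what lets it detect a crash and restart the service.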