29 changes: 29 additions & 0 deletions content/cn/docs/guides/hugegraph-docker-cluster.md
@@ -132,3 +132,32 @@ curl http://localhost:8620/v1/partitions # Partition assignment
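3. **Partition assignment not completing**: Check `curl http://localhost:8620/v1/stores`. All 3 Stores must show `"state":"Up"` before partition assignment can complete.

4. **Connection refused**: Ensure the `HG_*` environment variables use container hostnames (`pd0`, `store0`) instead of `127.0.0.1`.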
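The check in item 3 can be scripted by counting the `"state":"Up"` entries in the `/v1/stores` response. The sketch below matches the raw text with `grep` rather than parsing JSON, so it assumes the field is rendered as shown above; `count_up_stores` and `all_stores_up` are hypothetical helper names, not part of HugeGraph.

```shell
#!/bin/sh
# Hypothetical helpers for checking Store state via PD.
# Assumption: the response contains "state":"Up" per healthy Store
# (optionally with spaces after the colon); no JSON parsing is done.

# Read a /v1/stores response on stdin and print how many Stores are Up.
count_up_stores() {
  grep -o '"state": *"Up"' | wc -l | tr -d ' '
}

# Succeed only if at least the expected number of Stores (default 3) are Up.
all_stores_up() {
  local expected="${1:-3}" up
  up=$(curl -fsS http://localhost:8620/v1/stores | count_up_stores)
  [ "$up" -ge "$expected" ]
}
```

For example, `all_stores_up 3 && echo "all Stores are Up"` succeeds only once PD reports at least three Stores as Up. For anything beyond a quick check, a JSON-aware tool such as `jq` is more robust.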

**Viewing runtime logs**: Use `docker logs <container-name>` (e.g. `docker logs hg-pd0`) to view logs directly without entering the container.

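## Container Supervision & Health Checks

> **Version note**: The behavior described in this section is **not included in the `1.7.0` images**. Use `HUGEGRAPH_VERSION=latest` or wait for the next release.

### Process Supervision Model

Previously, all three Docker entrypoint scripts ended with `tail -f /dev/null`, so the container kept running even after the Java process crashed. Because the container never exited, Docker's `restart: unless-stopped` policy never fired.

The entrypoint scripts now supervise the Java process directly: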

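- **PD and Store containers**: the entrypoint passes `-d false` to the startup script, which uses `exec` to replace itself with the Java process. The container process is the Java process, so when Java exits (crash or clean shutdown), the container exits immediately and Docker's restart policy fires.
- **Server container**: the entrypoint blocks on `tail --pid=$PID -f /dev/null` until Java exits. A `SIGTERM`/`SIGINT` trap forwards the `docker stop` signal to Java and waits for it to shut down cleanly (exit code 0). If Java crashes, the entrypoint exits with code 1, which triggers the restart policy.
- PID 1 in every image is `dumb-init`, which forwards Docker's signals to the entrypoint process.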
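The difference between the two startup modes can be illustrated with a small POSIX shell sketch. `run_service` is a hypothetical helper, not HugeGraph's startup script: it takes the `-d` value as its first argument, and the remaining arguments stand in for the real Java command line.

```shell
#!/bin/sh
# Hypothetical sketch of the two startup modes; not the actual HugeGraph
# startup script. $1 plays the role of the -d value; the remaining
# arguments are a placeholder for the real Java command line.
run_service() {
  local daemon="$1"
  shift
  if [ "$daemon" = "false" ]; then
    # Foreground: replace this shell with the service. The service keeps
    # this PID, nothing after this line runs, and the service's exit status
    # becomes the exit status of this process (and thus of the container).
    exec "$@"
  fi
  # Daemon mode: start the service in the background and return at once.
  nohup "$@" >/dev/null 2>&1 &
  echo "$!"
}
```

Because `exec` replaces the shell, nothing after it runs and the service's exit status becomes the exit status of the process that started it, which is why a crash in `-d false` mode ends the container. In daemon mode the helper only prints the background PID and returns, so whoever started it cannot observe a later crash.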

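### Health Check Endpoints

All four Docker images now include a built-in `HEALTHCHECK` instruction, so `docker ps` shows the actual health status. Failed checks during the 90-second start period are not counted; after that, three consecutive failures mark the container as `unhealthy`.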

| Image | Health endpoint | Port | Parameters |
|-------|-----------------|------|------------|
| `hugegraph/hugegraph` (server) | `GET /versions` | 8080 | `--interval=15s --timeout=10s --start-period=90s --retries=3` |
| `hugegraph/hugegraph-hstore` | `GET /versions` | 8080 | Same as above |
| `hugegraph/hugegraph-pd` | `GET /v1/health` | 8620 | Same as above |
| `hugegraph/hugegraph-store` | `GET /v1/health` | 8520 | Same as above |

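> **Note**: The `-m true` flag (cron-based monitoring) in `start-hugegraph.sh` applies only to VM/bare-metal deployments; it is neither installed nor used in the Docker images. Docker users should rely on the built-in `HEALTHCHECK` and Docker's restart policy.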
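To block a script (for example a CI job) until a container passes its health check, you can poll `docker inspect`, which exposes the result as `.State.Health.Status` (`starting`, `healthy`, or `unhealthy`). This is a minimal sketch: the helper names, the 180-second default timeout, and the 5-second poll interval are illustrative assumptions, and `hg-pd0` is the container name used earlier on this page.

```shell
#!/bin/sh
# Sketch: wait until Docker reports a container as healthy. Helper names,
# the default timeout (180 s) and poll interval (5 s) are assumptions.

# Print the health status Docker records for a container.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

wait_until_healthy() {
  local container="$1" timeout="${2:-180}" interval="${3:-5}"
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    case "$(health_status "$container" 2>/dev/null)" in
      healthy) return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep "$interval"   # still starting, or the container is not up yet
  done
  echo "timed out waiting for $container" >&2
  return 1
}
```

Usage: `wait_until_healthy hg-pd0 || exit 1`. The helper gives up as soon as Docker reports `unhealthy`; if you prefer to keep waiting for a possible recovery, treat that status like `starting`.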
5 changes: 5 additions & 0 deletions content/cn/docs/quickstart/hugegraph/hugegraph-hstore.md
@@ -157,6 +157,11 @@ logging:
./bin/start-hugegraph-store.sh
```

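The startup script supports a `-d` flag that controls daemon mode:

- `-d true` (default): run as a background daemon; the script returns immediately.
- `-d false`: run in the foreground. The script uses `exec` to replace itself with the Java process, so the process that the container or process manager sees is Java itself. Use this flag when running under Docker or a process manager (systemd, supervisord) so that crashes are detected and the service is restarted automatically.

After a successful startup, you should see log output similar to the following in `logs/hugegraph-store-server.log`: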

```
5 changes: 5 additions & 0 deletions content/cn/docs/quickstart/hugegraph/hugegraph-pd.md
@@ -163,6 +163,11 @@ partition:
./bin/start-hugegraph-pd.sh
```

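The startup script supports a `-d` flag that controls daemon mode:

- `-d true` (default): run as a background daemon; the script returns immediately.
- `-d false`: run in the foreground. The script uses `exec` to replace itself with the Java process, so the process that the container or process manager sees is Java itself. Use this flag when running under Docker or a process manager (systemd, supervisord) so that crashes are detected and the service is restarted automatically.

After a successful startup, you should see log output similar to the following in `logs/hugegraph-pd-stdout.log`: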

```
27 changes: 27 additions & 0 deletions content/en/docs/guides/hugegraph-docker-cluster.md
@@ -134,3 +134,30 @@ curl http://localhost:8620/v1/partitions # Partition assignment
4. **Connection refused**: Ensure `HG_*` environment variables use container hostnames (`pd0`, `store0`) instead of `127.0.0.1`.

**Viewing runtime logs**: Use `docker logs <container-name>` (e.g. `docker logs hg-pd0`) to view logs directly without exec-ing into the container.

## Container Supervision & Health Checks

> **Version note**: This behavior is **not present in the `1.7.0` images**. Use `HUGEGRAPH_VERSION=latest` or wait for the next release tag.

### Process Supervision Model

Previously, all three Docker entrypoints ended with `tail -f /dev/null`, which kept the container running even if the Java process crashed. Docker's `restart: unless-stopped` policy never fired because the container never exited.

The entrypoints now supervise Java directly:

- **PD and Store containers**: the entrypoint passes `-d false` to the startup script, which `exec`s Java directly. The container's main process is therefore the Java process itself: when Java exits (crash or clean shutdown), the container exits immediately and Docker's restart policy fires.
- **Server container**: the entrypoint uses `tail --pid=$PID -f /dev/null` to block until Java exits. A `SIGTERM`/`SIGINT` trap forwards `docker stop` signals to Java and waits for clean shutdown (exits 0). If Java crashes, the entrypoint exits 1 so the restart policy fires.
- `dumb-init` (PID 1 in all images) forwards signals from Docker to the entrypoint process.
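The Server-container pattern can be approximated with the POSIX shell sketch below. It is not the actual entrypoint: `supervise` is a hypothetical helper, and the service (a placeholder for the Java server) is started as a direct child of the shell, so the `wait` builtin stands in for `tail --pid=$PID -f /dev/null`. (`wait` only works on the shell's own children, while `tail --pid` can watch any PID.)

```shell
#!/bin/sh
# Simplified sketch of the Server-container supervision pattern; not the
# real entrypoint. Assumption: the service is a direct child of this shell,
# so `wait` replaces `tail --pid=$PID -f /dev/null`.
supervise() {
  "$@" &                       # start the service in the background
  local pid=$!
  local stopping=0

  # Forward `docker stop` (SIGTERM) and Ctrl-C (SIGINT) to the service.
  trap 'stopping=1; kill -TERM "$pid" 2>/dev/null || true' TERM INT

  # A trapped signal makes `wait` return early, so keep waiting until the
  # service process is really gone.
  while kill -0 "$pid" 2>/dev/null; do
    wait "$pid" || true
  done

  if [ "$stopping" -eq 1 ]; then
    exit 0                     # stop was requested: report a clean shutdown
  fi
  echo "service exited without a stop request; treating it as a crash" >&2
  exit 1                       # non-zero exit lets Docker's restart policy fire
}
```

For example, `supervise sleep 3600` exits 0 when it receives `SIGTERM` and 1 if the `sleep` process dies by other means. As in the behavior described above, the exit code reflects whether a stop was requested, not the service's own exit status.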

### Health Check Endpoints

All four Docker images now include a `HEALTHCHECK` instruction, so `docker ps` reports each container's actual health status (shown as `healthy`, `unhealthy`, or `health: starting` in the STATUS column). During the 90-second start period, failed checks do not count; after that, three consecutive failures mark the container as `unhealthy`.

| Image | Health endpoint | Port | Parameters |
|-------|-----------------|------|------------|
| `hugegraph/hugegraph` (server) | `GET /versions` | 8080 | `--interval=15s --timeout=10s --start-period=90s --retries=3` |
| `hugegraph/hugegraph-hstore` | `GET /versions` | 8080 | same |
| `hugegraph/hugegraph-pd` | `GET /v1/health` | 8620 | same |
| `hugegraph/hugegraph-store` | `GET /v1/health` | 8520 | same |

> **Note**: The `-m true` flag (cron-based monitor) in `start-hugegraph.sh` is for VM/bare-metal deployments only. It is not installed or used in Docker images. Docker users should rely on the built-in `HEALTHCHECK` and Docker's restart policy instead.
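The same endpoints can be probed by hand, for instance from the Docker host while troubleshooting. The sketch below assumes the ports in the table are published on the given host (default `localhost`) and that `curl` is installed; `health_url` and `probe` are hypothetical helpers, and this is not necessarily the command the images run internally.

```shell
#!/bin/sh
# Sketch: probe the health endpoints from the table above.
# Assumptions: ports are published on the given host; curl is installed.

# Map an image name to its health-check URL; fails for unknown images.
health_url() {
  local host="${2:-localhost}"
  case "$1" in
    hugegraph/hugegraph|hugegraph/hugegraph-hstore) echo "http://$host:8080/versions" ;;
    hugegraph/hugegraph-pd)    echo "http://$host:8620/v1/health" ;;
    hugegraph/hugegraph-store) echo "http://$host:8520/v1/health" ;;
    *) return 1 ;;
  esac
}

# Exit 0 if the endpoint answers without an HTTP error (status < 400)
# within 10 seconds, mirroring the --timeout=10s parameter.
probe() {
  local url
  url=$(health_url "$1" "$2") || { echo "unknown image: $1" >&2; return 2; }
  curl -fsS --max-time 10 -o /dev/null "$url"
}
```

Usage: `probe hugegraph/hugegraph-pd && echo "PD is healthy"`. Unlike `HEALTHCHECK`, this makes a single attempt with no start period or retries.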
5 changes: 5 additions & 0 deletions content/en/docs/quickstart/hugegraph/hugegraph-hstore.md
@@ -157,6 +157,11 @@ Ensure that the PD service is already started, then in the Store installation di
./bin/start-hugegraph-store.sh
```

The startup script supports a `-d` flag to control daemon mode:

- `-d true` (default): run as a background daemon; the script returns immediately.
- `-d false`: run in the foreground. The script `exec`s Java, so the process that the container or supervisor manages is Java itself. Use this when running under Docker or a process supervisor (systemd, supervisord) so that crashes are detected and the service is restarted automatically.
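The automatic restart that a supervisor provides on top of `-d false` can be sketched in a few lines of POSIX shell, roughly what systemd's `Restart=on-failure` does. `supervise_restart` is a hypothetical helper; the retry limit and the one-second delay are arbitrary choices.

```shell
#!/bin/sh
# Minimal sketch of a restart-on-failure loop around a foreground service.
# supervise_restart is hypothetical; retry limit and delay are arbitrary.
supervise_restart() {
  local max_restarts="$1" restarts=0 status
  shift
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0                 # clean exit: do not restart
    fi
    if [ "$restarts" -ge "$max_restarts" ]; then
      echo "giving up after $restarts restarts (last status $status)" >&2
      return "$status"
    fi
    restarts=$((restarts + 1))
    echo "service exited with status $status; restart $restarts of $max_restarts" >&2
    sleep 1
  done
}
```

Usage: `supervise_restart 5 ./bin/start-hugegraph-store.sh -d false`. The loop depends on the script staying in the foreground: with `-d true` the script returns as soon as the daemon is launched (typically with status 0), so the loop would treat that as a clean exit and never notice a later crash.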

After successful startup, you can see logs similar to the following in `logs/hugegraph-store-server.log`:

```
5 changes: 5 additions & 0 deletions content/en/docs/quickstart/hugegraph/hugegraph-pd.md
@@ -164,6 +164,11 @@ In the PD installation directory, execute:
./bin/start-hugegraph-pd.sh
```

The startup script supports a `-d` flag to control daemon mode:

- `-d true` (default): run as a background daemon; the script returns immediately.
- `-d false`: run in the foreground. The script `exec`s Java, so the process that the container or supervisor manages is Java itself. Use this when running under Docker or a process supervisor (systemd, supervisord) so that crashes are detected and the service is restarted automatically.

After successful startup, you can see logs similar to the following in `logs/hugegraph-pd-stdout.log`:

```