122 changes: 76 additions & 46 deletions docs.json

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions fundamentals/cometchat-on-prem/docker/air-gapped-deployment.mdx
@@ -0,0 +1,23 @@
---
title: "Air-Gapped Deployment"
sidebarTitle: "Air-Gapped"
---

Guidelines for deploying the platform in offline or isolated (air-gapped) environments.

## Offline installation steps

- Export required Docker images with `docker save`
- Transfer images via removable media, secure copy (SSH), or an isolated internal network
- Import images on the target system with `docker load`
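
A minimal sketch of the export/import flow (image names, tags, and the archive filename are illustrative, not the actual manifest):

```bash
# On a host with registry access: bundle the images the stack needs
docker save -o cometchat-images.tar nginx:1.25 redis:7.2 mongo:7.0

# Move cometchat-images.tar across the air gap (removable media, scp over an
# isolated link, etc.), then on the target host:
docker load -i cometchat-images.tar

# Confirm the images arrived intact
docker image ls
```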

## Local registry

- Host images in Harbor, Nexus, or a private Docker registry
- Enforce role-based access control (RBAC) and image retention policies
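
For example, seeding a private registry looks like this (the `registry.internal:5000` hostname and project path are assumptions):

```bash
# Retag and push an image into the internal registry
docker tag redis:7.2 registry.internal:5000/cometchat/redis:7.2
docker push registry.internal:5000/cometchat/redis:7.2

# Cluster nodes then pull from the internal registry instead of Docker Hub
docker pull registry.internal:5000/cometchat/redis:7.2
```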

## Limitations in air-gapped mode

- No access to external push notification services
- No S3 or other cloud object storage unless internally emulated
- No cloud-hosted analytics, logging, or monitoring integrations
119 changes: 119 additions & 0 deletions fundamentals/cometchat-on-prem/docker/configuration-reference.mdx
@@ -0,0 +1,119 @@
---
title: "Configuration Reference"
sidebarTitle: "Configuration"
---

Use this reference when updating domains, migrating environments, troubleshooting misconfiguration, or performing production deployments. Values are sourced from `docker-compose.yml`, service-level `.env` files, and the domain update guide.

## Global notes

- All services read environment variables from their respective directories.
- Domain values must be updated consistently across API, WebSocket, Notifications, Webhooks, and NGINX configurations.
- Changing the primary domain impacts reverse proxy routing, OAuth headers, CORS, webhook endpoints, and TiDB host references.

## Chat API

Update these values when changing domains:

- `MAIN_DOMAIN="<your-domain>"`
- `EXTENSION_DOMAIN="<your-domain>"`
- `WEBHOOKS_BASE_URL="https://webhooks.<your-domain>/v1/webhooks"`
- `TRIGGERS_BASE_URL="https://webhooks.<your-domain>/v1/triggers"`
- `EXTENSION_BASE_URL="https://notifications.<your-domain>"`
- `MODERATION_ENABLED=true`
- `RULES_BASE_URL="https://moderation.<your-domain>/v1/moderation-service"`
- `ADMIN_API_HOST="api.<your-domain>"`
- `CLIENT_API_HOST="apiclient.<your-domain>"`
- `ALLOWED_API_DOMAINS="<your-domain>,<additional-domain>"`
- `DB_HOST="tidb.<your-domain>"`
- `DB_HOST_CREATOR="tidb.<your-domain>"`
- `V3_CHAT_HOST="websocket.<your-domain>"`
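
As an illustration, with `example.com` as the primary domain the derived values follow this pattern (a sketch with placeholder values, not shipped defaults):

```bash
MAIN_DOMAIN="example.com"
WEBHOOKS_BASE_URL="https://webhooks.example.com/v1/webhooks"
ADMIN_API_HOST="api.example.com"
CLIENT_API_HOST="apiclient.example.com"
DB_HOST="tidb.example.com"
V3_CHAT_HOST="websocket.example.com"
```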

## Management API (MGMT API)

- `ADMIN_API_HOST="api.<your-domain>"`
- `CLIENT_API_HOST="apiclient.<your-domain>"`
- `APP_HOST="dashboard.<your-domain>"`
- `API_HOST="https://mgmt-api.<your-domain>"`
- `MGMT_DOMAIN="<your-domain>"`
- `MGMT_DOMAIN_TO_REPLACE="<your-domain>"`
- `RULES_BASE_URL="https://moderation.<your-domain>/v1/moderation"`
- `ACCESS_CONTROL_ALLOW_ORIGIN="<your-domain>,<additional-domain>"`

## WebSocket

Hostnames are derived automatically from NGINX and Chat API configuration; no manual domain updates are required.

## Notifications service

- `CC_DOMAIN="<your-domain>"` (controls routing, token validation, and push delivery)

## Moderation service

- `CHAT_API_URL="<your-domain>"` for rule evaluation, metadata retrieval, and decision submission

## Webhooks service

- `CHAT_API_DOMAIN="<your-domain>"`: must match the Chat API domain exactly; a mismatch causes delivery retries and signature verification failures

## Extensions

```json
"DOMAINS": [
"<allowed-domain-1>",
"<allowed-domain-2>",
"<your-domain>"
],
"DOMAIN_NAME": "<your-domain>"
```

Defines CORS and allowed origins for extension traffic.

## Receipt Updater

- `RECEIPTS_MYSQL_HOST="tidb.<your-domain>"` for delivery receipts, read receipts, and thread metadata

## SQL Consumer

```json
"CONNECTION_CONFIG": {
"host": "<tidb-host>"
},
"ALTER_USER_CONFIG": {
"host": "<tidb-host>"
},
"API_CONFIG": {
"API_DOMAIN": "<api-domain>"
}
```

Controls database migrations, multi-tenant provisioning, and internal requests to Chat API.

## NGINX configuration files

Update domain values in:

- chatapi.conf
- extensions.conf
- mgmtapi.conf
- notifications.conf
- dashboard.conf
- globalwebhooks.conf
- moderation.conf
- websocket.conf

These govern TLS termination, routing, reverse proxy rules, and WebSocket upgrades.
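
If the files live under one directory, a bulk substitution along these lines can speed up a domain change (the path and old domain are assumptions; review the diff before reloading):

```bash
sed -i 's/old-domain\.example/new-domain.example/g' /etc/nginx/conf.d/*.conf

# Validate the configuration and reload NGINX
nginx -t && nginx -s reload
```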

## Summary of domain values to update

- Chat API, Client API, and Management API
- Notifications, Moderation, Webhooks, and Extensions services
- NGINX reverse proxy hostnames
- TiDB host references
- WebSocket host configuration in Chat API
28 changes: 28 additions & 0 deletions fundamentals/cometchat-on-prem/docker/monitoring.mdx
@@ -0,0 +1,28 @@
---
title: "Monitoring"
sidebarTitle: "Monitoring"
---

Monitoring ensures system health, operational visibility, and SLA compliance.

## Observability stack

- Prometheus for metrics collection
- Grafana for dashboards and visualizations
- Loki (or ELK) for centralized log aggregation

## Key service metrics to track

- Kafka consumer lag
- WebSocket active connection count
- Redis memory utilization and cache hit ratio
- TiDB region health and TiKV store availability
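
As one example, Kafka consumer lag can be spot-checked through the Prometheus HTTP API (the `kafka_consumergroup_lag` metric name depends on the exporter in use; the Prometheus hostname is an assumption):

```bash
curl -s 'http://prometheus.internal:9090/api/v1/query' \
  --data-urlencode 'query=sum(kafka_consumergroup_lag) by (consumergroup)' | jq .
```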

## Alerting recommendations

- Sustained CPU utilization above 80%
- Database query latency exceeding 100 ms
- Kafka consumer lag breaching defined thresholds
- WebSocket connection drops or abnormal failure rate spikes

Tune thresholds based on workload characteristics and traffic patterns.
54 changes: 54 additions & 0 deletions fundamentals/cometchat-on-prem/docker/overview.mdx
@@ -0,0 +1,54 @@
---
title: "CometChat On-Prem Overview"
sidebarTitle: "Overview"
---

CometChat On-Prem is an enterprise deployment and operations blueprint for a high-performance, real-time messaging platform built for reliability, low latency, and horizontal scale. It covers deployments from roughly 10k MAU up to 250k+ MAU and establishes the foundations for even higher workloads.

## Who this guide is for

- DevOps and SRE teams responsible for uptime and operations
- Platform, cloud, and backend engineers deploying or tuning the stack
- Infrastructure architects planning multi-region, failover, or compliance-heavy environments

## What the platform does

- Real-time messaging for 1:1 and group chat with persistent history
- WebSocket event streaming for presence, typing indicators, and delivery/read receipts
- Distributed event pipeline (Kafka) for decoupled microservices communication
- Notifications subsystem for asynchronous push fan-out
- Moderation services with rule-based filtering and optional AI adapters
- Webhooks engine for outbound callbacks with retries and signature validation
- Horizontally scalable REST APIs for chat, users, groups, and metadata

## Data & storage

- TiDB cluster (PD, TiKV, TiDB SQL) as the primary relational store for users, conversations, groups, and message metadata
- MongoDB for flexible metadata, moderation data, and unstructured fields
- Three Redis clusters for caching, pub/sub, session state, and other fast-access needs
- Kafka as the event backbone for real-time messaging and inter-service pipelines
- Optional object storage (e.g., Amazon S3, MinIO, Ceph) for media, logs, documents, and other large binaries when your application handles unstructured data across services

## Deployment models

- Local development (Docker Compose): single-machine environment for dependency bootstrapping, local development/QA, and CI pipelines. Not recommended for production workloads.
- Docker Swarm (recommended up to ~200k MAU / ~20k PCC): current reference architecture with lightweight cluster management, predictable service placement, secure overlay networks, and rolling updates.
- Kubernetes (enterprise, multi-region, or >200k MAU): best when you need advanced autoscaling, cross-region failover, service mesh/mTLS, cloud-native Kafka, or strict compliance requirements. Contact us for enterprise Kubernetes architecture guidance.
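
For the local development model, bringing the stack up is the standard Compose workflow (a sketch; service names depend on the shipped `docker-compose.yml`):

```bash
docker compose up -d        # start all services in the background
docker compose ps           # verify every service reports healthy
docker compose logs -f nginx
```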

## High-level architecture

<Frame>
<img src="/images/docker-on-prem-architecture.png" alt="CometChat On-Prem high-level architecture" />
</Frame>

- NGINX for TLS termination, routing, WebSocket upgrades, and load balancing
- WebSocket gateway for real-time connections, presence events, and device sessions
- Chat API for messaging logic across users, groups, conversations, and metadata
- Moderation engine for policy-based filtering and compliance checks
- Notifications service for asynchronous push notifications and event fan-out
- Webhooks service for outbound callbacks with retries
- Kafka as the central event backbone
- TiDB, MongoDB, and Redis as the stateful data stores
- Observability stack (Prometheus, Grafana, Loki/ELK) for metrics, dashboards, and logs
- Host and network: private overlay networks isolating backend traffic and optimizing latency

31 changes: 31 additions & 0 deletions fundamentals/cometchat-on-prem/docker/persistence-and-backup.mdx
@@ -0,0 +1,31 @@
---
title: "Persistence & Backup"
sidebarTitle: "Persistence & Backup"
---

This page defines how persistent data is stored, backed up, and restored in production environments.

## Volume layout

| Service | Default path |
| --- | --- |
| TiKV | `/data` |
| PD | `/data` |
| Kafka | `/var/lib/kafka/data` |
| Redis | `/data` |
| MongoDB | `/data/db` |

All persistent volumes should be backed by SSD or NVMe storage for production deployments.
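
To confirm where a container's data actually lands on the host, inspect its mounts (the `tikv` container name is illustrative):

```bash
docker inspect --format \
  '{{ range .Mounts }}{{ .Destination }} -> {{ .Source }}{{ "\n" }}{{ end }}' tikv
```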

## Backup strategy

- TiDB: daily backups to secure, off-cluster storage
- Kafka: weekly segment-level backups
- Redis: RDB snapshots every 6 hours (cache data is non-authoritative)
- Backup validation: monthly restore and integrity verification tests
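
A hedged sketch of the TiDB and Redis pieces (BR flags follow its CLI; the PD endpoint and backup destination are assumptions):

```bash
# Full TiDB backup with BR to off-cluster object storage
br backup full --pd "pd.internal:2379" \
  --storage "s3://backups/tidb/$(date +%F)"

# Trigger an asynchronous Redis RDB snapshot
redis-cli BGSAVE
```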

## Disaster recovery

- Validate full restore procedures at least once per quarter
- Maintain a minimum of three geographically isolated backup copies
- Run staged disaster recovery simulations such as warm-standby restoration and full cluster rehydration from backups
53 changes: 53 additions & 0 deletions fundamentals/cometchat-on-prem/docker/prerequisites.mdx
@@ -0,0 +1,53 @@
---
title: "Prerequisites"
sidebarTitle: "Prerequisites"
---

## Supported operating systems

- Ubuntu 20.04 / 22.04 / 24.04 LTS
- Red Hat Enterprise Linux 8+

## Required software

- Docker Engine >= 24
- Docker Compose >= v2
- Git
- OpenSSL >= 1.1
- jq, curl, net-tools
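
Quick sanity checks for the toolchain:

```bash
docker --version            # expect 24.x or newer
docker compose version      # expect v2.x
git --version
openssl version             # expect 1.1 or newer
jq --version
```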

## Minimum hardware (testing / QA)

- 8 vCPUs
- 16 GB RAM
- 100 GB SSD (minimum; scale up based on workload and storage needs)

## Production hardware

### Baseline sizing

| MAU | Peak concurrent connections (PCC) | vCPUs | RAM |
| --- | --- | --- | --- |
| 10k | 500 | 32 | 64 GiB |
| 25k | 1,250 | 64 | 128 GiB |
| 50k | 2,500 | 96 | 192 GiB |
| 100k | 5,000 | 156 | 312 GiB |
| 200k | 10,000 | 272 | 544 GiB |

Storage guidance: start at 100 GB SSD and scale to 500 GB to 2 TB SSD depending on workload and data retention.

### High-concurrency sizing

| MAU | Peak concurrent connections (PCC) | vCPUs | RAM |
| --- | --- | --- | --- |
| 10k | 1,000 | 48 | 96 GiB |
| 25k | 2,500 | 96 | 192 GiB |
| 50k | 5,000 | 156 | 312 GiB |
| 100k | 10,000 | 240 | 480 GiB |
| 200k | 20,000 | 480 | 960 GiB |

Storage guidance: expect to exceed 100 GB SSD; plan 500 GB to 2 TB SSD as concurrency and data volume grow.

## Required ports

- 80 / 443 to NGINX (HTTP / HTTPS)