
fix: database backup on worker #3517

Closed

firdausai wants to merge 3 commits into Dokploy:canary from firdausai:fix/database-backup-on-worker

Conversation

@firdausai

What is this PR about?

When backing up a database that lives on a worker node (a different server), the backup throws a `Container not found` error, because the backup command runs on the manager server while the container is on another server.

This PR fixes it by deploying a temporary service to the worker node whenever a backup is triggered, with the sole purpose of running the backup command. The flow: when a backup is triggered, check whether the backup service already exists; if it does, delete it; then deploy the service. The backup service exits once the command finishes running.
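As a rough CLI sketch of that flow (service name, node label, image, and the inner backup command here are illustrative placeholders I'm assuming, not the PR's actual values):

```shell
# Remove any leftover backup service from a previous run (ignore errors if absent).
docker service rm backup-service-postgres-example 2>/dev/null || true

# Deploy a one-shot service pinned to the worker that holds the database:
#   --restart-condition none  -> run once, never restart after the command exits
#   --constraint              -> schedule the task on the labeled worker node
#   --mount docker.sock       -> let the task reach the worker's Docker engine
docker service create \
  --name backup-service-postgres-example \
  --restart-condition none \
  --constraint 'node.labels.type == database' \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  docker:cli \
  sh -c 'docker exec $(docker ps -qf name=postgres-example) pg_dump -U postgres mydb | gzip > /backup/db.sql.gz'
```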

Important note:

  • This PR and its commit only address Postgres. If this approach is accepted, I can add commits for the other databases to this PR.
  • When working on this locally on a MacBook, I simulated another worker via Docker-in-Docker (see the command below). There is a limitation on the socket when executing the command on the worker, which throws an error. So I tested it like this: I `console.log`-ed the backup command that was supposed to run on my local worker, copied it, changed the necessary values to match my prod environment (service name, target worker label, and existing database service name), and ran it on prod. The command successfully uploaded a backup to my S3.
docker run -d --privileged \
  --hostname worker-1 \
  --name swarm-worker-1 \
  --label type=database \
  docker:dind
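For completeness, the dind container still has to join the swarm and the node has to be labeled before `--constraint` can target it. A sketch, assuming the manager already ran `docker swarm init` (the `<manager-ip>` placeholder is mine, not a value from the PR):

```shell
# On the manager: fetch a worker join token.
TOKEN=$(docker swarm join-token worker -q)

# Inside the dind container: join the swarm as a worker.
docker exec swarm-worker-1 docker swarm join --token "$TOKEN" <manager-ip>:2377

# Back on the manager: label the node so service constraints can select it.
docker node update --label-add type=database worker-1
```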


Issues related (if applicable)

closes #3516

Screenshots (if applicable)

@Siumauricio
Contributor

Hey @firdausai, thanks for taking the time to work on this! The problem you identified is real — backing up databases on worker nodes is a legitimate gap.

However, after reviewing the implementation, we've decided to close this PR for now. Here's why:

  • Scope: It only covers Postgres, but the same issue affects MySQL, MariaDB, MongoDB, and libsql. The fix needs to be generic from the start.
  • Duplicated logic: getCreateDatabaseBackupTempService rebuilds the backup pipeline (pg_dump, gzip, rclone) manually instead of reusing getBackupCommand/generateBackupCommand. This creates two diverging code paths.
  • Logging: The temporary service doesn't write to deployment.logPath, so users wouldn't see progress or errors in the Dokploy UI.
  • Cleanup: If the backup fails, the backup-service-* Swarm service object is left behind with no cleanup in a finally block.
  • Container lookup: The $CONTAINER_ID search inside the Alpine service may not find the correct container since the service labels differ from the original database service.

The overall effort to fix these issues would be close to a rewrite, so it makes more sense to start fresh with a generic approach.

The issue #3516 remains open — contributions toward a more complete solution are welcome! Thanks again for the effort 🙏

@Siumauricio Siumauricio closed this Apr 5, 2026
