Ensure that binary logs for PITR are in a shared directory#541
Ensure that binary logs for PITR are in a shared directory#541
Conversation
1824f91 to
0562d49
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
0562d49 to
d259730
Compare
shlomi-noach
left a comment
There was a problem hiding this comment.
Nice! Should we also provide the flag in yaml files?
Signed-off-by: Matt Lord <mattalord@gmail.com>
Yeah. I think this does it. e2e5e8b |
How is the value being set, and to what specific value? |
The user would specify the flag and value in their cluster yaml definition using the |
shlomi-noach
left a comment
There was a problem hiding this comment.
Seems to me like it's feature complete and can be taken out of Draft?
Signed-off-by: Matt Lord <mattalord@gmail.com>
The flag ended up being for |
Signed-off-by: Matt Lord <mattalord@gmail.com>
| // Ensure that binary logs are restored to/from a location that all containers | ||
| // in the pod can access if no location was explicitly provided. | ||
| if _, ok := vttabletAllFlags["builtinbackup-incremental-restore-path"]; !ok { | ||
| vttabletAllFlags["builtinbackup-incremental-restore-path"] = vtDataRootPath | ||
| } |
There was a problem hiding this comment.
What would happen if the path specified in --builtinbackup-incremental-restore-path is not accessible to all containers in the pod?
There was a problem hiding this comment.
I am guessing it is up to the user to set the same value on all components too? mysqlctl, vttablet and vtbackup
There was a problem hiding this comment.
The same thing that happens now to every user. It doesn't work. PITR does not generally work in the operator today.
| vtRootInitScript = `set -ex | ||
| mkdir -p /mnt/vt/bin | ||
| cp --no-clobber /vt/bin/mysqlctld /mnt/vt/bin/ | ||
| cp --no-clobber $(command -v mysqlbinlog) /mnt/vt/bin/ || true |
There was a problem hiding this comment.
The directory /mnt/.../ is shared across all the containers in the pod I am assuming? In which case, this line would resolve what you wrote in the PR's description:
/tmpis not a shared mount point within the pod so whenmysqlbinlogsubsequently tries to read them from within the mysqld container it cannot find them in its container's/tmpdirectory and it fails with an error
There was a problem hiding this comment.
This is simply about copying the mysqlbinlog binary from the vitess/lite container image to the mysqlctld/vtbackup container (if it's not already there), as it looks like we'll need to keep that around in the lite image because the MySQL images do not contain that binary and it's needed for PITR.
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
c9793dd to
e069303
Compare
This includes mysqld (of course) and mysqlbinlog But it does NOT include xtrabackup Signed-off-by: Matt Lord <mattalord@gmail.com>
e069303 to
2aeb36b
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
ae78d72 to
5e7df2b
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
| echo "Backup failed" | ||
| for i in {1..600} ; do | ||
| out=$(kubectl get vtb -n example --no-headers | wc -l) | ||
| if echo "${out}" | grep -c "${finalBackupCount}" >/dev/null; then |
There was a problem hiding this comment.
finalBackupCount does not seem to be initialized before this first wait. With an empty value, grep -c "" returns 1, so we break immediately here. Then after the increment below, the second loop only waits for count 1, which the full backup already satisfies. That means this test can pass without ever observing the incremental backup.
There was a problem hiding this comment.
Good catch! Fixed.
nickvanw
left a comment
There was a problem hiding this comment.
I traced the PITR/operator flow through the changed code and upstream Vitess and left a few inline comments. The shared restore-path change looks directionally right, but I found one runtime wiring issue and a couple of gaps in the new end-to-end coverage.
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
05d2aa7 to
dc8fa0b
Compare
When executing the
vtctldclient RestoreFromBackup --restore-to-pos <value>command, thevttabletprocess in thevttabletcontainer within thevttabletpod — in theRestoreFromBackuptabletmanager RPC — restores the full backup within theVTDATAROOT(specifically/vt/vtdataroot/vt_<tabletUID>/for the mysql data) that is shared by all containers within the pod using the configured backup engine (e.g.xtrabackup). It orchestrates that in conjunction with themysqlctldprocess that's running inside themysqldcontainer within the samevttabletpod. In the end there is a runningmysqldinstance inside themysqldcontainer that is from the restored full backup. Then once the full backup is in place and themysqldprocess is running thevttabletprocess uses the OS tmp dir of/tmpto restore the binary logs from the backup — via thebuiltinbackupengine— for subsequent application and/tmpis not a shared mount point within the pod so whenmysqlbinlogsubsequently tries to read them from within themysqldcontainer it cannot find them in its container's/tmpdirectory and it fails with an error.vtctldclienthttps://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/cmd/vtctldclient/command/backups.go#L227-L263
vtctld[server]https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vtctl/grpcvtctldserver/server.go#L3260-L3286
vttablethttps://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vttablet/tabletmanager/rpc_backup.go#L173-L193
https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vttablet/tabletmanager/restore.go#L191-L273
mysqlctld(rather thanmysqlctl, and which runs in themysqlcontainer)https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/backup.go#L364-L487
https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/builtinbackupengine.go#L995-L1060
vttabletbuiltinbackupenginehttps://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/builtinbackupengine.go#L995-L1060
Related issues and PRs: