Readd manifest artifact backwards compatibility#2381
Conversation
| await tf.flush() | ||
| manifest_path = tf.name | ||
| if not manifest.data: | ||
| # TODO: BACKWARD COMPATIBILITY - remove after fully migrating to artifactless manifest |
There was a problem hiding this comment.
"Fully migrating" at this point should include making the manifest.data field not null, so the migration actually fails if it's not yet done.
There was a problem hiding this comment.
Yes, and this feels very insufficient https://github.com/pulp/pulp_container/blob/main/pulp_container/app/migrations/0039_manifest_data.py
mdellweg
left a comment
There was a problem hiding this comment.
So this adds the compatibility code back to 2.26 specifically.
Should we still make the sync autorepair in main?
Yes, I think we maybe also want to autorepair on this PR. This just enables the old behavior, but at the end it still requires the user to run management command to fully transition to our new field. It might make sense to just run the data migration for each manifest missing the |
|
Do we want this in the "official" 2.26 branch? Will it negatively impact any other consumers? Should we backport to any other branches for consistency? |
83977e5 to
dbd04b4
Compare
| self.style.WARNING("Found %d broken manifests. PKs:" % len(self._broken_manifests)) | ||
| ) | ||
| for manifest in self._broken_manifests: | ||
| self.stdout.write(self.style.WARNING(" %s" % manifest.pk)) |
There was a problem hiding this comment.
Would it be worth also printing the manifest digest and/or any other metadata about them (e.g. associated images)?
Maybe which repository they are in, if any, or whether it's an orphan?
| ) | ||
| if manifest.data is None: | ||
| manifest.data = raw_text_data | ||
| await manifest.asave(update_fields=["data"]) |
There was a problem hiding this comment.
I consider this a low risk compare to the existing issues, but in theory since we are mutating content, it could potentially deadlock between two parallel syncs.
Not a blocker, just a note.
There was a problem hiding this comment.
Well hopefully after running the command, this piece won't be executed often. Also since manifests are per image and each image gets its own repo, it's unlikely to have parallel syncs of the same two images going at the same time.
| # if artifact is not available (due to reclaim space) we will download it again | ||
| else: | ||
| content_data, manifest = await self._download_and_instantiate_manifest( | ||
| content_data, new_manifest = await self._download_and_instantiate_manifest( |
There was a problem hiding this comment.
So before, we were taking the returned manifest value directly (which AFAICT is an unsaved model object) and throwing away the previous saved manifest. Presumably later on the QueryExistingContents stage would swap the model back, unless it's actually meaningfully different. And so substantively it's only doing a download of content_data
And in the new version, we are taking an existing manifest from above, overwriting the data field, and then re-saving. And so it's still mostly about the download, but instead of the manifest getting swapped with the original later we're just updating the original now and passing it along.
Right? This is subtle, not sure if I have it correct.
There was a problem hiding this comment.
Yeah that's right, we don't have the artifact so we need to download the data and update the manifest.
📜 Checklist
See: Pull Request Walkthrough