Optimize re-sync content by jobselko · Pull Request #7754 · pulp/pulpcore

jobselko · 2026-05-29T16:56:15Z

📜 Checklist

Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
A changelog entry or entries has been added for any significant changes
Follows the Pulp policy on AI Usage
(For new features) - User documentation and test coverage has been added

Assisted By: Claude Opus 4.6

dralley · 2026-06-01T04:12:35Z

+    def __init__(self, repo_version=None, deferred_fields=None, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self._repo_version = repo_version
+        self._deferred_fields = deferred_fields or {}


Wouldn't this be the opposite of "deferred"? It's the list of fields we want to query up front for each type, right?

Agreed. The name I chose is misleading; changed to extra_fields.
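A minimal sketch of what the renamed parameter means. It assumes a hypothetical ContentCache helper (not the PR's actual class) and uses plain classes in place of Django models. extra_fields maps each content type to additional fields that are loaded eagerly alongside the natural key. That is the opposite of what Django's QuerySet.defer() does. The returned tuple is the kind of value one might pass to QuerySet.only().

```python
class ContentCache:
    """Hypothetical cache of content already present in the latest repository version."""

    def __init__(self, repo_version=None, extra_fields=None):
        self._repo_version = repo_version
        # Model type -> field names to fetch up front, in addition to the natural key.
        self._extra_fields = extra_fields or {}

    def fields_to_load(self, model_type, natural_key_fields):
        """Return natural-key fields plus any extra fields for this type, without duplicates."""
        fields = list(natural_key_fields)
        for name in self._extra_fields.get(model_type, ()):
            if name not in fields:
                fields.append(name)
        return tuple(fields)


class Package:
    """Stand-in for a Django content model (the field names used below are hypothetical)."""


cache = ContentCache(extra_fields={Package: ["location_href", "checksum_type"]})
print(cache.fields_to_load(Package, ("name", "version")))
# ('name', 'version', 'location_href', 'checksum_type')
```

Because the mapping defaults to an empty dict, types with no configured extras fall back to a plain natural-key load.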

dralley · 2026-06-01T04:24:42Z

+                    cached = self._content_cache.get(model_type, {}).get(nat_key)
+                    if cached is not None:
+                        d_content.content = cached
+                        cache_hits_by_type[model_type] |= Q(pk=cached.pk)


Is there a reason to do it this way instead of just collecting a list of PKs and passing pk__in=...?

(there could be, probably the constructed SQL is different, I just don't know)

No, I just copied the logic from content_q_by_type. Changed.
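For comparison, here is a sketch of the pk__in approach under stated assumptions. CachedContent and Incoming are hypothetical stand-ins; the real stage works on DeclarativeContent objects and Django model instances. The cache is assumed to be a dict of model type -> {natural key: cached object}. Hits are grouped into a plain list of PKs per type. Each type then needs a single filter(pk__in=pks) rather than a chain of Q(pk=...) terms ORed together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CachedContent:
    """Hypothetical stand-in for a content object loaded into the cache."""
    pk: int


@dataclass
class Incoming:
    """Hypothetical stand-in for one item of a batch flowing through the stage."""
    model_type: str
    natural_key: tuple
    content: object = None


def collect_cache_hits(batch, content_cache):
    """Swap cached objects into the batch and group their PKs by model type.

    Returns (hits, misses): hits maps model type -> list of PKs, which would feed
    something like model_type.objects.filter(pk__in=pks); misses still need a
    database lookup by natural key.
    """
    hits = defaultdict(list)
    misses = []
    for item in batch:
        cached = content_cache.get(item.model_type, {}).get(item.natural_key)
        if cached is not None:
            item.content = cached  # reuse the cached instance, no query needed
            hits[item.model_type].append(cached.pk)
        else:
            misses.append(item)
    return hits, misses
```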

dralley · 2026-06-01T04:33:54Z

+            cache_hits_by_type = defaultdict(lambda: Q(pk__in=[]))
+
            for d_content in batch:
                if d_content.content._state.adding:


Theoretically, content could be passed through already saved, in which case I think we're probably not touching it. That might be an existing bug, though not a particularly serious one.
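This is a minimal sketch of how already-saved content could be picked up for touching. It assumes each batch item exposes a .content model instance, as pulpcore's DeclarativeContent does. It also relies on Django's Model._state.adding flag, which is True only for instances that have not been saved yet. FakePackage and FakeDeclarativeContent are hypothetical stand-ins so the snippet runs without Django.

```python
from collections import defaultdict


def already_saved_pks(batch):
    """Group PKs of content that arrived already saved, keyed by model class."""
    pks = defaultdict(list)
    for d_content in batch:
        content = d_content.content
        if not content._state.adding:
            # Saved upstream: skipped by the "adding" branch, but the row already
            # exists in the database, so its PK should join the rows to touch.
            pks[type(content)].append(content.pk)
    return pks


class _State:
    def __init__(self, adding):
        self.adding = adding


class FakePackage:
    """Hypothetical model stand-in; _state.adding mimics Django's flag."""

    def __init__(self, pk, adding):
        self.pk = pk
        self._state = _State(adding)


class FakeDeclarativeContent:
    """Hypothetical stand-in exposing only the .content attribute."""

    def __init__(self, content):
        self.content = content
```

The resulting PKs could then be merged with the cache-hit PKs before the touch runs.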

dralley · 2026-06-01T04:43:18Z


            for model_type, content_q in content_q_by_type.items():
                try:
                    await sync_to_async(model_type.objects.filter(content_q).touch)()


This is really an existing issue, but we're executing the content_q query (with natural keys) twice, once to touch and once for the model swap. We're also executing a touch query twice, once for cache hits and once for existing packages that were not in the latest-version cache.

Is it possible to collect PKs within the loop, combine them with the cache hit PKs, and have one touch block below this loop?

Or (maybe a question for @mdellweg): does that touch really need to happen before the swap for a timing-related reason?
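One possible answer to the consolidation question, offered only as a sketch. Run the natural-key lookup once per type and keep its natural key -> PK results for the model swap. Then merge those PKs with the cache hits, plus any already-saved content, into a single set per type. Each type then gets exactly one touch. plan_touches, touch_all, and the dict shapes are hypothetical. The touch callback stands in for something like model_type.objects.filter(pk__in=pks).touch(). This sketch does not settle whether the touch must precede the swap.

```python
from collections import defaultdict


def plan_touches(cache_hit_pks, lookup_results, saved_pks=None):
    """Merge every source of existing PKs into one set per model type.

    cache_hit_pks: model type -> iterable of PKs found in the latest-version cache.
    lookup_results: model type -> {natural key: pk} from a single natural-key query
        per type; the same dict could also drive the model swap, so that query
        runs only once.
    saved_pks: model type -> iterable of PKs for content that arrived already saved.
    Returns model type -> sorted list of PKs (sorted only for deterministic output).
    """
    merged = defaultdict(set)
    for source in (cache_hit_pks, saved_pks or {}):
        for model_type, pks in source.items():
            merged[model_type].update(pks)
    for model_type, by_key in lookup_results.items():
        merged[model_type].update(by_key.values())
    return {model_type: sorted(pks) for model_type, pks in merged.items() if pks}


def touch_all(plan, touch):
    """Invoke touch(model_type, pks) exactly once per model type."""
    for model_type, pks in plan.items():
        touch(model_type, pks)
```

Because the PKs are deduplicated in a set, a unit that is both a cache hit and a lookup result is touched only once.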

dralley · 2026-06-01T04:44:42Z

Looks good! A few things to look at; maybe we can make this even more efficient.

Optimize re-sync content

624b4e3

Assisted By: Claude Opus 4.6

jobselko self-assigned this May 29, 2026

github-actions Bot added no-changelog no-issue labels May 29, 2026

dralley reviewed Jun 1, 2026


wip - fix review findings

5daddf2

github-actions Bot added the multi-commit label Jun 1, 2026

dralley mentioned this pull request Jun 1, 2026

Sync optimization: do existing content check in first stage pulp/pulp_rpm#4471 (Open, 4 tasks)

jobselko wants to merge 2 commits into pulp:main from jobselko:optimize_replication
2 participants