Skip to content

Disable chunk uploads for the sync rpm upload endpoint to improve performance on console#1306

Merged
mdellweg merged 2 commits intopulp:mainfrom
YasenT:disable-chunks
Feb 18, 2026
Merged

Disable chunk uploads for the sync rpm upload endpoint to improve performance on console#1306
mdellweg merged 2 commits intopulp:mainfrom
YasenT:disable-chunks

Conversation

@YasenT
Copy link
Contributor

@YasenT YasenT commented Feb 3, 2026

Disable chunk uploads
And retrigger tests to see if it fails at same place. As I see that the nighties run just fine

@YasenT YasenT changed the title Update context.py Disable chunk uploads for the sync rpm upload endpoint to improve performance on console Feb 3, 2026
@YasenT
Copy link
Contributor Author

YasenT commented Feb 6, 2026

@mdellweg can you re-run the tests? It's failing again on a random place. I can't do it myself
And i guess we need them to pass before merging?

@jobselko
Copy link
Contributor

jobselko commented Feb 6, 2026

@YasenT I just re-ran the tests. Could you please add a changelog entry, since this changes the functionality?

@mdellweg
Copy link
Member

mdellweg commented Feb 6, 2026

Yes, we should have the tests pass. But it's only necessary once we have established there's no pending change anymore. We are working on the random test failures.

@YasenT
Copy link
Contributor Author

YasenT commented Feb 6, 2026

@YasenT I just re-ran the tests. Could you please add a changelog entry, since this changes the functionality?

Added a changelog

@YasenT
Copy link
Contributor Author

YasenT commented Feb 6, 2026

Yes, we should have the tests pass. But it's only necessary once we have established there's no pending change anymore. We are working on the random test failures.

Anything else you would like changed? :)

@jobselko
Copy link
Contributor

jobselko commented Feb 9, 2026

@YasenT Any reason why Gerrod is marked as the commit author? I am guessing this was by accident.

@mdellweg
Copy link
Member

mdellweg commented Feb 9, 2026

Before I will comment more about the excessive comments still left, let me try to understand what this change is all about. When you have a huge file, it always used to be chunked up. Now with this change it will always not be chunked up, thereby running into the exact server limits the chunking was invented for in the first place. And for that you take away the user control over the chunk size. Couldn't you instead just tell the user to increase the chunk size? Isn't this a docs issue in the first place? Do we need a globally configurable chunk_size default?

@YasenT
Copy link
Contributor Author

YasenT commented Feb 9, 2026

@mdellweg
Actually I like having comments, I believe that they help, not sure if this changes anything in code, it's more of a personal viewpoint. But can we move forward with this? Not sure if every comment needs to be reviewed, but if it needs be, can I get the review so that we can complete these?

As for the chunking -> the default behavior actually doesn't change. It will still chunk at default 1MB, and people can set the chunk size if they want to.
There's a new synchronous endpoint in pulp_rpm, which is being called only when someone is trying to upload an RPM without providing a repository. I introduced the use of this endpoint in the cli a few months ago for this specific case. And we don't want chunking for that specific endpoint, to reduce the overhead coming from it.
And this is what the commit is targeting in fact.

@YasenT YasenT force-pushed the disable-chunks branch 2 times, most recently from e148c8d to b7cf6f4 Compare February 9, 2026 12:53
jobselko
jobselko previously approved these changes Feb 9, 2026
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last comment wasn't even about the comments, but about the fact that the command has a chunk_size (probably with a badly chosen default) that your changes just silently start to ignore. My point is that even in the face of the upload endpoint, there is a limit to the size of file you can upload in one go.
So my specific question here is, should we make the chunk_size globally configurable?

@YasenT
Copy link
Contributor Author

YasenT commented Feb 10, 2026

My last comment wasn't even about the comments, but about the fact that the command has a chunk_size (probably with a badly chosen default) that your changes just silently start to ignore. My point is that even in the face of the upload endpoint, there is a limit to the size of file you can upload in one go. So my specific question here is, should we make the chunk_size globally configurable?

On the chunk size, I think that the default value should be dynamic and based off the size of the file being uploaded.

But my changes aren't affecting the default behavior/flow that users are used to with the async endpoint. The changes affect just the synchronous one, which wasn't available through the pulp-cli till recently. And there the target is to reduce the tasking/load on the server side.
We increased the timeouts (on console) so that they won't be a limiting factor for the duration of an upload, which was the main factor limiting how big of a file you can upload without chunking it
And it is faster to upload 1x big file than a bunch of chunks, on tcp/ip level, as you don't force it (tcp) to go into ramp-up(slow-start) phase for each chunk. Our chunks upload sequentially, so they actually slow you down.

@YasenT YasenT requested a review from mdellweg February 10, 2026 10:20
@mdellweg
Copy link
Member

The chunksize is there to reflect a limitation of the server, and therefore no, it does not depend on the file, but on the server you upload to. Your change trades possible failure for some performance improvements in a way that leaves the user no way to react. So I come back to my original question: What is a good default chunksize and should we make it a per server setting?

@YasenT
Copy link
Contributor Author

YasenT commented Feb 12, 2026

@mdellweg the chunk uploads are causing issues as is. Which is actually turning it into a guess game for the end user - is it enough to set 20M, 60M, etc.
Would it be okay for you, if for this endpoint chunking is disabled by default except if the user doesn't specify a size? Which in general feels the more "natural" flow. Enable chunking only if i need it, instead of using it to go around the "defaults"

@mdellweg
Copy link
Member

@mdellweg the chunk uploads are causing issues as is. Which is actually turning it into a guess game for the end user - is it enough to set 20M, 60M, etc.

Yes, sadly it's a guessing game if the user does not have access to the very nginx/apache configuration. Certainly they cannot simply adjust that setting, therefore we cannot just skip chunking for infinitely large files.

Would it be okay for you, if for this endpoint chunking is disabled by default except if the user doesn't specify a size? Which in general feels the more "natural" flow. Enable chunking only if i need it, instead of using it to go around the "defaults"

This is about what is the best solution for most users that at the same time does not prevent any user from uploading a file that they could upload before. Also it's not about the api-endpoint. The user of this library is deliberately abstracted away from the api. That's why there is some heuristic to determine how to accomplish the task promised to the user.

So if I understand your suggestion correctly, we would allow the chunk size to be unspecified (reading that as infinity), and instead of finding a proper default value drop it completely.
I would then still add a way to configure the chunksize (not just for RPM) in each server profile so when hitting the limit once the user can use the experience to persist that parameter in the settings and forget about it.

@mdellweg
Copy link
Member

How about we do this:

So if I understand your suggestion correctly, we would allow the chunk size to be unspecified (reading that as infinity), and instead of finding a proper default value drop it completely.
And may i add make unspecified the default.
I think the ridiculously small value is a result of actually wanting to trigger that codepath in a not too expensive test scenario. But since we learned that there is not the one right answer, we should not claim to have the proper default.

I can take adding the settings aspect of this later.

@YasenT
Copy link
Contributor Author

YasenT commented Feb 18, 2026

@mdellweg I hope I understood you correctly, and made the change global.
Initially went with sys.maxsize but this caused some errors, so right now i'm going with a lower value.

Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close enough. Thank you!

@mdellweg mdellweg merged commit 1cc25cd into pulp:main Feb 18, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments