batches: retry changeset spec upload by default#705
Conversation
Fixes sourcegraph/sourcegraph#30431.
eseliger
left a comment
There was a problem hiding this comment.
Good improvement. I wonder if we could actually benefit from implementing some http middleware that retries based on the response code and body.
Ie. we would definitely want to retry on 5xx errors and on network errors, but we would not actually want to retry on 413 errors and run into the same issue 5 times.
That way, we could also get this for both the batch spec as well as the changeset spec.
Just a thought though, I don't think this blocks merging this as-is.
| flagSet.IntVar( | ||
| &caf.retries, "retries", 5, | ||
| "The maximum number of retries that will be made when sending a changeset spec.", | ||
| ) |
There was a problem hiding this comment.
I think 99% of users would assume this controls retries for step command executions, not a small implementation detail in one of the many subsets of a batch spec execution.
I would probably prefer to be very explicit like -changeset-spec-upload-retries.
| if len(errs.Errors()) > max { | ||
| return errs | ||
| } | ||
| time.Sleep(wait) |
There was a problem hiding this comment.
Should this listen on context cancellations? Could (not tested) prevent immediate quits using SIGINTs during the upload stage
internal/batches/service/service.go
Outdated
| if ok, err := svc.newRequest(createChangesetSpecMutation, map[string]interface{}{ | ||
| "spec": string(raw), | ||
| }).Do(ctx, &result); err != nil || !ok { |
There was a problem hiding this comment.
that was a bug before, right? When ok is true and err is nil?
|
This has been sitting here for a while; what's the plan for this? Does it still make sense to merge it? |
Fixes https://github.com/sourcegraph/sourcegraph/issues/30431.
This only adds retries to changeset spec upload — although the original ticket also mentions batch spec upload, I'm honestly less concerned about that. Let's deal with the n upload case before the +1 upload case.
There's a little bit of plumbing here to handle the retries, which I think is mostly uninteresting.
Test plan
I've added new unit tests to cover the retrier,
CreateChangesetSpec, and tested this quickly by hand to ensure the overall commands still operate.