[cli/api] Add automatic retry with backoff for 429 rate limit responses #585
The HTTP client now automatically retries when the Buildkite API returns 429 responses. On the first retry, it waits exactly the duration specified in the RateLimit-Reset header. On subsequent retries, it multiplies the delay exponentially (2x, 4x, etc.) to reduce contention when multiple clients are competing for quota. Configurable via WithMaxRetries (default 3) and WithMaxRetryDelay (default 60s).
Adds WithOnRetry(func(attempt int, delay time.Duration)) option to the HTTP client. The callback is invoked before each retry sleep, allowing callers to log retry attempts, collect metrics, or implement custom notification logic.
When --verbose is enabled, the api command logs rate limit retries to stderr with the delay duration and expected resume time.
Remove default retry settings from the HTTP client to avoid changing behavior for other commands until we discuss with the team. The bk api command explicitly opts in to retry behavior.
Previously, min(delay, 0) clamped every delay to 0 when WithMaxRetryDelay wasn't called. Now delays are clamped only when a positive maximum is configured.
Hey @thomasdesr 👋 Would you be able to update the PR to make CLI behave that way?
So that's actually exactly how it behaves today :D When you hit the rate limit, the CLI exits non-zero and returns an error containing the notice that you've hit a backoff. The reason I wanted to add support (either default or opt-in) for respecting and retrying the rate limit within the CLI is that when scripting multiple calls, rate limit errors in the middle of a pipeline are quite difficult to handle correctly. For example, I was writing something like this to batch download logs for a build:

```sh
bk build view --pipeline "$ORG/$PIPELINE" "$build_id" \
  | jq -r '. as $b
      | $b.jobs[]
      | select(.type=="script")
      | [$b.pipeline.slug, $b.number, .id]
      | @tsv' \
  | parallel -j8 --colsep '\t' -u \
      'bk api --verbose "/pipelines/{1}/builds/{2}/jobs/{3}/log" | jq -r .content > {1}/{2}-{3}.log'
```

I can use something like parallel's built-in retries, but then I'm not actually going to respect the rate limit backoff instruction. To do that I'd need to wrap the … Would love advice if I'm going about this wrong :D
@thomasdesr, after revisiting it, I agree that handling 429s for … If you can continue working on this pull request, I'd recommend using https://github.com/buildkite/roko to handle retries; it's a library already used by our agent, agent-stack-k8s, and others. Again, thanks a ton for your contribution! 🙇
Use Buildkite's roko library for retry handling instead of the custom retry loop. This aligns with patterns used in other Buildkite projects (agent, agent-stack-k8s).
Summary
- Automatically retry `429 (Too Many Requests)` responses for `bk api` commands
- Use the `--verbose` flag to observe retry behavior when using the CLI

Note: I didn't want to make this the default anywhere other than the CLI because I don't know what the consequences of making it the default would be 🙃 If you think it's safe, I'm happy to make it the default everywhere 🤣
Implementation Details
- On a `429`, it won't retry before the `RateLimit-Reset` timer has expired
- If it keeps receiving `429`s, it backs off exponentially on repeated rate limits
- `internal/http` gained a few additional options to configure the behavior:
  - `WithMaxRetries` to control how many attempts are made, and `WithMaxRetryDelay()` to set an upper bound on retry intervals
  - `WithOnRetry` callback (currently used by the `--verbose` flag to print warnings when the backoff is hit)