[cli/api] Add automatic retry with backoff for 429 rate limit responses #585

thomasdesr · 2025-12-10T04:04:22Z

Summary

This PR adds (& enables by default) automatic retry on HTTP 429 (Too Many Requests) responses for the bk api commands
Adds a --verbose flag to observe retry behavior when using the CLI

Note: I didn't want to make this the default anywhere other than the CLI because I don't know what the consequences would be of making this the default 🙃 If you think it's safe, happy to make it the default everywhere 🤣

Implementation Details

Built following the docs here: https://buildkite.com/docs/apis/rest-api/limits
When the client receives a 429, it won't retry before the RateLimit-Reset timer has expired
If it hits another 429 after that, it backs off exponentially on each repeated rate limit
internal/http gained a few additional options to configure the behavior:
- WithMaxRetries to control how many attempts & WithMaxRetryDelay() to set an upper bound on retry intervals
- WithOnRetry callback (currently used by the --verbose flag for printing warnings about backoff being hit)

The HTTP client now automatically retries when the Buildkite API returns 429 responses. On the first retry, it waits exactly the duration specified in the RateLimit-Reset header. On subsequent retries, it multiplies the delay exponentially (2x, 4x, etc.) to reduce contention when multiple clients are competing for quota. Configurable via WithMaxRetries (default 3) and WithMaxRetryDelay (default 60s).
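The delay schedule described above could look roughly like the sketch below. It assumes RateLimit-Reset carries a whole number of seconds, as the linked limits docs describe, and the names resetDelay and retryDelay are hypothetical, not the functions in this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// resetDelay reads the RateLimit-Reset header, assumed here to hold the
// number of seconds until the rate limit window resets.
func resetDelay(h http.Header) (time.Duration, bool) {
	secs, err := strconv.Atoi(h.Get("RateLimit-Reset"))
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

// retryDelay returns the wait before the given retry (1-based).
// Retry 1 waits exactly the reset duration; each later retry doubles it.
// A positive maxDelay caps the result; zero or negative means no cap.
func retryDelay(reset time.Duration, retry int, maxDelay time.Duration) time.Duration {
	delay := reset
	for i := 1; i < retry; i++ {
		delay *= 2
		// Stop doubling once the cap is reached.
		if maxDelay > 0 && delay >= maxDelay {
			return maxDelay
		}
	}
	if maxDelay > 0 && delay > maxDelay {
		return maxDelay
	}
	return delay
}

func main() {
	h := http.Header{}
	h.Set("RateLimit-Reset", "5")

	reset, ok := resetDelay(h)
	if !ok {
		fmt.Println("no usable RateLimit-Reset header")
		return
	}
	// Three retries with a 60s cap, matching the defaults named above.
	for retry := 1; retry <= 3; retry++ {
		fmt.Println(retry, retryDelay(reset, retry, 60*time.Second))
	}
}
```

Keeping the header parsing separate from the schedule makes the doubling and the cap easy to test without an HTTP server.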

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

pkg/cmd/api/api.go

scadu · 2025-12-10T15:11:43Z

Hey @thomasdesr 👋
Thanks for the contribution!
While automatic retry for rate limit handling is usually a great idea, I think in the context of a CLI, we might want to emit an error message stating that the rate limit has been reached, and stop the current execution.
RateLimit-Reset could be included in the message to communicate clearly when the limit resets.

Would you be able to update the PR to make the CLI behave that way?

thomasdesr · 2025-12-10T20:20:32Z

So that's actually exactly how it behaves today :D When you hit the rate limit the CLI exits non-zero and returns an error that contains the notice that you've hit a back off.

The reason I wanted to add support (either default or opt-in) for respecting and retrying the rate limit within the CLI was that when scripting multiple calls, getting rate limit errors in the middle of a pipeline was quite difficult to handle correctly.

E.g. I was writing something like this to batch download logs for a build:

bk build view --pipeline "$ORG/$PIPELINE" "$build_id" \
  | jq -r '. as $b
      | $b.jobs[]
      | select(.type=="script")
      | [$b.pipeline.slug, $b.number, .id]
      | @tsv' \
  | parallel -j8 --colsep '\t' -u \
      'bk api --verbose "/pipelines/{1}/builds/{2}/jobs/{3}/log" | jq -r .content > {1}/{2}-{3}.log'

I can use something like parallel's built-in retries, but then I'm not actually going to respect the rate limit backoff instruction. To do that I'd need to wrap the bk cli and extract the backoff time from the error message? At that point I might as well use the SDK directly xD

Would love advice if I'm going about this wrong :D

scadu · 2025-12-12T10:24:05Z

@thomasdesr, after revisiting it, I agree that handling 429s for api commands would be a welcome addition. It looks like we currently only handle it for build|job GETs through go-buildkite.

If you can continue working on this pull request, I'd recommend using https://github.com/buildkite/roko to handle retries – it's a library already used by our agent, agent-stack-k8s and others.

Again, thanks a ton for your contribution! 🙇

thomasdesr added 5 commits December 9, 2025 19:46

Add OnRetry callback for retry observability

bcc1b30

Adds WithOnRetry(func(attempt int, delay time.Duration)) option to the HTTP client. The callback is invoked before each retry sleep, allowing callers to log retry attempts, collect metrics, or implement custom notification logic.
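A minimal sketch of how such a callback option might be wired up with the functional options pattern. Only the WithOnRetry signature comes from this PR; Client, Option, NewClient, beforeSleep, and retryLog are hypothetical stand-ins for illustration.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Client is a hypothetical stand-in for the internal/http client.
type Client struct {
	onRetry func(attempt int, delay time.Duration)
}

// Option configures a Client.
type Option func(*Client)

// WithOnRetry registers a callback invoked before each retry sleep.
func WithOnRetry(fn func(attempt int, delay time.Duration)) Option {
	return func(c *Client) { c.onRetry = fn }
}

// NewClient applies the given options to a fresh Client.
func NewClient(opts ...Option) *Client {
	c := &Client{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// beforeSleep is where a retry loop would notify the callback, if one is set.
func (c *Client) beforeSleep(attempt int, delay time.Duration) {
	if c.onRetry != nil {
		c.onRetry(attempt, delay)
	}
}

// retryLog collects simple retry metrics.
type retryLog struct {
	attempts int
	waited   time.Duration
}

func (l *retryLog) record(attempt int, delay time.Duration) {
	l.attempts++
	l.waited += delay
}

func main() {
	stats := &retryLog{}
	c := NewClient(WithOnRetry(func(attempt int, delay time.Duration) {
		stats.record(attempt, delay)
		// Illustrative message only; not the CLI's actual output.
		fmt.Fprintf(os.Stderr, "rate limited: retry %d in %s\n", attempt, delay)
	}))

	// Simulate two retries without actually sleeping.
	c.beforeSleep(1, 5*time.Second)
	c.beforeSleep(2, 10*time.Second)
	fmt.Println(stats.attempts, stats.waited)
}
```

Guarding against a nil callback means callers that never pass WithOnRetry pay nothing for the hook.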

Add --verbose flag to bk api for rate limit retry logging

6af37ac

When --verbose is enabled, the api command logs rate limit retries to stderr with the delay duration and expected resume time.

Disable retry by default in HTTP client, opt-in for bk api only

1fefc89

Remove default retry settings from the HTTP client to avoid changing behavior for other commands until we discuss with the team. The bk api command explicitly opts in to retry behavior.

Fix maxRetryDelay clamping when no max is configured

844d1e3

Previously, min(delay, 0) would clamp delays to 0 when WithMaxRetryDelay wasn't called. Now only clamp when a positive max is configured.
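The fix can be illustrated with a small sketch. The function name clampDelay is hypothetical; the point is only that a zero maximum is treated as "no limit configured" rather than as a limit of zero.

```go
package main

import (
	"fmt"
	"time"
)

// clampDelay caps delay at maxDelay only when a positive maximum is set.
// With maxDelay == 0 (the option was never called), the delay passes
// through unchanged instead of collapsing to zero as min(delay, 0) did.
func clampDelay(delay, maxDelay time.Duration) time.Duration {
	if maxDelay > 0 && delay > maxDelay {
		return maxDelay
	}
	return delay
}

func main() {
	// No maximum configured: the delay is left alone.
	fmt.Println(clampDelay(5*time.Second, 0))
	// Maximum configured: longer delays are capped.
	fmt.Println(clampDelay(90*time.Second, 60*time.Second))
}
```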

thomasdesr requested a review from a team as a code owner December 10, 2025 04:04

chatgpt-codex-connector bot reviewed Dec 10, 2025

Review: Use root --verbose flag

c24a59a

JoeColeman95 enabled auto-merge (squash) December 10, 2025 14:57

JoeColeman95 previously approved these changes Dec 10, 2025

JoeColeman95 disabled auto-merge December 10, 2025 15:00

Replace hand-rolled retry logic with buildkite/roko

7484bcf

Use Buildkite's roko library for retry handling instead of the custom retry loop. This aligns with patterns used in other Buildkite projects (agent, agent-stack-k8s).

thomasdesr force-pushed the thomas/add-automatic-429-retry branch from 603731b to 7484bcf December 27, 2025 17:31
