Vast.ai enforces rate limits per endpoint and method using a token bucket model. These limits apply per API key (or per IP if no key is provided). Each endpoint has an independent token bucket defined by:
  • Max tokens (burst capacity): how many requests you can make in rapid succession.
  • Refresh rate (tokens/sec): how quickly tokens refill — this is your sustained request rate.
  • Penalty tokens: extra tokens deducted on a 429 rejection, extending recovery time.
These values reflect current defaults and may change. Always check the Retry-After and X-RateLimit-Reset response headers for the most accurate pacing.
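To make the model concrete, here is a minimal client-side sketch of one endpoint's bucket in Python. The `TokenBucket` class and its `acquire`/`penalize` helpers are illustrative, not part of any Vast.ai SDK; the example parameters are taken from the table below.

```python
import time

class TokenBucket:
    """Client-side model of one endpoint's rate limit."""

    def __init__(self, max_tokens: float, refresh_rate: float, penalty: float = 0.0):
        self.max_tokens = max_tokens      # burst capacity
        self.refresh_rate = refresh_rate  # tokens refilled per second
        self.penalty = penalty            # extra tokens deducted on a 429
        self.tokens = max_tokens
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.max_tokens,
                          self.tokens + (now - self.last_refill) * self.refresh_rate)
        self.last_refill = now

    def acquire(self) -> None:
        """Block until one token is available, then spend it."""
        self._refill()
        if self.tokens < 1:
            time.sleep((1 - self.tokens) / self.refresh_rate)  # wait out the deficit
            self._refill()
        self.tokens -= 1

    def penalize(self) -> None:
        """Apply the server's 429 penalty; the bucket can go into debt."""
        self.tokens -= self.penalty

# GET /api/v0/instances/: no burst (1 token max), 0.5 tok/s -> ~30 req/min sustained.
bucket = TokenBucket(max_tokens=1, refresh_rate=0.5)
bucket.acquire()  # returns immediately; a second call would wait ~2 seconds
```

Modeling the bucket locally lets a client pace itself before the server ever returns a 429.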

Example rate limits

The table below shows representative limits across common endpoint categories:
| Endpoint                | Method | Max Tokens | Refresh Rate (tok/s) | Sustained Req/min | Penalty Tokens |
|-------------------------|--------|------------|----------------------|-------------------|----------------|
| /api/v0/instances/      | GET    | 1          | 0.50                 | 30                | 0              |
| /api/v0/instances/{id}/ | PUT    | 1          | 1.00                 | 60                | 0              |
| /api/v0/instances/{id}/ | DELETE | 1          | 0.33                 | 20                | 0              |
| /api/v0/machines/       | GET    | 1          | 0.40                 | 24                | 0              |
| /api/v0/volumes/        | GET    | 1          | 0.50                 | 30                | 0              |
| /api/v0/template/       | GET    | 1          | 0.53                 | 31                | 0              |
| /api/v0/ssh/            | GET    | 1          | 1.00                 | 60                | 0              |
| /api/v0/invoices        | GET    | 1          | 0.33                 | 20                | 0              |
| /api/v0/secrets/        | GET    | 1          | 0.20                 | 12                | 0              |
| /api/v0/workergroups/   | GET    | 1          | 0.50                 | 30                | 0              |
Reading the table:
  • Max Tokens = 1 means no burst allowance — each request must wait for a token to refill. This is the most common configuration.
  • Refresh Rate is the inverse of the old threshold value (e.g., a 2s threshold becomes 0.50 tokens/sec).
  • Sustained Req/min is refresh_rate * 60 — the maximum throughput if you pace requests evenly.
  • Penalty Tokens = 0 means no extra cost on rejection. When configured, penalties push the bucket into debt, increasing Retry-After on subsequent 429s (handled in the sketch below).
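Putting the headers and pacing rules together, a request loop might look like the following sketch. It assumes the third-party `requests` library and a Bearer-token Authorization header; `get_with_pacing` is a hypothetical helper, and the fallback delay used when no Retry-After header is present is likewise an assumption.

```python
import time
import requests  # third-party HTTP client, assumed available

def get_with_pacing(url: str, api_key: str, max_attempts: int = 5) -> requests.Response:
    """Issue a GET, honoring Retry-After on 429 responses."""
    headers = {"Authorization": f"Bearer {api_key}"}  # auth scheme assumed
    for attempt in range(max_attempts):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        # Retry-After already reflects any penalty-token debt on the server side.
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else 2.0 * (attempt + 1)  # fallback is a guess
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")

# Example: list instances at the sustained rate (0.5 tok/s -> one call every 2 s).
resp = get_with_pacing("https://console.vast.ai/api/v0/instances/", "YOUR_API_KEY")
```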
Write-heavy operations (create, update, delete) generally have stricter limits than read operations. If you need higher limits for production usage, contact support with your account details and expected call rates.