Service Tiers
The service_tier parameter lets you control cost and latency tradeoffs when sending requests through OpenRouter. You can pass it in your request to select a specific processing tier, and the response will indicate which tier was actually used. Your request is billed at the actual served tier’s rate.
Using Service Tiers
Pass service_tier as a top-level parameter in your request body. Supported values are flex (lower cost, higher latency) and priority (faster, higher cost). The example below requests the flex tier from OpenAI’s gpt-5 for a 50% discount in exchange for higher latency and lower availability.
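A minimal sketch in Python, assuming the requests library, an OPENROUTER_API_KEY environment variable, and an illustrative model slug and prompt:

```python
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
    "Content-Type": "application/json",
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers=headers,
    json={
        "model": "openai/gpt-5",
        "service_tier": "flex",  # lower cost, higher latency
        "messages": [
            {"role": "user", "content": "Summarize the plot of Hamlet."}
        ],
    },
)
data = resp.json()

# The response reports the tier that actually served the request,
# and the request is billed at that served tier's rate.
print(data["service_tier"])
```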
The service_tier parameter is also accepted on the Responses API and the Anthropic Messages API; see API Response Differences below for where the field appears in each response format.
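A hedged sketch of the same top-level parameter on those two endpoints, reusing the headers from the example above; the request bodies follow the OpenAI Responses and Anthropic Messages formats, and the model slugs and fields shown are illustrative assumptions:

```python
# Responses API: service_tier sits at the top level of the request body.
requests.post(
    "https://openrouter.ai/api/v1/responses",
    headers=headers,
    json={
        "model": "openai/gpt-5",
        "service_tier": "flex",
        "input": "Summarize the plot of Hamlet.",
    },
)

# Messages API: same top-level placement in the request. Note that the
# *response* reports the tier inside usage (see API Response Differences).
requests.post(
    "https://openrouter.ai/api/v1/messages",
    headers=headers,
    json={
        "model": "anthropic/claude-sonnet-4",  # hypothetical slug for illustration
        "service_tier": "priority",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Summarize the plot of Hamlet."}
        ],
    },
)
```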
Supported Providers
The following providers support flex and priority for select models. The response’s service_tier field reports which tier was actually used.
OpenAI
- Possible response values: default, flex, priority
Learn more in OpenAI’s Chat Completions and Responses API documentation. See OpenAI’s pricing page for details on cost differences between tiers.
Google (Vertex AI)
- Possible response values: standard, flex, priority
Learn more in Google’s Flex and Priority documentation.
Google (AI Studio)
- Possible response values: standard, flex, priority
Learn more in Google’s Flex and Priority documentation.
API Response Differences
The API response includes a service_tier field that indicates which capacity tier was actually used to serve your request. The placement of this field varies by API format:
- Chat Completions API (/api/v1/chat/completions): service_tier is returned at the top level of the response object, matching OpenAI's native format.
- Responses API (/api/v1/responses): service_tier is returned at the top level of the response object, matching OpenAI's native format.
- Messages API (/api/v1/messages): service_tier is returned inside the usage object, matching Anthropic's native format.
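As a hedged illustration of these placements, a small helper that reads the served tier out of a parsed response body; the payloads in the usage lines are stand-ins, not real API output:

```python
from typing import Any, Optional

def served_tier(body: dict[str, Any], api_format: str) -> Optional[str]:
    """Return the service_tier reported in a parsed response body.

    api_format: "chat" or "responses" -> top-level field (OpenAI-style);
                "messages"            -> inside usage (Anthropic-style).
    """
    if api_format == "messages":
        return body.get("usage", {}).get("service_tier")
    return body.get("service_tier")

# Stand-in payloads showing each placement:
print(served_tier({"service_tier": "flex"}, "chat"))                     # flex
print(served_tier({"usage": {"service_tier": "priority"}}, "messages"))  # priority
```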