<docs>

## OpenRouter API Documentation for AI Consumption

This document outlines how to interact with the OpenRouter API via HTTP, providing access to a wide array of AI models through a unified interface.

### 1. Core Concepts

* **Unified API**: OpenRouter provides a single API endpoint (`https://openrouter.ai/api/v1`) to access numerous AI models from various providers. It handles model fallbacks and can optimize for cost or performance.
* **HTTP Interaction**: The primary way to use the API directly is via HTTP POST requests to specific endpoints.
* **OpenAI Compatibility**: The API schema closely mirrors OpenAI's, making OpenRouter a drop-in replacement in many cases. OpenRouter normalizes schemas across different providers.

### 2. Authentication

* **Method**: Bearer token.
* **Header**: `Authorization: Bearer <OPENROUTER_API_KEY>`
  * Replace `<OPENROUTER_API_KEY>` with your actual OpenRouter API key.
* **Optional Headers for Leaderboard/Analytics**:
  * `HTTP-Referer: <YOUR_SITE_URL>`: Your application's URL, for rankings on OpenRouter.ai.
  * `X-Title: <YOUR_SITE_NAME>`: Your application's name, for rankings on OpenRouter.ai.
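A minimal sketch of building these headers and verifying the key against the key-info endpoint from section 3.5. It assumes the `requests` package and an `OPENROUTER_API_KEY` environment variable; the referer and title values are placeholders for your own application.

```python
# Sketch: construct auth headers and verify the key (see section 3.5).
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
    "HTTP-Referer": "https://example.com",  # optional, for OpenRouter.ai rankings (placeholder)
    "X-Title": "Example App",               # optional, for OpenRouter.ai rankings (placeholder)
}

resp = requests.get("https://openrouter.ai/api/v1/auth/key", headers=headers)
resp.raise_for_status()
print(resp.json())  # credit limit, usage, and rate limit for this key
```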
### 3. Key API Endpoints

#### 3.1. Chat Completions

* **Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Request Body (JSON)**:
  * `model` (string, optional): The ID of the model to use (e.g., `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`). If unspecified, the user's default model is used. Find model IDs via the `/models` endpoint or the [Models Page](https://openrouter.ai/models).
  * `messages` (array of objects, required): A list of message objects describing the conversation history. Each message object has:
    * `role` (string): `system`, `user`, `assistant`, or `tool`.
    * `content` (string, or an array of content parts for the `user` role): The message content. For multimodal input (images, files), the `user` role's `content` can be an array of parts:
      * `{"type": "text", "text": "Your text prompt"}`
      * `{"type": "image_url", "image_url": {"url": "URL_OR_BASE64_DATA_URL", "detail": "auto|low|high"}}` (see section 4.3)
      * `{"type": "file", "file": {"filename": "document.pdf", "file_data": "BASE64_DATA_URL"}}` (see section 4.3)
    * `name` (string, optional): Identifier for the message sender, used especially with the `tool` role or to distinguish users/assistants.
    * `tool_call_id` (string, required for `role: "tool"`): The ID of the tool call this message is a result for.
    * `tool_calls` (array of objects, on `role: "assistant"` messages when tools are called): The tool calls generated by the model. (See section 4.2.)
  * `stream` (boolean, optional, default `false`): If `true`, enables streaming responses via Server-Sent Events (SSE).
  * `temperature` (float, optional, default 1.0, range 0.0 to 2.0): Controls randomness; lower is more deterministic.
  * `max_tokens` (integer, optional): Maximum number of tokens to generate.
  * `top_p` (float, optional, default 1.0, range 0.0 to 1.0): Nucleus sampling.
  * `top_k` (integer, optional, default 0): Top-k sampling.
  * `frequency_penalty` (float, optional, default 0.0, range -2.0 to 2.0): Penalizes new tokens based on their frequency in the text so far.
  * `presence_penalty` (float, optional, default 0.0, range -2.0 to 2.0): Penalizes new tokens based on whether they have appeared in the text so far.
  * `repetition_penalty` (float, optional, default 1.0, range 0.0 to 2.0): Penalizes repetition.
  * `stop` (string or array of strings, optional): Sequences at which the API stops generating further tokens.
  * `seed` (integer, optional): For deterministic sampling (if supported by the model).
  * `response_format` (object, optional): E.g., `{"type": "json_object"}` to enable JSON mode. (See section 4.1.)
  * `tools` (array of objects, optional): Definitions of tools the model can call. (See section 4.2.)
  * `tool_choice` (string or object, optional): Controls how the model uses tools (e.g., `auto`, `none`, or a specific function). (See section 4.2.)
  * `transforms` (array of strings, optional): E.g., `["middle-out"]`. (See section 4.4.)
  * `models` (array of strings, optional): List of fallback model IDs. (See section 4.5.)
  * `provider` (object, optional): Preferences for provider routing. (See section 4.5.)
  * `reasoning` (object, optional): Configuration for reasoning tokens. (See section 4.6.)
  * `usage` (object, optional): E.g., `{"include": true}`. (See section 4.7.)
  * `plugins` (array of objects, optional): For features like PDF parsing or web search. E.g.:
    * `[{"id": "file-parser", "pdf": {"engine": "mistral-ocr"}}]`
    * `[{"id": "web", "max_results": 3, "search_prompt": "Relevant web results:"}]`
* **Response**:
  * Standard: a JSON object with `id`, `choices`, `created`, `model`, `object`, and `usage` (present when not streaming, or when `usage: {"include": true}` is set).
  * Streaming: a stream of Server-Sent Events. Each event is `data: <JSON_CHUNK>\n\n`, and the stream ends with `data: [DONE]\n\n`. The last chunk may contain the `usage` object if `usage: {"include": true}` was set.
  * `choices` (array): Contains the message object(s).
    * `message`:
      * `role`: `assistant`
      * `content`: Generated text.
      * `tool_calls` (array, optional): Present if the model decides to call tools.
      * `reasoning` (string, optional): Reasoning steps, if requested and provided by the model.
    * `delta` (streaming only): Contains the partial message update.
    * `finish_reason` (string): `stop`, `length`, `tool_calls`, `content_filter`, or `error`.
    * `native_finish_reason` (string): Raw finish reason from the provider.
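A minimal non-streaming request, sketched with the `requests` package; the model slug and prompt are illustrative placeholders.

```python
# Sketch: basic chat completion against /chat/completions.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",  # placeholder model slug
        "messages": [{"role": "user", "content": "What is OpenRouter?"}],
        "max_tokens": 256,
    },
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data["model"], data["usage"])  # model that served the request, token counts
```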
#### 3.2. Completions (Legacy)

* **Endpoint**: `https://openrouter.ai/api/v1/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Note**: This endpoint is for text-only (prompt-based) completions. Prefer `/chat/completions` for newer models and chat-based interactions.
* **Request Body**: Similar to `/chat/completions`, but uses `prompt` (string) instead of `messages`.
#### 3.3. List Available Models

* **Endpoint**: `https://openrouter.ai/api/v1/models`
* **Method**: `GET`
* **Response**: JSON array of model objects. Each object includes:
  * `id` (string): Model identifier (e.g., `openai/gpt-4o`).
  * `name` (string): Human-readable name.
  * `context_length` (integer): Maximum context window size.
  * `pricing` (object): Prices in USD, as strings:
    * `prompt` (string): Price per prompt token.
    * `completion` (string): Price per completion token.
    * `image` (string): Price per image.
    * `request` (string): Price per request.
  * `quantization` (string): E.g., `fp16`, `int8`.
  * `max_completion_tokens` (integer, optional).
  * `supported_parameters` (array of strings, optional): Lists parameters such as `tools` and `structured_outputs` if supported.
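A sketch of listing models and filtering on `supported_parameters`, assuming `requests` and that the model list is wrapped in a top-level `data` key, as is typical for OpenAI-compatible APIs.

```python
# Sketch: list models and keep those advertising structured-output support.
import requests

resp = requests.get("https://openrouter.ai/api/v1/models")
resp.raise_for_status()
models = resp.json()["data"]  # assumed: model objects live under "data"

for m in models:
    if "structured_outputs" in (m.get("supported_parameters") or []):
        print(m["id"], m["context_length"])
```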
#### 3.4. Get Generation Details (Cost/Stats)

* **Endpoint**: `https://openrouter.ai/api/v1/generation`
* **Method**: `GET`
* **Query Parameter**: `id=<GENERATION_ID>` (the `id` returned from a completion request).
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with detailed metadata, including native token counts and cost for the specified generation. This is more precise for billing than the `usage` object in completion responses.
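A sketch of fetching billing-grade stats for a finished generation, assuming `requests`; `gen_id` is a placeholder that would come from a prior completion response's `id` field.

```python
# Sketch: look up native token counts and cost for one generation.
import os
import requests

gen_id = "gen-..."  # placeholder: the "id" from a completion response
resp = requests.get(
    "https://openrouter.ai/api/v1/generation",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    params={"id": gen_id},
)
resp.raise_for_status()
print(resp.json())  # detailed metadata, including native token counts and cost
```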
#### 3.5. Check API Key Info (Rate Limits/Credits)

* **Endpoint**: `https://openrouter.ai/api/v1/auth/key` (also documented as `/api/v1/key`)
* **Method**: `GET`
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with:
  * `limit` (float or null): Credit limit for the key.
  * `usage` (float): Credits used.
  * `rate_limit` (object): `requests` (integer) and `interval` (string, e.g., `"10s"`).
#### 3.6. Manage API Keys (Provisioning)

* **Base Endpoint**: `https://openrouter.ai/api/v1/keys`
* **Authentication**: Requires a special *Provisioning API key* as the Bearer token.
* **Endpoints**:
  * `GET /keys`: List API keys (paginated with `offset`).
  * `POST /keys`: Create a new API key.
    * Body: `{"name": "KeyName", "label": "optional-label", "limit": 1000.00}` (`limit` is an optional credit limit).
  * `GET /keys/{hash}`: Get details of a specific key.
  * `PATCH /keys/{hash}`: Update a key (e.g., name, disabled status, limit).
  * `DELETE /keys/{hash}`: Delete a key.
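A sketch of creating a runtime key with a provisioning key, assuming `requests`; the `PROVISIONING_API_KEY` environment variable, key name, and credit limit are placeholders.

```python
# Sketch: create a new API key via the provisioning endpoint.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/keys",
    headers={"Authorization": f"Bearer {os.environ['PROVISIONING_API_KEY']}"},
    json={"name": "customer-123", "limit": 10.00},  # limit = optional credit cap
)
resp.raise_for_status()
print(resp.json())  # details of the newly created key
```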
### 4. Key Features and Topics via HTTP

#### 4.1. Structured Outputs

* **Purpose**: Enforce JSON Schema validation on model responses.
* **Mechanism**: Use the `response_format` parameter in the `/chat/completions` request:
```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "your_function_or_schema_name",
    "strict": true, // Recommended for exact schema adherence
    "schema": { // Your JSON Schema definition
      "type": "object",
      "properties": {
        "propertyName": {"type": "string", "description": "Details for AI"}
      },
      "required": ["propertyName"]
    }
  }
}
```
* **Model Support**: Check model details on the [Models Page](https://openrouter.ai/models) (filter by `supported_parameters: structured_outputs`) or use `provider: {"require_parameters": true}`. Supported by newer OpenAI models (GPT-4o and later) and Fireworks-provided models.
* **Best Practices**:
  * Include descriptive `description` fields in your schema properties to guide the model.
  * Set `strict: true` for precise schema adherence.
* **Streaming**: Supported; partial JSON is streamed.
* **Note**: `"response_format": {"type": "json_object"}` is also supported, for general JSON output without a specific schema.
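A full request sketch combining the pieces above, assuming `requests`; the model slug and the `weather_report` schema are illustrative placeholders.

```python
# Sketch: request schema-validated JSON output.
import json
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",  # placeholder model slug
        "messages": [{"role": "user", "content": "Weather in Paris?"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "weather_report",  # hypothetical schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                        "temperature_c": {"type": "number", "description": "Celsius"},
                    },
                    "required": ["city", "temperature_c"],
                },
            },
        },
        # Only route to providers that can honor all request parameters:
        "provider": {"require_parameters": True},
    },
)
resp.raise_for_status()
report = json.loads(resp.json()["choices"][0]["message"]["content"])
print(report["city"], report["temperature_c"])
```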
#### 4.2. Tool Calling (Function Calling)

* **Purpose**: Allow LLMs to request execution of external tools/functions. The LLM suggests the call; your application executes it and returns the result.
* **Mechanism (Request)**:
  * `tools` (array): Define the available tools.
```json
"tools": [
  {
    "type": "function",
    "function": {
      "name": "your_function_name",
      "description": "What this function does.",
      "parameters": { // JSON Schema for the function arguments
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Description of param1"}
        },
        "required": ["param1"]
      }
    }
  }
]
```
  * `tool_choice` (string or object, optional):
    * `"none"`: The model will not call a tool.
    * `"auto"`: The model decides whether to call a tool.
    * `{"type": "function", "function": {"name": "your_function_name"}}`: Force a call to a specific tool.
    * `"required"`: The model must call one or more tools (supported by some models, e.g., newer OpenAI models).
* **Mechanism (Handling Response & Follow-up)** (a full round trip is sketched after this list):
  1. **Initial Request**: Send the messages plus `tools` (and optionally `tool_choice`).
  2. **Model Response (Assistant Turn)**: If the model decides to call a tool, `message.tool_calls` is populated and `finish_reason` is `tool_calls`:
```json
"tool_calls": [
  {
    "id": "call_abc123", // Unique ID for this call
    "type": "function",
    "function": {
      "name": "your_function_name",
      "arguments": "{\"param1\": \"value1\"}" // Stringified JSON
    }
  }
]
```
  3. **Your Application**:
     * Parse `tool_calls[i].function.arguments`.
     * Execute the function `tool_calls[i].function.name` with those arguments.
     * Collect the result from your function.
  4. **Follow-up Request**: Send a new request to `/chat/completions` including:
     * The original messages.
     * The assistant's message that contained the `tool_calls`.
     * A new message with `role: "tool"` for each tool call result:
```json
{
  "role": "tool",
  "tool_call_id": "call_abc123", // from tool_calls[i].id
  "name": "your_function_name", // from tool_calls[i].function.name
  "content": "{\"result\": \"your_function_output\"}" // Stringified JSON of the tool's output
}
```
  5. **Final Model Response**: The model uses the tool's output to generate a final response to the user.
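The round trip above, sketched end to end with `requests`; the model slug and the `get_current_time` tool are illustrative placeholders, not part of the API.

```python
# Sketch: full tool-calling round trip (steps 1-5).
import json
import os
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def get_current_time(timezone: str) -> str:
    return "12:00"  # stand-in for a real implementation

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current time in a timezone.",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string", "description": "IANA timezone"}},
            "required": ["timezone"],
        },
    },
}]
messages = [{"role": "user", "content": "What time is it in Tokyo?"}]

# Step 1: initial request with tool definitions.
msg = requests.post(URL, headers=HEADERS, json={
    "model": "openai/gpt-4o", "messages": messages, "tools": tools,
}).json()["choices"][0]["message"]

if msg.get("tool_calls"):
    messages.append(msg)  # step 4 requires the assistant turn with tool_calls
    for call in msg["tool_calls"]:
        # Steps 2-3: parse the arguments and execute the named function.
        args = json.loads(call["function"]["arguments"])
        result = get_current_time(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": call["function"]["name"],
            "content": json.dumps({"result": result}),
        })
    # Steps 4-5: send the tool results back for the final answer.
    final = requests.post(URL, headers=HEADERS, json={
        "model": "openai/gpt-4o", "messages": messages, "tools": tools,
    }).json()
    print(final["choices"][0]["message"]["content"])
```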
#### 4.3. Images & PDFs

* **Endpoint**: `/api/v1/chat/completions`
* **Sending Images**:
  * Within the `messages` array, for a `user` role message, `content` becomes an array of parts.
  * **Image URL**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
```
  * **Base64-Encoded Image**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this local image:"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_STRING>"}}
  ]
}
```
  * Supported image types: `image/png`, `image/jpeg`, `image/webp`.
  * Multiple images can be sent as separate `image_url` content parts.
  * Recommendation: Send the text prompt before the images, or place the images in the system prompt if they must come first.
* **Sending PDFs** (a sketch follows this list):
  * Works with **any** model on OpenRouter; if the model doesn't natively support files, OpenRouter parses the PDF.
  * Within the `messages` array, for a `user` role message, use a `content` array with a part of type `file`.
  * **Base64-Encoded PDF**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Summarize this PDF:"},
    {
      "type": "file",
      "file": {
        "filename": "mydoc.pdf",
        "file_data": "data:application/pdf;base64,<BASE64_STRING>"
      }
    }
  ]
}
```
  * **PDF Processing Engine Configuration (Optional)**: Use the `plugins` parameter in the request body:
```json
"plugins": [
  {
    "id": "file-parser",
    "pdf": {
      "engine": "mistral-ocr | pdf-text | native" // default: native, then pdf-text
    }
  }
]
```
    * `mistral-ocr`: Best for scanned/image-based PDFs (cost: $MISTRAL_OCR_COST per 1000 pages).
    * `pdf-text`: Best for text-based PDFs (free).
    * `native`: Uses the model's native file processing if available (charged as input tokens).
  * **Reusing PDF Annotations (Skip Reparsing Costs)**:
    1. Make the initial request with the PDF.
    2. The assistant's response message may contain `annotations`.
    3. In subsequent requests about the same PDF, include the original PDF data in the `user` message AND the assistant's previous message with its `annotations` in the conversation history. OpenRouter uses these to avoid reparsing.
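A sketch of sending a local PDF as a base64 data URL, assuming `requests`, a local file `mydoc.pdf`, and a placeholder model slug.

```python
# Sketch: attach a local PDF and pick the free text-extraction engine.
import base64
import os
import requests

with open("mydoc.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-3.5-sonnet",  # placeholder model slug
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this PDF:"},
                {"type": "file", "file": {
                    "filename": "mydoc.pdf",
                    "file_data": f"data:application/pdf;base64,{b64}",
                }},
            ],
        }],
        "plugins": [{"id": "file-parser", "pdf": {"engine": "pdf-text"}}],  # free engine
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```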
#### 4.4. Message Transforms

* **Purpose**: Compress prompts that exceed a model's context window, for cases where perfect recall isn't critical.
* **Mechanism**: Use the `transforms` parameter in the `/chat/completions` request:
```json
"transforms": ["middle-out"]
```
* **`middle-out` Behavior**:
  * Removes or truncates messages from the *middle* of the prompt history until it fits, based on research showing LLMs pay less attention to the middle of the context.
  * Also handles message-count limits (e.g., Anthropic's {anthropicMaxMessagesCount}-message limit) by keeping half the messages from the start and half from the end.
  * OpenRouter tries models with at least half the required context; if none qualifies, it uses the model with the largest context.
* **Default**: `middle-out` is the default for models with <= 8k context length. To disable it, send `transforms: []`.
#### 4.5. Uptime Optimization & Provider Routing

* **OpenRouter's Default Behavior**:
  * Monitors provider health (response times, error rates, availability).
  * Load-balances requests across providers, prioritizing price and stability.
  * Automatically falls back to other providers on errors or rate limits.
* **Customizing Provider Routing**: Use the `provider` object in the `/chat/completions` request:
```json
"provider": {
  "order": ["ProviderName1", "ProviderName2"], // Try providers in this specific order.
  "allow_fallbacks": true, // (boolean, default: true) Whether to use other providers if the ordered/primary ones fail.
  "require_parameters": false, // (boolean, default: false) If true, only route to providers supporting all request params.
  "data_collection": "allow", // ("allow" | "deny", default: "allow") Filter providers by data logging/training policies.
  "only": ["ProviderName1"], // Only use providers from this list.
  "ignore": ["ProviderNameToSkip"], // Skip providers from this list.
  "quantizations": ["fp16", "int8"], // Filter by quantization level.
  "sort": "price | throughput | latency", // Explicitly sort providers by this metric (disables default load balancing).
  "max_price": {"prompt": 0.5, "completion": 1.5} // Max price in USD per million tokens; can also include "request" and "image".
}
```
* **Model Variants (Shortcuts for Routing)**: Append to the model slug.
  * `:nitro` (e.g., `meta-llama/llama-3.1-70b-instruct:nitro`): Equivalent to `provider: {"sort": "throughput"}`.
  * `:floor` (e.g., `meta-llama/llama-3.1-70b-instruct:floor`): Equivalent to `provider: {"sort": "price"}`.
  * Other variants: `:free`, `:beta`, `:extended`, `:thinking`, `:online`.
* **Model Fallbacks with the `models` parameter**:
```json
"model": "primary/model-id",
"models": ["fallback/model1-id", "fallback/model2-id"]
```
  If `primary/model-id` fails, OpenRouter tries `fallback/model1-id`, then `fallback/model2-id`. The `model` field in the response indicates which model was actually used.
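A sketch combining provider preferences with model fallbacks, assuming `requests`; the model slugs and price caps are placeholders.

```python
# Sketch: routing preferences plus a fallback model.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-70b-instruct",
        "models": ["mistralai/mixtral-8x7b-instruct"],  # tried if the primary fails
        "provider": {
            "sort": "throughput",       # prefer the fastest providers
            "data_collection": "deny",  # skip providers that log/train on prompts
            "max_price": {"prompt": 1.0, "completion": 2.0},  # USD per million tokens
        },
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
resp.raise_for_status()
print(resp.json()["model"])  # which model actually served the request
```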
#### 4.6. Reasoning Tokens (Thinking Tokens)

* **Purpose**: Get insight into the model's step-by-step reasoning process. Reasoning tokens are charged as output tokens.
* **Availability**: Returned in the `reasoning` field of the assistant's message if the model supports and emits them. Some models reason internally but don't return the tokens.
* **Mechanism**: Use the `reasoning` parameter in the `/chat/completions` request:
```json
"reasoning": {
  // Choose one of these:
  "effort": "low | medium | high", // For OpenAI o-series-style control.
  "max_tokens": 2000, // For Anthropic/Gemini-style control (a specific token budget).
  // Optional:
  "exclude": false // (boolean, default: false) If true, reason internally but don't return the tokens in the response.
}
```
  * If `effort` is used with a model that expects `max_tokens`, it is converted (high ~80%, medium ~50%, low ~20% of the `max_tokens` request parameter if provided, else of the model's max completion tokens).
  * If `max_tokens` (in `reasoning`) is used with a model that expects `effort`, it influences the effort level.
* **Legacy Parameters**:
  * `include_reasoning: true` is equivalent to `reasoning: {}`.
  * `include_reasoning: false` is equivalent to `reasoning: {"exclude": true}`.
* **Anthropic Specifics**:
  * You can use the `:thinking` variant (e.g., `anthropic/claude-3.5-sonnet:thinking`), which defaults to high effort.
  * `reasoning.max_tokens` for Anthropic is capped at 32,000 and has a minimum of 1,024.
  * The main `max_tokens` request parameter must be greater than the reasoning budget.
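A sketch of requesting a visible reasoning trace with a token budget, assuming `requests`; the model slug stands in for any reasoning-capable model.

```python
# Sketch: request reasoning tokens alongside the final answer.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-3.5-sonnet",  # placeholder model slug
        "max_tokens": 4000,                 # must exceed the reasoning budget
        "reasoning": {"max_tokens": 2000},  # reasoning token budget
        "messages": [{"role": "user", "content": "Which is larger, 9.9 or 9.11?"}],
    },
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning"))  # the trace, if the model returned one
print(message["content"])        # the final answer
```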
#### 4.7. Usage Accounting

* **Purpose**: Track token counts (prompt, completion, cached, reasoning) and cost directly in the API response, without extra calls. Uses the model's native tokenizer.
* **Mechanism**: Use the `usage` parameter in the `/chat/completions` request:
```json
"usage": {
  "include": true
}
```
* **Response**:
  * **Non-streaming**: The main response object contains a `usage` field.
  * **Streaming**: The *last* SSE data chunk (before `data: [DONE]`) contains the `usage` field; the `choices` array in that chunk is empty.
```json
// Example usage object in a response
"usage": {
  "completion_tokens": 150,
  "completion_tokens_details": {"reasoning_tokens": 50},
  "cost": 250, // Example cost in credits
  "prompt_tokens": 100,
  "prompt_tokens_details": {"cached_tokens": 20},
  "total_tokens": 250
}
```
* **Performance**: Adds a few hundred milliseconds to the last response chunk while the cost is calculated.
* **Alternative**: Use the `/api/v1/generation?id=<GENERATION_ID>` endpoint for more detailed, and potentially more up-to-date, usage info after completion.
### 5. Other Important Information

#### 5.1. Rate Limits

* **General**: A function of the credits remaining on the API key/account. Typically 1 request per credit per second, up to a surge limit (e.g., 500 requests/s).
* **Free Models** (ID ending in `:free`):
  * {FREE_MODEL_RATE_LIMIT_RPM} requests per minute.
  * Daily limits:
    * Fewer than {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_NO_CREDITS_RPD} free requests/day.
    * At least {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_HAS_CREDITS_RPD} free requests/day.
* **DDoS Protection**: Cloudflare may block excessive requests.
* **Checking Limits**: `GET /api/v1/auth/key` (see section 3.5).
#### 5.2. Error Handling

* **Response Structure**:
```json
{
  "error": {
    "code": <HTTP_STATUS_CODE_INTEGER>,
    "message": "Error description.",
    "metadata": { /* Optional: provider_name, raw error, moderation reasons, etc. */ }
  }
}
```
* **HTTP Status Codes**:
  * `400 Bad Request`: Invalid params, CORS.
  * `401 Unauthorized`: Invalid API key, expired OAuth session.
  * `402 Payment Required`: Insufficient credits.
  * `403 Forbidden`: Input flagged by moderation; `metadata` contains `reasons` and `flagged_input`.
  * `408 Request Timeout`.
  * `429 Too Many Requests`: Rate limited.
  * `502 Bad Gateway`: Model down or invalid response from the provider; `metadata` may contain `provider_name` and the `raw` error.
  * `503 Service Unavailable`: No provider meets the routing requirements.
* **In-stream Errors**: For streaming, if an error occurs mid-generation, an error object can arrive as an SSE data event while the HTTP status remains `200 OK`.
#### 5.3. Streaming

* **Enable**: `"stream": true` in the request body.
* **Format**: Server-Sent Events (SSE).
  * Lines starting with `:` are comments (e.g., `: OPENROUTER PROCESSING`) and can be ignored.
  * Data events: `data: <JSON_CHUNK>\n\n`.
  * End of stream: `data: [DONE]\n\n`.
* **Stream Cancellation**: Aborting the HTTP connection can cancel processing and stop billing for providers that support it; provider support varies.
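A sketch of consuming the SSE stream with `requests`; the model slug is a placeholder, and a production client would typically use a dedicated SSE library.

```python
# Sketch: stream tokens from /chat/completions and print them as they arrive.
import json
import os
import requests

with requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",  # placeholder model slug
        "messages": [{"role": "user", "content": "Write a haiku."}],
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or line.startswith(":"):  # skip blanks and SSE comments
            continue
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk.get("choices"):  # the final usage chunk may have no choices
            delta = chunk["choices"][0]["delta"]
            print(delta.get("content") or "", end="", flush=True)
```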
#### 5.4. Prompt Caching

* **Purpose**: Save on inference costs by reusing the already-processed (cached) portion of identical prompt prefixes across requests.
* **Inspection**: Check `cache_discount` in the `/api/v1/generation` response, or `cached_tokens` in the `usage` object when `usage: {"include": true}` is set.
* **OpenAI**: Automated for prompts > 1024 tokens. No cost for writes; reads cost ~0.5x-0.75x the input price.
* **Anthropic Claude**: Requires `cache_control` breakpoints in message content. Writes cost {ANTHROPIC_CACHE_WRITE_MULTIPLIER}x input, reads {ANTHROPIC_CACHE_READ_MULTIPLIER}x input. The cache expires after ~5 minutes. Limit of 4 breakpoints.
```json
"content": [
  {"type": "text", "text": "HUGE TEXT BODY", "cache_control": {"type": "ephemeral"}}
]
```
* **DeepSeek**: Automated. Writes at the input price, reads at {DEEPSEEK_CACHE_READ_MULTIPLIER}x input.
* **Google Gemini**:
  * **Implicit Caching** (Gemini 2.5 Pro/Flash): Automatic; no write/storage cost. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x input. TTL ~3-5 minutes. Minimum tokens: {GOOGLE_CACHE_MIN_TOKENS_2_5_FLASH} (Flash), {GOOGLE_CACHE_MIN_TOKENS_2_5_PRO} (Pro).
  * **Explicit Caching** (other Gemini models): Requires `cache_control` breakpoints (similar to Anthropic, but OpenRouter uses only the last breakpoint for Gemini). Writes cost the input price plus 5 minutes of storage. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x input. TTL 5 minutes. Minimum ~4096 tokens for a write.
#### 5.5. Web Search

* **Purpose**: Augment prompts with real-time web search results, for any model.
* **Mechanism** (either of):
  1. **Model Variant**: Append `:online` to the model slug (e.g., `openai/gpt-4o:online`).
  2. **Plugin**:
```json
"plugins": [{
  "id": "web",
  "max_results": 5, // (integer, optional, default: 5)
  "search_prompt": "Custom prompt for incorporating results" // (string, optional)
}]
```
* **Powered by**: Exa.ai ("auto" method: keyword + embeddings).
* **Results Parsing**: Standardized in the `annotations` field of the assistant's message:
```json
"annotations": [{
  "type": "url_citation",
  "url_citation": {
    "url": "https://example.com/result",
    "title": "Search Result Title",
    "content": "Snippet of the result", // Added by OpenRouter if available
    "start_index": 10, "end_index": 25 // Indices into the assistant's content string
  }
}]
```
* **Plugin Pricing**: $4 per 1000 results from Exa (the default of 5 results works out to $0.02 per request, plus LLM usage).
* **Native Web Search (Non-plugin)**: Some models (OpenAI, Perplexity) have built-in search.
  * Control it with `web_search_options: {"search_context_size": "low|medium|high"}`.
  * Pricing varies by model and context size (e.g., OpenAI GPT-4o high: $50/1000 requests; Perplexity Sonar high: $12/1000 requests).

</docs>