Glenpl

openrouter api description for llm

May 13th, 2025
<docs>
## OpenRouter API Documentation for AI Consumption

This document outlines how to interact with the OpenRouter API via HTTP, providing access to a wide array of AI models through a unified interface.

### 1. Core Concepts

* **Unified API**: OpenRouter provides a single API endpoint (`https://openrouter.ai/api/v1`) to access numerous AI models from various providers. It handles model fallbacks and can optimize for cost or performance.
* **HTTP Interaction**: The primary way to use the API directly is via HTTP POST requests to specific endpoints.
* **OpenAI Compatibility**: The API schema is very similar to OpenAI's, making it a drop-in replacement in many cases. OpenRouter normalizes schemas across different providers.

### 2. Authentication

* **Method**: Bearer token.
* **Header**: `Authorization: Bearer <OPENROUTER_API_KEY>`
  * Replace `<OPENROUTER_API_KEY>` with your actual OpenRouter API key.
* **Optional Headers for Leaderboard/Analytics**:
  * `HTTP-Referer: <YOUR_SITE_URL>`: Your application's URL, for ranking on OpenRouter.ai.
  * `X-Title: <YOUR_SITE_NAME>`: Your application's name, for ranking on OpenRouter.ai.

### 3. Key API Endpoints

#### 3.1. Chat Completions

* **Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Request Body (JSON)**:
  * `model` (string, optional): The ID of the model to use (e.g., `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`). If unspecified, uses the user's default model. Find model IDs via the `/models` endpoint or the [Models Page](https://openrouter.ai/models).
  * `messages` (array of objects, required): A list of message objects describing the conversation history. Each message object has:
    * `role` (string): `system`, `user`, `assistant`, or `tool`.
    * `content` (string, or an array of content parts for the `user` role): The message content. For multimodal input (images, files), the `user` role's `content` can be an array of parts:
      * `{"type": "text", "text": "Your text prompt"}`
      * `{"type": "image_url", "image_url": {"url": "URL_OR_BASE64_DATA_URL", "detail": "auto|low|high"}}` (see section 4.3)
      * `{"type": "file", "file": {"filename": "document.pdf", "file_data": "BASE64_DATA_URL"}}` (see section 4.3)
    * `name` (string, optional): Identifier for the message sender, useful for the `tool` role or for distinguishing users/assistants.
    * `tool_call_id` (string, required for `role: "tool"`): The ID of the tool call this message is a result for.
    * `tool_calls` (array of objects, for `role: "assistant"` when tools are called): The tool calls generated by the model. (See section 4.2.)
  * `stream` (boolean, optional, default: `false`): If `true`, enables streaming responses via Server-Sent Events (SSE).
  * `temperature` (float, optional, default: 1.0, range: 0.0 to 2.0): Controls randomness. Lower is more deterministic.
  * `max_tokens` (integer, optional): Maximum number of tokens to generate.
  * `top_p` (float, optional, default: 1.0, range: 0.0 to 1.0): Nucleus sampling.
  * `top_k` (integer, optional, default: 0): Top-K sampling.
  * `frequency_penalty` (float, optional, default: 0.0, range: -2.0 to 2.0): Penalizes new tokens based on their existing frequency.
  * `presence_penalty` (float, optional, default: 0.0, range: -2.0 to 2.0): Penalizes new tokens based on whether they appear in the text so far.
  * `repetition_penalty` (float, optional, default: 1.0, range: 0.0 to 2.0): Penalizes repetition.
  * `stop` (string or array of strings, optional): Sequences at which the API stops generating further tokens.
  * `seed` (integer, optional): For deterministic sampling (if supported by the model).
  * `response_format` (object, optional): E.g., `{"type": "json_object"}` to enable JSON mode. (See section 4.1.)
  * `tools` (array of objects, optional): Definitions of tools the model can call. (See section 4.2.)
  * `tool_choice` (string or object, optional): Controls how the model uses tools (e.g., `auto`, `none`, or a specific function). (See section 4.2.)
  * `transforms` (array of strings, optional): E.g., `["middle-out"]`. (See section 4.4.)
  * `models` (array of strings, optional): List of fallback model IDs. (See section 4.5.)
  * `provider` (object, optional): Preferences for provider routing. (See section 4.5.)
  * `reasoning` (object, optional): Configuration for reasoning tokens. (See section 4.6.)
  * `usage` (object, optional): E.g., `{"include": true}`. (See section 4.7.)
  * `plugins` (array of objects, optional): For features like PDF parsing or web search.
    * E.g., `[{"id": "file-parser", "pdf": {"engine": "mistral-ocr"}}]`
    * E.g., `[{"id": "web", "max_results": 3, "search_prompt": "Relevant web results:"}]`
* **Response**:
  * Standard: JSON object with `id`, `choices`, `created`, `model`, `object`, and `usage` (when not streaming, or when `usage: {"include": true}` is set).
  * Streaming: A stream of Server-Sent Events. Each event is `data: <JSON_CHUNK>\n\n`. The stream ends with `data: [DONE]\n\n`. The last JSON chunk may contain the `usage` object if `usage: {"include": true}` was set.
  * `choices` (array): Contains the message object(s).
    * `message`:
      * `role`: `assistant`
      * `content`: Generated text.
      * `tool_calls` (array, optional): Present if the model decides to call tools.
      * `reasoning` (string, optional): Reasoning steps, if requested and provided by the model.
    * `delta` (for streaming): Contains the partial message update.
    * `finish_reason` (string): `stop`, `length`, `tool_calls`, `content_filter`, or `error`.
    * `native_finish_reason` (string): The raw finish reason from the provider.

#### 3.2. Completions (Legacy)

* **Endpoint**: `https://openrouter.ai/api/v1/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Note**: This is for text-only (prompt-based) completions. Prefer `/chat/completions` for newer models and chat-based interactions.
* **Request Body**: Similar to `/chat/completions`, but uses `prompt` (string) instead of `messages`.

#### 3.3. List Available Models

* **Endpoint**: `https://openrouter.ai/api/v1/models`
* **Method**: `GET`
* **Response**: JSON array of model objects. Each object includes:
  * `id` (string): Model identifier (e.g., `openai/gpt-4o`).
  * `name` (string): Human-readable name.
  * `context_length` (integer): Maximum context window size.
  * `pricing` (object): Price per token for prompt and completion, per image, and per request.
    * `prompt` (string): Price per prompt token (USD).
    * `completion` (string): Price per completion token (USD).
    * `image` (string): Price per image (USD).
    * `request` (string): Price per request (USD).
  * `quantization` (string): E.g., `fp16`, `int8`.
  * `max_completion_tokens` (integer, optional).
  * `supported_parameters` (array of strings, optional): Lists parameters such as `tools` and `structured_outputs` if supported.

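As a sketch of working with the `/models` response, the snippet below filters a model list for entries that advertise `tools` support and sorts them by prompt price. The `example/...` entries and their prices are illustrative placeholders, not real models or pricing:

```python
# Illustrative /models entries (abbreviated; real objects carry more fields).
models = [
    {"id": "example/model-a", "context_length": 128000,
     "pricing": {"prompt": "0.0000025", "completion": "0.00001"},
     "supported_parameters": ["tools", "structured_outputs"]},
    {"id": "example/model-b", "context_length": 32000,
     "pricing": {"prompt": "0.0000005", "completion": "0.0000015"},
     "supported_parameters": ["tools"]},
    {"id": "example/model-c", "context_length": 8000,
     "pricing": {"prompt": "0.0000001", "completion": "0.0000002"},
     "supported_parameters": []},
]

def tool_capable(models: list[dict]) -> list[dict]:
    """Keep models that support tool calling, cheapest prompt price first."""
    capable = [m for m in models if "tools" in m.get("supported_parameters", [])]
    # Prices arrive as strings; convert for numeric sorting.
    return sorted(capable, key=lambda m: float(m["pricing"]["prompt"]))

cheapest = tool_capable(models)[0]["id"]
```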
#### 3.4. Get Generation Details (Cost/Stats)

* **Endpoint**: `https://openrouter.ai/api/v1/generation`
* **Method**: `GET`
* **Query Parameter**: `id=<GENERATION_ID>` (the `id` returned from a completion request).
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with detailed metadata, including native token counts and cost for the specified generation. This is more precise for billing than the `usage` object in completion responses.

#### 3.5. Check API Key Info (Rate Limits/Credits)

* **Endpoint**: `https://openrouter.ai/api/v1/auth/key` (also documented as `/api/v1/key`)
* **Method**: `GET`
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with:
  * `limit` (float or null): Credit limit for the key.
  * `usage` (float): Credits used.
  * `rate_limit` (object): `requests` (integer) and `interval` (string, e.g., `"10s"`).

#### 3.6. Manage API Keys (Provisioning)

* **Base Endpoint**: `https://openrouter.ai/api/v1/keys`
* **Authentication**: Requires a special *Provisioning API key* as the Bearer token.
* **Endpoints**:
  * `GET /keys`: List API keys (paginated with `offset`).
  * `POST /keys`: Create a new API key.
    * Body: `{"name": "KeyName", "label": "optional-label", "limit": 1000.00}` (`limit` is an optional credit limit).
  * `GET /keys/{hash}`: Get details of a specific key.
  * `PATCH /keys/{hash}`: Update a key (e.g., name, disabled status, limit).
  * `DELETE /keys/{hash}`: Delete a key.

### 4. Key Features and Topics via HTTP

#### 4.1. Structured Outputs

* **Purpose**: Enforce JSON Schema validation on model responses.
* **Mechanism**: Use the `response_format` parameter in the `/chat/completions` request.
```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "your_function_or_schema_name",
    "strict": true, // Recommended for exact schema adherence
    "schema": { // Your JSON Schema definition
      "type": "object",
      "properties": {
        "propertyName": {"type": "string", "description": "Details for AI"}
      },
      "required": ["propertyName"]
    }
  }
}
```
* **Model Support**: Check model details on the [Models Page](https://openrouter.ai/models) (filter by `supported_parameters: structured_outputs`) or use `provider: {"require_parameters": true}`. Supported by newer OpenAI models (GPT-4o and later) and Fireworks-provided models.
* **Best Practices**:
  * Include descriptive `description` fields in your schema properties to guide the model.
  * Set `strict: true` for precise schema adherence.
* **Streaming**: Supported. Partial JSON will be streamed.
* **Note**: `"response_format": {"type": "json_object"}` is also supported, for general JSON output without a specific schema.

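The `response_format` envelope can be assembled programmatically. The helper below is an illustrative sketch; the `weather` schema, its field names, and the model ID are hypothetical examples:

```python
def make_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a plain JSON Schema in the response_format envelope for structured outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }

# Hypothetical schema: extract a city and temperature from free text.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "Name of the city"},
        "temperature_c": {"type": "number", "description": "Temperature in Celsius"},
    },
    "required": ["city", "temperature_c"],
}

request_body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "It's 21 degrees in Paris today."}],
    "response_format": make_response_format("weather", weather_schema),
}
```

The descriptive `description` strings in the schema double as instructions to the model, per the best practices above.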
#### 4.2. Tool Calling (Function Calling)

* **Purpose**: Allow LLMs to request execution of external tools/functions. The LLM suggests the call; your application executes it and returns the result.
* **Mechanism (Request)**:
  * `tools` (array): Define available tools.
```json
"tools": [
  {
    "type": "function",
    "function": {
      "name": "your_function_name",
      "description": "What this function does.",
      "parameters": { // JSON Schema for function arguments
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Description of param1"}
        },
        "required": ["param1"]
      }
    }
  }
]
```
  * `tool_choice` (string or object, optional):
    * `"none"`: Model will not call a tool.
    * `"auto"`: Model decides whether to call a tool.
    * `{"type": "function", "function": {"name": "your_function_name"}}`: Force calling a specific tool.
    * `"required"`: (Supported by some models, e.g., newer OpenAI models.) Model must call one or more tools.
* **Mechanism (Handling Response & Follow-up)**:
  1. **Initial Request**: Send messages + `tools` (+ `tool_choice`).
  2. **Model Response (Assistant Turn)**: If the model decides to call a tool, `message.tool_calls` will be populated:
```json
"tool_calls": [
  {
    "id": "call_abc123", // Unique ID for this call
    "type": "function",
    "function": {
      "name": "your_function_name",
      "arguments": "{\"param1\": \"value1\"}" // Stringified JSON
    }
  }
]
```
     The `finish_reason` will be `tool_calls`.
  3. **Your Application**:
     * Parse `tool_calls[i].function.arguments`.
     * Execute the function `tool_calls[i].function.name` with these arguments.
     * Get the result from your function.
  4. **Follow-up Request**: Send a new request to `/chat/completions` including:
     * The original messages.
     * The assistant's message that contained the `tool_calls`.
     * A new message with `role: "tool"` for each tool call result:
```json
{
  "role": "tool",
  "tool_call_id": "call_abc123", // from tool_calls[i].id
  "name": "your_function_name", // from tool_calls[i].function.name
  "content": "{\"result\": \"your_function_output\"}" // Stringified JSON of the tool's output
}
```
  5. **Final Model Response**: The model uses the tool's output to generate a final response to the user.

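Steps 2-4 of this flow can be sketched without a network call. The snippet below takes a simulated assistant message containing `tool_calls`, dispatches each call to a local function, and builds the `role: "tool"` follow-up messages. The `get_weather` tool and the simulated assistant response are hypothetical:

```python
import json

# Hypothetical local tool implementation.
def get_weather(city: str) -> dict:
    return {"city": city, "temperature_c": 21}

TOOLS = {"get_weather": get_weather}

# Simulated assistant message, shaped like a response whose finish_reason is "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    }],
}

def run_tool_calls(message: dict) -> list[dict]:
    """Execute each requested tool and build the role:'tool' result messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as stringified JSON
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": call["function"]["name"],
            "content": json.dumps(fn(**args)),  # tool output goes back as stringified JSON
        })
    return results

tool_messages = run_tool_calls(assistant_message)
# The follow-up request's messages = original messages + [assistant_message] + tool_messages.
```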
#### 4.3. Images & PDFs

* **Endpoint**: `/api/v1/chat/completions`
* **Sending Images**:
  * Within the `messages` array, for a `user` role message, `content` becomes an array of parts.
  * **Image URL**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
```
  * **Base64-Encoded Image**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this local image:"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_STRING>"}}
  ]
}
```
  * Supported image types: `image/png`, `image/jpeg`, `image/webp`.
  * Multiple images can be sent as separate `image_url` content parts.
  * Recommendation: Send the text prompt before the images, or place the images in the system prompt if they must come first.
* **Sending PDFs**:
  * Works with **any** model on OpenRouter. If the model doesn't natively support files, OpenRouter parses the PDF.
  * Within the `messages` array, for a `user` role message, use a `content` array with a part of type `file`.
  * **Base64-Encoded PDF**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Summarize this PDF:"},
    {
      "type": "file",
      "file": {
        "filename": "mydoc.pdf",
        "file_data": "data:application/pdf;base64,<BASE64_STRING>"
      }
    }
  ]
}
```
* **PDF Processing Engine Configuration (Optional)**: Use the `plugins` parameter in the request body:
```json
"plugins": [
  {
    "id": "file-parser",
    "pdf": {
      "engine": "mistral-ocr | pdf-text | native" // default: native, then pdf-text
    }
  }
]
```
  * `mistral-ocr`: Best for scanned/image-based PDFs (cost: $MISTRAL_OCR_COST per 1,000 pages).
  * `pdf-text`: Best for text-based PDFs (free).
  * `native`: Uses the model's native file processing if available (charged as input tokens).
* **Reusing PDF Annotations (Skip Reparsing Costs)**:
  1. Send the initial request with the PDF.
  2. The assistant's response message may contain `annotations`.
  3. In subsequent requests about the same PDF, include the original PDF data in the `user` message AND the assistant's previous message with its `annotations` in the conversation history. OpenRouter will use these to avoid reparsing.

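Building the base64 data URLs used above is mechanical. The sketch below encodes a local file into a data URL and wraps it in a `user` message with a `file` part; the demo writes a tiny placeholder file only so the snippet is self-contained (a real request would use an actual PDF):

```python
import base64
import os
import tempfile

def file_to_data_url(path: str, mime_type: str) -> str:
    """Read a local file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def pdf_message(path: str, prompt: str) -> dict:
    """Build a user message carrying a text prompt plus a base64-encoded PDF part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "file", "file": {
                "filename": os.path.basename(path),
                "file_data": file_to_data_url(path, "application/pdf"),
            }},
        ],
    }

# Demo with a tiny placeholder file standing in for a real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 placeholder")
    tmp_path = tmp.name
msg = pdf_message(tmp_path, "Summarize this PDF:")
os.unlink(tmp_path)
```

The same `file_to_data_url` helper works for images by passing `image/png`, `image/jpeg`, or `image/webp` as the MIME type.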
#### 4.4. Message Transforms

* **Purpose**: Optimize prompts that exceed a model's context window, especially when perfect recall isn't critical.
* **Mechanism**: Use the `transforms` parameter in the `/chat/completions` request.
```json
"transforms": ["middle-out"]
```
* **`middle-out` Behavior**:
  * Removes or truncates messages from the *middle* of the prompt history until it fits. This is based on research showing that LLMs pay less attention to the middle of the prompt.
  * Also handles message-count limits (e.g., Anthropic's {anthropicMaxMessagesCount}-message limit) by keeping half the messages from the start and half from the end.
  * OpenRouter tries models with at least half the required tokens; if none qualify, it uses the model with the largest context.
* **Default**: `middle-out` is the default for models with a context length of 8k or less. To disable it, send `transforms: []`.

#### 4.5. Uptime Optimization & Provider Routing

* **OpenRouter's Default Behavior**:
  * Monitors provider health (response times, error rates, availability).
  * Load-balances requests across providers, prioritizing price and stability.
  * Automatically falls back to other providers on errors/rate limits.
* **Customizing Provider Routing**: Use the `provider` object in the `/chat/completions` request.
```json
"provider": {
  "order": ["ProviderName1", "ProviderName2"], // Try providers in this specific order.
  "allow_fallbacks": true, // (boolean, default: true) Whether to use other providers if the ordered/primary ones fail.
  "require_parameters": false, // (boolean, default: false) If true, only route to providers supporting all request params.
  "data_collection": "allow", // ("allow" | "deny", default: "allow") Filter providers by data logging/training policies.
  "only": ["ProviderName1"], // Only use providers from this list.
  "ignore": ["ProviderNameToSkip"], // Skip providers from this list.
  "quantizations": ["fp16", "int8"], // Filter by quantization levels.
  "sort": "price | throughput | latency", // Explicitly sort providers by this metric (disables default load balancing).
  "max_price": {"prompt": 0.5, "completion": 1.5} // Max price in USD per million tokens. Can also include "request" and "image".
}
```
* **Model Variants (Shortcuts for Routing)**: Append to the model slug.
  * `:nitro` (e.g., `meta-llama/llama-3.1-70b-instruct:nitro`): Equivalent to `provider: {"sort": "throughput"}`.
  * `:floor` (e.g., `meta-llama/llama-3.1-70b-instruct:floor`): Equivalent to `provider: {"sort": "price"}`.
  * Other variants: `:free`, `:beta`, `:extended`, `:thinking`, `:online`.
* **Model Fallbacks with the `models` parameter**:
```json
"model": "primary/model-id",
"models": ["fallback/model1-id", "fallback/model2-id"]
```
  If `primary/model-id` fails, OpenRouter tries `fallback/model1-id`, then `fallback/model2-id`. The `model` field in the response indicates which model was actually used.

#### 4.6. Reasoning Tokens (Thinking Tokens)

* **Purpose**: Get insight into the model's step-by-step reasoning process. Reasoning tokens are charged as output tokens.
* **Availability**: Returned in the `reasoning` field of the assistant's message if the model supports and outputs them. Some models use reasoning internally but don't return the tokens.
* **Mechanism**: Use the `reasoning` parameter in the `/chat/completions` request.
```json
"reasoning": {
  // Choose one of these:
  "effort": "low | medium | high", // For OpenAI o-series style control.
  "max_tokens": 2000, // For Anthropic/Gemini style control (specific token limit).

  // Optional:
  "exclude": false // (boolean, default: false) If true, use reasoning internally but don't return it in the response.
}
```
* If `effort` is used with models that support `max_tokens`, it is converted (high ≈ 80%, medium ≈ 50%, low ≈ 20% of the `max_tokens` parameter if provided, else of the model's max completion tokens).
* If `max_tokens` (in `reasoning`) is used with models that support `effort`, it influences the effort level.
* **Legacy Parameters**:
  * `include_reasoning: true` is equivalent to `reasoning: {}`.
  * `include_reasoning: false` is equivalent to `reasoning: {"exclude": true}`.
* **Anthropic Specifics**:
  * You can use the `:thinking` variant (e.g., `anthropic/claude-3.5-sonnet:thinking`), which defaults to high effort.
  * `reasoning.max_tokens` for Anthropic is capped at 32,000 and has a minimum of 1,024.
  * The main `max_tokens` request parameter must be greater than the reasoning budget.

#### 4.7. Usage Accounting

* **Purpose**: Track token counts (prompt, completion, cached, reasoning) and cost directly in the API response without extra calls. Uses the model's native tokenizer.
* **Mechanism**: Use the `usage` parameter in the `/chat/completions` request.
```json
"usage": {
  "include": true
}
```
* **Response**:
  * **Non-streaming**: The main response object will contain a `usage` field.
  * **Streaming**: The final SSE data chunk (sent just before `data: [DONE]`) will contain the `usage` field. The `choices` array in this chunk will be empty.
```json
// Example usage object in a response
"usage": {
  "completion_tokens": 150,
  "completion_tokens_details": {"reasoning_tokens": 50},
  "cost": 250, // Example cost in the smallest credit unit or a standardized format
  "prompt_tokens": 100,
  "prompt_tokens_details": {"cached_tokens": 20},
  "total_tokens": 250
}
```
* **Performance**: Adds a few hundred milliseconds to the last response chunk for the calculation.
* **Alternative**: Use the `/api/v1/generation?id=<GENERATION_ID>` endpoint for more detailed and potentially more up-to-date usage info after completion.

### 5. Other Important Information

#### 5.1. Rate Limits

* **General**: A function of the credits remaining on the API key/account. Typically 1 request per credit per second, up to a surge limit (e.g., 500 req/s).
* **Free Models** (ID ending in `:free`):
  * {FREE_MODEL_RATE_LIMIT_RPM} requests per minute.
  * Daily limits:
    * Fewer than {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_NO_CREDITS_RPD} free requests/day.
    * At least {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_HAS_CREDITS_RPD} free requests/day.
* **DDoS Protection**: Cloudflare may block excessive requests.
* **Checking Limits**: `GET /api/v1/auth/key` (see section 3.5).

#### 5.2. Error Handling

* **Response Structure**:
```json
{
  "error": {
    "code": <HTTP_STATUS_CODE_INTEGER>,
    "message": "Error description.",
    "metadata": { /* Optional: provider_name, raw error, reasons for moderation, etc. */ }
  }
}
```
* **HTTP Status Codes**:
  * `400 Bad Request`: Invalid params, CORS.
  * `401 Unauthorized`: Invalid API key, expired OAuth session.
  * `402 Payment Required`: Insufficient credits.
  * `403 Forbidden`: Input flagged by moderation. `metadata` will contain `reasons` and `flagged_input`.
  * `408 Request Timeout`.
  * `429 Too Many Requests`: Rate limited.
  * `502 Bad Gateway`: Model down or invalid response from the provider. `metadata` may contain `provider_name` and the `raw` error.
  * `503 Service Unavailable`: No provider meets the routing requirements.
* **In-stream Errors**: For streaming, if an error occurs mid-generation, an error object can be delivered as part of an SSE data event while the HTTP status remains `200 OK`.

#### 5.3. Streaming

* **Enable**: `"stream": true` in the request body.
* **Format**: Server-Sent Events (SSE).
  * Lines starting with `:` are comments (e.g., `: OPENROUTER PROCESSING`) and can be ignored.
  * Data events: `data: <JSON_CHUNK>\n\n`.
  * End of stream: `data: [DONE]\n\n`.
* **Stream Cancellation**: Aborting the HTTP connection can cancel processing for supported providers, stopping billing. Provider support varies.

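The SSE format above can be consumed with a small line parser. The sketch below skips comment lines, stops at the `[DONE]` sentinel, and yields each JSON chunk; the sample stream and its abbreviated chunks are illustrative:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON chunks from an SSE stream, skipping comments and [DONE]."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):  # blank lines and keep-alive comments
            continue
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":              # end-of-stream sentinel
                return
            yield json.loads(data)

# Illustrative stream (chunk contents abbreviated).
stream = [
    ": OPENROUTER PROCESSING",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(stream))
```

In a real client the same generator would be fed decoded lines from the HTTP response body as they arrive.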
#### 5.4. Prompt Caching

* **Purpose**: Save on inference costs by letting providers reuse processing for identical prompt segments across requests.
* **Inspection**: `cache_discount` in the `/api/v1/generation` response, or `cached_tokens` in the `usage` object if `usage: {"include": true}` is set.
* **OpenAI**: Automated for prompts > 1024 tokens. No cost for writes; reads cost ~0.5x-0.75x the input price.
* **Anthropic Claude**: Requires `cache_control` breakpoints in the message content. Writes cost {ANTHROPIC_CACHE_WRITE_MULTIPLIER}x the input price, reads {ANTHROPIC_CACHE_READ_MULTIPLIER}x. The cache expires after ~5 minutes. Limit of 4 breakpoints.
```json
"content": [
  {"type": "text", "text": "HUGE TEXT BODY", "cache_control": {"type": "ephemeral"}}
]
```
* **DeepSeek**: Automated. Writes at the input price, reads at {DEEPSEEK_CACHE_READ_MULTIPLIER}x the input price.
* **Google Gemini**:
  * **Implicit Caching** (Gemini 2.5 Pro/Flash): Automatic. No write/storage cost. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x the input price. TTL ~3-5 minutes. Minimum tokens: {GOOGLE_CACHE_MIN_TOKENS_2_5_FLASH} (Flash), {GOOGLE_CACHE_MIN_TOKENS_2_5_PRO} (Pro).
  * **Explicit Caching** (other Gemini models): Requires `cache_control` breakpoints (similar to Anthropic, but OpenRouter uses only the last breakpoint for Gemini). Writes cost the input price plus 5 minutes of storage. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x the input price. TTL 5 minutes. Minimum ~4096 tokens for a write.

#### 5.5. Web Search

* **Purpose**: Augment prompts with real-time web search results, for any model.
* **Mechanism**:
  1. **Model Variant**: Append `:online` to the model slug (e.g., `openai/gpt-4o:online`).
  2. **Plugin**:
```json
"plugins": [{
  "id": "web",
  "max_results": 5, // (integer, optional, default: 5)
  "search_prompt": "Custom prompt for incorporating results" // (string, optional)
}]
```
* **Powered by**: Exa.ai ("auto" method: keyword + embeddings).
* **Results Parsing**: Standardized in the `annotations` field of the assistant's message.
```json
"annotations": [{
  "type": "url_citation",
  "url_citation": {
    "url": "https://example.com/result",
    "title": "Search Result Title",
    "content": "Snippet of the result", // Added by OpenRouter if available
    "start_index": 10, "end_index": 25 // Indices into the assistant's content string
  }
}]
```
* **Plugin Pricing**: $4 per 1,000 results from Exa (the default of 5 results = $0.02 per request, plus LLM usage).
* **Native Web Search (Non-plugin)**: Some models (OpenAI, Perplexity) have built-in search.
  * Control it with `web_search_options: {"search_context_size": "low|medium|high"}`.
  * Pricing varies by model and context size (e.g., OpenAI GPT-4o high: $50/1,000 requests; Perplexity Sonar high: $12/1,000 requests).

</docs>