Glenpl

openrouter api description for llm

May 13th, 2025
<docs>
## OpenRouter API Documentation for AI Consumption

This document outlines how to interact with the OpenRouter API via HTTP, providing access to a wide array of AI models through a unified interface.

### 1. Core Concepts

* **Unified API**: OpenRouter provides a single API endpoint (`https://openrouter.ai/api/v1`) to access numerous AI models from various providers. It handles model fallbacks and can optimize for cost or performance.
* **HTTP Interaction**: The primary way to use the API directly is via HTTP POST requests to specific endpoints.
* **OpenAI Compatibility**: The API schema is very similar to OpenAI's, making it a drop-in replacement in many cases. OpenRouter normalizes schemas across different providers.

### 2. Authentication

* **Method**: Bearer token.
* **Header**: `Authorization: Bearer <OPENROUTER_API_KEY>`
  * Replace `<OPENROUTER_API_KEY>` with your actual OpenRouter API key.
* **Optional Headers for Leaderboard/Analytics**:
  * `HTTP-Referer: <YOUR_SITE_URL>`: Your application's URL, for ranking on OpenRouter.ai.
  * `X-Title: <YOUR_SITE_NAME>`: Your application's name, for ranking on OpenRouter.ai.

### 3. Key API Endpoints

#### 3.1. Chat Completions

* **Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Request Body (JSON)**:
  * `model` (string, optional): The ID of the model to use (e.g., `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`). If unspecified, uses the user's default model. Find model IDs via the `/models` endpoint or the [Models Page](https://openrouter.ai/models).
  * `messages` (array of objects, required): A list of message objects describing the conversation history. Each message object has:
    * `role` (string): `system`, `user`, `assistant`, or `tool`.
    * `content` (string, or an array of content parts for the `user` role): The message content. For multimodal input (images, files), the `user` role's `content` can be an array of parts:
      * `{"type": "text", "text": "Your text prompt"}`
      * `{"type": "image_url", "image_url": {"url": "URL_OR_BASE64_DATA_URL", "detail": "auto|low|high"}}` (see section 4.3)
      * `{"type": "file", "file": {"filename": "document.pdf", "file_data": "BASE64_DATA_URL"}}` (see section 4.3)
    * `name` (string, optional): Identifier for the message sender, useful for the `tool` role or for distinguishing users/assistants.
    * `tool_call_id` (string, required for `role: "tool"`): The ID of the tool call this message is a result for.
    * `tool_calls` (array of objects, for `role: "assistant"` when tools are called): The tool calls generated by the model. (See section 4.2.)
  * `stream` (boolean, optional, default: `false`): If `true`, enables streaming responses via Server-Sent Events (SSE).
  * `temperature` (float, optional, default: 1.0, range: 0.0 to 2.0): Controls randomness. Lower is more deterministic.
  * `max_tokens` (integer, optional): Maximum number of tokens to generate.
  * `top_p` (float, optional, default: 1.0, range: 0.0 to 1.0): Nucleus sampling.
  * `top_k` (integer, optional, default: 0): Top-K sampling.
  * `frequency_penalty` (float, optional, default: 0.0, range: -2.0 to 2.0): Penalizes new tokens based on their existing frequency.
  * `presence_penalty` (float, optional, default: 0.0, range: -2.0 to 2.0): Penalizes new tokens based on whether they appear in the text so far.
  * `repetition_penalty` (float, optional, default: 1.0, range: 0.0 to 2.0): Penalizes repetition.
  * `stop` (string or array of strings, optional): Sequences at which the API stops generating further tokens.
  * `seed` (integer, optional): For deterministic sampling (if supported by the model).
  * `response_format` (object, optional): E.g., `{"type": "json_object"}` to enable JSON mode. (See section 4.1.)
  * `tools` (array of objects, optional): Definitions of tools the model can call. (See section 4.2.)
  * `tool_choice` (string or object, optional): Controls how the model uses tools (e.g., `auto`, `none`, or a specific function). (See section 4.2.)
  * `transforms` (array of strings, optional): E.g., `["middle-out"]`. (See section 4.4.)
  * `models` (array of strings, optional): List of fallback model IDs. (See section 4.5.)
  * `provider` (object, optional): Preferences for provider routing. (See section 4.5.)
  * `reasoning` (object, optional): Configuration for reasoning tokens. (See section 4.6.)
  * `usage` (object, optional): E.g., `{"include": true}`. (See section 4.7.)
  * `plugins` (array of objects, optional): For features like PDF parsing or web search.
    * E.g., `[{"id": "file-parser", "pdf": {"engine": "mistral-ocr"}}]`
    * E.g., `[{"id": "web", "max_results": 3, "search_prompt": "Relevant web results:"}]`
* **Response**:
  * Standard: JSON object with `id`, `choices`, `created`, `model`, `object`, and `usage` (when not streaming, or when `usage: {"include": true}` is set).
  * Streaming: A stream of Server-Sent Events. Each event is `data: <JSON_CHUNK>\n\n`. The stream ends with `data: [DONE]\n\n`. The last JSON chunk may contain the `usage` object if `usage: {"include": true}` was set.
  * `choices` (array): Contains the message object(s).
    * `message`:
      * `role`: `assistant`
      * `content`: Generated text.
      * `tool_calls` (array, optional): Present if the model decides to call tools.
      * `reasoning` (string, optional): Reasoning steps, if requested and provided by the model.
    * `delta` (for streaming): Contains the partial message update.
    * `finish_reason` (string): `stop`, `length`, `tool_calls`, `content_filter`, or `error`.
    * `native_finish_reason` (string): The raw finish reason from the provider.

#### 3.2. Completions (Legacy)

* **Endpoint**: `https://openrouter.ai/api/v1/completions`
* **Method**: `POST`
* **Content-Type**: `application/json`
* **Note**: This is for text-only (prompt-based) completions. Prefer `/chat/completions` for newer models and chat-based interactions.
* **Request Body**: Similar to `/chat/completions`, but uses `prompt` (string) instead of `messages`.

#### 3.3. List Available Models

* **Endpoint**: `https://openrouter.ai/api/v1/models`
* **Method**: `GET`
* **Response**: JSON array of model objects. Each object includes:
  * `id` (string): Model identifier (e.g., `openai/gpt-4o`).
  * `name` (string): Human-readable name.
  * `context_length` (integer): Maximum context window size.
  * `pricing` (object): Price per token for prompt and completion, per image, and per request.
    * `prompt` (string): Price per prompt token (USD).
    * `completion` (string): Price per completion token (USD).
    * `image` (string): Price per image (USD).
    * `request` (string): Price per request (USD).
  * `quantization` (string): E.g., `fp16`, `int8`.
  * `max_completion_tokens` (integer, optional).
  * `supported_parameters` (array of strings, optional): Lists parameters such as `tools` and `structured_outputs` if supported.

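As a sketch of working with the `/models` response, the snippet below filters a model list for entries that advertise `tools` support and sorts them by prompt price. The `example/...` entries and their prices are illustrative placeholders, not real models or pricing:

```python
# Illustrative /models entries (abbreviated; real objects carry more fields).
models = [
    {"id": "example/model-a", "context_length": 128000,
     "pricing": {"prompt": "0.0000025", "completion": "0.00001"},
     "supported_parameters": ["tools", "structured_outputs"]},
    {"id": "example/model-b", "context_length": 32000,
     "pricing": {"prompt": "0.0000005", "completion": "0.0000015"},
     "supported_parameters": ["tools"]},
    {"id": "example/model-c", "context_length": 8000,
     "pricing": {"prompt": "0.0000001", "completion": "0.0000002"},
     "supported_parameters": []},
]

def tool_capable(models: list[dict]) -> list[dict]:
    """Keep models that support tool calling, cheapest prompt price first."""
    capable = [m for m in models if "tools" in m.get("supported_parameters", [])]
    # Prices arrive as strings; convert for numeric sorting.
    return sorted(capable, key=lambda m: float(m["pricing"]["prompt"]))

cheapest = tool_capable(models)[0]["id"]
```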
#### 3.4. Get Generation Details (Cost/Stats)

* **Endpoint**: `https://openrouter.ai/api/v1/generation`
* **Method**: `GET`
* **Query Parameter**: `id=<GENERATION_ID>` (the `id` returned from a completion request).
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with detailed metadata, including native token counts and cost for the specified generation. This is more precise for billing than the `usage` object in completion responses.

#### 3.5. Check API Key Info (Rate Limits/Credits)

* **Endpoint**: `https://openrouter.ai/api/v1/auth/key` (also documented as `/api/v1/key`)
* **Method**: `GET`
* **Authentication**: Requires `Authorization: Bearer <OPENROUTER_API_KEY>`.
* **Response**: JSON object with:
  * `limit` (float or null): Credit limit for the key.
  * `usage` (float): Credits used.
  * `rate_limit` (object): `requests` (integer) and `interval` (string, e.g., `"10s"`).

#### 3.6. Manage API Keys (Provisioning)

* **Base Endpoint**: `https://openrouter.ai/api/v1/keys`
* **Authentication**: Requires a special *Provisioning API key* as the Bearer token.
* **Endpoints**:
  * `GET /keys`: List API keys (paginated with `offset`).
  * `POST /keys`: Create a new API key.
    * Body: `{"name": "KeyName", "label": "optional-label", "limit": 1000.00}` (`limit` is an optional credit limit).
  * `GET /keys/{hash}`: Get details of a specific key.
  * `PATCH /keys/{hash}`: Update a key (e.g., name, disabled status, limit).
  * `DELETE /keys/{hash}`: Delete a key.

### 4. Key Features and Topics via HTTP

#### 4.1. Structured Outputs

* **Purpose**: Enforce JSON Schema validation on model responses.
* **Mechanism**: Use the `response_format` parameter in the `/chat/completions` request.
```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "your_function_or_schema_name",
    "strict": true, // Recommended for exact schema adherence
    "schema": { // Your JSON Schema definition
      "type": "object",
      "properties": {
        "propertyName": {"type": "string", "description": "Details for AI"}
      },
      "required": ["propertyName"]
    }
  }
}
```
* **Model Support**: Check model details on the [Models Page](https://openrouter.ai/models) (filter by `supported_parameters: structured_outputs`) or use `provider: {"require_parameters": true}`. Supported by newer OpenAI models (GPT-4o and later) and Fireworks-provided models.
* **Best Practices**:
  * Include descriptive `description` fields in your schema properties to guide the model.
  * Set `strict: true` for precise schema adherence.
* **Streaming**: Supported. Partial JSON will be streamed.
* **Note**: `"response_format": {"type": "json_object"}` is also supported, for general JSON output without a specific schema.

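The `response_format` envelope can be assembled programmatically. The helper below is an illustrative sketch; the `weather` schema, its field names, and the model ID are hypothetical examples:

```python
def make_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a plain JSON Schema in the response_format envelope for structured outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }

# Hypothetical schema: extract a city and temperature from free text.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "Name of the city"},
        "temperature_c": {"type": "number", "description": "Temperature in Celsius"},
    },
    "required": ["city", "temperature_c"],
}

request_body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "It's 21 degrees in Paris today."}],
    "response_format": make_response_format("weather", weather_schema),
}
```

The descriptive `description` strings in the schema double as instructions to the model, per the best practices above.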
#### 4.2. Tool Calling (Function Calling)

* **Purpose**: Allow LLMs to request execution of external tools/functions. The LLM suggests the call; your application executes it and returns the result.
* **Mechanism (Request)**:
  * `tools` (array): Define available tools.
```json
"tools": [
  {
    "type": "function",
    "function": {
      "name": "your_function_name",
      "description": "What this function does.",
      "parameters": { // JSON Schema for function arguments
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Description of param1"}
        },
        "required": ["param1"]
      }
    }
  }
]
```
  * `tool_choice` (string or object, optional):
    * `"none"`: Model will not call a tool.
    * `"auto"`: Model decides whether to call a tool.
    * `{"type": "function", "function": {"name": "your_function_name"}}`: Force calling a specific tool.
    * `"required"`: (Supported by some models, e.g., newer OpenAI models.) Model must call one or more tools.
* **Mechanism (Handling Response & Follow-up)**:
  1. **Initial Request**: Send messages + `tools` (+ `tool_choice`).
  2. **Model Response (Assistant Turn)**: If the model decides to call a tool, `message.tool_calls` will be populated:
```json
"tool_calls": [
  {
    "id": "call_abc123", // Unique ID for this call
    "type": "function",
    "function": {
      "name": "your_function_name",
      "arguments": "{\"param1\": \"value1\"}" // Stringified JSON
    }
  }
]
```
     The `finish_reason` will be `tool_calls`.
  3. **Your Application**:
     * Parse `tool_calls[i].function.arguments`.
     * Execute the function `tool_calls[i].function.name` with these arguments.
     * Get the result from your function.
  4. **Follow-up Request**: Send a new request to `/chat/completions` including:
     * The original messages.
     * The assistant's message that contained the `tool_calls`.
     * A new message with `role: "tool"` for each tool call result:
```json
{
  "role": "tool",
  "tool_call_id": "call_abc123", // from tool_calls[i].id
  "name": "your_function_name", // from tool_calls[i].function.name
  "content": "{\"result\": \"your_function_output\"}" // Stringified JSON of the tool's output
}
```
  5. **Final Model Response**: The model uses the tool's output to generate a final response to the user.

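Steps 2-4 of this flow can be sketched without a network call. The snippet below takes a simulated assistant message containing `tool_calls`, dispatches each call to a local function, and builds the `role: "tool"` follow-up messages. The `get_weather` tool and the simulated assistant response are hypothetical:

```python
import json

# Hypothetical local tool implementation.
def get_weather(city: str) -> dict:
    return {"city": city, "temperature_c": 21}

TOOLS = {"get_weather": get_weather}

# Simulated assistant message, shaped like a response whose finish_reason is "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    }],
}

def run_tool_calls(message: dict) -> list[dict]:
    """Execute each requested tool and build the role:'tool' result messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as stringified JSON
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": call["function"]["name"],
            "content": json.dumps(fn(**args)),  # tool output goes back as stringified JSON
        })
    return results

tool_messages = run_tool_calls(assistant_message)
# The follow-up request's messages = original messages + [assistant_message] + tool_messages.
```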
#### 4.3. Images & PDFs

* **Endpoint**: `/api/v1/chat/completions`
* **Sending Images**:
  * Within the `messages` array, for a `user` role message, `content` becomes an array of parts.
  * **Image URL**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
```
  * **Base64-Encoded Image**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this local image:"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_STRING>"}}
  ]
}
```
  * Supported image types: `image/png`, `image/jpeg`, `image/webp`.
  * Multiple images can be sent as separate `image_url` content parts.
  * Recommendation: Send the text prompt before the images, or place the images in the system prompt if they must come first.
* **Sending PDFs**:
  * Works with **any** model on OpenRouter. If the model doesn't natively support files, OpenRouter parses the PDF.
  * Within the `messages` array, for a `user` role message, use a `content` array with a part of type `file`.
  * **Base64-Encoded PDF**:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Summarize this PDF:"},
    {
      "type": "file",
      "file": {
        "filename": "mydoc.pdf",
        "file_data": "data:application/pdf;base64,<BASE64_STRING>"
      }
    }
  ]
}
```
* **PDF Processing Engine Configuration (Optional)**: Use the `plugins` parameter in the request body:
```json
"plugins": [
  {
    "id": "file-parser",
    "pdf": {
      "engine": "mistral-ocr | pdf-text | native" // default: native, then pdf-text
    }
  }
]
```
  * `mistral-ocr`: Best for scanned/image-based PDFs (cost: $MISTRAL_OCR_COST per 1,000 pages).
  * `pdf-text`: Best for text-based PDFs (free).
  * `native`: Uses the model's native file processing if available (charged as input tokens).
* **Reusing PDF Annotations (Skip Reparsing Costs)**:
  1. Send the initial request with the PDF.
  2. The assistant's response message may contain `annotations`.
  3. In subsequent requests about the same PDF, include the original PDF data in the `user` message AND the assistant's previous message with its `annotations` in the conversation history. OpenRouter will use these to avoid reparsing.

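Building the base64 data URLs used above is mechanical. The sketch below encodes a local file into a data URL and wraps it in a `user` message with a `file` part; the demo writes a tiny placeholder file only so the snippet is self-contained (a real request would use an actual PDF):

```python
import base64
import os
import tempfile

def file_to_data_url(path: str, mime_type: str) -> str:
    """Read a local file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def pdf_message(path: str, prompt: str) -> dict:
    """Build a user message carrying a text prompt plus a base64-encoded PDF part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "file", "file": {
                "filename": os.path.basename(path),
                "file_data": file_to_data_url(path, "application/pdf"),
            }},
        ],
    }

# Demo with a tiny placeholder file standing in for a real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 placeholder")
    tmp_path = tmp.name
msg = pdf_message(tmp_path, "Summarize this PDF:")
os.unlink(tmp_path)
```

The same `file_to_data_url` helper works for images by passing `image/png`, `image/jpeg`, or `image/webp` as the MIME type.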
#### 4.4. Message Transforms

* **Purpose**: Optimize prompts that exceed a model's context window, especially when perfect recall isn't critical.
* **Mechanism**: Use the `transforms` parameter in the `/chat/completions` request.
```json
"transforms": ["middle-out"]
```
* **`middle-out` Behavior**:
  * Removes or truncates messages from the *middle* of the prompt history until it fits. This is based on research showing that LLMs pay less attention to the middle of the prompt.
  * Also handles message-count limits (e.g., Anthropic's {anthropicMaxMessagesCount}-message limit) by keeping half the messages from the start and half from the end.
  * OpenRouter tries models with at least half the required tokens; if none qualify, it uses the model with the largest context.
* **Default**: `middle-out` is the default for models with a context length of 8k or less. To disable it, send `transforms: []`.

#### 4.5. Uptime Optimization & Provider Routing

* **OpenRouter's Default Behavior**:
  * Monitors provider health (response times, error rates, availability).
  * Load-balances requests across providers, prioritizing price and stability.
  * Automatically falls back to other providers on errors/rate limits.
* **Customizing Provider Routing**: Use the `provider` object in the `/chat/completions` request.
```json
"provider": {
  "order": ["ProviderName1", "ProviderName2"], // Try providers in this specific order.
  "allow_fallbacks": true, // (boolean, default: true) Whether to use other providers if the ordered/primary ones fail.
  "require_parameters": false, // (boolean, default: false) If true, only route to providers supporting all request params.
  "data_collection": "allow", // ("allow" | "deny", default: "allow") Filter providers by data logging/training policies.
  "only": ["ProviderName1"], // Only use providers from this list.
  "ignore": ["ProviderNameToSkip"], // Skip providers from this list.
  "quantizations": ["fp16", "int8"], // Filter by quantization levels.
  "sort": "price | throughput | latency", // Explicitly sort providers by this metric (disables default load balancing).
  "max_price": {"prompt": 0.5, "completion": 1.5} // Max price in USD per million tokens. Can also include "request" and "image".
}
```
* **Model Variants (Shortcuts for Routing)**: Append to the model slug.
  * `:nitro` (e.g., `meta-llama/llama-3.1-70b-instruct:nitro`): Equivalent to `provider: {"sort": "throughput"}`.
  * `:floor` (e.g., `meta-llama/llama-3.1-70b-instruct:floor`): Equivalent to `provider: {"sort": "price"}`.
  * Other variants: `:free`, `:beta`, `:extended`, `:thinking`, `:online`.
* **Model Fallbacks with the `models` parameter**:
```json
"model": "primary/model-id",
"models": ["fallback/model1-id", "fallback/model2-id"]
```
  If `primary/model-id` fails, OpenRouter tries `fallback/model1-id`, then `fallback/model2-id`. The `model` field in the response indicates which model was actually used.

#### 4.6. Reasoning Tokens (Thinking Tokens)

* **Purpose**: Get insight into the model's step-by-step reasoning process. Reasoning tokens are charged as output tokens.
* **Availability**: Returned in the `reasoning` field of the assistant's message if the model supports and outputs them. Some models use reasoning internally but don't return the tokens.
* **Mechanism**: Use the `reasoning` parameter in the `/chat/completions` request.
```json
"reasoning": {
  // Choose one of these:
  "effort": "low | medium | high", // For OpenAI o-series style control.
  "max_tokens": 2000, // For Anthropic/Gemini style control (specific token limit).

  // Optional:
  "exclude": false // (boolean, default: false) If true, use reasoning internally but don't return it in the response.
}
```
* If `effort` is used with models that support `max_tokens`, it is converted (high ≈ 80%, medium ≈ 50%, low ≈ 20% of the `max_tokens` parameter if provided, else of the model's max completion tokens).
* If `max_tokens` (in `reasoning`) is used with models that support `effort`, it influences the effort level.
* **Legacy Parameters**:
  * `include_reasoning: true` is equivalent to `reasoning: {}`.
  * `include_reasoning: false` is equivalent to `reasoning: {"exclude": true}`.
* **Anthropic Specifics**:
  * You can use the `:thinking` variant (e.g., `anthropic/claude-3.5-sonnet:thinking`), which defaults to high effort.
  * `reasoning.max_tokens` for Anthropic is capped at 32,000 and has a minimum of 1,024.
  * The main `max_tokens` request parameter must be greater than the reasoning budget.

#### 4.7. Usage Accounting

* **Purpose**: Track token counts (prompt, completion, cached, reasoning) and cost directly in the API response without extra calls. Uses the model's native tokenizer.
* **Mechanism**: Use the `usage` parameter in the `/chat/completions` request.
```json
"usage": {
  "include": true
}
```
* **Response**:
  * **Non-streaming**: The main response object will contain a `usage` field.
  * **Streaming**: The final SSE data chunk (sent just before `data: [DONE]`) will contain the `usage` field. The `choices` array in this chunk will be empty.
```json
// Example usage object in a response
"usage": {
  "completion_tokens": 150,
  "completion_tokens_details": {"reasoning_tokens": 50},
  "cost": 250, // Example cost in the smallest credit unit or a standardized format
  "prompt_tokens": 100,
  "prompt_tokens_details": {"cached_tokens": 20},
  "total_tokens": 250
}
```
* **Performance**: Adds a few hundred milliseconds to the last response chunk for the calculation.
* **Alternative**: Use the `/api/v1/generation?id=<GENERATION_ID>` endpoint for more detailed and potentially more up-to-date usage info after completion.

### 5. Other Important Information

#### 5.1. Rate Limits

* **General**: A function of the credits remaining on the API key/account. Typically 1 request per credit per second, up to a surge limit (e.g., 500 req/s).
* **Free Models** (ID ending in `:free`):
  * {FREE_MODEL_RATE_LIMIT_RPM} requests per minute.
  * Daily limits:
    * Fewer than {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_NO_CREDITS_RPD} free requests/day.
    * At least {FREE_MODEL_CREDITS_THRESHOLD} credits purchased: {FREE_MODEL_HAS_CREDITS_RPD} free requests/day.
* **DDoS Protection**: Cloudflare may block excessive requests.
* **Checking Limits**: `GET /api/v1/auth/key` (see section 3.5).

#### 5.2. Error Handling

* **Response Structure**:
```json
{
  "error": {
    "code": <HTTP_STATUS_CODE_INTEGER>,
    "message": "Error description.",
    "metadata": { /* Optional: provider_name, raw error, reasons for moderation, etc. */ }
  }
}
```
* **HTTP Status Codes**:
  * `400 Bad Request`: Invalid params, CORS.
  * `401 Unauthorized`: Invalid API key, expired OAuth session.
  * `402 Payment Required`: Insufficient credits.
  * `403 Forbidden`: Input flagged by moderation. `metadata` will contain `reasons` and `flagged_input`.
  * `408 Request Timeout`.
  * `429 Too Many Requests`: Rate limited.
  * `502 Bad Gateway`: Model down or invalid response from the provider. `metadata` may contain `provider_name` and the `raw` error.
  * `503 Service Unavailable`: No provider meets the routing requirements.
* **In-stream Errors**: For streaming, if an error occurs mid-generation, an error object can be delivered as part of an SSE data event while the HTTP status remains `200 OK`.

#### 5.3. Streaming

* **Enable**: `"stream": true` in the request body.
* **Format**: Server-Sent Events (SSE).
  * Lines starting with `:` are comments (e.g., `: OPENROUTER PROCESSING`) and can be ignored.
  * Data events: `data: <JSON_CHUNK>\n\n`.
  * End of stream: `data: [DONE]\n\n`.
* **Stream Cancellation**: Aborting the HTTP connection can cancel processing for supported providers, stopping billing. Provider support varies.

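The SSE format above can be consumed with a small line parser. The sketch below skips comment lines, stops at the `[DONE]` sentinel, and yields each JSON chunk; the sample stream and its abbreviated chunks are illustrative:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON chunks from an SSE stream, skipping comments and [DONE]."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):  # blank lines and keep-alive comments
            continue
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":              # end-of-stream sentinel
                return
            yield json.loads(data)

# Illustrative stream (chunk contents abbreviated).
stream = [
    ": OPENROUTER PROCESSING",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(stream))
```

In a real client the same generator would be fed decoded lines from the HTTP response body as they arrive.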
#### 5.4. Prompt Caching

* **Purpose**: Save on inference costs by letting providers reuse processing for identical prompt segments across requests.
* **Inspection**: `cache_discount` in the `/api/v1/generation` response, or `cached_tokens` in the `usage` object if `usage: {"include": true}` is set.
* **OpenAI**: Automated for prompts > 1024 tokens. No cost for writes; reads cost ~0.5x-0.75x the input price.
* **Anthropic Claude**: Requires `cache_control` breakpoints in the message content. Writes cost {ANTHROPIC_CACHE_WRITE_MULTIPLIER}x the input price, reads {ANTHROPIC_CACHE_READ_MULTIPLIER}x. The cache expires after ~5 minutes. Limit of 4 breakpoints.
```json
"content": [
  {"type": "text", "text": "HUGE TEXT BODY", "cache_control": {"type": "ephemeral"}}
]
```
* **DeepSeek**: Automated. Writes at the input price, reads at {DEEPSEEK_CACHE_READ_MULTIPLIER}x the input price.
* **Google Gemini**:
  * **Implicit Caching** (Gemini 2.5 Pro/Flash): Automatic. No write/storage cost. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x the input price. TTL ~3-5 minutes. Minimum tokens: {GOOGLE_CACHE_MIN_TOKENS_2_5_FLASH} (Flash), {GOOGLE_CACHE_MIN_TOKENS_2_5_PRO} (Pro).
  * **Explicit Caching** (other Gemini models): Requires `cache_control` breakpoints (similar to Anthropic, but OpenRouter uses only the last breakpoint for Gemini). Writes cost the input price plus 5 minutes of storage. Reads cost {GOOGLE_CACHE_READ_MULTIPLIER}x the input price. TTL 5 minutes. Minimum ~4096 tokens for a write.

#### 5.5. Web Search

* **Purpose**: Augment prompts with real-time web search results, for any model.
* **Mechanism**:
  1. **Model Variant**: Append `:online` to the model slug (e.g., `openai/gpt-4o:online`).
  2. **Plugin**:
```json
"plugins": [{
  "id": "web",
  "max_results": 5, // (integer, optional, default: 5)
  "search_prompt": "Custom prompt for incorporating results" // (string, optional)
}]
```
* **Powered by**: Exa.ai ("auto" method: keyword + embeddings).
* **Results Parsing**: Standardized in the `annotations` field of the assistant's message.
```json
"annotations": [{
  "type": "url_citation",
  "url_citation": {
    "url": "https://example.com/result",
    "title": "Search Result Title",
    "content": "Snippet of the result", // Added by OpenRouter if available
    "start_index": 10, "end_index": 25 // Indices into the assistant's content string
  }
}]
```
* **Plugin Pricing**: $4 per 1,000 results from Exa (the default of 5 results = $0.02 per request, plus LLM usage).
* **Native Web Search (Non-plugin)**: Some models (OpenAI, Perplexity) have built-in search.
  * Control it with `web_search_options: {"search_context_size": "low|medium|high"}`.
  * Pricing varies by model and context size (e.g., OpenAI GPT-4o high: $50/1,000 requests; Perplexity Sonar high: $12/1,000 requests).

</docs>