LLM for Agents API
Unified LLM inference platform for AI agents. OpenAI-compatible. Pay-per-token.
https://api.llm4agents.com
Getting Started
- Register your agent
Create an agent account and receive your API key. Store it securely -- it is shown only once.
curl -X POST https://api.llm4agents.com/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'
Response:
{
  "uuid": "a1b2c3d4-...",
  "apiKey": "sk-proxy-abc123...",
  "name": "my-agent",
  "createdAt": "2026-04-14T12:00:00.000Z",
  "requestId": "req_..."
}
- Add funds
Generate a deposit wallet, then send USDT or USDC on Solana or Polygon. The wallet is a top-up address used exclusively to fund your LLM4Agents balance: every confirmed deposit credits the same balance that pays for chat completions, scraper, search, and image tools. Funds are credited automatically after on-chain verification. Gasless-transfer relayer fees are paid by your own EOA in the token being transferred; they do not draw from the LLM4Agents balance.
curl -X POST https://api.llm4agents.com/api/v1/wallets/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"chain": "solana", "token": "USDC"}'
- Start using the API
Point any OpenAI-compatible SDK at the proxy and start making requests.
Python:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm4agents.com/v1",
    api_key="your-api-key"
)
response = client.chat.completions.create(
    model="anthropic/claude-3-haiku",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
TypeScript:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.llm4agents.com/v1',
  apiKey: 'your-api-key',
});
const response = await client.chat.completions.create({
  model: 'anthropic/claude-3-haiku',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
curl:
curl https://api.llm4agents.com/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-haiku",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Authentication
All authenticated endpoints require an Authorization header with a Bearer token:
Authorization: Bearer sk-proxy-abc123...
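As a rough illustration, the header can be attached with nothing but Python's standard library. The helper name build_request is hypothetical, and the key shown is a placeholder; the sketch only constructs the request object and does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.llm4agents.com"  # base URL from this page


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/api/v1/wallets/generate",
    "sk-proxy-abc123...",  # placeholder, not a real credential
    {"chain": "solana", "token": "USDC"},
)
print(req.get_header("Authorization"))
```

Passing the result to urllib.request.urlopen would perform the call; reading the key from an environment variable rather than a literal is the safer choice in real code.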
TypeScript SDK
@llm4agents/sdk is the official TypeScript SDK for the platform. A single
LLM4AgentsClient facade exposes chat completions, wallet management,
gasless stablecoin transfers, and MCP-powered tools (scraper, search, image) — all
authenticated with the same sk-proxy-... bearer key.
- Repo: github.com/llmforagents/sdk
- npm: @llm4agents/sdk
- Runtimes: Node 18+, browser, Cloudflare Workers, Deno, Bun
- Dependencies: zero runtime deps. ethers ^6 is an optional peer dep, only required for client.transfer.
mcp.llm4agents.com. No auto-retry, no hidden state — every method maps to a
documented endpoint or MCP tool.
Install
npm install @llm4agents/sdk
# Optional — only required for client.transfer (gasless transfers)
npm install ethers
pnpm add @llm4agents/sdk
pnpm add ethers # optional, gasless transfers only
yarn add @llm4agents/sdk
yarn add ethers # optional, gasless transfers only
bun add @llm4agents/sdk
bun add ethers # optional, gasless transfers only
Initialize
import { LLM4AgentsClient } from '@llm4agents/sdk';
const client = new LLM4AgentsClient({
apiKey: process.env.LLM4AGENTS_API_KEY!,
// Optional overrides:
baseUrl: 'https://api.llm4agents.com',
mcpUrl: 'https://mcp.llm4agents.com/mcp',
timeout: 30_000,
});
Chat — completions, streaming, tool-loop conversations
Thin wrapper over POST /v1/chat/completions, plus a higher-level
conversation() helper that maintains history and auto-executes MCP tool calls.
// Single completion
const res = await client.chat.completions.create({
model: 'anthropic/claude-sonnet-4',
messages: [{ role: 'user', content: 'Hello' }],
});
// Streaming
const stream = await client.chat.completions.create({
model: 'anthropic/claude-sonnet-4',
messages: [{ role: 'user', content: 'Count to 10' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Conversation with auto tool-execution loop
const conv = client.chat.conversation({
model: 'anthropic/claude-sonnet-4',
system: 'You are a research assistant',
tools: client.tools,
onToolCall: (name, args) => { console.log(`→ ${name}`); return true; },
onToolResult: (name, result) => { console.log(`✓ ${name} (${result.length} chars)`); },
maxToolRounds: 5,
});
const answer = await conv.say('Search for Bitcoin news and summarize the top 3');
console.log(answer.content);
console.log(answer.toolCalls); // ToolCallRecord[]
// Streaming conversation — typed StreamEvent
const events = await conv.stream('Now find the current price');
for await (const ev of events) {
switch (ev.type) {
case 'text': process.stdout.write(ev.content); break;
case 'tool_start': console.log(`\n[tool] ${ev.name}`); break;
case 'tool_end': console.log(`[done] ${ev.durationMs}ms`); break;
case 'done': console.log('\n', ev.response.usage); break;
}
}
// History is a JSON-serializable array — persist anywhere
const saved = conv.messages; // readonly ChatMessage[]
const rehydrated = client.chat.conversation({ model: '...', history: saved });
Wallets — generate, balance, transactions
Generated wallets are deposit-only top-up addresses for your LLM4Agents balance. Anything sent to them credits the single balance used to pay for chat completions, scraper, search, and image tools. Gasless-transfer relayer fees are paid separately by your own EOA in the token being transferred — they do not draw from the LLM4Agents balance.
const wallet = await client.wallets.generate({
chain: 'polygon',
token: 'USDC',
});
console.log(wallet.address);
const balance = await client.wallets.balance();
console.log(balance.availableUsd);
console.log(balance.wallets); // per-chain/token breakdown
const page = await client.wallets.transactions({ limit: 20, type: 'deposit' });
for (const tx of page.transactions) {
console.log(`${tx.type}: $${tx.amountUsd} — ${tx.description}`);
}
Gasless transfers — non-custodial stablecoin sends
The SDK signs locally with the user's private key (via the ethers peer dep)
and submits to POST /v1/tx/send. The platform never touches the key. See
Gasless TX for the full flow and supported chains/tokens.
// One-call: quote + sign + submit
const result = await client.transfer.send({
chain: 'polygon',
token: 'USDC',
to: '0xRecipient...',
amount: '10.50',
privateKey: process.env.EOA_PRIVATE_KEY!,
});
console.log(result.txHash, result.explorerUrl);
// Two-step: inspect the operator fee before signing
const quote = await client.transfer.quote({
chain: 'polygon', token: 'USDC',
from: '0xSender...', to: '0xRecipient...', amount: '10.50',
});
console.log(`Fee: ${quote.feeFormatted}`);
const submitted = await client.transfer.submit(quote, process.env.EOA_PRIVATE_KEY!);
client.transfer requires the ethers peer dep. If it isn't
installed, the SDK throws a clear error on first use:
"ethers is required for gasless transfers. Install it: npm install ethers".
MCP tools — scraper, search, image
Typed wrappers over the MCP Streamable HTTP endpoint at
mcp.llm4agents.com/mcp. Pricing and full parameter docs live under
Scraper / MCP, Search Tools, and
Image Tools.
// Scraper (headless browser)
const html = await client.tools.scraper.fetchHtml({ url: 'https://example.com' });
const md = await client.tools.scraper.markdown({ url: 'https://example.com' });
const shot = await client.tools.scraper.screenshot({ url: 'https://example.com', fullPage: true });
// Search (Google via Serper)
const results = await client.tools.search.google({ q: 'TypeScript SDK design' });
const news = await client.tools.search.googleNews({ q: 'Bitcoin', tbs: 'qdr:d' });
const places = await client.tools.search.googleMaps({ q: 'coffee near me' });
// Image (generate, edit, analyze)
const img = await client.tools.image.generate({ prompt: 'A robot writing code' });
const edited = await client.tools.image.edit({ prompt: 'Make it blue', imageUrl: '...' });
const analysis = await client.tools.image.analyze({ prompt: 'What is this?', imageUrl: '...' });
// Tool definitions for use with custom tool loops or other LLMs
const defs = client.tools.definitions; // ToolDefinition[]
Models
const models = await client.models.list();
for (const m of models) {
console.log(`${m.slug} — $${m.inputPricePer1m}/1M in, $${m.outputPricePer1m}/1M out`);
}
Error handling
Every failure surfaces as a single LLM4AgentsError with a stable
code, the upstream statusCode (when applicable), and the
requestId from the x-request-id response header.
import { LLM4AgentsError } from '@llm4agents/sdk';
try {
await client.chat.completions.create({ model: '...', messages: [...] });
} catch (err) {
if (err instanceof LLM4AgentsError) {
switch (err.code) {
case 'insufficient_balance': /* top up */ break;
case 'rate_limited': /* back off */ break;
case 'model_not_found': /* pick fallback */ break;
case 'gas_spike': /* re-quote and retry */ break;
case 'tool_loop_limit': /* raise maxToolRounds or shorten task */ break;
default: console.error(err.code, err.requestId);
}
}
}
Error codes mirror the REST/MCP responses — see Errors for the full
list. The @llm4agents/gasless standalone package is superseded by this SDK and
uses the same code values, so migrating only requires renaming the import and
error class.
Endpoints Reference
POST /api/v1/agents/register
Anti-spam: deposit within 15 minutes. Newly registered agents that do not receive any deposit (across any supported chain/token) within ~15 minutes are automatically deleted to prevent database saturation. A deposit of even 1 cent permanently exempts the agent from this sweep. Recommended onboarding: register → POST /api/v1/wallets/generate → fund the returned address → start using the API. Save your apiKey immediately; it cannot be retrieved later.
Request Body
| Field | Type | Description |
|---|---|---|
| name | string | required Agent name (1-100 chars) |
Response 201
{
"uuid": "a1b2c3d4-e5f6-...",
"apiKey": "sk-proxy-abc123...",
"name": "my-agent",
"createdAt": "2026-04-14T12:00:00.000Z",
"requestId": "req_...",
"depositDeadline": "2026-04-14T12:15:00.000Z",
"depositRequiredWithinMinutes": 15,
"notice": "Anti-spam protection: this agent will be automatically deleted if no deposit is received within 15 minutes. You can simply register a new agent if that happens. Once the first deposit (any amount) is credited, the agent becomes permanent. This measure exists solely to prevent malicious actors from generating agents indefinitely."
}
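A small sketch of how an agent might track the anti-spam window from the registration response. The function name is hypothetical; only the depositDeadline field comes from the response above, and the current time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timezone


def seconds_until_deadline(registration: dict, now: datetime) -> float:
    """Seconds left before the anti-spam sweep may delete the agent."""
    # Swap the trailing "Z" so fromisoformat() also works on Python < 3.11.
    deadline = datetime.fromisoformat(
        registration["depositDeadline"].replace("Z", "+00:00")
    )
    return (deadline - now).total_seconds()


registration = {
    "createdAt": "2026-04-14T12:00:00.000Z",
    "depositDeadline": "2026-04-14T12:15:00.000Z",
}
now = datetime(2026, 4, 14, 12, 5, tzinfo=timezone.utc)
remaining = seconds_until_deadline(registration, now)
print(f"{remaining / 60:.0f} minutes left to deposit")
```

In practice `now` would be datetime.now(timezone.utc), and a negative result means the agent may already have been removed.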
Response 200
{
"status": "ok",
"service": "llm-proxy-api",
"timestamp": "2026-04-14T12:00:00.000Z"
}
POST /api/v1/wallets/generate
Generates a unique deposit wallet address for your agent. Top-up only: the wallet exists solely to fund your LLM4Agents balance — anything sent to it is credited to the single balance that pays for chat completions, scraper, search, and image tools. Gasless-transfer relayer fees are not paid from this balance: they are paid by your own EOA in the token being transferred. Idempotent: calling with the same chain/token returns the existing wallet.
Request Body
| Field | Type | Description |
|---|---|---|
| chain | string | required "solana" or "polygon" |
| token | string | required "USDT" or "USDC" |
Response 200
{
"chain": "solana",
"token": "USDC",
"address": "7xKX...",
"createdAt": "2026-04-14T12:00:00.000Z",
"requestId": "req_..."
}
Response 200
{
"uuid": "a1b2c3d4-...",
"availableUsdCents": 5000,
"availableUsd": "50.00",
"totalDepositedUsd": "100.00",
"totalSpentUsd": "50.00",
"requestId": "req_..."
}
GET /api/v1/models
Query Parameters
| Field | Type | Description |
|---|---|---|
| search | string | optional Filter models by slug or display name |
Response 200
{
"models": [
{
"slug": "anthropic/claude-3-haiku",
"displayName": "Claude 3 Haiku",
"provider": "anthropic",
"inputPricePer1M": 0.3,
"outputPricePer1M": 1.5,
"contextWindow": 200000,
"lastSyncedAt": "2026-04-14T06:00:00.000Z"
}
],
"requestId": "req_..."
}
inputPricePer1M and outputPricePer1M are the per-million-token prices in USD. These are the prices you pay.
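To make the per-million pricing concrete, here is a hedged estimate of what a request might cost given its token counts. The function is illustrative only: the authoritative figure is whatever the API reports for the request, and any rounding the platform applies is not modeled here.

```python
from decimal import Decimal


def estimate_cost_usd(model: dict, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate cost in USD from per-million-token prices (no rounding applied)."""
    per_million = Decimal(1_000_000)
    # Go through str() so float prices like 0.3 convert to exact decimals.
    input_cost = Decimal(str(model["inputPricePer1M"])) * prompt_tokens / per_million
    output_cost = Decimal(str(model["outputPricePer1M"])) * completion_tokens / per_million
    return input_cost + output_cost


model = {
    "slug": "anthropic/claude-3-haiku",
    "inputPricePer1M": 0.3,
    "outputPricePer1M": 1.5,
}
cost = estimate_cost_usd(model, prompt_tokens=12, completion_tokens=9)
print(f"estimated cost: ${cost}")
```

With the sample prices, 12 prompt tokens and 9 completion tokens come to well under a hundredth of a cent, which is why billing is tracked at sub-cent precision.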
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint. This is the primary endpoint your agent will use.
Request Body
| Field | Type | Description |
|---|---|---|
| model | string | required Model slug, e.g. "anthropic/claude-3-haiku" |
| messages | array | required Array of message objects with role and content |
| temperature | number | optional Sampling temperature (0-2) |
| max_tokens | integer | optional Maximum output tokens (default: 4096) |
| stream | boolean | optional Enable streaming responses |
Extra fields are passed through to the upstream provider.
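When stream is true, an OpenAI-compatible endpoint normally answers with server-sent events: one `data: {...}` line per chunk and a final `data: [DONE]`. The sketch below assumes that framing (it is not spelled out on this page) and extracts the text deltas from already-received lines; the function name and sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
print(text)
```

An OpenAI-compatible SDK does this parsing for you; the sketch is mainly useful when calling the endpoint with a raw HTTP client.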
Response 200
{
"id": "gen-abc123",
"object": "chat.completion",
"model": "anthropic/claude-3-haiku",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
}
}
Response Headers
| Header | Description |
|---|---|
| X-Cost-Usd-Cents | Actual cost of this request in USD cents |
| X-Tokens-Input | Number of input (prompt) tokens |
| X-Tokens-Output | Number of output (completion) tokens |
| X-Balance-Remaining-Cents | Your remaining balance in USD cents |
| X-Request-Id | Unique request identifier |
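One way to consume these headers is to parse them into a typed record right after each call. This is a sketch under assumptions: the header values are taken to be plain numeric strings, cents are parsed as floats because costs can be sub-cent, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class UsageHeaders:
    cost_usd_cents: float
    tokens_input: int
    tokens_output: int
    balance_remaining_cents: float
    request_id: str


def parse_usage_headers(headers: Mapping[str, str]) -> UsageHeaders:
    """Extract the billing headers from a chat completion response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return UsageHeaders(
        cost_usd_cents=float(h["x-cost-usd-cents"]),
        tokens_input=int(h["x-tokens-input"]),
        tokens_output=int(h["x-tokens-output"]),
        balance_remaining_cents=float(h["x-balance-remaining-cents"]),
        request_id=h["x-request-id"],
    )


# Hypothetical header values, for illustration only.
usage = parse_usage_headers({
    "X-Cost-Usd-Cents": "0.25",
    "X-Tokens-Input": "12",
    "X-Tokens-Output": "9",
    "X-Balance-Remaining-Cents": "4999.75",
    "x-request-id": "req_example",
})
print(usage)
```

Logging request_id alongside the cost makes it easier to match a charge to an entry in the transaction history later.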
OpenAI-compatible embeddings endpoint. Returns one vector per input string. Embeddings are billed input-only (no completion tokens). The model slug must be one of the embedding models in the catalog — call GET /api/v1/models?search=embed to list them.
Request Body
| Field | Type | Description |
|---|---|---|
| model | string | required Embedding model slug, e.g. "openai/text-embedding-3-large" |
| input | string \| string[] | required Single string or array of up to 2048 strings to embed |
| encoding_format | string | optional "float" (default) or "base64" |
| dimensions | integer | optional Override vector dimensionality (only honored by models that support it) |
| user | string | optional Opaque end-user identifier passed through to upstream |
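Because a single request accepts at most 2048 input strings, larger corpora have to be split across several calls. A minimal batching sketch follows; the helper name is hypothetical and no request is actually sent.

```python
from typing import Iterator, List, Sequence

MAX_INPUTS_PER_REQUEST = 2048  # limit stated in the table above


def chunk_inputs(
    texts: Sequence[str], size: int = MAX_INPUTS_PER_REQUEST
) -> Iterator[List[str]]:
    """Split texts into consecutive batches that fit one embeddings request."""
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


texts = [f"document {i}" for i in range(5000)]
batches = list(chunk_inputs(texts))
print([len(batch) for batch in batches])
```

Each batch would become the input field of one request, and the index values in each response are relative to that batch rather than to the full corpus.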
Response 200
{
"object": "list",
"data": [
{ "object": "embedding", "embedding": [0.012, -0.034, ...], "index": 0 }
],
"model": "openai/text-embedding-3-large",
"usage": { "prompt_tokens": 5, "total_tokens": 5 }
}
Response Headers
| Header | Description |
|---|---|
| X-Cost-Usd-Cents | Actual cost of this request in USD cents |
| X-Tokens-Input | Number of input (prompt) tokens |
| X-Balance-Remaining-Cents | Your remaining balance in USD cents |
| X-Model-Used | Slug of the model that actually responded |
| X-Request-Id | Unique request identifier |
POST /v1/tx/quote
Step 1 of the non-custodial gasless transfer. Returns the current Vexo relayer fee, the on-chain nonces for your wallet, and the two EIP-712 payloads (Permit + TransferPermit) that your EOA must sign locally. The platform never sees your private key. This endpoint is free; auth is only used to tag the request to your agent for rate-limit and audit purposes.
Request Body
| Field | Type | Description |
|---|---|---|
| chain | string | required Target chain. Currently only "polygon". |
| token | string | required Token symbol ("USDC") or its contract address. Currently only USDC is wired on polygon. |
| from | string | required The EOA that will sign both permits and owns the token balance (20-byte hex, 0x-prefixed). The platform does NOT provision this wallet — you bring your own key. |
| to | string | required Recipient address. |
| amount | string | required Amount in human-decimal form (e.g. "1.50"). Always pass a string — JSON numbers lose precision. |
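The reason for passing amount as a string is that binary floats cannot represent most decimal fractions exactly. The sketch below shows how a human-decimal string maps to base units using exact decimal arithmetic. It assumes 6 decimals for USDC, which matches the sample response ("1.50" becomes "1500000"); the helper is illustrative and not part of the API.

```python
from decimal import Decimal


def to_base_units(amount: str, decimals: int) -> str:
    """Convert a human-decimal amount string to an integer base-unit string."""
    value = Decimal(amount)
    scaled = value.scaleb(decimals)  # exact shift of the decimal point
    if value <= 0 or scaled != scaled.to_integral_value():
        raise ValueError(f"invalid amount for {decimals} decimals: {amount!r}")
    return str(int(scaled))


USDC_DECIMALS = 6  # assumption consistent with the sample quote
print(to_base_units("1.50", USDC_DECIMALS))
print(to_base_units("0.05", USDC_DECIMALS))
```

An amount with more fractional digits than the token supports is rejected instead of being silently rounded.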
Response 200
{
"chain": "polygon",
"chainId": 137,
"token": "USDC",
"tokenAddress": "0x3c499c542cEF5E3811e1192ce70d8cC03d5c3359",
"forwarderAddress": "0xba9490B2A5c94AAc18fC7aBf19151757852FB5E7",
"from": "0x9f...",
"to": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
"amount": "1.50",
"amountBaseUnits": "1500000",
"fee": "50000",
"feeFormatted": "0.05 USDC",
"feeDecimal": "0.05",
"deadline": 1756000000,
"nonces": { "token": "3", "forwarder": "7" },
"typedData": {
"permit": { "domain": {...}, "types": {...}, "primaryType": "Permit", "message": {...} },
"transferPermit": { "domain": {...}, "types": {...}, "primaryType": "TransferPermit", "message": {...} }
},
"requestId": "req_..."
}
Sign both typedData.permit and typedData.transferPermit with your wallet (e.g. wallet.signTypedData(domain, types, message) in ethers v6), split each raw signature into {v, r, s}, then call POST /v1/tx/send below. The deadline is 10 minutes from quote — if you sign after that, submit will return 400 expired_quote.
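For readers splitting the raw signature by hand, the following sketch uses the standard 65-byte Ethereum layout: 32 bytes of r, 32 bytes of s, then one byte of v. The function name is hypothetical, and the normalization of v from 0/1 to 27/28 is included because some signers emit the former.

```python
from typing import Dict, Union


def split_signature(signature_hex: str) -> Dict[str, Union[int, str]]:
    """Split a 65-byte hex signature into the {v, r, s} shape."""
    hex_body = signature_hex[2:] if signature_hex.startswith("0x") else signature_hex
    raw = bytes.fromhex(hex_body)
    if len(raw) != 65:
        raise ValueError(f"expected 65 bytes, got {len(raw)}")
    v = raw[64]
    if v in (0, 1):
        v += 27  # normalize recovery id to the 27/28 convention
    if v not in (27, 28):
        raise ValueError(f"unexpected v value: {v}")
    return {"v": v, "r": "0x" + raw[:32].hex(), "s": "0x" + raw[32:64].hex()}


# Synthetic signature bytes, for illustration only.
example = "0x" + "11" * 32 + "22" * 32 + "1c"
print(split_signature(example))
```

The same function would be applied to both signatures, producing the permitSig and transferPermitSig objects.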
POST /v1/tx/send
Step 2 of the non-custodial gasless transfer. You submit the two signatures produced from the /quote typed data; the platform re-derives each EIP-712 digest, runs ecrecover on both signatures, verifies the recovered signer matches from, and forwards the bundle to Vexo. The on-chain transfer is executed by Vexo's relayer: the platform signs nothing, touches no keys, and pays no gas of yours. Free service; the request is logged against your API key solely for audit.
Request Body
| Field | Type | Description |
|---|---|---|
| chain, token, from, to, amount | string | required Same values you sent to /v1/tx/quote. |
| fee | string | required Fee in token base units, echoed verbatim from the quote's fee field. |
| deadline | integer | required Unix seconds. Echo the quote's deadline. |
| nonces.token, nonces.forwarder | string | required Uint strings echoed from the quote. |
| permitSig | { v, r, s } | required Split signature of typedData.permit. v is 27/28, r/s are 32-byte hex. |
| transferPermitSig | { v, r, s } | required Split signature of typedData.transferPermit, same shape. |
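Since most fields are echoed from the quote, the request body can be assembled mechanically. The sketch below assumes that the dotted names nonces.token and nonces.forwarder denote a nested nonces object, as in the quote response; the helper and the local deadline check are illustrative, not part of the API.

```python
import time
from typing import Optional

ECHOED_FIELDS = ("chain", "token", "from", "to", "amount", "fee", "deadline")


def build_send_body(
    quote: dict,
    permit_sig: dict,
    transfer_permit_sig: dict,
    now: Optional[float] = None,
) -> dict:
    """Assemble the submit payload from a quote and two split signatures."""
    now = time.time() if now is None else now
    if now >= quote["deadline"]:
        # Submitting would fail with expired_quote, so stop early.
        raise ValueError("quote deadline has passed; request a new quote")
    body = {field: quote[field] for field in ECHOED_FIELDS}
    body["nonces"] = dict(quote["nonces"])  # echoed unchanged (assumed nested)
    body["permitSig"] = permit_sig
    body["transferPermitSig"] = transfer_permit_sig
    return body


quote = {
    "chain": "polygon", "token": "USDC", "from": "0x9f...",
    "to": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    "amount": "1.50", "fee": "50000", "deadline": 1756000000,
    "nonces": {"token": "3", "forwarder": "7"},
}
sig = {"v": 27, "r": "0x" + "11" * 32, "s": "0x" + "22" * 32}  # synthetic
body = build_send_body(quote, sig, sig, now=1755999700)
print(sorted(body))
```

Copying values from the quote rather than recomputing them avoids mismatches between what was signed and what is submitted.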
Response 200
{
"txHash": "0xab...",
"explorerUrl": "https://polygonscan.com/tx/0xab...",
"from": "0x9f...",
"to": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
"chain": "polygon",
"chainId": 137,
"token": "USDC",
"tokenAddress": "0x3c499c542cEF5E3811e1192ce70d8cC03d5c3359",
"amount": "1.50",
"amountBaseUnits": "1500000",
"feeBaseUnits": "50000",
"feeDecimal": "0.05",
"requestId": "req_..."
}
Errors
| HTTP | Code | Meaning |
|---|---|---|
| 400 | validation_error | Bad address / non-decimal amount / malformed signature. |
| 400 | invalid_signature | ecrecover on permitSig or transferPermitSig did not match from. Response includes expected_signer and recovered_signer. |
| 400 | expired_quote | deadline has already passed. Call /v1/tx/quote again to get a fresh one. |
| 400 | unsupported_chain / unsupported_token | Chain or token not currently wired on this deployment. |
| 409 | gasless_gas_spike | Vexo's fee changed since the quote. Safe to retry by requesting a new quote. |
| 502 | gasless_upstream_error | Vexo returned an upstream error while broadcasting. |
| 503 | gasless_operator_unavailable | Vexo's relayer operator is temporarily unavailable. |
| 504 | gasless_timeout | Submit timed out. The transfer may still land on-chain. |
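The table suggests a different recovery step for each class of error. One possible policy is sketched below; the grouping is an interpretation of the meanings listed above, not behavior prescribed by the platform, and all names are hypothetical.

```python
from enum import Enum


class Recovery(Enum):
    FIX_REQUEST = "fix the request; retrying unchanged will fail again"
    REQUOTE = "request a fresh quote, re-sign both permits, resubmit"
    BACKOFF = "wait, then start again from a fresh quote"
    CHECK_CHAIN = "look for the transfer on-chain before resubmitting"


RECOVERY_BY_CODE = {
    "validation_error": Recovery.FIX_REQUEST,
    "invalid_signature": Recovery.FIX_REQUEST,
    "unsupported_chain": Recovery.FIX_REQUEST,
    "unsupported_token": Recovery.FIX_REQUEST,
    "expired_quote": Recovery.REQUOTE,
    "gasless_gas_spike": Recovery.REQUOTE,
    "gasless_upstream_error": Recovery.BACKOFF,
    "gasless_operator_unavailable": Recovery.BACKOFF,
    # The transfer may still land, so a blind retry risks a double send.
    "gasless_timeout": Recovery.CHECK_CHAIN,
}


def recovery_for(code: str) -> Recovery:
    """Map an error code to a recovery step; unknown codes are not retried."""
    return RECOVERY_BY_CODE.get(code, Recovery.FIX_REQUEST)


print(recovery_for("gasless_gas_spike").value)
```

The timeout case deserves the most care: checking the sender's recent on-chain activity before resubmitting avoids paying twice.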
GET /api/v1/transactions
Query Parameters
| Field | Type | Description |
|---|---|---|
| type | string | optional Filter by transaction type: "deposit", "usage", or "refund". Omit to return all types. |
| limit | integer | optional Number of results (1-100, default: 50) |
| offset | integer | optional Pagination offset (default: 0) |
Examples
# All transactions
GET /api/v1/transactions
# Only deposits
GET /api/v1/transactions?type=deposit
# Only usage, second page of 20
GET /api/v1/transactions?type=usage&limit=20&offset=20
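To walk the full history, a client can step offset by limit until it reaches the total reported in the response. The helper below only builds the request paths and is a sketch; its name is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode


def transaction_page_paths(
    total: int, limit: int = 50, tx_type: Optional[str] = None
) -> List[str]:
    """Build one request path per page needed to cover `total` rows."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    paths = []
    for offset in range(0, total, limit):
        params = {}
        if tx_type is not None:
            params["type"] = tx_type
        params["limit"] = limit
        params["offset"] = offset
        paths.append("/api/v1/transactions?" + urlencode(params))
    return paths


for path in transaction_page_paths(total=45, limit=20, tx_type="usage"):
    print(path)
```

In a real client the total is not known up front, so the first page is fetched with offset 0 and its total field decides how many more pages to request.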
Response 200
{
"transactions": [
{
"id": 2,
"type": "deposit",
"amountUsdCents": 5000,
"model": null,
"promptTokens": null,
"completionTokens": null,
"totalTokens": null,
"chain": "polygon",
"txHash": "0xabc...",
"description": "Deposit of 50.00 USDT on polygon",
"createdAt": "2026-04-15T09:15:00.000Z"
},
{
"id": 1,
"type": "usage",
"amountUsdCents": 3,
"model": "anthropic/claude-3-haiku",
"promptTokens": 120,
"completionTokens": 85,
"totalTokens": 205,
"chain": null,
"txHash": null,
"description": "Chat completion: anthropic/claude-3-haiku",
"createdAt": "2026-04-14T12:05:00.000Z"
}
],
"limit": 50,
"offset": 0,
"total": 2,
"requestId": "req_..."
}
Scraper / MCP
A headless browser service exposed via the Model Context Protocol (MCP). Your agent connects to the MCP server and gains access to tools for fetching pages, taking screenshots, extracting data, and running persistent browser sessions — all with optional proxy support (datacenter or residential) and anti-detection stealth.
Connecting
The MCP server is hosted at a separate endpoint. Authenticate with the same API key used for the main API.
MCP Endpoint: https://mcp.llm4agents.com/mcp
Method: POST
Auth: Authorization: Bearer YOUR_API_KEY
Protocol: MCP Streamable HTTP (JSON response mode)
Scraper tools take a proxy_tier parameter: "none" (direct), "datacenter", or "residential". The proxy tier affects pricing. Billing follows the same reserve-settle pattern as chat completions.
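For clients that do not use an MCP library, a tool invocation is a JSON-RPC 2.0 request with method tools/call, as defined by the MCP specification. The sketch below builds such a request without sending it. Whether this server expects an initialize handshake first is not stated on this page, so treat that as an open assumption; the helper name is hypothetical.

```python
import json
import urllib.request

MCP_URL = "https://mcp.llm4agents.com/mcp"


def build_tool_call(
    api_key: str, tool: str, arguments: dict, request_id: int = 1
) -> urllib.request.Request:
    """Build (but do not send) an MCP tools/call request."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response styles.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


req = build_tool_call(
    "YOUR_API_KEY",
    "fetch_html",
    {"url": "https://example.com", "proxy_tier": "none"},
)
print(json.loads(req.data)["method"])
```

An MCP client library handles session setup and response parsing, so a hand-built request like this is mostly useful for debugging.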
Pricing
All prices are in US dollars, and most amount to a fraction of a cent. Billing is sub-cent, so you pay only for what you use.
One-Shot Tool Prices
| Tool | No Proxy | Datacenter | Residential |
|---|---|---|---|
| fetch_html | $0.0007 | $0.0009 | $0.0037 |
| markdown | $0.0010 | $0.0012 | $0.0040 |
| links | $0.0007 | $0.0009 | $0.0037 |
| screenshot | $0.0010 | $0.0012 | $0.0040 |
| pdf | $0.0012 | $0.0014 | $0.0042 |
| extract | $0.0012 | $0.0014 | $0.0042 |
Session Pricing
Sessions are billed based on duration + number of actions. Cost is reserved upfront at worst-case and settled to actual usage when the session is closed.
| Duration | Actions | No Proxy | Datacenter | Residential |
|---|---|---|---|---|
| 30s | 3 | $0.009 | $0.011 | $0.015 |
| 2 min | 10 | $0.015 | $0.021 | $0.037 |
| 5 min | 30 | $0.040 | $0.048 | $0.087 |
| 5 min (max) | 50 | $0.052 | $0.060 | $0.099 |
Session Limits
| Limit | Default |
|---|---|
| Max duration | 300 seconds (5 minutes) |
| Max actions per session | 50 |
| Concurrent sessions per agent | 2 |
| Tool timeout | 30 seconds |
| Max payload size | 5 MB |
One-Shot Tools
Each tool opens a browser, performs one action, returns the result, and closes the browser.
fetch_html
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required URL to fetch (max 2048 chars) |
| timeout_ms | integer | optional Per-attempt navigation timeout (1000–10000 ms, default 10000). With auto_fallback, the chain runs up to 3 attempts. |
| proxy_tier | string | required Starting tier: "none", "datacenter", or "residential". By default the requested tier is honored exactly. |
| auto_fallback | boolean | optional Default false. When true, escalate to higher tiers if the requested one fails. |
Strict mode (default)
By default, fetch_html uses a 10s navigation timeout, a domcontentloaded ready-state (so heavy SPAs do not stall on persistent network activity), and tries the requested proxy_tier exactly once. none means no proxy, no escalation.
Auto-fallback (auto_fallback: true)
If a tier fails (timeout, network error, 5xx, etc.) the tool transparently retries with the next tier:
- none → datacenter → residential
- datacenter → residential
- residential (no fallback)
You are billed at the tier that actually returned the page, not the worst case attempted. If every tier fails, the tool returns a soft-error result (no exception) so the model can decide what to do next, billed at the originally requested (cheapest) tier. Deterministic errors (SSRF block, payload too large) abort the chain immediately.
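The escalation and billing rules can be modeled in a few lines. This is a local simulation under the rules described above, not the server's implementation: the `succeeds` callback stands in for a real fetch attempt, and the prices are the fetch_html row of the one-shot price table.

```python
from decimal import Decimal
from typing import Callable, List, Tuple

TIERS = ["none", "datacenter", "residential"]
FETCH_HTML_PRICE_USD = {
    "none": Decimal("0.0007"),
    "datacenter": Decimal("0.0009"),
    "residential": Decimal("0.0037"),
}


def fallback_chain(requested: str, auto_fallback: bool) -> List[str]:
    """Tiers tried in order, starting from the requested one."""
    if not auto_fallback:
        return [requested]  # strict mode: requested tier, exactly once
    return TIERS[TIERS.index(requested):]


def billed_tier(
    requested: str, auto_fallback: bool, succeeds: Callable[[str], bool]
) -> Tuple[bool, str]:
    """Return (ok, tier billed) after walking the chain."""
    for tier in fallback_chain(requested, auto_fallback):
        if succeeds(tier):
            return True, tier  # billed at the tier that returned the page
    return False, requested  # total failure: billed at the requested tier


ok, tier = billed_tier("none", True, lambda t: t == "datacenter")
print(ok, tier, FETCH_HTML_PRICE_USD[tier])
```

Deterministic errors such as an SSRF block abort the chain immediately and are not modeled in this sketch.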
Returns — success
{
"ok": true,
"tier_used": "datacenter",
"html": "<html>...</html>",
"status": 200,
"finalUrl": "https://...",
"attempts": [{ "tier": "none", "error": "Navigation timeout of 10000 ms exceeded" }]
}
Returns — total failure
{
"ok": false,
"tier_used": "none",
"attempts": [
{ "tier": "none", "error": "..." },
{ "tier": "datacenter", "error": "..." },
{ "tier": "residential", "error": "..." }
],
"reason": "All proxy tiers failed. Last error: ...",
"hint": "The page may be down, blocking automated requests, or unreachable. Consider trying a different URL, a search query, or another tool.",
"url": "https://..."
}
markdown
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required URL to convert |
| selector | string | optional CSS selector to scope conversion |
| proxy_tier | string | required |
Returns
{ "markdown": "# Page Title\n\nContent...", "title": "Page Title", "url": "https://..." }
links
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required |
| same_origin_only | boolean | optional Only return links from the same origin |
| proxy_tier | string | required |
Returns
{ "links": [{ "href": "https://...", "text": "Link text", "rel": "" }] }
screenshot
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required |
| selector | string | optional CSS selector to screenshot a specific element |
| full_page | boolean | optional Capture full scrollable page |
| viewport | object | optional { width, height } (320-3840 x 240-2160) |
| proxy_tier | string | required |
Returns
{ "pngBase64": "iVBOR...", "width": 1920, "height": 1080, "bytes": 184320 }
pdf
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required |
| format | string | optional "A4", "Letter", or "Legal" (default: A4) |
| proxy_tier | string | required |
Returns
{ "pdfBase64": "JVBERi0...", "bytes": 52480 }
extract
Parameters
| Field | Type | Description |
|---|---|---|
| url | string | required |
| selectors | object | required Map of name → CSS selector. Each selector extracts text from matching elements. |
| proxy_tier | string | required |
Example
{ "url": "https://example.com", "selectors": { "title": "h1", "prices": ".price" }, "proxy_tier": "none" }
Returns
{ "data": { "title": "Example", "prices": ["$10", "$20", "$30"] } }
Single-match selectors return a string; multi-match selectors return an array of strings.
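Because a value may be either a string or a list depending on how many elements matched, downstream code often benefits from normalizing first. A small sketch follows, with a hypothetical helper name and the sample data from above.

```python
from typing import Dict, List, Union


def normalize_extract(data: Dict[str, Union[str, List[str]]]) -> Dict[str, List[str]]:
    """Wrap single-match strings so every selector maps to a list."""
    return {
        name: [value] if isinstance(value, str) else list(value)
        for name, value in data.items()
    }


result = {"data": {"title": "Example", "prices": ["$10", "$20", "$30"]}}
normalized = normalize_extract(result["data"])
print(normalized)
```

After normalization, code can iterate over every field the same way regardless of how many elements the selector matched.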
Session Tools
Persistent browser sessions let you navigate, click, type, and extract across multiple pages without reopening the browser. Sessions last 5 minutes and support a configurable action limit.
Opens a browser that stays alive for 5 minutes. Cost is reserved upfront (worst-case) and settled on close. Concurrent session limit applies per agent.
Parameters
| Field | Type | Description |
|---|---|---|
| proxy_tier | string | required "none", "datacenter", or "residential" |
| initial_url | string | optional Navigate to this URL immediately after launch |
Returns
{ "session_id": "a1b2c3d4-...", "expires_at": "2026-04-17T12:05:00.000Z" }
Parameters
| Field | Type | Description |
|---|---|---|
| session_id | string | required |
| action | object | required Action to execute (see below) |
Action Types
| type | Fields | Returns |
|---|---|---|
| goto | url | { status, url } |
| click | selector | { clicked } |
| type | selector, text | { typed } |
| wait_for | selector, timeout_ms? | { found } |
| scroll | to: "top" \| "bottom" \| { y } | { scrolled } |
| get_html | selector? | { html } |
| get_url | — | { url } |
| screenshot | selector?, full_page? | { pngBase64, bytes } |
| extract | selectors | { data } |
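A client can validate action objects locally before spending an action from the session budget. The sketch below encodes the required and optional fields from the table; it assumes the action object carries its kind in a field named type next to the other fields, which this page implies but does not show. All names are hypothetical.

```python
from typing import Any, Dict

# action type -> (required fields, optional fields), per the table above
ACTION_FIELDS = {
    "goto": ({"url"}, set()),
    "click": ({"selector"}, set()),
    "type": ({"selector", "text"}, set()),
    "wait_for": ({"selector"}, {"timeout_ms"}),
    "scroll": ({"to"}, set()),
    "get_html": (set(), {"selector"}),
    "get_url": (set(), set()),
    "screenshot": (set(), {"selector", "full_page"}),
    "extract": ({"selectors"}, set()),
}


def make_action(action_type: str, **fields: Any) -> Dict[str, Any]:
    """Build an action object, rejecting missing or unknown fields."""
    if action_type not in ACTION_FIELDS:
        raise ValueError(f"unknown action type: {action_type!r}")
    required, optional = ACTION_FIELDS[action_type]
    given = set(fields)
    missing = required - given
    unexpected = given - required - optional
    if missing or unexpected:
        raise ValueError(
            f"{action_type}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    return {"type": action_type, **fields}


print(make_action("type", selector="#q", text="hello"))
print(make_action("scroll", to="bottom"))
```

Catching a malformed action before the call keeps it from counting against the per-session action limit.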
Parameters
| Field | Type | Description |
|---|---|---|
| session_id | string | required |
Returns
{ "duration_ms": 45000, "actions_count": 12, "cost_cents": 1.74 }
Parameters
| Field | Type | Description |
|---|---|---|
| session_id | string | required |
Returns
{ "expires_at": "2026-04-17T12:05:00.000Z", "actions_count": 5, "status": "active" }
Search Tools
Google search capabilities exposed via the same MCP server as the scraper tools. Your agent can search Google, Google News, and Google Maps and receive structured results — no browser needed.
Connecting
Uses the same MCP endpoint and authentication as scraper tools.
MCP Endpoint: https://mcp.llm4agents.com/mcp
Method: POST
Auth: Authorization: Bearer YOUR_API_KEY
Protocol: MCP Streamable HTTP (JSON response mode)
Pricing
| Tool | Cost per Call |
|---|---|
| google_search / google_news / google_maps | $0.0012 |
Flat rate per call for all 3 search tools. Billing follows the same reserve-settle pattern as other tools.
Tools
Common Parameters
| Field | Type | Description |
|---|---|---|
| q | string | required Search query (max 2048 chars) |
| gl | string | optional Country code, e.g. "us", "es" (default: "us") |
| hl | string | optional Language code, e.g. "en", "es" (default: "en") |
| tbs | string | optional Date range filter, e.g. "qdr:h" (past hour), "qdr:d" (past day), "qdr:w" (past week) |
| page | integer | optional Pagination page number (default: 1) |
| location | string | optional Geographic location hint (max 200 chars) |
google_search
Returns organic search results including knowledge graph, answer boxes, and related searches.
Returns
{
"results": [
{
"title": "Example Domain",
"link": "https://example.com",
"snippet": "This domain is for use in examples..."
}
],
"query": "example domain"
}
google_news
Returns news articles with title, link, snippet, date, and source.
Returns
{
"results": [
{
"title": "Breaking News Story",
"link": "https://news.example.com/story",
"snippet": "A major development...",
"date": "2 hours ago",
"source": "Example News"
}
],
"query": "latest tech news"
}
google_maps
Returns places with title, address, coordinates, category, rating, phone, and website.
Returns
{
"results": [
{
"title": "Best Coffee Shop",
"address": "123 Main St, New York, NY",
"latitude": 40.7128,
"longitude": -74.006,
"category": "Coffee shop",
"rating": 4.7,
"phone": "+1 212-555-0100",
"website": "https://bestcoffee.example.com"
}
],
"query": "coffee shops near me"
}
Batch multiple search queries into a single API call. More efficient than calling google_search multiple times — reduces latency and HTTP roundtrips. Cost is $0.0012 × number of queries.
Parameters
| Field | Type | Description |
|---|---|---|
| queries | array | required Array of search query objects (1–100). Each object accepts the same parameters as google_search: q, gl, hl, tbs, page, location. |
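Given the 100-query cap per call and the per-query price, planning a large job is simple arithmetic. The following sketch splits a query list into batches and totals the expected cost; the helper name is hypothetical and nothing is sent.

```python
from decimal import Decimal
from typing import List, Tuple

COST_PER_QUERY_USD = Decimal("0.0012")  # flat search price from this page
MAX_QUERIES_PER_BATCH = 100             # upper bound from the table above


def plan_batches(queries: List[dict]) -> Tuple[List[List[dict]], Decimal]:
    """Split queries into batch-sized groups and total the expected cost."""
    batches = [
        queries[i:i + MAX_QUERIES_PER_BATCH]
        for i in range(0, len(queries), MAX_QUERIES_PER_BATCH)
    ]
    return batches, COST_PER_QUERY_USD * len(queries)


queries = [{"q": f"topic {i}", "gl": "us"} for i in range(250)]
batches, total_cost = plan_batches(queries)
print(len(batches), f"${total_cost}")
```

Batching changes the number of HTTP round trips, not the price: the cost scales with the number of queries either way.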
Example Request
{
"queries": [
{ "q": "best restaurants in NYC", "gl": "us" },
{ "q": "weather forecast NYC", "tbs": "qdr:d" },
{ "q": "NYC subway map" }
]
}
Returns
{
"results": [
{
"results": [{ "title": "...", "link": "...", "snippet": "..." }],
"query": "best restaurants in NYC"
},
{
"results": [{ "title": "...", "link": "...", "snippet": "..." }],
"query": "weather forecast NYC"
},
{
"results": [{ "title": "...", "link": "...", "snippet": "..." }],
"query": "NYC subway map"
}
],
"queryCount": 3
}
Image Tools
AI-powered image generation, editing, and analysis tools exposed via the same MCP server as scraper and search tools. Your agent can generate images from text, edit existing images, and analyze image content.
Connecting
Uses the same MCP endpoint and authentication as scraper and search tools.
MCP Endpoint: https://mcp.llm4agents.com/mcp
Method: POST
Auth: Authorization: Bearer YOUR_API_KEY
Protocol: MCP Streamable HTTP (JSON response mode)
Pricing
| Tool | Cost per Call |
|---|---|
| generate_image | $0.01 (≤1.5 MP) / $0.02 (>1.5 MP) |
| edit_image | $0.02 |
| analyze_image | $0.006 |
Image inputs accept either a URL or base64-encoded image data. All image outputs are returned as base64. Billing follows the same reserve-settle pattern as other tools.
Tools
generate_image
Generates a PNG image from a text description using AI. Cost depends on output resolution: $0.01 for images up to 1.5 megapixels, $0.02 for larger.
Parameters
| Field | Type | Description |
|---|---|---|
| prompt | string | required Text description of the image to generate (max 4096 chars) |
| width | integer | optional Image width in pixels, 512–2048 (default: 1024) |
| height | integer | optional Image height in pixels, 512–2048 (default: 1024) |
Returns
{
"imageBase64": "<base64 PNG data>",
"width": 1024,
"height": 1024,
"megapixels": 1.048,
"costCents": 1
}
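The resolution-based price can be predicted before calling the tool. This sketch assumes a megapixel is width × height / 1,000,000, which agrees with the sample above (1024 × 1024 is reported as roughly 1.048 MP); how the server treats the exact 1.5 MP boundary is inferred from the "≤1.5 MP" wording in the pricing table.

```python
def generate_image_cost_cents(width: int = 1024, height: int = 1024) -> int:
    """Expected cost in cents for the requested output size."""
    for name, value in (("width", width), ("height", height)):
        if not 512 <= value <= 2048:
            raise ValueError(f"{name} must be between 512 and 2048, got {value}")
    megapixels = width * height / 1_000_000
    return 1 if megapixels <= 1.5 else 2


for size in [(1024, 1024), (1536, 1024), (2048, 2048)]:
    print(size, generate_image_cost_cents(*size), "cent(s)")
```

Staying at or below 1.5 MP, for example 1024 × 1024, keeps a generation at the lower price.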
edit_image
Edits an existing image based on a text instruction. The input image can be provided as a URL or base64 data. Cost is a flat $0.02 per edit.
Parameters
| Field | Type | Description |
|---|---|---|
| prompt | string | required Instruction describing the desired edit (max 4096 chars) |
| image | string | required URL or base64-encoded image to edit |
| aspect_ratio | string | optional Output aspect ratio: "match_input_image" (default), "1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3" |
Returns
{
"imageBase64": "<base64 image data>",
"width": 1344,
"height": 768,
"costCents": 2
}
analyze_image
Sends an image to a vision model along with a prompt. The model analyzes the image and returns a text response. Cost is a flat $0.006 per image.
Parameters
| Field | Type | Description |
|---|---|---|
| prompt | string | required Question or instruction about the image (max 4096 chars) |
| image | string | required URL or base64-encoded image to analyze |
Returns
{
"text": "The image shows a system architecture diagram with three main components...",
"costCents": 0.6
}
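For clients that do not use an MCP library, a tool call is a JSON-RPC 2.0 `tools/call` request POSTed to the MCP endpoint. The sketch below only builds the request for analyze_image and sends nothing. The helper names are illustrative, the Accept header follows the MCP Streamable HTTP transport, and any session initialization the server may require is not shown.

```python
import base64
import json
from typing import Union

MCP_ENDPOINT = "https://mcp.llm4agents.com/mcp"


def image_argument(source: Union[str, bytes]) -> str:
    """Return a URL unchanged, or base64-encode raw image bytes."""
    if isinstance(source, bytes):
        return base64.b64encode(source).decode("ascii")
    return source


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_headers(api_key: str) -> dict:
    """Headers for a POST to the MCP endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }


body = build_tool_call(
    "analyze_image",
    {
        "prompt": "What components does this diagram show?",
        "image": image_argument("https://example.com/diagram.png"),
    },
)
print(json.dumps(body, indent=2))
```

The same builder works for generate_image and edit_image by changing the tool name and arguments.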
Workspace
A private per-agent file workspace backed by Cloudflare R2. Upload artifacts (markdown, screenshots, PDFs, JSON) and retrieve them later — billed per-MB with per-day storage. Both Bearer auth and x402 walk-up are supported.
Files are deleted automatically once days_to_store runs out (hourly cleanup cron). The workspace row is auto-created on the first paid op, so workspace_create is optional. Direct R2 URLs are never exposed: download URLs route through our worker as single-use tokens, so per-download billing is always enforced.
Pricing
All paid operations have a 1¢ minimum. Free operations are rate-limited at 60 req/min per agent.
| Operation | Bearer (balance) | x402 walk-up | Notes |
|---|---|---|---|
| workspace_create | Free | Free | One-time, idempotent |
| workspace_list / workspace_stat / workspace_delete | Free | Free | Rate-limited 60/min/agent |
| Upload base | $0.0001/MB | $0.00009/MB | One-time, on upload (includes 1st day of storage) |
| Storage per day | $0.000001/MB/day | $0.0000009/MB/day | ~$0.03/GB-month |
| Download per MB | $0.00004/MB | $0.000036/MB | ~$0.04/GB |
| Extend / copy | storage rate × days × size | storage rate × days × size | No upload base for these |
Worked examples
| Scenario | Cost (Bearer) |
|---|---|
| Upload 1 MB for 1 day | $0.01 (1¢ minimum) |
| Upload 100 MB for 30 days | ~$0.03 |
| Upload 1 GB for 30 days | ~$0.14 |
| Download 1 GB | ~$0.05 |
Tools
10 MCP tools, called via the MCP endpoint at https://mcp.llm4agents.com/mcp. Both Bearer and x402 walk-up work on every paid tool. Inline upload/download is capped at 10 MB; larger files use the init/finalize flow.
workspace_create
Free. Auto-creation happens on the first paid op too, so this is optional.
{ "created": true, "agent_id": "...", "created_at": 1700000000 }
workspace_list
Parameters
| Field | Type | Description |
|---|---|---|
| prefix | string | optional Filter by filename prefix (e.g. "scrapes/") |
| limit | integer | optional Max files (1-500, default 100) |
workspace_upload
Parameters
| Field | Type | Description |
|---|---|---|
| filename | string | required Allowed: a-zA-Z0-9._-/, max 255 chars. Supports subdirectories. |
| content_base64 | string | required File bytes, base64-encoded. Decoded payload must be ≤10 MB. |
| days_to_store | integer | required 1-365. Pre-pays storage for this duration. |
| content_type | string | optional MIME type to remember. |
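The parameter constraints above can be checked locally before spending a call. This is a minimal sketch of building the workspace_upload arguments; the function name is illustrative, and it assumes "10 MB" means 10 x 1024 x 1024 bytes, which the documentation does not state.

```python
import base64
import re
from typing import Optional

MAX_INLINE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB
FILENAME_RE = re.compile(r"[A-Za-z0-9._\-/]{1,255}")


def build_upload_arguments(
    filename: str,
    content: bytes,
    days_to_store: int,
    content_type: Optional[str] = None,
) -> dict:
    """Validate inputs and return the arguments for workspace_upload."""
    if not FILENAME_RE.fullmatch(filename):
        raise ValueError("filename may only use a-zA-Z0-9._-/ and at most 255 chars")
    if not 1 <= days_to_store <= 365:
        raise ValueError("days_to_store must be between 1 and 365")
    if len(content) > MAX_INLINE_BYTES:
        raise ValueError("payload too large for inline upload; use the init/finalize flow")
    arguments = {
        "filename": filename,
        "content_base64": base64.b64encode(content).decode("ascii"),
        "days_to_store": days_to_store,
    }
    if content_type is not None:
        arguments["content_type"] = content_type
    return arguments


args = build_upload_arguments("scrapes/page.md", b"# hi", 7, "text/markdown")
print(args)
```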
workspace_upload_init({ filename, size_bytes, days_to_store, content_type? }) reserves the cost and returns a single-use, content-length-bound presigned PUT URL valid 10 minutes. The agent uploads directly to R2 (single-use prevents URL reuse). Then workspace_upload_finalize({ upload_id }) verifies the bytes landed, settles billing, and registers the file. Must finalize within 15 min of init or the reservation is refunded by cron.
workspace_download
Parameters
| Field | Type | Description |
|---|---|---|
| filename | string | required |
| format | string | optional "inline" (default, ≤10 MB) or "url" (any size) |
| url_ttl_minutes | integer | optional 1-15, default 5. Only for format: "url". |
The url mode returns a URL that points back to our worker (see Public Download Endpoint) — never a direct R2 URL. The token is single-use: the second hit returns 410.
workspace_extend({ filename, additional_days }) — bills storage × additional_days × file size.
workspace_copy({ source_filename, dest_filename, days_to_store }) — server-side copy via R2, no egress. Bills destination storage only.
workspace_stat({ filename }) — file metadata (free, rate-limited).
workspace_delete({ filename }) — delete a file early (free, rate-limited, no storage refund).
Public Download Endpoint
The token-based proxy endpoint that consumes the URL returned by workspace_download({ format: "url" }). No authentication header is required — the token itself is the authorization.
GET https://api.llm4agents.com/v1/workspace/download/:token
| Response | Meaning |
|---|---|
| 200 | Streams the file bytes. Headers: content-type, content-length, content-disposition: attachment, cache-control: no-store. |
| 400 token_required | Empty token in URL. |
| 410 token_invalid_or_used | Token missing, already consumed, or expired (collapsed for enumeration safety). |
| 404 file_not_found | File deleted between issuance and consumption (rare race). |
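Because the token is single-use, a client should not retry the same URL after a failure. The sketch below fetches a download URL with the standard library and maps each documented status to a next step. The function names and the action labels are illustrative.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch_download(url: str) -> Tuple[int, bytes]:
    """GET the tokenized URL; return (status, body) without raising on 4xx."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def next_step(status: int) -> str:
    """Map the documented response codes to a client action."""
    if status == 200:
        return "save_file"
    if status == 410:
        # Token used or expired: ask workspace_download for a fresh URL.
        return "request_new_url"
    if status == 404:
        # File was deleted after the URL was issued.
        return "give_up"
    if status == 400:
        # The token segment of the URL was empty.
        return "fix_url"
    return "unexpected"


print(next_step(410))
```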
Gasless Stablecoin Transfers (non-custodial)
Send stablecoin transfers on Polygon from your own EOA without holding any native MATIC. You bring the private key; the platform only validates the signatures you produce and relays them to Vexo's StablecoinForwarder. The service is free — the platform holds no funds, signs nothing, and charges nothing. Your API key is used solely to tag the request for rate-limit and audit. Logged as transactions.type = "gas_sponsored" with amount_usd_cents = 0.
The from EOA is never provisioned, derived, or touched by the platform. Your private key stays on your side. The platform can only recover the signer from the signatures you submit and verify that it matches the from you declared. If a byte is off, the request is rejected with 400 invalid_signature.
Two-step flow
-
POST /v1/tx/quote -- ask the relay for fee + nonces + typed data
Send {chain, token, from, to, amount}. The platform fetches the current Vexo relayer fee (action: "fee-quote"), reads the on-chain nonces(from) from the token and the forwarder, and returns the full EIP-712 payloads (Permit + TransferPermit) pre-filled for you, plus a deadline 10 minutes out. No signatures yet.
-
Sign both typed-data objects locally with your wallet
Use ethers.Wallet.signTypedData(domain, types, message) (or any EIP-712 signer) on typedData.permit and typedData.transferPermit. Split each raw signature into {v, r, s}. The platform never sees your key.
-
POST /v1/tx/send -- submit the signed bundle
Echo chain, token, from, to, amount, fee, deadline, nonces from the quote and add permitSig + transferPermitSig. The platform re-derives each EIP-712 digest, runs ecrecover, and forwards to Vexo's action: "transfer" only if both recovered signers match from. Vexo broadcasts on-chain and returns the txHash.
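The "split each raw signature into {v, r, s}" step can be done without a wallet library. This is a minimal sketch, assuming the signer returns the common 65-byte r || s || v hex encoding; the function name is illustrative, and the 0/1 to 27/28 normalization is an assumption about what some signers emit rather than something this API documents.

```python
def split_signature(raw_hex: str) -> dict:
    """Split a 65-byte hex signature (r || s || v) into {v, r, s}."""
    hex_body = raw_hex[2:] if raw_hex.startswith("0x") else raw_hex
    data = bytes.fromhex(hex_body)
    if len(data) != 65:
        raise ValueError(f"expected 65 bytes, got {len(data)}")
    v = data[64]
    if v in (0, 1):
        # Assumption: normalize a 0/1 recovery id to the 27/28 form.
        v += 27
    return {
        "v": v,
        "r": "0x" + data[:32].hex(),
        "s": "0x" + data[32:64].hex(),
    }


# Placeholder bytes for illustration only, not a real signature.
example = "0x" + "11" * 32 + "22" * 32 + "1b"
print(split_signature(example))
```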
What the platform does & does not do
| Does | Does not |
|---|---|
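| Validates the signatures you submit and checks that the recovered signer matches from | Hold funds or custody your private key |
| Relays the signed bundle to Vexo's StablecoinForwarder | Sign anything on your behalf |
| Tags the request with your API key for rate-limit and audit | Charge for the service or touch the Vexo fee |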
Supported chains & tokens
| Chain | Chain ID | Token | Contract |
|---|---|---|---|
| Polygon (mainnet) | 137 | USDC | 0x3c499c542cEF5E3811e1192ce70d8cC03d5c3359 |
Example: USDC transfer with your own wallet
Two requests (/quote, /send) bracketing local signing.
import { ethers } from 'ethers';
const wallet = new ethers.Wallet(process.env.USER_PRIVATE_KEY);
const base = 'https://api.llm4agents.com';
const headers = {
'Authorization': `Bearer ${process.env.PROXY_KEY}`,
'Content-Type': 'application/json',
};
// 1. Ask the relay for fee + nonces + typed data
const q = await fetch(`${base}/v1/tx/quote`, {
method: 'POST', headers,
body: JSON.stringify({
chain: 'polygon', token: 'USDC',
from: wallet.address,
to: '0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045',
amount: '1.50',
}),
}).then(r => r.json());
// 2. Sign both typed-data objects locally
const signEip712 = async (td) => {
const { EIP712Domain, ...types } = td.types; // ethers wants types without EIP712Domain
const raw = await wallet.signTypedData(td.domain, types, td.message);
const s = ethers.Signature.from(raw);
return { v: s.v, r: s.r, s: s.s };
};
const permitSig = await signEip712(q.typedData.permit);
const transferPermitSig = await signEip712(q.typedData.transferPermit);
// 3. Submit signed bundle
const tx = await fetch(`${base}/v1/tx/send`, {
method: 'POST', headers,
body: JSON.stringify({
chain: q.chain, token: q.token, from: q.from, to: q.to, amount: q.amount,
fee: q.fee, deadline: q.deadline, nonces: q.nonces,
permitSig, transferPermitSig,
}),
}).then(r => r.json());
console.log(`Sent: ${tx.txHash} (${tx.explorerUrl})`);
# 1. Get the quote
curl https://api.llm4agents.com/v1/tx/quote \
-H "Authorization: Bearer sk-proxy-..." \
-H "Content-Type: application/json" \
-d '{
"chain": "polygon",
"token": "USDC",
"from": "0x9f...",
"to": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
"amount": "1.50"
}'
# 2. Sign typedData.permit and typedData.transferPermit locally (out-of-band).
# 3. Submit the signed bundle
curl https://api.llm4agents.com/v1/tx/send \
-H "Authorization: Bearer sk-proxy-..." \
-H "Content-Type: application/json" \
-d '{
"chain": "polygon",
"token": "USDC",
"from": "0x9f...",
"to": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
"amount": "1.50",
"fee": "50000",
"deadline": 1756000000,
"nonces": { "token": "3", "forwarder": "7" },
"permitSig": { "v": 27, "r": "0x...", "s": "0x..." },
"transferPermitSig": { "v": 28, "r": "0x...", "s": "0x..." }
}'
Notes & gotchas
- You own the EOA. from must hold the USDC and produce the signatures. The platform has no key that can move your funds -- not through this endpoint, not anywhere.
- The Vexo fee is paid in the same stablecoin by from, in addition to amount. Vexo collects it as part of the on-chain transfer; the platform does not touch it.
- deadline is 10 minutes. Sign promptly. If you sign after it passes, /send returns 400 expired_quote -- just call /quote again.
- Gas spikes return 409 gasless_gas_spike. If Vexo's fee moved between your quote and your send, it is safe to retry from /quote.
- Signature recovery is strict. ecrecover on both signatures must equal the checksummed from; mismatches return 400 invalid_signature with the expected and recovered addresses.
- Timeouts may still land. 504 gasless_timeout means the upstream call didn't complete in time; the on-chain transfer can still confirm afterwards. Idempotency is the caller's responsibility.
- amount is a decimal string. JSON numbers lose precision; always pass a string like "1.50". Same for fee (uint base units) and nonces.*.
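To see how the decimal amount and the base-unit fee relate, the sketch below converts an amount string to base units with exact decimal arithmetic and adds the quoted fee. It assumes USDC's 6 decimal places; the function name is illustrative, and the fee value is the one from the curl example above.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def to_base_units(amount: str, decimals: int = USDC_DECIMALS) -> int:
    """Convert a decimal string such as "1.50" to integer base units."""
    value = Decimal(amount)
    if value <= 0:
        raise ValueError("amount must be positive")
    scaled = value.scaleb(decimals)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"amount has more than {decimals} decimal places")
    return int(scaled)


amount_units = to_base_units("1.50")      # the transfer itself
fee_units = int("50000")                  # fee from the quote, already in base units
total_debit = amount_units + fee_units    # what leaves the from address

print(amount_units, fee_units, total_debit)
```

Keeping every value as a string or integer end to end avoids the float rounding that the note above warns about.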
Billing
The API uses a reserve-proxy-settle pattern for each chat completion request:
-
Reserve
Before each request, the estimated cost (based on input tokens + max output tokens) is reserved from your balance. If your balance is insufficient, the request is rejected with 402 insufficient_balance.
-
Proxy
The request is forwarded to the LLM provider.
-
Settle
After the response completes, the actual cost is calculated from real token counts. Any over-reserved amount is returned to your balance.
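The arithmetic behind reserve and settle can be illustrated with a small worked example. The prices below are hypothetical placeholders (real per-token prices come from GET /api/v1/models), and the platform's own rounding is not documented, so this sketch applies none.

```python
from decimal import Decimal

# Hypothetical prices in cents per 1M tokens; not real platform pricing.
INPUT_PRICE = Decimal("25")
OUTPUT_PRICE = Decimal("125")


def cost_cents(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in cents for a given number of input and output tokens."""
    total = Decimal(input_tokens) * INPUT_PRICE + Decimal(output_tokens) * OUTPUT_PRICE
    return total / Decimal(1_000_000)


# Reserve: input tokens plus the max_tokens ceiling for output.
reserved = cost_cents(input_tokens=1_200, output_tokens=1_000)
# Settle: the same input tokens, but only the output tokens actually generated.
actual = cost_cents(input_tokens=1_200, output_tokens=240)
refunded = reserved - actual

print(f"reserved={reserved} actual={actual} refunded={refunded}")
```

A lower max_tokens shrinks the reservation, which matters when the balance is close to the 402 insufficient_balance threshold.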
Pricing
Each model has per-token pricing (input and output) visible via GET /api/v1/models. These are the final prices you pay.
Billing Headers
Every chat completion response includes billing information in the response headers:
| Header | Description |
|---|---|
| X-Cost-Usd-Cents | Actual cost of this request in USD cents |
| X-Tokens-Input | Number of input tokens used |
| X-Tokens-Output | Number of output tokens generated |
| X-Balance-Remaining-Cents | Your remaining balance after this request |
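A client can collect these headers into a typed record for logging or budget checks. This is a minimal sketch, assuming the header values are plain decimal numbers; the class and function names are illustrative, and the sample values are made up.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping


@dataclass
class BillingInfo:
    cost_cents: Decimal
    tokens_input: int
    tokens_output: int
    balance_remaining_cents: Decimal


def parse_billing_headers(headers: Mapping[str, str]) -> BillingInfo:
    """Read the four billing headers, ignoring header-name case."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return BillingInfo(
        cost_cents=Decimal(lowered["x-cost-usd-cents"]),
        tokens_input=int(lowered["x-tokens-input"]),
        tokens_output=int(lowered["x-tokens-output"]),
        balance_remaining_cents=Decimal(lowered["x-balance-remaining-cents"]),
    )


# Sample values for illustration only.
sample = {
    "X-Cost-Usd-Cents": "0.06",
    "X-Tokens-Input": "1200",
    "X-Tokens-Output": "240",
    "X-Balance-Remaining-Cents": "499.94",
}
print(parse_billing_headers(sample))
```

With the OpenAI Python SDK, the headers mapping is available on the object returned by client.chat.completions.with_raw_response.create(...).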
Models
Available models are synced from upstream providers and updated regularly. Use GET /api/v1/models to see current pricing and availability.
Model slugs follow the format: provider/model-name. Examples:
- anthropic/claude-3-haiku
- openai/gpt-4o
- google/gemini-pro
- meta-llama/llama-3-70b-instruct
You can use the search query parameter to filter models by name or slug.
Model Fallbacks
Send a prioritized list of models using the models parameter instead of model. If the primary model fails (rate limits, downtime, context length errors, moderation), the next model in the list is tried automatically.
How It Works
- Send a models array -- provide 2-3 model slugs in priority order.
- Automatic failover -- if the first model fails, the next is tried seamlessly.
- Check which model responded -- the X-Model-Used response header tells you which model actually processed the request.
Rules
- models and model are mutually exclusive -- use one or the other, not both.
- The array must contain 2 to 3 model slugs.
- All models must be active and approved in the platform.
- No duplicate slugs allowed.
- Billing reserves balance using the most expensive model in the list to prevent under-reservation. The final charge uses the model that actually responded.
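The list-shape rules can be checked before sending a request. The sketch below validates only what a client can know locally (list size, duplicates, and the provider/model-name slug shape); whether a model is active and approved can only be determined by the platform. The function names are illustrative.

```python
from typing import List


def validate_fallback_models(models: List[str]) -> List[str]:
    """Return a list of rule violations; an empty list means the list is acceptable."""
    problems = []
    if not 2 <= len(models) <= 3:
        problems.append("models must contain 2 to 3 slugs")
    if len(set(models)) != len(models):
        problems.append("duplicate slugs are not allowed")
    for slug in models:
        provider, _, name = slug.partition("/")
        if not provider or not name:
            problems.append(f"slug is not in provider/model-name form: {slug!r}")
    return problems


def build_fallback_body(models: List[str], messages: List[dict]) -> dict:
    """Build a chat completion body that uses models instead of model."""
    problems = validate_fallback_models(models)
    if problems:
        raise ValueError("; ".join(problems))
    return {"models": models, "messages": messages}


body = build_fallback_body(
    ["anthropic/claude-sonnet-4", "openai/gpt-4o"],
    [{"role": "user", "content": "Hello"}],
)
print(body)
```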
Fallback Triggers
Fallback activates when the primary model encounters:
- Context length validation errors
- Rate limiting from the upstream provider
- Provider downtime or errors
- Content moderation rejections
Example
curl https://api.llm4agents.com/v1/chat/completions \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"models": ["anthropic/claude-sonnet-4", "openai/gpt-4o"],
"messages": [{"role": "user", "content": "Hello"}]
}'
# Check which model responded:
# Response header X-Model-Used: anthropic/claude-sonnet-4
from openai import OpenAI
client = OpenAI(
base_url="https://api.llm4agents.com/v1",
api_key="your-api-key"
)
response = client.chat.completions.with_raw_response.create(
model="anthropic/claude-sonnet-4",
messages=[{"role": "user", "content": "Hello"}],
extra_body={
"models": ["anthropic/claude-sonnet-4", "openai/gpt-4o"]
}
)
print(response.headers["x-model-used"])
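import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://api.llm4agents.com/v1',
  apiKey: 'your-api-key',
});
// `models` is placed in the request body itself rather than in the per-request
// options, so it is sent alongside `model` and `messages` (mirrors the Python example).
const { data, response } = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello' }],
  models: ['anthropic/claude-sonnet-4', 'openai/gpt-4o'],
}).withResponse();
// Check which model responded
console.log(response.headers.get('x-model-used'));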
OpenAI Compatibility
The /v1/chat/completions endpoint is fully compatible with OpenAI's API format. Any SDK or tool that supports OpenAI can use this API by changing the base_url.
Supported Parameters
- model -- model slug (required unless using models)
- models -- array of 2-3 model slugs for fallback routing (alternative to model)
- messages -- message array (required)
- temperature -- sampling temperature (0-2)
- max_tokens -- maximum output tokens
- stream -- enable Server-Sent Events streaming
Any additional fields in the request body are passed through to the upstream provider, allowing you to use provider-specific parameters.
SDK Examples
from openai import OpenAI
client = OpenAI(
base_url="https://api.llm4agents.com/v1",
api_key="your-api-key"
)
# Streaming example
stream = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Explain quantum computing"}],
stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://api.llm4agents.com/v1',
apiKey: 'your-api-key',
});
// Streaming example
const stream = await client.chat.completions.create({
model: 'openai/gpt-4o',
messages: [{ role: 'user', content: 'Explain quantum computing' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
curl https://api.llm4agents.com/v1/chat/completions \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"temperature": 0.7,
"max_tokens": 1000
}'
Supported Chains
| Chain | Tokens | Notes |
|---|---|---|
| Solana | USDT, USDC | Fast finality, low fees |
| Polygon | USDT, USDC | EVM-compatible, low fees |
Send stablecoins to your generated deposit wallet address. Deposits are verified on-chain and credited to your LLM4Agents balance — the balance that funds LLM completions, scraper, search, and image tools. Gasless-transfer relayer fees are charged separately by Vexo from the agent's own EOA in the token being transferred and never debit the LLM4Agents balance.
Errors
All error responses include an error field with a machine-readable code and a message field with a human-readable description.
{
"error": "insufficient_balance",
"message": "Not enough funds. Required: 15 cents, available: 3 cents",
"requestId": "req_..."
}
| HTTP Status | Code | Description |
|---|---|---|
| 400 | validation_error | Invalid request body or parameters |
| 401 | missing_api_key | No Authorization header provided |
| 401 | invalid_api_key | API key not recognized |
| 402 | insufficient_balance | Not enough funds for this request |
| 403 | agent_suspended | Agent account has been suspended |
| 404 | model_not_found | Requested model is not available |
| 422 | model_disabled | Model exists but is currently disabled |
| 429 | rate_limited | Too many requests, slow down |
| 400 | invalid_signature | ecrecover on permitSig or transferPermitSig did not match from on /v1/tx/send |
| 400 | expired_quote | The deadline returned by /v1/tx/quote has passed; request a fresh quote |
| 400 | unsupported_chain / unsupported_token | Chain or token not currently wired for gasless transfers |
| 409 | gasless_gas_spike | Vexo's fee changed between /quote and /send; safe to retry with a new quote |
| 502 | provider_error | Error from the upstream LLM provider |
| 502 | gasless_upstream_error | Vexo returned an upstream error on /v1/tx/send |
| 503 | gasless_operator_unavailable | Vexo's relayer operator temporarily unavailable on /v1/tx/send |
| 504 | gasless_timeout | Gasless submit timed out; the transfer may still land on-chain |
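Since every error carries a machine-readable code, a client can branch on it instead of parsing messages. The sketch below maps only the codes for which this documentation states a recovery path; the action labels and function name are illustrative, and every other code falls through to a generic failure.

```python
# Recovery hints taken from the descriptions in this documentation.
ACTIONS = {
    "rate_limited": "backoff_and_retry",
    "expired_quote": "request_new_quote",
    "gasless_gas_spike": "request_new_quote",
    # The transfer may still confirm, so check on-chain before sending again.
    "gasless_timeout": "check_chain_before_retrying",
    "insufficient_balance": "add_funds",
}


def recommended_action(error_body: dict) -> str:
    """Pick a client action from the error field of an error response."""
    return ACTIONS.get(error_body.get("error"), "fail")


example = {
    "error": "insufficient_balance",
    "message": "Not enough funds. Required: 15 cents, available: 3 cents",
    "requestId": "req_...",
}
print(recommended_action(example))
```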
Rate Limits
| Endpoint | Limit | Window |
|---|---|---|
| Registration (/api/v1/agents/register) | 5 requests | per IP per minute |
| Chat completions (/v1/chat/completions) | 600 requests | per API key per minute |
| Other authenticated endpoints | 120 requests | per API key per minute |
When rate limited, the API returns 429 Too Many Requests. Implement exponential backoff in your client.
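One way to implement the backoff is sketched below, using exponential delays with full jitter. The base delay, cap, and retry count are arbitrary choices, not platform recommendations, and send stands for any caller-supplied function that performs the request and returns (status, body).

```python
import random
import time
from typing import Callable, Iterator, Tuple


def backoff_delays(
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay per retry: a random fraction of an exponentially growing ceiling."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield ceiling * rng()


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-429 status or retries run out."""
    for delay in backoff_delays(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        sleep(delay)
    # Final attempt after the last delay; its result is returned as-is.
    return send()


# Delay ceilings without jitter, for illustration.
print(list(backoff_delays(4, rng=lambda: 1.0)))
```

Jitter spreads retries from many agents over time, which avoids all of them hitting the per-minute window again at the same moment.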