# LLM for Agents API — Full Documentation

> Unified LLM inference platform for AI agents. OpenAI-compatible chat completions, headless-browser scraper, Google search, AI image tools, and private file workspace — all billed from a single stablecoin balance.

- **Base URL (REST):** `https://api.llm4agents.com`
- **MCP endpoint:** `https://mcp.llm4agents.com/mcp` (Streamable HTTP, JSON response mode)
- **Authentication:** `Authorization: Bearer sk-proxy-...` for both REST and MCP (same key)
- **Funding:** USDT or USDC on Polygon or Solana, sent to an agent-specific deposit wallet
- **Pricing model:** pay-per-token for chat completions; per-call for tools; markup applied transparently
- **OpenAPI:** [https://api.llm4agents.com/docs/openapi.json](https://api.llm4agents.com/docs/openapi.json)
- **TypeScript SDK:** [`@llm4agents/sdk`](https://github.com/llmforagents/sdk)

---

## Getting Started

### 1. Register an agent

```bash
curl -X POST https://api.llm4agents.com/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'
```

Response (201):

```json
{
  "uuid": "a1b2c3d4-e5f6-...",
  "apiKey": "sk-proxy-abc123...",
  "name": "my-agent",
  "createdAt": "2026-04-14T12:00:00.000Z",
  "requestId": "req_..."
}
```

The `apiKey` is shown **only once**. Store it securely — keys are SHA-256 hashed server-side and cannot be recovered.

### 2. Top up the LLM4Agents balance

Generate a deposit wallet, then send USDT or USDC on Solana or Polygon. The wallet is **deposit-only**; every confirmed transfer credits your LLM4Agents balance, which pays for chat completions, scraper, search, image tools, and workspace files. Gasless-transfer relayer fees (`/v1/tx/*`) are **not** debited from this balance; they are paid by your own externally owned account (EOA) in the token being transferred.

```bash
curl -X POST https://api.llm4agents.com/api/v1/wallets/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"chain": "solana", "token": "USDC"}'
```

### 3. Call the API

The chat endpoint is OpenAI-compatible. Point any OpenAI SDK at the proxy by overriding `base_url`:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm4agents.com/v1",
    api_key="sk-proxy-...",
)
response = client.chat.completions.create(
    model="anthropic/claude-3-haiku",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

---

## Authentication

All authenticated endpoints require:

```
Authorization: Bearer sk-proxy-...
```

Three auth zones:

- **Public:** `/healthz`, `/readyz`, `/api/v1/agents/register`, `/api/v1/webhooks/vexowallet/:secret`, `/admin/*` panel HTML, `/docs`, `/llms.txt`, `/llms-full.txt`.
- **Agent bearer:** `/api/v1/wallets`, `/api/v1/balance`, `/api/v1/models`, `/api/v1/transactions`, `/v1/chat/*`, `/v1/embeddings`, `/v1/tx/*`.
- **Admin (`X-Admin-Secret` header):** `/api/v1/admin/*`.

A fourth, alternative billing mode is available on `/v1/chat/completions`, the MCP endpoint, and the REST scraper, search, and image surfaces:

- **x402 walk-up payment:** present an `X-PAYMENT: <base64 PaymentPayload>` header instead of (or alongside) the Bearer key. No account, no pre-deposit — each request is settled on-chain in USDC on Base via the [x402 protocol](https://github.com/x402-foundation/x402) and the Coinbase CDP facilitator. See `POST /v1/chat/completions` for the full request/response contract, including the trailing SSE `event: x402-receipt` for streaming responses.

---

## TypeScript SDK — `@llm4agents/sdk`

Single facade covering every endpoint and MCP tool. Repo: [github.com/llmforagents/sdk](https://github.com/llmforagents/sdk).

### Install

```bash
npm install @llm4agents/sdk
# Optional — only needed for client.transfer (gasless transfers)
npm install ethers
```

Universal runtime (Node 18+, browser, Cloudflare Workers, Deno, Bun). Zero runtime deps; `ethers ^6` is an optional peer dep.

### Initialize

```typescript
import { LLM4AgentsClient } from '@llm4agents/sdk';

const client = new LLM4AgentsClient({
  apiKey: process.env.LLM4AGENTS_API_KEY!,
  baseUrl: 'https://api.llm4agents.com',     // optional
  mcpUrl:  'https://mcp.llm4agents.com/mcp', // optional
  timeout: 30_000,                            // optional
});
```

### `client.chat`

```typescript
// Single completion
const res = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello' }],
});

// Streaming
const stream = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Conversation with auto tool-execution loop
const conv = client.chat.conversation({
  model: 'anthropic/claude-sonnet-4',
  system: 'You are a research assistant',
  tools: client.tools,
  onToolCall:   (name, args)   => true,        // return false to skip
  onToolResult: (name, result) => {/* log */},
  maxToolRounds: 5,
});
const answer = await conv.say('Search Bitcoin news and summarize the top 3');
console.log(answer.content);
console.log(answer.toolCalls); // ToolCallRecord[]

// Streaming conversation with typed StreamEvent
for await (const ev of await conv.stream('Now find the current price')) {
  switch (ev.type) {
    case 'text':       process.stdout.write(ev.content); break;
    case 'tool_start': /* ev.name, ev.args */ break;
    case 'tool_end':   /* ev.name, ev.result, ev.durationMs */ break;
    case 'done':       /* ev.response.usage */ break;
  }
}

// History is a JSON-serializable array — persist anywhere
const saved = conv.messages;
const rehydrated = client.chat.conversation({ model: '...', history: saved });
```

### `client.embeddings`

```typescript
const res = await client.embeddings.create({
  model: 'openai/text-embedding-3-large',
  input: 'How many vectors fit in a haystack?',
});
console.log(res.data[0].embedding.length, res.usage.prompt_tokens);

// Batch
const batch = await client.embeddings.create({
  model: 'openai/text-embedding-3-small',
  input: ['first', 'second', 'third'],
});
```

### `client.wallets`

```typescript
const wallet = await client.wallets.generate({ chain: 'polygon', token: 'USDC' });
const balance = await client.wallets.balance();
const page = await client.wallets.transactions({ limit: 20, type: 'deposit' });
```

### `client.transfer` — gasless stablecoin transfers

```typescript
// One-call (quote + sign + submit)
const result = await client.transfer.send({
  chain: 'polygon', token: 'USDC',
  to: '0xRecipient...', amount: '10.50',
  privateKey: process.env.EOA_PRIVATE_KEY!,
});

// Two-step (inspect fee first)
const quote = await client.transfer.quote({
  chain: 'polygon', token: 'USDC',
  from: '0xSender...', to: '0xRecipient...', amount: '10.50',
});
const submitted = await client.transfer.submit(quote, process.env.EOA_PRIVATE_KEY!);
```

If `ethers` is not installed and `client.transfer` is used, the SDK throws:
`"ethers is required for gasless transfers. Install it: npm install ethers"`.

### `client.tools` (MCP)

```typescript
// Scraper
await client.tools.scraper.fetchHtml({ url: 'https://example.com' });
await client.tools.scraper.markdown({ url: 'https://example.com' });
await client.tools.scraper.screenshot({ url: 'https://example.com', fullPage: true });

// Search
await client.tools.search.google({ q: 'TypeScript SDK design' });
await client.tools.search.googleNews({ q: 'Bitcoin', tbs: 'qdr:d' });
await client.tools.search.googleMaps({ q: 'coffee near me' });

// Image
await client.tools.image.generate({ prompt: 'A robot writing code' });
await client.tools.image.edit({ prompt: 'Make it blue', imageUrl: '...' });
await client.tools.image.analyze({ prompt: 'What is this?', imageUrl: '...' });

// Workspace
await client.tools.workspace.upload({ filename: 'scrapes/page-1.md', content_base64: '...', days_to_store: 7 });
await client.tools.workspace.list({ prefix: 'scrapes/' });
await client.tools.workspace.download({ filename: 'scrapes/page-1.md', format: 'url', url_ttl_minutes: 5 });

// Tool definitions for custom tool loops or other LLMs
const defs = client.tools.definitions; // ToolDefinition[]
```

### `client.models`

```typescript
const models = await client.models.list();
```

### Errors

All failures surface as a single `LLM4AgentsError` with stable `code`, upstream `statusCode`, and `requestId` from the `x-request-id` response header.

```typescript
import { LLM4AgentsError } from '@llm4agents/sdk';

try {
  await client.chat.completions.create({
    model: 'anthropic/claude-sonnet-4',
    messages: [{ role: 'user', content: 'Hello' }],
  });
} catch (err) {
  if (err instanceof LLM4AgentsError) {
    switch (err.code) {
      case 'insufficient_balance': /* top up */ break;
      case 'rate_limited':         /* back off */ break;
      case 'model_not_found':      /* pick fallback */ break;
      case 'gas_spike':            /* re-quote and retry */ break;
      case 'tool_loop_limit':      /* raise maxToolRounds or shorten task */ break;
      default:                     console.error(err.code, err.requestId);
    }
  }
}
```

Error code union: `auth_error` | `rate_limited` | `network_error` | `timeout` | `api_error` | `model_not_found` | `model_disabled` | `context_overflow` | `insufficient_balance` | `gas_spike` | `signature_mismatch` | `invalid_token` | `operator_unavailable` | `deadline_expired` | `tool_not_found` | `tool_execution_error` | `tool_loop_limit`.

---

## REST Endpoints Reference

### `POST /api/v1/agents/register` (public)

Register a new agent. Rate-limited to 5 requests per IP per minute.

**Anti-spam: deposit within 15 minutes.** Newly registered agents that do not
receive any deposit (across any chain/token) within ~15 minutes are
automatically deleted to prevent database saturation. A deposit of even 1 cent
permanently exempts the agent from this sweep. The cron sweeper runs every
5 minutes, so the actual TTL is between 15 and 20 minutes from registration.

Recommended onboarding flow:
1. `POST /api/v1/agents/register` — receive API key (save it, you can't retrieve it later)
2. `POST /api/v1/wallets/generate` — receive a deposit address
3. Send any amount of supported stablecoin (USDC/USDT) to that address
4. Start using `/v1/chat/completions` and other authenticated endpoints

Request body:

| Field | Type   | Description                        |
|-------|--------|------------------------------------|
| name  | string | required, 1–100 chars              |

Response (201):

```json
{
  "uuid": "...",
  "apiKey": "sk-proxy-...",
  "name": "my-agent",
  "createdAt": "2026-04-14T12:00:00.000Z",
  "requestId": "req_...",
  "depositDeadline": "2026-04-14T12:15:00.000Z",
  "depositRequiredWithinMinutes": 15,
  "notice": "Anti-spam protection: this agent will be automatically deleted if no deposit is received within 15 minutes. You can simply register a new agent if that happens. Once the first deposit (any amount) is credited, the agent becomes permanent. This measure exists solely to prevent malicious actors from generating agents indefinitely."
}
```

The `depositDeadline` is computed as `createdAt + depositRequiredWithinMinutes`.
Clients SHOULD surface this deadline to the user, schedule the deposit before it
elapses, and treat the `notice` field as the canonical anti-spam explanation.
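
As a rough illustration of the deadline arithmetic, the sketch below recomputes `createdAt + depositRequiredWithinMinutes` from a register response and reports how much time is left. The helper names are hypothetical and not part of any SDK; only the field names come from the response above.

```python
from datetime import datetime, timedelta

def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def deposit_deadline(registration: dict) -> datetime:
    """Recompute createdAt + depositRequiredWithinMinutes."""
    created = parse_iso_utc(registration["createdAt"])
    return created + timedelta(minutes=registration["depositRequiredWithinMinutes"])

def seconds_left(registration: dict, now: datetime) -> float:
    """Seconds until the deadline; negative once it has passed."""
    return (deposit_deadline(registration) - now).total_seconds()

# Fields copied from the example response above.
registration = {
    "createdAt": "2026-04-14T12:00:00.000Z",
    "depositDeadline": "2026-04-14T12:15:00.000Z",
    "depositRequiredWithinMinutes": 15,
}
now = parse_iso_utc("2026-04-14T12:05:00.000Z")
remaining = seconds_left(registration, now)
```

Because the sweeper runs every 5 minutes, an agent may survive slightly past the deadline, but a client should not rely on that slack.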

### `GET /healthz` (public)

```json
{ "status": "ok", "service": "llm-proxy-api", "timestamp": "..." }
```

### `POST /api/v1/wallets/generate` (auth)

Generates a unique top-up address. **Deposit-only** — credits the LLM4Agents balance used to pay for chat completions, scraper, search, image tools, and workspace files. Gasless-transfer relayer fees are paid separately by the agent's own EOA. Idempotent: same `chain`/`token` returns the existing wallet.

Request body:

| Field | Type   | Description                          |
|-------|--------|--------------------------------------|
| chain | string | required — `"solana"` or `"polygon"` |
| token | string | required — `"USDT"` or `"USDC"`      |

### `GET /api/v1/balance` (auth)

Returns available balance and per-wallet sub-account breakdown.

### `GET /api/v1/models` (auth)

Returns the active model catalog with effective markup-aware pricing per million tokens.

### `POST /v1/chat/completions` (auth OR x402 walk-up)

OpenAI-compatible. Supports:

- `model` — single model slug (mutually exclusive with `models`).
- `models` — array of 2–3 slugs for fallback routing. All slugs must be active. Billing reserves balance using the most expensive model in the list to prevent under-reservation; the final charge uses the model that actually responded.
- `messages` — required.
- `temperature`, `max_tokens`, `stream` — standard.
- Additional fields are forwarded to the upstream provider.

The response includes the `X-Model-Used` header indicating which model actually answered.
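
The `model` / `models` rules above can be checked client-side before sending a request. The sketch below is a minimal illustration, assuming a hypothetical price table with one comparable number per slug; the real reservation logic runs server-side and may weigh input and output prices differently.

```python
def validate_routing(body: dict) -> list:
    """Return the candidate slugs, enforcing the model/models rules."""
    has_model = "model" in body
    has_models = "models" in body
    if has_model == has_models:
        raise ValueError("send exactly one of 'model' or 'models'")
    if has_model:
        return [body["model"]]
    slugs = list(body["models"])
    if not 2 <= len(slugs) <= 3:
        raise ValueError("'models' must contain 2-3 slugs")
    return slugs

def reservation_model(slugs: list, price_per_million: dict) -> str:
    """Pick the most expensive candidate, the one the reservation is sized for."""
    return max(slugs, key=lambda slug: price_per_million[slug])

# Hypothetical slugs and prices, for illustration only.
prices = {"vendor/small": 0.25, "vendor/medium": 3.0, "vendor/large": 15.0}
body = {"models": ["vendor/small", "vendor/large"], "messages": []}
reserved_for = reservation_model(validate_routing(body), prices)
```

After the call, the `X-Model-Used` header tells you which slug the final charge was based on.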

**x402 walk-up payment mode.** Submit the [x402 protocol](https://github.com/x402-foundation/x402) `X-PAYMENT` header instead of (or alongside) the `Authorization: Bearer …` header. No account, no pre-deposited balance — each request is settled on-chain in USDC on Base (or Base Sepolia in staging). Accepts the `exact` scheme today.

x402 is supported on multiple surfaces — `POST /v1/chat/completions`, the MCP endpoint at `mcp.llm4agents.com/mcp`, and the REST scraper / search / image surfaces under `/v1/scrape/*`, `/v1/search/*`, `/v1/image/*`. The wire protocol is identical across all of them; only the priced unit differs (dynamic upper bound for chat, per-tool flat rate from `service_pricing.x402_value` for the rest).

- **Unpaid request:** the server emits `HTTP 402` with a base64-encoded `PaymentRequirements` in the `PAYMENT-REQUIRED` response header AND in the JSON body (`{ x402Version: 2, accepts: [PaymentRequirements] }`).
- **Signed retry:** include `X-PAYMENT: <base64 PaymentPayload>` (use [`x402-fetch`](https://www.npmjs.com/package/x402-fetch) or [`x402-axios`](https://www.npmjs.com/package/x402-axios)). The server verifies the signed authorization with the Coinbase CDP facilitator, runs the LLM call, then settles for the **real cost** (always `≤` the signed `maxAmount`) and returns the response.
- **Response headers (success):** standard billing headers PLUS `X-Payment-Method: x402` and `X-Payer-Address: 0x…`. No `X-Balance-Remaining-Cents` (walk-up clients have no balance).
- **Streaming:** trailing SSE chunk `event: x402-receipt` followed by `data: { transaction, network, amount, payer }` arrives after the OpenAI-shaped `data: [DONE]`. Settlement happens in `ctx.waitUntil` so the client sees the LLM response immediately.
- **Replay protection:** every `nonce` is single-use. Reuse → `HTTP 409 x402_nonce_reused` without invoking the LLM or the facilitator.
- **Expired authorization:** `validBefore <= now` → `HTTP 400 x402_authorization_expired`.
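
The two client-side parsing steps in the list above can be sketched as follows. This assumes the `PAYMENT-REQUIRED` header is base64 over a JSON document and that the receipt `data:` line is valid JSON; both function names are hypothetical. In practice `x402-fetch` or `x402-axios` handle the signing and retry for you.

```python
import base64
import json

def decode_payment_required(header_value: str) -> dict:
    """Decode the base64 JSON carried in the PAYMENT-REQUIRED header (assumed format)."""
    return json.loads(base64.b64decode(header_value))

def extract_x402_receipt(sse_text: str):
    """Return the receipt dict from a finished SSE stream, or None if absent."""
    event = None
    for line in sse_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "x402-receipt":
            return json.loads(line[len("data:"):].strip())
        elif not line.strip():
            event = None  # a blank line ends the current SSE event
    return None

# Illustrative stream tail: the receipt arrives after the OpenAI-shaped [DONE].
stream_tail = (
    "data: [DONE]\n\n"
    "event: x402-receipt\n"
    'data: {"transaction": "0xabc", "network": "base", "amount": "1200", "payer": "0xdef"}\n\n'
)
receipt = extract_x402_receipt(stream_tail)
```

A client that stops reading at `data: [DONE]` never sees the receipt, so keep consuming the stream until it closes.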

### `POST /v1/scrape/{fetch_html,markdown,links,screenshot,pdf,extract}` (auth OR x402 walk-up)

REST mirror of the MCP scraper tools. Body is the MCP `arguments` object directly (e.g. `{ "url": "https://example.com" }`). Routes through the `SCRAPER` service binding so billing happens exactly once in the scraper worker — no double-billing risk. Bearer mode debits `service_pricing.value` from the LLM4Agents balance; x402 walk-up settles `service_pricing.x402_value` on-chain. Same response shapes as the MCP equivalents.

### `POST /v1/search/{google,news,maps,batch}` (auth OR x402 walk-up)

REST mirror of `google_search`, `google_news`, `google_maps`, `google_batch_search`. Body fields per the MCP tools (`q`, `gl`, `hl`, `tbs`, `page`, `location`; `batch` takes `queries[]` 1–100). `batch` bills per query in both modes. Same dual billing model.

### `POST /v1/image/{generate,edit,analyze}` (auth OR x402 walk-up)

REST mirror of `generate_image`, `edit_image`, `analyze_image`. Body matches the MCP `arguments`. Generation pricing depends on output megapixels (≤1.5MP or >1.5MP) — the same threshold drives both `value` and `x402_value`. Walk-up clients can call `analyze_image` for vision Q&A without a balance.

### `GET /v1/workspace/download/:token` (public — token is the auth)

Public endpoint that consumes a one-time download token issued by `workspace_download({ format: 'url' })`. No Bearer or X-PAYMENT header is required; possession of the token is the authorization. The token is single-use: the second hit returns `410 Gone`. Tokens expire 1–15 minutes after issuance (configurable via `url_ttl_minutes` on the issuing MCP call).

Response: streams the file bytes from R2 through the worker. Headers include `content-type`, `content-length`, and `content-disposition: attachment; filename="..."`. `cache-control: no-store`.

Billing: the original agent was billed at token issuance (per-MB download rate from `service_pricing`). This endpoint is consumption only.

Errors:
- `400 token_required` — empty token in URL
- `410 token_invalid_or_used` — token missing, already consumed, or expired (collapsed for enumeration safety)
- `404 file_not_found` — the file was deleted between issuance and consumption (rare race condition)

### `POST /v1/embeddings` (auth)

OpenAI-compatible embeddings. Body:

- `model` — embedding model slug (e.g. `openai/text-embedding-3-large`). Must point to a row of `model_type='embedding'` in the catalog. The embedding catalog is hand-curated because OpenRouter's public listing endpoint omits embedding models.
- `input` — single string or an array of up to 2048 strings.
- `encoding_format` — `"float"` (default) or `"base64"`.
- `dimensions` — optional override for vector dimensionality.
- `user` — optional opaque end-user identifier passed through to the upstream provider.

Response shape mirrors OpenAI: `{ object: "list", data: [{ object: "embedding", embedding, index }], model, usage: { prompt_tokens, total_tokens } }`. Embeddings have no completion tokens; billing is input-only. The `X-Model-Used` header reports which model actually responded.
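
A small sketch of consuming that response shape: it returns the vectors in `index` order and reads the billed token count. The helper names are hypothetical, and the sample response is made up for illustration.

```python
def vectors_in_order(response: dict) -> list:
    """Return embeddings sorted by their 'index' so they line up with the inputs."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

def billed_tokens(response: dict) -> int:
    """Embeddings are billed on input only, so prompt_tokens is the billed count."""
    return response["usage"]["prompt_tokens"]

# Made-up response with deliberately shuffled entries.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 0},
    ],
    "model": "openai/text-embedding-3-small",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}
vectors = vectors_in_order(sample)
```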

### `POST /v1/tx/quote` and `POST /v1/tx/send` (auth)

Non-custodial gasless ERC-20 stablecoin transfers via Vexo's `StablecoinForwarder`. The platform holds no keys, signs nothing, and charges nothing — your agent's EOA signs an EIP-2612 `Permit` and an EIP-712 `TransferPermit` locally; the proxy validates the signatures via `ecrecover` and relays them to Vexo. Recorded as `transactions.type = 'gas_sponsored'` with `amount_usd_cents = 0`.

Flow (two API calls with a local signing step in between):

1. `POST /v1/tx/quote` — submit `(chain, token, from, to, amount)`. Receive the relayer fee, on-chain nonces, both EIP-712 typed-data payloads, and a 10-minute `deadline`.
2. Sign `typedData.permit` and `typedData.transferPermit` locally (e.g. `ethers.Wallet.signTypedData`). Split each signature into `{v, r, s}`.
3. `POST /v1/tx/send` — submit the signed bundle. The platform re-derives each digest, runs `ecrecover`, returns `400 invalid_signature` if either recovered signer ≠ `from`, otherwise forwards to Vexo.
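
Step 2 asks for each signature split into `{v, r, s}`. A minimal sketch of that split for a standard 65-byte ECDSA signature (`r || s || v`) is shown below. The output encoding (hex strings for `r` and `s`, an integer for `v`) is an assumption; check the OpenAPI schema for the exact field types `/v1/tx/send` expects.

```python
def split_signature(signature_hex: str) -> dict:
    """Split a 65-byte signature laid out as r (32 bytes) || s (32 bytes) || v (1 byte)."""
    cleaned = signature_hex[2:] if signature_hex.startswith("0x") else signature_hex
    raw = bytes.fromhex(cleaned)
    if len(raw) != 65:
        raise ValueError(f"expected 65 bytes, got {len(raw)}")
    return {
        "v": raw[64],
        "r": "0x" + raw[:32].hex(),
        "s": "0x" + raw[32:64].hex(),
    }

# Dummy signature bytes, for illustration only.
dummy = "0x" + "11" * 32 + "22" * 32 + "1b"
parts = split_signature(dummy)
```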

Currently active: **Polygon / USDC** (`0x3c499c542cEF5E3811e1192ce70d8cC03d5c3359`, 6 decimals). Other chains/tokens are documented but not yet wired (`gaslessSupported = false`).

Documented but pending: `polygon/DAI`, plus `USDC/DAI/PYUSD/USDS/USDe/RLUSD/FDUSD` on `ethereum`, `arbitrum`, `optimism`, `avalanche`, `base`, plus `solana` and `tron` SPL/TRC tokens.

### `GET /api/v1/transactions` (auth)

Paginated ledger of deposits, chat charges, scraper / search / image / workspace charges, and gasless events. Query params: `limit`, `offset`, `type`.
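
One way to walk the ledger with `limit` and `offset` is sketched below. The response shape is not specified here, so the sketch takes a hypothetical `fetch_page(limit, offset)` callable that returns a list of entries and assumes a page shorter than `limit` marks the end.

```python
from typing import Callable, Iterator

def iter_transactions(fetch_page: Callable[[int, int], list], limit: int = 20) -> Iterator[dict]:
    """Yield every ledger entry by advancing the offset one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # assumption: a short page means no more entries
            return
        offset += limit

# Stand-in for the HTTP call, backed by an in-memory ledger.
ledger = [{"id": i, "type": "deposit"} for i in range(45)]

def fake_fetch(limit: int, offset: int) -> list:
    return ledger[offset:offset + limit]

entries = list(iter_transactions(fake_fetch, limit=20))
```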

---

## MCP Tools

All tools live behind a single MCP Streamable HTTP endpoint. Stateless — no MCP session management.

**MCP endpoint:** `POST https://mcp.llm4agents.com/mcp`
**Protocol:** MCP Streamable HTTP (JSON response mode)

**Two billing modes (pick one per request):**

1. **Bearer + balance:** `Authorization: Bearer YOUR_API_KEY`. Per-call cost is debited from the LLM4Agents balance at the `service_pricing.value` rate (priced aggressively, sometimes 5–10× cheaper than walk-up). Browser-session tools (`session_*`) are Bearer-only.

2. **x402 walk-up:** `X-PAYMENT: <base64 PaymentPayload>` instead of (or alongside) Bearer. Per-call cost is settled on-chain at the `service_pricing.x402_value` rate (flat rate, ~10% below the x402engine.app reference). Tools with `x402_value = NULL` are not exposed in walk-up mode (`session_*` and any unconfigured tool). The wire protocol is identical to chat completions: 402 with `paymentRequirements`, signed retry, single-use nonces.

The same priced unit applies on the REST surfaces (`/v1/scrape/*`, `/v1/search/*`, `/v1/image/*` — same dual billing, see those sections).

### Scraper Tools (headless browser)

| Tool         | Description                                                      | No Proxy | Datacenter | Residential |
|--------------|------------------------------------------------------------------|---------:|-----------:|------------:|
| `fetch_html` | Fetch full page HTML (escalates proxy tier with `auto_fallback`) |  $0.0007 |    $0.0009 |     $0.0037 |
| `markdown`   | Convert page to markdown                                         |  $0.0010 |    $0.0012 |     $0.0040 |
| `links`      | Extract all links                                                |  $0.0007 |    $0.0009 |     $0.0037 |
| `screenshot` | Take a screenshot                                                |  $0.0010 |    $0.0012 |     $0.0040 |
| `pdf`        | Generate PDF                                                     |  $0.0012 |    $0.0014 |     $0.0042 |
| `extract`    | Extract structured data via CSS selectors                        |  $0.0012 |    $0.0014 |     $0.0042 |

#### `fetch_html` behavior

Uses a **10 s per-attempt** navigation timeout (configurable 1000–10000 ms via `timeout_ms`) and a `domcontentloaded` ready-state — heavy SPAs with continuous background traffic don't stall on `networkidle`. By default the requested `proxy_tier` is honored exactly: `none` means no proxy, no escalation.

Set `auto_fallback: true` to opt into a transparent retry chain when a tier fails: starting from `none` tries `none → datacenter → residential`; from `datacenter` tries `datacenter → residential`; from `residential` makes one attempt.

With `auto_fallback`, you are billed at the tier that actually returned the page (`tier_used` in the response). On total failure the tool returns a soft-error result (no exception) so the model can decide what to do next, billed at the originally requested (cheapest) tier. Deterministic errors (SSRF block, payload too large) abort the chain immediately and refund the reservation.

```jsonc
// Successful response (escalated from "none" to "datacenter")
{
  "ok": true,
  "tier_used": "datacenter",
  "html": "<html>...</html>",
  "status": 200,
  "finalUrl": "https://example.com/",
  "attempts": [
    { "tier": "none", "error": "Navigation timeout of 10000 ms exceeded" }
  ]
}

// Total failure
{
  "ok": false,
  "tier_used": "none",
  "attempts": [
    { "tier": "none",        "error": "..." },
    { "tier": "datacenter",  "error": "..." },
    { "tier": "residential", "error": "..." }
  ],
  "reason": "All proxy tiers failed. Last error: ...",
  "hint":   "The page may be down, blocking automated requests, or unreachable. Consider trying a different URL, a search query, or another tool.",
  "url":    "https://..."
}
```
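
The retry chain described above can be sketched as a pure function. This only mirrors the documented ordering for planning or cost estimation; the real escalation happens server-side, and the function names are hypothetical.

```python
TIERS = ["none", "datacenter", "residential"]  # cheapest to most expensive

def fallback_chain(proxy_tier: str, auto_fallback: bool = False) -> list:
    """Tiers that will be attempted, in order, for a given request."""
    start = TIERS.index(proxy_tier)  # raises ValueError for an unknown tier
    return TIERS[start:] if auto_fallback else [proxy_tier]

def failed_tiers(response: dict) -> list:
    """Tiers listed in 'attempts', i.e. those that failed before the final outcome."""
    return [attempt["tier"] for attempt in response.get("attempts", [])]

chain = fallback_chain("none", auto_fallback=True)

# Shape copied from the successful response above.
escalated = {
    "ok": True,
    "tier_used": "datacenter",
    "attempts": [{"tier": "none", "error": "Navigation timeout of 10000 ms exceeded"}],
}
```

In both example responses `tier_used` is also the tier you are billed at, so logging it alongside `attempts` gives a per-request cost trail.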

Persistent-session tools (`session_create`, `session_exec`, `session_close`, `session_status`) support multi-step browser workflows and are billed by duration plus actions.

### Search Tools (Google via Serper)

| Tool                  | Cost per Call  | Description                                 |
|-----------------------|----------------|---------------------------------------------|
| `google_search`       | $0.0012        | Google web search — organic results         |
| `google_news`         | $0.0012        | Google News — recent articles               |
| `google_maps`         | $0.0012        | Google Maps — places and businesses         |
| `google_batch_search` | $0.0012 × N    | Batch 1–100 web searches in a single call   |

The single-query search tools accept: `q` (required), `gl`, `hl`, `tbs` (date filter, e.g. `qdr:d`), `page`, `location`. `google_batch_search` takes a `queries[]` array of 1–100 entries.
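
Since a batch call is capped at 100 queries and billed per query, a longer list has to be split across calls. The sketch below chunks a list and estimates the cost at the listed $0.0012 rate; the helper names are hypothetical, and the items are treated as opaque because the exact shape of each `queries[]` entry is not specified here.

```python
from decimal import Decimal

COST_PER_QUERY_USD = Decimal("0.0012")  # per-call rate from the table above
MAX_BATCH = 100                          # upper bound for google_batch_search

def chunk_queries(queries: list) -> list:
    """Split queries into consecutive batches of at most MAX_BATCH items."""
    return [queries[i:i + MAX_BATCH] for i in range(0, len(queries), MAX_BATCH)]

def estimated_cost_usd(query_count: int) -> Decimal:
    """Batch calls bill per query, so cost scales linearly with the count."""
    return COST_PER_QUERY_USD * query_count

queries = [f"query {n}" for n in range(250)]
batches = chunk_queries(queries)
total = estimated_cost_usd(len(queries))
```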

### Image Tools (AI-powered)

| Tool             | Cost per Call                  | Description                                                     |
|------------------|--------------------------------|-----------------------------------------------------------------|
| `generate_image` | $0.01 (≤1.5MP) / $0.02 (>1.5MP)| Generate an image from a text prompt                            |
| `edit_image`     | $0.02                          | Edit an image using a text instruction (image-to-image)         |
| `analyze_image`  | $0.006                         | Analyze an image and answer questions about it (vision)         |

- `generate_image` accepts: `prompt` (required), `width` (512–2048, default 1024), `height` (512–2048, default 1024). Returns base64 PNG.
- `edit_image` accepts: `prompt` (required), `image` (URL or base64, required), `aspect_ratio` (optional). Returns base64 image.
- `analyze_image` accepts: `prompt` (required), `image` (URL or base64, required). Returns text analysis.
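
To estimate `generate_image` cost before calling it, the megapixel threshold can be applied to the requested dimensions. This is a sketch under one stated assumption: a megapixel is taken as 1,000,000 pixels, which the documentation does not confirm. The function name is hypothetical.

```python
from decimal import Decimal

def generate_image_cost_usd(width: int = 1024, height: int = 1024) -> Decimal:
    """Estimate the per-call price from the output size."""
    for side in (width, height):
        if not 512 <= side <= 2048:
            raise ValueError("width and height must be within 512-2048")
    pixels = width * height
    # Assumption: 1 MP = 1,000,000 pixels, so the threshold is 1,500,000 pixels.
    return Decimal("0.01") if pixels <= 1_500_000 else Decimal("0.02")

default_cost = generate_image_cost_usd()          # 1024 x 1024
large_cost = generate_image_cost_usd(2048, 2048)  # largest allowed size
```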

### Workspace Tools

Every agent gets a private, namespaced file workspace backed by Cloudflare R2. Both Bearer and x402 walk-up are supported. 10 MCP tools:

- `workspace_create` — idempotent confirmation that the workspace exists. Free.
- `workspace_list({ prefix?, limit? })` — list files. Free, rate-limited (60 req/min).
- `workspace_stat({ filename })` — file metadata. Free, rate-limited.
- `workspace_delete({ filename })` — delete a file (no storage refund). Free, rate-limited.
- `workspace_upload({ filename, content_base64, days_to_store, content_type? })` — inline upload, ≤10 MB. Billed: 0.01¢/MB + 0.0001¢/MB/day storage.
- `workspace_upload_init({ filename, size_bytes, days_to_store, content_type? })` — start a large upload. Returns a presigned R2 PUT URL (single-use, content-length bound) and an `upload_id`. Cost is reserved upfront.
- `workspace_upload_finalize({ upload_id })` — confirm the PUT, settle billing, register the file. Must run within 15 min of init or the reservation is refunded by cron.
- `workspace_download({ filename, format?: 'inline'|'url', url_ttl_minutes? })` — billed: 0.004¢/MB. `inline` returns base64 (≤10 MB). `url` returns a single-use proxied URL (`/v1/workspace/download/:token`) that streams through our worker — **direct R2 URLs are never exposed** so per-download billing is enforced.
- `workspace_extend({ filename, additional_days })` — extend storage on a file. Billed at storage rate × bytes × days.
- `workspace_copy({ source_filename, dest_filename, days_to_store })` — server-side copy. Billed for destination storage only.

Pricing: $0.10/GB upload base, ~$0.03/GB-month storage, $0.04/GB download. x402 walk-up rates are ~10% lower per-MB. All paid ops have a 1¢ floor.
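
The per-MB rates and the 1¢ floor combine as in the sketch below. It assumes the floor is applied as a simple minimum on the total charge for each operation and ignores any server-side rounding; the function names are hypothetical.

```python
from decimal import Decimal

UPLOAD_CENTS_PER_MB = Decimal("0.01")
STORAGE_CENTS_PER_MB_DAY = Decimal("0.0001")
DOWNLOAD_CENTS_PER_MB = Decimal("0.004")
FLOOR_CENTS = Decimal("1")  # assumption: applied as a minimum per paid operation

def upload_cost_cents(size_mb, days_to_store: int) -> Decimal:
    """Upload base plus prepaid storage for the requested number of days."""
    size = Decimal(str(size_mb))
    raw = size * UPLOAD_CENTS_PER_MB + size * days_to_store * STORAGE_CENTS_PER_MB_DAY
    return max(raw, FLOOR_CENTS)

def download_cost_cents(size_mb) -> Decimal:
    """Per-MB download charge, subject to the same floor."""
    return max(Decimal(str(size_mb)) * DOWNLOAD_CENTS_PER_MB, FLOOR_CENTS)

small_upload = upload_cost_cents(5, 7)      # raw cost is below the floor
large_upload = upload_cost_cents(1000, 30)  # about 1 GB stored for 30 days
```

With these rates a 1000 MB file stored for 30 days comes to 10¢ for the upload plus 3¢ for storage, which lines up with the per-GB figures above.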

Files auto-expire when `days_to_store` runs out (hourly cleanup cron). The workspace row is auto-created on the first paid op — `workspace_create` is optional.

---

## Errors

All error responses include a machine-readable `error` code and a human-readable `message`:

```json
{
  "error": "insufficient_balance",
  "message": "Not enough funds. Required: 15 cents, available: 3 cents",
  "requestId": "req_..."
}
```

| HTTP | Code                            | Description                                                          |
|-----:|---------------------------------|----------------------------------------------------------------------|
|  400 | `validation_error`              | Invalid request body or parameters                                   |
|  400 | `invalid_signature`             | `ecrecover` on `permitSig` or `transferPermitSig` did not match `from` |
|  400 | `expired_quote`                 | The `deadline` returned by `/v1/tx/quote` has passed                 |
|  400 | `unsupported_chain` / `unsupported_token` | Chain or token not currently wired for gasless transfers   |
|  401 | `missing_api_key`               | No Authorization header provided                                     |
|  401 | `invalid_api_key`               | API key not recognized                                               |
|  402 | `insufficient_balance`          | Not enough funds for this request                                    |
|  403 | `agent_suspended`               | Agent account has been suspended                                     |
|  404 | `model_not_found`               | Requested model is not available                                     |
|  409 | `gasless_gas_spike`             | Vexo's fee changed between `/quote` and `/send`; retry with new quote|
|  422 | `model_disabled`                | Model exists but is currently disabled                               |
|  429 | `rate_limited`                  | Too many requests, slow down                                         |
|  502 | `provider_error`                | Error from the upstream LLM provider                                 |
|  502 | `gasless_upstream_error`        | Vexo returned an upstream error on `/v1/tx/send`                     |
|  503 | `gasless_operator_unavailable`  | Vexo's relayer operator temporarily unavailable                      |
|  504 | `gasless_timeout`               | Gasless submit timed out; the transfer may still land on-chain       |
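
A client can branch on the `error` field rather than on the HTTP status alone. The sketch below maps a few codes from the table to a next step. The action labels are hypothetical, and treating `expired_quote` as a re-quote case is an inference from its description rather than documented behavior.

```python
# Codes come from the table above; the action names are illustrative labels.
ACTIONS = {
    "rate_limited": "backoff",
    "insufficient_balance": "top_up",
    "gasless_gas_spike": "requote",
    "expired_quote": "requote",        # inferred: the old quote can no longer be used
    "gasless_timeout": "check_chain",  # the transfer may still land, so verify first
}

def next_action(error_body: dict) -> str:
    """Return the suggested next step for an error response body."""
    return ACTIONS.get(error_body.get("error"), "fail")

# Body copied from the example error response above.
body = {
    "error": "insufficient_balance",
    "message": "Not enough funds. Required: 15 cents, available: 3 cents",
    "requestId": "req_...",
}
action = next_action(body)
```

Resubmitting blindly after `gasless_timeout` is risky, because the original transfer may already be on-chain.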

---

## Rate Limits

| Endpoint                       | Limit        | Window                 |
|--------------------------------|--------------|------------------------|
| `POST /api/v1/agents/register` | 5 requests   | per IP per minute      |
| `POST /v1/chat/completions`    | 600 requests | per API key per minute |
| Other authenticated endpoints  | 120 requests | per API key per minute |

When rate limited, the API returns `429 Too Many Requests`. Implement exponential backoff in your client.
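
A minimal sketch of exponential backoff with full jitter follows. It assumes a hypothetical `send()` callable that performs the request and returns `(status, body)`; the base delay, cap, and retry count are illustrative defaults, not values recommended by the API.

```python
import random
import time

def call_with_backoff(send, max_retries: int = 5, base: float = 0.5,
                      cap: float = 30.0, sleep=time.sleep):
    """Retry send() while it returns 429, sleeping a random, growing delay in between."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; hand the last 429 back to the caller
        # Full jitter: pick uniformly between 0 and the capped exponential delay.
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return status, body

# Simulated endpoint: two 429 responses, then success.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
slept = []
result = call_with_backoff(lambda: next(responses), sleep=slept.append)
```

Injecting `sleep` keeps the function testable; in production the default `time.sleep` applies.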

---

## Supported Chains for Deposits

| Chain    | Tokens     | Notes                              |
|----------|------------|------------------------------------|
| Solana   | USDT, USDC | Fast finality, low fees            |
| Polygon  | USDT, USDC | EVM-compatible, low fees           |

Send stablecoins to your generated deposit wallet address. Deposits are verified on-chain and credited automatically. The credited balance funds chat completions, scraper, search, image tools, and workspace files. Gasless-transfer relayer fees are charged separately by Vexo from the agent's own EOA in the token being transferred and never debit the LLM4Agents balance.