TokenSurf API Docs

TokenSurf is an OpenAI-compatible proxy. Change one line, save 40-94% on LLM costs.

Base URL: https://api.tokensurf.io/v1
All endpoints follow the OpenAI API spec. Your existing code works unchanged.

Quickstart

1. Sign up — get 1,000 free credits:

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'

2. Save your API key (starts with ts_, shown only once).

3. Add your provider key (e.g. your OpenAI key):

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "apiKey": "sk-your-openai-key"}'

4. Use it — change your base URL:

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# Simple query → auto-routed to gpt-4o-mini (94% savings)
# Check response headers: X-TokenSurf-Downgraded: true

Authentication

All requests require a Bearer token in the Authorization header:

Authorization: Bearer ts_your_api_key_here

API keys start with ts_ and are generated at signup. We store only the SHA-256 hash — if you lose your key, you'll need to create a new account.
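Hash-only storage means the server can verify a presented key but never recover it. A minimal illustration of the idea (helper names are hypothetical, not TokenSurf's actual code; a production check would look like this, including a constant-time digest comparison):

```python
import hashlib
import hmac

def hash_key(api_key: str) -> str:
    # Only this digest is stored server-side; the raw ts_ key is discarded.
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

def verify_key(presented: str, stored_hash: str) -> bool:
    # Constant-time comparison of digests; the original key is unrecoverable.
    return hmac.compare_digest(hash_key(presented), stored_hash)

stored = hash_key("ts_example_key")
assert verify_key("ts_example_key", stored)
assert not verify_key("ts_wrong_key", stored)
```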

Provider Keys

TokenSurf doesn't call LLMs directly. You bring your own API keys for each provider. Keys are encrypted with AES-256-GCM before storage.

Supported providers:

| Provider   | Key format | Get one               |
|------------|------------|-----------------------|
| OpenAI     | sk-...     | platform.openai.com   |
| Anthropic  | sk-ant-... | console.anthropic.com |
| Google     | AIza...    | aistudio.google.com   |
| OpenRouter | sk-or-...  | openrouter.ai/keys    |

Credits & Billing

Chat Completions

POST /v1/chat/completions

100% OpenAI-compatible. Supports streaming, tool calls, JSON mode.

Request body

| Field           | Type    | Required | Description |
|-----------------|---------|----------|-------------|
| model           | string  | Yes      | Model ID (e.g. gpt-4o, claude-sonnet-4.6, deepseek/deepseek-r1) |
| messages        | array   | Yes      | Array of message objects (role, content) |
| stream          | boolean | No       | Enable SSE streaming (default: false) |
| temperature     | number  | No       | Sampling temperature |
| max_tokens      | integer | No       | Maximum output tokens |
| tools           | array   | No       | Tool/function definitions (forces COMPLEX routing) |
| response_format | object  | No       | JSON mode (forces COMPLEX routing) |

Example: non-streaming

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

Response

{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o-mini",  // ← downgraded for simple query
  "choices": [{
    "message": { "role": "assistant", "content": "Paris." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}

Example: streaming

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

Example: OpenRouter model

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-r1", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
# Any model with provider/name format → routes via OpenRouter

List Models

GET /v1/models

Returns all supported models with pricing and routing info.

curl https://api.tokensurf.io/v1/models

Signup

POST /api/signup

Create a new account. Returns an API key (shown only once) and 1,000 free credits.

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com"}'

Response

{
  "apiKey": "ts_7d96f6aac5009f1b...",
  "apiKeyPrefix": "ts_7d96f6...",
  "credits": 1000,
  "message": "Save your API key — it cannot be recovered."
}

Dashboard

GET /api/dashboard/

Returns your credit balance, usage stats, and savings for the current month.

curl https://tokensurf.io/api/dashboard/ \
  -H "Authorization: Bearer ts_your_key"

Manage Provider Keys

GET /api/keys/ — Check which providers are configured

POST /api/keys/ — Save a provider key

DELETE /api/keys/ — Remove a provider key

Save a key

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "apiKey": "sk-ant-your-key"}'

Valid providers: openai, anthropic, google, openrouter

Rotate API Key

POST /api/keys/ with {"action": "rotate"}

Generates a new API key. Your old key continues to work for 24 hours (grace period).

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "rotate"}'
# Returns: {"status": "rotated", "apiKey": "ts_new...", "prefix": "ts_abc123..."}

Routing Config

GET /api/routingConfigApi/ — Get current routing configuration

PUT /api/routingConfigApi/ — Update routing configuration

DELETE /api/routingConfigApi/ — Reset to defaults

Configuration fields

| Field             | Type     | Description |
|-------------------|----------|-------------|
| enabled           | boolean  | Master switch for smart routing |
| aiClassifier      | boolean  | Use Gemini Flash for ambiguous queries |
| ambiguousFallback | "conservative" \| "aggressive" | How to handle ambiguous queries when the AI classifier is off |
| modelOverrides    | object   | Per-model {enabled, customTarget} overrides |
| providerEnabled   | object   | Enable/disable routing to each provider |
| providerPriority  | string[] | Provider preference order for fallback chains |

Buy Credits

POST /api/checkout

Creates a Stripe Checkout session. Redirect the user to the returned URL.

curl -X POST https://tokensurf.io/api/checkout \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 25}'
# Returns: {"url": "https://checkout.stripe.com/..."}

Amount must be between $5 and $500. $1 = 1,000 credits.
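The conversion above ($1 = 1,000 credits, purchases between $5 and $500) as a small sketch; the helper name is hypothetical, not part of the API:

```python
CREDITS_PER_DOLLAR = 1_000
MIN_USD, MAX_USD = 5, 500

def credits_for_purchase(amount_usd: int) -> int:
    # Mirrors the documented constraint: amount must be between $5 and $500.
    if not MIN_USD <= amount_usd <= MAX_USD:
        raise ValueError(f"amount must be between ${MIN_USD} and ${MAX_USD}")
    return amount_usd * CREDITS_PER_DOLLAR

print(credits_for_purchase(25))  # 25000
```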

Health Check

GET /api/health

Returns system status, provider health, cache hit rates, and latency percentiles. No authentication required.

curl https://tokensurf.io/api/health
# Returns: {"status":"healthy","region":"us-central1","providers":{...},"metrics":{...}}

Organizations (Teams)

Manage organizations for team-based API key management with per-key budgets and rate limits.

| Method | Endpoint              | Description |
|--------|-----------------------|-------------|
| GET    | /api/orgs/            | List your organizations |
| POST   | /api/orgs/            | Create organization ({"name": "..."}) |
| GET    | /api/orgs/:id         | Get org details + members |
| PUT    | /api/orgs/:id         | Update org (owner/admin) |
| DELETE | /api/orgs/:id         | Delete org (owner only) |
| POST   | /api/orgs/:id/members | Add member ({"email": "...", "role": "member"}) |
| DELETE | /api/orgs/:id/members | Remove member ({"userId": "..."}) |

Roles: owner (full control), admin (manage keys + members), member (read-only).

Team API Keys

Create labeled API keys for your organization with per-key budgets, rate limits, and model restrictions.

| Method | Endpoint             | Description |
|--------|----------------------|-------------|
| GET    | /api/org-keys/:orgId | List org's API keys |
| POST   | /api/org-keys/:orgId | Create key (owner/admin) |
| DELETE | /api/org-keys/:orgId | Delete key (owner/admin) |

Create team key

curl -X POST https://tokensurf.io/api/org-keys/ORG_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "monthlyBudget": 10000, "rpm": 60}'
# Returns: {"apiKey": "ts_org_...", "prefix": "ts_org_abc123..."}

Team keys use the ts_org_ prefix. They consume credits from the organization's balance. Each key can have a monthly budget cap and model allowlist.

Architecture

TokenSurf is a single proxy that sits between your app and LLM providers. Every request flows through this pipeline:

Request Pipeline

// 1. Your app sends a standard OpenAI SDK request
POST /v1/chat/completions
Authorization: Bearer ts_your_key
{ "model": "gpt-4o", "messages": [...] }

// 2. TokenSurf proxy handles it
rate-limit  → In-memory token bucket (60 req/min, 10 req/s burst per key)
auth        → Validate API key (Redis cache → Firestore fallback)
abuse       → Abuse detection (throttle/block on anomalous patterns)
credits     → Redis DECR (atomic, lock-free, ~1ms) with Firestore background sync
classify    → Rule engine (0ms) → classifier cache → AI classifier (~50ms)
route       → If simple + downgrade target exists: swap model
circuit-brk → Check provider health → fallback to alternative provider if down
forward     → Pooled HTTP connection with retry (2 retries, exponential backoff)
translate   → Convert response to OpenAI format (if Anthropic or Google)
quality     → 5% of responses scored async by Gemini Flash-Lite (1-10 scale)
log         → Structured logging + async usage aggregation

// 3. Response returned to your app in OpenAI format
+ X-TokenSurf-Model: gpt-4o-mini
+ X-TokenSurf-Downgraded: true
+ X-TokenSurf-Complexity: simple
+ X-TokenSurf-Request-Id: a1b2c3d4...
+ X-TokenSurf-Region: us-central1
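The rate-limit stage at the top of this pipeline is described as an in-memory token bucket (60 req/min, 10 req/s burst per key). A minimal sketch of that mechanism, with hypothetical names and not TokenSurf's actual code:

```python
import time

class TokenBucket:
    """Allows `burst` immediate requests, refilling at `rate` tokens per second."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=60 / 60, burst=10)  # 60 req/min with a burst of 10
results = [bucket.allow() for _ in range(11)]
print(results.count(True))  # 10: the burst is spent, the 11th request is rejected
```

A 429 with a Retry-After header would be returned when allow() is False.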

Complexity Classification

The classifier runs in two stages. It's conservative by design — when uncertain, it keeps your original model.

| Signal | Result | What triggers it |
|--------|--------|------------------|
| Tools / function calling | Complex | Any request with a tools parameter |
| Structured output | Complex | Any request with response_format |
| Code patterns | Complex | "analyze", "implement", "refactor", "debug", code blocks |
| Long conversation | Complex | 6+ messages in the conversation |
| Long message | Complex | 500+ estimated tokens in the last user message |
| Factual question | Simple | "What is", "Define", "Translate", "Calculate" |
| Very short query | Simple | Under 50 tokens, 1-2 messages |
| Everything else | Ambiguous | Sent to the Gemini Flash AI classifier or treated as complex |

Provider Translation

You always send and receive the OpenAI format. TokenSurf translates internally:

| Provider | Request translation | Response translation |
|----------|---------------------|----------------------|
| OpenAI | Pass-through | Pass-through |
| Anthropic | Extract system messages, merge consecutive roles, ensure first message is user | Map end_turn → stop, reconstruct choices array |
| Google | System → systemInstruction, assistant → model role | Map STOP/MAX_TOKENS/SAFETY finish reasons |
| OpenRouter | Pass-through (OpenAI-compatible) | Pass-through |

Security

Streaming

Streaming ("stream": true) is fully supported across all providers. SSE events from Anthropic and Google are translated in real time to the OpenAI chat.completion.chunk format.

Resilience

TokenSurf is built for millions of requests per month with multiple layers of fault tolerance:

| Layer | Mechanism | Details |
|-------|-----------|---------|
| Rate limiting | Token bucket | 60 req/min, 10 req/sec burst per API key. Returns 429 with a Retry-After header. |
| Circuit breaker | Per-provider state machine | CLOSED → OPEN (fail fast for 30s) → HALF_OPEN (probe) → CLOSED. Triggers on 5+ failures in 60s. |
| Retry | Exponential backoff | 2 retries with jitter on 429/500/502/503. Respects Retry-After headers. |
| Fallback chains | Cross-provider equivalences | When a provider is down, routes to an equivalent model on another provider (e.g. gpt-4o → claude-sonnet-4-6). |
| Connection pooling | undici HTTP pools | Persistent TCP/TLS connections to all providers. Saves 50-100ms per request. |
| Abuse detection | Behavioral analysis | Throttles on high request rates (>600/hour) or error rates (>50%). Escalates to key blocking. |
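The circuit breaker described above follows a standard CLOSED → OPEN → HALF_OPEN cycle. A minimal sketch of that state machine (hypothetical names and structure, not the production implementation):

```python
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 5, window: float = 60.0, cooldown: float = 30.0):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.failures: list[float] = []   # timestamps of recent failures
        self.opened_at: float | None = None

    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        # After the cooldown, allow a single probe request (HALF_OPEN).
        elapsed = time.monotonic() - self.opened_at
        return "HALF_OPEN" if elapsed >= self.cooldown else "OPEN"

    def record_failure(self) -> None:
        now = time.monotonic()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now  # trip: fail fast until the cooldown elapses

    def record_success(self) -> None:
        self.failures.clear()
        self.opened_at = None     # probe succeeded: close the circuit

cb = CircuitBreaker()
for _ in range(5):
    cb.record_failure()
print(cb.state())  # OPEN
```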

Caching

Redis (Memorystore) caching eliminates Firestore from the hot path. All caching is transparent and gracefully degrades if Redis is unavailable.

| Cache | Key | TTL | Impact |
|-------|-----|-----|--------|
| Auth | apikey:{hash} | 5 min | 90%+ of Firestore auth queries eliminated (50ms → 1ms) |
| Credits | credits:{userId} | 10 min | Atomic Redis DECR replaces Firestore transactions (30ms → 1ms) |
| Classifier | classify:{hash} | 1 hour | Skips the Gemini AI call for repeated ambiguous queries (~200ms saved) |

Cache is invalidated on: credit top-ups, key rotation, provider key changes, and routing config updates.
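The cache-aside pattern described here (Redis in front of Firestore, degrading gracefully when the cache is unavailable) looks roughly like this sketch, with an in-memory dict standing in for Redis and a callback standing in for the Firestore read:

```python
import time

cache: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)

def get_cached(key: str, ttl: float, load):
    """Cache-aside: serve from cache if fresh, else load and repopulate."""
    try:
        entry = cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
    except Exception:
        pass  # cache unavailable: degrade gracefully to the backing store
    value = load()  # e.g. a Firestore read in the real system
    try:
        cache[key] = (time.monotonic() + ttl, value)
    except Exception:
        pass  # failing to cache must not fail the request
    return value

calls = []
def load_user():
    calls.append(1)
    return {"credits": 1000}

get_cached("apikey:abc", ttl=300, load=load_user)
get_cached("apikey:abc", ttl=300, load=load_user)
print(len(calls))  # 1: the second read was served from cache
```

Invalidation (on top-ups, key rotation, etc.) would simply delete the affected keys.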

Quality Scoring

TokenSurf automatically samples 5% of non-streaming responses and scores them for quality using Gemini Flash-Lite. This helps you verify that downgraded models still meet your quality bar.

| Score | Rating | Meaning |
|-------|--------|---------|
| 9-10 | Excellent | Comprehensive, accurate, well-structured |
| 7-8 | Good | Mostly correct with minor issues |
| 4-6 | Fair | Partially correct or vague |
| 1-3 | Poor | Incorrect, irrelevant, or harmful |

Quality scores are aggregated per model per month and visible in your dashboard. Downgraded responses are tracked separately so you can compare original vs routed model quality.

Scoring cost: ~$0.00005 per scored response (Gemini Flash-Lite). At 5% sample rate, this adds ~$0.0000025 per request.

How Routing Works

Every request goes through a two-stage classifier:

  1. Rule-based pre-filter (0ms) — catches obvious simple/complex queries using pattern matching:
    • SIMPLE: Short factual questions, translations, calculations, definitions
    • COMPLEX: Code blocks, multi-step reasoning, tool calls, JSON mode, long prompts (500+ tokens), long conversations (6+ messages)
  2. AI classifier — for ambiguous queries, a Gemini Flash-Lite call classifies in <3 seconds. If it times out, defaults to COMPLEX.
Conservative by design: When in doubt, we keep your original model. You never get worse quality — you only save money.
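A simplified sketch of the rule-based pre-filter, using the thresholds stated above (the real engine has more patterns; this function is illustrative, not the actual code):

```python
SIMPLE_PREFIXES = ("what is", "define", "translate", "calculate")
COMPLEX_HINTS = ("analyze", "implement", "refactor", "debug", "```")

def classify(messages: list[dict], has_tools: bool = False,
             has_response_format: bool = False) -> str:
    # Tool calls, JSON mode, and long conversations force COMPLEX.
    if has_tools or has_response_format or len(messages) >= 6:
        return "complex"
    last = messages[-1]["content"]          # assumes the last message is the user's
    est_tokens = len(last) // 4             # rough 4-chars-per-token estimate
    if est_tokens >= 500 or any(h in last.lower() for h in COMPLEX_HINTS):
        return "complex"
    if last.lower().startswith(SIMPLE_PREFIXES) or (est_tokens < 50 and len(messages) <= 2):
        return "simple"
    return "ambiguous"  # handed to the AI classifier (or treated as complex)

print(classify([{"role": "user", "content": "What is 2+2?"}]))  # simple
```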

Routing Table

Simple queries get downgraded. Complex queries and cheap models pass through unchanged.

| Provider | Model | Simple → routes to | Savings |
|----------|-------|--------------------|---------|
| OpenAI | gpt-4o | gpt-4o-mini | 94% |
| OpenAI | gpt-4-turbo | gpt-4o-mini | 98% |
| OpenAI | gpt-4 | gpt-4o-mini | 99% |
| OpenAI | gpt-4o-mini / gpt-3.5-turbo | pass-through | |
| Anthropic | claude-opus-4.6 / 4.5 | claude-haiku-4.5 | 80% |
| Anthropic | claude-sonnet-4.6 / 4.5 | claude-haiku-4.5 | 67% |
| Anthropic | claude-opus-4.1 / 4.0 | claude-haiku-4.5 | 93% |
| Anthropic | claude-sonnet-4.0 | claude-haiku-4.5 | 67% |
| Anthropic | claude-haiku-* | pass-through | |
| Google | gemini-3.1-pro-preview | gemini-2.5-flash | 84% |
| Google | gemini-2.5-pro | gemini-2.5-flash | 72% |
| Google | gemini-*-flash* | pass-through | |
| OpenRouter | Any provider/model format | pass-through (300+ models) | |

Fallback Chains

When a provider is unavailable (circuit breaker open or persistent 5xx), TokenSurf automatically routes to an equivalent model on another provider. Fallback order follows your providerPriority setting.

| Primary model | Anthropic fallback | Google fallback |
|---------------|--------------------|-----------------|
| gpt-4o | claude-sonnet-4-6 | gemini-2.5-pro |
| gpt-4o-mini | claude-haiku-4-5 | gemini-2.5-flash |

| Primary model | OpenAI fallback | Google fallback |
|---------------|-----------------|-----------------|
| claude-sonnet-4-6 | gpt-4o | gemini-2.5-pro |
| claude-haiku-4-5 | gpt-4o-mini | gemini-2.5-flash |

Fallback only triggers when the provider is fully down (not for 4xx client errors). The X-TokenSurf-Fallback: true header indicates a fallback was used.

Response Headers

Every proxy response includes these headers:

| Header | Value | Description |
|--------|-------|-------------|
| X-TokenSurf-Model | gpt-4o-mini | The model that actually served the request |
| X-TokenSurf-Downgraded | true / false | Whether the model was downgraded |
| X-TokenSurf-Complexity | simple / complex | How the query was classified |
| X-TokenSurf-Request-Id | a1b2c3d4-... | Unique ID for tracing and support |
| X-TokenSurf-Region | us-central1 | Which region served the request |
| X-TokenSurf-Fallback | true | Present when a fallback provider was used |

OpenAI

Requests for OpenAI models are forwarded directly to api.openai.com. Format is pass-through — no translation needed.

Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo

Required key: openai

Anthropic

Requests are translated from OpenAI format to the Anthropic Messages API. System messages are extracted into the system parameter. Streaming events are transformed to OpenAI SSE format.

Models: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4.0, claude-opus-4.0, claude-haiku-3.5

Required key: anthropic

Google Gemini

Requests are translated to Gemini's generateContent format. System messages become systemInstruction. Roles are mapped (assistant → model).

Models: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

Required key: google

OpenRouter

Any model ID containing a / (e.g. deepseek/deepseek-r1) is automatically routed through OpenRouter. Format is OpenAI-compatible — no translation needed.

Popular models: meta-llama/llama-3.3-70b-instruct, meta-llama/llama-4-maverick, deepseek/deepseek-chat, deepseek/deepseek-r1, mistralai/mistral-large-latest, qwen/qwen-2.5-72b-instruct, cohere/command-r-plus

Required key: openrouter

See all 300+ models at openrouter.ai/models
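The dispatch rule stated above (any model ID containing "/" goes to OpenRouter) is easy to sketch. Note the prefix mapping for the other providers is an assumption inferred from the model lists in this doc, not confirmed routing logic:

```python
def resolve_provider(model: str) -> str:
    # Any model ID containing "/" goes to OpenRouter (documented rule).
    if "/" in model:
        return "openrouter"
    # Prefix mapping below is an assumption based on the supported-model lists.
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    raise ValueError(f"unsupported model: {model}")

print(resolve_provider("deepseek/deepseek-r1"))  # openrouter
```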

Python

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1"
)

# Works with any supported model
response = client.chat.completions.create(
    model="gpt-4o",  # or claude-sonnet-4.6, gemini-2.5-pro, deepseek/deepseek-r1
    messages=[{"role": "user", "content": "Hello"}]
)

# Check if downgraded (headers require the SDK's raw-response wrapper):
# raw = client.chat.completions.with_raw_response.create(...)
# raw.headers.get("X-TokenSurf-Downgraded"); response = raw.parse()

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_key",
  baseURL: "https://api.tokensurf.io/v1",
});

const response = await client.chat.completions.create({
  model: "claude-opus-4.6",
  messages: [{ role: "user", content: "Hello" }],
});

cURL

curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Error Codes

| HTTP | Type | Meaning |
|------|------|---------|
| 400 | invalid_request_error | Missing model/messages, unsupported model, no provider key, body too large, or model not allowed for the org key |
| 401 | authentication_error | Missing or invalid API key (ts_ or ts_org_) |
| 402 | insufficient_credits | No credits remaining, or the org key's monthly budget is exhausted |
| 403 | invalid_request_error | Model not in the org key's allowlist |
| 405 | invalid_request_error | Wrong HTTP method |
| 409 | | Email already registered (signup), or member already in the org |
| 429 | rate_limit_error | Rate limit exceeded (per-key bucket) or abuse-detection throttle. Check the Retry-After header. |
| 502 | provider_error | Upstream provider failed after retries; the credit is automatically refunded |
| 503 | provider_unavailable | Provider circuit breaker is open and no fallback is configured; credit refunded |

Provider errors (502/503): if the upstream provider fails, your credit is automatically refunded, so you only pay for successful requests. The proxy retries up to 2 times with exponential backoff before returning an error.
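Client-side, a matching retry policy (retry 429/5xx with jittered exponential backoff) can be sketched as follows. Names and delays are illustrative, and Retry-After parsing is omitted for brevity:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}

def with_retries(send, max_retries: int = 2, base_delay: float = 0.1):
    """Call send() -> (status, body); retry retryable statuses with jittered backoff."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        # Exponential backoff with jitter before the next attempt.
        time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

attempts = []
def flaky():
    attempts.append(1)
    return (503, None) if len(attempts) < 3 else (200, "ok")

print(with_retries(flaky))  # (200, 'ok') on the third attempt
```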