Overview
Get started with TokenSurf
Complete these steps to start saving on LLM costs.
1. Create an account and get your API key.
2. Add a provider key: OpenAI, Anthropic, Google, or OpenRouter.
3. Test in the Playground: send your first request and see routing in action.
4. Check your savings: see the cost breakdown by model on the Analytics page.
Plan: Free (1,000 req/month)
Credits: 0 ($0.001 per request)
Savings: $0.00 this month
Requests: 0 (0% auto-routed)
Cost Saved: $0.00 (vs $0.00 direct)

Quick Actions

API Playground: test any model and see routing decisions.
Routing Engine: configure smart routing rules.
Response Cache: cache identical requests for free.
Rules & Guardrails: content rules and PII redaction.

Providers

OpenAI: Not set
Anthropic: Not set
Google: Not set
OpenRouter: Not set

Feature Status

Smart Routing: On
Response Cache: Off
PII Redaction: Off
Webhooks: Off

Recent Activity

No requests yet. Try the Playground to send your first request.

Total Requests: 0 (this month)
Downgraded: 0 (0% of requests)
Original Cost: $0.00 (without routing)
Actual Cost: $0.00 ($0.00 saved)

Cost Breakdown

Routing intelligence saves you money by sending simple queries to cheaper models.

Metric                  Value
Total Requests          0
Prompt Tokens           0
Completion Tokens       0
Original Cost           $0.0000
Actual Cost             $0.0000
Total Savings           $0.0000
Auto-Routed Requests    0
Downgrade Rate          0%

Cost by Model

Breakdown of spend and savings per model this month.

Model breakdown will appear after your first requests.

Cost by Feature Tag

Tag requests with a header such as X-TokenSurf-Tag: checkout to track spend per feature, team, or environment.

Add X-TokenSurf-Tag headers to your requests to see cost breakdowns by feature.
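A minimal sketch of sending a tagged request with only the Python standard library. It assumes the OpenAI-compatible path /chat/completions under the base URL and Bearer authorization (the convention the OpenAI SDK follows); build_tagged_request is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.tokensurf.io/v1"


def build_tagged_request(api_key: str, tag: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat request carrying an X-TokenSurf-Tag header.

    Hypothetical helper: the /chat/completions path and Bearer auth are assumptions
    based on the OpenAI-compatible API the quickstart points the OpenAI SDK at.
    """
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Spend for this request is grouped under the given tag
            "X-TokenSurf-Tag": tag,
        },
    )


req = build_tagged_request("ts_...", "checkout", "Hello!")
# urllib stores header names capitalized, so look it up as "X-tokensurf-tag"
print(req.get_header("X-tokensurf-tag"))  # checkout
# To actually send it: urllib.request.urlopen(req)
```

With the OpenAI Python SDK, the same header can be added to a single call through the extra_headers argument, for example client.chat.completions.create(..., extra_headers={"X-TokenSurf-Tag": "checkout"}).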

Request Logs

Last 50 API requests with routing decisions, cost, and latency.

No logs yet

Make your first API request to see logs here. Logs appear within 60 seconds.

Response Headers Reference

Every proxy response includes these headers for debugging:

Header                    Description                                   Example
X-TokenSurf-Model         Final model used                              gpt-4o-mini
X-TokenSurf-Downgraded    Was the request routed to a cheaper model?    true
X-TokenSurf-Complexity    Classified complexity level                   simple

1. Your API Key

Use this as your api_key in the OpenAI SDK.

Current key: ts_...

2. Base URL

Point your OpenAI SDK to this base URL:

https://api.tokensurf.io/v1

3. Add a Provider Key

Scroll down to the Provider API Keys section and add at least one API key (OpenAI, Anthropic, Google, or OpenRouter).

4. Drop-in Replacement

Point the OpenAI SDK at the TokenSurf base URL and use your TokenSurf API key. That's it: same SDK, same code, lower costs.

Python:

from openai import OpenAI

client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",  # will auto-route simple queries to gpt-4o-mini
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

5. Check Response Headers

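# The parsed completion has no .headers; read them via with_raw_response
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(raw.headers.get("X-TokenSurf-Model"))       # gpt-4o-mini
print(raw.headers.get("X-TokenSurf-Downgraded"))  # true
print(raw.headers.get("X-TokenSurf-Complexity"))  # simple
response = raw.parse()  # the usual completion object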
Connect at least one provider to start routing requests.

6. Provider API Keys

Add your provider keys. They're encrypted with AES-256-GCM at rest. TokenSurf never stores plaintext keys.

OpenAI: Not set
Anthropic: Not set
Google Gemini: Not set
OpenRouter: Not set

How Routing Works

TokenSurf analyzes each request's complexity and routes it to the cheapest compatible model:

If you request       Simple queries route to    You save
gpt-4o               gpt-4o-mini                ~90%
claude-sonnet-4-6    claude-haiku-4-5           ~85%
gemini-2.5-pro       gemini-2.0-flash           ~80%
gpt-4-turbo          gpt-4o-mini                ~95%
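The table above can be read as a lookup from requested model to downgrade target that applies only when a request is classified as simple. A minimal sketch of that decision using the four pairs from the table; route_model is a hypothetical name, and TokenSurf's real routing also applies per-model overrides and the other Routing Engine settings.

```python
# Default downgrade targets for simple queries, copied from the table above
DOWNGRADE_TARGETS = {
    "gpt-4o": "gpt-4o-mini",
    "claude-sonnet-4-6": "claude-haiku-4-5",
    "gemini-2.5-pro": "gemini-2.0-flash",
    "gpt-4-turbo": "gpt-4o-mini",
}


def route_model(requested: str, complexity: str) -> str:
    """Return the model to call (hypothetical helper, not TokenSurf's code).

    Only "simple" requests are downgraded, and only when the requested model
    has a downgrade target; everything else passes through unchanged.
    """
    if complexity == "simple":
        return DOWNGRADE_TARGETS.get(requested, requested)
    return requested


print(route_model("gpt-4o", "simple"))       # gpt-4o-mini
print(route_model("gpt-4o", "complex"))      # gpt-4o
print(route_model("gpt-4o-mini", "simple"))  # gpt-4o-mini (no target defined)
```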
Model catalog (filter: All, OpenAI, Anthropic, Google, OpenRouter)
Model    Provider    Input $/1M tokens    Output $/1M tokens    Downgrades to
Current Plan: Free (1,000 requests/month)
Credit Balance: 0 (pay-as-you-go and overage credits)
Plan Credits Left: 0 (included this period)
Total Saved: $0.00 (lifetime savings)

Subscription Plans

Commit to volume, pay less per request. All plans include AI-powered routing.

Free: $0/month for 1,000 req/month. Basic routing, dashboard.
Growth: $400/month for 500K req/month ($0.0008/req, 20% less than the $0.001 pay-as-you-go rate). Priority support, team access.
Enterprise: custom pricing for 50M+ req/month, from $0.0004/req. 99.9% SLA, account manager. Contact Sales.
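A quick way to compare the plans is monthly cost at a given volume. This sketch assumes pay-as-you-go bills every request at $0.001 (ignoring the 1,000 free requests) and that Growth requests beyond the included 500K are paid from overage credits at the same $0.001 rate; both are simplifying assumptions, and the function names are hypothetical. Amounts are kept in thousandths of a dollar to avoid floating-point drift.

```python
PAYG_RATE_MILLI = 1          # $0.001 per request, in thousandths of a dollar
GROWTH_FEE_MILLI = 400_000   # $400/month
GROWTH_INCLUDED = 500_000    # requests included in the Growth plan


def payg_cost(requests: int) -> float:
    """Monthly pay-as-you-go cost in dollars (free-tier requests ignored)."""
    return requests * PAYG_RATE_MILLI / 1000


def growth_cost(requests: int) -> float:
    """Monthly Growth cost in dollars, assuming overage is billed at the PAYG rate."""
    overage = max(0, requests - GROWTH_INCLUDED)
    return (GROWTH_FEE_MILLI + overage * PAYG_RATE_MILLI) / 1000


for volume in (100_000, 400_000, 500_000, 600_000):
    print(f"{volume:>7,} req: PAYG ${payg_cost(volume):,.2f}  Growth ${growth_cost(volume):,.2f}")
```

Under these assumptions the $400 fee equals 400,000 pay-as-you-go requests, so Growth breaks even at 400K req/month; at 500K it costs $400.00 versus $500.00, the 20% saving listed for the plan.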

Top Up Credits

Pay as you go, or add overage credits for your plan. Credits never expire.

$10: 10,000 credits ($0.001 / request)
$50: 50,000 credits ($0.001 / request)
$500: 500,000 credits ($0.001 / request)

Payment History

No payments yet.

Prompt Templates

Store reusable system prompts server-side. Reference one in a request with the header X-TokenSurf-Template: template_id to inject the prompt automatically. Change routing behavior without redeploying your app.

No templates yet. Create one to store reusable system prompts.
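Conceptually, template injection is a lookup by ID followed by inserting the stored text as a system message. A minimal sketch, assuming the stored prompt is prepended as the first system message and an unknown ID is an error; TEMPLATES, inject_template, and the "support_v2" ID are hypothetical, and TokenSurf's actual placement and error handling may differ.

```python
from typing import Dict, List

# Hypothetical server-side template store: template_id -> system prompt
TEMPLATES: Dict[str, str] = {
    "support_v2": "You are a concise, friendly support agent.",
}


def inject_template(headers: Dict[str, str], messages: List[dict]) -> List[dict]:
    """Prepend the stored system prompt named by X-TokenSurf-Template, if present."""
    # HTTP header names are case-insensitive, so normalize before looking up
    lowered = {k.lower(): v for k, v in headers.items()}
    template_id = lowered.get("x-tokensurf-template")
    if template_id is None:
        return messages  # no template requested: forward unchanged
    if template_id not in TEMPLATES:
        raise KeyError(f"unknown template: {template_id}")
    system = {"role": "system", "content": TEMPLATES[template_id]}
    return [system] + messages


msgs = inject_template({"X-TokenSurf-Template": "support_v2"},
                       [{"role": "user", "content": "Where is my order?"}])
print(msgs[0]["role"])  # system
```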

API Playground

Test any model with your API key. See the response, routing decision, and cost breakdown.


Routing Engine

Control exactly how TokenSurf routes your requests. Changes take effect immediately.

Smart Routing
Master switch. When off, requests always use the exact model you specify — no downgrades, no cost savings.
AI Classifier (Gemini)
Uses Gemini Flash Lite to classify ambiguous queries. Adds ~50ms latency but improves routing accuracy. Requires a Google provider key.
Ambiguous Query Fallback
Choose what happens when the rule-based classifier can't decide and the AI classifier is off.

Provider Routing

Enable or disable routing to each provider without removing your API keys. Disabled providers will reject requests for their models.

OpenAI
gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic
claude-opus, claude-sonnet, claude-haiku families
Google Gemini
gemini-2.5-pro, gemini-2.5-flash, gemini-3.x previews
OpenRouter
Llama, DeepSeek, Mistral, Qwen, Cohere, and 300+ models

Model Downgrade Rules

Fine-tune which models get downgraded and where they route to. Only models with a default downgrade target are shown.

Model    Downgrade    Routes to    Savings

How Classification Works

Understanding the routing pipeline helps you tune it for your workload.

// 1. Rule-based classifier runs first (0ms)
if (hasTools || hasResponseFormat)       → "complex"    // never downgrade
if (matchesComplexPattern(lastMessage))  → "complex"    // code, analyze, etc.
if (messages >= 6 || tokens >= 500)      → "complex"    // long context
if (tokens <= 50 && messages <= 2)       → "simple"     // short question
if (matchesSimplePattern(lastMessage))   → "simple"     // "what is", "define"
else                                     → "ambiguous"  // needs AI or fallback

// 2. If ambiguous + AI classifier enabled → Gemini classifies (~50ms)
// 3. If ambiguous + AI disabled → uses your fallback setting
// 4. If simple + model has downgrade target → route to cheaper model
// 5. Your per-model overrides apply last
Signal                      Classified as   What triggers it
Tools / function calling    Complex         Any request with a tools parameter
Structured output           Complex         Any request with response_format
Code patterns               Complex         "analyze", "implement", "refactor", code blocks
Long conversation           Complex         6+ messages in the conversation
Long message                Complex         500+ estimated tokens in the last user message
Factual question            Simple          "What is", "Define", "Translate", "Calculate"
Very short query            Simple          Under 50 tokens, 1-2 messages
Everything else             Ambiguous       Handled by the AI classifier or your fallback
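The rule-based first pass can be sketched in a few lines of Python. The rule order and thresholds (tools or response_format, 6+ messages, 500+ tokens, at most 50 tokens with 1-2 messages) come from the pseudocode above. Everything else is an assumption: tokens are estimated as characters / 4, both token checks look at the last message, the keyword patterns are illustrative rather than TokenSurf's real lists, and classify_request is a hypothetical name.

```python
import re
from typing import List, Optional

# Illustrative patterns only (assumption): code/analysis keywords or a fenced code block
COMPLEX_PATTERN = re.compile(r"\b(?:analy[sz]\w*|implement\w*|refactor\w*)|`{3}", re.IGNORECASE)
SIMPLE_PATTERN = re.compile(r"^\s*(?:what is|define|translate|calculate)\b", re.IGNORECASE)


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption): about 4 characters per token."""
    return len(text) // 4


def classify_request(messages: List[dict], tools: Optional[list] = None,
                     response_format: Optional[dict] = None) -> str:
    """Return "simple", "complex", or "ambiguous", checking rules in the documented order."""
    last = messages[-1]["content"] if messages else ""
    tokens = estimate_tokens(last)
    if tools or response_format:
        return "complex"      # tool calls / structured output: never downgrade
    if COMPLEX_PATTERN.search(last):
        return "complex"      # code and analysis requests
    if len(messages) >= 6 or tokens >= 500:
        return "complex"      # long context
    if tokens <= 50 and len(messages) <= 2:
        return "simple"       # short question
    if SIMPLE_PATTERN.search(last):
        return "simple"       # factual lookups
    return "ambiguous"        # left to the AI classifier or your fallback
```

Because the complex checks run first, a short message that contains a code block is still classified as complex.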


Content Routing Rules

Override the classifier based on prompt content. Rules are evaluated in order, and the first match wins. Patterns are regular expressions.

No content rules yet. Add rules like "if prompt contains code blocks, never downgrade."
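First-match-wins evaluation over an ordered list of regex rules can be sketched as below. The rule format (a pattern plus an action such as never_downgrade or always_downgrade), the example rules, and apply_content_rules are hypothetical; the dashboard's actual rule fields may differ.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule format: (regex pattern, action), evaluated top to bottom
RULES: List[Tuple[str, str]] = [
    (r"`{3}", "never_downgrade"),                         # code blocks stay on the requested model
    (r"(?i)\b(hi|hello|thanks)\b", "always_downgrade"),   # small talk can go to a cheaper model
]


def apply_content_rules(prompt: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the action of the first rule whose pattern matches, else None."""
    for pattern, action in rules:
        if re.search(pattern, prompt):
            return action  # first match wins; later rules are not checked
    return None  # no rule matched: the classifier's decision stands


# Both rules match this prompt, but the code-block rule comes first
prompt = "Hello! Why does this fail?\n" + "`" * 3 + "\nx = 1/0\n" + "`" * 3
print(apply_content_rules(prompt, RULES))  # never_downgrade
```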

Guardrails

Protect sensitive data before it reaches LLM providers. PII is redacted from messages before they are forwarded.

PII Redaction
Auto-detect and redact emails, SSNs, credit card numbers, phone numbers, and IP addresses from prompts before sending to LLM providers. Redacted items are replaced with markers like [EMAIL REDACTED].
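A minimal regex-based redactor for the five listed categories, using the [EMAIL REDACTED] marker style. The patterns are simplified, US-centric assumptions (SSNs as 123-45-6789, IPv4 only, 13-16 digit card numbers), the other marker names are guesses that follow the same style, and redact_pii is a hypothetical name. Production detection needs broader patterns and validation such as Luhn checks for card numbers.

```python
import re

# Simplified patterns (assumptions), applied in this order so long digit runs are
# claimed by the card pattern before the phone pattern sees them.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
    ("IP ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"(?:\+?1[ .-]?)?\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")),
]


def redact_pii(text: str) -> str:
    """Replace each detected PII item with a marker such as [EMAIL REDACTED]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


sample = ("Contact jane.doe@example.com or (555) 123-4567. "
          "SSN 123-45-6789, card 4111 1111 1111 1111, from 192.168.0.1.")
print(redact_pii(sample))
```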

Custom Classifier Prompt

Override the default AI classifier with your own prompt. Must instruct the model to respond with SIMPLE or COMPLEX. Leave empty to use the default classifier.


Response Cache

Cache identical requests and return instant responses. Repeated queries skip the provider API call entirely, so they cost nothing and return with near-zero latency.

Semantic Cache
When enabled, identical requests (same model + messages + temperature) return a cached response instantly. No provider call, no credit consumed.
Cache TTL
How long cached responses remain valid. Shorter TTLs return fresher data; longer TTLs produce more cache hits.
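An exact-match cache keyed on model, messages, and temperature, with a TTL, can be sketched with the standard library. The key derivation (SHA-256 over canonical JSON), the in-memory dict standing in for whatever store TokenSurf actually uses, and the names cache_key and ResponseCache are assumptions for illustration.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(model: str, messages: list, temperature: float) -> str:
    """Hash the fields that define an 'identical' request into a stable key."""
    canonical = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory TTL cache (illustrative stand-in for a real shared store)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


now = [0.0]
cache = ResponseCache(ttl_seconds=60, clock=lambda: now[0])
key = cache_key("gpt-4o", [{"role": "user", "content": "Hello!"}], 0.7)
cache.put(key, "Hi there!")
print(cache.get(key))  # Hi there!  (hit: no provider call)
now[0] = 61.0
print(cache.get(key))  # None  (expired after the 60 s TTL)
```

Changing any of the three fields, for example the temperature, produces a different key and therefore a cache miss.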


Model Quality Scores

Auto-sampled

5% of API responses are automatically scored for quality using Gemini Flash-Lite. Scores help you verify that downgraded models still meet your quality bar.

Quality scores will appear after your first requests are sampled.

Score Scale

Score   Rating      Meaning
9-10    Excellent   Comprehensive, accurate, well-structured response
7-8     Good        Mostly correct with minor issues
4-6     Fair        Partially correct or vague
1-3     Poor        Incorrect, irrelevant, or harmful

How It Works

For each sampled response, TokenSurf sends the original prompt and the model's response to Gemini Flash-Lite, which scores it on accuracy, completeness, relevance, and helpfulness. Downgraded responses are tracked separately so you can compare quality between your original and routed models.
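The sampling and comparison described above can be sketched as follows. Hashing the request ID to pick a stable 5% sample, the record fields, and the function names are assumptions; the text only says that 5% of responses are sampled and that downgraded responses are tracked separately. The rating bands come from the Score Scale table.

```python
import hashlib
from statistics import mean
from typing import Dict, List

SAMPLE_RATE = 0.05  # 5% of responses are scored


def is_sampled(request_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically pick about `rate` of requests by hashing the request ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < rate


def rating(score: int) -> str:
    """Map a 1-10 score to the bands in the Score Scale table."""
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 4:
        return "Fair"
    return "Poor"


def compare_quality(scored: List[Dict]) -> Dict[str, float]:
    """Average score for downgraded vs. original-model responses."""
    groups: Dict[str, List[int]] = {"downgraded": [], "original": []}
    for row in scored:
        groups["downgraded" if row["downgraded"] else "original"].append(row["score"])
    return {name: mean(scores) for name, scores in groups.items() if scores}


scored = [
    {"score": 9, "downgraded": False},
    {"score": 8, "downgraded": True},
    {"score": 7, "downgraded": True},
]
print(compare_quality(scored))
```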

System Health

Error Rate: 0%
Proxy p95
Cache Hit (Response Cache)
TTFT p95
Tok/sec p50
Redis

Provider Health

Circuit breaker status per provider. When a provider has repeated failures, the circuit opens and requests fail fast or route to fallback providers.
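A per-provider circuit breaker is commonly built as a small state machine: closed while calls succeed, open after a run of consecutive failures, and half-open after a cooldown, when trial requests are let through and the next result closes or reopens it. The sketch below shows that textbook pattern with hypothetical defaults (3 failures, 30-second cooldown); TokenSurf's actual thresholds and fallback logic are not documented here.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"  # let trial requests through
        return "open"

    def allow_request(self) -> bool:
        """False means fail fast (or route to a fallback provider)."""
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # a successful call closes the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown


now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.state)            # open
now[0] = 31.0
print(breaker.allow_request())  # True (half-open trial)
breaker.record_success()
print(breaker.state)            # closed
```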

Response Headers

Every proxy response includes these headers for debugging and observability:

Header                    Description                                   Example
X-TokenSurf-Model         Final model used                              gpt-4o-mini
X-TokenSurf-Downgraded    Was the request routed to a cheaper model?    true
X-TokenSurf-Complexity    Classified complexity level                   simple
X-TokenSurf-Request-Id    Unique request ID for tracing                 a1b2c3d4...
X-TokenSurf-Region        Region that served the request                us-central1
X-TokenSurf-Fallback      Was a fallback provider used?                 true
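For logging or debugging, it can help to collect these headers into one record. A minimal sketch that accepts any mapping of response headers (for example, raw.headers from the OpenAI SDK's with_raw_response); RoutingInfo and parse_routing_headers are hypothetical names, and treating a missing header as None and "true"/"false" strings as booleans are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RoutingInfo:
    model: Optional[str]
    downgraded: Optional[bool]
    complexity: Optional[str]
    request_id: Optional[str]
    region: Optional[str]
    fallback: Optional[bool]


def _as_bool(value: Optional[str]) -> Optional[bool]:
    """Turn "true"/"false" header values into booleans; keep None if absent."""
    return None if value is None else value.strip().lower() == "true"


def parse_routing_headers(headers: Mapping[str, str]) -> RoutingInfo:
    """Collect the X-TokenSurf-* response headers, matching names case-insensitively."""
    h = {k.lower(): v for k, v in headers.items()}
    return RoutingInfo(
        model=h.get("x-tokensurf-model"),
        downgraded=_as_bool(h.get("x-tokensurf-downgraded")),
        complexity=h.get("x-tokensurf-complexity"),
        request_id=h.get("x-tokensurf-request-id"),
        region=h.get("x-tokensurf-region"),
        fallback=_as_bool(h.get("x-tokensurf-fallback")),
    )


info = parse_routing_headers({
    "X-TokenSurf-Model": "gpt-4o-mini",
    "X-TokenSurf-Downgraded": "true",
    "X-TokenSurf-Complexity": "simple",
})
print(info.model, info.downgraded, info.fallback)  # gpt-4o-mini True None
```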

Your Organizations


No organizations yet. Create one to manage team API keys.

Account

Email: ...
User ID: ...
API Key Prefix: ts_...

Webhook

Receive a POST request for every routing decision. Feed these events into Slack, PagerDuty, or your own analytics.

Webhook URL
Leave empty to disable. Must be HTTPS.

Latency & Priority

Control routing based on speed requirements and per-request priority levels.

Max Latency Target
If a provider's p95 latency exceeds this threshold, force downgrade to a faster model. Set to 0 to disable.
Priority Routing
Allow a per-request priority override via the X-TokenSurf-Priority header: high means never downgrade; low means always downgrade.
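One way to combine the priority header, the latency target, and the classifier result is sketched below. The precedence order (priority first, then the latency target, then complexity), the millisecond unit for the latency target, and the function name should_downgrade are assumptions; the settings text only defines what each control does on its own.

```python
from typing import Optional


def should_downgrade(complexity: str, priority: Optional[str] = None,
                     provider_p95_ms: float = 0.0, max_latency_ms: float = 0.0,
                     has_target: bool = True) -> bool:
    """Decide whether to route to the cheaper/faster model (illustrative precedence)."""
    if not has_target:
        return False                  # nothing cheaper to route to
    if priority == "high":
        return False                  # X-TokenSurf-Priority: high -> never downgrade
    if priority == "low":
        return True                   # X-TokenSurf-Priority: low -> always downgrade
    if max_latency_ms > 0 and provider_p95_ms > max_latency_ms:
        return True                   # provider too slow: force a faster model
    return complexity == "simple"     # otherwise follow the classifier


print(should_downgrade("complex", provider_p95_ms=2500, max_latency_ms=2000))  # True
```

A max_latency_ms of 0 disables the latency check, matching the "Set to 0 to disable" behavior of the setting.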

Danger Zone

Regenerate API Key
Your old key will stop working immediately
Delete Account
Permanently delete your account, API keys, usage data, and provider keys. This cannot be undone.
Total Cost Savings Generated: $0 across 0 routed requests
Requests: 0
Active Users: 0
MRR: $0
Avg Savings: 0%
Quality Score: 0
Request Volume
[Chart: monthly requests and cumulative savings]
Cost Impact
Direct API cost: $0
Through TokenSurf: $0
Average cost reduction: 0% ($0 saved)
Quality Score Distribution
Downgraded responses maintain high quality
0% of responses scored 7+
Provider Distribution
Requests routed across providers
Routing Intelligence
AI classifier automatically routes simple requests to cheaper models
Downgraded to cheaper models: 0%
Kept on premium models: 0%
Uptime: 0%