TokenSurf Dashboard

Get started with TokenSurf

Complete these steps to start saving on LLM costs.

2. Add a provider key — OpenAI, Anthropic, Google, or OpenRouter

Add key →

3. Test in the Playground — send your first request and see routing in action

Try it →

4. Check your savings — see cost breakdown by model on the Analytics page

View →

Plan

Free

1,000 req/month

Credits

0

$0.001 per request

Savings

$0.00

This month

Requests

0

0% auto-routed

Cost Saved

$0.00

vs $0.00 direct

Quick Actions

API Playground

Test any model, see routing decisions

Routing Engine

Configure smart routing rules

Response Cache

Cache identical requests for free

Rules & Guardrails

Content rules, PII redaction

Providers

OpenAI: Not set

Anthropic: Not set

Google: Not set

OpenRouter: Not set

Feature Status

Smart Routing On

Response Cache Off

PII Redaction Off

Webhooks Off

Recent Activity

No requests yet. Try the Playground to send your first request.

Total Requests

0

This month

Downgraded

0

0% of requests

Original Cost

$0.00

Without routing

Actual Cost

$0.00

$0.00 saved

Cost Breakdown

Routing intelligence saves you money by sending simple queries to cheaper models.

Metric	Value
Total Requests	0
Prompt Tokens	0
Completion Tokens	0
Original Cost	$0.0000
Actual Cost	$0.0000
Total Savings	$0.0000
Auto-Routed Requests	0
Downgrade Rate	0%

Cost by Model

Breakdown of spend and savings per model this month.

Model breakdown will appear after your first requests.

Cost by Feature Tag

Add an X-TokenSurf-Tag header to your requests (for example, X-TokenSurf-Tag: checkout) to track spend per feature, team, or environment. Cost breakdowns by tag appear here once tagged requests arrive.
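A minimal sketch of tagging a request, assuming the OpenAI Python SDK's per-request `extra_headers` argument is used to attach the header. The helper name `tagged_request` and the prompt are illustrative; only the header name comes from this page.

```python
from typing import Any


def tagged_request(model: str, prompt: str, tag: str) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    `tagged_request` is a hypothetical helper, not part of TokenSurf.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # extra_headers is the OpenAI SDK's hook for per-request headers.
        "extra_headers": {"X-TokenSurf-Tag": tag},
    }


kwargs = tagged_request("gpt-4o", "Summarize this order.", tag="checkout")
print(kwargs["extra_headers"])
```

The resulting dictionary can be passed as `client.chat.completions.create(**kwargs)` with a client configured for the TokenSurf base URL.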

Request Logs

Last 50 API requests with routing decisions, cost, and latency.

No logs yet

Make your first API request to see logs here. Logs appear within 60 seconds.

Response Headers Reference

Every proxy response includes these headers for debugging:

Header	Description	Example
X-TokenSurf-Model	Final model used	gpt-4o-mini
X-TokenSurf-Downgraded	Was the request routed to a cheaper model?	true
X-TokenSurf-Complexity	Classified complexity level	simple

1. Your API Key

Use this as your api_key in the OpenAI SDK.

ts_...

Current key: ts_...

2. Base URL

Point your OpenAI SDK to this base URL:

https://api.tokensurf.io/v1

3. Add a Provider Key

Scroll down to the Provider API Keys section below and add at least one API key (OpenAI, Anthropic, Google, or OpenRouter).

4. Drop-in Replacement

Replace your base URL. That's it. Same SDK, same code, lower costs.

Python

from openai import OpenAI

client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensurf.io/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",  # will auto-route simple queries to gpt-4o-mini
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_...",
  baseURL: "https://api.tokensurf.io/v1"
});

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }]
});
console.log(res.choices[0].message.content);

cURL

curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

5. Check Response Headers

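# Check response headers (parsed SDK responses don't expose them; use with_raw_response)
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(raw.headers["X-TokenSurf-Model"])       # gpt-4o-mini
print(raw.headers["X-TokenSurf-Downgraded"])  # true
print(raw.headers["X-TokenSurf-Complexity"])  # simple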

Connect at least one provider to start routing requests.

6. Provider API Keys

Add your provider keys. They're encrypted with AES-256-GCM at rest. TokenSurf never stores plaintext keys.

OpenAI: Not set

Anthropic: Not set

Google Gemini: Not set

OpenRouter: Not set

How Routing Works

TokenSurf analyzes each request's complexity and routes it to the cheapest compatible model:

If you request	Simple queries route to	You save
gpt-4o	gpt-4o-mini	~90%
claude-sonnet-4-6	claude-haiku-4-5	~85%
gemini-2.5-pro	gemini-2.0-flash	~80%
gpt-4-turbo	gpt-4o-mini	~95%


Current Plan

Free

1,000 requests/month

Credit Balance

0

PAYG + overage credits

Plan Credits Left

0

Included this period

Total Saved

$0.00

Lifetime savings

Subscription Plans

Commit to volume, pay less per request. All plans include AI-powered routing.

Free

1,000 req/month

$0/month

Basic routing, dashboard

Growth

500K req/month

$400/month

$0.0008/req — Save 20%

Priority support, team access

Scale

5M req/month

$3,000/month

$0.0006/req — Save 40%

Dedicated support, custom rules, quality scoring

Enterprise

50M+ req/month

Custom pricing

From $0.0004/req

99.9% SLA, account manager

Contact Sales
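The per-request rates on the plan cards follow from dividing the monthly fee by the included requests, and the "Save" figures compare that rate with the $0.001 pay-as-you-go price. A quick check of that arithmetic, using only figures shown on this page (the function names are illustrative):

```python
PAYG_PRICE = 0.001  # dollars per request, the pay-as-you-go rate

# plan name -> (monthly fee in dollars, included requests per month)
plans = {
    "Growth": (400, 500_000),
    "Scale": (3_000, 5_000_000),
}


def per_request(monthly_fee: float, included_requests: int) -> float:
    """Effective price per request when the full allowance is used."""
    return monthly_fee / included_requests


def discount_vs_payg(price: float) -> int:
    """Percentage saved relative to the pay-as-you-go rate."""
    return round((1 - price / PAYG_PRICE) * 100)


for name, (fee, requests) in plans.items():
    price = per_request(fee, requests)
    print(f"{name}: ${price:.4f}/req, {discount_vs_payg(price)}% below PAYG")
```

Note that the effective rate only holds if the monthly allowance is fully used; at lower volumes the cost per request is higher.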

Top Up Credits

Pay as you go, or add overage credits for your plan. Credits never expire.

$10

10,000 credits

$0.001 / request

$50

50,000 credits

$0.001 / request

$200

200,000 credits

$0.001 / request

$500

500,000 credits

$0.001 / request

Payment History

No payments yet.

Prompt Templates

Store reusable system prompts server-side. Reference in requests with X-TokenSurf-Template: template_id to auto-inject the prompt. Change routing behavior without redeploying your app.

No templates yet. Create one to store reusable system prompts.

API Playground

Test any model with your API key. See the response, routing decision, and cost breakdown.

Model System Prompt (optional) User Message

Response


Routing Engine

Control exactly how TokenSurf routes your requests. Changes take effect immediately.

Smart Routing

Master switch. When off, requests always use the exact model you specify — no downgrades, no cost savings.

AI Classifier (Gemini)

Uses Gemini Flash-Lite to classify ambiguous queries. Adds ~50ms latency but improves routing accuracy. Requires a Google provider key.

Ambiguous Query Fallback

What should happen when the rule-based classifier can't decide and the AI classifier is off?

Provider Routing

Enable or disable routing to each provider without removing your API keys. Disabled providers will reject requests for their models.

OpenAI

gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo

Anthropic

claude-opus, claude-sonnet, claude-haiku families

Google Gemini

gemini-2.5-pro, gemini-2.5-flash, gemini-3.x previews

OpenRouter

Llama, DeepSeek, Mistral, Qwen, Cohere, and 300+ models

Model Downgrade Rules

Fine-tune which models get downgraded and where they route to. Only models with a default downgrade target are shown.

Model	Downgrade	Routes to	Savings

How Classification Works

Understanding the routing pipeline helps you tune it for your workload.

// 1. Rule-based classifier runs first (0ms)
if (hasTools || hasResponseFormat)      → "complex"    // never downgrade
if (matchesComplexPattern(lastMessage)) → "complex"    // code, analyze, etc.
if (messages >= 6 || tokens >= 500)     → "complex"    // long context
if (tokens <= 50 && messages <= 2)      → "simple"     // short question
if (matchesSimplePattern(lastMessage))  → "simple"     // "what is", "define"
else                                    → "ambiguous"  // needs AI or fallback

// 2. If ambiguous + AI classifier enabled → Gemini classifies (~50ms)
// 3. If ambiguous + AI disabled → uses your fallback setting
// 4. If simple + model has downgrade target → route to cheaper model
// 5. Your per-model overrides apply last

Signal	Classified as	What triggers it
Tools / function calling	Complex	Any request with `tools` parameter
Structured output	Complex	Any request with `response_format`
Code patterns	Complex	"analyze", "implement", "refactor", code blocks
Long conversation	Complex	6+ messages in the conversation
Long message	Complex	500+ estimated tokens in last user message
Factual question	Simple	"What is", "Define", "Translate", "Calculate"
Very short query	Simple	Under 50 tokens, 1-2 messages
Everything else	Ambiguous	Handled by AI classifier or your fallback
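The rule order above can be sketched in Python. This is an approximation under stated assumptions, not TokenSurf's implementation: the keyword lists only mirror the examples in the table, the four-characters-per-token estimate is a guess, and `classify`, `route`, and `estimate_tokens` are hypothetical names. The downgrade targets are copied from the "How Routing Works" table.

```python
import re

# Placeholder patterns based on the table's examples; the real sets are not published.
COMPLEX_PATTERN = re.compile(r"\b(analyze|implement|refactor)\b|`{3}", re.IGNORECASE)
SIMPLE_PATTERN = re.compile(r"^\s*(what is|define|translate|calculate)\b", re.IGNORECASE)

# Downgrade targets from the "How Routing Works" table.
DOWNGRADE_TARGETS = {
    "gpt-4o": "gpt-4o-mini",
    "claude-sonnet-4-6": "claude-haiku-4-5",
    "gemini-2.5-pro": "gemini-2.0-flash",
    "gpt-4-turbo": "gpt-4o-mini",
}


def estimate_tokens(text: str) -> int:
    """Assumption: roughly four characters per token."""
    return len(text) // 4


def classify(request: dict) -> str:
    """Return "complex", "simple", or "ambiguous" for a chat request."""
    messages = request.get("messages", [])
    # Assumes message content is a plain string.
    last = messages[-1]["content"] if messages else ""
    tokens = estimate_tokens(last)

    if request.get("tools") or request.get("response_format"):
        return "complex"  # never downgrade tool or structured-output calls
    if COMPLEX_PATTERN.search(last):
        return "complex"
    if len(messages) >= 6 or tokens >= 500:
        return "complex"
    if tokens <= 50 and len(messages) <= 2:
        return "simple"
    if SIMPLE_PATTERN.search(last):
        return "simple"
    return "ambiguous"  # left to the AI classifier or the fallback setting


def route(requested_model: str, complexity: str) -> str:
    """Downgrade only simple requests that have a known target."""
    if complexity == "simple":
        return DOWNGRADE_TARGETS.get(requested_model, requested_model)
    return requested_model


request = {"model": "gpt-4o", "messages": [{"role": "user", "content": "What is HTTP?"}]}
print(route(request["model"], classify(request)))  # gpt-4o-mini
```

Because the complex checks run first, a short message that mentions "refactor" stays on the requested model even though it is under 50 tokens.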

Content Routing Rules

Override the classifier based on prompt content. Rules are evaluated in order — first match wins. Use regex patterns.

No content rules yet. Add rules like "if prompt contains code blocks, never downgrade."
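A minimal sketch of first-match-wins evaluation, assuming each rule pairs a regex with an action. The `ContentRule` type and the action names `never_downgrade` and `always_downgrade` are hypothetical; the dashboard's actual rule schema is not shown on this page.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentRule:
    pattern: str  # regex searched for in the prompt
    action: str   # hypothetical action name


def first_matching_action(rules: list, prompt: str) -> Optional[str]:
    """Return the action of the first rule whose pattern matches, else None."""
    for rule in rules:
        if re.search(rule.pattern, prompt):
            return rule.action
    return None


rules = [
    ContentRule(r"`{3}", "never_downgrade"),          # prompt contains a code block
    ContentRule(r"(?i)\bfaq\b", "always_downgrade"),  # prompt mentions "FAQ"
]

fence = "`" * 3
print(first_matching_action(rules, f"FAQ entry:\n{fence}py\nprint(1)\n{fence}"))
```

The example prompt matches both rules, and the code-block rule wins because it is listed first, so ordering rules from most to least restrictive is the safer default.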

Guardrails

Protect sensitive data before it leaves your system. PII is redacted from messages before forwarding to providers.

PII Redaction

Auto-detect and redact emails, SSNs, credit card numbers, phone numbers, and IP addresses from prompts before sending to LLM providers. Redacted items are replaced with markers like [EMAIL REDACTED].
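The redaction step can be illustrated with a few regular expressions. This is a simplified sketch, not TokenSurf's detector: the patterns are deliberately loose, and every marker other than [EMAIL REDACTED] is an assumed label modeled on that one.

```python
import re

# Order matters: more specific patterns run before looser ones.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each detected item with a marker such as [EMAIL REDACTED]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


print(redact("Contact jane@example.com"))  # Contact [EMAIL REDACTED]
```

Regex-only detection misses many real-world formats (international phone numbers, IPv6 addresses), so a sketch like this is a starting point for local testing rather than a substitute for the built-in guardrail.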

Custom Classifier Prompt

Override the default AI classifier with your own prompt. Must instruct the model to respond with SIMPLE or COMPLEX. Leave empty to use the default classifier.

Response Cache

Cache identical requests and return instant responses. Repeated queries skip the provider API call entirely, so they incur no provider cost and return with minimal latency.

Semantic Cache

When enabled, identical requests (same model + messages + temperature) return a cached response instantly. No provider call, no credit consumed.

Cache TTL

How long cached responses are valid. Shorter = fresher data. Longer = more cache hits.
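One way to picture the cache is a key derived from the three fields named above (model, messages, temperature) plus an expiry time per entry. The sketch below is an in-memory illustration under those assumptions; `cache_key` and `ResponseCache` are hypothetical names, and the real service may hash or store entries differently.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional


def cache_key(model: str, messages: list, temperature: float) -> str:
    """Hash the fields that must match for two requests to count as identical."""
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry time, response)

    def put(self, key: str, response: Any) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, response)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return response


messages = [{"role": "user", "content": "Hello!"}]
cache = ResponseCache(ttl_seconds=300)
key = cache_key("gpt-4o", messages, 0.0)
cache.put(key, "Hi there!")
print(cache.get(key))  # Hi there!
```

Since the key covers the full message list, any change to the prompt or temperature produces a different key and therefore a cache miss.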

Model Quality Scores

Auto-sampled

5% of API responses are automatically scored for quality using Gemini Flash-Lite. Scores help you verify that downgraded models still meet your quality bar.

Quality scores will appear after your first requests are sampled.

Score Scale

Score	Rating	Meaning
9-10	Excellent	Comprehensive, accurate, well-structured response
7-8	Good	Mostly correct with minor issues
4-6	Fair	Partially correct or vague
1-3	Poor	Incorrect, irrelevant, or harmful

How It Works

For each sampled response, TokenSurf sends the original prompt and the model's response to Gemini Flash-Lite, which scores it on accuracy, completeness, relevance, and helpfulness. Downgraded responses are tracked separately so you can compare quality between your original and routed models.

System Health

Error Rate

0%

Proxy p95

—

Cache Hit

—

Response Cache

—

TTFT p95

—

Tok/sec p50

—

Redis

—

Provider Health

Circuit breaker status per provider. When a provider has repeated failures, the circuit opens and requests fail fast or route to fallback providers.
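For readers unfamiliar with the pattern, here is a generic circuit-breaker sketch. The failure threshold, reset interval, and class name are assumptions chosen for illustration; TokenSurf's actual thresholds and state machine are not documented on this page.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial request after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: fail fast until the cooldown passes, then allow a trial request.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.allow_request())  # False: the circuit is open
```

A caller would check `allow_request()` before contacting a provider and route to a fallback provider when it returns False.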

Response Headers

Every proxy response includes these headers for debugging and observability:

Header	Description	Example
X-TokenSurf-Model	Final model used	gpt-4o-mini
X-TokenSurf-Downgraded	Was the request routed to a cheaper model?	true
X-TokenSurf-Complexity	Classified complexity level	simple
X-TokenSurf-Request-Id	Unique request ID for tracing	a1b2c3d4...
X-TokenSurf-Region	Region that served the request	us-central1
X-TokenSurf-Fallback	Was a fallback provider used?	true
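When logging or tracing, it can help to collect these headers into one typed record. A small sketch, assuming the boolean headers carry the literal string "true" as in the examples; `RoutingInfo` and `parse_routing_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RoutingInfo:
    model: Optional[str]
    downgraded: bool
    complexity: Optional[str]
    request_id: Optional[str]
    region: Optional[str]
    fallback: bool


def parse_routing_headers(headers: Mapping[str, str]) -> RoutingInfo:
    """Read the X-TokenSurf-* headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def flag(name: str) -> bool:
        return lowered.get(name, "").strip().lower() == "true"

    return RoutingInfo(
        model=lowered.get("x-tokensurf-model"),
        downgraded=flag("x-tokensurf-downgraded"),
        complexity=lowered.get("x-tokensurf-complexity"),
        request_id=lowered.get("x-tokensurf-request-id"),
        region=lowered.get("x-tokensurf-region"),
        fallback=flag("x-tokensurf-fallback"),
    )


info = parse_routing_headers({
    "X-TokenSurf-Model": "gpt-4o-mini",
    "X-TokenSurf-Downgraded": "true",
    "X-TokenSurf-Complexity": "simple",
})
print(info.model, info.downgraded, info.fallback)
```

Missing headers become None or False rather than raising, so the parser works with responses that only include a subset of the headers.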

Your Organizations

No organizations yet. Create one to manage team API keys.

Account

Email

...

User ID

...

API Key Prefix

ts_...

Webhook

Receive a POST request for every routing decision. Feed into Slack, PagerDuty, or your own analytics.

Webhook URL

Leave empty to disable. Must be HTTPS.
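A minimal receiver sketch using Python's standard library. The webhook payload schema is not documented on this page, so the code only assumes the body is a JSON object and logs it as-is. The server below speaks plain HTTP and would need to sit behind a TLS-terminating proxy to satisfy the HTTPS requirement; the port and function names are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_event(body: bytes) -> dict:
    """Decode a webhook body, assuming it is a JSON object."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    return event


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_event(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON, bad encoding, and non-object payloads.
            self.send_response(400)
            self.end_headers()
            return
        print("routing decision:", event)  # forward to Slack, metrics, etc.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (call this explicitly to start it)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Responding quickly with a 2xx status and doing any slow processing afterwards keeps the sender from timing out.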

Latency & Priority

Control routing based on speed requirements and per-request priority levels.

Max Latency Target

If a provider's p95 latency exceeds this threshold, force downgrade to a faster model. Set to 0 to disable.

Priority Routing

Allow per-request priority override via X-TokenSurf-Priority header. high = never downgrade. low = always downgrade.
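The interaction between the priority header and the latency target can be sketched as a single decision function. The precedence shown here (priority first, then latency, then the classifier result) is an assumption, since this page does not state which setting wins; `should_downgrade` and its parameter names are hypothetical.

```python
from typing import Optional


def should_downgrade(
    complexity: str,
    priority: Optional[str] = None,
    provider_p95_ms: Optional[float] = None,
    max_latency_ms: float = 0,
) -> bool:
    """Decide whether to route to the cheaper model.

    priority is the X-TokenSurf-Priority value, if the header was sent.
    max_latency_ms of 0 disables the latency rule.
    """
    if priority == "high":
        return False  # never downgrade
    if priority == "low":
        return True   # always downgrade
    if (
        max_latency_ms > 0
        and provider_p95_ms is not None
        and provider_p95_ms > max_latency_ms
    ):
        return True   # provider is slower than the target: force a downgrade
    # Otherwise follow the classifier; ambiguous requests are kept here for simplicity.
    return complexity == "simple"


print(should_downgrade("simple", priority="high"))  # False
```

On the client side, the header can be attached per request in the same way as any other custom header, for example through the OpenAI SDK's `extra_headers` argument.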

Danger Zone

Regenerate API Key

Your old key will stop working immediately

Delete Account

Permanently delete your account, API keys, usage data, and provider keys. This cannot be undone.

Total Cost Savings Generated

$0

Across 0 routed requests

0

Requests

0

Active Users

$0

MRR

0%

Avg Savings

0

Quality Score

Request Volume

Monthly requests Cumulative savings

Cost Impact

Direct API cost $0

$0

Through TokenSurf $0

$0

0%

Average cost reduction

$0 saved

Quality Score Distribution

Downgraded responses maintain high quality

0% of responses scored 7+

Provider Distribution

Requests routed across providers

Routing Intelligence

AI classifier automatically routes simple requests to cheaper models

0%

Downgraded to cheaper models

0%

Kept on premium models

0%

Uptime
