Quick Actions
Providers
Feature Status
Recent Activity
No requests yet. Try the Playground to send your first request.
Cost Breakdown
Routing intelligence saves you money by sending simple queries to cheaper models.
| Metric | Value |
|---|---|
| Total Requests | 0 |
| Prompt Tokens | 0 |
| Completion Tokens | 0 |
| Original Cost | $0.0000 |
| Actual Cost | $0.0000 |
| Total Savings | $0.0000 |
| Auto-Routed Requests | 0 |
| Downgrade Rate | 0% |
Cost by Model
Breakdown of spend and savings per model this month.
Model breakdown will appear after your first requests.
Cost by Feature Tag
Tag requests with X-TokenSurf-Tag: checkout to track spend per feature, team, or environment.
Add X-TokenSurf-Tag headers to your requests to see cost breakdowns by feature.
Request Logs
Last 50 API requests with routing decisions, cost, and latency.
No logs yet
Make your first API request to see logs here. Logs appear within 60 seconds.
Response Headers Reference
Every proxy response includes these headers for debugging:
| Header | Description | Example |
|---|---|---|
| X-TokenSurf-Model | Final model used | gpt-4o-mini |
| X-TokenSurf-Downgraded | Was the request routed to a cheaper model? | true |
| X-TokenSurf-Complexity | Classified complexity level | simple |
1. Your API Key
Use this as your api_key in the OpenAI SDK.
ts_...2. Base URL
Point your OpenAI SDK to this base URL:
3. Add a Provider Key
Scroll down to the Provider API Keys section below and add at least one API key (OpenAI, Anthropic, Google, or OpenRouter).
4. Drop-in Replacement
Replace your base URL. That's it. Same SDK, same code, lower costs.
5. Check Response Headers
6. Provider API Keys
Add your provider keys. They're encrypted with AES-256-GCM at rest. TokenSurf never stores plaintext keys.
How Routing Works
TokenSurf analyzes each request's complexity and routes it to the cheapest compatible model:
| If you request | Simple queries route to | You save |
|---|---|---|
| gpt-4o | gpt-4o-mini | ~90% |
| claude-sonnet-4-6 | claude-haiku-4-5 | ~85% |
| gemini-2.5-pro | gemini-2.0-flash | ~80% |
| gpt-4-turbo | gpt-4o-mini | ~95% |
Subscription Plans
Commit to volume, pay less per request. All plans include AI-powered routing.
Top Up Credits
Pay as you go, or add overage credits for your plan. Credits never expire.
Payment History
No payments yet.
Prompt Templates
Store reusable system prompts server-side. Reference in requests with X-TokenSurf-Template: template_id to auto-inject the prompt. Change routing behavior without redeploying your app.
No templates yet. Create one to store reusable system prompts.
API Playground
Test any model with your API key. See the response, routing decision, and cost breakdown.
Routing Engine
Control exactly how TokenSurf routes your requests. Changes take effect immediately. View model catalog
Provider Routing
Enable or disable routing to each provider without removing your API keys. Disabled providers will reject requests for their models.
Model Downgrade Rules
Fine-tune which models get downgraded and where they route to. Only models with a default downgrade target are shown.
| Model | Downgrade | Routes to | Savings |
|---|
How Classification Works
Understanding the routing pipeline helps you tune it for your workload.
| Signal | Classified as | What triggers it |
|---|---|---|
| Tools / function calling | Complex | Any request with tools parameter |
| Structured output | Complex | Any request with response_format |
| Code patterns | Complex | "analyze", "implement", "refactor", code blocks |
| Long conversation | Complex | 6+ messages in the conversation |
| Long message | Complex | 500+ estimated tokens in last user message |
| Factual question | Simple | "What is", "Define", "Translate", "Calculate" |
| Very short query | Simple | Under 50 tokens, 1-2 messages |
| Everything else | Ambiguous | Handled by AI classifier or your fallback |
Actions
Content Routing Rules
Override the classifier based on prompt content. Rules are evaluated in order — first match wins. Use regex patterns.
No content rules yet. Add rules like "if prompt contains code blocks, never downgrade."
Guardrails
Protect sensitive data before it leaves your system. PII is redacted from messages before forwarding to providers.
Custom Classifier Prompt
Override the default AI classifier with your own prompt. Must instruct the model to respond with SIMPLE or COMPLEX. Leave empty to use the default classifier.
Actions
Response Cache
Cache identical requests and return instant responses. Eliminates provider API calls entirely for repeated queries — zero cost, zero latency.
Actions
Model Quality Scores
Auto-sampled5% of API responses are automatically scored for quality using Gemini Flash-Lite. Scores help you verify that downgraded models still meet your quality bar.
Quality scores will appear after your first requests are sampled.
Score Scale
| Score | Rating | Meaning |
|---|---|---|
| 9-10 | Excellent | Comprehensive, accurate, well-structured response |
| 7-8 | Good | Mostly correct with minor issues |
| 4-6 | Fair | Partially correct or vague |
| 1-3 | Poor | Incorrect, irrelevant, or harmful |
How It Works
For each sampled response, TokenSurf sends the original prompt and the model's response to Gemini Flash-Lite, which scores it on accuracy, completeness, relevance, and helpfulness. Downgraded responses are tracked separately so you can compare quality between your original and routed models.
System Health
Provider Health
Circuit breaker status per provider. When a provider has repeated failures, the circuit opens and requests fail fast or route to fallback providers.
Response Headers
Every proxy response includes these headers for debugging and observability:
| Header | Description | Example |
|---|---|---|
| X-TokenSurf-Model | Final model used | gpt-4o-mini |
| X-TokenSurf-Downgraded | Was the request routed to a cheaper model? | true |
| X-TokenSurf-Complexity | Classified complexity level | simple |
| X-TokenSurf-Request-Id | Unique request ID for tracing | a1b2c3d4... |
| X-TokenSurf-Region | Region that served the request | us-central1 |
| X-TokenSurf-Fallback | Was a fallback provider used? | true |
Your Organizations
No organizations yet
No organizations yet. Create one to manage team API keys.
Account
Webhook
Receive a POST request for every routing decision. Feed into Slack, PagerDuty, or your own analytics.
Latency & Priority
Control routing based on speed requirements and per-request priority levels.
X-TokenSurf-Priority header. high = never downgrade. low = always downgrade.