Provider Status
Recent Activity
No requests yet. Make your first API call to see activity here.
Cost Breakdown
Routing intelligence saves you money by sending simple queries to cheaper models.
| Metric | Value |
|---|---|
| Total Requests | 0 |
| Prompt Tokens | 0 |
| Completion Tokens | 0 |
| Original Cost | $0.0000 |
| Actual Cost | $0.0000 |
| Total Savings | $0.0000 |
| Auto-Routed Requests | 0 |
| Downgrade Rate | 0% |
Request Logs
Recent API requests with routing decisions. Logs are available via response headers on each request.
No logs yet
Make your first API request to see logs here. Each response includes X-TokenSurf-Model and X-TokenSurf-Downgraded headers.
Response Headers Reference
Every proxy response includes these headers for debugging:
| Header | Description | Example |
|---|---|---|
| X-TokenSurf-Model | Final model used | gpt-4o-mini |
| X-TokenSurf-Downgraded | Was the request routed to a cheaper model? | true |
| X-TokenSurf-Complexity | Classified complexity level | simple |
1. Your API Key
Use this as your api_key in the OpenAI SDK.
ts_...2. Base URL
Point your OpenAI SDK to this base URL:
3. Add a Provider Key
Go to Providers and add at least one API key (OpenAI, Anthropic, Google, or OpenRouter).
4. Drop-in Replacement
Replace your base URL. That's it. Same SDK, same code, lower costs.
5. Check Response Headers
Provider API Keys
Add your provider keys. They're encrypted with AES-256-GCM at rest. TokenSurf never stores plaintext keys.
How Routing Works
TokenSurf analyzes each request's complexity and routes it to the cheapest compatible model:
| If you request | Simple queries route to | You save |
|---|---|---|
| gpt-4o | gpt-4o-mini | ~90% |
| claude-sonnet-4-6 | claude-haiku-4-5 | ~85% |
| gemini-2.5-pro | gemini-2.0-flash | ~80% |
| gpt-4-turbo | gpt-4o-mini | ~95% |
Top Up Credits
Pay as you go. Credits never expire. Powered by Stripe.
Pricing
Simple, transparent pricing. You only pay for API requests, not for tokens.
| Item | Price | Notes |
|---|---|---|
| API Request | 1 credit | Regardless of tokens or model |
| Credit Cost | $0.001 | $1 = 1,000 requests |
| Free Tier | 1,000 credits | On signup, no card required |
| Provider Costs | Your keys | You pay providers directly via your own keys |
Routing Engine
Control exactly how TokenSurf routes your requests. Changes take effect immediately. View model catalog
Provider Routing
Enable or disable routing to each provider without removing your API keys. Disabled providers will reject requests for their models.
Model Downgrade Rules
Fine-tune which models get downgraded and where they route to. Only models with a default downgrade target are shown.
| Model | Downgrade | Routes to | Savings |
|---|
How Classification Works
Understanding the routing pipeline helps you tune it for your workload.
| Signal | Classified as | What triggers it |
|---|---|---|
| Tools / function calling | Complex | Any request with tools parameter |
| Structured output | Complex | Any request with response_format |
| Code patterns | Complex | "analyze", "implement", "refactor", code blocks |
| Long conversation | Complex | 6+ messages in the conversation |
| Long message | Complex | 500+ estimated tokens in last user message |
| Factual question | Simple | "What is", "Define", "Translate", "Calculate" |
| Very short query | Simple | Under 50 tokens, 1-2 messages |
| Everything else | Ambiguous | Handled by AI classifier or your fallback |
Actions
Model Quality Scores
Auto-sampled5% of API responses are automatically scored for quality using Gemini Flash-Lite. Scores help you verify that downgraded models still meet your quality bar.
Quality scores will appear after your first requests are sampled.
Score Scale
| Score | Rating | Meaning |
|---|---|---|
| 9-10 | Excellent | Comprehensive, accurate, well-structured response |
| 7-8 | Good | Mostly correct with minor issues |
| 4-6 | Fair | Partially correct or vague |
| 1-3 | Poor | Incorrect, irrelevant, or harmful |
How It Works
For each sampled response, TokenSurf sends the original prompt and the model's response to Gemini Flash-Lite, which scores it on accuracy, completeness, relevance, and helpfulness. Downgraded responses are tracked separately so you can compare quality between your original and routed models.
System Health
Provider Health
Circuit breaker status per provider. When a provider has repeated failures, the circuit opens and requests fail fast or route to fallback providers.
Response Headers
Every proxy response includes these headers for debugging and observability:
| Header | Description | Example |
|---|---|---|
| X-TokenSurf-Model | Final model used | gpt-4o-mini |
| X-TokenSurf-Downgraded | Was the request routed to a cheaper model? | true |
| X-TokenSurf-Complexity | Classified complexity level | simple |
| X-TokenSurf-Request-Id | Unique request ID for tracing | a1b2c3d4... |
| X-TokenSurf-Region | Region that served the request | us-central1 |
| X-TokenSurf-Fallback | Was a fallback provider used? | true |
Account
Routing
Full routing controls are in the Routing Engine page.