OpenAI, Anthropic, Google Gemini, and OpenRouter with 300+ models. A unified OpenAI-compatible API format across all providers.
Live
Fallback Chains
If a provider is down, automatically retry with an equivalent model on another provider. Cross-provider model mapping (e.g. gpt-4o to claude-sonnet-4-6).
Live
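A cross-provider fallback chain can be sketched as a mapping from a requested model to equivalents elsewhere, tried in order. This is an illustrative sketch only: the model names in the table and the `call_provider` stub are assumptions, not the product's actual routing code.

```python
# Illustrative equivalence table; real mappings and model names may differ.
EQUIVALENTS = {
    "gpt-4o": ["claude-sonnet-4-6", "gemini-2.0-flash"],
}

def complete_with_fallback(model, prompt, call_provider):
    """Try the requested model first, then mapped equivalents on other
    providers, returning the first successful response."""
    errors = []
    for candidate in [model, *EQUIVALENTS.get(model, [])]:
        try:
            return call_provider(candidate, prompt)
        except ConnectionError as exc:  # provider down or unreachable
            errors.append((candidate, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

The chain only advances on transport-level failures, so a valid-but-unwanted response from the first provider is never silently swapped for another model's answer.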
Retry with Exponential Backoff
Auto-retry on HTTP 429/500/502/503 with jittered exponential backoff. Respects Retry-After headers. Up to 2 retries before failing.
Live
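The retry policy above can be sketched as follows. This is a minimal illustration, not the proxy's implementation: `send()` and its `(status, body, retry_after)` return shape are assumptions made for the example.

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}
MAX_RETRIES = 2  # matches the "up to 2 retries" policy above

def retry_request(send, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body, retry_after_seconds); retry retryable
    statuses with exponential backoff plus jitter, honoring a Retry-After
    hint from the provider when one is given."""
    for attempt in range(MAX_RETRIES + 1):
        status, body, retry_after = send()
        if status not in RETRYABLE or attempt == MAX_RETRIES:
            return status, body
        if retry_after is not None:
            delay = retry_after  # provider's hint wins
        else:
            # Exponential backoff with multiplicative jitter
            delay = base_delay * (2 ** attempt) * (1 + random.random())
        sleep(delay)
```

Jitter spreads simultaneous retries apart so a fleet of clients doesn't hammer a recovering provider in lockstep.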
Rate Limiting & Abuse Detection
Token bucket rate limiter (60 req/min per key). Automatic abuse detection with throttle and block escalation on anomalous patterns.
Live
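A 60 req/min per-key limit maps naturally onto the classic token bucket: capacity 60, refilling one token per second. The sketch below shows the algorithm in generic form; it is not the proxy's code, and the injected clock keeps it testable.

```python
class TokenBucket:
    """Classic token-bucket limiter. 60 req/min per key corresponds to
    capacity=60 and refill_per_sec=1.0."""

    def __init__(self, capacity=60, refill_per_sec=1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.last = 0.0                # timestamp of last refill

    def allow(self, now):
        """Return True and consume a token if the request is within limit."""
        elapsed = now - self.last
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Bursts up to the bucket's capacity pass immediately; sustained traffic is held to the refill rate, which is what makes the scheme forgiving of spiky but well-behaved clients.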
Model Quality Scoring
5% of responses auto-scored by Gemini Flash-Lite on a 1-10 scale. Per-model quality aggregation, with downgraded-vs-original comparison in the dashboard.
Live
Teams & Organizations
Create orgs, invite members with roles (owner/admin/member). Labeled API keys with per-key budgets, rate limits, and model allowlists.
Live
System Health & Observability
Real-time health endpoint, circuit breaker per provider, latency percentiles (p50/p95/p99), cache hit rates, structured audit logging.
Live
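The per-provider circuit breaker mentioned above follows a standard pattern: open after consecutive failures, then allow a probe after a cooldown. The sketch below is generic; the thresholds and state model are assumptions, not the product's configuration.

```python
class CircuitBreaker:
    """Per-provider breaker: opens after N consecutive failures, permits a
    half-open probe once the cooldown elapses, and closes on success."""

    def __init__(self, failure_threshold=5, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self, now):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: allow a probe request after the cooldown
        return now - self.opened_at >= self.cooldown

    def record(self, success, now):
        if success:
            self.failures, self.opened_at = 0, None  # close the circuit
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip open
```

While a provider's circuit is open, requests can be sent straight down the fallback chain instead of waiting out a timeout on a host that is known to be failing.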
Multi-Region Deployment
Proxy deployed in US, EU, and Asia (us-central1, europe-west1, asia-northeast1). Redis caching, connection pooling, API key rotation with 24h grace period.
Up Next (In development)
Request Logs & Search
Full searchable log of every request in-dashboard: model used, complexity score, latency, tokens, cost, and routing decision.
Content-Based Routing Rules
Custom regex and pattern rules: "if prompt contains code block, never downgrade" or "if system prompt says JSON, keep complex".
Cost Budget Caps
Set daily or monthly hard ceilings on provider spend. Auto-reject requests once the threshold is hit, catching runaway loops before they drain your wallet.
Planned (On deck)
Streaming Metrics (TTFT, tok/s)
Track time-to-first-token and throughput per provider per model. Know which is actually fastest for your workload, not just cheapest.
API Playground
Test any model from the dashboard. Send a prompt, see the response, routing decision, and cost breakdown side-by-side.
Usage Alerts & Anomaly Detection
Alert when spend exceeds a per-hour dollar threshold, error rate spikes above 5%, or latency degrades. Catch broken integrations and runaway loops.
Webhooks on Routing Decisions
POST to your endpoint whenever a downgrade happens. Feed into your own analytics, Slack, PagerDuty, or custom dashboards.
Exploring (Research & design)
Semantic Caching
Cache identical or embedding-similar prompts. Instant responses, zero provider cost for repeated queries.
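Embedding-similar lookup can be sketched as cosine similarity between the incoming prompt's embedding and stored ones. This is a design sketch for a feature still in research: `embed()` is a stand-in for a real embedding call, and the 0.95 threshold is an illustrative assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Return a cached response when a stored prompt's embedding is close
    enough to the incoming one; linear scan keeps the sketch simple."""

    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold = embed, threshold
        self.entries = []  # list of (embedding, response)

    def get(self, prompt):
        vec = self.embed(prompt)
        for stored_vec, response in self.entries:
            if cosine(vec, stored_vec) >= self.threshold:
                return response  # hit: zero provider cost
        return None  # miss: caller forwards to a provider

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))
```

A production version would replace the linear scan with an approximate nearest-neighbor index, but the hit/miss semantics stay the same.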
PII Redaction & Content Filtering
Auto-strip SSNs, emails, credit cards before forwarding. Compliance without changing your application code.
Prompt Template Library
Store and version reusable system prompts server-side. Change routing behavior without redeploying your app.
Custom Classifier Rules
Upload your own classification prompt or pattern rules. Inject domain-specific knowledge about what's "simple" for your use case.
A/B Testing & Shadow Mode
Send the same request to two models, compare quality, and return only one. Validate routing decisions with real production data.
Context Window Management
Auto-trim conversation history to fit model context limits. Prevent 400 errors on long conversations without changing your app.
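Auto-trimming can be sketched as dropping the oldest non-system turns until the history fits. A sketch only: `count_tokens` stands in for a real tokenizer, and keeping the system prompt pinned is an assumed design choice.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the conversation fits in
    max_tokens. System messages are always kept."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    def total():
        return sum(count_tokens(m["content"]) for m in system + rest)
    while rest and total() > max_tokens:
        rest.pop(0)  # evict the oldest turn first
    return system + rest
```

Because trimming happens at the proxy, the application keeps appending turns naively and still avoids 400-level context-length errors.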
Latency-Based Routing
Route based on latency SLAs, not just cost. "I need responses under 2s — pick the fastest model that meets quality."
Priority Queues
Tag requests as high/low priority. Critical user-facing calls get fast models instantly, background batch jobs wait for cheap ones.
Want to shape what we build?
Your usage data and feedback directly influence our priorities. The features you need most get built first.