One line of code. No quality loss. No architectural changes. Just smarter routing.
Wolfcast.ai is a prediction market platform where users place forecasts and compete on accuracy. To help users navigate complex financial data, Wolfcast built Luna — an AI assistant powered by Google Gemini that answers everything from basic questions to deep analytical queries about market movements.
Luna handles thousands of queries daily across a spectrum of complexity: simple factual questions, platform how-tos, and sophisticated multi-turn conversations involving market context, trend analysis, and prediction comparisons.
Wolfcast routed all Luna queries to Google's premium Gemini models. Great for quality. Terrible for the budget. As user counts grew, AI became the largest operational expense — and most queries didn't need a premium model.
"What is a prediction market?" doesn't need gemini-2.5-pro. But it was being routed there anyway, at $1.25/MTok input.
At 200 msgs/day/user, costs grew linearly. No easy way to separate simple from complex without building a custom classifier.
Wolfcast evaluated building their own query classifier. Estimated timeline: 2–3 months of engineering, plus ongoing maintenance.
Luna's value is accurate, helpful answers. Any cost optimization had to preserve response quality for complex analytical queries.
TokenSurf sits between Wolfcast and Google Gemini. It classifies each query's complexity in real-time and routes simple questions to cheaper models — while passing complex analytical queries straight through to the premium model, untouched.
const response = await fetch( 'https://generativelanguage .googleapis.com/v1beta/ models/gemini-2.5-pro :generateContent', { method: 'POST', body: JSON.stringify(query), headers: { 'Authorization': apiKey } } );
const response = await fetch( 'https://api.tokensurf.io /v1/chat/completions', { method: 'POST', body: JSON.stringify(query), headers: { 'Authorization': tsKey } } );
Three steps: create a TokenSurf account, store the key with defineSecret('TOKENSURF_API_KEY'), change the URL.
| Model Downgrade | Cost Before | Cost After | Savings |
|---|---|---|---|
| gemini-2.5-pro → gemini-2.5-flash | $1.25 / $10.00 | $0.30 / $2.50 | 76% |
| gemini-3.1-pro → gemini-2.5-flash | $2.00 / $12.00 | $0.30 / $2.50 | 84% |
TokenSurf's classification engine runs in real-time on every request. Here's how it handles typical Luna traffic:
Short factual question. Matches simple patterns. Under 50 tokens.
Analytical request with market context. Matches complex patterns. 10K+ token system prompt.
Sign up in 30 seconds. Get 1,000 free credits. No credit card required.