What are tokens?
Tokens are the unit of measurement for AI model usage. Every conversation your agent has consumes tokens:

- Input tokens — generated from the caller’s speech (converted to text), plus your agent’s instructions, knowledge base context, and tool responses
- Output tokens — the agent’s spoken responses
How tokens are counted
A typical call’s token usage depends on:

| Factor | Impact on tokens |
|---|---|
| Call duration | Longer calls = more tokens |
| Instruction length | Longer instructions = more input tokens per turn |
| Knowledge base context | More retrieved chunks = more input tokens |
| Tool usage | Tool responses add input tokens |
| Response verbosity | Longer agent responses = more output tokens |
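As a rough illustration of how these factors combine, here is a back-of-the-envelope estimate in Python. The function and all of its numbers are hypothetical, not the platform’s actual accounting; the key idea is that instructions and retrieved context are re-sent as input tokens on every turn, so per-turn costs multiply with call duration.

```python
def estimate_call_tokens(turns, instruction_tokens, kb_tokens_per_turn,
                         tool_tokens_per_turn, avg_user_tokens, avg_reply_tokens):
    """Rough per-call estimate: instructions, knowledge base chunks, and
    tool responses count as input tokens on every turn; the agent's
    replies count as output tokens."""
    input_tokens = turns * (instruction_tokens + kb_tokens_per_turn
                            + tool_tokens_per_turn + avg_user_tokens)
    output_tokens = turns * avg_reply_tokens
    return input_tokens, output_tokens

# A 10-turn call with 500-token instructions, 200 tokens of retrieved
# context and 50 tokens of tool output per turn:
inp, out = estimate_call_tokens(10, 500, 200, 50, 30, 80)
# inp = 10 * (500 + 200 + 50 + 30) = 7800; out = 10 * 80 = 800
```

Note how the 500-token instruction block alone accounts for 5,000 of the 7,800 input tokens here, which is why shortening instructions (covered below) is usually the highest-leverage optimization.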
Viewing usage
Check your token usage in Settings → Tokens:

- Current usage — tokens consumed this billing period
- Allowance — your plan’s monthly token limit
- Usage percentage — visual indicator of how much you’ve used
Token top-ups
If you exceed your monthly token allowance, you can purchase top-ups. Top-up tokens don’t expire at the end of the month; they’re consumed after your monthly allowance is used up.
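The consumption order can be sketched as follows (a hypothetical helper, not platform code): usage draws down the monthly allowance first, and only touches the rolled-over top-up balance once the allowance is exhausted.

```python
def consume(tokens_used, allowance, topup_balance):
    """Split a period's usage between the monthly allowance (drawn first)
    and the top-up balance (drawn only after the allowance runs out)."""
    from_allowance = min(tokens_used, allowance)
    from_topup = min(tokens_used - from_allowance, topup_balance)
    return from_allowance, from_topup

# 1.2M tokens used against a 1M monthly allowance and 500k of top-ups:
consume(1_200_000, 1_000_000, 500_000)  # → (1000000, 200000)
```

So in this example, 300,000 top-up tokens would remain and carry over into the next billing period.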
Reducing token consumption
Shorten your instructions
Remove redundant or overly detailed instructions. The AI model is good at inferring behavior from concise guidelines.
Optimize knowledge base
Remove outdated or irrelevant documents. Fewer, higher-quality chunks mean less context per query.
Use a smaller model
If your use case is straightforward, consider switching to gpt-4o-mini-realtime, which uses fewer tokens per response.
Limit response length
Set the max output tokens in your agent’s model settings to cap response length.
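For illustration, a capped configuration might look like the sketch below. The field names here are assumptions for the example, not the platform’s documented schema; check your agent’s model settings for the exact keys.

```python
# Hypothetical agent model settings; field names are assumptions
# for illustration only.
agent_model_settings = {
    "model": "gpt-4o-mini-realtime",
    "max_output_tokens": 300,  # cap each spoken reply at roughly 300 tokens
}
```

A cap in the low hundreds keeps replies conversational while preventing occasional long, token-heavy monologues from inflating output usage.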
