Available models

AssistPulse uses OpenAI’s Realtime API for voice conversations. The model determines the quality and capability of your agent’s responses.
  • gpt-4o-realtime — best for most conversations; excellent balance of speed and intelligence
  • gpt-4o-mini-realtime — best for high-volume, simpler use cases; faster and cheaper
Model availability may change as new models are released. The default model is recommended for most use cases.
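As a minimal sketch of the trade-off above, model selection can be reduced to a single decision; the helper function name and the "high volume" heuristic are illustrative assumptions, not part of AssistPulse:

```python
# Hypothetical helper: map a use case to one of the two Realtime
# model names listed above. The selection logic is an assumption.
def pick_realtime_model(high_volume: bool) -> str:
    """Return the faster, cheaper mini model for high-volume,
    simpler use cases; otherwise the full model."""
    return "gpt-4o-mini-realtime" if high_volume else "gpt-4o-realtime"
```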

Model settings

Temperature

Controls how creative or deterministic the agent’s responses are:
  • 0.0 — highly deterministic, always picks the most likely response
  • 0.5 — balanced (default)
  • 1.0 — more creative and varied responses
For customer service and reception agents, keep the temperature between 0.3 and 0.7. Higher values can lead to inconsistent or unexpected answers.
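To see why lower temperature means more deterministic output, here is a standalone sketch of temperature-scaled softmax, the standard mechanism behind this setting. This is illustrative only; the Realtime API applies temperature internally, and the logit values are made up:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before normalizing.
    Lower temperature sharpens the distribution (the most likely
    response dominates); higher temperature flattens it (more
    varied responses)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # arbitrary example scores
cold = softmax_with_temperature(logits, 0.3)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # more varied
# The top option's probability shrinks as temperature rises.
```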

Max output tokens

Limits the length of each response. Options:
  • Default — no explicit limit (model decides)
  • Custom value — set a specific token limit
Setting the max output tokens too low may cause the agent to cut off mid-sentence. Only adjust this if you need to control response length for cost reasons.
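If you assume AssistPulse forwards this setting to OpenAI's Realtime session configuration roughly as-is (an assumption; the actual plumbing is not documented here), the two options map to the `max_response_output_tokens` field, which accepts an integer or `"inf"`:

```python
# Hypothetical mapping from the two options above to a Realtime
# session payload fragment. The wrapper function is illustrative,
# not AssistPulse code.
def output_limit_config(custom_limit=None):
    return {
        # "inf" means no explicit cap: the model decides when to stop.
        "max_response_output_tokens": custom_limit if custom_limit else "inf",
    }
```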

Tool choice

Controls how the agent decides when to use connected tools:
  • Auto — the model decides when a tool is relevant (recommended)
  • Required — forces the model to use a tool on every turn
  • None — disables tool usage entirely
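The three options above correspond to the standard `tool_choice` values in OpenAI's API. A sketch of that mapping, assuming AssistPulse passes the setting through (the dictionary itself is illustrative):

```python
# Hypothetical mapping of the three dashboard options to the
# API-level tool_choice values. Illustrative only.
TOOL_CHOICE = {
    "Auto": "auto",          # model decides when a tool is relevant
    "Required": "required",  # a tool call is forced on every turn
    "None": "none",          # tool usage disabled entirely
}
```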

Token consumption

Every conversation consumes tokens:
  • Input tokens — the caller’s speech converted to text, plus instructions and knowledge context
  • Output tokens — the agent’s spoken responses
Token usage is tracked per call and visible in your call history. See token usage for details on limits and top-ups.
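The accounting described above can be sketched as a per-call sum; the function and field names are assumptions based on this description, not the actual AssistPulse schema:

```python
# Illustrative per-call token accounting. Input tokens cover the
# caller's transcribed speech plus instructions and knowledge
# context; output tokens cover the agent's spoken responses.
def call_token_total(input_tokens: int, output_tokens: int) -> int:
    """Total tokens consumed by one call."""
    return input_tokens + output_tokens

call_token_total(1200, 800)  # -> 2000
```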