
Voice selection

AssistPulse offers five distinct voices for your agent. Each voice has its own character and tone:
| Voice | Character |
| --- | --- |
| Ash | Neutral and professional |
| Ballad | Warm and melodic |
| Coral | Friendly and conversational |
| Sage | Calm and authoritative |
| Verse | Energetic and expressive |
The best way to find the right voice is to test each one with a short conversation. Create a test call and listen to how each voice handles your specific instructions.
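As a sketch, voice selection could be validated before a test call is created. The `voice` field name and the `make_session_config` helper are illustrative assumptions, not a documented AssistPulse API:

```python
# Hypothetical helper: the "voice" field name is an assumption,
# not a documented AssistPulse API. The voice names come from the table above.
VOICES = {"ash", "ballad", "coral", "sage", "verse"}

def make_session_config(voice: str) -> dict:
    """Return a minimal session config with a validated voice name."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"voice": voice}

print(make_session_config("coral"))
```

Validating up front keeps a typo from surfacing only after the test call starts.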

Speech speed

Control how fast your agent speaks with the speed slider:
  • 0.5× — slow and deliberate, useful for complex or technical information
  • 1.0× — natural speaking pace (default)
  • 2.0× — fast, suitable for brief confirmations
Speed adjustments apply to the AI’s output speech only. Caller speech recognition is not affected.
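A minimal sketch of applying the speed setting, assuming a `speed` config field (the field name is an assumption; the 0.5×–2.0× range and 1.0× default come from the slider described above):

```python
def set_speed(config: dict, speed: float = 1.0) -> dict:
    """Set output speech speed; the slider range is 0.5x-2.0x (default 1.0x).

    The "speed" field name is an assumption, not a documented API.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # Return a copy so the caller's config is not mutated in place.
    return {**config, "speed": speed}

print(set_speed({"voice": "coral"}, 0.5))
```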

Audio settings

Input audio format: the format used for capturing the caller’s voice. Options:
  • pcm16 — uncompressed 16-bit PCM (highest quality, default)
  • g711_ulaw — compressed telephony format (lower bandwidth)
  • g711_alaw — alternative telephony compression
For most use cases, the default pcm16 provides the best recognition accuracy.
Output audio format: the format used for the agent’s spoken responses. It takes the same options as the input format; use pcm16 for the best voice quality.
Noise reduction: reduces background noise from the caller’s environment:
  • Near field — optimized for close-range microphones (headsets, handsets)
  • Far field — optimized for speakerphones and distant microphones
  • Off — no noise reduction
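The three audio options above could be bundled as follows. All field names (`input_audio_format`, `output_audio_format`, `noise_reduction`) are assumptions for illustration; the format and mode values come from the lists above:

```python
# Values from the doc; field names are assumptions, not a documented API.
AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}
NOISE_REDUCTION_MODES = {"near_field", "far_field", None}  # None = off

def audio_settings(input_format="pcm16", output_format="pcm16",
                   noise_reduction="near_field") -> dict:
    """Validate and bundle the audio settings, defaulting to pcm16."""
    if input_format not in AUDIO_FORMATS or output_format not in AUDIO_FORMATS:
        raise ValueError("format must be one of pcm16, g711_ulaw, g711_alaw")
    if noise_reduction not in NOISE_REDUCTION_MODES:
        raise ValueError("noise_reduction must be near_field, far_field, or None")
    return {
        "input_audio_format": input_format,
        "output_audio_format": output_format,
        "noise_reduction": noise_reduction,
    }

print(audio_settings())
```

For a telephony integration you might pass `g711_ulaw` for both formats to match the carrier’s codec; otherwise the pcm16 defaults apply.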

Turn detection

Turn detection controls when the agent decides the caller has finished speaking and it’s time to respond.
| Setting | Description | Default |
| --- | --- | --- |
| Silence duration | How long to wait after the caller stops speaking (ms) | 500 ms |
| Prefix padding | Audio buffered before voice activity is detected (ms) | 300 ms |
| Threshold | Voice activity detection sensitivity (0–1) | 0.5 |
If your agent interrupts callers too often, increase the silence duration to 700-800ms. If it waits too long to respond, decrease it to 300-400ms.
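The table and tuning advice above can be sketched as a config helper. The field names (`silence_duration_ms`, `prefix_padding_ms`, `threshold`) are assumptions; the defaults and tuning ranges are the documented ones:

```python
def turn_detection(silence_duration_ms=500, prefix_padding_ms=300,
                   threshold=0.5) -> dict:
    """Turn-detection settings with the documented defaults.

    Field names are assumptions, not a documented API. Per the tuning
    advice: raise silence_duration_ms to 700-800 if the agent interrupts
    callers too often; lower it to 300-400 if it waits too long.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {
        "silence_duration_ms": silence_duration_ms,
        "prefix_padding_ms": prefix_padding_ms,
        "threshold": threshold,
    }

# A less interruption-prone profile for callers who pause mid-sentence:
print(turn_detection(silence_duration_ms=750))
```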

Transcription

When enabled, every call is transcribed in real time and saved to the call record.
  • Transcription model — the speech-to-text model used (default: whisper-1)
  • Language — force a specific language for better accuracy, or leave empty for auto-detection
Transcription must be enabled to generate call summaries and sentiment analysis. See call analysis.
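A sketch of the transcription settings above, assuming `enabled`, `model`, and `language` field names (the names are assumptions; the `whisper-1` default and auto-detect behavior are from the doc):

```python
def transcription(enabled=True, model="whisper-1", language=None) -> dict:
    """Transcription settings; language=None means auto-detect.

    Field names are assumptions, not a documented API.
    """
    cfg = {"enabled": enabled, "model": model}
    if language is not None:
        # Forcing a language (e.g. "en") can improve accuracy.
        cfg["language"] = language
    return cfg

print(transcription(language="en"))
```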