
Voice selection

AssistPulse offers five distinct voices for your agent. Each voice has its own character and tone:
| Voice | Character |
| --- | --- |
| Ash | Neutral and professional |
| Ballad | Warm and melodic |
| Coral | Friendly and conversational |
| Sage | Calm and authoritative |
| Verse | Energetic and expressive |
The best way to find the right voice is to test each one with a short conversation. Create a test call and listen to how each voice handles your specific instructions.
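As a sketch, voice selection could be validated before a test call is created. The `voice` field name and the `make_session_config` helper are illustrative assumptions, not a documented AssistPulse API:

```python
# Hypothetical helper: the "voice" field name is an assumption,
# not a documented AssistPulse API. The voice names come from the table above.
VOICES = {"ash", "ballad", "coral", "sage", "verse"}

def make_session_config(voice: str) -> dict:
    """Return a minimal session config with a validated voice name."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"voice": voice}

print(make_session_config("coral"))
```

Validating up front keeps a typo from surfacing only after the test call starts.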

Speech speed

Control how fast your agent speaks with the speed slider:
  • 0.5× — slow and deliberate, useful for complex or technical information
  • 1.0× — natural speaking pace (default)
  • 2.0× — fast, suitable for brief confirmations
Speed adjustments apply to the AI’s output speech only. Caller speech recognition is not affected.
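A minimal sketch of applying the speed setting, assuming a `speed` config field (the field name is an assumption; the 0.5×–2.0× range and 1.0× default come from the slider described above):

```python
def set_speed(config: dict, speed: float = 1.0) -> dict:
    """Set output speech speed; the slider range is 0.5x-2.0x (default 1.0x).

    The "speed" field name is an assumption, not a documented API.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # Return a copy so the caller's config is not mutated in place.
    return {**config, "speed": speed}

print(set_speed({"voice": "coral"}, 0.5))
```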

Audio settings

Input audio format: the format used for capturing the caller’s voice. Options:
  • pcm16 — uncompressed 16-bit PCM (highest quality, default)
  • g711_ulaw — compressed telephony format (lower bandwidth)
  • g711_alaw — alternative telephony compression
For most use cases, the default pcm16 provides the best recognition accuracy.
Output audio format: the format used for the agent’s spoken responses. It takes the same options as the input format; use pcm16 for the best voice quality.
Noise reduction: reduces background noise from the caller’s environment:
  • Near field — optimized for close-range microphones (headsets, handsets)
  • Far field — optimized for speakerphones and distant microphones
  • Off — no noise reduction
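The three audio options above could be bundled as follows. All field names (`input_audio_format`, `output_audio_format`, `noise_reduction`) are assumptions for illustration; the format and mode values come from the lists above:

```python
# Values from the doc; field names are assumptions, not a documented API.
AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}
NOISE_REDUCTION_MODES = {"near_field", "far_field", None}  # None = off

def audio_settings(input_format="pcm16", output_format="pcm16",
                   noise_reduction="near_field") -> dict:
    """Validate and bundle the audio settings, defaulting to pcm16."""
    if input_format not in AUDIO_FORMATS or output_format not in AUDIO_FORMATS:
        raise ValueError("format must be one of pcm16, g711_ulaw, g711_alaw")
    if noise_reduction not in NOISE_REDUCTION_MODES:
        raise ValueError("noise_reduction must be near_field, far_field, or None")
    return {
        "input_audio_format": input_format,
        "output_audio_format": output_format,
        "noise_reduction": noise_reduction,
    }

print(audio_settings())
```

For a telephony integration you might pass `g711_ulaw` for both formats to match the carrier’s codec; otherwise the pcm16 defaults apply.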

Turn detection

Turn detection controls when the agent decides the caller has finished speaking and it’s time to respond.
| Setting | Description | Default |
| --- | --- | --- |
| Silence duration | How long to wait after the caller stops speaking (ms) | 500 ms |
| Prefix padding | Audio buffered before voice activity is detected (ms) | 300 ms |
| Threshold | Voice activity detection sensitivity (0–1) | 0.5 |
If your agent interrupts callers too often, increase the silence duration to 700-800ms. If it waits too long to respond, decrease it to 300-400ms.
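The table and tuning advice above can be sketched as a config helper. The field names (`silence_duration_ms`, `prefix_padding_ms`, `threshold`) are assumptions; the defaults and tuning ranges are the documented ones:

```python
def turn_detection(silence_duration_ms=500, prefix_padding_ms=300,
                   threshold=0.5) -> dict:
    """Turn-detection settings with the documented defaults.

    Field names are assumptions, not a documented API. Per the tuning
    advice: raise silence_duration_ms to 700-800 if the agent interrupts
    callers too often; lower it to 300-400 if it waits too long.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {
        "silence_duration_ms": silence_duration_ms,
        "prefix_padding_ms": prefix_padding_ms,
        "threshold": threshold,
    }

# A less interruption-prone profile for callers who pause mid-sentence:
print(turn_detection(silence_duration_ms=750))
```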

Transcription

When enabled, every call is transcribed in real time and saved to the call record.
  • Transcription model — the speech-to-text model used (default: whisper-1)
  • Language — force a specific language for better accuracy, or leave empty for auto-detection
Transcription must be enabled to generate call summaries and sentiment analysis. See call analysis.
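A sketch of the transcription settings above, assuming `enabled`, `model`, and `language` field names (the names are assumptions; the `whisper-1` default and auto-detect behavior are from the doc):

```python
def transcription(enabled=True, model="whisper-1", language=None) -> dict:
    """Transcription settings; language=None means auto-detect.

    Field names are assumptions, not a documented API.
    """
    cfg = {"enabled": enabled, "model": model}
    if language is not None:
        # Forcing a language (e.g. "en") can improve accuracy.
        cfg["language"] = language
    return cfg

print(transcription(language="en"))
```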