Text to speech in DialNexa is what the caller actually hears. It turns the model’s reply into audio through a selected voice, language, voice model, speed, stability, volume, and provider path. A voice can make a correct answer feel helpful, rushed, unclear, or strangely formal.

[Image: DialNexa voice selector showing voice filters, sample playback, Nexa voice ID copy button, and row language selection.]

[Image: DialNexa voice settings popover showing voice model, speed, stability, and volume controls for a selected voice.]

The caller does not hear your provider architecture. They hear a voice saying their name, amount, date, and next step. Test those words.

Choosing A Provider

| Provider | Use it when | Test before publishing |
| --- | --- | --- |
| ElevenLabs | You want broad voice auditioning, strong speaker personality, and a familiar provider for brand voice work. | Flash v2.5 behavior, names, numbers, speed, stability, and long compliance lines. |
| Cartesia | You want fast streamed speech, clean call audio, and language-aware voice output. | Language fit, volume, speed, and repeated short phrases. |
| SmallestAI | Your agent targets Indian callers and you need a natural Indian-accented voice (Hindi, Hinglish, Indian English). | Indian names, locality names, and mixed-language phrases. |
| Sarvam AI | Your agent addresses Indian English (en-IN) callers and you want a locally natural voice experience. | Indian English pronunciation, numbers, and dates in Indian format. |

For provider background, use the ElevenLabs integration catalog page. Cartesia, SmallestAI, and Sarvam AI are selected from the DialNexa voice selector when available in your workspace.
Choose ElevenLabs when the voice personality matters and you want to audition a wider library. In the current dashboard path, ElevenLabs agent versions are standardized on Flash v2.5 (eleven_flash_v2_5) where supported, so treat that as the main model to test.

How The Voice Selector Works

The selector is designed for large voice libraries.

| UI control | What it does |
| --- | --- |
| Provider filter | Shows all public voices, only ElevenLabs voices, or only Cartesia voices. |
| Language filter | Filters by languages supported by available voices. English is the starting filter in the current modal. |
| Gender filter | Narrows visible rows to male or female where the voice record includes that label. |
| Accent filter | Uses provider-specific accent metadata exposed by the voice records. |
| Search | Matches voice name, provider voice id, internal voice id, voice_ id, and vel_ Nexa voice id. |
| Sample playback | Plays the sample recording for quick auditioning. |
| Nexa voice ID | Lets users copy the vel_ voice id for notes or API-adjacent setup. |
| Row language selector | Picks the exact language for that voice before applying it to the agent. |
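
To make the filter and search behavior concrete, here is a minimal sketch of how such a selector could narrow a voice list. It is not DialNexa code: the `VoiceRecord` fields, the `matches` and `search` helpers, and the sample ids are all hypothetical and exist only to illustrate the idea of combining filters with a search across several id fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VoiceRecord:
    # Hypothetical field names; the real voice record may differ.
    name: str
    provider: str
    provider_voice_id: str
    nexa_voice_id: str  # stands in for the vel_ id shown in the selector
    languages: Tuple[str, ...]
    gender: Optional[str] = None


def matches(voice, query="", provider=None, language=None, gender=None):
    """Return True when a voice passes every active filter."""
    if provider and voice.provider != provider:
        return False
    if language and language not in voice.languages:
        return False
    # A gender filter can only match records that carry that label.
    if gender and voice.gender != gender:
        return False
    q = query.strip().lower()
    if not q:
        return True
    # Search looks at the name and at each id field.
    fields = (voice.name, voice.provider_voice_id, voice.nexa_voice_id)
    return any(q in field.lower() for field in fields)


def search(voices, **filters):
    """Return the voices that pass the given filters, in original order."""
    return [v for v in voices if matches(v, **filters)]


# Made-up sample records for illustration.
voices = [
    VoiceRecord("Asha", "Cartesia", "c-123", "vel_asha01", ("en", "hi"), "female"),
    VoiceRecord("Marcus", "ElevenLabs", "el-456", "vel_marcus02", ("en",), "male"),
    VoiceRecord("Neutral", "ElevenLabs", "el-789", "vel_neutral03", ("en",)),
]
```

For example, `search(voices, query="vel_asha")` returns only the Asha record, and `search(voices, provider="ElevenLabs", gender="male")` drops the unlabeled ElevenLabs voice because it has no gender label to match.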

Voice Settings In The Popover

| Setting | What users change | Practical test |
| --- | --- | --- |
| Voice Model | The synthesis model available for the current voice. | Start with the visible model choice for the selected voice. For ElevenLabs, test Flash v2.5 where shown. |
| Voice Speed | How quickly the agent speaks. | Read dates, amounts, phone numbers, and local names. Faster speech usually fails there first. |
| Voice Stability | How much variation the voice has. Lower values can sound more emotional; higher values can sound calmer. | Repeat the same line three times and check consistency. |
| Voice Volume | Output loudness in the call path. | Listen through speaker mode, headphones, and the actual telephony route. |

Do not copy provider documentation numbers into DialNexa sliders. Use the UI values and test calls. The dashboard maps provider ranges before saving.
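
To see why a provider's documented number and a slider value are not interchangeable, consider a sketch of range mapping. The actual mapping the dashboard uses is not described here, so this assumes a simple linear rescale with clamping. The function name `map_slider` and every range below are hypothetical.

```python
def map_slider(ui_value, ui_range, provider_range):
    """Linearly rescale a UI slider value into a provider's range.

    Assumption: a linear mapping. The real dashboard mapping may differ.
    """
    ui_lo, ui_hi = ui_range
    p_lo, p_hi = provider_range
    if ui_hi == ui_lo:
        raise ValueError("ui_range must span more than one value")
    # Clamp first so out-of-range input cannot leave the provider range.
    clamped = min(max(ui_value, ui_lo), ui_hi)
    fraction = (clamped - ui_lo) / (ui_hi - ui_lo)
    return p_lo + fraction * (p_hi - p_lo)


# Hypothetical ranges: a 0-100 slider feeding two different provider scales.
stability = map_slider(50, (0, 100), (0.0, 1.0))
speed = map_slider(50, (0, 100), (0.5, 2.0))
```

Under these assumed ranges the same slider position of 50 becomes 0.5 on one scale and 1.25 on the other, which is why typing a provider's raw number into a slider can produce a very different voice than expected.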

Audio Cache And Repeated Speech

Audio Cache stores synthesized audio for repeated phrases. It works best when the generated text, voice provider, voice id, voice settings, and output format repeat.

| Cache-friendly phrase | Why it works |
| --- | --- |
| A fixed welcome line. | Same text and same voice configuration repeat across calls. |
| A compliance disclosure. | Usually identical and latency-sensitive. |
| A short confirmation. | Repeats often and starts quickly when cached. |

| Cache-unfriendly phrase | Why it misses |
| --- | --- |
| A line with caller name, amount, or time. | Variables change the text. |
| A long model-generated explanation. | The model may phrase it differently each turn. |
| A reply based on fresh API data. | External data changes the output. |

Audio Cache loves repetition. If every sentence is personalized confetti, cache will politely sit there doing very little.
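
One way to picture why fixed lines hit and personalized lines miss is to treat the cache as a lookup keyed on everything that shapes the audio. The sketch below is an illustration under that assumption, not DialNexa's implementation: the `cache_key` function, the hashing scheme, the voice id, and the output format string are all hypothetical.

```python
import hashlib
import json


def cache_key(text, provider, voice_id, settings, output_format):
    """Build a deterministic key from everything that shapes the audio."""
    payload = {
        "text": text,
        "provider": provider,
        "voice_id": voice_id,
        "settings": settings,
        "output_format": output_format,
    }
    # sort_keys keeps the key stable regardless of dict insertion order.
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Hypothetical voice configuration, for illustration only.
voice = {
    "provider": "elevenlabs",
    "voice_id": "vel_example",
    "settings": {"speed": 1.0, "stability": 0.5, "volume": 1.0},
    "output_format": "pcm_16000",
}

welcome_a = cache_key("Thanks for calling. How can I help?", **voice)
welcome_b = cache_key("Thanks for calling. How can I help?", **voice)
personal = cache_key("Hi Priya, your balance is 4,250 rupees.", **voice)

print(welcome_a == welcome_b)  # True: identical text and configuration
print(welcome_a == personal)   # False: the variables changed the text
```

The same reasoning applies to settings: under this model, changing speed or stability on a live agent produces new keys, so previously cached phrases would need to be synthesized again.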

Where Voice Quality Shows Up Outside The Call

Voice quality is not only a caller comfort issue. It changes whether downstream work is trusted.

| Downstream work | Why voice quality matters | Helpful links |
| --- | --- | --- |
| Written confirmations. | If the caller misheard a date or amount, the follow-up message may look surprising or wrong. | Email with Resend, Gmail. |
| Sales handoff. | A confident summary is less useful if the caller struggled to understand the agent. | HubSpot, Salesforce. |
| Support escalation. | The recording helps the support owner judge tone, not only the transcript text. | Zendesk, Intercom. |
| Repeated campaigns. | Cache-friendly fixed lines can reduce perceived delay on common phrases. | Audio Cache monitoring. |

Voice Review Checklist

1. Test the first sentence
   The welcome line sets trust. Check pace, pronunciation, greeting tone, and whether the voice fits the use case.

2. Test difficult words
   Include brand terms, product names, locality names, acronyms, medicine names, plan names, and agent names.

3. Test numbers and dates
   Amounts, due dates, order IDs, phone numbers, and appointment slots reveal speech issues quickly.

4. Test interruption recovery
   Interrupt the agent during the greeting and check how naturally it resumes.

5. Review recording and transcript together
   The transcript shows content. The recording shows delivery.
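
A small script can help cover the difficult-word and number checks systematically instead of relying on whatever comes to mind during a test call. The sketch below only builds text lines to read through a test agent; it calls no DialNexa API, and the template, the `build_test_lines` helper, and the sample values are all made up for illustration.

```python
from itertools import product


def build_test_lines(template, names, amounts, dates):
    """Return one test line per combination of name, amount, and date."""
    return [
        template.format(name=name, amount=amount, date=date)
        for name, amount, date in product(names, amounts, dates)
    ]


# Sample values are made up; replace them with words your callers will hear.
lines = build_test_lines(
    "Hello {name}, your payment of {amount} is due on {date}.",
    names=["Priya Raghunathan", "Dr. O'Connor"],
    amounts=["1,499.50 rupees", "12,000 rupees"],
    dates=["3 March", "31 December"],
)

for line in lines:
    print(line)
```

Reading each generated line at the speed and stability you plan to publish makes it easier to spot which combination of name, amount, and date the voice handles poorly.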

Related Pages

Supported Voices And Models: Review voice fields and model fields.
Speech Settings: Enable Audio Cache and tune speech behavior.
Multilingual And Hinglish Calls: Match voice, language, and transcriber.
Audio Cache Monitoring: Read cache evidence on the call detail page.