Speech to text in DialNexa is the part of a Voice AI call that decides what the system believes the caller said. If this layer is wrong, every downstream feature inherits the mistake: the model answers the wrong question, functions receive wrong arguments, summaries sound confident but false, and post-call fields become review work.

[Figure: DialNexa transcriber selector showing the Soniox option for the selected agent language.]

[Figure: DialNexa call detail page showing summary, transcription tabs, live transcript, and accurate transcript controls.]
If the transcript says the caller asked about carrots when they clearly asked about careers, do not rewrite the whole prompt yet. Start with listening.

Transcriber Options In The Dashboard

| Display name | Provider | Best fit | Important limit |
| --- | --- | --- | --- |
| Deepgram Flux (English only) | Deepgram flux-general-en | English calls. Fast turn boundaries and low latency. Default Deepgram option in the dashboard. | English only - the language compatibility filter removes it when a non-English language is selected. |
| Soniox | Soniox | Hindi-English, Hinglish, multilingual, and Indian-accented calls. | Test noisy phone calls and local names before scaling. |
The selector can show different options depending on the agent language. Pricing per minute for each option is shown next to the selector in the dashboard. Cascaded agents can also use fallback STT from the transcriber settings popover. The current dashboard selector does not expose AssemblyAI for new primary or fallback choices. For background on the provider itself, see the Deepgram integration catalog page. For how transcription evidence appears after the call, see transcripts, recordings, and summaries.
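The language compatibility filter can be pictured as a simple lookup. The sketch below is illustrative only: `TranscriberOption`, `OPTIONS`, the language codes, and `selectable_options` are hypothetical names rather than DialNexa's implementation, and treating Soniox as unrestricted is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class TranscriberOption:
    display_name: str
    # None means "no language restriction" in this sketch (an assumption).
    supported_languages: Optional[FrozenSet[str]]


# Hypothetical catalog mirroring the two dashboard options described above.
OPTIONS = [
    TranscriberOption("Deepgram Flux (English only)", frozenset({"en"})),
    TranscriberOption("Soniox", None),
]


def selectable_options(agent_language: str) -> List[str]:
    """Return display names of the options compatible with the agent language."""
    lang = agent_language.lower()
    return [
        option.display_name
        for option in OPTIONS
        if option.supported_languages is None or lang in option.supported_languages
    ]


print(selectable_options("en"))  # ['Deepgram Flux (English only)', 'Soniox']
print(selectable_options("hi"))  # ['Soniox']
```

With a non-English agent language, the English-only option drops out of the list, which matches the behavior the selector shows.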

Deepgram Flux Versus Soniox

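Start with Deepgram Flux for English calls: general support calls, outbound scripts, and structured workflows where fast turn boundaries matter. Flux is optimised for English and is the default Deepgram option in the DialNexa dashboard. Choose Soniox for Hindi-English, Hinglish, multilingual, and Indian-accented calls, or whenever the agent language is not English.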

What DialNexa Configures Behind The Selector

These details explain the behavior users see in the dashboard and call records.
| DialNexa behavior | User-visible result |
| --- | --- |
| Deepgram Flux uses short endpointing for English. | English calls feel quicker because the system closes caller turns sooner. |
| Flux is restricted to English in the builder. | Non-English language choices are not a fit for Flux - the selector filters it out automatically. |
| Soniox receives Hindi and English hints. | Hindi-English calls are treated as mixed-language speech rather than one textbook language. |
| Soniox controls Response Eagerness. | The patient-to-eager slider appears only for supported Soniox paths. |
| Fallback STT runs a backup transcriber in parallel. | Users can choose a different fallback transcriber and a fallback wait time in milliseconds. |
| Transcriber pricing is fetched per selected option. | Users can see INR per minute beside transcriber rows when the billing preview is available. |

Fallback STT

Fallback STT helps when the primary transcriber is slow or unreliable for a specific caller population. When enabled, DialNexa can use the backup transcriber's result if the backup finalizes first and the primary result does not arrive within the configured wait.

[Figure: DialNexa transcriber settings popover showing fallback STT enabled with a fallback transcriber selected and a fallback wait value.]
| Setting | Recommendation |
| --- | --- |
| Primary transcriber | Keep this as your best-fit provider for the caller language. |
| Fallback transcriber | Pick a different provider or model from the primary option. |
| Fallback wait | Start with 500 ms, then tune from real call recordings and transcripts. |
Fallback STT applies to cascaded agents. Speech to Speech agents do not use a separate STT provider.
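To reason about the fallback wait value, it can help to model the decision for a single caller turn. The sketch below is one possible reading of the rule, not DialNexa's implementation: it assumes the wait is measured from the moment the fallback transcriber finalizes, and `pick_final_transcript` and its parameters are hypothetical names.

```python
from typing import Optional


def pick_final_transcript(
    primary_ms: Optional[float],
    fallback_ms: Optional[float],
    fallback_wait_ms: float = 500.0,
) -> str:
    """Decide which transcriber's final result to use for one caller turn.

    primary_ms and fallback_ms are the times at which each transcriber
    finalized, in milliseconds from a shared reference point. None means
    that transcriber never produced a final result.
    """
    if fallback_ms is None:
        return "primary" if primary_ms is not None else "none"
    if primary_ms is None:
        return "fallback"
    if primary_ms <= fallback_ms:
        # The primary finalized first, so the fallback is not needed.
        return "primary"
    # The fallback finalized first: give the primary the configured grace period
    # (assumption: the wait starts when the fallback finalizes).
    if primary_ms - fallback_ms <= fallback_wait_ms:
        return "primary"
    return "fallback"


print(pick_final_transcript(300, 400))        # primary
print(pick_final_transcript(700, 400, 500))   # primary (arrived within the wait)
print(pick_final_transcript(1000, 400, 500))  # fallback (primary was too late)
```

Under this reading, a longer wait favors the primary transcriber at the cost of slower replies, and a shorter wait hands more turns to the fallback. Timestamps taken from real call recordings are a better basis for tuning than guesses.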

Transcript Types

DialNexa can show more than one transcript view for the same call.
| Transcript | When to use it |
| --- | --- |
| Live transcript | Debug turn-taking, interruptions, and what the agent heard before replying. |
| Accurate transcript | Review cleaner post-call text when available. |
| Recording | Settle disputes about what was said, noise, overlap, silence, pronunciation, and phone quality. |
Text tells you what the system believed. Audio tells you what happened. When they disagree, trust the recording first.
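One way to decide which parts of a recording to listen to first is to diff the live transcript against the accurate transcript and note where they disagree. The sketch below uses Python's standard `difflib` module. It assumes both transcripts have been copied out as plain text, and `transcript_disagreements` is a hypothetical helper, not a DialNexa feature.

```python
import difflib
from typing import List, Tuple


def transcript_disagreements(live: str, accurate: str) -> List[Tuple[str, str]]:
    """Return (live_words, accurate_words) pairs where the two texts differ."""
    live_words = live.lower().split()
    accurate_words = accurate.lower().split()
    matcher = difflib.SequenceMatcher(a=live_words, b=accurate_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(live_words[i1:i2]), " ".join(accurate_words[j1:j2]))
            )
    return differences


live = "i want to ask about carrots at your company"
accurate = "i want to ask about careers at your company"
print(transcript_disagreements(live, accurate))  # [('carrots', 'careers')]
```

Each pair marks a spot worth checking against the audio. The diff only shows that the two texts differ; the recording decides which one is right.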

How To Test Speech To Text

1. Use the same caller script. Compare transcribers with the same greeting, caller answers, interruption, name, city, number, and final outcome.
2. Test names and places. Use real customer names, locality names, company names, product terms, and common abbreviations.
3. Test interruptions. Ask a caller to speak during the greeting, correct themselves mid-answer, pause, and give one-word replies.
4. Test language switching. For Hindi-English calls, include English numbers, Hindi phrases, and mixed casual replies in the same call.
5. Compare transcript with recording. Mark the exact point where the transcript diverges from audio.
6. Review downstream effects. Check function arguments, workflow branches, summaries, and post-call fields. Bad listening quietly becomes bad automation.
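When the same caller script is used for every test, the comparison can be made numeric with word error rate (WER): the word-level edit distance between the script and the transcript, divided by the number of words in the script. The sketch below is a minimal implementation. The sample script and transcript are made up for illustration and are not measurements of any transcriber.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference script must not be empty")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


script = "my name is priya and i live in pune"
transcript = "my name is prior and i live in pune"  # invented example
print(round(word_error_rate(script, transcript), 3))  # 0.111
```

WER treats every word equally, so a misheard name counts the same as a misheard filler word. Names, places, and numbers still need a separate look because they matter most downstream.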

How Transcription Affects Integrations

Integrations usually receive data that started as caller speech. If the transcript is wrong, a CRM update, ticket note, WhatsApp message, or spreadsheet row can be wrong too.
| Integration result | Transcription risk | What to inspect first |
| --- | --- | --- |
| CRM field update in HubSpot or Salesforce. | Name, company, lead score, or intent was misheard. | Recording plus accurate transcript. |
| Ticket note in Zendesk or Intercom. | Issue category or callback detail was captured incorrectly. | Transcript segment near the caller’s problem statement. |
| WhatsApp or email follow-up. | The agent repeats the wrong date, amount, or next step. | Post-call fields and the exact generated message. |
| Review queue in Google Sheets. | Bad extraction looks like a bad lead record. | Transcript, summary, and field definitions together. |
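A lightweight check before data leaves for a CRM or ticketing tool is to confirm that each verbatim field value actually appears in the transcript. The sketch below is illustrative only: the field names and `fields_needing_review` are hypothetical, and it suits only fields copied from caller speech (names, companies, dates), not derived values such as lead score or intent.

```python
import re
from typing import Dict, List


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def fields_needing_review(fields: Dict[str, str], transcript: str) -> List[str]:
    """Return names of fields whose value is not found as whole words in the transcript."""
    haystack = f" {_normalize(transcript)} "
    flagged = []
    for name, value in fields.items():
        needle = _normalize(value)
        if not needle or f" {needle} " not in haystack:
            flagged.append(name)
    return flagged


transcript = "Hi, this is Priya from Acme Foods. Please call me back on Friday."
fields = {"name": "Priya", "company": "Acme Foods", "callback_day": "Monday"}
print(fields_needing_review(fields, transcript))  # ['callback_day']
```

A flagged field is not necessarily wrong, and an unflagged one can still be misheard if the transcript itself is wrong. The check only narrows down which records to compare against the recording.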

Common Speech To Text Mistakes

Flux is labeled English only in the dashboard. Use Soniox for mixed-language, Hinglish, and non-English calls.
If the LLM received the wrong caller words, the LLM is not the first problem.
Plivo, SIP trunking, and web calls can produce different audio with the same agent. Compare through the same route when testing transcribers.
Real callers use speaker mode, traffic, low signal, and short answers. Test those before a large campaign does it for you.

Supported Transcribers

Read the reference table for selectable transcribers.

Speech Settings

Tune Response Eagerness and Denoising Mode.

Latency And Turn Taking

Understand how turn boundaries affect response timing.

Transcripts And Recordings

Review call evidence after live calls.