How DialNexa Voice AI Works

DialNexa Voice AI works by moving each call through a loop: receive audio, transcribe speech, decide the next response with the configured agent version, optionally call tools, synthesize speech, and store the result in Call History.

[Figure placeholder: DialNexa Voice AI call loop, from caller audio to Deepgram or Soniox transcript, LLM response, ElevenLabs or Cartesia voice output, and call log.]

If you can follow the loop, you can debug most calls without guessing. Guessing is dramatic, but logs are cheaper.

The Runtime Loop

Caller audio arrives

Audio enters through a Plivo number, SIP trunk, web call, batch call, workflow call node, API call, or test call.

The transcriber listens

Deepgram or Soniox converts speech into text. Call History later shows the realtime transcript and, where available, a post-call transcript.

The agent decides

The published agent version supplies the prompt, system prompt, language, LLM, dynamic variables, functions, and safety settings.

Tools may run

Functions and integrations can end a call, book a calendar event, call an API, send a WhatsApp message, send an email, or trigger another configured action.

The voice speaks

ElevenLabs or Cartesia synthesizes the response. Audio Cache can reduce latency for repeated exact phrases.

Evidence is saved

Call History receives status, summary, transcript, recording URL, post-call analysis, transfer details, Audio Cache data, and metadata.
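The six steps above can be sketched as a single function that handles one caller turn. This is a minimal illustration, not DialNexa's implementation: `run_turn`, `Decision`, `CallRecord`, and every callable passed in are hypothetical stand-ins for the transcriber, agent, tools, and voice provider. The cache is modeled as a plain dictionary keyed on the exact reply text, which is an assumption based only on the "repeated exact phrases" wording.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    """What the agent wants to do next (hypothetical shape)."""
    reply_text: str
    tool_name: Optional[str] = None  # None means no tool runs this turn


@dataclass
class CallRecord:
    """Stand-in for the evidence that ends up in Call History."""
    transcript: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    cache_hits: int = 0


def run_turn(audio, transcribe, decide, tools, synthesize, audio_cache, record):
    """Run one pass of the loop for a single chunk of caller audio."""
    text = transcribe(audio)                      # speech to text
    record.transcript.append(("caller", text))

    decision = decide(text)                       # agent version plus LLM
    if decision.tool_name is not None:            # tools are optional
        tools[decision.tool_name]()
        record.tool_calls.append(decision.tool_name)

    # Exact-phrase cache: only an identical reply string is reused.
    if decision.reply_text in audio_cache:
        speech = audio_cache[decision.reply_text]
        record.cache_hits += 1
    else:
        speech = synthesize(decision.reply_text)  # text to speech
        audio_cache[decision.reply_text] = speech

    record.transcript.append(("agent", decision.reply_text))
    return speech


# Usage with stub providers: the second identical reply is served from cache.
record = CallRecord()
cache = {}
for _ in range(2):
    run_turn(
        b"fake-audio",
        transcribe=lambda audio: "hello",
        decide=lambda text: Decision("Hi, how can I help?"),
        tools={},
        synthesize=lambda reply: reply.encode(),
        audio_cache=cache,
        record=record,
    )
print(record.cache_hits)
```

Keeping each provider behind its own callable mirrors the layer-by-layer debugging approach: any one stage can be swapped or logged without touching the others.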

Provider Work At Each Runtime Layer

Layer	Provider choices	Why users should care
Audio route	Plivo, SIP trunking, web call.	Changes number ownership, audio quality, routing, and whether phone network issues are involved.
Speech to text	Deepgram or Soniox.	Changes transcript accuracy, language fit, turn timing, and Response Eagerness support.
Reasoning	OpenAI, Google, or Groq.	Changes instruction following, latency, structured behavior, and fallback strategy.
Text to speech	ElevenLabs or Cartesia.	Changes voice identity, pronunciation, streaming behavior, and cache compatibility.
External actions	Wati, Resend, Custom Functions, webhooks.	Changes what the call or workflow can do outside DialNexa.
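One way to keep a provider stack honest is to check it against the table before publishing an agent. The sketch below copies the provider names from the table; the layer keys (`audio_route`, `speech_to_text`, and so on) and the `validate_stack` helper are hypothetical names chosen for illustration, not DialNexa configuration fields. External actions are left out because a call may use several of them or none.

```python
# Provider choices per layer, copied from the table above.
PROVIDER_CHOICES = {
    "audio_route": {"Plivo", "SIP trunking", "web call"},
    "speech_to_text": {"Deepgram", "Soniox"},
    "reasoning": {"OpenAI", "Google", "Groq"},
    "text_to_speech": {"ElevenLabs", "Cartesia"},
}


def validate_stack(stack: dict) -> list:
    """Return a list of problems; an empty list means every layer has a listed provider."""
    problems = []
    for layer, allowed in PROVIDER_CHOICES.items():
        choice = stack.get(layer)
        if choice is None:
            problems.append(f"{layer}: no provider selected")
        elif choice not in allowed:
            problems.append(f"{layer}: {choice!r} is not one of {sorted(allowed)}")
    return problems


stack = {
    "audio_route": "Plivo",
    "speech_to_text": "Soniox",
    "reasoning": "Groq",
    "text_to_speech": "Cartesia",
}
print(validate_stack(stack))
```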

Runtime Input Versus Saved Evidence

Do not confuse what starts a call with what proves it happened.

Layer	Examples	Where reviewed
Input	Recipient number, metadata, dynamic variables, workflow lead variables, selected outbound number.	Batch setup, workflow lead, API request, or test call modal.
Agent version	Prompt, model, voice, transcriber, welcome mode, functions, post-call fields, security settings.	Agent builder and version history.
Call result	Status, duration, sentiment, end reason, transcript, summary, recording, extracted fields.	Call History and call detail page.
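The separation in the table can be made concrete by giving each layer its own type. This is only a sketch: the class and field names below are hypothetical, chosen to echo the examples in the table, and do not describe DialNexa's API objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CallInput:
    """What starts a call."""
    recipient_number: str
    outbound_number: str
    dynamic_variables: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class AgentVersion:
    """What the agent was configured to do when the call ran."""
    prompt: str
    model: str
    voice: str
    transcriber: str


@dataclass(frozen=True)
class CallResult:
    """What proves the call happened."""
    status: str
    duration_seconds: int
    end_reason: str
    transcript: str


# Where each layer is reviewed, per the table above.
REVIEW_LOCATION = {
    CallInput: "Batch setup, workflow lead, API request, or test call modal",
    AgentVersion: "Agent builder and version history",
    CallResult: "Call History and call detail page",
}


def where_to_review(obj) -> str:
    """Look up the review location for one of the three layer types."""
    return REVIEW_LOCATION[type(obj)]


call_input = CallInput("+10000000000", "+10000000001", {"first_name": "Asha"})
result = CallResult("completed", 42, "caller_hung_up", "caller: hello")
print(where_to_review(call_input))
print(where_to_review(result))
```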

What Can Change A Reply

The same caller sentence can lead to different behavior when these settings change.

Prompt and system prompt

Instruction wording decides goal, boundaries, escalation rules, and acceptable answers.

Dynamic variables

Caller-specific values can change greeting, eligibility, due date, location, or transfer destination.

Functions

Tools add actions the model can call during the conversation.

LLM and temperature

Model choice and temperature affect reasoning style and consistency.
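To see why dynamic variables alone can change a reply, consider a prompt template filled in per caller. The `{{name}}` placeholder syntax and the `render_prompt` helper are assumptions made for this illustration; check the agent builder for the syntax DialNexa actually uses.

```python
import re


def render_prompt(template: str, variables: dict) -> str:
    """Fill {{name}} placeholders (hypothetical syntax) with caller-specific values."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly so a missing value never reaches the model as a blank.
            raise KeyError(f"missing dynamic variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


template = "Greet {{first_name}}. Their payment is due on {{due_date}}."

prompt_a = render_prompt(template, {"first_name": "Asha", "due_date": "5 May"})
prompt_b = render_prompt(template, {"first_name": "Ben", "due_date": "12 June"})
print(prompt_a)
print(prompt_b)
```

The same template produces two different instructions, so the same caller sentence can reach the model with different context behind it.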

Debug By Layer

Audio sounds bad

Check recording quality, phone path, SIP trunk behavior, web microphone, and Denoising Mode.

Transcript is wrong

Check language, Deepgram or Soniox selection, background noise, and whether the caller spoke over the agent.

Transcript is right but answer is wrong

Check prompt, dynamic variables, functions, knowledge source, model family, and temperature.

Answer is right but late

Check Response Eagerness, Audio Cache, fallback LLM, function latency, and integration action placement.
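The four cases above follow the order of the runtime loop, so triage can stop at the first layer that looks unhealthy. The sketch below encodes that order; the checklist items come from this section, while the layer keys and the `first_failing_layer` helper are hypothetical names.

```python
# Layers in loop order, each with the checks listed in this section.
LAYER_CHECKS = [
    ("audio", ["recording quality", "phone path", "SIP trunk behavior",
               "web microphone", "Denoising Mode"]),
    ("transcript", ["language", "Deepgram or Soniox selection",
                    "background noise", "caller speaking over the agent"]),
    ("answer", ["prompt", "dynamic variables", "functions",
                "knowledge source", "model family", "temperature"]),
    ("latency", ["Response Eagerness", "Audio Cache", "fallback LLM",
                 "function latency", "integration action placement"]),
]


def first_failing_layer(healthy: dict):
    """Return (layer, checks) for the earliest unhealthy layer, or None if all pass.

    Layers missing from `healthy` are treated as passing.
    """
    for layer, checks in LAYER_CHECKS:
        if not healthy.get(layer, True):
            return layer, checks
    return None


# Transcript was right, but the answer was wrong and also late:
# start with the answer layer, since it comes first in the loop.
layer, checks = first_failing_layer(
    {"audio": True, "transcript": True, "answer": False, "latency": False}
)
print(layer, checks)
```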

Related Reading

Speech To Text

Understand Deepgram and Soniox behavior.

LLM Behavior

Tune reasoning and fallback behavior.

Provider Selection Guide

Choose the complete provider stack.

Text To Speech

Choose ElevenLabs or Cartesia.

Integrations

Understand Wati, Resend, and workflow actions.
