Screenshot placeholder: Voice AI call loop
Add a call loop diagram or dashboard screenshot. Suggested alt text: DialNexa Voice AI call loop from caller audio to Deepgram or Soniox transcript, LLM response, ElevenLabs or Cartesia voice output, and call log.
The Runtime Loop
Caller audio arrives
Audio enters through a Plivo number, SIP trunk, web call, batch call, workflow call node, API call, or test call.
The transcriber listens
Deepgram or Soniox converts speech into text. Call History later shows the realtime transcript and, where available, the post-call transcript.
The agent decides
The published agent version supplies prompt, system prompt, language, LLM model, dynamic variables, functions, and safety settings.
Tools may run
Functions and integrations can end a call, book a calendar event, call an API, send a WhatsApp message, send an email, or trigger another configured action.
The voice speaks
ElevenLabs or Cartesia synthesizes the response. Audio Cache can reduce latency for repeated exact phrases.
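The five steps above can be read as one turn of a loop. The following is a minimal sketch of that turn, not DialNexa's implementation: `CallLog`, `run_turn`, and the `fake_*` stubs are hypothetical names that stand in for the transcriber, the published agent version, the tool runner, and the voice provider.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallLog:
    """Hypothetical stand-in for what a call log would record."""
    transcript: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
    replies: list[str] = field(default_factory=list)


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    decide: Callable[[str], tuple[str, list[str]]],
    run_tool: Callable[[str], None],
    synthesize: Callable[[str], bytes],
    log: CallLog,
) -> bytes:
    """One pass: caller audio -> transcript -> decision -> tools -> voice."""
    text = transcribe(audio)          # the transcriber listens
    log.transcript.append(text)
    reply, tools = decide(text)       # the agent decides
    for name in tools:                # tools may run
        run_tool(name)
        log.tool_calls.append(name)
    log.replies.append(reply)
    return synthesize(reply)          # the voice speaks


# Stub providers, for illustration only.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_decide(text: str) -> tuple[str, list[str]]:
    if "bye" in text.lower():
        return "Goodbye.", ["end_call"]
    return f"You said: {text}", []


def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


log = CallLog()
executed: list[str] = []
first = run_turn(b"hello", fake_transcribe, fake_decide,
                 executed.append, fake_synthesize, log)
last = run_turn(b"ok bye", fake_transcribe, fake_decide,
                executed.append, fake_synthesize, log)
```

Passing each stage in as a function keeps the layers separable, which matches the layer-by-layer debugging approach this page describes.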
Provider Work At Each Runtime Layer
| Layer | Provider choices | Why users should care |
|---|---|---|
| Audio route | Plivo, SIP trunking, web call. | Changes number ownership, audio quality, routing, and whether phone network issues are involved. |
| Speech to text | Deepgram or Soniox. | Changes transcript accuracy, language fit, turn timing, and Response Eagerness support. |
| Reasoning | OpenAI, Google, or Groq. | Changes instruction following, latency, structured behavior, and fallback strategy. |
| Text to speech | ElevenLabs or Cartesia. | Changes voice identity, pronunciation, streaming behavior, and cache compatibility. |
| External actions | Wati, Resend, Custom Functions, webhooks. | Changes what the call or workflow can do outside DialNexa. |
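As a rough illustration of the table above, a provider stack can be checked against the choices listed for each layer. This is a sketch under assumptions: the layer keys, the lowercase provider identifiers, and `validate_stack` are hypothetical and are not DialNexa configuration names.

```python
# Provider choices per layer, taken from the table above.
# The keys and lowercase identifiers are illustrative assumptions.
ALLOWED_PROVIDERS = {
    "audio_route": {"plivo", "sip", "web"},
    "speech_to_text": {"deepgram", "soniox"},
    "reasoning": {"openai", "google", "groq"},
    "text_to_speech": {"elevenlabs", "cartesia"},
}


def validate_stack(stack: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every layer is covered."""
    problems = []
    for layer, allowed in ALLOWED_PROVIDERS.items():
        choice = stack.get(layer)
        if choice is None:
            problems.append(f"{layer}: missing")
        elif choice.lower() not in allowed:
            problems.append(f"{layer}: unsupported provider {choice!r}")
    return problems


good_stack = {
    "audio_route": "plivo",
    "speech_to_text": "Deepgram",
    "reasoning": "openai",
    "text_to_speech": "cartesia",
}
bad_stack = {**good_stack, "text_to_speech": "acme"}
del bad_stack["reasoning"]
```

Here `validate_stack(good_stack)` returns an empty list, while `validate_stack(bad_stack)` reports the missing reasoning layer and the unknown voice provider.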
Runtime Input Versus Saved Evidence
Do not confuse what starts a call with what proves it happened.

| Layer | Examples | Where reviewed |
|---|---|---|
| Input | Recipient number, metadata, dynamic variables, workflow lead variables, selected outbound number. | Batch setup, workflow lead, API request, or test call modal. |
| Agent version | Prompt, model, voice, transcriber, welcome mode, functions, post-call fields, security settings. | Agent builder and version history. |
| Call result | Status, duration, sentiment, end reason, transcript, summary, recording, extracted fields. | Call History and call detail page. |
What Can Change A Reply
The same caller sentence can lead to different behavior when these settings change.
Prompt and system prompt
Instruction wording decides goal, boundaries, escalation rules, and acceptable answers.
Dynamic variables
Caller-specific values can change greeting, eligibility, due date, location, or transfer destination.
Functions
Tools add actions the model can call during the conversation.
LLM and temperature
Model choice and temperature affect reasoning style and consistency.
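To make the dynamic-variable effect concrete, here is a small sketch of how caller-specific values could change the instructions the model receives. It uses Python's standard `string.Template`; the `$name` placeholder syntax, the variable names, and `render_prompt` are assumptions for illustration and may differ from how DialNexa substitutes variables.

```python
from string import Template

# Hypothetical prompt with three caller-specific placeholders.
PROMPT_TEMPLATE = Template(
    "Greet $first_name by name. Their payment is due on $due_date. "
    "If they ask for a person, transfer to $transfer_number."
)


def render_prompt(variables: dict[str, str]) -> str:
    """Fill the template; raises KeyError if a variable is missing."""
    return PROMPT_TEMPLATE.substitute(variables)


prompt_a = render_prompt({
    "first_name": "Asha",
    "due_date": "March 3",
    "transfer_number": "+15550100",
})
prompt_b = render_prompt({
    "first_name": "Ben",
    "due_date": "April 12",
    "transfer_number": "+15550101",
})
```

The two rendered prompts differ only in the substituted values, so the same caller sentence can produce a different greeting, due date, or transfer destination. Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a raw placeholder in the prompt.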
Debug By Layer
Audio sounds bad
Check recording quality, phone path, SIP trunk behavior, web microphone, and Denoising Mode.
Transcript is wrong
Check language, Deepgram or Soniox selection, background noise, and whether the caller spoke over the agent.
Transcript is right but answer is wrong
Check prompt, dynamic variables, functions, knowledge source, model family, and temperature.
Answer is right but late
Check Response Eagerness, Audio Cache, fallback LLM, function latency, and integration action placement.
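The four cases above follow the order of the runtime loop, so triage can stop at the first layer that fails. The sketch below encodes that order with the checklists from this section; the function name `triage` and the layer keys are hypothetical.

```python
# Checklists copied from the four cases above, in pipeline order.
CHECKS = {
    "audio": ["recording quality", "phone path", "SIP trunk behavior",
              "web microphone", "Denoising Mode"],
    "transcript": ["language", "Deepgram or Soniox selection",
                   "background noise", "caller spoke over the agent"],
    "answer": ["prompt", "dynamic variables", "functions",
               "knowledge source", "model family", "temperature"],
    "latency": ["Response Eagerness", "Audio Cache", "fallback LLM",
                "function latency", "integration action placement"],
}


def triage(audio_ok: bool, transcript_ok: bool,
           answer_ok: bool, on_time: bool):
    """Return (layer, checklist) for the earliest failing layer, or None."""
    observations = [
        ("audio", audio_ok),
        ("transcript", transcript_ok),
        ("answer", answer_ok),
        ("latency", on_time),
    ]
    for layer, ok in observations:
        if not ok:
            return layer, CHECKS[layer]
    return None


# Transcript is right but the answer is wrong.
layer, checklist = triage(audio_ok=True, transcript_ok=True,
                          answer_ok=False, on_time=True)
```

Checking the earliest layer first avoids tuning the prompt when the real fault is a bad transcript or a noisy audio path.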
Related Reading
Speech To Text
Understand Deepgram and Soniox behavior.
LLM Behavior
Tune reasoning and fallback behavior.
Provider Selection Guide
Choose the complete provider stack.
Text To Speech
Choose ElevenLabs or Cartesia.
Integrations
Understand Wati, Resend, and workflow actions.