Voice AI For Real Customer Calls
Start with the kind of call you want to run. Provider settings make sense only after the user outcome is clear.

| If your user needs to | The agent must be good at | Read next |
|---|---|---|
| Book appointments. | Hearing dates and times, calling the calendar or booking API, confirming the slot, and ending politely. | Functions, custom functions, and Google Calendar. |
| Qualify leads. | Asking short questions, scoring answers, writing structured fields, and pushing context to the sales team. | Post-call analysis, HubSpot, and Salesforce. |
| Send reminders. | Reading a fixed script, handling objections, retrying later, and sending written follow-up. | Batch calls, WhatsApp with Wati, and email with Resend. |
| Handle support intake. | Understanding messy caller speech, summarizing the issue, and routing the case to a team. | Transcripts and recordings, Zendesk, and Intercom. |
| Confirm orders or payments. | Speaking amounts clearly, verifying status, and sending the right confirmation. | Text to speech and voices, Shopify, and Stripe. |
The Call Loop You Debug In Production
Every live call moves through the same five layers. When a call fails, identify the layer before changing settings. Changing five things at once is how teams accidentally fix nothing and learn less.

The call enters through a route
The route can be a Plivo number, a SIP trunk linked number, a web call, a batch call, a workflow call node, or a one-off test call. The route decides how audio reaches DialNexa and which published agent version should answer.
The transcriber hears the caller
Deepgram or Soniox, the two transcribers in the current dashboard selector, turns caller audio into text. The selected language and transcriber affect turn boundaries, mixed-language recognition, and whether a caller pause is treated as the end of a turn.
The LLM chooses the next move
OpenAI, Google, or Groq receives the prompt, the conversation history so far, variables, functions, and knowledge context. It returns the next spoken response or an action.
The voice provider speaks
ElevenLabs, Cartesia, SmallestAI, or Sarvam AI converts the response into audio. Voice, voice model, speed, stability, volume, language fit, and Audio Cache decide how the agent sounds to the caller.
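
The advice to change one layer at a time can be checked mechanically before a retest. The sketch below is illustrative only: `Stack`, its field names, and the provider strings are hypothetical and are not DialNexa identifiers. It compares two test configurations and reports which layers differ, so a retest that changes more than one layer is caught before the call is placed.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class Stack:
    """Hypothetical snapshot of the layers a test call runs through."""
    route: str
    transcriber: str
    llm: str
    voice: str


def changed_layers(before: Stack, after: Stack) -> List[str]:
    """Return the names of the layers whose value differs between two stacks."""
    old, new = asdict(before), asdict(after)
    return [name for name in old if old[name] != new[name]]


baseline = Stack(route="plivo-number", transcriber="soniox", llm="openai", voice="elevenlabs")
retest = Stack(route="plivo-number", transcriber="deepgram", llm="openai", voice="elevenlabs")

diff = changed_layers(baseline, retest)
if len(diff) > 1:
    print("Too many changes to attribute the result:", diff)
else:
    print("Changed layer:", diff)
```

If the list has more than one entry, a better or worse call cannot be attributed to any single setting.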
Provider Choices Users Actually Make
| Layer | Dashboard choice | Best first question |
|---|---|---|
| Speech to text | Deepgram Flux (English only) and Soniox in the current dashboard selector. | Did the system hear the caller correctly? |
| LLM | OpenAI, Google, Groq, depending on workspace access. | Did the model receive correct text and still choose the wrong response? |
| Text to speech | ElevenLabs, Cartesia, SmallestAI, Sarvam AI. | Did the response content make sense but sound wrong, slow, too fast, or mispronounced? |
| Telephony | Plivo number, SIP trunk, web call. | Did the same agent behave differently across phone path, SIP path, or browser path? |
| Evidence | Call History, call detail tabs, exports, webhooks. | Which artifact proves what happened? |
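
A reviewer can turn the table above into a small triage helper. This is a sketch, not a DialNexa feature: the layer keys and the review order (telephony first, then the order in which audio is heard, answered, and spoken) are assumptions made for illustration. The questions are copied from the table.

```python
from typing import Dict, Optional

# Assumed review order: the path audio takes through a call.
REVIEW_ORDER = ("telephony", "speech_to_text", "llm", "text_to_speech")

# First question to ask per layer, taken from the table above.
FIRST_QUESTION = {
    "telephony": "Did the same agent behave differently across phone path, SIP path, or browser path?",
    "speech_to_text": "Did the system hear the caller correctly?",
    "llm": "Did the model receive correct text and still choose the wrong response?",
    "text_to_speech": "Did the response content make sense but sound wrong, slow, too fast, or mispronounced?",
}


def first_suspect_layer(healthy: Dict[str, bool]) -> Optional[str]:
    """Return the earliest layer not marked healthy, or None if all passed.

    Layers missing from `healthy` count as unreviewed and therefore suspect.
    """
    for layer in REVIEW_ORDER:
        if not healthy.get(layer, False):
            return layer
    return None


review = {"telephony": True, "speech_to_text": True, "llm": False}
layer = first_suspect_layer(review)
if layer is not None:
    print(layer, "->", FIRST_QUESTION[layer])
```

Treating unreviewed layers as suspect keeps the reviewer from skipping ahead to a later layer before the earlier ones have been checked against the recording and transcript.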
When Integrations Enter The Picture
Voice AI becomes useful when the call result moves somewhere your team already works. An integration should answer a simple question: what should happen after the caller gives useful information?

| Caller outcome | Good next action | Relevant integration docs |
|---|---|---|
| Caller asks for an appointment. | Create or update a calendar event, then send a confirmation. | Using integrations in agents, Google Calendar, Gmail. |
| Lead is qualified. | Add context to the CRM and alert the owner. | Integration functions, HubSpot, Salesforce, Slack. |
| Support case needs follow-up. | Create or enrich a ticket with transcript and summary context. | Zendesk, Intercom, Google Sheets. |
| Campaign needs written confirmation. | Send a WhatsApp or email message after the call branch completes. | WhatsApp with Wati, email with Resend, Resend. |
Do not connect an integration just because it exists. Connect it when the caller has given enough information for the action to be safe.
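
One way to express "enough information for the action to be safe" is a required-fields check that runs before any integration call. The sketch below assumes hypothetical action names and field names; the real fields depend on the integration and on what the agent collects during the call.

```python
from typing import Dict, Set

# Hypothetical actions and the caller-provided fields each one needs.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "book_appointment": {"caller_name", "date", "time"},
    "create_ticket": {"caller_name", "issue_summary"},
}


def missing_fields(action: str, collected: Dict[str, str]) -> Set[str]:
    """Return the required fields that are absent or blank for this action."""
    required = REQUIRED_FIELDS[action]
    present = {key for key, value in collected.items() if value and value.strip()}
    return required - present


collected = {"caller_name": "Asha", "date": "2025-01-10", "time": ""}
gaps = missing_fields("book_appointment", collected)
if gaps:
    print("Ask the caller for:", sorted(gaps))
else:
    print("Safe to call the booking integration")
```

An empty result means the action can run. Anything else is a prompt for the agent to ask a follow-up question instead of writing partial data to the calendar or CRM.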
What A Published Agent Carries
DialNexa does not publish only a prompt. A published agent version carries the full call configuration that controls what the caller hears and what your team reviews later.

| Configuration | Where users see it | Runtime effect |
|---|---|---|
| Language | Voice selector language choices or language selector, depending on dashboard layout. | Controls language fit, voice options, transcriber compatibility, and Hinglish Map visibility. |
| Transcriber | Agent builder transcriber selector. | Chooses Deepgram or Soniox and the model used for live speech recognition. |
| LLM model | Agent builder model selector and model settings popover. | Controls reasoning, function calling, fallback behavior, and per-minute model cost preview. |
| Voice and voice model | Voice selector and voice settings popover. | Controls speaker identity, output model, speed, stability, volume, and Audio Cache key behavior. |
| Call settings | Agent Settings. | Controls silence timeout, duration, voicemail, keypad detection, call ending, and transfer behavior. |
| Post-call fields | Agent Settings. | Defines the structured fields users expect after the call. |
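
When keeping your own notes about which version answered a call, it can help to model the published version as a single immutable record. The sketch below is an assumption about how to track this on your side, not DialNexa's data model; the class, field names, and values are hypothetical and cover only part of the table above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AgentVersion:
    """Hypothetical record of what one published version carried."""
    version: int
    language: str
    transcriber: str
    llm_model: str
    voice: str
    silence_timeout_s: int
    post_call_fields: Tuple[str, ...]


v1 = AgentVersion(
    version=1,
    language="en",
    transcriber="deepgram",
    llm_model="example-model",
    voice="example-voice",
    silence_timeout_s=10,
    post_call_fields=("outcome", "callback_time"),
)

# A later publish becomes a new record; the earlier one stays as it was.
v2 = replace(v1, version=2, voice="another-voice")
print(v1.voice, "->", v2.voice)
```

Because the record is frozen, a call answered by version 1 can always be compared against exactly the settings that were live at the time.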
Debug By Symptom, Not By Guesswork
- Caller was misheard
- Agent replied badly
- Agent sounded wrong
- Call felt slow
- Report looks wrong
Start with the recording and transcript. Check transcriber, language, phone path, noise, caller accent, and whether Flux was used only for English calls.
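
The Flux check in particular is easy to automate over exported call records. The sketch below assumes that the transcriber model name contains the word "flux" and that the call language is a code such as `en-US` or `hi-IN`. Both are assumptions about your export format rather than documented field values.

```python
from typing import Optional


def flux_language_warning(transcriber_model: str, call_language: str) -> Optional[str]:
    """Return a warning if an English-only Flux model handled a non-English call."""
    # Compare only the primary language subtag, so "en-US" and "en_IN" count as English.
    primary = call_language.replace("_", "-").split("-")[0].lower()
    if "flux" in transcriber_model.lower() and primary != "en":
        return (
            f"{transcriber_model} is English only, "
            f"but the call language was {call_language}"
        )
    return None


print(flux_language_warning("deepgram-flux", "hi-IN"))
print(flux_language_warning("deepgram-flux", "en-US"))
```

A warning here points at the transcriber choice before anyone spends time on prompt or voice changes.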
The First Pages To Read
Choose Voice AI Providers
Choose among Deepgram, Soniox, OpenAI, Google, Groq, ElevenLabs, and Cartesia based on caller language, latency, cost, and voice quality.
Speech To Text
Understand what Deepgram and Soniox change in live calls.
LLMs And Conversation Behavior
Tune model selection, temperature, fallback, and predictive preprocessing.
Text To Speech And Voices
Choose and tune ElevenLabs, Cartesia, SmallestAI, or Sarvam AI voices.
Speech Settings
Tune Response Eagerness, Audio Cache, Denoising Mode, and Hinglish Map.
Call Detail Page
Read the evidence after real calls.
Dashboard Integrations
Move call outcomes into WhatsApp, email, workflows, and external tools.
Integration Catalog
Browse business systems and provider pages that can support voice workflows.
A Practical First Test
Run one controlled test before trusting any provider choice.

Create one short test script
Include the welcome line, one caller interruption, one name, one city, one amount, and one final outcome.
Use one published draft stack
Pick language, transcriber, LLM, voice, and phone path. Publish the version you want to test.
Score the evidence
Mark transcript accuracy, first response delay, pronunciation, function calls, summary, and post-call fields.
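
Scoring is easier to repeat across test calls if the marks are recorded the same way each time. The sketch below uses the six criteria named above as pass or fail marks. The key names and the pass/fail simplification are assumptions; a team may prefer numeric scores or thresholds.

```python
from typing import Dict, List, Tuple

CRITERIA = (
    "transcript_accuracy",
    "first_response_delay",
    "pronunciation",
    "function_calls",
    "summary",
    "post_call_fields",
)


def score_call(marks: Dict[str, bool]) -> Tuple[int, List[str]]:
    """Return (number of criteria passed, names of failed criteria)."""
    unmarked = [name for name in CRITERIA if name not in marks]
    if unmarked:
        raise ValueError(f"Unmarked criteria: {unmarked}")
    failed = [name for name in CRITERIA if not marks[name]]
    return len(CRITERIA) - len(failed), failed


marks = {
    "transcript_accuracy": True,
    "first_response_delay": False,
    "pronunciation": True,
    "function_calls": True,
    "summary": True,
    "post_call_fields": False,
}
passed, failed = score_call(marks)
print(f"{passed}/{len(CRITERIA)} passed; review: {failed}")
```

Refusing to score a call with unmarked criteria keeps a partial review from looking like a clean result.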