Vicidial Multi-Server Migration for a 2,000-Agent Outbound BPO
The old single-server Vicidial deployment maxed out at 350 concurrent agents. This case study covers that limit and what it took to grow the platform to 2,000.
The Vicidial Multi-Server Migration Challenge
The client was running everything on a single 16-core, 128 GB VM — Vicidial GUI, Asterisk, MySQL, recordings storage, and the dialer cron processes all on one box. It had been “good enough” for two years. Three things compounded:
- MySQL contention. vicidial_log had crossed 180 million rows; reporting queries from supervisors were locking the dialer cycle for 4–8 seconds at a time.
- Asterisk channel exhaustion. At ~350 simultaneous calls the box hit 90% CPU; the dialer started missing its 6-second target cycle.
- Single point of failure. Any maintenance meant business downtime — the client had been deferring kernel patches for nine months.
Our Vicidial Multi-Server Migration Approach
We could not take production offline for a clean rebuild — the client was running 18-hour campaigns across two shifts. The migration had to be incremental: stand up the new cluster alongside, drain campaigns one at a time, decommission the old box. We budgeted 6 weeks; we used 8.
The architecture we settled on followed the pattern we documented in our Vicidial multi-server architecture guide: separate web, dialer, MySQL, and recordings tiers, with an OpenSIPS SBC pair handling SIP perimeter.
What We Built in This Vicidial Multi-Server Migration
- 1 web/admin server — LiteSpeed + the Vicidial GUI, no Asterisk.
- 3 dialer servers — Asterisk + the Perl dialer cron, sized for ~700 concurrent calls each.
- 1 MySQL primary + 1 read replica — primary for the dialer, replica dedicated to reporting and the supervisor dashboards.
- 1 recordings server — NFS, with hourly sync to S3-compatible cold storage.
- 2 OpenSIPS SBCs: active-active, with anti-toll-fraud rules and TLS termination for the agent softphones. This tier draws on our OpenSIPS 3.6 scaling work.
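The dialer sizing above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it takes peak concurrent calls as its input (the relationship between agents and calls depends on the dial ratio, which is not stated here), uses the ~700 calls per node figure from the list, and treats `headroom` as a hypothetical safety margin.

```python
import math


def dialer_nodes_needed(peak_calls: int,
                        per_node_calls: int = 700,
                        headroom: float = 0.0) -> int:
    """Return how many dialer nodes are needed to carry peak_calls.

    per_node_calls defaults to the ~700 concurrent calls per node quoted
    in the list above. headroom is a hypothetical safety margin: 0.2
    means "plan for 20% more than the expected peak".
    """
    if peak_calls < 0 or per_node_calls <= 0 or headroom < 0:
        raise ValueError("peak_calls and headroom must be >= 0, per_node_calls > 0")
    # Round up: a fractional node still means one more physical server.
    return math.ceil(peak_calls * (1 + headroom) / per_node_calls)


if __name__ == "__main__":
    print(dialer_nodes_needed(2000))                # no margin
    print(dialer_nodes_needed(2000, headroom=0.2))  # 20% margin
```

With 2,000 peak calls and no margin this gives three nodes, which matches the cluster described; with a 20% margin it gives four.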
The cutover for each campaign was straightforward in retrospect: move the campaign to the new dialer servers in vicidial_servers, watch one full day of metrics, then drain the next. Five campaigns, four cutover windows.
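The "watch one full day of metrics, then drain the next" step can be expressed as a simple gate. This is a minimal sketch, not the tooling used on the project: the `DayMetrics` fields and the 3% drop-rate ceiling are hypothetical, and only the 6-second cycle target comes from the case study.

```python
from dataclasses import dataclass


@dataclass
class DayMetrics:
    """One full day of observations for a freshly moved campaign (hypothetical shape)."""
    drop_rate_pct: float      # dropped calls as a percentage
    max_cycle_seconds: float  # slowest dialer cycle observed during the day
    unplanned_outages: int    # count of unplanned interruptions


def ready_for_next_cutover(m: DayMetrics,
                           max_drop_rate_pct: float = 3.0,    # assumed ceiling
                           cycle_target_seconds: float = 6.0  # target from the text
                           ) -> bool:
    """Return True only if the moved campaign looked healthy for the whole day."""
    return (
        m.drop_rate_pct <= max_drop_rate_pct
        and m.max_cycle_seconds < cycle_target_seconds
        and m.unplanned_outages == 0
    )


if __name__ == "__main__":
    healthy = DayMetrics(drop_rate_pct=2.1, max_cycle_seconds=5.2, unplanned_outages=0)
    slow = DayMetrics(drop_rate_pct=2.1, max_cycle_seconds=7.5, unplanned_outages=0)
    print(ready_for_next_cutover(healthy), ready_for_next_cutover(slow))
```

Writing the gate down before the first cutover makes the go/no-go decision for each later campaign a check against agreed numbers.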
Vicidial Multi-Server Migration Outcomes
- Concurrent agents: grew from a cap of 350 to 2,000 within 90 days post-migration.
- Drop rate: 11% → 2.1% on the largest campaign.
- Uptime: 99.7% rolling over the trailing 8 months. The 0.3% was two scheduled OS upgrades, no unplanned outages.
- Hopper performance: dialer cycle returned to consistent <6 s once MySQL was on its own box.
- MTTR for capacity issues: dropped because we can now add a dialer node in ~90 minutes from spare hardware.
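To put the uptime figure in concrete terms, the sketch below converts a percentage over a period into hours of downtime. It assumes an average month length of 365.25 / 12 days and takes the rounded 99.7% at face value, so the result is an order-of-magnitude figure, not a measurement.

```python
HOURS_PER_DAY = 24
AVG_DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month


def downtime_hours(uptime_pct: float, months: float) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    if not 0.0 <= uptime_pct <= 100.0 or months < 0:
        raise ValueError("uptime_pct must be within 0..100 and months >= 0")
    total_hours = months * AVG_DAYS_PER_MONTH * HOURS_PER_DAY
    return total_hours * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    print(round(downtime_hours(99.7, 8), 1))
```

Under those assumptions, 99.7% over eight months works out to about 17.5 hours in total, spread across the two scheduled upgrade windows.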
What We'd Do Differently
Two things, honestly:
- SIP trunk re-registration took longer than scoped. Their two upstream carriers needed manual whitelisting for the new SBC IPs and one of them had a 5-business-day SLA. We should have started the carrier paperwork in week 1, not week 4. That cost us ~10 days of buffer.
- We migrated MySQL data at the wrong layer initially. Our first attempt was a mysqldump + restore, which would have meant a 4-hour write freeze. We redid the cutover with Percona XtraBackup + binlog replay; that worked, but it added a week of testing.
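One way to decide when a backup-plus-binlog cutover is safe is to check that the new server has fully caught up before writes move. The sketch below assumes the binlog replay was done by running the new primary as a replica of the old server, which the write-up does not spell out. It inspects a row from `SHOW REPLICA STATUS` (MySQL 8.0.22 or later field names) passed in as a plain dict; fetching that row is left to whichever client library is in use.

```python
from typing import Any, Mapping


def replica_ready_for_cutover(status: Mapping[str, Any],
                              max_lag_seconds: int = 0) -> bool:
    """Return True if a SHOW REPLICA STATUS row shows a healthy, caught-up replica.

    status is one result row as a mapping of column name to value.
    max_lag_seconds is a hypothetical tolerance; 0 means fully caught up.
    """
    io_running = status.get("Replica_IO_Running") == "Yes"
    sql_running = status.get("Replica_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Source")
    # MySQL reports NULL (None here) when lag cannot be determined,
    # for example when a replication thread is stopped.
    if lag is None:
        return False
    return io_running and sql_running and int(lag) <= max_lag_seconds


if __name__ == "__main__":
    caught_up = {"Replica_IO_Running": "Yes",
                 "Replica_SQL_Running": "Yes",
                 "Seconds_Behind_Source": 0}
    lagging = dict(caught_up, Seconds_Behind_Source=42)
    print(replica_ready_for_cutover(caught_up), replica_ready_for_cutover(lagging))
```

A check like this keeps the write freeze to the time needed to drain the last few transactions, instead of the hours a dump and restore would need.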
Stack & Tools
- VICIbox 11 (Asterisk 18 LTS)
- MySQL 8.0 with GTID-based async replication
- OpenSIPS 3.6 (HA pair, custom Lua scripts for fraud filtering)
- HAProxy fronting the agent GUI
- Prometheus + Grafana for metrics; Asterisk channel exporter
- Percona XtraBackup for MySQL cutover
FAQ
How long was the actual cutover window for each campaign?
Roughly 30–45 minutes per campaign, scheduled at the natural end of the calling window for that campaign. No agent was logged in during the cutover; we re-launched the campaign on the new dialer once metrics stabilised.
Did you keep the old single server as a fallback?
Yes — for 30 days after the last campaign moved. We wrote a tested rollback runbook for each campaign, just in case. We never used it, but the option mattered to the operations team.
What broke that you did not expect?
Two things. First: a custom shell script the client had to push call lists into vicidial_list assumed a specific MySQL host. We caught it in dry-run; easy fix. Second: agent recording playback in the supervisor dashboard hit the dialer node directly via an old Apache config — needed an X-Forwarded-For aware reverse proxy rule on the new web server.
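The hardcoded-host problem is the kind of thing a small configuration helper prevents. The client's script was shell; the sketch below shows the same idea in Python, and every name in it (`VICIDIAL_DB_HOST`, `VICIDIAL_DB_PORT`, `VICIDIAL_DB_NAME`, and the `asterisk` default schema) is an assumption for illustration, not the client's actual configuration.

```python
import os
from typing import Mapping, Optional


def db_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve database connection settings from the environment.

    Fails loudly when the host is missing, so a script never falls back
    silently to a server that has been decommissioned.
    """
    env = os.environ if env is None else env
    host = env.get("VICIDIAL_DB_HOST")  # hypothetical variable name
    if not host:
        raise RuntimeError("VICIDIAL_DB_HOST is not set; refusing to guess a host")
    return {
        "host": host,
        "port": int(env.get("VICIDIAL_DB_PORT", "3306")),
        "database": env.get("VICIDIAL_DB_NAME", "asterisk"),  # assumed default
    }


if __name__ == "__main__":
    print(db_settings({"VICIDIAL_DB_HOST": "db-primary.internal"}))
```

Passing the environment in as a parameter keeps the helper easy to exercise in a dry run, which is where the original issue was caught.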
Is this architecture overkill for a smaller operation?
For under 500 simultaneous agents, a 2-node setup (web+dialer combined, MySQL separate) is usually enough. The five-tier design we deployed (web, dialer, MySQL, recordings, SBC) makes sense once you cross 800–1,000 agents or need true HA on the SIP perimeter.
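The rule of thumb in this answer can be written as a small decision function. This is a rough sketch: the 500 and 800 thresholds come from the answer above, while the "borderline" band in between is an assumption, since no guidance is given for that range.

```python
def recommend_topology(agents: int, needs_sip_ha: bool = False) -> str:
    """Map agent count and HA needs to a coarse topology suggestion."""
    if agents < 0:
        raise ValueError("agents must be >= 0")
    # True HA on the SIP perimeter justifies the larger design at any size.
    if needs_sip_ha or agents >= 800:
        return "multi-tier"
    if agents < 500:
        return "two-node"
    # 500-799 agents: not covered by the rule of thumb; decide case by case.
    return "borderline"


if __name__ == "__main__":
    for n in (300, 650, 2000):
        print(n, recommend_topology(n))
```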
Why a Vicidial Multi-Server Migration Compounds Over Time
A Vicidial multi-server migration done well pays back operationally before it pays back financially. For this 2,000-agent BPO it unlocked a regular patch cadence, predictable hopper performance, and a disaster-recovery posture, none of which existed on the single-server setup. It is not just a scaling exercise; it is a baseline for the next decade of dialer evolution. The clients we migrated in 2024 are still running the same architecture in 2026, with capacity added by node, not by emergency.
Further reading: VICIdial.org open-source project.
Need help on something like this? VITI Security works with operators, BPOs, and SMBs across India and abroad.