Active-active. No master, no failover.

PMG, Hornetsecurity, Mimecast, Proofpoint — they all have either master/replica architectures with failover drama or central cloud back-ends. NetCell MailGuard is the only SMTP gateway that genuinely clusters active-active without a master.

How does it work?

Every node in the cluster is an equal peer. Configuration changes and detection state are synchronised between all nodes in the background over an encrypted channel: the operator makes a change in the web UI, and every node picks it up automatically. Inbound mail is distributed across the nodes through several MX records of equal priority in DNS; if one node fails, sending mail servers deliver to the others without any intervention.
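The synchronisation protocol itself is not described here. As an illustration only, below is a minimal Python sketch of one well-known way to replicate configuration among equal peers without a master: a last-writer-wins map in which every write carries a `(logical clock, node id)` version, and merging keeps the higher version per key. The class and method names are hypothetical, encryption and transport are left out, and MailGuard may work quite differently.

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not MailGuard's actual protocol: a last-writer-wins
# (LWW) map that any peer may write to. Each entry stores a version
# (logical_clock, node_id); node_id breaks ties between concurrent writes.


@dataclass
class PeerConfig:
    node_id: str
    clock: int = 0
    entries: dict = field(default_factory=dict)  # key -> (version, value)

    def put(self, key, value):
        """Record a local change, e.g. one made in this node's web UI."""
        self.clock += 1
        self.entries[key] = ((self.clock, self.node_id), value)

    def merge(self, other: "PeerConfig"):
        """Apply state received from another peer; the higher version wins."""
        for key, (version, value) in other.entries.items():
            current = self.entries.get(key)
            if current is None or version > current[0]:
                self.entries[key] = (version, value)
        # Move the local clock forward so later local writes are ordered
        # after everything this node has already seen.
        self.clock = max(self.clock, other.clock)

    def view(self):
        """Current configuration without version metadata."""
        return {key: value for key, (_, value) in self.entries.items()}


# Two peers change the same setting concurrently, then all peers exchange state.
n1, n2, n3 = PeerConfig("node1"), PeerConfig("node2"), PeerConfig("node3")
n1.put("spam_threshold", 5.0)
n2.put("spam_threshold", 6.0)
peers = (n1, n2, n3)
for a in peers:
    for b in peers:
        if a is not b:
            a.merge(b)

# A node joining later bootstraps by merging the state of any existing peer.
n4 = PeerConfig("node4")
n4.merge(n1)
print(n4.view())  # {'spam_threshold': 6.0}
```

Because a merge only ever keeps the highest version per key, the order in which peers exchange state does not matter and every peer converges on the same view; under this model, a fresh node needs nothing more than one merge from any existing peer.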

Why not master/replica?

Classical master/replica architectures have three structural problems:

  • Promotion complexity. When the master fails, a replica has to be explicitly promoted to master, and every system that points at the old master has to follow. That doesn't always go smoothly.
  • Split-brain risk. Under a network partition, the old master and a newly promoted replica can both believe they are the only active master. Both accept mail, and reconciling the two afterwards is manual work.
  • Quorum overhead. Avoiding split-brain typically requires an odd number of nodes — the extra node costs money without adding throughput.
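To make the quorum point concrete, here is a short Python sketch of standard majority-quorum arithmetic, a common way for master/replica systems to avoid split-brain. It is the generic rule only, not a description of how any of the products named above implement failover; the function names are illustrative.

```python
# Generic majority-quorum arithmetic (not specific to any product): a cluster
# keeps acting as master only while a strict majority of nodes can reach
# each other, so two partitions can never both hold a majority.


def majority_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while a majority is still reachable."""
    return nodes - majority_quorum(nodes)


for n in range(2, 7):
    print(f"{n} nodes: quorum {majority_quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Under a majority rule a two-node pair survives no failure at all, so a third node (or a witness) is needed; a fourth node raises the quorum without raising the number of tolerated failures, which is why odd cluster sizes are preferred.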

MailGuard solves this through radical symmetry: every node is an equal peer, so there is no single "correct" node you could lose.

What does this mean operationally?

  • Hardware failure → no action needed. The remaining nodes keep serving, and sending mail servers automatically skip the failed one.
  • Add a node → one command. The new node pulls the configuration from the existing cluster on first start and is productive within a minute.
  • Maintenance reboot → no failover plan. Drain a node from routing, reboot, put it back in. The others process mail in the meantime.
  • Cluster-wide quarantine view. The web UI shows all quarantined mail in a single view, regardless of which node a message physically resides on.

            Internet (sending mail servers)
                           │
                 MX 10 mx1.example.com
                 MX 10 mx2.example.com
                 MX 10 mx3.example.com
                           │  equal-priority MX
         ┌─────────────────┼─────────────────┐
         ▼                 ▼                 ▼
   ┌────────────┐    ┌────────────┐    ┌────────────┐
   │  Node 1    │◀──▶│  Node 2    │◀──▶│  Node 3    │
   │  Detection │    │  Detection │    │  ...       │
   │  stack     │    │  stack     │    │            │
   │  + sandbox │    │  + sandbox │    │            │
   └─────┬──────┘    └─────┬──────┘    └─────┬──────┘
         │                 │                 │
         └─────────────────┼─────────────────┘
                           ▼
                 Existing mail server

   ◀──▶ encrypted sync of configuration and detection state between all nodes
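The failover shown above relies on standard SMTP behaviour on the sending side: RFC 5321 (section 5.1) has a sending server try MX hosts in order of preference and randomise hosts that share a preference, so a host that cannot be reached is simply passed over for the next one. Below is a minimal Python sketch of that selection logic; the function names are hypothetical, and `is_up` stands in for a real SMTP connection attempt.

```python
import random


def delivery_order(mx_records, rng=random):
    """Order MX hosts lowest preference value first; hosts that share a
    preference are shuffled so load spreads across them."""
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        order.extend(hosts)
    return order


def deliver(mx_records, is_up, rng=random):
    """Return the first host that accepts the connection, or None.
    `is_up` is a hypothetical stand-in for an SMTP connection attempt."""
    for host in delivery_order(mx_records, rng):
        if is_up(host):
            return host
    return None  # a real MTA would queue the message and retry later


mx = [(10, "mx1.example.com"), (10, "mx2.example.com"), (10, "mx3.example.com")]
down = {"mx2.example.com"}  # simulate a failed node
chosen = deliver(mx, lambda host: host not in down, random.Random(42))
print(chosen)  # one of mx1.example.com / mx3.example.com
```

Real mail servers also queue the message and retry later if every host is unreachable, which this sketch leaves out.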

Limit?

No hard limit. In practice, customers run clusters of 2 to 10 nodes. Replication load grows linearly with the node count, and at a typical configuration-change frequency of a handful of updates per hour it is negligible. Very large setups are segmented geographically; for deployments of that size, please contact us about an enterprise setup.
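A back-of-the-envelope Python sketch of the linear-scaling claim, under an assumed cost model that is not spelled out here: every change travels once from the node where it was made to each of the other `n - 1` nodes. The function name and the rate of 5 updates per hour are illustrative choices, and detection-state sync is not modelled.

```python
def sync_messages_per_hour(nodes: int, updates_per_hour: int) -> int:
    """Hypothetical cost model: each configuration change is pushed once
    from the node where it was made to every other node."""
    if nodes < 1 or updates_per_hour < 0:
        raise ValueError("need at least one node and a non-negative rate")
    return updates_per_hour * (nodes - 1)


# Example rate of 5 changes per hour, standing in for "a handful".
for n in (2, 5, 10):
    print(f"{n:>2} nodes: {sync_messages_per_hour(n, 5)} sync messages/hour")
```

Under these assumptions, even a 10-node cluster exchanges only 45 small sync messages per hour, which is consistent with the load being negligible.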

Want to try a cluster?

Two VMs, two one-liners, cluster running. No quorum, no promote script, no magic.
