Answers>Learn about reliability & uptime>What is failover?
What is failover?
// Tags
failoverautomatic failover
TL;DR: Failover is the automatic process of switching from a failed component to a healthy backup without disrupting service. In blockchain infrastructure, failover ensures that when a node, server, or data center goes down, your application's RPC requests seamlessly reroute to another working endpoint.
The Basic Concept
Imagine you're driving on a highway and the main lane is suddenly blocked. If there's no alternate route, you're stuck. But if the highway system automatically redirects you to a parallel lane without you even noticing, that's failover in action.
In infrastructure terms, failover is the mechanism that detects when something breaks and immediately shifts traffic to a backup. The goal is zero (or near zero) interruption. Your application keeps making requests, receiving responses, and functioning normally, even though the component serving those requests just changed behind the scenes.
How Failover Works in Blockchain Infrastructure
Blockchain infrastructure has a specific set of components that can fail. The most common are individual nodes, entire regions or data centers, and specific blockchain client implementations. A good failover system handles all three.
At the node level, failover is relatively straightforward. A load balancer continuously monitors the health of every node in the pool, checking response times, error rates, and block height. If a node starts returning errors or falls behind on sync, the load balancer stops sending it traffic and distributes requests across the remaining healthy nodes.
At the regional level, things get more interesting. If an entire data center or cloud availability zone experiences an outage, DNS based failover kicks in. Requests that would normally route to that region are automatically redirected to the next nearest healthy region. This is why geographic distribution matters. A provider running nodes in only one region has no fallback when that region goes down.
At the client level, some providers run multiple blockchain client implementations (for example, both Geth and Erigon for Ethereum). If one client implementation has a bug or performance issue, traffic can shift to nodes running the alternative client. This layer of redundancy is less common but increasingly important.
Active vs Passive Failover
There are two broad approaches to failover, and the distinction matters for performance.
In active active configurations, all backup components are already running and handling traffic. When one fails, the others simply absorb its share of the load. There's no "cold start" delay because everything is already warm. This is the preferred model for blockchain infrastructure because nodes need to stay synced with the chain. A node that isn't actively processing blocks will be behind on block height and useless until it catches up.
In active passive configurations, backup components sit idle until needed. When the primary fails, the passive backup activates and takes over. This is cheaper to operate but introduces a delay. For blockchain nodes, this delay can be significant because the passive node may need to sync blocks before it can serve accurate data.
Most production blockchain infrastructure uses active active failover for exactly this reason. You can't afford to wait for a cold node to catch up when your DeFi protocol needs sub second data accuracy.
The table below summarizes how the two failover models compare for blockchain workloads:
Aspect
Active-Active
Active-Passive
Backup state
All nodes running and serving traffic
Standby nodes idle until needed
Failover delay
Near zero, no cold start
Noticeable, backup must warm up and sync
Operating cost
Higher, you pay for full capacity
Lower, standby uses fewer resources
Block height sync
Always current
May lag and need to catch up
Best for
Production RPC and real time data
Cost sensitive or non critical workloads
Why Failover Is Non Negotiable for Blockchain Apps
Traditional web applications can often tolerate brief interruptions. A user refreshes the page and everything works again. Blockchain applications don't have that luxury for several reasons.
Transactions are time sensitive. If your application submits a transaction and the RPC endpoint fails mid request, that transaction may be lost or submitted twice. Stale data is dangerous. A node that's behind on block height returns outdated information. If your application reads a stale balance and acts on it, the consequences can be financial. Events are irreversible. Unlike traditional systems where you can retry or roll back, on chain actions are permanent. Sending funds based on bad data from a failed node isn't something you can undo.
This is why failover isn't a nice to have feature. It's a fundamental requirement for any application that interacts with a blockchain in production.
How Quicknode Handles Failover
Quicknode's infrastructure uses active active failover across 14+ regions and multiple cloud providers. Every request is routed through a global load balancing layer that continuously evaluates node health, including block height sync status, response latency, and error rates. If any node or region degrades, traffic automatically shifts to the next best option without any action required from the developer.
For teams that need the strongest guarantees, dedicated clusters provide isolated infrastructure with custom failover configurations and guaranteed uptime SLAs.
What is the difference between failover and load balancing?
Failover and load balancing are closely related but solve different problems. Load balancing spreads incoming requests across a pool of healthy nodes so no single node is overwhelmed. Failover is what happens when one of those nodes stops being healthy: traffic is pulled from the failed node and redistributed to the rest. In practice the two work together, and a strong high availability setup needs both running side by side.
Capability
Load Balancing
Failover
Primary job
Distribute traffic evenly
Reroute traffic away from failures
When it acts
Continuously, on every request
Only when a component degrades or fails
Main goal
Performance and even utilization
Continuity and uptime
Trigger
Request volume
Health check failure
What counts as a single point of failure?
A single point of failure is any component that takes the whole system down when it breaks. A lone node, a single cloud region, or one client implementation can each become a single point of failure. Failover removes these weak links by always keeping a healthy alternative ready to take over. Reviewing the common blockchain failure modes is the fastest way to find the single points of failure hiding in your stack.
How do you measure whether failover is working?
You measure failover with a few concrete signals. Uptime shows how often the service was reachable, time to recover shows how long traffic took to shift after a failure, and error rate during an incident shows how many requests were affected before the switch completed. Continuous monitoring of your blockchain infrastructure is what surfaces these numbers so you can confirm failover is doing its job.
How can you test failover before a real outage?
The only way to trust failover is to rehearse it. Teams deliberately take a node or region offline in a controlled window and confirm that traffic reroutes without errors, an approach often called chaos testing. If you run your own nodes, you can stop a process and watch where requests go. If you rely on a managed provider such as the Quicknode Core API, failover is handled for you, though you can still validate it with cross region load tests.
Frequently Asked Questions
What is failover in simple terms?
Failover is the automatic switch from a broken component to a working backup so a service keeps running. Most of the time the user never notices the switch happened.
Is failover the same as redundancy?
No. Redundancy means having spare components ready, while failover is the process that moves traffic to those spares when something fails. You need infrastructure redundancy in place for failover to work.
How long does failover take?
With active-active systems, failover is often near instant because the backups are already serving traffic. With active-passive systems it can take from a few seconds to several minutes while the standby warms up and syncs.
Does failover prevent all downtime?
Failover dramatically reduces downtime but cannot guarantee zero interruption in every situation. Well designed active-active infrastructure spread across many regions gets very close, which is why it is the standard for production blockchain apps.
Why does failover matter so much for blockchain apps?
Blockchain actions are time sensitive and often irreversible, so a failed endpoint can drop a transaction or return stale data. Automatic failover protects both reliability and user funds, which is critical for services built on real time data streams.