Benefits of replication
- Increase availability – if one database goes down, it doesn’t bring down the entire system. Reads can start being sent to the replica.
- Decrease latency – having a replica database geographically closer to a user provides a faster response (e.g CDNs)
- Increase throughput – Having more databases to read from decreases the load on an individual database, thereby making reads faster and increasing overall throughput
How do all the databases store the same data?
Leader-based replication – Writes get sent to the leader (“primary”). After writing to local storage, the leader sends the transactions to the followers (“replicas”) via a replication log. From the client’s perspective, the replicas are read-only.
How long does replication take?
Usually under 1 second, but there are no guarantees. Synchronous replication is when the primary waits for the replicas to confirm that it wrote the transaction because responding back to the client. Asynchronous replication is when the primary doesn’t wait for the replicas to respond.
In practice, having all replicas be synchronous introduces a single-point-of failure. If that replica crashes then the whole system comes to a grinding halt. Usually there is a mix of synchronous and asynchronous replicas. Having one synchronous replicas provides two “sources of truth”. Asynchronous replicas are not sources of truth because of delays during the replication process.
How to add a new replica
- Take snapshot of primary
- Load snapshot onto new replica
- The replica connects to the leader and requests all changes since the snapshot
- When the replica has processed the changes, it is “caught up” and ready to serve
What happens when the leader crashes?
One of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader. This is called failover.
The leader (or any node) is determined to have failed if it doesn’t respond to a heartbeat / health check within– say, 30 seconds.
When the ex-leader comes back up, it needs to be reconfigured so it knows its no longer the leader, and which node is the new leader.
Eventual consistency is an effect of a system that has numerous “read-only” replicas, and a single (or much smaller amount of) primary database that handles the writes. Clients may get different results when they query the replicas because they may not have caught up yet (replication lag). But given some time, they should eventually catch up.
Alternatives to single-leader replication
A single-leader system is a single point of failure. If the leader goes down, you can’t write to the database. A multi-leader system might be good to use if your application is spread across multiple datacenters. It works in the same way a single-leader setup does– Writes go to the leader and get propagated to the followers (replicas). If writes and reads are all requested in the same datacenter, it usually has better performance because those transactions don’t have to go out to the public Internet.
In a leaderless replication system, like DynamoDB, client read and write requests are sent to multiple databases in parallel. To avoid getting a stale response on read requests, a version number is attached to the response. The newest version is returned.