Clusters can now be configured to automatically enable streaming
replication from a remote primary.

- The `spec.standby` section of the postgrescluster spec allows users to
  define a `host` and `port` that point to a remote primary
- The `repoName` field is now optional
- Certificate authentication is required when connecting to the primary. Users
  must configure custom TLS certificates on the standby that allow this
  authentication method
- The replication user will be the default `_crunchyrepl` user
- A cluster will not be created if the standby spec is invalid
- kuttl: deploy two clusters, a primary and a standby, in a single
  namespace. Ensure that the standby cluster has replicated the primary's
  data and the walreceiver process is running
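Put together, a minimal streaming-replication standby spec might look like the following sketch. The cluster name and remote address are placeholders, and unrelated fields (images, storage, backups) are omitted:

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-standby          # placeholder name
spec:
  standby:
    enabled: true
    host: primary.example.com  # placeholder: remote primary to stream from
    port: 5432
```

Because `repoName` is now optional, it can be omitted when streaming directly; the standby authenticates as the default `_crunchyrepl` user using certificate-based TLS.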
Advanced high-availability and disaster recovery strategies involve spreading
your database clusters across multiple data centers to help maximize uptime.
In Kubernetes, this technique is known as "[federation](https://en.wikipedia.org/wiki/Federation_(information_technology))".
Federated Kubernetes clusters can communicate with each other,
coordinate changes, and provide resiliency for applications that have high
uptime requirements.

As of this writing, federation in Kubernetes is still under active development
and is something we monitor with intense interest. As Kubernetes federation
continues to mature, we wanted to provide a way to deploy PostgreSQL clusters
managed by the [PostgreSQL Operator](https://www.crunchydata.com/developers/download-postgres/containers/postgres-operator)
that can span multiple Kubernetes clusters.
At a high-level, the PostgreSQL Operator follows the "active-standby" data
center deployment model for managing the PostgreSQL clusters across Kubernetes
clusters. In one Kubernetes cluster, the PostgreSQL Operator deploys PostgreSQL as an
"active" PostgreSQL cluster, which means it has one primary and one-or-more
replicas. In another Kubernetes cluster, the PostgreSQL cluster is deployed as
a "standby" cluster: every PostgreSQL instance is a replica.

A side-effect of this is that in each of the Kubernetes clusters, the PostgreSQL
Operator can be used to deploy both active and standby PostgreSQL clusters,
allowing you to mix and match! While the mixing and matching may not be ideal for
how you deploy your PostgreSQL clusters, it does allow you to perform online
moves of your PostgreSQL data to different Kubernetes clusters as well as manual
online upgrades.

Lastly, while this feature does extend high-availability, promoting a standby
cluster to an active cluster is **not** automatic. While the PostgreSQL clusters
within a Kubernetes cluster support self-managed high-availability, a
cross-cluster deployment requires someone to promote the cluster
from standby to active.
## Standby Cluster Overview
Standby PostgreSQL clusters are managed like any other PostgreSQL cluster that the PostgreSQL
Operator manages. For example, adding replicas to a standby cluster is identical to adding them to a
primary cluster.

The main difference between a primary and standby cluster is that there is no primary instance on
the standby: one PostgreSQL instance is reading in the database changes from either the backup
repository or via streaming replication, while other instances are replicas of it.

Any replicas created in the standby cluster are known as cascading replicas, i.e., replicas
replicating from a database server that itself is replicating from another database server. More
information about [cascading replication](https://www.postgresql.org/docs/current/warm-standby.html#CASCADING-REPLICATION)
can be found in the PostgreSQL documentation.
Because standby clusters are effectively read-only, certain functionality
that involves making changes to a database, e.g., PostgreSQL user changes, is
blocked while a cluster is in standby mode. Additionally, backups and restores
are blocked as well. While [pgBackRest](https://pgbackrest.org/) supports
backups from standbys, this requires direct access to the primary database,
which cannot be done until the PostgreSQL Operator supports Kubernetes
federation.
### Types of Standby Clusters

There are three ways to deploy a standby cluster with the Postgres Operator.

#### Repo-based Standby

A repo-based standby will connect to a pgBackRest repo stored in an external storage system
(S3, GCS, Azure Blob Storage, or any other Kubernetes storage system that can span multiple
clusters). The standby cluster will receive WAL files from the repo and will apply those to the
database.
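As a sketch, a repo-based standby only needs `spec.standby` to point at a pgBackRest repo that is shared with the active cluster; the bucket details below are placeholders:

```yaml
spec:
  backups:
    pgbackrest:
      repos:
      - name: repo1
        s3:
          bucket: my-bucket          # placeholder: shared with the active cluster
          endpoint: s3.amazonaws.com # placeholder endpoint
          region: us-east-1          # placeholder region
  standby:
    enabled: true
    repoName: repo1                  # read WAL from this repo
```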
For creating a standby Postgres cluster with PGO, please see the [disaster recovery tutorial]({{< relref "tutorial/disaster-recovery.md" >}}#standby-cluster).
### Promoting a Standby Cluster
There comes a time when a standby cluster needs to be promoted to an active cluster. Promoting a
standby cluster means that the standby leader PostgreSQL instance will become a primary and start
accepting both reads and writes. This has the net effect of pushing WAL (transaction archives) to
the pgBackRest repository. Before doing this, we need to ensure we don't accidentally create a
split-brain scenario.
If you are promoting the standby while the primary is still running, i.e., if this is not a disaster
scenario, you will want to [shut down the active PostgreSQL cluster]({{< relref "tutorial/administrative-tasks.md" >}}#shutdown).
The standby can be promoted once the primary is inactive, e.g., is either `shutdown` or failing.
This process essentially removes the standby configuration from the Kubernetes cluster's DCS, which
triggers the promotion of the current standby leader to a primary PostgreSQL instance. You can view
this promotion in the PostgreSQL standby leader's (soon to be active leader's) logs.
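In practice, promotion is triggered by editing the postgrescluster spec to disable standby mode:

```yaml
spec:
  standby:
    enabled: false
```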
Once the standby cluster is promoted, the cluster with the original active
PostgreSQL cluster can now be turned into a standby PostgreSQL cluster. This is
done by deleting and recreating all PVCs for the cluster and reinitializing it
as a standby using the backup repository. Because this is a destructive action
(i.e., data will only be retained if any Storage Classes and/or Persistent
Volumes have the appropriate reclaim policy configured), a warning is shown
when attempting to enable standby.
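Turning the old active cluster into a standby is the reverse edit. This is a sketch; it assumes a repo-based standby with a shared repo named `repo1`:

```yaml
spec:
  standby:
    enabled: true
    repoName: repo1   # assumed: pgBackRest repo shared with the new active cluster
```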
The cluster will reinitialize from scratch as a standby, just
like the original standby created above. Therefore, any transactions
written to the original standby should now replicate back to this cluster.