# PES (Patroni Environment Setup) on Windows

The package consists of:

- [patroni](https://github.com/zalando/patroni) HA
- [etcd](https://github.com/etcd-io/etcd) distributed key-value store
- [vip-manager](https://github.com/cybertec-postgresql/vip-manager) virtual IP manager
- [PostgreSQL](https://www.postgresql.org/) database itself
- [python](https://www.python.org/) runtime and packages
- [micro](https://github.com/zyedidia/micro) console editor

While the processes within `patroni` itself do not change much when running under Windows, most of the challenges come from creating an environment in which Patroni can run:

- `patroni` needs to be able to run PostgreSQL, which can only be done by unprivileged users.
- That unprivileged user in turn needs to be able to run Patroni, thus it needs access to Python.
- `patroni` needs to run as soon as the machine is booted up, without requiring anybody to log in and start anything.
- Under Windows, services are used for that. Since `etcd`, `patroni`, and `vip-manager` are not native Windows services, it is wise to use the [WinSW](https://github.com/winsw/winsw/) wrapper for them.

# Installing all the things

You can choose between two installation methods:

1. by installer (.exe)
2. by unzipping (.zip) and running a PowerShell script

Both installation methods need to be run with **Administrator** privileges.

In either case, you will need to install everything into a path that can be made accessible to the unprivileged user that will later be used to run `patroni` and PostgreSQL.

This rules out any paths below `C:\Users`.

We recommend installing everything into a directory directly at the root of `C:\`, e.g. `C:\PES\`. The PostgreSQL data dir can still be located elsewhere, but that location will also need to be made accessible to the user running PostgreSQL.

The PowerShell script `install.ps1` needs to be run with a special Execution Policy because it is not signed by us. You can verify its contents beforehand.
To change the Execution Policy only for this one invocation, pass `-ExecutionPolicy Bypass` before `-File` (everything after `-File <path>` is treated as arguments to the script):

```powershell
powershell.exe -ExecutionPolicy Bypass -File C:\PES\install.ps1
```

During the installation, the script or the installer will try to create a new user `pes` and assign it a randomly chosen password. This password will be printed on the screen, so make sure to note it down somewhere. Don't worry if you forget it: you can look it up in the `patroni\patroni_service.xml` file.
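
A quick way to find the stored password later — a minimal sketch, assuming the recommended `C:\PES` installation path:

```powershell
# print the line(s) of the service definition that contain the password
Select-String -Path C:\PES\patroni\patroni_service.xml -Pattern 'password'
```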

Afterward, the script or installer will make sure to grant the newly created user access to the installation directory.

Should any of this user creation or access granting fail, here are the commands you can use (and adapt) yourself to fix it:

```powershell
# add a user with a password:
net user username password /ADD

# change the password only:
net user username newpassword

# grant full access:
icacls C:\PES\ /q /c /t /grant username:F
```

Even though a new user was just created, all remaining setup tasks need to be performed as an **Administrator**, primarily to register the services.

# Setup etcd

From the base directory `C:\PES\`, go into the `etcd` directory and create a file `etcd.conf`.

```yaml
name: 'win1'
data-dir: win1.etcd
heartbeat-interval: 100
election-timeout: 1000
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
initial-advertise-peer-urls: http://192.168.178.88:2380
advertise-client-urls: http://192.168.178.88:2379
initial-cluster: win1=http://192.168.178.88:2380,win2=http://192.168.178.89:2380,win3=http://192.168.178.90:2380
initial-cluster-token: 'etcd-cluster'
initial-cluster-state: 'new'
enable-v2: true
```

The config file above is for a three-node etcd cluster, which is the minimum recommended size.
Go through and replace the IP addresses in `initial-advertise-peer-urls`, `advertise-client-urls`, and `initial-cluster` to match those of your three cluster members-to-be.
The `name=url` mappings in the `initial-cluster` value need to contain the matching `name` and `initial-advertise-peer-urls` of all your cluster members.

When you're done adapting the above `etcd.conf` to your needs, copy it over to the other cluster members and change the name and IP addresses or hostnames there accordingly.

To make sure that `etcd` is run after boot, we need to create a Windows Service. Windows Services require the executable to behave in a particular fashion and to react to certain signals, none of which `etcd` does by itself. The simplest option is to use a wrapper that behaves in this fashion and, in turn, launches `etcd` for us. One such wrapper (and the best option, it seems) is [WinSW](https://github.com/winsw/winsw).

A copy of the `winsw.exe` executable is renamed to `etcd_service.exe` and an accompanying `etcd_service.xml` config file is created. The config contains details on where to find the `etcd` executable, where the config file (`etcd.conf`) is located, and where the logs should go.
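
For illustration, such a wrapper config could look roughly like this — a minimal sketch using standard WinSW elements; the exact file shipped with PES may differ, and the paths assume the recommended `C:\PES` layout:

```xml
<service>
  <!-- the service id is what `sc` and `net start` refer to -->
  <id>etcd</id>
  <name>etcd</name>
  <description>etcd distributed key-value store</description>
  <executable>C:\PES\etcd\etcd.exe</executable>
  <arguments>--config-file C:\PES\etcd\etcd.conf</arguments>
  <!-- start automatically after boot -->
  <startmode>Automatic</startmode>
  <!-- wrapper, stdout, and stderr logs end up here -->
  <logpath>C:\PES\etcd\log</logpath>
  <log mode="roll"></log>
  <onfailure action="restart"/>
</service>
```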

The next version of WinSW will also allow providing the configuration as a YAML file.

## etcd service installation

```powershell
etcd_service.exe install
```

This registers the service that will later launch `etcd` automatically for us.

Apart from the messages on screen, you can check that the service is installed with the following command (note that in PowerShell plain `sc` is an alias for `Set-Content`, hence `sc.exe`):

```powershell
sc.exe qc etcd
```

You should see that the start type for this service is set to auto, which means "start the service automatically after booting up".
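
The output should look roughly like the following (illustrative; the important line is `START_TYPE : 2 AUTO_START`):

```
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: etcd
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : C:\PES\etcd\etcd_service.exe
        DISPLAY_NAME       : etcd
```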

## etcd service running

Having installed the service, you can start it manually:

```powershell
etcd_service.exe start
# or
net start etcd
# or
sc.exe start etcd
```

You will need to go through the etcd setup on all three hosts in order to successfully bootstrap the etcd cluster. Only after that will you be able to continue with the setup of Patroni.

## etcd checking

You can first take a look at `C:\PES\etcd\log\etcd_service.err.log`. If something already went wrong while installing or starting the service, the messages about that will be in `C:\PES\etcd\log\etcd_service.wrapper.log`.

If there are no critical errors in those files, you can check whether the etcd cluster is working all right, assuming that you've started all other etcd cluster members:

```powershell
C:\PES\etcd\etcdctl cluster-health
```

A healthy cluster produces output along these lines (member IDs are illustrative):
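
```
member 8e9e05c52164694d is healthy: got healthy result from http://192.168.178.88:2379
member 9bf600c5b3d1b374 is healthy: got healthy result from http://192.168.178.89:2379
member a8266ecf031671f3 is healthy: got healthy result from http://192.168.178.90:2379
cluster is healthy
```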

This should list all of your etcd cluster members and indicate that they are all working.

If you receive any timeout errors or similar, something went wrong during the bootstrap.

Once you have figured out the error that was preventing a successful bootstrap of the cluster, it is best practice to (see the sketch below):

1. stop all etcd members
2. remove all etcd data directories
3. fix the error
4. start all etcd members
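
A minimal sketch of that reset on one member, assuming the `data-dir` from the sample config above:

```powershell
Stop-Service etcd                                  # 1. stop the etcd member
Remove-Item -Recurse -Force C:\PES\etcd\win1.etcd  # 2. remove its data directory
# 3. fix the error in etcd.conf, then:
Start-Service etcd                                 # 4. start the member again
```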

Some changes to the config (mainly those involving the initial cluster members and the cluster name) will be ignored if the data dir has already been initialized.

# Setup Patroni

Warning: Do not begin setting up Patroni before your etcd cluster contains all its members; check `C:\PES\etcd\etcdctl cluster-health` to make sure. Otherwise you will have multiple Patroni instances that are not aware of their peers and will bootstrap on their own.

From the base directory `C:\PES\`, go into the `patroni` directory and create (or edit) a file `patroni.yml`.

```yaml
scope: pgcluster
namespace: /service/
name: win1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.178.88:8008

etcd:
  hosts:
  - 192.168.178.88:2379
  - 192.168.178.89:2379
  - 192.168.178.90:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048906
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        logging_collector: true
        log_directory: log
        log_filename: postgresql.log
        wal_keep_segments: 50
      pg_hba:
      - host replication replicator 127.0.0.1/32 md5
      - host all all 0.0.0.0/0 md5

  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.178.88:5432
  data_dir: C:/PES/pgsql/pgcluster_data
  bin_dir: C:/PES/pgsql
  authentication:
    replication:
      username: replicator
      password: reptilefluid
    superuser:
      username: postgres
      password: snakeoil

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```

:rotating_light: Under Windows, backslash path delimiters have to be doubled when used in the Patroni configuration, since the backslash is used as an escape character. To avoid the ambiguity, we highly recommend replacing all backslashes with forward slashes in folder names, e.g.
`data_dir: C:/PES/pgsql/pgcluster_data`

If you're running different Patroni clusters on top of the same etcd cluster, make sure to set a different `scope` (often referred to as the cluster name) for each Patroni cluster.

Change the `name` (the name of a member within the cluster `scope`) to your liking; this name needs to be different for each cluster member.
Setting the `name` to the hostname is often a good starting point.

Replace the IP address in `restapi.connect_address` with the host's own IP address or hostname. This address will be used for communication from other Patroni members to this one.

Replace the IP addresses in the `etcd.hosts` list to match the IP addresses or hostnames of your etcd cluster.

Change the IP address in `postgresql.connect_address` to the host's own IP address or hostname. This address will be used when Patroni needs to pull a backup from the primary or to create the streaming replication connections. If streaming replication and backups should use a dedicated NIC, put the IP address registered on that NIC here.

If you intend to create a Patroni cluster from a preexisting PostgreSQL cluster, stop that cluster and put the location of its data directory into the `postgresql.data_dir` variable. If the PostgreSQL version of the preexisting cluster is different, change the `postgresql.bin_dir` accordingly. Make sure that the `pes` user can access both of those directories.

For a full list of configuration items and their descriptions, please refer to the Patroni [documentation](https://patroni.readthedocs.io/en/latest/SETTINGS.html).

When you're done adapting the above `patroni.yml` to your needs, copy it over to the other cluster members and change the name and IP addresses or hostnames there accordingly.

Creating and starting the Patroni service is similar to the procedure for `etcd`.
The major difference is that Patroni needs to be run as the `pes` user. For this reason, the `patroni_service.xml` contains the user name and password.

## patroni service installation

Create the service:

```powershell
C:\PES\patroni\patroni_service.exe install
```

Check the service:

```powershell
sc.exe qc patroni
```

You should see that the start type for this service is set to auto, which means "start the service automatically after booting up".

## patroni service running

Start the service:

```powershell
patroni_service.exe start
# or
net start patroni
# or
sc.exe start patroni
```

It is recommended to start Patroni on one host first and check that it bootstraps as expected before starting the remaining cluster members. This is not to avoid race conditions, which Patroni can handle fine, but mainly to make it easier to troubleshoot problems as soon as they arise.

## Check Patroni

You can first take a look at `C:\PES\patroni\log\patroni_service.err.log`. If something already went wrong while installing or starting the service, the messages about that will be in `C:\PES\patroni\log\patroni_service.wrapper.log`.

If the `patroni_service.err.log` contains messages like "starting PostgreSQL failed" or similar, check the PostgreSQL log as well, which should be located in `C:\PES\pgsql\pgcluster_data\log\`.
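
To quickly inspect the newest entries of that PostgreSQL log — a sketch, with the path taken from the sample `patroni.yml` above:

```powershell
# show the last 50 lines of the PostgreSQL log
Get-Content C:\PES\pgsql\pgcluster_data\log\postgresql.log -Tail 50
```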

If there are no critical errors in those files, you can check whether the Patroni cluster is working all right:

```powershell
C:\PES\patronictl list
```

For a healthy three-member cluster, the output should look roughly like this (exact columns depend on the Patroni version; timelines and lag values are illustrative):
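
```
+-----------+--------+----------------+--------+---------+----+-----------+
|  Cluster  | Member |      Host      |  Role  |  State  | TL | Lag in MB |
+-----------+--------+----------------+--------+---------+----+-----------+
| pgcluster | win1   | 192.168.178.88 | Leader | running |  1 |           |
| pgcluster | win2   | 192.168.178.89 |        | running |  1 |         0 |
| pgcluster | win3   | 192.168.178.90 |        | running |  1 |         0 |
+-----------+--------+----------------+--------+---------+----+-----------+
```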

This should list all of your Patroni cluster members and indicate that they are all working.

If you are bootstrapping the cluster for the first time and the first cluster member does not show up yet, check the logs.

If there are cluster members that display "start failed" in their state field, you need to examine the logs on those machines first.

TODO

- edit config file in `C:\patroni-win-x64\patronictl`:
  `python.exe patroni\patronictl.py -c C:\PES\patroni\patroni.yml %*`

# Setup vip-manager

From the base directory `C:\PES\`, go into the `vip-manager` directory and create a file `vip-manager.yml`.

```yaml
# time (in milliseconds) after which vip-manager wakes up and checks if it needs to register or release ip addresses.
interval: 1000

# the etcd or consul key which vip-manager will regularly poll.
trigger-key: "/service/pgcluster/leader"
# if the value of the above key matches the trigger-value (often the hostname of this host), vip-manager will try to add the virtual ip address to the interface specified in `interface`
trigger-value: "win1"

ip: 192.168.88.123 # the virtual ip address to manage
netmask: 24 # netmask for the virtual ip
interface: "Ethernet" # interface to which the virtual ip will be added

dcs-type: etcd # etcd or consul
# a list that contains all DCS endpoints to which vip-manager could talk.
endpoints:
  - http://192.168.178.88:2379
  - http://192.168.178.89:2379
  - http://192.168.178.90:2379

# how often things should be retried and how long to wait between retries. (currently only affects arpClient)
retry-num: 2
retry-after: 250 # in milliseconds
```

Change the `trigger-key` to match the concatenation of these values from the `patroni.yml`: `<namespace> + <scope> + "/leader"` (with the sample config: `/service/` + `pgcluster` + `/leader` = `/service/pgcluster/leader`). Patroni stores the name of the current leader in this key.
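
You can verify the key while Patroni is running — a sketch, using the etcd v2 API that `enable-v2` in the sample `etcd.conf` turns on:

```powershell
# prints the member name of the current leader, e.g. "win1"
C:\PES\etcd\etcdctl get /service/pgcluster/leader
```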

Change the `trigger-value` to the `name` from the `patroni.yml` of this host.

Change `ip`, `netmask`, and `interface` to the virtual IP that should be managed, the appropriate netmask, and the network interface on which the virtual IP should be registered.
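
If you're unsure about the interface name, you can list the available interfaces — a sketch using the standard `NetAdapter` cmdlets:

```powershell
# the "Name" column is what goes into the `interface` setting
Get-NetAdapter | Format-Table Name, InterfaceDescription, Status
```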

Change the `endpoints` list to the list of all your etcd cluster members. Do not forget the `http://` protocol prefix here.

## vip-manager service installation

Creating and starting the vip-manager service is similar to the procedure for etcd.
Create the service:

```powershell
C:\PES\vip-manager\vip-manager_service.exe install
```

Check the service:

```powershell
sc.exe qc vip-manager
```

You should see that the start type for this service is set to auto, which means "start the service automatically after booting up".

## vip-manager service running

Start the service:

```powershell
vip-manager_service.exe start
# or
net start vip-manager
# or
sc.exe start vip-manager
```

## Check vip-manager

You can first take a look at `C:\PES\vip-manager\log\vip-manager_service.err.log`. If something already went wrong while installing or starting the service, the messages about that will be in `C:\PES\vip-manager\log\vip-manager_service.wrapper.log`.

When vip-manager is working as expected, it should log messages like ...
TODO

# Check Patroni cluster is working as expected

- Trigger a couple of switchovers (`patronictl switchover <clustername>`) and observe (using `patronictl list -w <seconds>`) that the demoted primary comes back up as a replica and clears its rewind state (i.e. switches to the new primary's timeline). Observe the vip-manager log to make sure it is successfully dropping the VIP on the old primary and registering it on the new primary. A command sketch follows this list.
- Trigger a reinit of a replica (`patronictl reinit <clustername> <membername>`).
- Reboot your machines at least once to check that all the services start as expected.
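
A sketch of these checks with the sample cluster and member names from above (the switchover command prompts interactively for a candidate and confirmation):

```powershell
C:\PES\patronictl switchover pgcluster    # promote another member, answer the prompts
C:\PES\patronictl list pgcluster          # watch the old primary come back as a replica
C:\PES\patronictl reinit pgcluster win2   # re-initialize the replica named "win2"
Restart-Computer                          # reboot, then verify all three services came back up
```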