Patroni — high availability cluster management

Note
Patroni support may be removed in a future release, since the QLUSTER cluster computing management module can be used instead.

Patroni is a Python application for building high availability QHB clusters based on streaming replication. It is used by such companies as Red Hat, IBM Compose, Zalando and many others. It allows you to convert a system of primary and standby servers into a high availability cluster that supports automatic, controlled switchover and failover. Patroni makes it easy to add new replicas to an existing cluster and supports dynamic changes to the QHB configuration on all servers in the cluster in parallel, as well as many other features, such as synchronous replication, custom actions on node switching, a REST API, and the ability to run custom commands to create a replica.

One possible combination of technologies for organizing highly available clusters is:

  • QHB as a database system;
  • Patroni as a clustering solution;
  • etcd or consul as a distributed storage for Patroni.

Patroni is an open source application, so for usability and compatibility with QHB we make modifications to the source packages. Compatible Patroni packages can be found in the QHB repositories.


Installing Components

Note
Patroni packages are only available for the following operating systems: Debian 10, Debian 12, Ubuntu 18, Ubuntu 22, Ubuntu 24, CentOS 7, CentOS 8, MosOS, and OpenSUSE.

Installing etcd

Example of installing on CentOS 7:

yum install etcd

You can also install it via packages or from source by following the etcd installation documentation.

It is recommended to install etcd on separate machines that do not host Patroni or QHB nodes, since etcd is sensitive to disk load. For normal operation of an etcd cluster, the minimum number of nodes is three. For development and testing purposes, one node is enough.
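The three-node minimum follows from etcd's consensus protocol (Raft): the cluster accepts writes only while a majority (quorum) of members is reachable. A minimal arithmetic sketch (the helper names are ours, purely illustrative):

```shell
# etcd stays writable only while a majority (quorum) of members is up.
# quorum(n) = n/2 + 1 (integer division); tolerated failures = n - quorum(n).
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

tolerated 1   # a single node tolerates no failures
tolerated 3   # three nodes tolerate one node failure
tolerated 4   # an even member count adds no fault tolerance over three nodes
```

This is why three nodes is the practical minimum: it is the smallest cluster that survives the loss of one member.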

Installing Consul

IMPORTANT!
Due to Hashicorp restrictions, installation is currently only possible via a VPN.

Installation is performed via packages from the site pkgs.org or by following the Consul installation documentation.

Consul servers are the location where Patroni data is stored and replicated. Therefore, the recommended minimum number of nodes is three. For development and testing, you can use one node. For production high-load systems, deploy the Consul servers on machines separate from Patroni and QHB.

The Consul client is also a member of the system and can connect to the Consul server cluster to obtain information about the infrastructure. You can place a client on a server with Patroni and QHB so that they connect to the Consul server cluster via the local Consul client.


Installing QHB

For installation, refer to the QHB installation documentation.

There is no need to build and run the QHB service or to initialize the database cluster: all control of the database system cluster is performed via Patroni.

Installing Patroni

Install Patroni on the same machines where the QHB servers are already installed.

The package name for installation is qhb-patroni.

For installation, refer to the QHB installation documentation.

Note that the special qhb-patroni package was released specifically for use with QHB. Standard Patroni for PostgreSQL cannot be used with QHB.

Check the qhb-patroni version:

patroni --version

Result: Patroni 3.0.2


Setting Up Components

Setting Up etcd

The setup is performed independently of Patroni and QHB and can be done in advance (see the official etcd documentation for details). For each etcd node, add the parameters required for launch to the /etc/etcd/etcd.conf file. You should also create a directory for etcd data (e.g., /var/lib/etcd/default.etcd) on each node beforehand. An example of a minimal node configuration on three machines with addresses X.X.X.101, X.X.X.102 and X.X.X.103 is shown below.

For the first node:

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://X.X.X.101:2380"
ETCD_LISTEN_CLIENT_URLS="http://X.X.X.101:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd0"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://X.X.X.101:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://X.X.X.101:2379"
ETCD_INITIAL_CLUSTER="etcd0=http://X.X.X.101:2380,etcd1=http://X.X.X.102:2380,etcd2=http://X.X.X.103:2380"
ETCD_INITIAL_CLUSTER_TOKEN="qhb_token"

For the second node:

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://X.X.X.102:2380"
ETCD_LISTEN_CLIENT_URLS="http://X.X.X.102:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://X.X.X.102:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://X.X.X.102:2379"
ETCD_INITIAL_CLUSTER="etcd0=http://X.X.X.101:2380,etcd1=http://X.X.X.102:2380,etcd2=http://X.X.X.103:2380"
ETCD_INITIAL_CLUSTER_TOKEN="qhb_token"

For the third node:

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://X.X.X.103:2380"
ETCD_LISTEN_CLIENT_URLS="http://X.X.X.103:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd2"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://X.X.X.103:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://X.X.X.103:2379"
ETCD_INITIAL_CLUSTER="etcd0=http://X.X.X.101:2380,etcd1=http://X.X.X.102:2380,etcd2=http://X.X.X.103:2380"
ETCD_INITIAL_CLUSTER_TOKEN="qhb_token"
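The three files above differ only in the node name and address. A hypothetical helper script (gen_etcd_conf is our name, not part of any package) shows how the per-node configuration could be rendered from one template, avoiding copy-paste mistakes; the IP addresses below are placeholders:

```shell
#!/bin/bash
# Hypothetical helper (not shipped with etcd): render /etc/etcd/etcd.conf
# for one node of a three-node cluster.
# Usage: gen_etcd_conf <index 0..2> <ip0> <ip1> <ip2>
gen_etcd_conf() {
  local idx="$1"; shift
  local ips=("$@")
  local ip="${ips[$idx]}"
  cat <<EOF
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${ip}:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd${idx}"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_CLUSTER="etcd0=http://${ips[0]}:2380,etcd1=http://${ips[1]}:2380,etcd2=http://${ips[2]}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="qhb_token"
EOF
}

# Example: print the configuration for the second node (index 1)
gen_etcd_conf 1 10.0.1.101 10.0.1.102 10.0.1.103
```

On each node you would redirect the output with the matching index to /etc/etcd/etcd.conf.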

When rebuilding the cluster, you need to reinitialize the etcd database by deleting the data in the /var/lib/etcd/default.etcd directory.

Create the etcd service on each node for convenient cluster control:

[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set up number of CPU in GOMAXPROCS
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --name=\"${ETCD_NAME}\" --data-dir=\"${ETCD_DATA_DIR}\" --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\""
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Setting Up Consul

The setup is performed independently of Patroni and QHB and can be done in advance (see the official Consul documentation for details). For each Consul server cluster node, add the parameters required for launch to the /etc/consul.d/config.json file.

Below is an example of a minimal node configuration on three machines with the following addresses and domain names:

X.X.X.101, qhb-srv-01.quantom.local;
X.X.X.102, qhb-srv-02.quantom.local;
X.X.X.103, qhb-srv-03.quantom.local.

For the first node of the consul server cluster (change bind_addr and client_addr for the second and third nodes):

{
  "bind_addr": "X.X.X.101",
  "bootstrap_expect": 3,
  "client_addr": "X.X.X.101",
  "datacenter": "dc1",
  "data_dir": "/opt/consul",
  "domain": "consul",
  "enable_script_checks": true,
  "dns_config": {
    "enable_truncate": true,
    "only_passing": true
  },
  "enable_syslog": true,
  "encrypt": "v6+HtK6JQ6kX2XgYkSFQM9KFXF1YeGyFHcRo6hWZbjI=",
  "leave_on_terminate": true,
  "log_level": "INFO",
  "rejoin_after_leave": true,
  "retry_join": [
    "qhb-srv-01.quantom.local",
    "qhb-srv-02.quantom.local",
    "qhb-srv-03.quantom.local"
  ],
  "server": true,
  "start_join": [
    "qhb-srv-01.quantom.local",
    "qhb-srv-02.quantom.local",
    "qhb-srv-03.quantom.local"
  ],
  "ui_config": { "enabled": true }
}

You can deploy a Consul client to connect to the Consul server cluster (for example, place it on the Patroni servers). With this configuration, Patroni connects to the local Consul client, which accesses the entire Consul server cluster.

For the Consul client configuration, add the parameters required for launch to the /etc/consul.d/config.json file.

Example of a minimal client configuration:

{
  "bind_addr": "X.X.X.104",
  "client_addr": "X.X.X.104",
  "datacenter": "dc1",
  "node_name": "client01",
  "data_dir": "/opt/consul",
  "domain": "consul",
  "enable_script_checks": true,
  "dns_config": {
    "enable_truncate": true,
    "only_passing": true
  },
  "enable_syslog": true,
  "encrypt": "v6+HtK6JQ6kX2XgYkSFQM9KFXF1YeGyFHcRo6hWZbjI=",
  "leave_on_terminate": true,
  "log_level": "INFO",
  "rejoin_after_leave": true,
  "retry_join": [
    "qhb-srv-01.quantom.local",
    "qhb-srv-02.quantom.local",
    "qhb-srv-03.quantom.local"
  ],
  "server": false,
  "start_join": [
    "qhb-srv-01.quantom.local",
    "qhb-srv-02.quantom.local",
    "qhb-srv-03.quantom.local"
  ],
  "ui_config": { "enabled": true }
}
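The server and client configurations share most fields and differ mainly in the addresses, the server flag, bootstrap_expect, and node_name. A hypothetical helper (gen_consul_conf is our name, not part of Consul) sketches how both could be rendered from one template; it deliberately omits some fields shown above (dns_config, ui_config, etc.), and the gossip key is a placeholder:

```shell
#!/bin/bash
# Hypothetical helper (not part of Consul): render a minimal
# /etc/consul.d/config.json for a server or client node.
# Usage: gen_consul_conf <bind/client addr> <server|client> <node_name>
gen_consul_conf() {
  local addr="$1" role="$2" name="$3"
  local server=false expect=""
  if [ "$role" = server ]; then
    server=true
    expect='"bootstrap_expect": 3,'   # only servers expect a quorum size
  fi
  cat <<EOF
{
  "bind_addr": "${addr}",
  "client_addr": "${addr}",
  ${expect}
  "node_name": "${name}",
  "datacenter": "dc1",
  "data_dir": "/opt/consul",
  "encrypt": "REPLACE_WITH_consul_keygen_OUTPUT",
  "server": ${server},
  "retry_join": [
    "qhb-srv-01.quantom.local",
    "qhb-srv-02.quantom.local",
    "qhb-srv-03.quantom.local"
  ]
}
EOF
}

# Example: print a client configuration
gen_consul_conf 10.0.1.104 client client01
```

In practice, generate a real gossip encryption key with consul keygen and use the same value on every agent; all addresses here are placeholders.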

Validate each configuration:

consul validate /etc/consul.d/config.json

Create the consul service on each node for convenient cluster control:

[Unit]
Description=Consul Service Discovery Agent
Documentation=https://www.consul.io/
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=consul
Group=consul
ExecStart=/usr/bin/consul agent \
    -node=qhb-srv-01.quantom.local \
    -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGINT
TimeoutStopSec=5
Restart=on-failure
SyslogIdentifier=consul

[Install]
WantedBy=multi-user.target

Setting Up Patroni

Create a directory for the QHB database cluster (e.g., /mnt/patroni) on each host with qhb-patroni and assign the required rights to the qhb user:

mkdir -p /mnt/patroni
chown qhb:qhb /mnt/patroni
chmod 700 /mnt/patroni

Create a .yml configuration file on each host with qhb-patroni. You can copy the example configuration files: on rpm systems, /usr/share/doc/patroni-3.0.2/qhb1.yml (qhb2.yml); on deb systems, /usr/share/doc/patroni/qhb1.yml (qhb2.yml). See the Patroni documentation for details of the configuration parameters.

In addition to the basic configuration parameters described in the Patroni documentation, the qhb-patroni implementation adds the following parameters for QHB control:

# QHB parameters
  hba_file: qhb_hba.conf
  ident_file: qhb_ident.conf
  database: qhb
  config_base_name: qhb

Below is an example of a minimal qhb-patroni cluster configuration on two hosts with addresses X.X.X.104 and X.X.X.105.

Note
The example shows connection parameters for etcd; to work with Consul, uncomment and adjust the corresponding lines.

qhb0.yml for the first node of the qhb-patroni cluster:

scope: cluster
# namespace: /service/
name: qhb0

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

etcd:
  # Use "hosts" to specify multiple hosts
  hosts:
    - X.X.X.101:2379
    - X.X.X.102:2379
    - X.X.X.103:2379

  #consul:
  #host: X.X.X.104:8500

bootstrap:
  # this section will be stored in etcd/Consul:/<namespace>/<application>/config
  # after the new cluster initialization, and all the other cluster members will
  # use it as "global configuration"
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  # some desired parameters for 'initdb'
  initdb:  # Note: this must be a list (some parameters
    # require values and others are flags)
    - encoding: UTF8
    - data-checksums

  qhb_hba:  # Add the following lines to qhb_hba.conf after running 'initdb'
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator X.X.X.0/24 md5
    - host all all 0.0.0.0/0 md5

  # Some additional users that you need to create after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 127.0.0.1,X.X.X.104,::1:5432
  connect_address: X.X.X.104:5432
  data_dir: /mnt/patroni
  bin_dir: /usr/local/qhb/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: qhb
      password: qhb
    rewind:
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'
  # QHB parameters
  hba_file: qhb_hba.conf
  ident_file: qhb_ident.conf
  database: qhb
  config_base_name: qhb

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false

qhb1.yml for the second node of the qhb-patroni cluster:

scope: cluster
# namespace: /service/
name: qhb1

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

etcd:
  # Use "hosts" to specify multiple hosts
  hosts:
    - X.X.X.101:2379
    - X.X.X.102:2379
    - X.X.X.103:2379

  #consul:
  #host: X.X.X.105:8500

bootstrap:
  # this section will be stored in etcd/Consul:/<namespace>/<application>/config
  # after the new cluster initialization, and all the other cluster members will
  # use it as "global configuration"
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  # some desired parameters for 'initdb'
  initdb:  # Note: this must be a list (some parameters
    # require values and others are flags)
    - encoding: UTF8
    - data-checksums

  qhb_hba:  # Add the following lines to qhb_hba.conf after running 'initdb'
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator X.X.X.0/24 md5
    - host all all 0.0.0.0/0 md5

  # Some additional users that you need to create after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 127.0.0.1,X.X.X.105,::1:5432
  connect_address: X.X.X.105:5432
  data_dir: /mnt/patroni
  bin_dir: /usr/local/qhb/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: qhb
      password: qhb
    rewind:
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'
  # QHB parameters
  hba_file: qhb_hba.conf
  ident_file: qhb_ident.conf
  database: qhb
  config_base_name: qhb

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false

To check functionality, validate the configuration and correct any errors. For example, if the first cluster node configuration is located in /opt/patroni/:

patroni --validate-config /opt/patroni/qhb0.yml

If necessary, create the qhb-patroni service on each node for convenient qhb-patroni cluster control. The service unit file is located at /usr/lib/systemd/system/qhb-patroni.service.

# This is an example systemd configuration file for Patroni
# You can copy it to "/etc/systemd/system/patroni.service",

[Unit]
Description=Runners to orchestrate a high-availability QHB
After=syslog.target network.target

[Service]
Type=simple

User=qhb
Group=qhb

# Read in configuration file if any, otherwise continue
EnvironmentFile=-/etc/patroni_env.conf

# WorkingDirectory = /var/lib/pgsql

# Where to send early-launch messages from the server
# This is usually determined by a general setting in systemd
#StandardOutput=syslog

# Pre-commands for launching the watchdog
# Uncomment if the watchdog is part of your patroni installation
#ExecStartPre=-/usr/bin/sudo /sbin/modprobe softdog
#ExecStartPre=-/usr/bin/sudo /bin/chown qhb /dev/watchdog

# Launch the patroni process
# See examples of the configuration files for QHB in /usr/share/doc/patroni-3.0.2/qhb*.yml
ExecStart=/usr/bin/patroni /opt/qhb1.yml

# Send HUP to load from patroni.yml
ExecReload=/usr/bin/kill -s HUP $MAINPID

# signal only the patroni process, not its child processes, so that qhb is stopped gracefully
KillMode=process

# wait enough time for the server to start/stop
TimeoutSec=30

# Do not restart the service in case of failure; we want to check the database for errors manually
Restart=no

[Install]
WantedBy=multi-user.target


Launching Components

Launching etcd

First of all, make sure that the required rules have been added to the firewall.

To launch etcd via the service, execute systemctl start etcd on all nodes. The command completes when the nodes find each other, i.e., once it has been executed on the second and third hosts. When re-creating a cluster, you must clear the etcd databases beforehand.

You can examine the cluster structure using etcdctl member list.

Launching Consul

First of all, make sure that the required rules have been added to the firewall.

To launch Consul via the service on all server and client nodes, execute systemctl start consul.

At any cluster member's HTTP(S) address, such as http://X.X.X.101:8500, you can examine the cluster structure (servers and clients) and the current leader. See the official Consul documentation for details.

Launching Patroni

First of all, make sure that the required rules have been added to the firewall.

You should launch Patroni only after the distributed storage (etcd or Consul) has been launched.

To launch the first qhb-patroni node, execute patroni /opt/qhb0.yml as the qhb user. If the first node launches successfully, start the remaining nodes, specifying the appropriate configuration files.

To launch qhb-patroni via the service on all nodes, execute systemctl start qhb-patroni.

You can examine the qhb-patroni cluster state and structure on any node, for example: patronictl -c /opt/qhb0.yml list.

During normal operation, the State column shows running for all cluster members.


Running, Failover

When the database system on host 104 shuts down, the database system on host 105 becomes the primary one.

When the database system on host 104 is put into operation, it becomes a standby one.

When the primary on host 105 shuts down, host 104 becomes the primary one again.


See Also