Galera cluster recovery

TPE-HA: MariaDB/Galera is unable to bootstrap

If the MariaDB cluster stays down after a disruption (all TPE nodes down) and the SQL container on tpe_node1 or tpe_node2 keeps restarting with log output like this:

[support@tpe-node1 ~]$ docker logs -f $(docker ps -q --filter name=sql_node)
[...]
INFO: Reporting seqno: -1 to Zookeeper store.
[...]
ERROR: A unaivalable node have backuped a higher seqno, can't bootstrap.
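
As a minimal sketch, the symptom can be checked with a small filter that looks for the error text shown above. The function name has_bootstrap_failure is hypothetical, and the match relies only on the "can't bootstrap" wording from the sample log, so it is an assumption that other versions log the same text.

```shell
# has_bootstrap_failure (hypothetical helper): reads log text on stdin and
# returns 0 if it contains the "can't bootstrap" error shown above.
has_bootstrap_failure() {
    grep -F -q "can't bootstrap"
}

# Possible usage (not executed here): docker logs without -f so the command
# terminates, with stderr merged into stdout so the filter sees both streams.
#   docker logs $(docker ps -q --filter name=sql_node) 2>&1 | has_bootstrap_failure \
#       && echo "bootstrap is blocked, follow the recovery steps"
```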

SOLUTION:

Follow the steps below to recover by re-bootstrapping the cluster.

Retrieve the seqno for sql_node1 and sql_node2 using:

[support@tpe-node1 ~]$ docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain get /galera/tpe/nodes/sql_node1/seqno
Connecting to localhost:2181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
1645

The seqno for sql_node1 is 1645.

Then, for sql_node2:

[support@tpe-node1 ~]$ docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain get /galera/tpe/nodes/sql_node2/seqno
Connecting to localhost:2181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
1643

The seqno for sql_node2 is 1643.
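
When the value has to be read by a script rather than by eye, the seqno can be isolated from the surrounding connection messages. The sketch below assumes the output has the shape shown above, with the value alone on its own line; extract_seqno is a hypothetical helper name.

```shell
# extract_seqno (hypothetical helper): print the last line on stdin that
# consists only of digits, optionally preceded by a minus sign (a seqno of
# -1 appears in the logs above). Connection and WATCHER lines are skipped.
extract_seqno() {
    grep -E '^-?[0-9]+$' | tail -n 1
}

# Possible usage (not executed here), reusing the command from the procedure:
#   docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m \
#       org.apache.zookeeper.ZooKeeperMain get /galera/tpe/nodes/sql_node1/seqno \
#       | extract_seqno
```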

The cluster should be bootstrapped from the node with the highest seqno (in this example sql_node1, with 1645 against 1643):

If this node is sql_node1:

docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot ""
docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot/node sql_node1

Otherwise (sql_node2 has the highest seqno):

docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot ""
docker exec $(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot/node sql_node2

The Galera cluster restarts.
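
The selection rule and the two create commands above can be sketched as a dry-run helper that only prints the commands for review and never executes them. Both function names are hypothetical, and falling back to the first node on equal seqnos is an arbitrary choice made for this sketch, not something the procedure specifies.

```shell
# pick_bootstrap_node NAME1 SEQNO1 NAME2 SEQNO2 (hypothetical helper)
# Print the name with the strictly higher seqno. On a tie, print NAME1
# (assumption of this sketch).
pick_bootstrap_node() {
    if [ "$4" -gt "$2" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "$1"
    fi
}

# print_forceboot_commands NODE (hypothetical helper)
# Dry run: print the two ZooKeeper create commands from the procedure for
# the chosen node. Only the two node names used in this article are accepted.
print_forceboot_commands() {
    case "$1" in
        sql_node1|sql_node2) ;;
        *) echo "unexpected node name: $1" >&2; return 1 ;;
    esac
    node=$1
    cat <<EOF
docker exec \$(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot ""
docker exec \$(docker ps -q -f 'name=zk_node') java -Xmx256m org.apache.zookeeper.ZooKeeperMain create /galera/tpe/forceboot/node $node
EOF
}
```

For example, `print_forceboot_commands "$(pick_bootstrap_node sql_node1 1645 sql_node2 1643)"` prints the two commands for sql_node1, which can then be checked and run by hand on the TPE node.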

Update procedure fails with "current TPE image is not present anymore" message

SYMPTOM: The TPE update procedure fails with the following error:

NOTE : This error may append when the current TPE image is not present anymore on TPE host.
For more details, please consult the TPE documentation.

SOLUTION:

Redeploy the cluster (TPE Service -> TPE Cluster operations -> Redeploy cluster).