Galera cluster recovery
TPE-HA: MariaDB/Galera is unable to bootstrap
SYMPTOM: The MariaDB cluster stays down after a disruption (all TPE nodes down), and the SQL container on tpe_node1 or tpe_node2 keeps restarting with log output like this:
[support@tpe-node1 ~]$ docker logs -f $(docker ps -q --filter name=sql_node)
[...]
INFO: Reporting seqno: -1 to Zookeeper store.
[...]
ERROR: A unaivalable node have backuped a higher seqno, can't bootstrap.
SOLUTION:
Follow the steps below to recover by re-bootstrapping the cluster.
Retrieve the seqno stored in ZooKeeper for sql_node1 and sql_node2, starting with sql_node1:
[support@tpe-node1 ~]$ docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh get /galera/tpe/nodes/sql_node1/seqno
Connecting to localhost:2181
2023-01-10 19:17:23,870 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
[...]
Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10000016c1704e0, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
1645
cZxid = 0x5700007423
ctime = <Date>
mZxid = 0x5900002efb
mtime = <Date>
pZxid = 0x5700007423
cversion = 0
dataVersion = 1850
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0
The value printed just before the stat fields (here 1645) is the seqno of sql_node1.
Then, for sql_node2:
[support@tpe-node1 ~]$ docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh get /galera/tpe/nodes/sql_node2/seqno
Connecting to localhost:2181
2023-01-10 19:17:23,870 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
[...]
Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10000016c1704e0, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
1643
cZxid = 0x5700007423
ctime = <Date>
mZxid = 0x5900002efb
mtime = <Date>
pZxid = 0x5700007423
cversion = 0
dataVersion = 1850
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0
Here the seqno of sql_node2 is 1643.
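Reading the seqno out of the zkCli.sh output by hand is error-prone, so the retrieval can be scripted. The following is a minimal sketch, assuming the same container name filter (zk_node) and znode path (/galera/tpe/nodes/<node>/seqno) as the commands above. The function names extract_seqno and get_seqno are hypothetical helpers, not part of TPE. The sketch keeps the first output line that is a bare integer (the znode data) and skips the "key = value" stat lines.

```shell
#!/usr/bin/env bash
# Sketch: read the Galera seqno that each SQL node stored in ZooKeeper.
# Assumption: the ZooKeeper container name matches "zk_node" and seqnos live
# under /galera/tpe/nodes/<node>/seqno, as in the manual commands.

# Print the znode data line: the first line that is a bare (possibly negative)
# integer. Stat fields such as "dataLength = 4" contain " = " and never match.
extract_seqno() {
  tr -d '\r' | grep -E '^-?[0-9]+$' | head -n 1
}

# Hypothetical helper: query ZooKeeper for one node and print its seqno.
get_seqno() {
  local node="$1" zk
  zk="$(docker ps -q --filter name=zk_node)"
  # No -it here: -t adds carriage returns, and -it fails when stdin is not a terminal.
  docker exec "$zk" zkCli.sh get "/galera/tpe/nodes/${node}/seqno" 2>&1 \
    | extract_seqno
}
```

After sourcing the file, running for node in sql_node1 sql_node2; do echo "$node: $(get_seqno "$node")"; done prints one seqno per node.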
The cluster should be bootstrapped from the node with the highest seqno (sql_node1 in this example, since 1645 > 1643).
If this node is sql_node1:
docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh create /galera/tpe/forceboot ""
docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh create /galera/tpe/forceboot/node sql_node1
Otherwise (sql_node2 has the highest seqno):
docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh create /galera/tpe/forceboot ""
docker exec -it $(docker ps -q --filter "name=zk_node") zkCli.sh create /galera/tpe/forceboot/node sql_node2
The Galera cluster then restarts.
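Choosing the node and creating both forceboot znodes can also be combined into one step. The following is a minimal sketch, assuming the same zk_node container filter and /galera/tpe/forceboot znode paths as the manual commands. pick_bootstrap_node and force_boot are hypothetical helper names. When both seqnos are equal, picking sql_node1 is an arbitrary choice made by this sketch. It refuses to continue if either seqno is not an integer.

```shell
#!/usr/bin/env bash
# Sketch: force the Galera bootstrap from the SQL node with the highest seqno.
# Assumption: same zk_node container filter and /galera/tpe/forceboot znodes
# as in the manual procedure.

# Hypothetical helper: print the node to bootstrap from, given the seqnos of
# sql_node1 and sql_node2. On a tie this sketch arbitrarily keeps sql_node1.
pick_bootstrap_node() {
  local seq1="$1" seq2="$2"
  if ! [[ $seq1 =~ ^-?[0-9]+$ && $seq2 =~ ^-?[0-9]+$ ]]; then
    echo "seqnos must be integers, got '$seq1' and '$seq2'" >&2
    return 1
  fi
  if (( seq2 > seq1 )); then
    echo sql_node2
  else
    echo sql_node1
  fi
}

# Hypothetical helper: create the two forceboot znodes for the chosen node.
# Usage: force_boot <sql_node1 seqno> <sql_node2 seqno>
force_boot() {
  local node zk
  node="$(pick_bootstrap_node "$1" "$2")" || return 1
  zk="$(docker ps -q --filter name=zk_node)"
  if [[ -z $zk ]]; then
    echo "no running container matches name=zk_node" >&2
    return 1
  fi
  echo "Bootstrapping from $node"
  docker exec "$zk" zkCli.sh create /galera/tpe/forceboot ""
  docker exec "$zk" zkCli.sh create /galera/tpe/forceboot/node "$node"
}
```

With the values from this example, force_boot 1645 1643 would create the znodes for sql_node1.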
Update procedure fails with "current TPE image is not present anymore" message
SYMPTOM: If the TPE update procedure fails with the following error:
NOTE: This error may happen when the current TPE image is no longer present on the TPE host.
For more details, please consult the TPE documentation.
SOLUTION:
Redeploy the cluster (TPE Service -> TPE Cluster operations -> Redeploy cluster).