Disaster recovery

Total disaster recovery

This procedure targets a global TPE cluster recovery in the following scenarios:

  • Loss of all TPE nodes (or one node in standalone mode).
  • Loss of Docker Swarm cluster quorum, when more than one node has failed.

Before starting the recovery procedure, make sure you have a valid TPE Backup archive available so that your data can be restored.
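
If the archive was exported to an external disk or network share, it can be worth confirming that it is mounted and readable before starting. A minimal sketch, assuming a hypothetical mount point /mnt/tpe-backup (adjust the path to wherever your archive actually resides):

    # Confirm the backup location is mounted (hypothetical example path).
    $ mount | grep tpe-backup

    # Confirm the TPE Backup archive files are present and readable.
    $ ls -lh /mnt/tpe-backup/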

  1. Install the Operating System on the needed server(s).

  2. Redo the ThingPark Enterprise Cluster creation, Configuration and Deployment.

  3. Connect to Cockpit GUI.

  4. Go to the "TPE Backup" menu.

  5. Click on Restore.

  6. In "Restore source path", set the path where is mounted your TPE Backup archive

  7. Click on Next.

  8. Select the Backup you want to restore and click on Next.

  9. Wait until restoration is done.

  10. Click on Close.

  11. To finish, go to the "TPE Services" menu and check that all services are in the running state and that the TPE node(s) are in the ready state (an optional command-line check is sketched after this list).
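
As a complement to the Cockpit GUI check in step 11, the same status can be read from the Docker CLI on any TPE node. A minimal sketch using standard Docker Swarm commands (service names and replica counts depend on your deployment):

    # Nodes: STATUS should be "Ready"; in HA mode, MANAGER STATUS should show
    # one "Leader" with the remaining managers "Reachable".
    $ docker node ls

    # Services: the REPLICAS column should show the expected counts (e.g. 1/1).
    $ docker service ls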

High Availability node recovery

This procedure targets the recovery of one TPE node of a ThingPark Enterprise HA cluster (hardware failure).

  1. Install the Operating System.

  2. Connect to one of the running TPE nodes by executing the following ssh command:

    $ ssh support@${IP_OR_HOSTNAME_OF_TPE} -p 2222
  3. Remove the lost node from the Docker Swarm (the stale node reference) by running the following two commands:

    $ docker node demote <node_name>
    $ docker node rm <node_name>

    Where <node_name> must be replaced by "tpe-node1", "tpe-node2" or "tpe-node3", depending on which node was lost (see the sketch after this procedure for identifying the lost node from the Docker CLI).

  4. Perform the TPE Cluster Discover step by running the following command:

    $ tpe-cluster-discover -i -c '{"hosts": [ {"ip": "<IP address node1>", "hostname":"tpe-node1", "sup_pass": "<support_password>" }, {"ip": "<IP address node2>", "hostname":"tpe-node2", "sup_pass": "<support_password>" }, {"ip": "<IP address node3>", "hostname":"tpe-node3", "sup_pass": "<support_password>" }]}'

    Where:

    • <IP address node1>, <IP address node2> and <IP address node3> must be replaced by the IP addresses of the corresponding nodes.
    • <support_password> must be replaced by the support user password.
  5. Connect to Cockpit GUI (via a running TPE node, not the node under re-installation).

  6. Go to the "TPE Services" menu and, under TPE cluster operations, click on Redeploy cluster.

  7. You are prompted to confirm the cluster redeploy.

  8. Click on Confirm.

  9. Once the Cluster redeploy is done, go to the "TPE Configuration" menu.

  10. Click on Save & Apply.

  11. To finish, go to the "TPE Services" menu and check that all services are in the running state and the three nodes are in the ready state.
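
Referenced from step 3: before demoting and removing the stale reference, the lost node can be identified from a surviving node with the standard Docker Swarm node listing. A minimal sketch (the tpe-node1/2/3 names follow the naming used in this procedure):

    # Run on a surviving TPE node. The lost node typically shows STATUS "Down"
    # and/or MANAGER STATUS "Unreachable"; use its HOSTNAME (tpe-node1,
    # tpe-node2 or tpe-node3) as <node_name> in the demote/rm commands.
    $ docker node ls

Re-running the same command after the cluster redeploy should show all three nodes back in the Ready state, matching the check in step 11.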