Operator management

Actions to manage the operation

1: Pause a recovery plan
2: Recovering from a disaster
3: Workarounds

1 - Pause a recovery plan

How to pause a RecoveryPlan.

Introduction

A RecoveryPlan can be paused to stop all operations in the source and destination clusters.

Requirements

Have access to the management cluster.
Have configured the RecoveryPlan.

Process

1. Pause the RecoveryPlan

Pause the RecoveryPlan using the following patch operation:

kubectl patch recoveryplan <recovery_plan_name> -p '{"spec":{"suspend":true}}' --type=merge
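
The patch can be wrapped in a small shell function so that one command both pauses and resumes a plan. This is a minimal sketch: the function name `set_plan_suspend` is hypothetical, and resuming by setting `spec.suspend` back to `false` is an assumption that is not stated in this guide.

```shell
# Hypothetical helper: sets spec.suspend on a RecoveryPlan to "true" or "false".
# Assumes the current kubectl context and namespace point at the RecoveryPlan.
set_plan_suspend() {
  plan_name="$1"
  suspend="$2"
  # Reject anything that is not a JSON boolean before building the patch.
  case "$suspend" in
    true|false) ;;
    *) echo "suspend must be true or false" >&2; return 1 ;;
  esac
  kubectl patch recoveryplan "$plan_name" --type=merge \
    -p "{\"spec\":{\"suspend\":$suspend}}"
}

# Usage: set_plan_suspend example true
```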

2. Verify the RecoveryPlan status

Get the list of the defined RecoveryPlans:

kubectl get recoveryplan

The SUSPENDED column should show true:

NAME      SUSPENDED   STATUS
example   true        Reconciled
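
For scripted checks, the `spec.suspend` field can be read directly instead of parsing the table. A minimal sketch, assuming the field path is the same one used in the patch command; the function name `plan_is_suspended` is hypothetical.

```shell
# Hypothetical helper: exits 0 when the named RecoveryPlan has spec.suspend set to true.
# Assumes the current kubectl context and namespace point at the RecoveryPlan.
plan_is_suspended() {
  plan_name="$1"
  # jsonpath prints the boolean as the text "true" or "false" (empty if the field is unset).
  suspended="$(kubectl get recoveryplan "$plan_name" -o jsonpath='{.spec.suspend}')"
  [ "$suspended" = "true" ]
}

# Usage: plan_is_suspended example && echo "example is suspended"
```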

3. Verify the containers

You can also check the logs of the containers deployed in the cluster:

kubectl logs example-eventslistener-76c9889466-vrz7w

A log entry indicates that the RecoveryPlan is suspended:

Recovery plan is suspended
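
When the logs are long, the message can be searched for rather than read by eye. A minimal sketch: the function name `pod_reports_suspended` is hypothetical, and it assumes the message text appears in the log line exactly as shown above.

```shell
# Hypothetical helper: exits 0 when the pod's logs contain the suspension message.
pod_reports_suspended() {
  pod_name="$1"
  # grep -q prints nothing; only the exit status reports whether the message was found.
  kubectl logs "$pod_name" | grep -q "Recovery plan is suspended"
}

# Usage: pod_reports_suspended example-eventslistener-76c9889466-vrz7w
```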

2 - Recovering from a disaster

How to recover the platform from a disaster

Introduction

If a disaster occurs, the replicated contents can be recovered by using a RecoveryExecutionJob. Creating it executes every recovery process set in the RecoveryPlan collection.

Requirements

Have access to the destination cluster.
Have configured the RecoveryPlan.

Process

1. Pause the RecoveryPlan

Pause the RecoveryPlan using the following patch operation:

kubectl patch recoveryplan <recovery_plan_name> -p '{"spec":{"suspend":true}}' --type=merge

2. Identify the RecoveryExecutionPlan

Identify the RecoveryExecutionPlan configured in the previous step.

3. Deploy the RecoveryExecutionJob

Create the recoveryexecutionjob.yaml file with the following content:

apiVersion: dr.astronetes.io/v1alpha1
kind: RecoveryExecutionJob
metadata:
  generateName: <recovery_execution_plan_name>
  namespace: <namespace_name>
spec:
  recoveryExecutionPlanRef:
    name: <recovery_execution_plan_name>
    namespace: <namespace_name>

Deploy the RecoveryExecutionJob:

kubectl create -f recoveryexecutionjob.yaml
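
Because the manifest uses `generateName`, the final object name is only known after creation. The sketch below captures it with the `-o name` output option of `kubectl create`; the function name `create_recovery_job` is hypothetical.

```shell
# Hypothetical helper: creates the job from a manifest and prints the created resource.
create_recovery_job() {
  manifest="$1"
  # -o name prints only the created resource in type/name form,
  # including the suffix the API server appends to generateName.
  kubectl create -f "$manifest" -o name
}

# Usage:
#   job="$(create_recovery_job recoveryexecutionjob.yaml)"
#   kubectl get "$job" --namespace "$namespace" -o yaml   # $namespace: the namespace set in the manifest
```

Note that `kubectl create` is needed here: `kubectl apply` does not accept manifests that rely on `generateName`.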

3 - Workarounds

Limitations and workarounds of Astronetes Disaster Recovery Operator.

Immutable parameters

Astronetes Disaster Recovery Operator synchronizes the state between two clusters by creating objects that are missing from the destination cluster, updating objects that already exist, and deleting objects that disappear from the source cluster.

In most situations this behaviour is compatible with immutable parameters. Updating an immutable parameter requires deleting the object that contains it and then recreating it with the updated configuration. Astronetes Disaster Recovery Operator detects the delete event and applies it in the destination cluster before the recreation, automatically. No additional manual steps are needed; the entire pipeline is managed by the Operator.

This assumes that the RecoveryPlan is not paused. The Operator will fail to synchronize in the following situation:

1. A RecoveryPlan is paused in the management cluster.
2. An object selected by that RecoveryPlan is deleted and then recreated with an updated configuration in at least one immutable parameter.
3. The RecoveryPlan from the first step resumes operation.

The RecoveryPlan did not detect the delete event while it was suspended, so the object in the destination cluster was not deleted. Further events carrying the new configuration cannot be applied, because they are treated as an update to an immutable parameter.

In this case, the solution is to manually delete, in the destination cluster, every object with an updated immutable parameter that is selected by the previously suspended RecoveryPlan. After the next resynchronization, the Operator recreates them with the new configuration applied in the source cluster.
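
The manual deletion can be sketched as a small shell function that targets the destination cluster explicitly. The function name `delete_in_destination` is hypothetical, and selecting the destination cluster through a kubectl context is an assumption about how access is configured.

```shell
# Hypothetical helper: deletes one object in the destination cluster so that
# the Operator can recreate it on the next resynchronization.
delete_in_destination() {
  context="$1"     # kubectl context for the destination cluster (assumed setup)
  namespace="$2"
  kind="$3"
  name="$4"
  # --ignore-not-found makes the call safe to repeat if the object is already gone.
  kubectl --context "$context" --namespace "$namespace" \
    delete "$kind" "$name" --ignore-not-found
}

# Usage (all names are placeholders):
#   delete_in_destination destination-cluster default job example-job
```

Run it once for each affected object, and double-check the context name first, since the same command against the source cluster would delete the original object.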