Configure
1 - Integrating with Grafana
Introduction
The Resiliency Operator exports metrics in Prometheus format that can be visualized using custom Grafana dashboards.
Prerequisites
- Prometheus installed in the Kubernetes cluster
- Grafana configured to access Prometheus
Process
1. Import dashboard for Assets
Access Grafana and navigate to Home > Dashboards > Import.
Set the dashboard URL to https://astronetes.io/deploy/resiliency-operator/v1.4.0/grafana-dashboard-assets.json and click Load.
Configure the import options and click the Import button to complete the process.
2. Import dashboard for Synchronizations
Access Grafana and navigate to Home > Dashboards > Import.
Set the dashboard URL to https://astronetes.io/deploy/resiliency-operator/v1.4.0/grafana-dashboard-synchronizations.json and click Load.
Configure the import options and click the Import button to complete the process.
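If you prefer to script these imports rather than use the UI, the dashboards can also be created through Grafana's HTTP API. A minimal sketch, assuming a service account token in GRAFANA_TOKEN and the Grafana base URL in GRAFANA_URL (both variable names are illustrative), and that the release JSON is a plain dashboard model:
# Download the dashboard JSON from the release
curl -fsSL https://astronetes.io/deploy/resiliency-operator/v1.4.0/grafana-dashboard-assets.json -o dashboard.json
# Create or overwrite the dashboard through the Grafana HTTP API
curl -X POST "$GRAFANA_URL/api/dashboards/db" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
Repeat the same steps with grafana-dashboard-synchronizations.json for the second dashboard.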
2 - Integrating with Grafana Operator
Introduction
The Resiliency Operator exports metrics in Prometheus format that can be visualized using custom Grafana dashboards.
Prerequisites
- Prometheus installed in the Kubernetes cluster
- Grafana Operator installed in the cluster and configured to access Prometheus
Process
1. Create the GrafanaDashboard for Assets
Create the GrafanaDashboard for Assets from the release manifests:
kubectl apply -f https://astronetes.io/deploy/resiliency-operator/v1.4.0/dashboard-assets.yaml
2. Create the GrafanaDashboard for Synchronizations
Create the GrafanaDashboard for Synchronizations from the release manifests:
kubectl apply -f https://astronetes.io/deploy/resiliency-operator/v1.4.0/dashboard-synchronizations.yaml
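To verify that both dashboards were created, you can list the GrafanaDashboard resources; the exact namespace depends on where the release manifests place them:
kubectl get grafanadashboards --all-namespaces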
3 - Integrating with OpenShift Alerting
Introduction
OpenShift allows the creation of alerts based on Prometheus metrics to provide additional information about the functioning and status of the Astronetes operator.
Prerequisites
- Access Requirement: cluster-admin access to the OpenShift cluster
Configure alerts
Two types of alerts are provided: one set for monitoring the operator’s integration within the cluster and another for monitoring synchronizations.
Platform alerts
These alerts are based on metrics that assess the functionality of the integration between the product and the assets.
Apply these rules:
oc apply -f https://astronetes.io/deploy/resiliency-operator/v1.4.0/alert-rules-resiliency-operator.yaml
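You can confirm that the rules were created by listing the PrometheusRule resources; the rule name and namespace are defined in the release manifest:
oc get prometheusrules --all-namespaces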
Synchronization alerts
Metrics are employed to assess the status of synchronized objects.
To configure these rules, follow these steps:
- Create this PrometheusRule manifest:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: failed-synchronize-items
  namespace: <your-synchronization-namespace>
spec:
  groups:
  - name: synchronization-alerts
    rules:
    - alert: SynchronizationNotInSync
      annotations:
        summary: "There are synchronization items not in sync."
        description: "Synchronization {{ $labels.synchronizationName }} is out of sync in namespace {{ $labels.synchronizationNamespace }}"
      expr: astronetes_total_synchronized_objects{objectStatus!="Sync"} > 0
      for: 1h
      labels:
        severity: warning
    - alert: WriteOperationsFailed
      annotations:
        summary: "One or more write operations have failed."
        description: "Synchronization {{ $labels.synchronizationName }} has failed write operations in namespace {{ $labels.synchronizationNamespace }}"
      expr: astronetes_total_write_operations{writeStatus="failed"} > 0
      for: 1h
      labels:
        severity: warning
Edit namespace: use the namespace where the Synchronizations are deployed.
Apply the rule:
kubectl apply -f <path-to-your-modified-yaml-file>.yaml
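You can then confirm the rule exists, using the same namespace you set in the manifest:
kubectl -n <your-synchronization-namespace> get prometheusrule failed-synchronize-items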
How to configure custom alerts
Prometheus exposes a rich set of metrics that you can use to monitor the status of your cluster and the functionality of the operator by creating customized alert rules.
The PrometheusRule should be created in the same namespace as the process that generates these metrics to ensure proper functionality and visibility.
Here is an example of a PrometheusRule YAML file:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: <alert-name>
  namespace: <namespace>
spec:
  groups:
  - name: <group-name>
    rules:
    - alert: <alert-rule-name>
      annotations:
        description: <description>
        summary: <summary>
      expr: <expression>
      for: <duration>
      labels:
        severity: <severity-level>
Field Value Descriptions
In the PrometheusRule YAML file, several fields are essential for defining your alerting rules. Below is a table describing the values that can be used for each field:
| Field | Description | Example Values |
|---|---|---|
| alert | Specifies the name of the alert that will be triggered. It should be descriptive. | AssetFailure, HighCPUUsage, MemoryThresholdExceeded |
| for | Defines the duration for which the condition must be true before the alert triggers. | 5m, 1h, 30s |
| severity | Indicates the criticality of the alert. Helps prioritize alerts. | critical, warning, info |
| expr | The Prometheus expression (in PromQL) that determines the alerting condition based on metrics. | sum(rate(http_requests_total[5m])) > 100, node_memory_usage > 90 |
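Putting the template and the field values together, a filled-in rule might look like the following; the alert name, group name, and annotations are illustrative only, and the expression reuses the example from the table above:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-request-rate
  namespace: <namespace>
spec:
  groups:
  - name: custom-alerts
    rules:
    - alert: HighRequestRate
      annotations:
        summary: "HTTP request rate is above 100 req/s."
        description: "The aggregated HTTP request rate has stayed above 100 req/s for 5 minutes."
      expr: sum(rate(http_requests_total[5m])) > 100
      for: 5m
      labels:
        severity: warning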
Apply to the cluster
Create the new PrometheusRule in the cluster:
oc apply -f <path-to-your-prometheus-rule-file>.yaml
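You can then verify that the rule was registered:
oc get prometheusrule <alert-name> -n <namespace>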
Checking alerts
1. Access the OpenShift Web Console:
- Open your browser and go to the OpenShift web console URL.
- Log in with your credentials.
2. Navigate to Observe:
- In the OpenShift console, go to the Observe section from the main menu.
- In the Alerts tab, you’ll find a list of active and silenced alerts.
- Check for any alerts triggered by the custom rules you created in Prometheus.
- You can also view the entire list of configured alerting rules.
3. Filter Custom Alerts:
- To filter the custom alerts, use the source field and set its value to user. This displays only the alerts generated from user-defined rules. See the OpenShift documentation for details on filtering.
4 - Update license key
There is no need to reinstall the operator when updating the license key.
Process
1. Update the license key
Update the Kubernetes Secret that stores the license key with the new license:
With kubectl:
kubectl -n resiliency-operator apply -f new-license-key.yaml
With oc:
oc -n resiliency-operator apply -f new-license-key.yaml
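If you need a reference for new-license-key.yaml, it is a standard Kubernetes Secret. A minimal sketch, assuming the Secret name and data key below (both hypothetical; keep whatever names your original installation used):
apiVersion: v1
kind: Secret
metadata:
  name: resiliency-operator-license   # hypothetical; must match the existing Secret name
  namespace: resiliency-operator
type: Opaque
stringData:
  license.key: <your-new-license-key>   # hypothetical key name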
2. Restart the Resiliency Operator
Restart the Resiliency Operator Deployment to apply the new license:
With kubectl:
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-bucket-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-database-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-kubernetescluster-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-livesynchronization-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-synchronization-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-synchronizationplan-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-task-controller
kubectl -n resiliency-operator rollout restart deployment resiliency-operator-taskrun-controller
With oc:
oc -n resiliency-operator rollout restart deployment resiliency-operator-bucket-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-database-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-kubernetescluster-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-livesynchronization-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-synchronization-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-synchronizationplan-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-task-controller
oc -n resiliency-operator rollout restart deployment resiliency-operator-taskrun-controller
3. Wait for the Pods restart
Wait a couple of minutes until all the Resiliency Operator Pods are restarted with the new license.
With kubectl:
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-bucket-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-database-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-kubernetescluster-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-livesynchronization-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-synchronization-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-synchronizationplan-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-task-controller
kubectl -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-taskrun-controller
With oc:
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-bucket-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-database-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-kubernetescluster-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-livesynchronization-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-synchronization-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-synchronizationplan-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-task-controller
oc -n resiliency-operator wait --for=condition=available deployment/resiliency-operator-taskrun-controller
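Once the wait commands return, you can confirm that all Pods are running with the new license:
kubectl -n resiliency-operator get pods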