Architecture

Astronetes Resiliency Operator architecture

1 - Overview

Resiliency Operator architecture

Resiliency Operator acts as the orchestrator that sets up and manages the resiliency of Cloud Native platforms, automating processes and synchronizing data and configurations across multiple technologies.

It is built on a set of plugins that make it possible to integrate many technologies and managed services into the resiliency framework.

Key concepts

Assets

Platforms, technologies and services, such as Kubernetes clusters and databases, can be linked to the Resiliency Operator to be included in the resiliency framework.

Synchronizations

The synchronization of data and configurations can be configured according to the platform requirements.

| Synchronization Name | Description |
|---|---|
| Synchronization | Synchronize data and configurations only once. |
| SynchronizationPlan | Synchronize data and configurations based on a scheduled period. |
| LiveSynchronization | Real-time synchronization of data and configurations. |

Automation

The Resiliency Operator allows the automation of tasks to be executed when an incident or a disaster occurs.

2 - Components

Resiliency Operator Components

Astronetes Resiliency Operator is software that can be deployed on Kubernetes-based clusters. It is composed of a set of controllers that automate and orchestrate the resiliency of Cloud Native platforms.

Operator

| Controller | Description |
|---|---|
| Bucket | Orchestrate the Bucket objects. |
| Database | Orchestrate the Database objects. |
| Kubernetes Cluster | Orchestrate the KubernetesCluster objects. |
| Live Synchronization | Orchestrate the LiveSynchronization objects. |
| Synchronization Plan | Orchestrate the SynchronizationPlan objects. |
| Synchronization | Orchestrate the Synchronization objects. |
| Task Run | Orchestrate the TaskRun objects. |
| Task | Orchestrate the Task objects. |

3 - Observability

Metrics and alerting for Astronetes

Astronetes provides monitoring capabilities by exposing various performance and operational metrics. These metrics give you insight into the system’s health, performance, and behavior, so that you can take proactive measures to maintain system stability.

Metrics

The metrics are exposed in Prometheus format, a widely adopted open-source standard for monitoring. This format enables seamless integration with Prometheus-based monitoring solutions.

Assets by status

The status of each asset managed by the operator: KubernetesClusters, Buckets and Databases.

Prometheus metric: astronetes_asset_status.

Status values: Ready, Progressing, Terminating, Unknown or Failed.

Synchronizations by status

The status of each synchronization object: Synchronization, SynchronizationPlan and LiveSynchronization.

Prometheus metric: astronetes_synchronization_status.

Status values: Ready, Progressing, Terminating, Unknown or Failed.

Total synchronized objects by status

The count of synchronized objects by status.

Prometheus metric: astronetes_total_synchronized_objects.

Status values: Sync, OutOfSync or Unknown.

Alerts

Based on the exposed metrics, alerting can be configured using PrometheusRules, a widely adopted open-source format for defining alerts. This format enables seamless integration with Prometheus-based monitoring solutions.

Platform alerts

The following alerts report a possible issue with the platform.

| Alert Name | Description | Severity | Duration |
|---|---|---|---|
| AssetFailure | At least one asset is failing | critical | 5 minutes |
| SynchronizationFailure | At least one synchronization is failing | critical | 5 minutes |
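
As an illustration of how the platform alerts above could be expressed, here is a minimal PrometheusRule sketch. It is not the rule set shipped with the product. The resource name, namespace, group name, the `status` label, and the assumption that each metric reports a positive value for the current status are all hypothetical and should be checked against the metrics actually exposed.

```yaml
# Sketch only: names, the "status" label and the expressions are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: astronetes-platform-alerts   # hypothetical name
  namespace: monitoring              # hypothetical namespace
spec:
  groups:
    - name: astronetes.platform      # hypothetical group name
      rules:
        - alert: AssetFailure
          # Assumes a "status" label whose series is > 0 for the current status.
          expr: 'sum(astronetes_asset_status{status="Failed"}) > 0'
          for: 5m                    # matches the 5 minute duration in the table
          labels:
            severity: critical
          annotations:
            summary: At least one asset is failing
        - alert: SynchronizationFailure
          expr: 'sum(astronetes_synchronization_status{status="Failed"}) > 0'
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: At least one synchronization is failing
```

The `for` field keeps an alert pending until its condition has held for the whole duration, so a short-lived failure does not fire it.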

Application alerts

The following alerts report a possible issue with the objects configured to be synchronized. These alerts are usually related to application issues.

| Alert Name | Description | Severity | Duration |
|---|---|---|---|
| SynchronizationNotInSync | There are synchronization items out of sync | warning | 1 hour |
| WriteOperationsFailed | One or more write operations failed | warning | 1 hour |
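
A warning-level alert of this kind might be sketched as the following PrometheusRule. This is an assumption-laden example, not the product's own definition: the resource name, namespace, group name and the `status` label on `astronetes_total_synchronized_objects` are hypothetical. WriteOperationsFailed is left out because this page does not name a metric for it.

```yaml
# Sketch only: names and the "status" label are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: astronetes-application-alerts   # hypothetical name
  namespace: monitoring                 # hypothetical namespace
spec:
  groups:
    - name: astronetes.applications     # hypothetical group name
      rules:
        - alert: SynchronizationNotInSync
          # Fires when any objects have been counted as OutOfSync for a full hour.
          expr: 'sum(astronetes_total_synchronized_objects{status="OutOfSync"}) > 0'
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: There are synchronization items out of sync
```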

4 - Audit

Parameters built into Resiliency Operator to track when a change was made and who made it

Auditing and version control are important when configuring resources. Knowing when a change was made and which account applied it can be decisive when investigating an issue or a misconfiguration.

Audit annotations

The following annotations are attached to every Resiliency Operator custom resource:

apiVersion: automation.astronetes.io/v1alpha1
kind: LiveSynchronization
metadata:
  annotations:
    audit.astronetes.io/last-update-time: "<date>"         # Time at which the last update was applied.
    audit.astronetes.io/last-update-user-uid: "<uid-hash>" # Hash representing the Unique Identifier of the user that applied the change.
    audit.astronetes.io/last-update-username: "<username>" # Human readable name of the user that applied the change. 

Example:

apiVersion: automation.astronetes.io/v1alpha1
kind: LiveSynchronization
metadata:
  annotations:
    audit.astronetes.io/last-update-time: "2024-02-09T14:05:30.67520525Z"
    audit.astronetes.io/last-update-user-uid: "b3fd2a87-0547-4ff7-a49f-cce903cc2b61"
    audit.astronetes.io/last-update-username: system:serviceaccount:preproduction:microservice1

These annotations are updated only when a change to the fields .spec, .labels or .annotations is detected. Status modifications by the operator are not recorded.