System monitoring (metrics for Prometheus)

In many projects, Prometheus and Grafana are used to monitor the deployed systems. The BPC also exposes a set of metrics that Prometheus can query.

Endpoint and security

The metrics can be queried by Prometheus as usual via the /metrics endpoint (http://localhost:8181/metrics). However, the endpoint can only be accessed via Basic Authentication.

BPC configuration: Basic Authentication

When the BPC is started, the user name and password for Basic Authentication are written to the BPC configuration file [karaf]/etc/de.virtimo.bpc.core.cfg if they are not already present. The user name defaults to virtimo and the password is randomly generated. Both values can be changed as required; the changes take effect after restarting Karaf or just the BPC core bundle.

de.virtimo.bpc.core.cfg
...
de.virtimo.bpc.metrics.basic_auth.username = virtimo
de.virtimo.bpc.metrics.basic_auth.password = 0e6a8052eb814aaea397c45f77bffcf7
...

Use the following quick test to check whether access is possible: curl "http://virtimo:0e6a8052eb814aaea397c45f77bffcf7@localhost:8181/metrics"
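The same check can be scripted, for example as part of a deployment smoke test. The following is a minimal Python sketch using only the standard library; the function names basic_auth_header and fetch_metrics are illustrative and not part of the BPC, and the URL and credentials in the usage comment are the example values from this page.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the HTTP Authorization header for Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_metrics(url: str, username: str, password: str, timeout: float = 5.0) -> str:
    """Return the raw metrics text; urllib raises HTTPError on 401 and other HTTP errors."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires a running BPC; replace the example credentials with your own):
# text = fetch_metrics("http://localhost:8181/metrics", "virtimo", "0e6a8052eb814aaea397c45f77bffcf7")
# print(text[:500])
```

Sending the credentials in a header instead of embedding them in the URL keeps them out of most access logs and shell histories.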

Prometheus configuration: Basic Authentication

To enable Prometheus to retrieve the BPC metrics protected by Basic Authentication, the credentials must be added to the Prometheus scrape configuration. See also the Prometheus configuration documentation.

prometheus.yml
...
  - job_name: 'bpc'
    scrape_interval: 5s
    basic_auth:
      username: virtimo
      password: 0e6a8052eb814aaea397c45f77bffcf7
    static_configs:
      - targets: ['localhost:8181']
        labels:
          alias: bpc1
...

Available metrics

The following metrics are available.

JVM

The standard JVM metrics provided by the Prometheus Java Client.

Backups

Metric Description

bpc_backups_scheduled_jobs

Number of backup jobs to be executed.

Client Sessions

Metric Description

bpc_user_sessions_total

Number of currently active user sessions.

bpc_user_sessions_limit

Maximum number of user sessions that can be active at the same time. This is defined via the BPC license.

bpc_websocket_connections_total

Number of currently open websocket connections.

OpenSearch BPC Plugin

Metric Description

bpc_os_bpc_plugin_status_websocket

Status of the websocket connection from Karaf to the OpenSearch Plugin:

  • 0 = Disconnected

  • 1 = Connected

  • 2 = Disconnected due to error

bpc_os_bpc_plugin_status_plugin

Status of the HTTP/HTTPS connection from Karaf to the OpenSearch plugin:

  • 0 = Unknown (no call has been made yet)

  • 1 = Callable

  • 2 = Cannot be reached (reason could be that the plugin is not installed)

  • 3 = Errors occurred during the call
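When these status gauges are processed outside of Prometheus (for example in a health-check script), it can help to translate the numeric values into names. The following Python sketch mirrors the value lists above; the class and function names are illustrative and not part of the BPC.

```python
from enum import IntEnum
from typing import Type


class WebsocketStatus(IntEnum):
    """Values of bpc_os_bpc_plugin_status_websocket."""
    DISCONNECTED = 0
    CONNECTED = 1
    DISCONNECTED_DUE_TO_ERROR = 2


class PluginStatus(IntEnum):
    """Values of bpc_os_bpc_plugin_status_plugin."""
    UNKNOWN = 0
    CALLABLE = 1
    UNREACHABLE = 2
    CALL_ERROR = 3


def describe(metric_value: float, status_type: Type[IntEnum]) -> str:
    """Translate a scraped gauge value (a float in Prometheus) into a status name.

    Raises ValueError for values that are not listed in the documentation.
    """
    return status_type(int(metric_value)).name
```

For example, describe(2.0, PluginStatus) yields "UNREACHABLE", which hints that the plugin may not be installed.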

General

Metric Description

bpc_jvm_uptime_in_milliseconds

Uptime of the BPC JVM in milliseconds.

bpc_maintenance_mode_enabled

Status of the maintenance mode:

  • 1 = enabled

  • 0 = disabled

bpc_license_status

Status of the BPC license:

  • 0 = None available

  • 1 = Valid

  • 2 = Expired

bpc_license_expires_in_days

Number of days until the license expires.

bpc_license_expires_on_utc_timestamp_millis

UTC timestamp (in milliseconds) at which the license expires.

bpc_modules_state

Status of all BPC modules:

  • 1 = Active

  • 0 = Error

The status is Error (0) if at least one BPC module is in the Karaf state 'Resolved' or 'Failure'.
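To illustrate how the license metrics can be evaluated, the following Python sketch parses scraped metrics text and converts the expiry timestamp into a date. It assumes that these metrics are exposed as plain samples without labels in the Prometheus text format; the function names are illustrative, and any sample values you feed in for testing are your own.

```python
from datetime import datetime, timezone
from typing import Dict

# Meaning of the bpc_license_status values as listed above.
LICENSE_STATUS = {0: "None available", 1: "Valid", 2: "Expired"}


def parse_simple_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled samples ("name value") from Prometheus text exposition output."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, HELP/TYPE comments, and labelled samples.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def license_expiry(samples: Dict[str, float]) -> datetime:
    """Convert bpc_license_expires_on_utc_timestamp_millis into an aware UTC datetime."""
    millis = samples["bpc_license_expires_on_utc_timestamp_millis"]
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def license_status(samples: Dict[str, float]) -> str:
    """Return the documented meaning of bpc_license_status."""
    return LICENSE_STATUS[int(samples["bpc_license_status"])]
```

For production use, an established parser such as the one in the Prometheus Python client is likely a better choice than this hand-written one, since it also handles labels and escaping.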

Replication

Metric Description

bpc_replication_number_of_jobs

Number of all replication jobs.

bpc_replication_number_of_enabled_jobs

Number of enabled replication jobs.

bpc_replication_number_of_disabled_jobs

Number of disabled replication jobs.

bpc_replication_number_of_jobs_with_errors

Number of replication jobs with errors.

Database Connection Pool

Metric Description

karaf_dbcp2_pool_connections_active

Number of active pool connections.

karaf_dbcp2_pool_connections_waiters

Number of threads waiting for a free pool connection.

karaf_dbcp2_pool_connections_idle

Number of unused pool connections.

karaf_dbcp2_pool_connections_max_total

Maximum total number of pool connections.

karaf_dbcp2_pool_connections_borrowed

Number of borrowed pool connections.

karaf_dbcp2_pool_connections_returned

Number of returned pool connections.

karaf_dbcp2_pool_connections_mean_borrow_wait_time

Average wait time in milliseconds for a pool connection.

karaf_dbcp2_pool_connections_max_borrow_wait_time

Maximum wait time in milliseconds for a pool connection.
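The pool metrics are most useful in combination. As a hedged sketch, the following Python function derives a utilization ratio from karaf_dbcp2_pool_connections_active and karaf_dbcp2_pool_connections_max_total. It assumes the usual Commons DBCP2 convention that a negative maximum means "no limit"; the function name is illustrative.

```python
from typing import Optional


def pool_utilization(active: float, max_total: float) -> Optional[float]:
    """Return active / max_total as a ratio between 0 and 1.

    Returns None when max_total is zero or negative, because no meaningful
    ratio exists (a negative value is assumed to mean an unlimited pool).
    """
    if max_total <= 0:
        return None
    return active / max_total
```

A ratio that stays close to 1 while karaf_dbcp2_pool_connections_waiters is above 0 suggests that the pool is too small for the current load.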

BPC modules

Metric Description

bpc_module

Information about BPC modules such as bundle ID, name, symbolic name, file name, version, build info and status.

The status can have the following values:

  • 1 = Active

  • 0 = Error

The status is set to 0 (error) if the Karaf status of the BPC module is 'RESOLVED' or 'FAILURE'.

