Metric Server

The metric server is used in the QHB infrastructure to collect and aggregate metrics and to send them to various consumers:

  • the Graphite monitoring system;
  • the Prometheus time series database;
  • the Vector metric and log collection agent;
  • CSV files, for saving metric data.

The list of QHB metrics and the way they are formed are described in Section QHB Metrics.

The metric server must be installed and configured on each machine running QHB components (the database server itself or QCP). See Chapter Installation for details.

If desired, you can place the metric server aggregator on a separate host. See Section Metric Server Operation in Separate Collector and Aggregator Mode for details.



Metric Server Configuration

The sample configuration file is installed at /etc/metricsd/config-example.yaml.

To run the server, copy the sample file to /etc/metricsd/config.yaml and adjust the necessary parameters. The aggregation → backends section requires special attention. Setting up the metric server for different consumers is described in the corresponding sections below: Graphite, CSV files, Vector, and Prometheus.

To start the server automatically at system boot, enable the corresponding systemd service:

$ sudo systemctl enable --now metricsd.service

The connection between the metric server and QHB can be lost, for example when the metric server service is restarted. In that case the number of incoming metrics decreases and "Failed to open a metric sender" messages appear in the log. To reconnect the metric server to the database, the administrator must either restart QHB or call the following function in QHB:

SELECT metrics_reset();

After executing the command, make sure that the number of incoming metrics has returned to its original volume (recovery should occur within 1 minute).

In the database configuration, change the line

#metrics_collector_path = 'path/to/metrics'

to

metrics_collector_path = '@metrics-collector'

Then restart the database, starting the components in the following order:

  • start metricsd;
  • start the database;
  • start the system for metric receiving (Graphite, Prometheus, etc.).

Metric Server Configuration for Graphite

Graphite installation and configuration are beyond the scope of this documentation. Please refer to the Graphite documentation.

  # Server configuration. At least one server must be configured.
  backends:
    # graphite server configuration
    - graphite:
      # TCP address of the Graphite endpoint for the text protocol. Default port is 2003.
      # Only TCP is supported, so if Graphite does not accept connections
      # on this port, the metric server reports an error.
      address: "graphite:2003"
      # A prefix added to each metric name. Optional; the default is empty string.
      prefix: ""
      # Connection timeout. Optional; default is 30 seconds.
      connection_timeout: "30 sec"
      # Timeout for sending data. Optional; default is 5 seconds.
      send_timeout: "5 sec"

Change the address parameter to the actual address of the Graphite server in your network. It is also recommended to set the prefix parameter to, for example, the name of the machine on which the metric server is running. This prefix is added to all generated metrics.
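
To illustrate what the prefix does to the data Graphite receives, here is a minimal Python sketch of one line in Graphite's text (plaintext) protocol, which has the form "<metric path> <value> <timestamp>". It is not the metric server's code. It assumes that a non-empty prefix is joined to the metric name with a dot; the host name "db-host-1" and the metric name "qhb.example.metric" are made up for the example.

```python
def graphite_line(prefix: str, name: str, value: float, timestamp: int) -> str:
    # Graphite's text protocol expects "<metric path> <value> <timestamp>",
    # one metric per line, terminated by a newline.
    # Assumption: a non-empty prefix is joined to the metric name with a dot.
    path = f"{prefix}.{name}" if prefix else name
    return f"{path} {value} {timestamp}\n"


# "db-host-1" and "qhb.example.metric" are hypothetical names.
line = graphite_line("db-host-1", "qhb.example.metric", 42, 1700000000)
print(line, end="")
```

With a distinct prefix per machine, metrics with the same name coming from different hosts end up under different paths in Graphite.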


Metric Server Configuration for Generating CSV Files

  # Server configuration. At least one server must be configured.
  backends:
    # CSV server process configuration (if you need to copy metric data to CSV files)
    - csv:
      # Directory containing CSV files.
      directory: "/var/lib/qhb/csv_create"
      # QHB instance system identifier.
      qhb_instance: "instance_name"
      # Use the metric name prefix as the instance identifier.
      prefix_as_qhb_instance: true
      # CSV file switching interval (optional).
      rotation_age: "1 h"
      # Time of the next switch after start (optional).
      rotation_offset: 2022-03-01T18:00:00

The csv backend parameters are required only if you want to save metric data to CSV files. The following parameters are used:

  • directory — specifies the CSV files location.

  • qhb_instance — system identifier of the instance; it must be non-empty and must not contain quotes. It is somewhat similar to the prefix parameter, but its value fills a separate column of the metric table when metric data is loaded and is not related to the metric name. It can be useful when multiple QHB instances run on the same server.

  • prefix_as_qhb_instance — whether to use the metric name prefix as the instance identifier. If set to false, the value of the qhb_instance parameter is used as the instance identifier. If set to true, the metric name prefix is used instead, and qhb_instance can be left empty (qhb_instance: ""). This parameter takes priority over qhb_instance: when prefix_as_qhb_instance is true, the metric name prefix is used as the instance identifier even if qhb_instance is set to some value.

    The value true is useful when the metric aggregator processes data coming from multiple QHB instances (see Section Metric Server Operation in Separate Collector and Aggregator Mode). On each collector, set the prefix parameter to a metric name prefix that matches the unique identifier of the instance. The aggregator then creates separate CSV files for each instance. When the files are further processed via QDLM, each file is loaded into a separate partition of the general metric table metric_archive.

    If the metric name prefix is not specified in the prefix parameter on the collector side, the part of the metric name from its beginning to the first dot is erroneously used as the prefix. Therefore, when setting this parameter to true, carefully check that metric name prefixes are set on all collectors whose data is processed by the aggregator. This configuration makes it possible to unify instance names for metric data in Grafana and in the metrics_archive table.

  • rotation_age — CSV file switching period. Units of measurement can be specified, for example, as seconds (s), minutes (m), hours (h), days (d).

  • rotation_offset — the time at which the metric server switches to the next CSV file after it starts. The time is specified in the UTC time zone. This parameter is useful for aligning the start of the next period, for example, with the beginning of the next hour.
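
The interaction between qhb_instance and prefix_as_qhb_instance can be summarized with a short Python sketch. It only mirrors the rules stated in the parameter list above and is not the metric server's implementation. The function name resolve_instance and the metric names are hypothetical, and the sketch assumes that the prefix is the part of the metric name before the first dot.

```python
def resolve_instance(metric_name: str, qhb_instance: str,
                     prefix_as_qhb_instance: bool) -> str:
    """Return the instance identifier written to the CSV data (illustrative only)."""
    if prefix_as_qhb_instance:
        # Takes priority over qhb_instance.
        # Assumption: the prefix is everything before the first dot.
        return metric_name.split(".", 1)[0]
    return qhb_instance


# Collector configured with prefix "node1": the prefix becomes the identifier.
print(resolve_instance("node1.qhb.example.metric", "instance_name", True))
# Collector without a prefix: the first name component is picked up by mistake.
print(resolve_instance("qhb.example.metric", "instance_name", True))
# prefix_as_qhb_instance disabled: qhb_instance is used as is.
print(resolve_instance("node1.qhb.example.metric", "instance_name", False))
```

The second call shows why every collector feeding the aggregator needs an explicit prefix: without one, an unrelated part of the metric name ends up as the instance identifier.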


Metric Server Configuration for Vector

Installation and configuration of the Vector collection agent are beyond the scope of this documentation. Please refer to the Vector documentation.

Example of a Vector data source configuration for the QHB metric server:

sources:
  qhb_metricsd:
    type: "http_server"
    address: "metricsd:80"
    acknowledgements: false
    encoding: "ndjson"

Set the metric server parameters as follows:

  # Server configuration. At least one server must be configured.
  backends:
    # Vector server process configuration
    - vector.dev:
        # URL of a vector listener of type "http_server" configured to receive
        # data in "ndjson" format.
        url: "metricsd:80"

Change the metricsd value in the address and url parameters to the actual address of the metric server in your network.
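
For readers unfamiliar with the "ndjson" encoding named in both configurations, the following Python sketch shows the framing: one JSON object per line, each terminated by a newline. The field names "name" and "value" and the metric names are purely hypothetical; the actual payload sent by the metric server is not described in this section.

```python
import json


def to_ndjson(records):
    # Newline-delimited JSON: one compact JSON object per line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


# Hypothetical records; the real field names used by the metric server may differ.
records = [
    {"name": "qhb.example.metric", "value": 42},
    {"name": "qhb.other.metric", "value": 7},
]
body = to_ndjson(records)
print(body, end="")

# A receiver can decode the body line by line.
decoded = [json.loads(line) for line in body.splitlines()]
```

Because each line is a complete JSON document, the receiver can process records one at a time without parsing the whole request body first.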


Metric Server Configuration for Prometheus

Prometheus installation and configuration are beyond the scope of this documentation. Please refer to the Prometheus documentation.

Example of Prometheus configuration changes for the QHB metric server:

scrape_configs:
  # The job name (job_name) is added as a label `job=<job_name>` to all time series
  # scraped from this configuration.

  - job_name: 'metricsd'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['metricsd:8080']

Set the metric server parameters as follows:

  # Server configuration. At least one server must be configured.
  backends:
    # Prometheus server process configuration:
    - prometheus:
        # The address exposed for scraping by the Prometheus process.
        listening_address: "metricsd:8080"

Change the metricsd value in the targets and listening_address parameters to the actual address of the metric server in your network.

WARNING!
When metrics are sent to Prometheus, the metric server replaces every dot in the metric name with "_".
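
The renaming described in the warning can be pictured with a one-line Python sketch, which is handy when translating a Graphite-style metric name into the name to query in Prometheus. The function name prometheus_name and the metric name are hypothetical, and the sketch covers only the dot replacement stated above.

```python
def prometheus_name(metric_name: str) -> str:
    # Every dot in the metric name becomes an underscore.
    return metric_name.replace(".", "_")


# "qhb.example.metric" is a made-up metric name.
print(prometheus_name("qhb.example.metric"))
```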