Metric Server Operation in Separate Collector and Aggregator Mode

The metric server consists of two components:

  • Collector. This component is responsible for direct metric exchange between applications (for example, the «Quantum Hybrid Base» DBMS) and the metric server. Metrics are sent through shared memory managed by the metric server. The collector does not process metrics; it only collects them from multiple sources and then forwards them to the second component, the aggregator.
  • Aggregator. As the name suggests, this component aggregates, i.e., accumulates metrics and performs statistical calculations: averaging, percentiles, maximum values, and so on. It also sends the aggregated metrics onward, for example to Graphite, or writes them to a CSV file.
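The aggregator's statistical step can be pictured with a short sketch (Python is used purely for illustration; the function and field names below are invented and are not part of metricsd):

```python
# Illustrative sketch of the aggregation step (not metricsd code):
# given a window of raw samples, compute the average, the maximum,
# and a set of percentiles, as the aggregator does before forwarding.

def aggregate(samples, percentiles=(50, 90, 95, 99)):
    """Return the summary statistics an aggregator would emit downstream."""
    ordered = sorted(samples)
    n = len(ordered)
    summary = {
        "avg": sum(ordered) / n,
        "max": ordered[-1],
    }
    for p in percentiles:
        # Nearest-rank percentile: the value below which roughly p% of samples fall.
        rank = max(0, min(n - 1, round(p / 100 * n) - 1))
        summary[f"p{p}"] = ordered[rank]
    return summary

print(aggregate(range(1, 101)))
# → {'avg': 50.5, 'max': 100, 'p50': 50, 'p90': 90, 'p95': 95, 'p99': 99}
```

Only these compact summaries, not the raw samples, travel to the backend, which is why aggregation is cheap compared to collection.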

By default, the metric server runs in a mode that combines both the collector and the aggregator. This architecture minimizes the cost of transferring data, since the exchange happens entirely in RAM.

However, you can run the metric server in a mode where only one of the components is active, and the components can then be distributed across different physical devices. In this case, the collected metrics are sent to the aggregator over UDP. Note that UDP does not guarantee packet delivery; contact your system administrator to assess the risks. This issue is beyond the scope of this guide.
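The collector-to-aggregator hop can be pictured as plain UDP datagrams (a minimal sketch; the payload format shown is invented for illustration and is not the real metricsd wire format):

```python
import socket

# Minimal sketch of the collector -> aggregator UDP hop. The payload below is
# invented for illustration; it is NOT the real metricsd wire format.
# UDP is fire-and-forget: if a datagram is lost, the sender never notices.

# "Aggregator" side: bind a UDP socket, as listen_address does.
aggregator = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
aggregator.bind(("127.0.0.1", 0))          # ephemeral port for the demo
addr = aggregator.getsockname()

# "Collector" side: send one batch of metrics as a single datagram,
# as configured by aggregator_address.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"db.queries.latency_ms:42|db.connections:17"
assert len(payload) <= 1024                # stay within udp_payload
collector.sendto(payload, addr)

data, _ = aggregator.recvfrom(1024)
print(data.decode())
collector.close()
aggregator.close()
```

On the loopback interface delivery is effectively reliable; across a real network, loss and fragmentation become possible, which is why the udp_payload setting discussed below matters.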

WARNING!
The metric server in collector mode must still run on the same host as the application being analyzed, since metrics are exchanged through shared memory. The collector cannot be moved to a separate host.



Possible Reasons for Moving the Metric Server Aggregator to a Separate Host

Separating the collector and the aggregator is not required, but in some cases it can be useful, namely:

  • The final recipient of the aggregated metrics (e.g., Graphite) may be unreachable from the host where the analyzed application is running.
  • The analyzed application is distributed across several hosts, and you want to aggregate metrics across all hosts as a whole, without distinguishing which host produced them.
  • To reduce the load on the host where the metric server is running. Note, however, that metric aggregation consumes few resources, so the benefit of this optimization is questionable.

This list is not exhaustive; it is intended to illustrate possible use cases.



Setting Up a Metric Server Aggregator on a Separate Host

You can start the metric server in collector mode in one of the following ways, listed from most to least preferable:

  • Remove or comment out the entire aggregation: section in the /etc/metricsd/config.yaml configuration file.
  • Run /usr/bin/metricsd --only-collector --config /etc/metricsd/config.yaml, where /etc/metricsd/config.yaml is the path to the configuration file.
  • Create a drop-in file /etc/systemd/system/metricsd.service.d/override.conf (instead of override.conf you can choose any other name ending in .conf) with the following content:
[Service]
ExecStart=
ExecStart=/usr/bin/metricsd --only-collector --config /etc/metricsd/config.yaml
  • By modifying the metricsd.service file and adding the --only-collector flag to the metricsd invocation in the ExecStart= line. This method is not recommended; use it only if you know exactly what you are doing, and at your own risk.

In this setup, the collector is simply a channel that transmits metrics to the aggregator on another host. The aggregator, in turn, is configured to connect to the corresponding backend (Prometheus, Graphite, etc.).

The metric server that will operate in aggregator mode is installed on a separate server; see the Installation chapter.

You can start the metric server in aggregator mode in one of the following ways, listed from most to least preferable:

  • Remove or comment out the entire collection: section in the /etc/metricsd/config.yaml configuration file.
  • Run /usr/bin/metricsd --only-aggregator --config /etc/metricsd/config.yaml, where /etc/metricsd/config.yaml is the path to the configuration file.
  • Create a drop-in file /etc/systemd/system/metricsd.service.d/override.conf (instead of override.conf you can choose any other name ending in .conf) with the following content:
[Service]
ExecStart=
ExecStart=/usr/bin/metricsd --only-aggregator --config /etc/metricsd/config.yaml
  • By modifying the metricsd.service file and adding the --only-aggregator flag to the metricsd invocation in the ExecStart= line. This method is not recommended; use it only if you know exactly what you are doing, and at your own risk.

    IMPORTANT!
    Make sure that:

    • the "aggregation" section of the configuration of the metric server running in aggregator mode specifies a listen_address that is reachable from the metric server running in collector mode;
    • the "collection" section of the configuration of the metric server running in collector mode specifies the correct aggregator_address, matching the address from the previous point;
    • to avoid fragmentation of UDP packets, the "collection" section of the configuration of the metric server running in collector mode specifies an appropriate udp_payload packet size (contact your system or network administrator to determine it);
    • your network administrator has not disabled UDP traffic on the network.

Network configuration details are beyond the scope of this guide. If you have any questions, please contact your system and/or network administrator.
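As a rough starting point for udp_payload, subtract the IPv4 and UDP header sizes from the path MTU (a sketch under baseline assumptions; IP options, tunnels, or VPNs reduce the budget further, so verify real numbers with your network administrator):

```python
# Rough sizing of udp_payload to avoid IP-level fragmentation.
# A UDP datagram fits in one IPv4 packet when the payload does not exceed
# MTU - 20 (IPv4 header) - 8 (UDP header). These are the minimal header
# sizes; tunnels and IP options shrink the budget further.

IPV4_HEADER = 20
UDP_HEADER = 8

def max_udp_payload(mtu: int) -> int:
    return mtu - IPV4_HEADER - UDP_HEADER

print(max_udp_payload(1500))   # standard Ethernet MTU -> 1472
print(max_udp_payload(1400))   # conservative tunnel/VPN MTU -> 1372
```

The default udp_payload of 1024 (see the collector configuration example below) already fits comfortably within a standard 1500-byte MTU.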


Example of the Collector Configuration File

# Verbosity level on the server.
verbosity: "warn"

# Metric collection configuration.
#
# If you don't want to start metric collection, you can skip this section.
collection:
  # The address and port (UDP) of the metric aggregator.
  # Optional if metric aggregation is performed on the same server instance;
  # otherwise required.
  aggregator_address: "127.0.0.1:5400"

  # Unix address of the collector's handshake server.
  # The `@` symbol at the beginning means an abstract socket (i.e. no socket is
  # created in the file system).
  bind_addr: "@metrics-collector"

  # The path where the SPSC queues are created.
  queue_path: "/metrics-collector-queues"

  # Queue capacity.
  # Optional. Default is 1024.
  queue_capacity: 1024

  # The number of elements after which the shared counters are updated.
  # Optional. Default is 10.
  batch_size: 10

  # Number of processing threads.
  # Optional. Default is 2.
  threads: 2

  # UDP payload size.
  # Optional. Default is 1024.
  udp_payload: 1024

  # Maximum interval between sends.
  # Optional. Default is "1s" (1 sec).
  send_interval: "1s"

  # Number of worker process threads.
  # Optional. Default is 2.
  workers: 2

Example of the Aggregator Configuration File

# Verbosity level on the server.
verbosity: "info"

# Metric aggregation configuration.
#
# If you don't want to start metric aggregation, you can skip this section.
aggregation:
  # Aggregator listening address
  # Optional if metrics are collected on the same server instance; otherwise required.
  listen_address: "0.0.0.0:5400"

  # Interval between aggregations.
  send_interval: "10s"

  # List of percentiles to use for sync values. Default is [50, 90, 95, 99, 999].
  # Values greater than 1000 are not supported.
  percentiles_list: [50, 90, 95, 99, 999]

  # The lifetime of gauge values. Optional; default is 5 minutes.
  gauge_lifetime: "5min"

  # Backend configuration. At least one backend must be configured.
  backends:
    # Graphite server configuration.
    - graphite:
      # TCP Graphite endpoint address for the plaintext protocol. Default port is 2003.
      # Only TCP is supported, so if Graphite is not expecting data on this
      # port, you will get an error!
      address: "Address:2003"
      # Prefix to start all metric names with. Optional; default is empty string.
      prefix: "Address"
      # Maximum connection timeout. Optional; default is 30 seconds.
      connection_timeout: "30 sec"
      # Maximum timeout for data sending. Optional; default is 5 seconds.
      send_timeout: "5 sec"
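The Graphite backend speaks the plaintext protocol over TCP: one line per data point, consisting of the metric path, the value, and a Unix timestamp, terminated by a newline. A minimal sketch of such a line (the metric name, value, and timestamp below are made up for illustration):

```python
# Sketch of a Graphite plaintext-protocol line: "<path> <value> <unix-time>\n".
# This is the format the aggregator's graphite backend writes over TCP to the
# configured address; the metric name and values below are invented.

def graphite_line(path: str, value: float, timestamp: int, prefix: str = "") -> str:
    """Format one Graphite plaintext data point, honoring an optional prefix."""
    full_path = f"{prefix}.{path}" if prefix else path
    return f"{full_path} {value} {timestamp}\n"

line = graphite_line("db.queries.latency_ms.p99", 42.0, 1700000000, prefix="prod")
print(line, end="")
# → prod.db.queries.latency_ms.p99 42.0 1700000000
```

The prefix option corresponds to the prefix setting above: it is prepended to every metric name before the line is sent.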