QDLM: Loading Metric Data into a Database
The QDLM (Quantum Data Loader for Metrics) module loads metric data into the database. It is built on the direct data loading feature (a separate QDL installation is not required for QDLM to operate). Direct data loading bypasses the standard insertion mechanism and can reduce loading time severalfold. Note, however, that this type of loading does not write data to the standard WAL files, so the data will not appear when the database is restored from a backup and the WAL is replayed. With incremental backup, on the other hand, this data is copied automatically.
Setting up Loading of Metric Data into a Database Table
Currently, you can set up both local processing of metric CSV files on the same QHB instance and centralized processing of CSV files from multiple sources. In the second case, either several metric servers running against several QHB instances on the same host write files to the CSV file directory, or CSV files are copied from remote hosts using any operating-system tools. A combination of these options is also possible.
Installing QDLM
Installation of QDLM is similar to other QHB modules. See Step-by-Step Guides for Initial Download, Installation and Launch for details.
Setting Parameters in the QDLM Parameter File
In the /etc/qdlm/config.yml file, specify values for the following parameters:

```yaml
# Maximum period for rotating CSV files (in minutes) coming from different
# metric servers. It is used both to pick up previously generated CSV files
# that could have been skipped during processing and to skip files that may
# not have been fully written yet.
max_rotation_age: 60
# CSV file directory
csv_directory: "/var/lib/qhb/csv_files"
# database connection parameters
qhb_connection: "host=localhost port=5432 user=qhb dbname=qhb"
# file containing direct data loading parameters
qdl_config: "/etc/qdlm/qdl.yml"
# data directory
qhb_data: "/var/lib/qhb/data"
```
Connection parameters are described in Section Connection Strings. A distinctive feature of QDLM is that it can only connect to a database on the local host. In the current release this is because the generated table files are copied directly to the database directory on the local file system. If you want to use a separate database for metric data (for example, metrics) and a separate user (for example, metrics_user), you can execute the following commands from psql as a superuser:
```sql
CREATE DATABASE metrics;
CREATE USER metrics_user WITH LOGIN;
GRANT ALL ON DATABASE metrics TO metrics_user;
\c metrics
GRANT USAGE ON SCHEMA pg_toast TO metrics_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA pg_toast TO metrics_user;
```
After this, specify the corresponding user name and database in the connection string:
```yaml
# database connection parameters
qhb_connection: "host=localhost port=5432 user=metrics_user dbname=metrics"
```
File Containing Parameters for Metric Data Direct Load
A pre-configured file containing parameters for the direct load of metric data is copied during QDLM installation and does not require changes. Its default path is /etc/qdlm/qdl.yml.
Running QDLM
The metric data direct load is started using the following command:

```shell
qdlm load -c /etc/qdlm/config.yml
```

Starting the metric data direct load with the -D /<PATH>/<QHB-DATA> parameter overrides the data directory.
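For example, the two options above can be combined to point QDLM at a specific data directory. The path here is the qhb_data value from the sample configuration; adjust it to your installation:

```shell
# Run the direct load with an explicit config file and an overridden data directory
qdlm load -c /etc/qdlm/config.yml -D /var/lib/qhb/data
```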
When launched, the utility checks the directory specified in the csv_directory parameter and starts processing the CSV files that were last updated at least max_rotation_age minutes ago. Thanks to this limit, new files that may still be written to are not processed. If files from several metric servers are written to the directory, the administrator should set max_rotation_age to the maximum of the rotation_age values of those metric servers.

After processing the accumulated files, the utility becomes idle until the recording of the next file in the directory finishes, after which the new file is processed and its data is written to a new partition of the metric_archive table. In the current release the table schema is predefined and cannot be changed. Next, a check is performed for files whose write events could have been missed (this is possible, for example, under high system load). If such files are found, they are processed, and the utility again becomes idle until the next CSV file write completes. Processed CSV files are moved to the qdlm_swap subdirectory of the directory specified in csv_directory; later they can be deleted or moved to an archive.
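The age threshold described above can be illustrated with standard tools. The sketch below is not part of QDLM (its actual selection logic is internal); assuming GNU coreutils and findutils, it selects CSV files last modified more than 60 minutes ago, mirroring a max_rotation_age of 60:

```shell
# Create a sample CSV directory with one "settled" file and one fresh file.
CSV_DIR=$(mktemp -d)
touch -d '2 hours ago' "$CSV_DIR/old_metrics.csv"   # old enough to be processed
touch "$CSV_DIR/fresh_metrics.csv"                  # may still be written to; skipped
# List CSV files last modified more than 60 minutes ago (candidates for processing).
find "$CSV_DIR" -name '*.csv' -mmin +60
```

Only old_metrics.csv is listed; the fresh file stays untouched until it settles past the threshold.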
Processing Results
When CSV file processing starts, the first step is a check that the partitioned table metric_archive and the sequence seq_metric_chunk exist in the public schema of the database specified in the parameters. If these objects have not yet been created, they are created automatically.
Table 1. metric_archive Columns
| Column | Type | Description |
|---|---|---|
| instance_id | text | The value specified in the qhb_instance parameter of the metric server. In general, data can come from different sources and belong to different databases. |
| metric_dt | timestamp | Date and time of the metric generation |
| metric_type_id | smallint | Metric type: 0 — Counter, 1 — Gauge, 2 — Timer |
| metric_name | text | Metric name |
| metric_name_ext | text | Aggregate names (std, max, min, sum, count, median, and percentiles) for metrics of the Timer type |
| metric_value | double precision | Metric value |
Each CSV file's data is processed in the same way as data loaded into a table via QDL. After the next file is processed, the table metric_archive_N appears in the database, where N is a number from the sequence seq_metric_chunk. Once processing is over, this table becomes a partition of the partitioned table metric_archive. Partitioning is performed over two columns: instance_id and metric_dt. When the table is attached as a partition, the minimum and maximum boundaries of its metric_dt values are specified. Later, when querying the metric_archive table, specifying conditions on instance_id and metric_dt automatically selects the corresponding partitions.
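For illustration, a query against metric_archive that constrains both partitioning columns might look as follows; the instance identifier and time range here are hypothetical, and the column names come from the table above:

```sql
-- Conditions on instance_id and metric_dt let the planner select only
-- the matching partitions (the attached metric_archive_N tables).
SELECT metric_dt, metric_name, metric_value
  FROM metric_archive
 WHERE instance_id = 'qhb01'                 -- hypothetical qhb_instance value
   AND metric_dt >= '2024-06-01 00:00:00'
   AND metric_dt <  '2024-06-02 00:00:00';
```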