Rbytea
Rbytea — external storage of binary data.
rbytea Data Type
The rbytea data type is intended for storing binary data. It is similar to the bytea type; the only difference is that the data itself is stored in external storage rather than in the database tablespace. In the current QHB version, the file system serves as the external storage: this can be an external volume mounted at a specific point in the server file system, or a symbolic link.
The main purpose of the extension is to move binary data from database tables to non-transactional storage and thereby reduce the load on the database. Binary data is often large, accounts for a significant share of the total database size, and complicates administration and maintenance.
In the database record, an rbytea value occupies only a small header that contains service fields and a link to a file in external storage. The link is a uuid value, and the random IDs are generated using the pgcrypto module.
Data in external storage can be encrypted using the "Kuznechik" algorithm.
Installing the Extension
The extension is installed using the CREATE EXTENSION command:
CREATE EXTENSION rbytea CASCADE;
The installation must be run by a database superuser, in the database in which the module is to be used. For the background process to work, the extension's shared library must be preloaded by adding the following parameter to the configuration file:
shared_preload_libraries = 'librbytea'
The extension parameters are described in the Configuration section below.
Configuration
To run the extension, set the following parameters in the configuration file.
rbytea.filesystem_storage_path (string)
Specifies the directory (file system or volume mount point) for storing data images.
Binary data is stored in this server directory. A subdirectory is created for each database (named after the database OID), and it contains multiple subdirectories with data files.
The names of the subdirectories on the path from the database directory to a file are formed from the uuid of the rbytea value, and the file extension is the number of the transaction in which the data first appeared in the system.
The directory must be readable and writable by the operating system user that started the database server.
By default, if the parameter is omitted or empty, binary data is stored in the '<database_catalog>/rbytea' directory.
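The directory rules above can be modelled in a few lines of Python. This is a sketch, not the extension's code: the helper names database_storage_dir and data_file_suffix, the example paths, and the OID are hypothetical, and data_dir stands in for <database_catalog>. The part of the path derived from the uuid is deliberately left out, because its exact layout is not described here.

```python
from pathlib import PurePosixPath


def database_storage_dir(storage_path: str, data_dir: str, db_oid: int) -> PurePosixPath:
    """Model of the per-database directory that holds rbytea files."""
    # An omitted or empty rbytea.filesystem_storage_path falls back to
    # <database_catalog>/rbytea; data_dir stands in for <database_catalog>.
    if storage_path.strip():
        root = PurePosixPath(storage_path)
    else:
        root = PurePosixPath(data_dir) / "rbytea"
    # Each database gets its own subdirectory named after its OID.
    return root / str(db_oid)


def data_file_suffix(created_xid: int) -> str:
    # The file extension is the number (XID) of the transaction
    # in which the data first appeared.
    return f".{created_xid}"


print(database_storage_dir("", "/var/lib/qhb/data", 16384))  # -> /var/lib/qhb/data/rbytea/16384
print(database_storage_dir("/mnt/rbytea", "/var/lib/qhb/data", 16384))  # -> /mnt/rbytea/16384
print(data_file_suffix(742))  # -> .742
```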
shared_preload_libraries (string)
Loads shared libraries at QHB startup.
This parameter loads shared libraries when QHB starts and, for rbytea, initializes the background process that cleans up old images. If librbytea is not loaded, old images are not cleaned up automatically.
Add the value 'librbytea' to this parameter.
Note
If shared_preload_libraries already lists other libraries, do not overwrite its value. Instead, append librbytea to the existing list, separated by a comma.
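For configuration scripts, the rule in this note can be expressed as a small Python helper. It is a sketch under assumptions: add_preload_library is a hypothetical name, the value is assumed to be a quoted, comma-separated list, and other_lib stands for any library that is already configured.

```python
def add_preload_library(current: str, library: str = "librbytea") -> str:
    """Append a library to a shared_preload_libraries value, keeping existing entries."""
    # Drop surrounding quotes, then split the comma-separated list.
    value = current.strip().strip("'\"")
    entries = [item.strip() for item in value.split(",") if item.strip()]
    # Add the library only once; existing entries are never overwritten.
    if library not in entries:
        entries.append(library)
    return "'" + ", ".join(entries) + "'"


print(add_preload_library("'other_lib'"))  # -> 'other_lib, librbytea'
print(add_preload_library(""))  # -> 'librbytea'
```

Because the helper is idempotent, running it against a value that already contains librbytea leaves the setting unchanged.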
rbytea.worker_restart_time (integer)
Sets the interval between runs of the background cleanup process.
The background process does not run continuously. Since the data is fairly static, there is no need to run the cleanup often; this parameter specifies the delay between runs.
The value is specified in seconds. The default is 86400 seconds (24 hours).
rbytea.databases_for_vacuuming (string)
Databases processed by the background cleanup process.
This parameter determines which databases are cleaned. Database names are separated by commas. qhbmaster starts one background cleanup process for each database listed in this parameter. The default is qhb.
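The mapping from this setting to the number of cleanup workers can be sketched as follows. This is a model, not the qhbmaster implementation: vacuum_databases is a hypothetical helper, the database names are hypothetical, and treating an empty string like an unset value (falling back to qhb) and trimming whitespace around names are assumptions.

```python
from typing import List, Optional


def vacuum_databases(setting: Optional[str]) -> List[str]:
    """Model of how rbytea.databases_for_vacuuming maps to cleanup workers."""
    # Unset (None) falls back to the documented default, qhb; treating an
    # empty string the same way is an assumption of this sketch.
    if setting is None or not setting.strip():
        return ["qhb"]
    # Names are comma-separated; whitespace around each name is ignored here.
    return [name.strip() for name in setting.split(",") if name.strip()]


# Hypothetical setting: rbytea.databases_for_vacuuming = 'qhb, sales, archive'
databases = vacuum_databases("qhb, sales, archive")
print(len(databases), "cleanup workers for", databases)  # -> 3 cleanup workers for ['qhb', 'sales', 'archive']
```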
rbytea.filesystem_qss_mode (integer)
This parameter controls encryption for all rbytea columns at once, not per table. Encrypted and unencrypted rbytea values can nevertheless coexist: each value is encrypted (or not) according to the value of the parameter at the moment the value is written. The default value is 0 (encryption disabled). To enable encryption, set the parameter to 2. Encryption can thus be switched on and off, but enabling it does not encrypt existing rbytea values.
ATTENTION!
Encrypted rbytea values cannot be used when a single copy of rbytea data is shared by several cluster nodes on network storage, because different nodes use different QSS keys.
Note
When qss_mode = 0, the value of rbytea.filesystem_qss_mode is ignored (see Section QSS Configuration Parameters).
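The per-value behaviour described above can be illustrated with a small Python model. It is a sketch under stated assumptions: the names encrypts_new_values, store, and StoredValue are hypothetical; any non-zero qss_mode is treated as QSS being enabled; and a value of rbytea.filesystem_qss_mode other than 2 is treated as "not encrypted", since only 0 and 2 are documented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredValue:
    name: str
    encrypted: bool


def encrypts_new_values(qss_mode: int, rbytea_qss_mode: int) -> bool:
    # With qss_mode = 0, rbytea.filesystem_qss_mode is ignored (no encryption assumed).
    if qss_mode == 0:
        return False
    # Only the value 2 is documented as enabling encryption; 0 (the default) disables it.
    return rbytea_qss_mode == 2


def store(values: List[StoredValue], name: str, qss_mode: int, rbytea_qss_mode: int) -> None:
    # The decision is made once, when the value is written; existing values are untouched.
    values.append(StoredValue(name, encrypts_new_values(qss_mode, rbytea_qss_mode)))


values: List[StoredValue] = []
store(values, "a", qss_mode=1, rbytea_qss_mode=0)
store(values, "b", qss_mode=1, rbytea_qss_mode=2)
store(values, "c", qss_mode=0, rbytea_qss_mode=2)
print([(v.name, v.encrypted) for v in values])  # -> [('a', False), ('b', True), ('c', False)]
```

Value "a" stays unencrypted after encryption is switched on for "b", which mirrors the statement that enabling encryption does not encrypt existing values.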
Functions for rbytea Type
| Name | Return Type | Description |
|---|---|---|
| uuid(rbytea) | uuid | Returns the data ID. |
| len(rbytea) | bigint | Returns the data length in bytes. |
| qss_mode(rbytea) | bigint | Returns whether the data is encrypted. |
| md5(rbytea) | text | Returns the MD5 checksum of the data. |
| sha256(rbytea) | text | Returns the SHA-256 checksum of the data. |
| ext_file_dir(rbytea) | text | Returns the data storage directory (an absolute path or a path relative to $PGDATA). |
| ext_file_path(rbytea) | text | Returns the full name of the data storage file (an absolute path or a path relative to $PGDATA). |
| trash_file_dir(rbytea) | text | Returns the storage directory for deleted data (an absolute path or a path relative to $PGDATA). |
| trash_file_path(rbytea) | text | Returns the full name of the deleted data file (an absolute path or a path relative to $PGDATA). |
| txid_created(rbytea) | bigint | Returns the XID of the transaction in which the data was created. |
| rvacuum() | bigint | Vacuums obsolete data in the storage. |
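The len, md5, and sha256 functions make it possible to check that a copy of the data (for example, one produced during a migration) matches the stored value. The Python sketch below assumes, without confirmation from this section, that these functions return the byte count and lowercase hexadecimal digests of the unencrypted data, as hashlib's hexdigest() does; matches_rbytea_checksums is a hypothetical helper.

```python
import hashlib


def matches_rbytea_checksums(data: bytes, length: int, md5_hex: str, sha256_hex: str) -> bool:
    """Compare local bytes with the values reported by len(), md5() and sha256()."""
    # Assumes the SQL functions report the length and hex digests of the plaintext.
    return (
        len(data) == length
        and hashlib.md5(data).hexdigest() == md5_hex.lower()
        and hashlib.sha256(data).hexdigest() == sha256_hex.lower()
    )


print(matches_rbytea_checksums(
    b"hello",
    5,
    "5d41402abc4b2a76b9719d911017c592",
    "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
))  # -> True
```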
Background Vacuum Process for Old Copies
Because the file system is not covered by database transactions, the external storage may contain data from rbytea fields that were deleted or written by unfinished, rolled-back transactions. A background vacuum process is therefore started periodically; it scans a range of transactions and vacuums such data. The files are moved to the TRASH directory, which is created for each database listed in rbytea.databases_for_vacuuming.
After each run, the maximum XID is remembered and used as the lower boundary of the scanning range at the next run. The last completed transaction is used as the upper boundary of the scanning range.
The run parameters are described in the Configuration section.
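The range bookkeeping described above can be sketched in Python. This models only the boundaries, not how the real process decides that a file is obsolete: VacuumWorker and the obsolete_by_xid input (file names that became obsolete, keyed by XID) are hypothetical, the lower boundary is assumed exclusive and the upper one inclusive, and XID wraparound is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VacuumWorker:
    """Model of the XID-range bookkeeping of one rbytea cleanup worker."""

    last_max_xid: int = 0  # XID remembered from the previous run
    trash: List[str] = field(default_factory=list)  # stands in for the TRASH directory

    def run(self, obsolete_by_xid: Dict[int, List[str]], last_completed_xid: int) -> List[str]:
        # Scan range: above the XID remembered last time, up to and including
        # the last completed transaction (the boundary inclusivity is assumed).
        lower, upper = self.last_max_xid, last_completed_xid
        moved: List[str] = []
        for xid in sorted(obsolete_by_xid):
            if lower < xid <= upper:
                moved.extend(obsolete_by_xid[xid])
        # Obsolete files are moved to TRASH rather than deleted outright.
        self.trash.extend(moved)
        # The maximum XID of this run becomes the lower boundary of the next run.
        self.last_max_xid = max(lower, upper)
        return moved


worker = VacuumWorker()
# Hypothetical input: file names that became obsolete, keyed by transaction XID.
obsolete = {5: ["f1"], 12: ["f2"], 20: ["f3"]}
print(worker.run(obsolete, last_completed_xid=15))  # -> ['f1', 'f2']
print(worker.run(obsolete, last_completed_xid=25))  # -> ['f3']
print(worker.last_max_xid)  # -> 25
```

Remembering the upper boundary means each transaction is scanned once, which is why the process can run only every rbytea.worker_restart_time seconds without rescanning old history.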
Features of the Rbytea Extension
Rbytea in a Cluster and When Using Encryption
There are some limitations when running the extension in a cluster.
The directory (mount points of the file system/volume) for storing data images, specified in rbytea.filesystem_storage_path (see Section Configuration), can:
- point to a directory within a single local host and be used exclusively by that single cluster host,
- or to a network file storage directory shared by all hosts in the cluster.
In the first case, the data images are available only on the host where they were created, and accessing the data from other cluster hosts results in an error. In this case, however, the data can be encrypted using QSS.
In the second case, the data images are available on all cluster hosts, but encryption is prohibited in the current versions of QHB and the Rbytea extension: each cluster host must use its own QSS working keys, so data encrypted with one host's key cannot be read on other hosts (an attempt to read it results in an error).
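The two deployment options can be summarised as a pre-deployment check. This is a hypothetical Python helper, not part of QHB or the extension; it only encodes the rules stated above, and its return strings are illustrative.

```python
def rbytea_data_visibility(shared_network_storage: bool, encryption_enabled: bool) -> str:
    """Return where rbytea data images are readable, or reject an unsupported setup."""
    if shared_network_storage:
        # Each host uses its own QSS working keys, so encrypted data written by one
        # host could not be read by the other hosts sharing the storage.
        if encryption_enabled:
            raise ValueError(
                "rbytea encryption is not supported when the storage directory "
                "is shared by several cluster hosts"
            )
        return "all cluster hosts"
    # A local directory is used by one host only; encryption is allowed there.
    return "only the host that created the data"


print(rbytea_data_visibility(shared_network_storage=False, encryption_enabled=True))
print(rbytea_data_visibility(shared_network_storage=True, encryption_enabled=False))
```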
Updating the Extension from Previous Versions
The rbytea extension version 1.2 is significantly different from its previous versions. Because of this, it is not possible to update the extension from earlier versions to the current one using the command
ALTER EXTENSION rbytea UPDATE TO '1.2';
The only option is to drop the previous version of the extension and create the current version with the command
CREATE EXTENSION rbytea CASCADE;
Unless you intend to recreate the data images from scratch and lose the existing ones, make sure to preserve them beforehand.
This can be done by migrating them:
- via the bytea data type,
- by writing the images out to the disk where they were stored before the migration (if they were not encrypted),
- or via dump if the data images were encrypted.
The detailed steps of such migration are left to the discretion of the DB administrator.
The QHB developers plan to develop tools to support such migrations in one of the future releases.