pg_repack — module for reorganizing tables with minimal locks
pg_repack — utility and QHB extension to reorganize tables
Synopsis
pg_repack [option...] [dbname]
Description
pg_repack — is a QHB extension which lets you remove
bloat from tables and indexes, and optionally restore the physical order of
clustered indexes. Unlike CLUSTER and VACUUM FULL it
works online, without holding an exclusive lock on the processed tables during
processing. pg_repack is efficient to boot, with performance comparable
to using CLUSTER directly.
You can choose one of the following methods to reorganize data:
- Online
CLUSTER(ordered by cluster index) - Ordering by specified columns
- Online
VACUUM FULL(packing rows only) - Rebuild or relocate only the indexes of a table
Note
Only superusers or owners of tables and indexes can use the utility. To run pg_repack as an owner you need to use the option --no-superuser-check.
Note
Target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.
Requirements
QHB version 1.5.1 and later.
Disks: Performing a full-table repack requires free disk space about twice as large as the target table(s) and its indexes. For example, if the total size of the tables and indexes to be reorganized is 1GB, an additional 2GB of disk space is required.
Installation
The module for QHB that reorganizes tables holding minimum locks is available as package qhb-1.5.3-pg-repack.
Add QHB package repository and install the extension package for chosen platform from the download page.
After installation, load the pg_repack extension in the database you want to process. pg_repack is packaged as an extension, so you can execute:
$ psql -c "CREATE EXTENSION pg_repack" -d ваша_бд
You can remove pg_repack using DROP EXTENSION pg_repack or just dropping
the repack schema.
If you are upgrading from a previous version of pg_repack, just drop the old version from the database as explained above and install the new version.
Options
Reorganization Options
-a
--all
Attempt to repack all the databases of the cluster. Databases where the
pg_repack extension is not installed will be skipped.
-t table
--table=table
Reorganize the specified table(s) only. Multiple tables may be reorganized by
writing multiple -t switches. By default, all eligible tables in the target
databases are reorganized.
-I table
--parent-table=table
Reorganize the specified parent table(s) and its inheritors. Multiple hierarchy
trees may be reorganized by writing multiple -I switches.
-c schema
--schema=schema
Repack the tables in the specified schema(s) only. Multiple schemas may be
repacked by writing multiple -c switches. Can be used in conjunction with
--tablespace to move tables to a different tablespace.
-o column[,...]
--order-by=column[,...]
Perform an online CLUSTER ordered by the specified columns.
-n
--no-order
Perform an online VACUUM FULL. This is the default for non-clustered tables.
-N
--dry-run
Show what would be repacked and exit.
-j num_jobs
--jobs=num_jobs
Create the specified number of extra connections to QHB and use
these extra connections to parallelize the rebuild of indexes on each table.
Parallel index builds are only supported for full-table repacks, not with
--index or --only-indexes options. If your server has extra cores and
disk I/O available, this can be a useful way to speed up pg_repack.
-s tablespace
--tablespace=tablespace
Move the repacked tables to the specified tablespace: essentially an online
version of ALTER TABLE ... SET TABLESPACE. The tables' indexes are left in the
original tablespace unless --moveidx is specified too.
-S
--moveidx
Also move the indexes of the repacked tables to the tablespace specified by the
--tablespace option.
-i index
--index=index
Repack the specified index(es) only. Multiple indexes may be repacked by writing
multiple -i switches. May be used in conjunction with --tablespace to
move the index(es) to a different tablespace.
-x
--only-indexes
Repack only the indexes of the specified table(s), which must be specified with
the --table or --parent-table option.
-T secs
--wait-timeout=secs
pg_repack needs to take an exclusive lock at the end of the reorganization.
This setting controls how many seconds pg_repack will wait to acquire this
lock. If the lock cannot be taken after this duration, pg_repack will
forcibly cancel the conflicting queries. pg_repack will fall back to using
pg_terminate_backend() to disconnect any remaining backends after this
timeout has passed twice. The default is 60 seconds.
-D
--no-kill-backend
Skip to repack table if the lock cannot be taken for duration specified
--wait-timeout, instead of cancelling conflicting queries. The default is
false.
-Z
--no-analyze
Disable ANALYZE after a full-table reorganization. If not specified,
ANALYZE is executed after the reorganization.
-k
--no-superuser-check
Skip the superuser checks in the client. This setting is useful for using
pg_repack on platforms that support running it as non-superusers.
-C
--exclude-extension
Skip tables that belong to the specified extension(s). Some extensions may heavily
depend on such tables at planning time etc.
--no-error-on-invalid-index
Repack even though invalid index is found.
--error-on-invalid-index
Don't repack when invalid index is found (deprecated, as this is the default
behavior now).
--apply-count=num_tuples
Number of tuples to apply in one transaction during replay
--switch-threshold=num_tuples
Switch tables when that many tuples are left in log table. This setting can be
used to avoid the inability to catchup with write-heavy tables.
Connection Options
Options to connect to servers. You cannot use --all and --dbname or --table or --parent-table together.
-a
--all
Reorganize all databases.
-d dbname
--dbname=dbname
Specifies the name of the database to be reorganized. If this is not specified
and -a (or --all) is not used, the database name is read from the
environment variable PGDATABASE. If that is not set, the user name specified
for the connection is used.
-h host
--host=host
Specifies the host name of the machine on which the server is running. If the
value begins with a slash, it is used as the directory for the Unix domain socket.
-p port
--port=port
Specifies the TCP port or local Unix domain socket file extension on which the
server is listening for connections.
-U username
--username=username
User name to connect as.
-w
--no-password
Never issue a password prompt. If the server requires password authentication
and a password is not available by other means such as a .pgpass file, the
connection attempt will fail. This option can be useful in batch jobs and scripts
where no user is present to enter a password.
-W
--password
Force pg_repack to prompt for a password before connecting to a database.
This option is never essential, since pg_repack will automatically prompt
for a password if the server demands password authentication. However,
pg_repack will waste a connection attempt finding out that the server
wants a password. In some cases it is worth typing -W to avoid the extra
connection attempt.
Generic Options
-e
--echo
Echo the commands that pg_repack generates and sends to the server.
-E level
--elevel=level
Choose the output message level from DEBUG, INFO, NOTICE, WARNING, ERROR,
LOG, FATAL, and PANIC. The default is INFO.
--help
Show help about pg_repack command line arguments, and exit.
--version
Print the pg_repack version and exit.
Environment
PGDATABASE
PGHOST
PGPORT
PGUSER
Default connection parameters
This utility, like most other QHB utilities, also uses the environment variables supported by libpq (see Section Environment Variables).
Examples
Perform an online CLUSTER of all the clustered tables in the database test,
and perform an online VACUUM FULL of all the non-clustered tables:
$ pg_repack test
Perform an online VACUUM FULL on the tables foo and bar in the database
test (an eventual cluster index is ignored):
$ pg_repack --no-order --table foo --table bar test
Move all indexes of table foo to tablespace tbs:
$ pg_repack -d test --table foo --only-indexes --tablespace tbs
Move the specified index to tablespace tbs:
$ pg_repack -d test --index idx --tablespace tbs
Diagnostics
Error messages are reported when pg_repac fails. The following list shows the cause of errors.
You need to cleanup by hand after fatal errors. To cleanup, just remove pg_repack from the database and install it again. Execute:
DROP EXTENSION pg_repack CASCADE
in the database where the error occurred, followed by
CREATE EXTENSION pg_repack
INFO: database "db" skipped: pg_repack VER is not installed in the database
pg_repack is not installed in the database when the --all option is specified.
Create the pg_repack extension in the database.
ERROR: pg_repack VER is not installed in the database
pg_repack is not installed in the database specified by --dbname.
Create the pg_repack extension in the database.
ERROR: program 'pg_repack V1' does not match database library 'pg_repack V2'
There is a mismatch between the pg_repack binary and the database library (.so or .dll).
The mismatch could be due to the wrong binary in the $PATH or the wrong database being addressed. Check the program directory and the database; if they are what expected you may need to repeat pg_repack installation.
ERROR: extension 'pg_repack V1' required, found 'pg_repack V2'
The SQL extension found in the database does not match the version required by the pg_repack program.
You should drop the extension from the database and reload it as described in Section Installation.
ERROR: relation "table" must have a primary key or not-null unique keys
The target table doesn't have a PRIMARY KEY or any UNIQUE constraints defined.
Define a PRIMARY KEY or a UNIQUE constraint on the table.
ERROR: query failed: ERROR: column "col" does not exist
The target table doesn't have columns specified by --order-by option.
Specify existing columns.
WARNING: the table "tbl" already has a trigger called repack_trigger
The trigger was probably installed during a previous attempt to run pg_repack on the table which was interrupted and for some reason failed to clean up the temporary objects.
You can remove all the temporary objects by dropping and re-creating the extension: see Section Installation for the details.
WARNING: trigger "trg" conflicting on table "tbl"
The target table has a trigger whose name follows z_repack_trigger in alphabetical order.
The z_repack_trigger should be the last BEFORE trigger to fire. Please rename your trigger so that it sorts alphabetically before pg_repack one; you can use:
ALTER TRIGGER zzz_my_trigger
ON sometable RENAME TO yyy_my_trigger;
ERROR: Another pg_repack command may be running on the table. Please try again later.
There is a chance of deadlock when two concurrent pg_repack commands are run
on the same table. So, try to run the command after some time.
WARNING: Cannot create index "schema"."index_xxxx", already exists
DETAIL: An invalid index may have been left behind by a previous pg_repack on the
table which was interrupted. Please use DROP INDEX "schema"."index_xxxxx" to remove
this index and try again.
A temporary index apparently created by pg_repack has been left behind,
and we do not want to risk dropping this index ourselves. If the index was in
fact created by an old pg_repack job which didn't get cleaned up, you
should just use DROP INDEX and try the repack command again.
Restrictions
pg_repack comes with the following restrictions.
-
pg_repack cannot reorganize temporary tables.
-
pg_repack cannot cluster tables by GiST indexes.
-
You will not be able to perform DDL commands of the target table(s) except
VACUUMorANALYZEwhile pg_repack is working. pg_repack will hold an ACCESS SHARE lock on the target table during a full-table repack, to enforce this restriction.
Details
Full Table Repacks
To perform a full-table repack, pg_repack will:
- create a log table to record changes made to the original table
- add a trigger onto the original table, logging
INSERTs,UPDATEs andDELETEs into our log table - create a new table containing all the rows in the old table
- build indexes on this new table
- apply all changes which have accrued in the log table to the new table
- swap the tables, including indexes and toast tables, using the system catalogs
- drop the original table
pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period
during initial setup (steps 1 and 2 above) and during the final swap-and-drop
phase (steps 6 and 7). For the rest of its time, pg_repack only needs to
hold an ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs,
and DELETEs may proceed as usual.
Index Only Repacks
To perform an index-only repack, pg_repack will:
- create new indexes on the table using CONCURRENTLY matching the definitions of the old indexes
- swap out the old for the new indexes in the catalogs
- drop the old indexes
Creating indexes concurrently comes with a few caveats, see Section Building Indexes Concurrently.