Binary String Functions and Operators
This section describes functions and operators for examining and manipulating binary strings, that is values of type bytea. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Section SQL Binary String Functions and Operators. QHB also provides versions of these functions that use the regular function invocation syntax (see Section Other Binary String Functions).
SQL Binary String Functions and Operators
bytea || bytea → bytea
Concatenates the two binary strings.
'\x123456'::bytea || '\x789a00bcde'::bytea → \x123456789a00bcde
bit_length
bit_length ( bytea ) → integer
Returns number of bits in the binary string (8 times the octet_length).
bit_length('\x123456'::bytea) → 24
btrim
btrim ( bytesbytea,bytesremoved bytea ) → bytea
Removes the longest string containing only bytes appearing in bytesremoved from the start and end of bytes.
btrim('\x1234567890'::bytea, '\x9012'::bytea) → \x345678
ltrim
ltrim ( bytesbytea,bytesremoved bytea ) → bytea
Removes the longest string containing only bytes appearing in bytesremoved from the start of bytes.
ltrim('\x1234567890'::bytea, '\x9012'::bytea) → \x34567890
octet_length
octet_length ( bytea ) → integer
Returns number of bytes in the binary string.
octet_length('\x123456'::bytea) → 3
overlay
overlay ( bytesbytea PLACINGnewsubstringbytea FROMstartinteger [ FORcount integer ] ) → bytea
Replaces the substring of bytes that starts at the start'th byte and extends for count bytes with newsubstring. If count is omitted, it defaults to the length of newsubstring.
overlay('\x1234567890'::bytea placing '\002\003'::bytea from 2 for 3) → \x12020390
position
position ( substringbytea INbytes bytea ) → integer
Returns first starting index of the specified substring within bytes, or zero if it's not present.
position('\x5678'::bytea in '\x1234567890'::bytea) → 3
rtrim
rtrim ( bytesbytea,bytesremoved bytea ) → bytea
Removes the longest string containing only bytes appearing in bytesremoved from the end of bytes.
rtrim('\x1234567890'::bytea, '\x9012'::bytea) → \x12345678
substring
substring ( bytesbytea [ FROMstartinteger ] [ FORcount integer ] ) → bytea
Extracts the substring of bytes starting at the start'th byte if that is specified, and stopping after count bytes if that is specified. Provide at least one of start and count.
substring('\x1234567890'::bytea from 3 for 2) → \x5678
trim
trim ( [ LEADING | TRAILING | BOTH ] bytesremovedbytea FROMbytes bytea ) → bytea
Removes the longest string containing only bytes appearing in bytesremoved from the start, end, or both ends (BOTH is the default) of bytes.
trim('\x9012'::bytea from '\x1234567890'::bytea) → \x345678
trim ( [ LEADING | TRAILING | BOTH ] [ FROM ] bytesbytea,bytesremoved bytea ) → bytea
This is a non-standard syntax for trim().
trim(both from '\x1234567890'::bytea, '\x9012'::bytea) → \x345678
Other Binary String Functions
Additional binary string manipulation functions are available and are listed in this section. Some of them are used internally to implement the SQL-standard string functions listed in Section SQL Binary String Functions and Operators.
bit_count
bit_count ( bytes bytea ) → bigint
Returns the number of bits set in the binary string (also known as “popcount”).
bit_count('\x1234567890'::bytea) → 15
get_bit
get_bit ( bytesbytea,n bigint ) → integer
Extracts n'th bit from binary string.
get_bit('\x1234567890'::bytea, 30) → 1
get_byte
get_byte ( bytesbytea,n integer ) → integer
Extracts n'th byte from binary string.
get_byte('\x1234567890'::bytea, 4) → 144
length
length ( bytea ) → integer
Returns the number of bytes in the binary string.
length('\x1234567890'::bytea) → 5
length ( bytesbytea,encoding name ) → integer
Returns the number of characters in the binary string, assuming that it is text in the given encoding.
length('jose'::bytea, 'UTF8') → 4
md5
md5 ( bytea ) → text
Computes the MD5 hash of the binary string, with the result written in hexadecimal.
md5('Th\000omas'::bytea) → 8ab2d3c9689aaf18b4958c334c82d8b1
set_bit
set_bit ( bytesbytea,nbigint,newvalue integer ) → bytea
Sets n'th bit in binary string to newvalue.
set_bit('\x1234567890'::bytea, 30, 0) → \x1234563890
set_byte
set_byte ( bytesbytea,ninteger,newvalue integer ) → bytea
Sets n'th byte in binary string to newvalue.
set_byte('\x1234567890'::bytea, 4, 64) → \x1234567840
sha224
sha224 ( bytea ) → bytea
Computes the SHA-224 hash of the binary string.
sha224('abc'::bytea) → \x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
sha256
sha256 ( bytea ) → bytea
Computes the SHA-256 hash of the binary string.
sha256('abc'::bytea) → \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
sha384
sha384 ( bytea ) → bytea
Computes the SHA-384 hash of the binary string.
sha384('abc'::bytea) → \xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
sha512
sha512 ( bytea ) → bytea
Computes the SHA-512 hash of the binary string.
sha512('abc'::bytea) → \xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
substr
substr ( bytesbytea,startinteger [,count integer ] ) → bytea
Extracts the substring of byte starting at the start'th byte, and extending for count bytes if that is specified. (Same as substring(bytes from start for count).)
substr('\x1234567890'::bytea, 3, 2) → \x5678
Notes
Functions get_byte and set_byte number the first byte of a binary string as byte 0. Functions get_bit and set_bit number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.
For historical reasons, the function md5 returns a hex-encoded value of type
text whereas the SHA-2 functions return type bytea. Use the functions
encode and decode to convert between the two. For example write
encode(sha256('abc'), 'hex') to get a hex-encoded text representation, or
decode(md5('abc'), 'hex') to get a bytea value.
Text/Binary String Conversion Functions
Functions for converting strings between different character sets (encodings), and for representing arbitrary binary data in textual form, are shown in this section. For these functions, an argument or result of type text is expressed in the database's default encoding, while arguments or results of type bytea are in an encoding named by another argument.
convert
convert ( bytesbytea,src_encodingname,dest_encoding name ) → bytea
Converts a binary string representing text in encoding src_encoding to a binary string in encoding dest_encoding (see Section Available Character Set Conversions for available conversions).
convert('text_in_utf8', 'UTF8', 'LATIN1') → \x746578745f696e5f75746638
convert_from
convert_from ( bytesbytea,src_encoding name ) → text
Converts a binary string representing text in encoding src_encoding to text in the database encoding (see Section Available Character Set Conversions for available conversions).
convert_from('text_in_utf8', 'UTF8') → text_in_utf8
convert_to
convert_to ( stringtext,dest_encoding name ) → bytea
Converts a text string (in the database encoding) to a binary string encoded in encoding dest_encoding (see Section Available Character Set Conversions for available conversions).
convert_to('some_text', 'UTF8') → \x736f6d655f74657874
encode
encode ( bytesbytea,format text ) → text
Encodes binary data into a textual representation; supported format values are: base64, escape, hex.
encode('123\000\001', 'base64') → MTIzAAE=
decode
decode ( stringtext,format text ) → bytea
Decodes binary data from a textual representation; supported format values are the same as for encode.
decode('MTIzAAE=', 'base64') → \x3132330001
Note
The encode and decode functions support the following textual formats:
base64
The base64 format is that of RFC 2045 Section 6.8. As per the RFC, encoded
lines are broken at 76 characters. However instead of the MIME CRLF end-of-line
marker, only a newline is used for end-of-line. The decode function ignores
carriage-return, newline, space, and tab characters. Otherwise, an error is raised
when decode is supplied invalid base64 data — including when trailing padding is
incorrect.
escape
The escape format converts zero bytes and bytes with the high bit set into
octal escape sequences (\nnn), and it doubles backslashes. Other byte
values are represented literally. The decode function will raise an error if a
backslash is not followed by either a second backslash or three octal digits; it
accepts other byte values unchanged.
hex
The hex format represents each 4 bits of data as one hexadecimal digit, 0
through f, writing the higher-order digit of each byte first. The encode
function outputs the a-f hex digits in lower case. Because the smallest unit
of data is 8 bits, there are always an even number of characters returned by
encode. The decode function accepts the a-f characters in either upper or
lower case. An error is raised when decode is given invalid hex data — including
when given an odd number of characters.
See also the aggregate function string_agg in Section Aggregate Functions and the large object functions in Section Server-Side Functions.