| This page is user contributed documentation. See the bottom of the page for information about the author. | |
An attacker able to submit crafted strings to an application that will embed those strings in SQL commands can use invalidly-encoded multibyte characters to bypass standard string-escaping methods, resulting in possible injection of hostile SQL commands into the database. The attacks covered here work in any multibyte encoding (but see also CVE-2006-2314). Affects all PostgreSQL versions before 8.1.4, 8.0.8, 7.4.13, 7.3.15.
The widely-used practice of escaping ASCII single quote "'" by turning it into "\'" is unsafe when operating in multibyte encodings that allow 0x5c (ASCII code for backslash) as the trailing byte of a multibyte character; this includes at least SJIS, BIG5, GBK, GB18030, and UHC. An application that uses this conversion while embedding untrusted strings in SQL commands is vulnerable to SQL-injection attacks if it communicates with the server in one of these encodings. While the standard client libraries used with PostgreSQL have escaped "'" in the safe, SQL-standard way of "''" for some time, the older practice remains common. As of PostgreSQL versions 8.1.4, 8.0.8, 7.4.13, 7.3.15, the server has been modified to reject "\'" when the client is using one of these encodings. This does NOT in itself fix all variants of the problem, but it will make it obvious that such a client is broken and in need of repair. A possible workaround for affected clients is to avoid use of the vulnerable character encodings.
The fundamental problem is that client-side and server-side code may have different behaviors when presented invalidly-encoded multibyte data. Given that client-side escaping has often been done by ad hoc code, the solution of "make sure they always act the same" seems unworkable, and we have instead tried to close off the problem at the server side by rejecting potentially-corrupted queries.
The particular issue described as CVE-2006-2313 stems from the
fact that client-side escaping code commonly doesn't use any
encoding knowledge at all, just treating any byte with the high
bit set as a single non-ASCII character. In encodings such as UTF8
there is not anything obviously wrong with this practice (but see
below for other encodings). However, the PostgreSQL server behaves
differently, and that mismatch opens a hole. As an example,
suppose that the client is operating in UTF8 encoding and an
attacker submits a data string containing
0xc8 ' some text
(where 0xc8 represents a single byte with that hex value). In UTF8
0xc8 begins a two-byte character, but the second byte is supposed
to have a value within 0xA0-0xFF; the above string is therefore
not validly encoded. A non-encoding-aware client will ignore this
problem and try to escape the quote mark, producing either
0xc8 ' ' some text
0xc8 \ ' some text
Previous versions of the PostgreSQL server would accept 0xc8
followed by any byte value at all as a two-byte character, so that
the server would see the above as a two-byte character followed by
a string-literal-ending quote mark. The text "some text" would
then be successfully injected into the SQL query string supplied
to the server. (In some contexts the server would warn about the
invalid two-byte character, and then drop it, but a mere warning
does not stop the attack.)
Our solution for CVE-2006-2313 is to modify the server to check multibyte encoding more carefully, and to treat invalid input as a query-stopping error rather than merely a warning. It will now reject 0xc8 followed by a byte that is not within 0xA0-0xFF. This closes off the problem without needing any assumptions about whether the client has been patched to understand about multibyte encodings explicitly. In "safe" encodings such as UTF8, we can now be sure that the client and server agree about which characters in the string are quotes and backslashes.
However, there are several Far Eastern character encodings in which 0x5c (the ASCII code for backslash) is a valid second byte for a two-byte character. For example, the two-byte sequence 0x95 0x5c is a valid character in SJIS. In these encodings, there are two additional risks which we have described as CVE-2006-2314.
First, a non-encoding-aware client is likely to try to escape what
it thinks is a backslash. For example, the attacker sends the
entirely legal SJIS string
0x95 0x5c ' some text
If the client is not encoding-aware it will take the 0x5c as a
separate backslash character and will double it. It then also
escapes the quote mark, producing either
0x95 0x5c \ ' ' some text
0x95 0x5c \ \ ' some text
Since the server will correctly perceive 0x95 0x5c as a single
character, either case succeeds in injecting "some text": the last
quote mark will appear unescaped to the server. A more subtle case
is where 0x95 0x5c appears at the end of the string: the added
backslash will effectively quote the intended string-ending quote
mark. If the attacker can also control the next string literal in
the query, he wins, eg
WHERE key1 = '0x95 0x5c \' AND key2 = 'injected text
here' ...
Second, if the client escapes "'" as "\'", it can produce a valid
multibyte character where there was none before. Again considering
SJIS, suppose the attacker is able to send the (invalid) string
0x95 ' some text
If the client is not encoding-aware, and sees the quote mark as a
separate character to be escaped, it will produce
0x95 \ ' some text
which is seen by the server as
0x95 0x5c ' some text
and thus again the injection attack succeeds.
The only real solution for these problems is to fix the client: escaping has to be done with awareness of the character set encoding (so that "backslashes" that are part of a multibyte character won't be escaped), and quotes need to be escaped using the SQL-standard representation "''" rather than "\'". Note that in none of these encodings is 0x27 a valid trailing byte, so that there is no corresponding problem for "''" escaping so long as the server checks multibyte encoding validity.
However, waiting around for clients to get fixed isn't a very workable security approach. We have instead modified the server so that (by default) it will reject "\'" as a representation of quote mark when operating in an encoding that allows embedded 0x5c. This change closes the security holes associated with a non-encoding-aware client trying to double an embedded 0x5c, so long as the client escapes "'" as "''"; note that in the cases above where the client did not use "\'", the server sees "\'" as a result of the mistaken doubling of 0x5c. Notice however that in the examples where the client did try to use "\'", the server sees a perfectly valid query with no apparent use of "\'". Therefore, rejecting "\'" will NOT by itself prevent injection attacks. Rather, this is a partial solution that will also have the effect of making it obvious in normal use that clients using this representation in unsafe encodings are broken. This should allow them to get fixed before they are attacked.
(Another reason for rejecting "\'" is that it fits into our long-term plan to transition to SQL-standard string literal rules, wherein backslash is not a special character. Clients that are still using "\'" at that point would have a new SQL-injection risk to contend with.)
There are a number of mitigating factors that may keep particular applications from being subject to these security risks:
The PostgreSQL Global Development Group thanks Akio Ishida and Yasuo Ohgaki for identifying and reporting these issues.