com.continuent.tungsten.common.parsing.bytes
Class MySQLStatementTranslator
java.lang.Object
com.continuent.tungsten.common.parsing.bytes.MySQLStatementTranslator
public class MySQLStatementTranslator
- extends java.lang.Object
Utility class to translate MySQL statements from a native charset to Java
Unicode strings, accounting for introducers for binary and alternative
character sets. Syntax for MySQL strings is described in the MySQL
online documentation. A typical example looks like the following:
INSERT INTO `storage'_binary'` VALUES (25, 'col_binary', _binary'\0\0\0\0')
This string illustrates some of the potential ambiguities in
string translation. To avoid confusion we implement a full tokenizer that
ignores data embedded in normal strings or comments. We thus translate the
preceding string into the following, where binary data are replaced by a
hexadecimal string format:
INSERT INTO `storage'_binary'` VALUES (25, 'col_binary', _binary x'00000000')
The translation is based on state machines according to the
following principles.
- Binary and alternative charset strings denoted by introducers of the form
_<introducer>'value' or _<introducer>"value" are converted to hex
strings that can translate safely to Unicode.
- All bytes outside of such strings are translated using the character set
assigned when creating the MySQLStatementTranslator instance.
- Values within ordinary strings delimited by backticks (`), single quotes, or
double quotes are excluded from parsing for introducers. The same applies to
values within comments.
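The principles above can be sketched as a small byte scanner. This is an illustration only, not the Tungsten implementation: the class IntroducerSketch and its translate method are hypothetical names. The sketch assumes an ASCII-compatible charset, emits uppercase hex, decodes only a subset of MySQL backslash escapes, and does not handle comments or doubled quotes inside introducer values.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch; not part of the Tungsten API. */
final class IntroducerSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Translates a statement, rewriting _introducer'value' as _introducer x'HEX'. */
    static String translate(byte[] b, Charset cs) {
        StringBuilder out = new StringBuilder(b.length);
        int runStart = 0; // start of the pending run of ordinary bytes
        int i = 0;
        while (i < b.length) {
            byte c = b[i];
            if (c == '`' || c == '\'' || c == '"') {
                // Ordinary quoted string: skip it so its content is never parsed.
                int close = closingQuote(b, i);
                i = (close < 0) ? b.length : close + 1;
            } else if (c == '_' && (i == 0 || !isNameByte(b[i - 1]))) {
                int j = i + 1;
                while (j < b.length && isNameByte(b[j])) j++;
                boolean quoted = j > i + 1 && j < b.length
                        && (b[j] == '\'' || b[j] == '"');
                int close = quoted ? closingQuote(b, j) : -1;
                if (close < 0) { i = j; continue; } // not an introducer string
                // Flush ordinary bytes, including the introducer name itself.
                out.append(new String(b, runStart, j - runStart, cs));
                out.append(" x'");
                appendHex(out, b, j + 1, close);
                out.append('\'');
                runStart = close + 1;
                i = close + 1;
            } else {
                i++;
            }
        }
        out.append(new String(b, runStart, b.length - runStart, cs));
        return out.toString();
    }

    /** Returns the index of the quote closing the one at 'open', or -1. */
    private static int closingQuote(byte[] b, int open) {
        byte q = b[open];
        for (int k = open + 1; k < b.length; k++) {
            if (b[k] == '\\' && q != '`') k++; // skip the escaped byte
            else if (b[k] == q) return k;
        }
        return -1;
    }

    private static boolean isNameByte(byte c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    /** Appends bytes in [from, to) as hex, decoding simple backslash escapes. */
    private static void appendHex(StringBuilder out, byte[] b, int from, int to) {
        for (int k = from; k < to; k++) {
            int v = b[k] & 0xFF;
            if (v == '\\' && k + 1 < to) v = unescape(b[++k] & 0xFF);
            out.append(HEX[v >> 4]).append(HEX[v & 0x0F]);
        }
    }

    /** Subset of MySQL escapes; anything else maps to the byte itself. */
    private static int unescape(int e) {
        switch (e) {
            case '0': return 0;
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            case 'b': return '\b';
            case 'Z': return 26;
            default:  return e;
        }
    }
}
```

Ordinary bytes are decoded in whole runs rather than one at a time, which keeps object creation low in the spirit of the pointer-based approach described in this class comment.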
Performance is an important consideration in the translation algorithm as
binary strings in particular are potentially quite large. The parsing and
translation processing uses pointers into the byte string to minimize object
creation. The translation values for byte strings are pre-computed strings.
The performance overhead of parsing plus translation is about 10% over unparsed
string translation.
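One way to pre-compute translation values for byte strings is a 256-entry lookup table, one two-character hex string per byte value. The following is a minimal sketch under that assumption; the class HexTableSketch and its toHex method are hypothetical and may differ from what the class does internally, including in hex letter case.

```java
/** Hypothetical sketch; not part of the Tungsten API. */
final class HexTableSketch {
    // One pre-computed two-character string per possible byte value.
    private static final String[] HEX = new String[256];

    static {
        final char[] digits = "0123456789ABCDEF".toCharArray();
        for (int i = 0; i < 256; i++) {
            HEX[i] = new String(new char[] { digits[i >>> 4], digits[i & 0x0F] });
        }
    }

    /** Converts length bytes starting at offset into a hex string. */
    static String toHex(byte[] bytes, int offset, int length) {
        // Each input byte yields two output characters, so size the buffer once.
        StringBuilder sb = new StringBuilder(length * 2);
        for (int i = offset; i < offset + length; i++) {
            sb.append(HEX[bytes[i] & 0xFF]); // mask to index 0..255
        }
        return sb.toString();
    }
}
```

The table is built once at class load, so the per-byte work is a single array lookup and append, with no per-byte string formatting.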
Finally, note that translation into the safe hex format doubles the size of
binary strings. Users should expect to double memory allocations
accordingly, including MySQL-specific settings such as max_allowed_packet, which
sets the maximum size of a single client request.
- Version:
- 1.0
- Author:
- Robert Hodges
Method Summary
java.lang.String toJavaString(byte[] bytes, int offset, int length)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MySQLStatementTranslator
public MySQLStatementTranslator(java.lang.String charset)
throws java.io.UnsupportedEncodingException
- Throws:
java.io.UnsupportedEncodingException
toJavaString
public java.lang.String toJavaString(byte[] bytes,
int offset,
int length)
throws java.io.UnsupportedEncodingException
- Throws:
java.io.UnsupportedEncodingException