Class Utf8.Processor
java.lang.Object
com.google.protobuf.Utf8.Processor
- Direct Known Subclasses:
Utf8.SafeProcessor, Utf8.UnsafeProcessor
- Enclosing class:
Utf8
A processor of UTF-8 strings, providing methods for checking validity and encoding.
Constructor Summary
Constructors
Processor()
Method Summary
Modifier and Type | Method | Description
(package private) abstract String | decodeUtf8(byte[] bytes, int index, int size) | Decodes the given byte array slice into a String.
(package private) final String | decodeUtf8(ByteBuffer buffer, int index, int size) | Decodes the given portion of the ByteBuffer into a String.
(package private) final String | decodeUtf8Default(ByteBuffer buffer, int index, int size) | Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
(package private) abstract String | decodeUtf8Direct(ByteBuffer buffer, int index, int size) | Decodes direct ByteBuffer instances into String.
(package private) abstract int | encodeUtf8(String in, byte[] out, int offset, int length) | Encodes an input character sequence (in) to UTF-8 in the target array (out).
(package private) final void | encodeUtf8(String in, ByteBuffer out) | Encodes an input character sequence (in) to UTF-8 in the target buffer (out).
protected abstract void | encodeUtf8Internal(String in, ByteBuffer out) | Encodes the input character sequence to a direct ByteBuffer instance.
protected int | encodeUtf8Naive(String in, byte[] out, int offset, int length) |
protected void | encodeUtf8Naive(String in, ByteBuffer out) |
(package private) abstract boolean | isValidUtf8(byte[] bytes, int index, int limit) | Returns true if the given byte array slice is a well-formed UTF-8 byte sequence.
protected boolean | isValidUtf8BufferDefault(ByteBuffer buffer, int index, int limit) | Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence.
protected boolean | isValidUtf8BufferDirect(ByteBuffer buffer, int index, int limit) | Must only be called on direct buffers.
Constructor Details
Processor
Processor()
Method Details
isValidUtf8
abstract boolean isValidUtf8(byte[] bytes, int index, int limit)
Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. The range of bytes checked extends from position index (inclusive) to position limit (exclusive).
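The well-formedness rules behind this check can be sketched in plain Java. This is a minimal illustration, not the processor's implementation: the class name Utf8SliceValidator is hypothetical, and the rules it encodes (no overlong forms, no encoded surrogates U+D800 to U+DFFF, nothing above U+10FFFF, no sequence cut off by limit) are the standard definition of well-formed UTF-8, assumed here to match what the method checks.

```java
// Hypothetical sketch of a well-formed UTF-8 check over bytes[index, limit).
final class Utf8SliceValidator {
    private Utf8SliceValidator() {}

    static boolean isValidUtf8(byte[] bytes, int index, int limit) {
        int i = index;
        while (i < limit) {
            int b0 = bytes[i] & 0xFF;
            if (b0 < 0x80) {               // 1-byte ASCII
                i++;
                continue;
            }
            int len;
            int min;                       // smallest code point legal for this length (rejects overlong forms)
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                len = 2; min = 0x80;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                len = 3; min = 0x800;
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                len = 4; min = 0x10000;
            } else {
                return false;              // stray continuation byte, C0/C1, or F5..FF
            }
            if (limit - i < len) {
                return false;              // sequence truncated by the end of the slice
            }
            int cp = b0 & (0xFF >> (len + 1)); // payload bits of the lead byte
            for (int k = 1; k < len; k++) {
                int b = bytes[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return false;          // expected a continuation byte 10xxxxxx
                }
                cp = (cp << 6) | (b & 0x3F);
            }
            if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                return false;              // overlong, out of range, or an encoded surrogate
            }
            i += len;
        }
        return true;
    }
}
```

In this sketch the slice bounds matter: a multi-byte character cut off by limit makes the slice invalid even when the whole array is valid, and invalid bytes outside [index, limit) are ignored.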
isValidUtf8BufferDirect
protected boolean isValidUtf8BufferDirect(ByteBuffer buffer, int index, int limit)
Must only be called on direct buffers. This exists as a separate method only so that UnsafeProcessor can optimize specifically for that case.
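Why a dedicated direct-buffer entry point helps can be illustrated with a dispatch on the kind of ByteBuffer. This is a hypothetical sketch: BufferPathSketch, Path and choosePath are invented names, and the three-way routing (backing array, direct memory, plain ByteBuffer API) is an assumption about how a processor might pick a path, not a description of protobuf's code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: which code path a processor might choose for a given buffer.
final class BufferPathSketch {
    private BufferPathSketch() {}

    enum Path { ARRAY, DIRECT, DEFAULT }

    static Path choosePath(ByteBuffer buffer) {
        if (buffer.hasArray()) {
            // Heap buffer with an accessible array: reuse the byte[] routine.
            return Path.ARRAY;
        } else if (buffer.isDirect()) {
            // No Java array to index; a raw-memory (e.g. Unsafe-based) processor can specialize here.
            return Path.DIRECT;
        }
        // Anything else, such as a read-only heap buffer: fall back to the ByteBuffer API.
        return Path.DEFAULT;
    }
}
```

A read-only heap buffer reports hasArray() == false, so it lands on the default path even though it is not direct.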
isValidUtf8BufferDefault
protected boolean isValidUtf8BufferDefault(ByteBuffer buffer, int index, int limit)
Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. The range of bytes checked extends from position index (inclusive) to position limit (exclusive).
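A validity check that uses only the ByteBuffer API can lean on the JDK's UTF-8 decoder, which rejects malformed input such as overlong forms and truncated sequences. This is an illustrative sketch under assumptions: BufferValidatorSketch is a hypothetical name, and it works on a duplicate() so the caller's position and limit are not modified. It allocates a throwaway CharBuffer, which a dedicated validator would avoid.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: validate buffer[index, limit) through the ByteBuffer/CharsetDecoder API.
final class BufferValidatorSketch {
    private BufferValidatorSketch() {}

    static boolean isValidUtf8BufferDefault(ByteBuffer buffer, int index, int limit) {
        // A duplicate shares content but has its own position/limit, so the caller's state is untouched.
        ByteBuffer view = buffer.duplicate();
        view.clear();          // position = 0, limit = capacity
        view.position(index);  // index and limit are absolute positions
        view.limit(limit);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(view); // throws on malformed or truncated input
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```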
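decodeUtf8
abstract String decodeUtf8(byte[] bytes, int index, int size) throws InvalidProtocolBufferException
Decodes the given byte array slice into a String.
- Throws:
InvalidProtocolBufferException - if the byte array slice is not valid UTF-8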
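The Throws clause implies that decoding is strict. The JDK's new String(bytes, offset, length, UTF_8) does not throw on malformed input; it substitutes U+FFFD instead. The sketch below contrasts the two. StrictDecodeSketch and InvalidUtf8Exception are hypothetical names; the exception stands in for InvalidProtocolBufferException so the example runs without the protobuf library. Note the (index, size) convention here, as opposed to the (index, limit) convention of isValidUtf8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: lenient vs strict decoding of bytes[index, index + size).
final class StrictDecodeSketch {
    private StrictDecodeSketch() {}

    /** Hypothetical stand-in for InvalidProtocolBufferException. */
    static final class InvalidUtf8Exception extends Exception {
        InvalidUtf8Exception(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Lenient: malformed bytes become U+FFFD and no error is raised. */
    static String decodeLenient(byte[] bytes, int index, int size) {
        return new String(bytes, index, size, StandardCharsets.UTF_8);
    }

    /** Strict: the slice must be valid UTF-8, otherwise an exception is thrown. */
    static String decodeStrict(byte[] bytes, int index, int size) throws InvalidUtf8Exception {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes, index, size)) // wrap sets position=index, limit=index+size
                    .toString();
        } catch (CharacterCodingException e) {
            throw new InvalidUtf8Exception("byte array slice is not valid UTF-8", e);
        }
    }
}
```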
decodeUtf8
final String decodeUtf8(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes the given portion of the ByteBuffer into a String.
- Throws:
InvalidProtocolBufferException - if the portion of the buffer is not valid UTF-8
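Because this overload is final while decodeUtf8(byte[], int, int) and decodeUtf8Direct are abstract, it plausibly routes each buffer to one of them. A minimal sketch of such routing, assuming heap buffers are decoded through their backing array: the key detail is that index is relative to the buffer, so it must be shifted by arrayOffset(), which is non-zero for buffers produced by slice(). BufferDecodeDispatchSketch and its methods are hypothetical, and the fallback uses the ByteBuffer API rather than any Unsafe-style access.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: routing a ByteBuffer decode to an array path or an API fallback.
final class BufferDecodeDispatchSketch {
    private BufferDecodeDispatchSketch() {}

    private static CharsetDecoder strictDecoder() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    /** Array path: decodes bytes[index, index + size). */
    static String decodeArray(byte[] bytes, int index, int size) throws CharacterCodingException {
        return strictDecoder().decode(ByteBuffer.wrap(bytes, index, size)).toString();
    }

    /** Decodes buffer[index, index + size); index is relative to the buffer, not its array. */
    static String decode(ByteBuffer buffer, int index, int size) throws CharacterCodingException {
        if (buffer.hasArray()) {
            // Translate the buffer-relative index into an index in the backing array.
            return decodeArray(buffer.array(), buffer.arrayOffset() + index, size);
        }
        // No accessible array (direct or read-only): read through a duplicate view.
        ByteBuffer view = buffer.duplicate();
        view.clear();
        view.position(index);
        view.limit(index + size);
        return strictDecoder().decode(view).toString();
    }
}
```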
decodeUtf8Direct
abstract String decodeUtf8Direct(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes direct ByteBuffer instances into String.
- Throws:
InvalidProtocolBufferException
decodeUtf8Default
final String decodeUtf8Default(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
- Throws:
InvalidProtocolBufferException
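encodeUtf8Naive
protected int encodeUtf8Naive(String in, byte[] out, int offset, int length)
encodeUtf8Naive
protected void encodeUtf8Naive(String in, ByteBuffer out)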
encodeUtf8
abstract int encodeUtf8(String in, byte[] out, int offset, int length)
Encodes an input character sequence (in) to UTF-8 in the target array (out). For a string, this method is functionally identical to

    byte[] a = in.getBytes(StandardCharsets.UTF_8);
    System.arraycopy(a, 0, out, offset, a.length);
    return offset + a.length;

but may be implemented differently for efficiency. Like String.getBytes(UTF_8), this method replaces unpaired surrogates with a replacement character.

To ensure sufficient space in the output array, either call Utf8.encodedLength(String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest number of bytes that any input can be encoded to.
- Parameters:
in - the input character sequence to be encoded
out - the target array
offset - the offset in out at which to start writing
length - the number of bytes of out available for writing, starting from offset
- Returns:
the new offset, equivalent to offset + Utf8.encodedLength(in)
- Throws:
ArrayIndexOutOfBoundsException - if in, encoded in UTF-8, is longer than out.length - offset
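The byte[] contract (write into out starting at offset, return the new offset, replace unpaired surrogates as String.getBytes(UTF_8) does) can be met by a hand-written encoder. This is a sketch under assumptions: ArrayEncodeSketch is a hypothetical name, unpaired surrogates are replaced with the single byte '?' because that is what String.getBytes(UTF_8) emits, and the overflow check is against offset + length.

```java
// Hypothetical sketch: encode a String as UTF-8 into out[offset, offset + length).
final class ArrayEncodeSketch {
    private ArrayEncodeSketch() {}

    static int encodeUtf8(String in, byte[] out, int offset, int length) {
        int pos = offset;
        int limit = offset + length;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            int cp;
            if (Character.isHighSurrogate(c) && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                cp = Character.toCodePoint(c, in.charAt(i + 1)); // valid surrogate pair
                i++;
            } else if (Character.isSurrogate(c)) {
                cp = '?';                                        // unpaired surrogate: replacement byte
            } else {
                cp = c;
            }
            int need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (limit - pos < need) {
                throw new ArrayIndexOutOfBoundsException(
                        "UTF-8 output does not fit in " + length + " bytes at offset " + offset);
            }
            switch (need) {
                case 1:
                    out[pos++] = (byte) cp;
                    break;
                case 2:
                    out[pos++] = (byte) (0xC0 | (cp >>> 6));
                    out[pos++] = (byte) (0x80 | (cp & 0x3F));
                    break;
                case 3:
                    out[pos++] = (byte) (0xE0 | (cp >>> 12));
                    out[pos++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                    out[pos++] = (byte) (0x80 | (cp & 0x3F));
                    break;
                default:
                    out[pos++] = (byte) (0xF0 | (cp >>> 18));
                    out[pos++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
                    out[pos++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                    out[pos++] = (byte) (0x80 | (cp & 0x3F));
                    break;
            }
        }
        return pos; // new offset: offset + number of bytes written
    }
}
```

Writing bytes directly avoids the intermediate array that the getBytes/arraycopy formulation allocates, which is the kind of efficiency difference the description leaves room for.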
encodeUtf8
final void encodeUtf8(String in, ByteBuffer out)
Encodes an input character sequence (in) to UTF-8 in the target buffer (out). Upon returning from this method, the position of out points just past the last encoded byte. This method requires paired surrogates, and therefore does not support chunking.

To ensure sufficient space in the output buffer, either call Utf8.encodedLength(String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest number of bytes that any input can be encoded to.
- Parameters:
in - the source character sequence to be encoded
out - the target buffer
- Throws:
ArrayIndexOutOfBoundsException - if in, encoded in UTF-8, is longer than out.remaining()
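Sizing the buffer and advancing its position can be sketched with JDK calls only. Assumptions, also noted in the code: BufferEncodeSketch is a hypothetical name, and its local MAX_BYTES_PER_CHAR of 3 is derived from UTF-16 (one char encodes to at most 3 bytes; a surrogate pair of 2 chars encodes to 4), not copied from Utf8.MAX_BYTES_PER_CHAR. Unlike the documented method, this sketch does not reject unpaired surrogates, because String.getBytes replaces them with '?'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: worst-case sizing plus an encode that advances out's position.
final class BufferEncodeSketch {
    private BufferEncodeSketch() {}

    // Assumed bound derived from UTF-16: at most 3 UTF-8 bytes per char.
    static final int MAX_BYTES_PER_CHAR = 3;

    static int worstCaseLength(String in) {
        return MAX_BYTES_PER_CHAR * in.length();
    }

    /** Writes in as UTF-8 at out.position() and leaves the position just past the last byte. */
    static void encodeUtf8(String in, ByteBuffer out) {
        byte[] encoded = in.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > out.remaining()) {
            // Mirror the documented exception type instead of BufferOverflowException.
            throw new ArrayIndexOutOfBoundsException(
                    "need " + encoded.length + " bytes, have " + out.remaining());
        }
        out.put(encoded); // relative bulk put advances the position
    }
}
```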
encodeUtf8Internal
protected abstract void encodeUtf8Internal(String in, ByteBuffer out)
Encodes the input character sequence to a direct ByteBuffer instance.
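Encoding straight into a direct buffer, without an intermediate byte[], can be done with a CharsetEncoder. The sketch is illustrative only: DirectEncodeSketch is a hypothetical name, it uses the JDK encoder rather than any Unsafe-based technique, and it configures REPLACE so that unpaired surrogates become '?' as in String.getBytes(UTF_8). On overflow it restores the original position and throws ArrayIndexOutOfBoundsException, mirroring the encodeUtf8(String, ByteBuffer) contract; that choice is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: stream UTF-8 output directly into a (typically direct) ByteBuffer.
final class DirectEncodeSketch {
    private DirectEncodeSketch() {}

    static void encodeIntoBuffer(String in, ByteBuffer out) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // unpaired surrogate -> '?'
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        int start = out.position();
        CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out); // complete the encoding operation
        }
        if (result.isOverflow()) {
            out.position(start);         // discard any partial output before reporting
            throw new ArrayIndexOutOfBoundsException(
                    "not enough space for the UTF-8 encoding of " + in.length() + " chars");
        }
    }
}
```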