Class Utf8.Processor

java.lang.Object
com.google.protobuf.Utf8.Processor
Direct Known Subclasses:
Utf8.SafeProcessor, Utf8.UnsafeProcessor
Enclosing class:
Utf8

abstract static class Utf8.Processor extends Object
A processor of UTF-8 strings, providing methods for checking validity and encoding.
  • Constructor Details

    • Processor

      Processor()
  • Method Details

    • isValidUtf8

      abstract boolean isValidUtf8(byte[] bytes, int index, int limit)
      Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. The checked range runs from index (inclusive) to limit (exclusive).
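
The half-open range convention (index inclusive, limit exclusive) can be illustrated with JDK classes alone. The sketch below is not the protobuf implementation; the class Utf8SliceCheck and the helper isWellFormed are hypothetical names, and it simply delegates to a strict CharsetDecoder to show which slices would count as well-formed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8SliceCheck {
    // Hypothetical helper (not part of protobuf): checks bytes[index, limit).
    static boolean isWellFormed(byte[] bytes, int index, int limit) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                // wrap() takes (offset, length), so convert the exclusive limit.
                .decode(ByteBuffer.wrap(bytes, index, limit - index));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 'a', then the two-byte encoding of U+00E9, then 'b'.
        byte[] data = {'a', (byte) 0xC3, (byte) 0xA9, 'b'};
        System.out.println(isWellFormed(data, 0, 4)); // true: whole array
        System.out.println(isWellFormed(data, 0, 2)); // false: two-byte sequence cut short
        System.out.println(isWellFormed(data, 2, 4)); // false: starts on a continuation byte
    }
}
```

Note that a slice boundary falling inside a multi-byte sequence makes the slice invalid even when the full array is valid.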
    • isValidUtf8BufferDirect

      protected boolean isValidUtf8BufferDirect(ByteBuffer buffer, int index, int limit)
      Must only be called on direct buffers. This exists as a separate method only so that UnsafeProcessor can optimize specifically for that case.
    • isValidUtf8BufferDefault

      protected boolean isValidUtf8BufferDefault(ByteBuffer buffer, int index, int limit)
      Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. The checked range runs from index (inclusive) to limit (exclusive).
    • decodeUtf8

      abstract String decodeUtf8(byte[] bytes, int index, int size) throws InvalidProtocolBufferException
      Decodes the given byte array slice into a String.
      Throws:
      InvalidProtocolBufferException - if the byte array slice is not valid UTF-8
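
Unlike the validity checks, the decode methods take a start index and a size rather than an exclusive limit. The following is a minimal sketch of strict decoding using only the JDK, assuming CharacterCodingException as a stand-in for InvalidProtocolBufferException; StrictUtf8Decode and decodeStrict are hypothetical names, not protobuf API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Decode {
    // Hypothetical helper: decodes bytes[index, index + size) or throws.
    static String decodeStrict(byte[] bytes, int index, int size)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes, index, size))
            .toString();
    }

    public static void main(String[] args) {
        // Two padding bytes, then "h", U+00E9 (two bytes), "llo": 6 payload bytes.
        byte[] data = "xxh\u00e9llo".getBytes(StandardCharsets.UTF_8);
        try {
            String s = decodeStrict(data, 2, 6);
            System.out.println(s.length()); // 5 chars decoded from 6 bytes
            decodeStrict(data, 2, 2);       // cuts U+00E9 in half
        } catch (CharacterCodingException e) {
            System.out.println("invalid UTF-8 slice");
        }
    }
}
```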
    • decodeUtf8

      final String decodeUtf8(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
      Decodes the given portion of the ByteBuffer into a String.
      Throws:
      InvalidProtocolBufferException - if the portion of the buffer is not valid UTF-8
    • decodeUtf8Direct

      abstract String decodeUtf8Direct(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
      Decodes direct ByteBuffer instances into String.
      Throws:
      InvalidProtocolBufferException
    • decodeUtf8Default

      final String decodeUtf8Default(ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
      Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
      Throws:
      InvalidProtocolBufferException
    • encodeUtf8Naive

      protected int encodeUtf8Naive(String in, byte[] out, int offset, int length)
    • encodeUtf8Naive

      protected void encodeUtf8Naive(String in, ByteBuffer out)
    • encodeUtf8

      abstract int encodeUtf8(String in, byte[] out, int offset, int length)
      Encodes an input character sequence (in) to UTF-8 in the target array (out). This method is functionally identical to
      byte[] a = in.getBytes(UTF_8);
      System.arraycopy(a, 0, out, offset, a.length);
      return offset + a.length;
      
      but may be implemented differently for efficiency purposes.

      Matching String.getBytes(UTF_8), this replaces unpaired surrogates with a replacement character.

      To ensure sufficient space in the output buffer, either call Utf8.encodedLength(String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.

      Parameters:
      in - the input character sequence to be encoded
      out - the target array
      offset - the starting offset in bytes to start writing at
      length - the number of bytes available for writing, starting from offset
      Returns:
      the new offset, equivalent to offset + Utf8.encodedLength(in)
      Throws:
      ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than out.length - offset
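
The reference behavior and the worst-case sizing rule described for the array overload can be tried out with the JDK alone. This is a sketch under stated assumptions: EncodeIntoArray and encodeReference are hypothetical names, the length parameter is omitted for brevity, and the constant 3 is assumed to mirror Utf8.MAX_BYTES_PER_CHAR.

```java
import java.nio.charset.StandardCharsets;

final class EncodeIntoArray {
    // Assumption: mirrors Utf8.MAX_BYTES_PER_CHAR (bytes per UTF-16 char, worst case).
    static final int MAX_BYTES_PER_CHAR = 3;

    // Hypothetical stand-in for the reference snippet; returns the new offset.
    static int encodeReference(String in, byte[] out, int offset) {
        byte[] a = in.getBytes(StandardCharsets.UTF_8);
        // Throws an IndexOutOfBoundsException if out is too small.
        System.arraycopy(a, 0, out, offset, a.length);
        return offset + a.length;
    }

    public static void main(String[] args) {
        // 'h' (1 byte), U+00E9 (2 bytes), U+1F600 as a surrogate pair (4 bytes).
        String in = "h\u00e9\uD83D\uDE00";
        int offset = 2;
        // Worst-case sizing: in.length() is 4 chars, so reserve 12 bytes.
        byte[] out = new byte[offset + MAX_BYTES_PER_CHAR * in.length()];
        System.out.println(encodeReference(in, out, offset)); // 9

        // String.getBytes(UTF_8) encodes an unpaired surrogate as '?' (63).
        System.out.println("\uD800".getBytes(StandardCharsets.UTF_8)[0]); // 63
    }
}
```

Worst-case sizing over-allocates here (12 bytes reserved, 7 used); computing the exact length first trades an extra pass over the string for a tighter buffer.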
    • encodeUtf8

      final void encodeUtf8(String in, ByteBuffer out)
      Encodes an input character sequence (in) to UTF-8 in the target buffer (out). Upon returning from this method, the out position will point to the position after the last encoded byte. This method requires paired surrogates, and therefore does not support chunking.

      To ensure sufficient space in the output buffer, either call Utf8.encodedLength(String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.

      Parameters:
      in - the source character sequence to be encoded
      out - the target buffer
      Throws:
      ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than out.remaining()
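
The position-advancing contract of the ByteBuffer overload can be sketched in the same way. The code below is a JDK-only stand-in, not the protobuf implementation: EncodeIntoBuffer and encodeReference are hypothetical names, the factor 3 is assumed to mirror Utf8.MAX_BYTES_PER_CHAR, and this stand-in signals overflow with BufferOverflowException rather than the ArrayIndexOutOfBoundsException documented above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EncodeIntoBuffer {
    // Hypothetical stand-in: writes the UTF-8 bytes at out's current position.
    static void encodeReference(String in, ByteBuffer out) {
        // A relative bulk put advances the position by the number of bytes written.
        out.put(in.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String in = "caf\u00e9"; // 3 one-byte chars plus one two-byte char
        // Worst-case sizing; 3 is assumed to mirror Utf8.MAX_BYTES_PER_CHAR.
        ByteBuffer out = ByteBuffer.allocateDirect(3 * in.length());
        encodeReference(in, out);
        System.out.println(out.position()); // 5

        out.flip(); // switch from writing to reading the encoded bytes
        System.out.println(out.remaining()); // 5
    }
}
```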
    • encodeUtf8Internal

      protected abstract void encodeUtf8Internal(String in, ByteBuffer out)
      Encodes the input character sequence to a direct ByteBuffer instance.