Class XmlScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
- Direct Known Subclasses:
ByteBasedScanner, ReaderScanner
public abstract class XmlScanner
extends Object
implements XmlConsts, XMLStreamConstants, NamespaceContext
This is the abstract base class for all scanner implementations,
defining operations the actual parser requires from the low-level
scanners.
Scanners are specific to an encoding and an input type (byte or char; stream or block),
so there are many implementations.
-
Field Summary
Fields
Modifier and Type, Field, Description
protected final AttributeCollector _attrCollector
protected int _attrCount
protected final boolean _cfgCoalescing
protected boolean _cfgLazyParsing
protected final ReaderConfig _config
protected ElementScope _currElem
    Information about the current element on the stack
protected int _currNsCount
    This is a temporary state variable, valid during START_ELEMENT event.
protected int _currRow
    The row that the next character to read is on.
protected int _currToken
protected NsBinding _defaultNs
    Default namespace binding is a per-document singleton, like explicit bindings, and used for elements (never for attributes).
protected int _depth
    Number of START_ELEMENT events returned for which no END_ELEMENT has been returned; including current event.
protected boolean _entityPending
    Flag set to indicate that an entity is pending
protected boolean _isEmptyTag
    Flag that is used if the current state is START_ELEMENT or END_ELEMENT, to indicate if the underlying physical tag is a so-called empty tag (one ending with "/>")
protected FixedNsContext _lastNsContext
    Last returned NamespaceContext, created for a call to getNonTransientNamespaceContext(), iff this would still be a valid context.
protected NsDeclaration _lastNsDecl
    Pointer to the last namespace declaration encountered.
protected char[] _nameBuffer
    Similarly, need a char buffer for actual String construction (in future, could perhaps use StringBuilder?).
protected PName[] _nsBindingCache
    Although unbound pname instances can be easily and safely reused, bound ones are per-document.
protected int _nsBindingCount
protected NsBinding[] _nsBindings
    Array containing all prefix bindings needed within the current document, so far (if any).
protected int _nsBindMisses
protected long _pastBytesOrChars
    Number of bytes that were read and processed before the contents of the current buffer; used for calculating absolute offsets.
protected String _publicId
    Public id of the current event (DTD), if any.
protected int _rowStartOffset
    Offset used to calculate the column value given current input buffer pointer.
protected long _startColumn
    Current column at start of current (last returned) token
protected long _startRawOffset
    Offset (in chars or bytes) at start of current token
protected long _startRow
    Current row at start of current (last returned) token
protected String _systemId
    System id of the current event (DTD), if any.
protected final TextBuilder _textBuilder
    Textual content of the current event
protected boolean _tokenIncomplete
protected PName _tokenName
    Current name associated with the token, if any.
protected final boolean _xml11
    Whether validity checks (wrt. name and text characters) and normalization (linefeeds) are to be done using xml 1.1 rules, or basic xml 1.0 rules.
private static final int BIND_CACHE_MASK
private static final int BIND_CACHE_SIZE
    Size of the bind cache can be reasonably small, and should still get high enough hit rate
private static final int BIND_MISSES_TO_ACTIVATE_CACHE
    Let's activate cache quite soon, no need to wait for hundreds of misses; just try to avoid cache construction if all we get is soap envelope element or such.
protected final String CDATA_STR
    String that identifies CDATA section (after "<![" prefix)
protected static final int INT_0
protected static final int INT_9
protected static final int INT_A
protected static final int INT_a
protected static final int INT_AMP
protected static final int INT_APOS
protected static final int INT_COLON
protected static final int INT_CR
protected static final int INT_EQ
protected static final int INT_EXCL
protected static final int INT_F
protected static final int INT_f
protected static final int INT_GT
protected static final int INT_HYPHEN
protected static final int INT_LBRACKET
protected static final int INT_LF
protected static final int INT_LT
protected static final int INT_NULL
protected static final int INT_QMARK
protected static final int INT_QUOTE
protected static final int INT_RBRACKET
protected static final int INT_SLASH
protected static final int INT_SPACE
protected static final int INT_TAB
protected static final int INT_z
protected static final int MAX_UNICODE_CHAR
    This constant defines the highest Unicode character allowed in XML content.
static final int TOKEN_EOI
    This token type signifies end-of-input, in cases where it can be returned.
Fields inherited from interface XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
protected abstract void _closeSource
protected void _releaseBuffers()
protected final PName bindName
    This method is called to find/create a fully qualified (bound) name (element / attribute), for a name with prefix.
protected final void bindNs
    Method called when we are ready to bind a declared namespace.
protected final void checkImmutableBinding(String prefix, String uri)
    Method called when an immutable ns prefix (xml, xmlns) is encountered.
final void close(boolean forceCloseSource)
    Method called at the point when the parsing process has ended (either by encountering end of the input, or via explicit close), and buffers can and should be released.
final byte[] decodeAttrBinaryValue(int index, org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec)
final void decodeAttrValue(int index, org.codehaus.stax2.typed.TypedValueDecoder tvd)
final int decodeAttrValues(int index, org.codehaus.stax2.typed.TypedArrayDecoder tad)
    Method called to decode the attribute value that consists of zero or more space-separated tokens.
final int decodeElements(org.codehaus.stax2.typed.TypedArrayDecoder tad, boolean reset)
    Method called by the stream reader to decode space-separated tokens that are part of the current text event, using given decoder.
final int findAttrIndex(String nsURI, String localName)
private NsDeclaration findCurrNsDecl(int index)
protected final NsBinding findOrCreateBinding(String prefix)
    Method called when a namespace declaration needs to find the binding object (essentially a per-prefix-per-document canonical container object)
protected abstract void finishCData
protected abstract void finishCharacters
protected abstract void finishComment
protected abstract void finishDTD(boolean copyContents)
protected abstract void finishPI()
protected abstract void finishSpace
protected abstract void finishToken
    This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc)
void fireSaxCharacterEvents
void fireSaxCommentEvent
void fireSaxEndElement
void fireSaxPIEvent
void fireSaxSpaceEvents
void fireSaxStartElement(ContentHandler h, Attributes attrs)
getAttrCollector
final int getAttrCount()
final String getAttrLocalName(int index)
final String getAttrNsURI(int index)
final String getAttrPrefix(int index)
final String getAttrPrefixedName(int index)
final QName getAttrQName(int index)
final String getAttrType(int index)
final String getAttrValue(int index)
final String getAttrValue(String nsURI, String localName)
getConfig
abstract int getCurrentColumnNr()
final int getCurrentLineNr()
abstract org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
final int getDepth()
final String getDTDPublicId
final String getDTDSystemId
abstract long getEndingByteOffset
abstract long getEndingCharOffset
org.codehaus.stax2.XMLStreamLocation2 getEndLocation
final String getInputPublicId
final String getInputSystemId
final PName getName()
final String getNamespacePrefix(int index)
final String getNamespaceURI
final String getNamespaceURI(int index)
getNamespaceURI(String prefix)
final NamespaceContext getNonTransientNamespaceContext
final int getNsCount()
getPrefix
getPrefixes(String nsURI)
final QName getQName()
abstract long getStartingByteOffset()
abstract long getStartingCharOffset()
final org.codehaus.stax2.XMLStreamLocation2 getStartLocation()
final String getText()
final int getText
final char[] getTextCharacters
final int getTextCharacters(int srcStart, char[] target, int targetStart, int len)
final int getTextLength
protected char handleInvalidXmlChar(int i)
final boolean hasEmptyStack()
final boolean isAttrSpecified(int index)
final boolean isEmptyTag()
final boolean isTextWhitespace
protected abstract boolean loadMore()
protected final void loadMoreGuaranteed
    Method that tries to load at least one more byte into buffer; and if that fails, throws an appropriate EOI exception.
protected final void loadMoreGuaranteed(int tt)
abstract int nextFromProlog(boolean isProlog)
abstract int nextFromTree
protected void reportDoubleHyphenInComments
protected void reportDuplicateNsDecl(String prefix)
protected void reportEntityOverflow
protected void reportEofInName(char[] cbuf, int clen)
protected void reportIllegalCDataEnd
protected void reportIllegalNsDecl(String prefix)
protected void reportIllegalNsDecl(String prefix, String uri)
protected void reportInputProblem(String msg)
protected void reportInvalidNameChar(int ch, int index)
protected void reportInvalidNsIndex(int index)
protected void reportInvalidXmlChar(int ch)
protected void reportMissingPISpace(int ch)
    Called when there's an unexpected char after PI target (non-ws, not part of '?>' end marker)
protected void reportMultipleColonsInName
protected void reportPrologProblem(boolean isProlog, String msg)
protected void reportPrologUnexpChar(boolean isProlog, int ch, String msg)
protected void reportPrologUnexpElement(boolean isProlog, int ch)
protected void reportTreeUnexpChar(int ch, String msg)
protected void reportUnboundPrefix(PName name, boolean isAttr)
protected void reportUnexpandedEntityInAttr(PName name, boolean isNsDecl)
    Method called when a call to expand an entity within attribute value fails to expand it.
protected void reportUnexpectedEndTag(String expName)
final void resetForDecoding(org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec, boolean firstChunk)
    Method called by the stream reader to reset given base64 decoder with data from the current text event.
protected abstract void skipCData
protected abstract boolean skipCharacters
protected abstract boolean skipCoalescedText
    Secondary skip method called after primary text segment has been skipped, and we are in coalescing mode.
protected abstract void skipComment
protected abstract void skipPI()
protected abstract void skipSpace
protected final boolean skipToken
    This method is called to essentially skip the remainder of the current token (data of PI etc)
protected void throwInvalidSpace(int i)
protected void throwNullChar
protected void throwUnexpectedChar(int i, String msg)
protected final void verifyXmlChar(int value)
-
Field Details
-
CDATA_STR
protected final String CDATA_STR
String that identifies CDATA section (after "<![" prefix)
-
TOKEN_EOI
public static final int TOKEN_EOI
This token type signifies end-of-input, in cases where it can be returned. In other cases, an exception may be thrown.
-
MAX_UNICODE_CHAR
protected static final int MAX_UNICODE_CHAR
This constant defines the highest Unicode character allowed in XML content.
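To illustrate where such an upper bound matters, the following minimal sketch checks a code point against the XML 1.0 Char production. The class name XmlCharSketch and the value 0x10FFFF are assumptions made for the example; the actual value of MAX_UNICODE_CHAR is not shown on this page, and this is not the scanner's own validation code.

```java
final class XmlCharSketch {
    // Assumed value: XML allows code points up to U+10FFFF. The real
    // constant's value is not shown in this documentation.
    static final int MAX_UNICODE_CHAR = 0x10FFFF;

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int c) {
        if (c < 0x20) {
            // Only tab, linefeed and carriage return are allowed below space
            return c == 0x9 || c == 0xA || c == 0xD;
        }
        if (c <= 0xD7FF) {
            return true;
        }
        if (c < 0xE000) {
            return false; // surrogate block
        }
        if (c <= 0xFFFD) {
            return true;
        }
        // 0xFFFE and 0xFFFF are excluded; supplementary planes are allowed
        return c >= 0x10000 && c <= MAX_UNICODE_CHAR;
    }
}
```

A caller would typically reject or report any code point for which isXml10Char returns false.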
-
INT_NULL
protected static final int INT_NULL
-
INT_CR
protected static final int INT_CR
-
INT_LF
protected static final int INT_LF
-
INT_TAB
protected static final int INT_TAB
-
INT_SPACE
protected static final int INT_SPACE
-
INT_HYPHEN
protected static final int INT_HYPHEN
-
INT_QMARK
protected static final int INT_QMARK
-
INT_AMP
protected static final int INT_AMP
-
INT_LT
protected static final int INT_LT
-
INT_GT
protected static final int INT_GT
-
INT_QUOTE
protected static final int INT_QUOTE
-
INT_APOS
protected static final int INT_APOS
-
INT_EXCL
protected static final int INT_EXCL
-
INT_COLON
protected static final int INT_COLON
-
INT_LBRACKET
protected static final int INT_LBRACKET
-
INT_RBRACKET
protected static final int INT_RBRACKET
-
INT_SLASH
protected static final int INT_SLASH
-
INT_EQ
protected static final int INT_EQ
-
INT_A
protected static final int INT_A
-
INT_F
protected static final int INT_F
-
INT_a
protected static final int INT_a
-
INT_f
protected static final int INT_f
-
INT_z
protected static final int INT_z
-
INT_0
protected static final int INT_0
-
INT_9
protected static final int INT_9
-
BIND_MISSES_TO_ACTIVATE_CACHE
private static final int BIND_MISSES_TO_ACTIVATE_CACHE
Let's activate the cache quite soon; there is no need to wait for hundreds of misses. The aim is just to avoid cache construction if all we get is a SOAP envelope element or such.
-
BIND_CACHE_SIZE
private static final int BIND_CACHE_SIZE
The bind cache can be reasonably small and should still get a high enough hit rate.
-
BIND_CACHE_MASK
private static final int BIND_CACHE_MASK
-
_config
protected final ReaderConfig _config
-
_xml11
protected final boolean _xml11
Whether validity checks (wrt. name and text characters) and normalization (linefeeds) are to be done using xml 1.1 rules, or basic xml 1.0 rules. Default is 1.0.
-
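The linefeed normalization that the _xml11 flag selects can be sketched as follows. This is a minimal illustration of the end-of-line rules in the XML 1.0 and 1.1 specifications, not the scanner's actual code; the class and method names are hypothetical.

```java
final class LineEndSketch {
    /**
     * Normalizes line ends to a single LF.
     * XML 1.0: CR LF and lone CR become LF.
     * XML 1.1: additionally CR NEL, lone NEL (U+0085) and LS (U+2028) become LF.
     */
    static String normalize(String in, boolean xml11) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\r') {
                // Swallow the second half of a two-character line end
                if (i + 1 < in.length()) {
                    char next = in.charAt(i + 1);
                    if (next == '\n' || (xml11 && next == '\u0085')) {
                        i++;
                    }
                }
                sb.append('\n');
            } else if (xml11 && (c == '\u0085' || c == '\u2028')) {
                sb.append('\n');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With xml11 set to false, NEL and LS pass through unchanged, which is the main visible difference between the two modes for text content.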
_cfgCoalescing
protected final boolean _cfgCoalescing
-
_cfgLazyParsing
protected boolean _cfgLazyParsing
-
_currToken
protected int _currToken
-
_tokenIncomplete
protected boolean _tokenIncomplete
-
_depth
protected int _depth
Number of START_ELEMENT events returned for which no END_ELEMENT has been returned; including current event.
-
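The depth bookkeeping described for _depth can be sketched with the JDK's standard StAX reader. This is an illustration under assumptions, not the scanner's code: DepthSketch is a hypothetical class, and treating an END_ELEMENT as still counting its own element is my reading of "including current event".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DepthSketch {
    /** Returns the depth observed at each START_ELEMENT and END_ELEMENT event. */
    static List<Integer> depths(String xml) {
        List<Integer> seen = new ArrayList<>();
        int depth = 0;
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    seen.add(++depth);   // the new element counts immediately
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    seen.add(depth--);   // still counted while END_ELEMENT is current
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

For the input `<a><b/><c>x</c></a>` the root reports depth 1 and each child reports depth 2 at both its start and its end event.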
_textBuilder
protected final TextBuilder _textBuilder
Textual content of the current event
-
_entityPending
protected boolean _entityPending
Flag set to indicate that an entity is pending
-
_nameBuffer
protected char[] _nameBuffer
Similarly, need a char buffer for actual String construction (in future, could perhaps use StringBuilder?). It is used for holding things like names (element, attribute), and attribute values.
-
_tokenName
protected PName _tokenName
Current name associated with the token, if any. Name of the current element, target of processing instruction, or name of an unexpanded entity.
-
_isEmptyTag
protected boolean _isEmptyTag
Flag that is used if the current state is START_ELEMENT or END_ELEMENT, to indicate if the underlying physical tag is a so-called empty tag (one ending with "/>")
-
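One reason a flag like _isEmptyTag is useful: at the event level an empty tag and a start/end tag pair look the same. The sketch below uses the JDK's standard StAX reader (not this scanner) to show that both forms yield identical event sequences; EmptyTagSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EmptyTagSketch {
    /** Collects the event type codes returned by successive next() calls. */
    static List<Integer> events(String xml) {
        List<Integer> types = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                types.add(r.next());
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return types;
    }
}
```

Both `<a/>` and `<a></a>` produce START_ELEMENT, END_ELEMENT, END_DOCUMENT, so only state kept by the scanner itself can tell the two physical forms apart.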
_currElem
protected ElementScope _currElem
Information about the current element on the stack
-
_publicId
protected String _publicId
Public id of the current event (DTD), if any.
-
_systemId
protected String _systemId
System id of the current event (DTD), if any.
-
_lastNsDecl
protected NsDeclaration _lastNsDecl
Pointer to the last namespace declaration encountered. Because of backwards linking, it also serves as the head of the linked list of all active namespace declarations starting from the most recent one.
-
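The backwards-linked list that _lastNsDecl heads can be sketched as below. NsDeclSketch is a hypothetical stand-in for illustration only; the fields and methods of the real NsDeclaration class are not shown on this page.

```java
final class NsDeclSketch {
    final String prefix;
    final String uri;
    final int depth;          // element depth at which the declaration was made
    final NsDeclSketch prev;  // previously active declaration, or null

    NsDeclSketch(String prefix, String uri, int depth, NsDeclSketch prev) {
        this.prefix = prefix;
        this.uri = uri;
        this.depth = depth;
        this.prev = prev;
    }

    /** Walks from the most recent declaration backwards; the first match wins. */
    static String lookup(NsDeclSketch last, String prefix) {
        for (NsDeclSketch d = last; d != null; d = d.prev) {
            if (d.prefix.equals(prefix)) {
                return d.uri;
            }
        }
        return null;
    }

    /** Drops declarations made at or below the element that is being closed. */
    static NsDeclSketch unwind(NsDeclSketch last, int closingDepth) {
        while (last != null && last.depth >= closingDepth) {
            last = last.prev;
        }
        return last;
    }
}
```

Because the newest declaration is the head, an inner redeclaration of a prefix shadows the outer one until the inner element closes and the list is unwound.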
_currNsCount
protected int _currNsCount
This is a temporary state variable, valid during START_ELEMENT event. For those events, it contains the number of namespace declarations available. For END_ELEMENT, this count is computed on the fly.
-
_defaultNs
protected NsBinding _defaultNs
Default namespace binding is a per-document singleton, like explicit bindings, and used for elements (never for attributes).
-
_nsBindings
protected NsBinding[] _nsBindings
Array containing all prefix bindings needed within the current document, so far (if any). These bindings are not in a particular order, and they specifically do NOT represent actual namespace declarations parsed from xml content.
-
_nsBindingCount
protected int _nsBindingCount
-
_nsBindingCache
protected PName[] _nsBindingCache
Although unbound pname instances can be easily and safely reused, bound ones are per-document. However, it makes sense to try to reuse them too; at least using a minimal static cache, activated only after a certain number of cache misses (to avoid overhead for tiny documents, or documents with few or no namespace prefixes).
-
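The lazily activated cache described for _nsBindingCache can be sketched as follows. All names and numeric values here are assumptions for illustration: the real values of BIND_MISSES_TO_ACTIVATE_CACHE, BIND_CACHE_SIZE and BIND_CACHE_MASK are not shown on this page, and the real cache holds PName instances rather than strings.

```java
final class BindCacheSketch {
    // Hypothetical values chosen for the example only
    static final int MISSES_TO_ACTIVATE = 10;
    static final int CACHE_SIZE = 64;             // must be a power of two
    static final int CACHE_MASK = CACHE_SIZE - 1; // so that (hash & mask) is a valid slot

    private String[] cache;   // allocated only once enough misses have been seen
    private int misses;
    int hits;

    /** Returns a cached instance when available, otherwise the given name. */
    String intern(String qualifiedName) {
        int slot = qualifiedName.hashCode() & CACHE_MASK;
        if (cache != null && qualifiedName.equals(cache[slot])) {
            hits++;
            return cache[slot];
        }
        if (cache == null) {
            // Tiny documents never reach the threshold and never pay for the array
            if (++misses < MISSES_TO_ACTIVATE) {
                return qualifiedName;
            }
            cache = new String[CACHE_SIZE];
        }
        cache[slot] = qualifiedName;
        return qualifiedName;
    }

    boolean isActive() {
        return cache != null;
    }
}
```

Masking with a power-of-two size minus one replaces a modulo operation and always yields a non-negative index, even for negative hash codes.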
_nsBindMisses
protected int _nsBindMisses
-
_lastNsContext
protected FixedNsContext _lastNsContext
Last returned NamespaceContext, created for a call to getNonTransientNamespaceContext(), iff this would still be a valid context.
-
_attrCollector
protected final AttributeCollector _attrCollector
-
_attrCount
protected int _attrCount
-
_pastBytesOrChars
protected long _pastBytesOrChars
Number of bytes that were read and processed before the contents of the current buffer; used for calculating absolute offsets.
-
_currRow
protected int _currRow
The row that the next character to read is on. Note that it is 0-based, so the API will generally add one to it before returning the value.
-
_rowStartOffset
protected int _rowStartOffset
Offset used to calculate the column value given current input buffer pointer. May be negative, if the first character of the row was contained within an earlier buffer.
-
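How _pastBytesOrChars, _currRow and _rowStartOffset could combine into a location is sketched below. This is a hypothetical reconstruction from the field descriptions, not the scanner's code; in particular the 1-based column is an assumption.

```java
final class LocationSketch {
    long pastBytesOrChars;  // units consumed in buffers before the current one
    int currRow;            // 0-based row of the next character
    int rowStartOffset;     // buffer index where the current row started; may be negative

    /** Records that a linefeed was consumed; ptrAfterLf is the index just past it. */
    void markNewline(int ptrAfterLf) {
        currRow++;
        rowStartOffset = ptrAfterLf;
    }

    /** Called when a buffer holding oldLength units is exhausted and replaced. */
    void bufferReplaced(int oldLength) {
        pastBytesOrChars += oldLength;
        rowStartOffset -= oldLength;  // the row start now lies before index 0
    }

    long absoluteOffset(int ptr) {
        return pastBytesOrChars + ptr;
    }

    int lineNr() {
        return currRow + 1;           // convert the 0-based row for the API
    }

    int columnNr(int ptr) {
        return ptr - rowStartOffset + 1;  // assumed to be 1-based
    }
}
```

For example, if a row starts at index 5 of a 10-unit buffer and that buffer is then replaced, rowStartOffset becomes -5, and index 2 of the new buffer is column 8 of the same row.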
_startRawOffset
protected long _startRawOffset
Offset (in chars or bytes) at start of current token
-
_startRow
protected long _startRow
Current row at start of current (last returned) token
-
_startColumn
protected long _startColumn
Current column at start of current (last returned) token
-
-
Constructor Details
-
XmlScanner
-
-
Method Details
-
close
Method called at the point when the parsing process has ended (either by encountering end of the input, or via explicit close), and buffers can and should be released.
- Parameters:
forceCloseSource - True if the underlying input source is to be closed, independent of whether auto-close has been set to true via configuration (or if the scanner manages the input source)
- Throws:
XMLStreamException
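The close contract described above (always release buffers; close the source when forced or when auto-close applies) can be sketched as follows. CloseSketch, its autoClose flag and its buffer are hypothetical stand-ins, and an unchecked exception replaces XMLStreamException to keep the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CloseSketch {
    private final Closeable source;
    private final boolean autoClose;       // stand-in for a configuration setting
    private char[] buffer = new char[4000];
    boolean sourceClosed;

    CloseSketch(Closeable source, boolean autoClose) {
        this.source = source;
        this.autoClose = autoClose;
    }

    void close(boolean forceCloseSource) {
        buffer = null;  // buffers are released regardless of the flag
        if ((forceCloseSource || autoClose) && !sourceClosed) {
            try {
                source.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            sourceClosed = true;
        }
    }

    boolean buffersReleased() {
        return buffer == null;
    }
}
```

Calling close(false) on an instance without auto-close leaves the underlying source open for the caller to manage.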
-
_releaseBuffers
protected void _releaseBuffers()
-
_closeSource
- Throws:
IOException
-
getConfig
-
getAttrCollector
-
nextFromProlog
- Throws:
XMLStreamException
-
nextFromTree
- Throws:
XMLStreamException
-
finishToken
This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc)
- Throws:
XMLStreamException
-
skipToken
This method is called to essentially skip the remainder of the current token (data of PI etc)
- Returns:
True if by skipping we also figured out the following event type (and assigned its type to _currToken); false if that remains to be done
- Throws:
XMLStreamException
-
getCurrentLocation
public abstract org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
- Returns:
Current input location
-
getStartLocation
public final org.codehaus.stax2.XMLStreamLocation2 getStartLocation()
-
getStartingByteOffset
public abstract long getStartingByteOffset()
-
getStartingCharOffset
public abstract long getStartingCharOffset()
-
getEndingByteOffset
- Throws:
XMLStreamException
-
getEndingCharOffset
- Throws:
XMLStreamException
-
getEndLocation
- Throws:
XMLStreamException
-
getCurrentLineNr
public final int getCurrentLineNr()
-
getCurrentColumnNr
public abstract int getCurrentColumnNr()
-
getInputSystemId
-
getInputPublicId
-
hasEmptyStack
public final boolean hasEmptyStack()
-
getDepth
public final int getDepth()
-
isEmptyTag
public final boolean isEmptyTag()
-
getName
-
getQName
-
getDTDPublicId
-
getDTDSystemId
-
getText
- Throws:
XMLStreamException
-
getTextLength
- Throws:
XMLStreamException
-
getTextCharacters
- Throws:
XMLStreamException
-
getTextCharacters
public final int getTextCharacters(int srcStart, char[] target, int targetStart, int len) throws XMLStreamException
- Throws:
XMLStreamException
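The standard javax.xml.stream.XMLStreamReader interface declares a method with the same parameter shape, which copies part of the current text into a caller-supplied array and returns the number of characters copied. Assuming this scanner method follows the same contract (an assumption, since its description is not shown here), chunked copying could be sketched with the JDK's own StAX reader as below; TextChunkSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TextChunkSketch {
    /** Reads all character data of a document through a small reusable buffer. */
    static String readText(String xml, int chunkSize) {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[chunkSize];
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.CHARACTERS) {
                    continue;
                }
                int total = r.getTextLength();
                int srcStart = 0;
                while (srcStart < total) {
                    int len = Math.min(chunk.length, total - srcStart);
                    int copied = r.getTextCharacters(srcStart, chunk, 0, len);
                    if (copied <= 0) {
                        break;
                    }
                    out.append(chunk, 0, copied);
                    srcStart += copied;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Reusing one small array avoids allocating a new String or char[] for every text event.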
-
getText
- Throws:
XMLStreamException
-
isTextWhitespace
- Throws:
XMLStreamException
-
decodeElements
public final int decodeElements(org.codehaus.stax2.typed.TypedArrayDecoder tad, boolean reset) throws XMLStreamException
Method called by the stream reader to decode space-separated tokens that are part of the current text event, using given decoder.
- Parameters:
reset - If true, need to tell text buffer to reset its decoding state; if false, shouldn't
- Throws:
XMLStreamException
-
resetForDecoding
public final void resetForDecoding(org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec, boolean firstChunk) throws XMLStreamException
Method called by the stream reader to reset given base64 decoder with data from the current text event.
- Throws:
XMLStreamException
-
fireSaxStartElement
- Throws:
SAXException
-
fireSaxEndElement
- Throws:
SAXException
-
fireSaxCharacterEvents
- Throws:
XMLStreamException
SAXException
-
fireSaxSpaceEvents
- Throws:
XMLStreamException
SAXException
-
fireSaxCommentEvent
- Throws:
XMLStreamException
SAXException
-
fireSaxPIEvent
- Throws:
XMLStreamException
SAXException
-
getAttrCount
public final int getAttrCount()
-
getAttrLocalName
-
getAttrQName
-
getAttrPrefixedName
-
getAttrNsURI
-
getAttrPrefix
-
getAttrValue
-
getAttrValue
-
decodeAttrValue
public final void decodeAttrValue(int index, org.codehaus.stax2.typed.TypedValueDecoder tvd) throws XMLStreamException
- Throws:
XMLStreamException
-
decodeAttrValues
public final int decodeAttrValues(int index, org.codehaus.stax2.typed.TypedArrayDecoder tad) throws XMLStreamException
Method called to decode the attribute value that consists of zero or more space-separated tokens. Decoding is done using the decoder provided.
- Returns:
Number of tokens decoded
- Throws:
XMLStreamException
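The idea of decoding a value made of zero or more space-separated tokens and returning the token count can be sketched in plain Java. TokenDecodeSketch is hypothetical, and a java.util.function.Consumer stands in for the Stax2 TypedArrayDecoder, whose interface is not reproduced here.

```java
import java.util.function.Consumer;

final class TokenDecodeSketch {
    /** Feeds each whitespace-separated token to the decoder; returns the token count. */
    static int decodeTokens(String value, Consumer<String> decoder) {
        int count = 0;
        int i = 0;
        final int len = value.length();
        while (i < len) {
            // Skip leading or separating whitespace
            while (i < len && isXmlSpace(value.charAt(i))) {
                i++;
            }
            if (i >= len) {
                break;
            }
            int start = i;
            while (i < len && !isXmlSpace(value.charAt(i))) {
                i++;
            }
            decoder.accept(value.substring(start, i));
            count++;
        }
        return count;
    }

    /** The four whitespace characters recognized by XML. */
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}
```

A caller could collect integers with a lambda such as `t -> list.add(Integer.parseInt(t))` and use the returned count to size a result array.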
-
decodeAttrBinaryValue
public final byte[] decodeAttrBinaryValue(int index, org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec) throws XMLStreamException
- Throws:
XMLStreamException
-
findAttrIndex
-
getAttrType
-
isAttrSpecified
public final boolean isAttrSpecified(int index)
-
getNsCount
public final int getNsCount()
-
getNamespacePrefix
-
getNamespaceURI
-
findCurrNsDecl
-
getNamespaceURI
-
getNonTransientNamespaceContext
-
getNamespaceURI
- Specified by:
getNamespaceURI in interface NamespaceContext
-
getPrefix
- Specified by:
getPrefix in interface NamespaceContext
-
getPrefixes
- Specified by:
getPrefixes in interface NamespaceContext
-
finishCharacters
- Throws:
XMLStreamException
-
finishCData
- Throws:
XMLStreamException
-
finishComment
- Throws:
XMLStreamException
-
finishDTD
- Throws:
XMLStreamException
-
finishPI
- Throws:
XMLStreamException
-
finishSpace
- Throws:
XMLStreamException
-
skipCharacters
- Returns:
True if an unexpanded entity was encountered (and is now pending)
- Throws:
XMLStreamException
-
skipCData
- Throws:
XMLStreamException
-
skipComment
- Throws:
XMLStreamException
-
skipPI
- Throws:
XMLStreamException
-
skipSpace
- Throws:
XMLStreamException
-
skipCoalescedText
Secondary skip method called after primary text segment has been skipped, and we are in coalescing mode.
- Returns:
True if an unexpanded entity was encountered (and is now pending)
- Throws:
XMLStreamException
-
loadMore
- Throws:
XMLStreamException
-
bindName
-
findOrCreateBinding
Method called when a namespace declaration needs to find the binding object (essentially a per-prefix-per-document canonical container object).
- Throws:
XMLStreamException
-
bindNs
Method called when we are ready to bind a declared namespace.
- Throws:
XMLStreamException
-
checkImmutableBinding
Method called when an immutable ns prefix (xml, xmlns) is encountered.
- Throws:
XMLStreamException
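The kind of check implied for the immutable prefixes can be sketched from the rules in the Namespaces in XML specification: the prefix xml may only be bound to its fixed namespace URI, and the prefix xmlns may not be declared at all. ImmutableBindingSketch is a hypothetical class, and returning a message instead of throwing XMLStreamException is a simplification for the example.

```java
import javax.xml.XMLConstants;

final class ImmutableBindingSketch {
    /** Returns null if the declaration is legal, otherwise a problem description. */
    static String problem(String prefix, String uri) {
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
            // "xml" may be redeclared, but only with its one fixed URI
            if (XMLConstants.XML_NS_URI.equals(uri)) {
                return null;
            }
            return "Prefix 'xml' can only be bound to " + XMLConstants.XML_NS_URI;
        }
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
            return "Prefix 'xmlns' can not be declared";
        }
        return null;  // not one of the immutable prefixes
    }
}
```

The constants come from javax.xml.XMLConstants in the JDK, so the URIs do not have to be spelled out by hand.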
-
loadMoreGuaranteed
Method that tries to load at least one more byte into the buffer; if that fails, it throws an appropriate EOI (end-of-input) exception.
- Throws:
XMLStreamException
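The relationship between loadMore() and loadMoreGuaranteed() can be sketched over a plain Reader. LoadMoreSketch is a hypothetical stand-in: the buffer size is arbitrary, and IllegalStateException replaces the XMLStreamException that the real method declares.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

final class LoadMoreSketch {
    private final Reader in;
    private final char[] buf = new char[8];
    private int ptr;
    private int end;

    LoadMoreSketch(Reader in) {
        this.in = in;
    }

    /** Tries to refill the buffer; returns false at end of input. */
    boolean loadMore() {
        try {
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) {
                return false;
            }
            ptr = 0;
            end = n;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Like loadMore(), but end of input is an error for the caller. */
    void loadMoreGuaranteed() {
        if (!loadMore()) {
            throw new IllegalStateException("Unexpected end of input");
        }
    }

    /** Returns the next character, loading more input when the buffer is drained. */
    char next() {
        if (ptr >= end) {
            loadMoreGuaranteed();
        }
        return buf[ptr++];
    }
}
```

Code that is in the middle of a token can call the guaranteed variant and skip the per-call end-of-input check.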
-
loadMoreGuaranteed
- Throws:
XMLStreamException
-
verifyXmlChar
- Throws:
XMLStreamException
-
reportInputProblem
- Throws:
XMLStreamException
-
reportUnexpandedEntityInAttr
Method called when a call to expand an entity within attribute value fails to expand it.
- Throws:
XMLStreamException
-
reportPrologUnexpElement
- Throws:
XMLStreamException
-
reportPrologUnexpChar
protected void reportPrologUnexpChar(boolean isProlog, int ch, String msg) throws XMLStreamException
- Throws:
XMLStreamException
-
reportPrologProblem
- Throws:
XMLStreamException
-
reportTreeUnexpChar
- Throws:
XMLStreamException
-
reportInvalidNameChar
- Throws:
XMLStreamException
-
reportInvalidXmlChar
- Throws:
XMLStreamException
-
reportEofInName
- Throws:
XMLStreamException
-
reportMissingPISpace
Called when there's an unexpected char after PI target (non-ws, not part of the '?>' end marker)
- Throws:
XMLStreamException
-
reportDoubleHyphenInComments
- Throws:
XMLStreamException
-
reportMultipleColonsInName
- Throws:
XMLStreamException
-
reportEntityOverflow
- Throws:
XMLStreamException
-
reportInvalidNsIndex
protected void reportInvalidNsIndex(int index)
-
reportUnboundPrefix
- Throws:
XMLStreamException
-
reportDuplicateNsDecl
- Throws:
XMLStreamException
-
reportIllegalNsDecl
- Throws:
XMLStreamException
-
reportIllegalNsDecl
- Throws:
XMLStreamException
-
reportUnexpectedEndTag
- Throws:
XMLStreamException
-
reportIllegalCDataEnd
- Throws:
XMLStreamException
-
throwUnexpectedChar
- Throws:
XMLStreamException
-
throwNullChar
- Throws:
XMLStreamException
-
handleInvalidXmlChar
- Throws:
XMLStreamException
-
throwInvalidSpace
- Throws:
XMLStreamException
-