Tokenizes a string based on delimiters (separators), with support for quoting and ignored characters.
This class can split a String into many smaller strings. It aims to do a similar job to {@link java.util.StringTokenizer StringTokenizer}; however, it offers much more control and flexibility, including implementing the ListIterator interface. By default, it is set up like StringTokenizer.
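As a point of reference for that default, the following sketch shows how the JDK's own StringTokenizer behaves when no delimiters are supplied: it splits on space, tab, newline, carriage return, and form feed, and never returns empty tokens. The class name DefaultSplitSketch and its split helper are hypothetical names used only for illustration; they are not part of this class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DefaultSplitSketch {
    // Collects tokens using the JDK StringTokenizer's default
    // delimiter set (" \t\n\r\f"). Hypothetical helper for illustration.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed whitespace between the letters; prints [a, b, c, d]
        System.out.println(split("a b\tc\n d"));
    }
}
```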
The input String is split into a number of tokens. Each token is separated from the next String by a delimiter. One or more delimiter characters must be specified.
Each token may be surrounded by quotes. The quote matcher specifies the quote character(s). A quote may be escaped within a quoted section by duplicating itself.
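The doubled-quote escape rule can be illustrated with a minimal sketch. It is not this class's implementation; QuoteEscapeSketch and unquote are hypothetical names, and the sketch assumes a single quote character rather than a general matcher.

```java
public class QuoteEscapeSketch {
    // Removes quote characters from the input, treating a doubled quote
    // inside a quoted section as one literal quote character.
    // Hypothetical helper: assumes a single quote char, not a matcher.
    static String unquote(String input, char quote) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!inQuotes) {
                if (c == quote) {
                    inQuotes = true;      // opening quote: start quoted section
                } else {
                    out.append(c);        // ordinary unquoted character
                }
            } else if (c != quote) {
                out.append(c);            // ordinary quoted character
            } else if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                out.append(quote);        // doubled quote: emit one literal quote
                i++;                      // skip the second quote of the pair
            } else {
                inQuotes = false;         // closing quote: end quoted section
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints it's
        System.out.println(unquote("'it''s'", '\''));
    }
}
```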
Between each token and its delimiter there may be characters that need trimming. The trimmer matcher specifies these characters. One usage might be to trim whitespace characters.
At any point outside the quotes there may be invalid characters. The ignored matcher specifies these characters, which are removed. One usage might be to remove new line characters.
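The difference between trimmed and ignored characters can be sketched as follows: ignored characters are dropped wherever they occur, while trimmed characters are removed only at the edges of a token. This is a simplified illustration that does not handle quoting. TrimIgnoreSketch and its parameters are hypothetical, and String.trim() stands in for a whitespace trimmer.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimIgnoreSketch {
    // Splits on a single delimiter char, drops every character listed in
    // ignoredChars, and trims whitespace at token edges.
    // Hypothetical helper: no quote handling, for illustration only.
    static List<String> split(String input, char delim, String ignoredChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == delim) {
                tokens.add(current.toString().trim()); // trim edges only
                current.setLength(0);
            } else if (ignoredChars.indexOf(c) < 0) {
                current.append(c);                     // keep non-ignored chars
            }
        }
        tokens.add(current.toString().trim());         // final token
        return tokens;
    }

    public static void main(String[] args) {
        // The newline inside "x\ny" is ignored; prints [xy, z]
        System.out.println(split("x\ny , z", ',', "\n"));
    }
}
```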
Empty tokens may be removed or returned as null.
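The two ways of handling empty tokens can be sketched as below. This is an illustration, not this class's code: EmptyTokenSketch is a hypothetical name, the two boolean parameters mirror the emptyTokenAsNull and ignoreEmptyTokens properties, and the sketch assumes that ignoring empty tokens takes precedence over returning them as null.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenSketch {
    // Splits on a single delimiter char and applies the two empty-token
    // options. Assumption: ignoreEmptyTokens wins over emptyTokenAsNull.
    static List<String> split(String input, char delim,
                              boolean ignoreEmptyTokens, boolean emptyTokenAsNull) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            if (i == input.length() || input.charAt(i) == delim) {
                String token = input.substring(start, i);
                start = i + 1;
                if (token.isEmpty()) {
                    if (ignoreEmptyTokens) {
                        continue;          // drop the empty token entirely
                    }
                    if (emptyTokenAsNull) {
                        token = null;      // report the empty token as null
                    }
                }
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,c", ',', true, false));  // prints [a, c]
        System.out.println(split("a,,c", ',', false, true));  // prints [a, null, c]
    }
}
```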
"a,b,c"         - Three tokens "a","b","c"   (comma delimiter)
" a, b , c "    - Three tokens "a","b","c"   (default CSV processing trims whitespace)
"a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
This tokenizer has the following properties and options:
Property | Type | Default |
delim | CharSetMatcher | { \t\n\r\f} |
quote | NoneMatcher | {} |
ignore | NoneMatcher | {} |
emptyTokenAsNull | boolean | false |
ignoreEmptyTokens | boolean | true |
@since 2.2
@version $Id: StrTokenizer.java 1153241 2011-08-02 18:49:52Z ggregory $