The string tokenizer class allows an application to break a string into tokens by performing code point comparison. The StringTokenizer
methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the codepoints that separate tokens) may be specified either at creation time or on a per-token basis.
An instance of StringTokenizer
behaves in one of three ways, depending on whether it was created with the returnDelims
and coalesceDelims
flags having the value true
or false
:
false
, delimiter code points serve to separate tokens. A token is a maximal sequence of consecutive code points that are not delimiters. true
, delimiter code points are themselves considered to be tokens. In this case, if coalesceDelims is true
, such tokens will be the maximal sequence of consecutive code points that are delimiters. If coalesceDelims is false, a token will be received for each delimiter code point. A token is thus either one delimiter code point, a maximal sequence of consecutive code points that are delimiters, or a maximal sequence of consecutive code points that are not delimiters.
A StringTokenizer object internally maintains a current position within the string to be tokenized. Some operations advance this current position past the code point processed.
A token is returned by taking a substring of the string that was used to create the StringTokenizer object.
Example of the use of the default delimiter tokenizer.
StringTokenizer st = new StringTokenizer("this is a test"); while (st.hasMoreTokens()) { println(st.nextToken()); }
prints the following output:
this is a test
Example of the use of the tokenizer with user specified delimiter.
StringTokenizer st = new StringTokenizer("this is a test with supplementary characters \ud800\ud800\udc00\udc00", " \ud800\udc00"); while (st.hasMoreTokens()) { println(st.nextToken()); }
prints the following output:
@author syn wee @stable ICU 2.4this is a test with supplementary characters \ud800 \udc00
StreamTokenizer
class. The StringTokenizer
methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments. The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
An instance of StringTokenizer
behaves in one of two ways, depending on whether it was created with the returnDelims
flag having the value true
or false
:
false
, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters. true
, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character, or a maximal sequence of consecutive characters that are not delimiters. A StringTokenizer object internally maintains a current position within the string to be tokenized. Some operations advance this current position past the characters processed.
A token is returned by taking a substring of the string that was used to create the StringTokenizer object.
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test"); while (st.hasMoreTokens()) { System.out.println(st.nextToken()); }
prints the following output:
this is a test
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
The following example illustrates how the String.split method can be used to break up a string into its basic tokens:
String[] result = "this is a test".split("\\s"); for (int x=0; x<result.length; x++) System.out.println(result[x]);
prints the following output:
@author unascribed @version 1.35, 11/17/05 @see java.io.StreamTokenizer @since JDK1.0this is a test
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|