ragel -J Scanner.java.rl -o ../java/se/fishtank/css/selectors/scanner/Scanner.java
@author Christer Sandberg
The lexical structure of Dart is ambiguous without knowledge of the context in which a token is being scanned. For example, without context we cannot determine whether source of the form "<<" should be scanned as a single left-shift operator or as two left angle brackets. This scanner does not have any context, so it always resolves such conflicts by scanning the longest possible token. @coverage dart.engine.parser
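The longest-match rule described above can be sketched as follows. This is only an illustration, not the Dart scanner itself: the class name, the `scan` method, and the small operator table are all hypothetical. Candidate operators are tried longest first, so "<<" is always preferred over two separate "<" tokens.

```java
import java.util.ArrayList;
import java.util.List;

class LongestMatchSketch {
    // Hypothetical operator table, ordered so that longer operators are tried first.
    private static final String[] OPERATORS = {"<<=", "<<", "<=", "<"};

    // Splits the source into tokens, always taking the longest operator that fits.
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            if (Character.isWhitespace(source.charAt(pos))) {
                pos++;
                continue;
            }
            String matched = null;
            for (String op : OPERATORS) {
                if (source.startsWith(op, pos)) {
                    matched = op; // first hit is the longest, given the table order
                    break;
                }
            }
            if (matched == null) {
                // Not an operator: take a run of letters/digits, or a single character.
                int end = pos;
                while (end < source.length()
                        && Character.isLetterOrDigit(source.charAt(end))) {
                    end++;
                }
                if (end == pos) {
                    end = pos + 1;
                }
                matched = source.substring(pos, end);
            }
            tokens.add(matched);
            pos += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("a<<b"));
    }
}
```

A context-aware parser that needs two angle brackets would have to split the "<<" token itself, since a scanner built this way never produces them separately.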
This is NOT part of any supported API. If you write code that depends on this, you do so at your own risk. This code and its internal interfaces are subject to change or deletion without notice.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.
For example, this code allows a user to read a number from System.in:
    Scanner sc = new Scanner(System.in);
    int i = sc.nextInt();
As another example, this code allows long types to be assigned from entries in a file myNumbers:
    Scanner sc = new Scanner(new File("myNumbers"));
    while (sc.hasNextLong()) {
        long aLong = sc.nextLong();
    }
The scanner can also use delimiters other than whitespace. This example reads several items in from a string:
    String input = "1 fish 2 fish red fish blue fish";
    Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
    System.out.println(s.nextInt());
    System.out.println(s.nextInt());
    System.out.println(s.next());
    System.out.println(s.next());
    s.close();
prints the following output:
    1
    2
    red
    blue
The same output can be generated with this code, which uses a regular expression to parse all four tokens at once:
    String input = "1 fish 2 fish red fish blue fish";
    Scanner s = new Scanner(input);
    s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
    MatchResult result = s.match();
    for (int i = 1; i <= result.groupCount(); i++)
        System.out.println(result.group(i));
    s.close();
The default whitespace delimiter used by a scanner is as recognized by {@link java.lang.Character}.{@link java.lang.Character#isWhitespace(char) isWhitespace}. The {@link #reset} method will reset the value of the scanner's delimiter to the default whitespace delimiter regardless of whether it was previously changed.
A scanning operation may block waiting for input.
The {@link #next} and {@link #hasNext} methods and their primitive-type companion methods (such as {@link #nextInt} and {@link #hasNextInt}) first skip any input that matches the delimiter pattern, and then attempt to return the next token. Both hasNext and next methods may block waiting for further input. Whether a hasNext method blocks has no connection to whether or not its associated next method will block.
The {@link #findInLine}, {@link #findWithinHorizon}, and {@link #skip} methods operate independently of the delimiter pattern. These methods will attempt to match the specified pattern with no regard to delimiters in the input and thus can be used in special circumstances where delimiters are not relevant. These methods may block waiting for more input.
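As a small sketch of delimiter-independent matching, the following example uses skip and findInLine on input that contains no whitespace delimiters before the number. The class name, method name, and sample input are made up for illustration.

```java
import java.util.Scanner;

class DelimiterIndependentSketch {
    // Pulls the number out of "id=42;name=fish" without relying on delimiters.
    static String extract(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.skip("id=");                          // must match at the current position
            String digits = sc.findInLine("\\d+");   // searches ahead, ignoring delimiters
            sc.skip(";");
            String rest = sc.next();                 // ordinary whitespace-delimited token
            return digits + "|" + rest;
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("id=42;name=fish"));
    }
}
```

Note that skip throws a NoSuchElementException when the pattern does not match at the current position, so real code would usually guard these calls.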
When a scanner throws an {@link InputMismatchException}, the scanner will not pass the token that caused the exception, so that it may be retrieved or skipped via some other method.
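The following sketch illustrates that behaviour: after a failed nextInt call the offending token is still available and can be read with next. The class and method names are hypothetical.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class MismatchRecoverySketch {
    // Tries to read an int; on mismatch, skips the bad token and reads the next int.
    static String recover(String input) {
        try (Scanner sc = new Scanner(input)) {
            try {
                return "int:" + sc.nextInt();
            } catch (InputMismatchException e) {
                String skipped = sc.next();  // the token that caused the exception
                int value = sc.nextInt();    // scanning continues normally afterwards
                return skipped + ":" + value;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(recover("abc 42"));
    }
}
```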
Depending upon the type of delimiting pattern, empty tokens may be returned. For example, the pattern "\\s+" will return no empty tokens since it matches multiple instances of the delimiter. The delimiting pattern "\\s" could return empty tokens since it only passes one space at a time.
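A minimal sketch of the difference, using an input with two consecutive spaces. The helper name `tokens` is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EmptyTokenSketch {
    // Collects every token produced for the given delimiter pattern.
    static List<String> tokens(String input, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(delimiter)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\\s+" swallows both spaces as one delimiter; "\\s" treats them as two,
        // which leaves an empty token between them.
        System.out.println(tokens("a  b", "\\s+"));
        System.out.println(tokens("a  b", "\\s"));
    }
}
```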
A scanner can read text from any object which implements the {@link java.lang.Readable} interface. If an invocation of the underlying readable's {@link java.lang.Readable#read} method throws an {@link java.io.IOException} then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the {@link #ioException} method.
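The sketch below shows one way to observe this. It uses a deliberately failing Readable, written as a lambda purely for illustration; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Scanner;

class FailingReadableSketch {
    // Scans a source whose read method always fails, then reports what the scanner saw.
    static String scanFailingSource() {
        Readable failing = buffer -> {
            throw new IOException("simulated read failure");
        };
        try (Scanner sc = new Scanner(failing)) {
            boolean hasToken = sc.hasNext();      // the failure is treated as end of input
            IOException cause = sc.ioException(); // the swallowed exception is kept here
            return hasToken + ":" + (cause == null ? "none" : cause.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(scanFailingSource());
    }
}
```

Because the exception is not rethrown, code that must distinguish a genuine end of input from a read failure needs to check ioException explicitly.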
When a Scanner is closed, it will close its input source if the source implements the {@link java.io.Closeable} interface.
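This can be checked with a source that records whether it was closed. The TrackingReader class below is a hypothetical helper written for this sketch.

```java
import java.io.StringReader;
import java.util.Scanner;

class CloseSketch {
    // A reader that remembers whether close() has been called on it.
    static class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String s) {
            super(s);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    // Returns true if closing the scanner also closed the underlying reader.
    static boolean sourceClosedWithScanner() {
        TrackingReader reader = new TrackingReader("1 2 3");
        Scanner sc = new Scanner(reader);
        sc.nextInt();
        sc.close();
        return reader.closed;
    }

    public static void main(String[] args) {
        System.out.println(sourceClosedWithScanner());
    }
}
```

One practical consequence is that closing a scanner built on System.in also closes System.in for the rest of the program.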
A Scanner is not safe for multithreaded use without external synchronization.
Unless otherwise mentioned, passing a null parameter into any method of a Scanner will cause a NullPointerException to be thrown.
A scanner will default to interpreting numbers as decimal unless a different radix has been set by using the {@link #useRadix} method. The {@link #reset} method will reset the value of the scanner's radix to 10 regardless of whether it was previously changed.
Localized numbers
An instance of this class is capable of scanning numbers in the standard formats as well as in the formats of the scanner's locale. A scanner's initial locale is the value returned by the {@link java.util.Locale#getDefault} method; it may be changed via the {@link #useLocale} method. The {@link #reset} method will reset the value of the scanner's locale to the initial locale regardless of whether it was previously changed.
The localized formats are defined in terms of the following parameters, which for a particular locale are taken from that locale's {@link java.text.DecimalFormat DecimalFormat} object, df, and its {@link java.text.DecimalFormatSymbols DecimalFormatSymbols} object, dfs.
LocalGroupSeparator: The character used to separate thousands groups, i.e., dfs.{@link java.text.DecimalFormatSymbols#getGroupingSeparator getGroupingSeparator()}
LocalDecimalSeparator: The character used for the decimal point, i.e., dfs.{@link java.text.DecimalFormatSymbols#getDecimalSeparator getDecimalSeparator()}
LocalPositivePrefix: The string that appears before a positive number (may be empty), i.e., df.{@link java.text.DecimalFormat#getPositivePrefix getPositivePrefix()}
LocalPositiveSuffix: The string that appears after a positive number (may be empty), i.e., df.{@link java.text.DecimalFormat#getPositiveSuffix getPositiveSuffix()}
LocalNegativePrefix: The string that appears before a negative number (may be empty), i.e., df.{@link java.text.DecimalFormat#getNegativePrefix getNegativePrefix()}
LocalNegativeSuffix: The string that appears after a negative number (may be empty), i.e., df.{@link java.text.DecimalFormat#getNegativeSuffix getNegativeSuffix()}
LocalNaN: The string that represents not-a-number for floating-point values, i.e., dfs.{@link java.text.DecimalFormatSymbols#getNaN getNaN()}
LocalInfinity: The string that represents infinity for floating-point values, i.e., dfs.{@link java.text.DecimalFormatSymbols#getInfinity getInfinity()}
Number syntax
The strings that can be parsed as numbers by an instance of this class are specified in terms of the following regular-expression grammar, where Rmax is the highest digit in the radix being used (for example, Rmax is 9 in base 10).
NonASCIIDigit ::= A non-ASCII character c for which {@link java.lang.Character#isDigit Character.isDigit}(c) returns true
Non0Digit ::= [1-Rmax] | NonASCIIDigit
Digit ::= [0-Rmax] | NonASCIIDigit
GroupedNumeral ::= ( Non0Digit Digit? Digit? ( LocalGroupSeparator Digit Digit Digit )+ )
Numeral ::= ( ( Digit+ ) | GroupedNumeral )
Integer ::= ( [-+]? ( Numeral ) )
    | LocalPositivePrefix Numeral LocalPositiveSuffix
    | LocalNegativePrefix Numeral LocalNegativeSuffix
DecimalNumeral ::= Numeral
    | Numeral LocalDecimalSeparator Digit*
    | LocalDecimalSeparator Digit+
Exponent ::= ( [eE] [+-]? Digit+ )
Decimal ::= ( [-+]? DecimalNumeral Exponent? )
    | LocalPositivePrefix DecimalNumeral LocalPositiveSuffix Exponent?
    | LocalNegativePrefix DecimalNumeral LocalNegativeSuffix Exponent?
HexFloat ::= [-+]? 0[xX][0-9a-fA-F]*\.[0-9a-fA-F]+ ([pP][-+]?[0-9]+)?
NonNumber ::= NaN | LocalNaN | Infinity | LocalInfinity
SignedNonNumber ::= ( [-+]? NonNumber )
    | LocalPositivePrefix NonNumber LocalPositiveSuffix
    | LocalNegativePrefix NonNumber LocalNegativeSuffix
Float ::= Decimal
    | HexFloat
    | SignedNonNumber
Whitespace is not significant in the above regular expressions.
@version 1.27, 06/28/06
@since 1.5
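The radix and locale rules described for this Scanner can be tried out with a short sketch. The class and method names are hypothetical, and the German-formatted input assumes that Locale.GERMANY uses "." as the grouping separator and "," as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;

class LocaleRadixSketch {
    // Parses a number written with German grouping and decimal separators.
    static double parseGerman(String text) {
        try (Scanner sc = new Scanner(text).useLocale(Locale.GERMANY)) {
            return sc.nextDouble();
        }
    }

    // Reads one hexadecimal int, then resets the scanner and reports its radix.
    static int[] hexThenReset(String text) {
        try (Scanner sc = new Scanner(text).useRadix(16)) {
            int value = sc.nextInt();
            sc.reset(); // restores delimiter, locale, and radix to their defaults
            return new int[] {value, sc.radix()};
        }
    }

    public static void main(String[] args) {
        System.out.println(parseGerman("1.234,5"));
        int[] result = hexThenReset("ff");
        System.out.println(result[0] + " radix=" + result[1]);
    }
}
```

Setting the locale explicitly, as here, avoids surprises when the same input is scanned on machines with different default locales.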
lr_parser.scan(). Integration of scanners implementing Scanner is facilitated.
@version last updated 23-Jul-1999
@author David MacMahon
The optionalFileName parameter passed to many constructors should point
These criteria consist of a set of include and exclude patterns. With these patterns, you can select which files you want to have included, and which files you want to have excluded.
The idea is simple. A given directory is recursively scanned for all files and directories. Each file/directory is matched against a set of include and exclude patterns. Only files/directories that match at least one pattern of the include pattern list, and don't match a pattern of the exclude pattern list will be placed in the list of files/directories found.
When no list of include patterns is supplied, "**" will be used, which means that everything will be matched. When no list of exclude patterns is supplied, an empty list is used, such that nothing will be excluded.
The pattern matching is done as follows: The name to be matched is split up in path segments. A path segment is the name of a directory or file, which is bounded by File.separator ('/' under UNIX, '\' under Windows). E.g. "abc/def/ghi/xyz.java" is split up in the segments "abc", "def", "ghi" and "xyz.java". The same is done for the pattern against which should be matched.
Then the segments of the name and the pattern will be matched against each other. When '**' is used for a path segment in the pattern, then it matches zero or more path segments of the name.
There is a special case regarding the use of File.separators at the beginning of the pattern and the string to match:
When a pattern starts with a File.separator, the string to match must also start with a File.separator. When a pattern does not start with a File.separator, the string to match must not start with a File.separator. When one of these rules is not obeyed, the string will not match.
When a name path segment is matched against a pattern path segment, the following special characters can be used: '*' matches zero or more characters, '?' matches one character.
Examples:
"**\*.class" matches all .class files/dirs in a directory tree.
"test\a??.java" matches all files/dirs which start with an 'a', then two more characters and then ".java", in a directory called test.
"**" matches everything in a directory tree.
"**\test\**\XYZ*" matches all files/dirs that start with "XYZ" and where there is a parent directory called test (e.g. "abc\test\def\ghi\XYZ123").
Example of usage:
    String[] includes = {"**\\*.class"};
    String[] excludes = {"modules\\*\\**"};
    ds.setIncludes(includes);
    ds.setExcludes(excludes);
    ds.setBasedir(new File("test"));
    ds.scan();
    System.out.println("FILES:");
    String[] files = ds.getIncludedFiles();
    for (int i = 0; i < files.length; i++) {
        System.out.println(files[i]);
    }
This will scan a directory called test for .class files, but excludes all .class files in all directories under a directory called "modules".
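The segment-by-segment matching rules described above can be sketched in a few lines. This is a simplified illustration, not the scanner's actual implementation: the class and method names are hypothetical, and '/' is assumed as the separator in place of File.separator.

```java
import java.util.regex.Pattern;

class PathPatternSketch {
    // Matches a single path segment, where '*' is any run of characters and '?' is one.
    static boolean matchSegment(String pattern, String name) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Splits a path into segments, ignoring one leading separator.
    static String[] segments(String s) {
        String trimmed = s.startsWith("/") ? s.substring(1) : s;
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    // Matches a whole path against a pattern, with "**" spanning zero or more segments.
    static boolean matchPath(String pattern, String path) {
        // Both must start with a separator, or neither may.
        if (pattern.startsWith("/") != path.startsWith("/")) {
            return false;
        }
        return match(segments(pattern), 0, segments(path), 0);
    }

    private static boolean match(String[] p, int i, String[] n, int j) {
        if (i == p.length) {
            return j == n.length;
        }
        if (p[i].equals("**")) {
            // Try letting "**" absorb 0, 1, 2, ... of the remaining name segments.
            for (int k = j; k <= n.length; k++) {
                if (match(p, i + 1, n, k)) {
                    return true;
                }
            }
            return false;
        }
        return j < n.length && matchSegment(p[i], n[j]) && match(p, i + 1, n, j + 1);
    }

    public static void main(String[] args) {
        System.out.println(matchPath("**/test/**/XYZ*", "abc/test/def/ghi/XYZ123"));
    }
}
```

The recursive handling of "**" is chosen for brevity; a scanner that evaluates many patterns over a large tree would likely want an iterative match.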
This class is not synchronized as it's expected to be used from a single thread at a time. It's rarely (if ever?) useful to scan concurrently from a shared scanner using multiple threads. If you want to optimize large table scans using extra parallelism, create a few scanners and give each of them a partition of the table to scan. Or use MapReduce.
Unlike HBase's traditional client, there's no method in this class to explicitly open the scanner. It will open itself automatically when you start scanning by calling {@link #nextRows()}. Also, the scanner will automatically call {@link #close} when it reaches the end key. If, however, you would like to stop scanning before reaching the end key, you must call {@link #close} before disposing of the scanner. Note that it's always safe to call {@link #close} on a scanner.
If you keep your scanner open and idle for too long, the RegionServer will close the scanner automatically for you after a timeout configured on the server side. When this happens, you'll get an {@link UnknownScannerException} when you attempt to use the scanner again. Also, if you scan too slowly (e.g. you take a long time between each call to {@link #nextRows()}), you may prevent HBase from splitting the region if the region is also actively being written to while you scan. For heavy processing you should consider using MapReduce.
A {@code Scanner} is not re-usable. Should you want to scan the same rows or the same table again, you must create a new one.
The lightweight calls do not create any objects or use a large amount of storage, and they provide a quick way to determine which headers exist within a packet. The lightweight methods are invoked with a call to {@link #quickScan()}, which relies on the {@link Scandec} interface to do the work in a lightweight, possibly non-Java way. Scandecs can be implemented using native or BPF byte code libraries, which do not rely on creating and accessing Java objects.
The heavyweight calls produce header objects and, if fully decoded, sub-header and field objects as well. This is a much heavier way of accessing packet content, but it is required by the general jNetStream public API. For example, invoking {@link Packet#format()} forces the entire packet contents to be fully decoded so that information about each piece of the packet can be nicely formatted and displayed to the user.
@author Mark Bednarczyk
@author Sly Technologies, Inc.