io.netty.util.internal.chmv8.ForkJoinWorkerThread
Queues supporting work-stealing as well as external task submission. See above for main rationale and algorithms. Implementation relies heavily on "Unsafe" intrinsics and selective use of "volatile":

Field "base" is the index (mod array.length) of the least valid queue slot, which is always the next position to steal (poll) from if nonempty. Reads and writes require volatile orderings but not CAS, because updates are only performed after slot CASes.

Field "top" is the index (mod array.length) of the next queue slot to push to or pop from. It is written only by the owner thread for push, or under lock for external/shared push, and is accessed by other threads only after reading (volatile) base. Both top and base are allowed to wrap around on overflow, but (top - base) (or more commonly -(base - top), which forces a volatile read of base before top) still estimates size.

The lock ("qlock") is forced to -1 on termination, causing all further lock attempts to fail. (Note: CAS is not needed for the termination state, because upon pool shutdown all shared queues stop being used anyway.) Nearly all lock bodies are set up so that exceptions within them are "impossible" (modulo JVM errors that would cause failure anyway).

The array slots are read and written using the emulation of volatiles/atomics provided by Unsafe. Insertions must in general use putOrderedObject as a form of releasing store, to ensure that all writes to the task object are ordered before its publication in the queue. All removals entail a CAS to null. The array is always a power of two. To ensure safety of Unsafe array operations, all accesses perform explicit null checks and implicit bounds checks via power-of-two masking.

In addition to basic queuing support, this class contains fields described elsewhere to control execution. It turns out to work better memory-layout-wise to include them in this class rather than in a separate class.

Performance on most platforms is very sensitive to the placement of instances of both WorkQueues and their arrays -- we absolutely do not want multiple WorkQueue instances or multiple queue arrays sharing cache lines. (It would be best for queue objects and their arrays to share a cache line, but there is nothing available to help arrange that.) Unfortunately, because they are recorded in a common array, WorkQueue instances are often moved to be adjacent by garbage collectors. To reduce the impact, we use field padding that works OK on common platforms; this effectively trades slightly slower average field access for avoiding really bad worst-case access. (Until better JVM support is in place, this padding depends on transient properties of JVM field layout rules.) We also take care in allocating, sizing, and resizing the array. Non-shared queue arrays are initialized by workers before use. Others are allocated on first use.
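
To make the base/top protocol above concrete, here is a minimal sketch of push, poll, and size estimation. It is not the actual WorkQueue implementation: AtomicReferenceArray's lazySet and compareAndSet stand in for the Unsafe putOrderedObject and slot-CAS intrinsics, the class name is illustrative, top is marked volatile here for simplicity (the real code keeps it plain and relies on the prior volatile read of base for ordering), and array growth/resizing is elided.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Illustrative sketch only; names other than base, top, and array are made up.
    final class SketchWorkQueue<T> {
        volatile int base;                  // next slot to steal (poll) from
        volatile int top;                   // next slot to push to (owner writes only)
        final AtomicReferenceArray<T> array;
        final int mask;                     // array length - 1; length is a power of two

        SketchWorkQueue(int capacity) {     // capacity must be a power of two
            array = new AtomicReferenceArray<T>(capacity);
            mask = capacity - 1;
        }

        // Owner thread only. lazySet acts as a releasing store, so all writes to
        // the task object are ordered before its publication in the queue.
        void push(T task) {
            array.lazySet(top & mask, task);
            top = top + 1;                  // indices wrap on overflow; masking keeps them valid
        }

        // Any thread. Claims the base slot via CAS to null, then advances base.
        T poll() {
            int b = base;                   // volatile read of base before top
            if (top - b <= 0)
                return null;                // empty (or transiently appears so)
            T t = array.get(b & mask);
            if (t != null && array.compareAndSet(b & mask, t, null)) {
                base = b + 1;               // no CAS needed: the slot CAS already won
                return t;
            }
            return null;                    // lost a race with another poller
        }

        // Computing -(base - top) forces the volatile read of base before top.
        int queueSize() {
            int n = base - top;
            return (n >= 0) ? 0 : -n;       // a non-negative difference means empty
        }
    }

A caller that gets null back from poll cannot distinguish "empty" from "lost a race", so in practice it retries or moves on to scan another queue.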
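
The qlock convention for shared-queue locking and termination can be sketched the same way with an AtomicInteger; the class and method names here are illustrative.

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the qlock states: 0 = unlocked, 1 = locked, -1 = terminated.
    final class QLockSketch {
        private final AtomicInteger qlock = new AtomicInteger();

        boolean tryLock() {
            return qlock.compareAndSet(0, 1);   // fails permanently once qlock == -1
        }

        void unlock() {
            qlock.set(0);
        }

        void terminate() {
            qlock.set(-1);   // a plain volatile write suffices: after pool shutdown,
        }                    // shared queues stop being used, so no CAS race matters
    }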
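
Finally, the field padding mentioned above might look roughly like the following; the pad field names and counts are illustrative, and, as noted, the effectiveness of manual padding depends on transient JVM field-layout rules. On JDK 8, the @sun.misc.Contended annotation (enabled for non-JDK classes via -XX:-RestrictContended) is a JVM-supported alternative.

    // Illustrative manual padding: unused longs surround the hot index fields so
    // that adjacent instances are unlikely to share a cache line after the GC
    // moves them next to each other.
    final class PaddedIndexes {
        volatile long pad00, pad01, pad02, pad03, pad04, pad05, pad06, pad07;
        volatile int base;   // hot: read and written by stealing threads
        int top;             // hot: written by the owner thread
        volatile long pad10, pad11, pad12, pad13, pad14, pad15, pad16, pad17;
    }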