Cache-friendly implementation of FieldMatrix using flat arrays to store square blocks of the matrix.
This implementation is specifically designed to be cache-friendly. Square blocks are stored as small arrays and allow efficient traversal of data in both row major and column major directions, one block at a time. This greatly improves performance for algorithms that loop across both directions, such as multiplication or transposition.
The size of square blocks is a static parameter. It may be tuned according to the cache size of the target processor. As a rule of thumb, it should be the largest value that allows three blocks to be cached simultaneously (this is necessary, for example, for matrix multiplication). The default is to use 36x36 blocks.
The regular blocks represent {@link #BLOCK_SIZE} x {@link #BLOCK_SIZE} squares. Blocks at the right hand side and bottom side may be smaller to fit the matrix dimensions. The blocks are flattened in row major order into one-dimensional arrays, which are therefore {@link #BLOCK_SIZE}<sup>2</sup> elements long for regular blocks. The blocks are themselves organized in row major order.
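The mapping from a matrix entry to its storage location can be sketched as follows. This is a minimal illustration of the layout described above, not the code of this class: the class name BlockIndexSketch and its helper methods are hypothetical, and the block size is assumed to be the default of 36.

```java
public final class BlockIndexSketch {

    /** Block size assumed for this sketch (the default mentioned above). */
    public static final int BLOCK_SIZE = 36;

    /** Number of blocks needed to cover n rows or n columns. */
    public static int blockCount(int n) {
        return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    /** Width (number of columns) of the blocks in block column jBlock. */
    public static int blockWidth(int columns, int jBlock) {
        return Math.min(BLOCK_SIZE, columns - jBlock * BLOCK_SIZE);
    }

    /** Index of the block holding entry (row, column); blocks are in row major order. */
    public static int blockIndex(int columns, int row, int column) {
        return (row / BLOCK_SIZE) * blockCount(columns) + column / BLOCK_SIZE;
    }

    /** Offset of entry (row, column) inside its flattened, row major block. */
    public static int offsetInBlock(int columns, int row, int column) {
        int jBlock = column / BLOCK_SIZE;
        return (row % BLOCK_SIZE) * blockWidth(columns, jBlock) + column % BLOCK_SIZE;
    }

    public static void main(String[] args) {
        int columns = 100; // matrix with 100 columns, so 3 block columns
        System.out.println(blockIndex(columns, 40, 80));    // prints 5
        System.out.println(offsetInBlock(columns, 40, 80)); // prints 120
    }
}
```

Only the column count is needed for the lookup, because the width of a block depends on its block column alone; the row count only determines how many block rows exist.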
As an example, for a block size of 36x36, a 60x100 matrix (60 rows, 100 columns) would be stored in 6 blocks. Block 0 would be a Field[1296] array holding the upper left 36x36 square, block 1 would be a Field[1296] array holding the upper center 36x36 square, block 2 would be a Field[1008] array holding the upper right 36x28 rectangle, block 3 would be a Field[864] array holding the lower left 24x36 rectangle, block 4 would be a Field[864] array holding the lower center 24x36 rectangle and block 5 would be a Field[672] array holding the lower right 24x28 rectangle.
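The block shapes in this example can be recomputed with a short sketch, taking the matrix to have 60 rows and 100 columns, which is the shape that matches the block dimensions listed above. The class name BlockLayoutSketch and the method blockShapes are hypothetical names introduced for illustration only, and the block size is again assumed to be 36.

```java
public final class BlockLayoutSketch {

    /** Block size assumed for this sketch (the default mentioned above). */
    public static final int BLOCK_SIZE = 36;

    /** Returns {height, width} of each block, listed in row major block order. */
    public static int[][] blockShapes(int rows, int columns) {
        int blockRows = (rows + BLOCK_SIZE - 1) / BLOCK_SIZE;
        int blockColumns = (columns + BLOCK_SIZE - 1) / BLOCK_SIZE;
        int[][] shapes = new int[blockRows * blockColumns][];
        int index = 0;
        for (int iBlock = 0; iBlock < blockRows; iBlock++) {
            // Bottom blocks may be shorter than BLOCK_SIZE.
            int height = Math.min(BLOCK_SIZE, rows - iBlock * BLOCK_SIZE);
            for (int jBlock = 0; jBlock < blockColumns; jBlock++) {
                // Right hand side blocks may be narrower than BLOCK_SIZE.
                int width = Math.min(BLOCK_SIZE, columns - jBlock * BLOCK_SIZE);
                shapes[index++] = new int[] {height, width};
            }
        }
        return shapes;
    }

    public static void main(String[] args) {
        int[][] shapes = blockShapes(60, 100);
        for (int i = 0; i < shapes.length; i++) {
            int height = shapes[i][0];
            int width = shapes[i][1];
            System.out.println("block " + i + ": " + height + "x" + width
                               + ", " + (height * width) + " elements");
        }
    }
}
```

Running it lists six blocks with 1296, 1296, 1008, 864, 864 and 672 elements, in that order.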
The overhead of the layout complexity compared with a simple mapping of matrices to Java arrays is negligible for small matrices (about 1%). The gain in cache efficiency yields up to 3-fold improvements for matrices of moderate to large size.
@param <T> the type of the field elements
@since 2.0