Eigen
3.4.90 (git rev 9589cc4e7fd8e4538bedef80dd36c7738977a8be)
Classes
class | gemm |
class | transB |
class | trsm |
This namespace contains classes that generate the compile-time unrolls used throughout the trsm/gemm kernels. Each unroll corresponds to a for-loop (1-D), a nested for-loop (2-D), or a triple-nested for-loop (3-D). Unrolls are generated using template recursion.
For example, the 2-D for-loop is unrolled recursively by first flattening it to a 1-D loop:
    for(startI = 0; startI < endI; startI++)
      for(startJ = 0; startJ < endJ; startJ++)
        func(startI,startJ)

    ---->

    for(startC = 0; startC < endI*endJ; startC++)
      startI = (startC)/(endJ)
      startJ = (startC)%(endJ)
      func(...)
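The equivalence of the two loop forms can be checked with a small runtime sketch. The function names `visit_nested` and `visit_flattened` are hypothetical and are not part of Eigen; they only record the order in which (startI, startJ) pairs would be passed to func.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Eigen): visit (startI, startJ) with a nested loop.
std::vector<std::pair<int, int>> visit_nested(int endI, int endJ) {
  std::vector<std::pair<int, int>> order;
  for (int startI = 0; startI < endI; startI++)
    for (int startJ = 0; startJ < endJ; startJ++)
      order.emplace_back(startI, startJ);
  return order;
}

// Hypothetical helper (not part of Eigen): same traversal driven by one flat counter.
std::vector<std::pair<int, int>> visit_flattened(int endI, int endJ) {
  std::vector<std::pair<int, int>> order;
  for (int startC = 0; startC < endI * endJ; startC++) {
    int startI = startC / endJ;  // row index recovered by integer division
    int startJ = startC % endJ;  // column index recovered by the remainder
    order.emplace_back(startI, startJ);
  }
  return order;
}
```

Both functions should produce the same sequence of index pairs for any positive endI and endJ, which is what allows a single counter to drive the unroll.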
The 1-D loop can be unrolled recursively by using enable_if and defining an auxiliary function with a template parameter used as a counter.
    template <endI, endJ, counter>
    std::enable_if_t<(counter <= 0)>    <-- tail case
    aux_func {}

    template <endI, endJ, counter>
    std::enable_if_t<(counter > 0)>     <-- actual for-loop
    aux_func {
      startC = endI*endJ - counter
      startI = (startC)/(endJ)
      startJ = (startC)%(endJ)
      func(startI, startJ)
      aux_func<endI, endJ, counter-1>()
    }
Note: Additional wrapper functions are provided for aux_func. They hide the counter template parameter, since counter usually depends on endI, endJ, etc.
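As a minimal compilable sketch of this pattern, the following shows the two enable_if overloads plus a wrapper that supplies the initial counter. The names `unroll2d_sketch`, `run`, and `record_visits` are hypothetical and chosen for illustration only; the actual Eigen classes have different signatures and operate on packets rather than a generic functor.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical illustration of the recursion scheme; not an Eigen class.
struct unroll2d_sketch {
  // Tail case: counter has reached zero, so the recursion stops.
  template <int endI, int endJ, int counter, typename Func>
  static inline std::enable_if_t<(counter <= 0)> aux_func(Func&) {}

  // Loop body: one instantiation per (startI, startJ) pair.
  template <int endI, int endJ, int counter, typename Func>
  static inline std::enable_if_t<(counter > 0)> aux_func(Func& func) {
    constexpr int startC = endI * endJ - counter;  // flat index, counts up from 0
    constexpr int startI = startC / endJ;
    constexpr int startJ = startC % endJ;
    func(startI, startJ);
    aux_func<endI, endJ, counter - 1>(func);
  }

  // Wrapper: hides the counter, which is always endI * endJ at the start.
  template <int endI, int endJ, typename Func>
  static inline void run(Func& func) {
    aux_func<endI, endJ, endI * endJ>(func);
  }
};

// Hypothetical functor that records each visit as startI * 10 + startJ.
struct record_visits {
  std::vector<int> seen;
  void operator()(int i, int j) { seen.push_back(i * 10 + j); }
};
```

Calling `unroll2d_sketch::run<2, 3>(r)` on a `record_visits r` instantiates six loop bodies and one tail case, so the caller never has to spell out the counter.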
Conventions:
1) endX: specifies the terminal value of the for-loop (e.g., for(startX = 0; startX < endX; startX++)).
2) rem, remM, remK: template parameters that decide whether masked operations are used to handle remaining tails (when sizes are not multiples of PacketSize or EIGEN_AVX_MAX_NUM_ROW).
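The role of a compile-time rem flag can be illustrated with a scalar stand-in. This is only a sketch under stated assumptions: `kPacketSize`, `copy_packet_sketch`, `copy_row_sketch`, and the runtime `validLanes` argument are hypothetical names, and the plain loops merely emulate what full-width and masked vector operations would do in the real kernels.

```cpp
#include <cassert>
#include <type_traits>

// Assumed lane count, for illustration only; not Eigen's PacketSize.
constexpr int kPacketSize = 8;

// rem == false: the whole packet is in range, so every lane is copied.
template <bool rem>
inline std::enable_if_t<!rem> copy_packet_sketch(const float* src, float* dst,
                                                 int /*validLanes*/ = 0) {
  for (int lane = 0; lane < kPacketSize; lane++) dst[lane] = src[lane];
}

// rem == true: only the first validLanes lanes are copied; the rest of dst
// is left untouched, emulating a masked store.
template <bool rem>
inline std::enable_if_t<rem> copy_packet_sketch(const float* src, float* dst,
                                                int validLanes) {
  for (int lane = 0; lane < validLanes; lane++) dst[lane] = src[lane];
}

// Hypothetical driver: full packets first, then one masked tail if needed.
inline void copy_row_sketch(const float* src, float* dst, int n) {
  int i = 0;
  for (; i + kPacketSize <= n; i += kPacketSize)
    copy_packet_sketch<false>(src + i, dst + i);
  if (i < n) copy_packet_sketch<true>(src + i, dst + i, n - i);
}
```

Making rem a template parameter means the unmasked path carries no tail-handling code at all; the masked variant is only instantiated where a tail can occur.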