Eigen2 benchmark Intel

Out of curiosity, I ran the BTL tests with Eigen2 built by 4 different compilers on an Intel Pentium D CPU:

  • GCC 4.3.3: -O3 -march=native -DNDEBUG
  • GCC 4.1.3: -O3 -march=nocona -msse2 -msse3 -DNDEBUG
  • GCC 4.4.0: -O3 -march=native -DNDEBUG
  • Intel(R) C++ 11.0: -O3 -DNDEBUG -no-ipo -xHOST -ip -static -no-prec-div

Although in my experience the -ipo option (interprocedural optimization) provides good performance benefits, it was explicitly disabled for Intel, because builds with it enabled failed to work (numerically).


Rookie conclusions:

  1. The benefit of using newer GCC versions is pretty clear.
  2. In most cases gcc 4.4 is comparable with gcc 4.3, but in some it’s almost 2 times faster. (In my experience gcc 4.2 performs as well as 4.4, and gcc 4.3 is known to miss an optimization in some matrix-scalar products: the copy of the scalar into a four-scalar register is not hoisted out of the inner loop.)
  3. Except for the (anomalous) LU decomposition, gcc 4.1 is nowhere near the newer versions of gcc. This is in part because Eigen automatically disables vectorization for gcc < 4.2, but even setting that aside the difference is still huge as soon as complex expressions are involved.
  4. Intel C++ does not provide any performance benefit here. This is somewhat surprising, as I was expecting at least some advantage on this CPU. It could be due to the disabled IPO: in my experience with Intel Fortran, -ipo gives about a 10-15% speedup, though that may be totally unrelated to C++.
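
To make the missed optimization from conclusion 2 more concrete, here is a minimal sketch of a scalar-times-vector loop written both ways. It is not Eigen code: Packet4, broadcast, load, store, mul and the two scale_* functions are hypothetical names made up for this illustration, with Packet4 standing in for a four-float SIMD register, and n is assumed to be a multiple of 4.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a register holding four floats (e.g. an SSE packet).
struct Packet4 {
    float v[4];
};

// Copy one scalar into all four lanes of a packet.
inline Packet4 broadcast(float s) {
    Packet4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = s;
    return r;
}

// Read four consecutive floats into a packet.
inline Packet4 load(const float* p) {
    Packet4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
    return r;
}

// Write the four lanes of a packet back to memory.
inline void store(float* p, const Packet4& a) {
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
}

// Lane-wise product of two packets.
inline Packet4 mul(const Packet4& a, const Packet4& b) {
    Packet4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// The scalar is copied into a packet on every iteration, even though
// the value never changes inside the loop.
void scale_unhoisted(float alpha, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        Packet4 a = broadcast(alpha);  // loop-invariant work repeated each pass
        store(y + i, mul(a, load(x + i)));
    }
}

// The same computation with the copy done once, before the loop.
void scale_hoisted(float alpha, const float* x, float* y, std::size_t n) {
    const Packet4 a = broadcast(alpha);  // done once
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        store(y + i, mul(a, load(x + i)));
    }
}
```

Both functions compute the same result; the only difference is where the scalar gets copied into the packet. An optimizer is normally expected to turn the first form into the second by itself, and the remark above is that gcc 4.3 fails to do so in some matrix-scalar products.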

Axpy_compare_intel.png

Axpby_compare_intel.png

Atv_compare_intel.png

Matrix_vector_compare_intel.png

Matrix_matrix_compare_intel.png

Symv_compare_intel.png

Syr2_compare_intel.png

Aat_compare_intel.png

Ata_compare_intel.png

Trisolve_compare_intel.png

Cholesky_compare_intel.png

Hessenberg_compare_intel.png

Tridiagonalization_compare_intel.png

Lu_decomp_compare_intel.png