Eigen2 benchmark Intel

Out of curiosity, I have performed BTL tests with Eigen2 compiled with 4 different compilers on Intel Pentium D CPU:

GCC 4.3.3: -O3 -march=native -DNDEBUG
GCC 4.1.3: -O3 -march=nocona -msse2 -msse3 -DNDEBUG
GCC 4.4.0: -O3 -march=native -DNDEBUG
Intel(R) C++ 11.0: -O3 -DNDEBUG -no-ipo -xHOST -ip -static -no-prec-div

Although from on my experience the -ipo option (interprocedural optimization) provides good performance benefits, it was explicitly disabled for Intel, because it failed to work (numerically).

Rookie conclusions:

The benefit of using newer GCC versions is pretty clear.
In most cases gcc 4.4 is comparable with gcc 4.3, but in some it’s almost 2 times faster. (For my experience gcc 4.2 performs as well as 4.4, and gcc 4.3 is known to miss an optimization in some matrix-scalar products: the copy of the scalar to a four scalar register is not removed out of the inner loop)
Except (anomalous) LU decomposition, gcc 4.1 is nowhere near newer versions of gcc: this is in part because Eigen automatically disable vectorization for gcc < 4.2, but the difference is still huge without that as soon as complex expressions are involved.
Intel C++ does not provide any performance benefits here. This is somewhat surprising as I was expecting at least some advantage on this CPU. That could be due to disabled IPO, though. However, speaking from experience I had with Intel Fortran, -ipo would give about 10-15% speedup. But this can be totally unrelated to C++.