Comment by Tvisitor

A few observations:

So for the mat*mat operation we would get n*n*n and not the 2*n*n*n used in the source code of the benchmark:

 double nb_op_base( void ){
   return 2.0*_size*_size*_size;
 }
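
For reference, here is a minimal sketch (not the benchmark source) of the textbook triple-loop product. Its inner statement performs one multiplication and one addition per iteration, i.e. two flops executed n*n*n times, which is where the benchmark’s factor of 2.0 comes from; whether both should be counted is exactly the point in question.

 // Minimal sketch, not the benchmark source: a textbook n x n product.
 void matmul( int n, const double* A, const double* B, double* C ){
   for( int i = 0; i < n; ++i )
     for( int j = 0; j < n; ++j ){
       double s = 0.0;
       for( int k = 0; k < n; ++k )
         s += A[i*n + k] * B[k*n + j];   // one multiplication + one addition
       C[i*n + j] = s;
     }
 }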

For the mat*mat benchmark:
 n=2000: ATLAS 5250 mflops, Eigen 4840 mflops, non-optimised BLAS 860 mflops

For the LU decomposition (linear solve):
 n=2000: ATLAS threaded 12000 mflops, ATLAS 8870 mflops, Eigen 840 mflops, non-optimised BLAS 1960 mflops
 n=5000: ATLAS threaded 21100 mflops, ATLAS 9700 mflops, Eigen 800 mflops

So I’m very disappointed with the LU decomposition performance, but otherwise it’s a brilliant package!

Comment by Bjacob

First of all, I don’t check wiki updates very often so it’s only by chance that I found this! Please use the mailing list or forum.

Are you talking about Eigen 2 or 3 (the development branch)? This benchmark refers to an old state of the development branch, so it’s somewhere halfway between Eigen 2 and 3.

To answer your points:

Please take this discussion to the mailing list if you want to continue it.

Comment by Tvisitor

Thanks, you are right. I used dgesv(), which does partial pivoting, together with version 2 of Eigen and A.lu().solve(), which probably did full pivoting, so my mflops calculation for Eigen was indeed incorrect. With the current development version, where A.lu().solve() now does partial pivoting if I understand correctly, I get very fast results indeed:

n=2000: eigen 7700 mflops
n=5000: eigen 8700 mflops
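
For what it’s worth, an mflops figure like the ones above is just a nominal operation count divided by wall-clock time. Assuming the usual LAPACK-style count for an LU solve with one right-hand side (an assumption about the convention; the benchmark may normalise differently), the figure would be roughly

 mflops ~ ( (2.0/3.0)*n*n*n + 2.0*n*n ) / ( 1.0e6 * t )   // t = solve time in seconds

so with the same nominal count applied to both libraries, any extra non-arithmetic work, such as the pivot search in a full-pivoting LU, shows up directly as a lower mflops figure.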

Great work!

And sorry, you’re right as far as the FLOP count is concerned as well; my fault.

Comment by Bjacob

Indeed, lu() does full pivoting in Eigen2, while it does partial pivoting in Eigen3. In Eigen3 we also have the more explicitly named variants fullPivLu() and partialPivLu().
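
For illustration, here is a minimal Eigen3 sketch of the three spellings (hypothetical example code, assuming the usual Eigen/Dense header; not taken from the benchmark):

 #include <Eigen/Dense>

 int main(){
   const int n = 1000;
   Eigen::MatrixXd A = Eigen::MatrixXd::Random( n, n );
   Eigen::VectorXd b = Eigen::VectorXd::Random( n );

   // In Eigen3, lu() is partial-pivoting LU (it was full-pivoting in Eigen2).
   Eigen::VectorXd x1 = A.lu().solve( b );

   // The explicitly named variants:
   Eigen::VectorXd x2 = A.partialPivLu().solve( b );  // faster; assumes A is invertible
   Eigen::VectorXd x3 = A.fullPivLu().solve( b );     // slower, but rank-revealing
   return 0;
 }

partialPivLu() is the variant to benchmark against LAPACK’s dgesv, since both use partial pivoting.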