Comment by Tvisitor

A few observations:

So for the mat*mat operation we would get n*n*n and not the 2*n*n*n used in the source code of the benchmark:

 double nb_op_base( void ){
   return 2.0*_size*_size*_size;
 }
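
For reference, here is a minimal sketch (not the benchmark source) of the textbook triple-loop product. Its inner statement performs one multiplication and one addition per iteration, i.e. two flops executed n*n*n times, which is where the benchmark’s factor of 2.0 comes from; whether both should be counted is exactly the point in question.

 // Minimal sketch, not the benchmark source: a textbook n x n product.
 void matmul( int n, const double* A, const double* B, double* C ){
   for( int i = 0; i < n; ++i )
     for( int j = 0; j < n; ++j ){
       double s = 0.0;
       for( int k = 0; k < n; ++k )
         s += A[i*n + k] * B[k*n + j];   // one multiplication + one addition
       C[i*n + j] = s;
     }
 }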

For the mat*mat benchmark:
 n=2000: ATLAS 5250 mflops, Eigen 4840 mflops, non-optimised BLAS 860 mflops

For the LU decomposition (linear solve):
 n=2000: ATLAS threaded 12000 mflops, ATLAS 8870 mflops, Eigen 840 mflops, non-optimised BLAS 1960 mflops
 n=5000: ATLAS threaded 21100 mflops, ATLAS 9700 mflops, Eigen 800 mflops

So I’m very disappointed with the LU decomposition performance, but otherwise it’s a brilliant package!

Comment by Bjacob

First of all, I don’t check wiki updates very often so it’s only by chance that I found this! Please use the mailing list or forum.

Are you talking about Eigen 2 or 3 (the development branch)? This benchmark refers to an old state of the development branch, so it’s somewhere halfway between Eigen 2 and 3.

To answer your points:

Please take this discussion to the mailing list if you want to continue it.

Comment by Tvisitor

Thanks, you are right. I used dgesv(), which does partial pivoting, together with version 2 of Eigen and A.lu().solve(), which probably did full pivoting, so my mflops calculation for Eigen was indeed incorrect. With the current development version, where A.lu().solve() now does partial pivoting if I understand correctly, I get very fast results indeed:

n=2000: eigen 7700 mflops
n=5000: eigen 8700 mflops
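
For what it’s worth, an mflops figure like the ones above is just a nominal operation count divided by wall-clock time. Assuming the usual LAPACK-style count for an LU solve with one right-hand side (an assumption about the convention; the benchmark may normalise differently), the figure would be roughly

 mflops ~ ( (2.0/3.0)*n*n*n + 2.0*n*n ) / ( 1.0e6 * t )   // t = solve time in seconds

so with the same nominal count applied to both libraries, any extra non-arithmetic work, such as the pivot search in a full-pivoting LU, shows up directly as a lower mflops figure.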

Great work!

And sorry, you’re right as far as the FLOP count is concerned as well; my fault.

Comment by Bjacob

Indeed, lu() does full pivoting in Eigen2, while it does partial pivoting in Eigen3. In Eigen3 we also have the more explicitly named variants fullPivLu() and partialPivLu().
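
For illustration, here is a minimal Eigen3 sketch of the three spellings (hypothetical example code, assuming the usual Eigen/Dense header; not taken from the benchmark):

 #include <Eigen/Dense>

 int main(){
   const int n = 1000;
   Eigen::MatrixXd A = Eigen::MatrixXd::Random( n, n );
   Eigen::VectorXd b = Eigen::VectorXd::Random( n );

   // In Eigen3, lu() is partial-pivoting LU (it was full-pivoting in Eigen2).
   Eigen::VectorXd x1 = A.lu().solve( b );

   // The explicitly named variants:
   Eigen::VectorXd x2 = A.partialPivLu().solve( b );  // faster; assumes A is invertible
   Eigen::VectorXd x3 = A.fullPivLu().solve( b );     // slower, but rank-revealing
   return 0;
 }

partialPivLu() is the variant to benchmark against LAPACK’s dgesv, since both use partial pivoting.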