Overview
SGEMM and DGEMM compute, in single and double precision, respectively:
C := alpha*op( A )*op( B ) + beta*C
where:
- A is M-by-N, B is N-by-K and C is M-by-K
- alpha and beta are scalars
In the following tests, we use:
op( A ) = A, op( B ) = B (neither matrix is transposed)
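The operation with op( A ) = A and op( B ) = B can be sketched in a few lines of plain Python. This is only an illustrative reference version, not the code used in the tests and not how any of the BLAS libraries implement it; the function name gemm and the list-of-rows storage are assumptions made for the example.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C (hypothetical helper, no transposition).

    Matrices are lists of rows: A is M-by-N, B is N-by-K, C is M-by-K.
    """
    m, n = len(a), len(b)
    k = len(b[0])
    # Check that the shapes match the definition above.
    if any(len(row) != n for row in a):
        raise ValueError("A must have as many columns as B has rows")
    if any(len(row) != k for row in b):
        raise ValueError("all rows of B must have the same length")
    if len(c) != m or any(len(row) != k for row in c):
        raise ValueError("C must be M-by-K")

    result = []
    for i in range(m):
        out_row = []
        for j in range(k):
            # Inner product of row i of A with column j of B.
            dot = sum(a[i][p] * b[p][j] for p in range(n))
            out_row.append(alpha * dot + beta * c[i][j])
        result.append(out_row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, a, b, 0.5, c))  # [[38.5, 44.5], [86.5, 100.5]]
```

Each entry of the result costs N multiplications and about N additions, which is where the operation count used for the MFLOPS figure comes from.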
All cases were run on a single processor on one of the Hoffman2 Cluster compute nodes. The code is single-threaded and statically linked.
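MFLOPS is calculated as:
MFLOPS = (1×10**-6) * 2 * N**3 / (CPU seconds)
where 2 * N**3 is the approximate number of floating-point operations in a product of two N-by-N matrices.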
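A minimal Python sketch of this calculation follows. The helper names are hypothetical. The operation count 2*M*N*K (one multiply and one add per inner-product term, which reduces to 2*N**3 when M = N = K) is an assumption about how the count is derived, and time.process_time() stands in for whatever CPU timer the benchmark actually used.

```python
import time


def gemm_flops(m, n, k):
    """Approximate flop count for an M-by-N times N-by-K product."""
    # One multiply and one add for each of the M*N*K inner-product terms.
    return 2 * m * n * k


def mflops(flop_count, cpu_seconds):
    """Convert a flop count and a CPU time in seconds to MFLOPS."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return 1e-6 * flop_count / cpu_seconds


def time_cpu(func, *args):
    """Return the CPU seconds consumed by one call to func(*args)."""
    start = time.process_time()
    func(*args)
    return time.process_time() - start


# Example: a 1000-by-1000 product that took 2.0 CPU seconds
# corresponds to roughly 1000 MFLOPS.
print(mflops(gemm_flops(1000, 1000, 1000), 2.0))
```

CPU time rather than wall-clock time is measured here to match the "CPU seconds" in the formula; for a single-threaded run on an otherwise idle node the two should be close.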
Results
Versions of BLAS compared: BLAS library from the Netlib Repository, ATLAS library, Intel-MKL library, AMD ACML Library and Goto BLAS.