Basic Linear Algebra Subroutines (BLAS-3) are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for fully exploiting the system's computational power. In this work we consider a massively parallel SIMD machine (the APE100/Quadrics) and adopt the hyper-systolic method [3, 6, 4] to implement BLAS-3 efficiently on it. The results achieved (roughly 60-70% of peak performance for large matrices) demonstrate the validity of the proposed approach. The work is structured as follows: section 1 reviews BLAS-3; section 2 recalls the hyper-systolic method; section 3 describes the target machine; section 4 presents the hyper-systolic (HS) implementation; finally, section 5 reports some experimental results. © Springer-Verlag Berlin Heidelberg 1998.
|Title:||Hyper-systolic implementation of BLAS-3 routines on the APE100/Quadrics machine|
|Publication date:||1998|
|Appears in types:||4.1 Contribution in conference proceedings|